Re: [R] matrix manipulation question

2015-03-27 Thread peter dalgaard

On 27 Mar 2015, at 09:58 , Stéphane Adamowicz 
stephane.adamow...@avignon.inra.fr wrote:

 data_no_NA <- data[, complete.cases(t(data))==T]

Ouch! logical == TRUE is bad, logical == T is worse:

data[, complete.cases(t(data))]


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] matrix manipulation question

2015-03-27 Thread PIKAL Petr
Very, very, very bad solution.

as.matrix() can silently change your data to an unwanted format, and
complete.cases()==T is silly, as Peter already pointed out.

I use

head(airquality[, colSums(is.na(airquality)) == 0])
  Wind Temp Month Day
1  7.4   67     5   1
2  8.0   72     5   2
3 12.6   74     5   3
4 11.5   62     5   4
5 14.3   56     5   5
6 14.9   66     5   6

if I want to get rid of columns with NA.
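
For comparison, here is a minimal sketch (not code posted by anyone in the thread) checking that the three column filters discussed here pick the same columns of the built-in airquality data. The names keep_colsums, keep_apply, keep_complete and no_na are made up for illustration.

```r
# Three ways to flag columns that contain no NA; object names are illustrative.
keep_colsums  <- unname(colSums(is.na(airquality)) == 0)
keep_apply    <- unname(!apply(is.na(airquality), 2, any))
keep_complete <- unname(complete.cases(t(airquality)))

# All three flag the same columns.
stopifnot(identical(keep_colsums, keep_apply),
          identical(keep_colsums, keep_complete))

# Use the logical vector directly as a column index (no "== TRUE" needed).
no_na <- airquality[, keep_colsums]
names(no_na)
```

The colSums() version avoids transposing the data, which may matter for a data frame with mixed column types.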

Cheers
Petr



This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a 
contract in any time, for any reason, and without 

Re: [R] matrix manipulation question

2015-03-27 Thread Stéphane Adamowicz
Well, it seems to work with me.

Y <- as.matrix(airquality)
head(Y, n=8)
     Ozone Solar.R Wind Temp Month Day
[1,]    41     190  7.4   67     5   1
[2,]    36     118  8.0   72     5   2
[3,]    12     149 12.6   74     5   3
[4,]    18     313 11.5   62     5   4
[5,]    NA      NA 14.3   56     5   5
[6,]    28      NA 14.9   66     5   6
[7,]    23     299  8.6   65     5   7
[8,]    19      99 13.8   59     5   8

Z <- Y[,complete.cases(t(Y))==T]

head(Z, n=8)
     Wind Temp Month Day
[1,]  7.4   67     5   1
[2,]  8.0   72     5   2
[3,] 12.6   74     5   3
[4,] 11.5   62     5   4
[5,] 14.3   56     5   5
[6,] 14.9   66     5   6
[7,]  8.6   65     5   7
[8,] 13.8   59     5   8

The columns that contained NA were deleted.


_
Stéphane Adamowicz
Inra, centre de recherche Paca, unité PSH
228, route de l'aérodrome
CS 40509
domaine St Paul, site Agroparc
84914 Avignon, cedex 9
France

stephane.adamow...@avignon.inra.fr
tel.  +33 (0)4 32 72 24 35
fax. +33 (0)4 32 72 24 32
do not dial 0 when out of France
web PSH  : https://www6.paca.inra.fr/psh
web Inra : http://www.inra.fr/
_



Re: [R] matrix manipulation question

2015-03-27 Thread Stéphane Adamowicz
Why not use complete.cases()?

data_no_NA <- data[, complete.cases(t(data))==T]


On 27 March 2015, at 06:13, Jatin Kala jatin.kala...@gmail.com wrote:

 Hi,
 I've got a rather large matrix of about 800 rows and 60 columns.
 Each column is a time-series 800 long.
 
 Out of these 60 time series, some have missing values (NA).
 I want to strip out all columns that have one or more NA values, i.e., I only
 want full time series.
 
 This should do the trick:
 data_no_NA <- data[,!apply(is.na(data), 2, any)]
 
 I now use data_no_NA as input to a function, which returns its output as a
 matrix of the same size as data_no_NA.
 
 The trick is that I now need to put these columns back into a new, empty
 800 by 60 matrix, at their original locations.
 Any suggestions on how to do that? Hopefully without having to use loops.
 I'm using R/3.0.3
 
 Cheers,
 Jatin.
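
One possible way to answer the "put the columns back" part without loops is sketched below. It is only an illustration under stated assumptions: the small 5 x 4 matrix stands in for the 800 x 60 data, and f is a hypothetical placeholder for the poster's function, assumed to return a matrix with the same dimensions as its input.

```r
# Toy stand-ins (assumptions): a small matrix and a placeholder function f.
data <- matrix(c(1, 2, 3, 4, 5,
                 1, NA, 3, 4, 5,
                 6, 7, 8, 9, 10,
                 NA, NA, 1, 2, 3), nrow = 5)
f <- function(m) m * 2

# Logical index of the columns without any NA; keep it for later.
keep <- colSums(is.na(data)) == 0
data_no_NA <- data[, keep, drop = FALSE]   # drop = FALSE keeps a matrix

# Empty matrix of the original size, then fill the original positions.
result <- matrix(NA_real_, nrow = nrow(data), ncol = ncol(data),
                 dimnames = dimnames(data))
result[, keep] <- f(data_no_NA)
result
```

Keeping the logical vector keep is the whole trick: the same index that removed the columns also says where the results belong.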
 


Re: [R] Why can't I access this type?

2015-03-27 Thread Patrick Connolly
On Thu, 26-Mar-2015 at 04:58PM -0400, yoursurrogate...@gmail.com wrote:

[...]
|  I agree with you on the indexing approach.  But even after using
| within, I still get the same error.  

You leave us to guess just what you tried, but if you did this:

all.states <- within(as.data.frame(state.x77), state <- rownames(state.x77))

and then again did this:

cold.states <- all.states[all.states$Frost > 150, c("Name", "Frost")]

of course it will give the same error, because you haven't addressed
the problem you were told about:

On Sun, 22-Mar-2015 at 08:06AM -0800, John Kane wrote:

| Well, first off, you have no variable called Name.  You have lost
| the state names as they are rownames in the matrix state.x77 and
| not a variable.

If you did this:

all.states <- within(as.data.frame(state.x77), Name <- rownames(state.x77))

instead of

all.states <- within(as.data.frame(state.x77), state <- rownames(state.x77))

then this would work:

cold.states <- all.states[all.states$Frost > 150, c("Name", "Frost")]

Modify the above to match where my guess at what you tried is in error.
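
A self-contained version of the working variant, using the built-in state.x77 matrix, might look like the sketch below. The comparison operator is missing from the archived text; "> 150" is an assumption based on the name cold.states.

```r
# state.x77 is a matrix; its row names hold the state names, so they
# have to be copied into a real column before they can be selected.
all.states <- within(as.data.frame(state.x77), Name <- rownames(state.x77))

# Column names must be quoted strings. "> 150" is an assumed threshold
# direction (many frost days = cold state).
cold.states <- all.states[all.states$Frost > 150, c("Name", "Frost")]
head(cold.states)
```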


HTH

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~}   Great minds discuss ideas
 _( Y )_ Average minds discuss events 
(:_~*~_:)  Small minds discuss people  
 (_)-(_)  . Eleanor Roosevelt
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.



Re: [R] Vennerable Plots for Publications

2015-03-27 Thread Jim Lemon
Hi Dario,
Have you tried creating a larger PNG image and then shrinking the result
with an image manipulation program (e.g. GIMP)?

Jim


On Fri, Mar 27, 2015 at 4:00 PM, Dario Strbenac dstr7...@uni.sydney.edu.au
wrote:

 Does anyone make Venn diagrams for publication using Vennerable ? I found
 that the font size is too big when the plot is created at 300 DPI, and
 there's no option to change it, even when the point size argument to the
 device is changed.

 aVenn <- Venn(Sets = list(A = 1:5, B = 3:6))
 png("forPublication.png", units = "in", h = 2.55, w = 2.4, res = 300)
 # Changing pointsize to a smaller number has no effect on size of the text.
 plot(aVenn)
 dev.off()

 --
 Dario Strbenac
 PhD Student
 University of Sydney
 Camperdown NSW 2050
 Australia



Re: [R] matrix manipulation question

2015-03-27 Thread Stéphane Adamowicz

On 27 March 2015, at 12:34, PIKAL Petr petr.pi...@precheza.cz wrote:

 Very, very, very bad solution.
  
 as.matrix can change silently your data to unwanted format, 
 complete.cases()==T is silly as Peter already pointed out.
  
 

Perhaps, but the original question dealt with a matrix, not a data frame, and
thus I needed a matrix example. Furthermore, in my example no unwanted format
occurred. You can easily check that the final matrix contains only "numeric"
data, as in the original data frame.

Stéphane
  
  
  
  
  
 
 
 

Re: [R] matrix manipulation question

2015-03-27 Thread PIKAL Petr
Hi

 -Original Message-
 From: Stéphane Adamowicz [mailto:stephane.adamow...@avignon.inra.fr]
 Sent: Friday, March 27, 2015 1:26 PM
 To: PIKAL Petr
 Cc: peter dalgaard; r-help@r-project.org
 Subject: Re: [R] matrix manipulation question


 On 27 March 2015, at 12:34, PIKAL Petr petr.pi...@precheza.cz wrote:

  Very, very, very bad solution.
 
  as.matrix can change silently your data to unwanted format,
 complete.cases()==T is silly as Peter already pointed out.
 
 

 Perhaps, but it happens that in the original message, the question

I do not have the original message.

 dealt with a matrix not a dataframe, and thus I needed a matrix

But you made the matrix from a data frame. If one column were not numeric,
the whole resulting matrix would change to a non-numeric format.

 example. Furthermore in my example no unwanted format occurred. You can

Yes because data.frame was (luckily) numeric.
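
The coercion being argued about can be seen with built-in data. This is a minimal sketch, not part of the original exchange; iris is used here only because it has one non-numeric column, and m_iris, m_air and iris_no_na are illustrative names.

```r
# iris has four numeric columns and one factor column (Species).
m_iris <- as.matrix(iris)
typeof(m_iris)        # "character": every cell was converted to text

# airquality has only numeric columns, so its matrix stays numeric.
m_air <- as.matrix(airquality)
is.numeric(m_air)

# Filtering NA-free columns works on the data frame itself,
# so no conversion (and no silent coercion) is needed.
iris_no_na <- iris[, colSums(is.na(iris)) == 0]
sapply(iris_no_na, class)
```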

 check easily that the final matrix contains only « numeric » data as in
 the original data frame.

You want matrix? Here it is.

head(as.matrix(airquality)[, colSums(is.na(airquality)) == 0])
     Wind Temp Month Day
[1,]  7.4   67     5   1
[2,]  8.0   72     5   2
[3,] 12.6   74     5   3
[4,] 11.5   62     5   4
[5,] 14.3   56     5   5
[6,] 14.9   66     5   6

It works the same with a matrix as with a data frame, without any need to
transform it.

Cheers
Petr



Re: [R] matrix manipulation question

2015-03-27 Thread Stéphane Adamowicz

 
 example. Furthermore in my example no unwanted format occurred. You can
 
 Yes because data.frame was (luckily) numeric.
 

Luck has nothing to do with this. I chose this example on purpose …

Stéphane



[R] Calculating area of polygons created from a spatial intersect

2015-03-27 Thread Walter Anderson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello all,

I am attempting to automate an analysis that I developed with ArcInfo
using R and the gdal and geos packages (or any other) if possible.

Here is the basic process

I have a shape file (lines) that defines the limits of all of the
projects with each project having a unique identifier.

I have another shape file (polys) that contains total population and
low income population and represent Census block groups.  This shape
file has an area field which has the acreage of the total block group.

Process

Step 1.
I then buffer these project lines to create a second shape file that
represents the 'footprint' of the project. (Creates polys).

Step 2.
In ArcInfo, I perform an intersection of the two shape files
(footprint and census blocks) and this creates a third shape file
which has a unique polygon for every project/census block intersection.

Step 3.
I then perform an area calculation (acres) on this new poly shape file
and use this calculated area divided by the original area of the
associated census block group to apportion the two population figures to
this new polygon.

Step 4.
Finally, I sum the two population figures for each of the projects from
the attribute table of this final shape file.
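
The arithmetic of Steps 3 and 4 can be sketched in base R, independent of any spatial package. Everything in the table below is hypothetical (project IDs, block groups, acreages, populations); in practice the piece areas would come from the intersection polygons of Step 2.

```r
# Hypothetical intersection table: one row per project/block-group piece.
pieces <- data.frame(
  project     = c("P1", "P1", "P2"),
  block_group = c("A",  "B",  "B"),
  piece_acres = c(10,   5,    20),    # area of the intersection polygon
  block_acres = c(100,  50,   50),    # area of the whole block group
  total_pop   = c(1000, 400,  400),
  low_inc_pop = c(200,  100,  100)
)

# Step 3: area share of each piece, applied to both population counts.
share <- pieces$piece_acres / pieces$block_acres
pieces$total_pop_est   <- pieces$total_pop   * share
pieces$low_inc_pop_est <- pieces$low_inc_pop * share

# Step 4: sum the apportioned counts for each project.
by_project <- aggregate(cbind(total_pop_est, low_inc_pop_est) ~ project,
                        data = pieces, FUN = sum)
by_project
```

This assumes population is spread evenly over each block group, which is the same assumption the area-ratio method in Step 3 makes.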

When I try to replicate the above procedure I run into a problem with
Step 2 when I use what I think is the appropriate command:

gIntersects(buffered_projects, census_blocks, byID=TRUE)

This command is producing a matrix of each project/census block
combination and only providing me a true/false indication.  Is there
any way to replicate the process from ArcInfo that I outlined above
within R?

Walter Anderson
-BEGIN PGP SIGNATURE-
Version: GnuPG v1

iQIcBAEBAgAGBQJVFWYWAAoJEHfnxjvhypCiMc8P/2Dsja+h4RKuR7ygHx+oI+4/
oEIxl/NtnHwPh6szyL6CBndSYI6hvdWWwBUm86IsJLmLSSFivB1Ru54nkFq+kfKL
tWpxyOAXNZoa2xn1ADaG1ChFiY/hF937zlTTv8D3a5pYAnYtTeyg6UJ3AuHsfjqG
PbFAg6T+QD3AlJvV73JGmEchgYoj7NlxiEmdcfB3X9cgLMMOCsfLgm4d5g5J/mhh
LKZm3Xg9+eXEjPJazHYB9xc0+AF8Jp6SH9XnnZ/DMFN3DuyR3KuTJr6YnHUKvtUs
o/Uog3zAGuVUDqNwF1H9+WNuz4Fm7XXiHl4xX0n9faE3niTe2b63bVn/Ueiyofb9
ky3wIpAr412/Ne3dtMtSDPkE3w2TsdIUKki2VP9duXB/4vEtHHXvQxNtfKdKmlYX
cnyyK/1ZwULiwWhyxZKJNUd6N2GyLYJ8MmJ7AXnT7EboJjNkhNta1BhWBE9Kzx8p
fUN1UwS8P96iFXztgg2jw3aYTPdPIp9rFYFJax5nKCl6n+YbjUw11GuO6F4lqNDv
PoLllcKkmsGWFo04P0TbS+x1zhc0wmyMn2EV8FcIXJ/80pqT/dWwksbjTfrQGoWx
Xo1m1vTR2LVVrdf0vSkWnxHA3xVQPv7YH5erVNBGWvuhgbLRx8j7MPUp7lFHOJvQ
bq1VJbpnZFRvJyZfII2p
=cZWI
-END PGP SIGNATURE-



[R] Problem with download.file ?

2015-03-27 Thread Giles Crane

# download.file() Seems to put the xlsx file onto hard drive.

download.file("http://www.udel.edu/johnmack/data_library/zipcode_centroids.xlsx",
              "zipcode_centroids.xlsx")
trying URL 'http://www.udel.edu/johnmack/data_library/zipcode_centroids.xlsx'
Content type 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' length 2785832 bytes (2.7 Mb)
opened URL
downloaded 2.7 Mb


# Trouble reading file with xlsx.

library(xlsx)
Loading required package: rJava
Loading required package: xlsxjars
Warning messages:
1: package ‘xlsx’ was built under R version 3.1.3
2: package ‘rJava’ was built under R version 3.1.3

df <- read.xlsx2("zipcode_centroids.xlsx", sheetIndex=1)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  :
   java.util.zip.ZipException: invalid entry size (expected 1168 but got 1173 bytes)


# I downloaded the file manually (same name) from the web page and tried again.
# Then I read the file into R with xlsx successfully.


df <- read.xlsx2("/zipdist/zipcode_centroids.xlsx", sheetIndex=1)
str(df)
'data.frame':   42961 obs. of  8 variables:
 $ ZIPCODE  : Factor w/ 42961 levels "01001","01002",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ TOWN.    : Factor w/ 18955 levels "Aaronsburg","Abbeville",..: 85 333 333 333 898 1089 1459 1620 1899 2929 ...
 $ STATE    : Factor w/ 52 levels "AK","AL","AR",..: 21 21 21 21 21 21 21 21 21 21 ...
 $ LATITUDE : Factor w/ 37352 levels "-7.209975","19.101978",..: 28020 28948 28916 28971 29047 28624 28326 28418 28197 28603 ...
 $ LONGITUDE: Factor w/ 37241 levels "-100.00991","-100.02632",..: 8799 8706 8811 8715 8470 8639 9019 8608 8531 9065 ...
 $ STFIPS   : Factor w/ 51 levels "01","02","04",..: 22 22 22 22 22 22 22 22 22 22 ...
 $ CD       : Factor w/ 55 levels "00","01","02",..: 3 2 2 2 2 2 2 3 3 2 ...
 $ CONG_DIST: Factor w/ 436 levels "01_01","01_02",..: 191 190 190 190 190 190 190 191 191 190 ...

# Is there a problem with download.file() when file is an Excel file or this 
particular Excel file?

-- 
Giles L Crane, MPH, ASA, NJPHA
Statistical Consultant and R Instructor
621 Lake Drive
Princeton, NJ  08540
Phone: 609 924-0971
Email: gilescr...@verizon.net



Re: [R] Problem with download.file ?

2015-03-27 Thread William Dunlap
Add the argument mode="wb" to your call to download.file().  On Windows
this selects binary mode, so line endings in the file are not changed.
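
A small offline sketch of the idea: a local file:// URL stands in for the real http:// address (an assumption made so the example runs without a network), and the test file deliberately contains line-ending bytes. The names src, dest and src_url are illustrative.

```r
# Build a small binary file that contains line-ending bytes (0x0A, 0x0D).
src  <- tempfile(fileext = ".bin")
dest <- tempfile(fileext = ".bin")
writeBin(as.raw(c(0x50, 0x4b, 0x0a, 0x0d, 0x0a, 0x00, 0xff)), src)

# A file:// URL stands in for the real download address (assumption);
# on Windows the URL would need the form file:///C:/...
src_url <- paste0("file://", normalizePath(src, winslash = "/"))
download.file(src_url, dest, mode = "wb", quiet = TRUE)

# With mode = "wb" the copy is byte-for-byte identical to the source.
same_bytes <- identical(readBin(src,  "raw", n = 100),
                        readBin(dest, "raw", n = 100))
same_bytes
```

For the file in this thread the call would presumably become download.file(url, "zipcode_centroids.xlsx", mode = "wb"), with url holding the address quoted above.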

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [R] matrix manipulation question

2015-03-27 Thread David Winsemius

On Mar 27, 2015, at 3:41 AM, Stéphane Adamowicz wrote:

 Well, it seems to work with me.
 

No one is doubting that it worked for you in this instance. What Peter D. was 
criticizing was the construction :

complete.cases(t(Y))==T

... and it was on two bases that it is wrong. The first is that `T` is not 
guaranteed to be TRUE. The second is that the test ==T (or similarly ==TRUE) is 
completely unnecessary because `complete.cases` returns a logical vector and so 
that expression is a waste of time.
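
Both points can be checked in a few lines. This is only an illustrative sketch; the vector ok and the name flipped are made up for the example.

```r
ok <- c(TRUE, FALSE, TRUE)

# TRUE is a reserved word, but T is an ordinary variable that can be
# reassigned, which silently changes what "== T" means.
T <- 0
flipped <- ok == T   # compares against 0, so the result is inverted
flipped
rm(T)                # remove our T; base R's T (TRUE) is visible again

# The comparison is redundant anyway: ok is already a logical vector.
identical(ok == TRUE, ok)
```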

(The issue of matrix versus dataframe was raised by someone else.)

-- 
David.



David Winsemius
Alameda, CA, USA



Re: [R] Calculating area of polygons created from a spatial intersect

2015-03-27 Thread MacQueen, Don
Suggest (strongly) that you move this question to r-sig-geo. Much more
appropriate there, and more people there are more familiar with this kind
of work. But ... I suspect you want gIntersection(), not gIntersects().

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 3/27/15, 7:16 AM, Walter Anderson wandrso...@gmail.com wrote:


Hello all,

I am attempting to automate an analysis that I developed with ArcInfo
using R and the gdal and geos packages (or any other) if possible.

Here is the basic process

I have a shape file (lines) that defines the limits of all of the
projects with each project having a unique identifier.

I have another shape file (polys) that contains total population and
low income population and represent Census block groups.  This shape
file has an area field which has the acreage of the total block group.

Process

Step 1.
I then buffer these project lines to create a second shape file that
represents the 'footprint' of the project. (Creates polys).

Step 2.
In ArcInfo, I perform an intersection of the two shape files
(footprint and census blocks) and this creates a third shape file
which has a unique polygon for every project/census block intersection.

Step 3.
I then perform an area calculation (acres) on this new poly shape file
and use this calculated area divided by the original area of the
associated census block group to apportion the two population figures to
this new polygon.

Step 4.
Finally, I sum the two population figures for each of the projects from
the attribute table of this final shape file.
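
Steps 3 and 4 are plain arithmetic once the intersection polygons and their areas exist. The following is a minimal sketch in base R with made-up numbers; the data frame pieces and all of its column names are hypothetical stand-ins for the attribute table of the intersected shape file.

```r
# Hypothetical attribute table: one row per project / block-group piece
pieces <- data.frame(
  project   = c("P1", "P1", "P2"),
  block     = c("B1", "B2", "B2"),
  piece_ac  = c(10, 5, 20),      # area of the intersection polygon (acres)
  block_ac  = c(100, 50, 50),    # area of the whole block group (acres)
  total_pop = c(1000, 400, 400), # block-group total population
  low_inc   = c(200, 100, 100)   # block-group low-income population
)

# Step 3: apportion each population figure by the share of area covered
share <- pieces$piece_ac / pieces$block_ac
pieces$total_pop_part <- pieces$total_pop * share
pieces$low_inc_part   <- pieces$low_inc * share

# Step 4: sum the apportioned figures per project
by_project <- aggregate(cbind(total_pop_part, low_inc_part) ~ project,
                        data = pieces, FUN = sum)
by_project
```

This assumes population is spread evenly over each block group, which is the same assumption the area-ratio method in Step 3 makes.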

When I try to replicate the above procedure I run into a problem with
Step 2 when I use what I think is the appropriate command:

gIntersects(buffered_projects, census_blocks, byID=TRUE)

This command is producing a matrix of each project/census block
combination and only providing me a true/false indication.  Is there
any way to replicate the process from ArcInfo that I outlined above
within R?

Walter Anderson


Re: [R] Having trouble with gdata read in

2015-03-27 Thread Benjamin Baker
Anthony,




The xlsx package won’t read an XLS file. Additionally, the legacy Java that it 
requires really messes up my computer; I have to reinstall my OS to fix it.




—
Sent from Mailbox

On Wed, Mar 25, 2015 at 3:51 PM, Anthony Damico ajdam...@gmail.com
wrote:

 maybe
 library(xlsx)
 tf <- tempfile()
 ami <- "http://www.ferc.gov/industries/electric/indus-act/demand-response/2008/survey/ami_survey_responses.xls"
 download.file( ami , tf , mode = 'wb' )
 ami.data2008 <- read.xlsx( tf , sheetIndex = 1 )
 On Wed, Mar 25, 2015 at 5:01 PM, Benjamin Baker bba...@reed.edu wrote:
 Trying to read and clean up the FERC data on Advanced Metering
 infrastructure. Of course it is in XLS for the first two survey years and
 then converts to XLSX for the final two. Bad enough that it is all in
 excel, they had to change the survey design and data format as well. Still,
 I’m sorting through it. However, when I try and read in the 2008 data, I’m
 getting this error:
 ###
 Wide character in print at
 /Library/Frameworks/R.framework/Versions/3.1/Resources/library/gdata/perl/
 xls2csv.pl line 270.
 Warning message:
 In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
   EOF within quoted string
 ###



 Here is the code I’m running to get the data:
 ###
 install.packages("gdata")
 library(gdata)
 fileUrl <- "http://www.ferc.gov/industries/electric/indus-act/demand-response/2008/survey/ami_survey_responses.xls"
 download.file(fileUrl, destfile="./ami.data/ami-data2008.xls")
 list.files("ami.data")
 dateDown.2008 <- date()
 ami.data2008 <- read.xls("./ami.data/ami-data2008.xls", sheet=1,
 header=TRUE)
 ###


 Reviewed the data in the XLS file, and both “” and # are present within
 it. Don’t know how to get the read.xls to ignore them so I can read all the
 data into my data frame. Tried :
 ###
 ami.data2008 <- read.xls("./ami.data/ami-data2008.xls", sheet=1, quote="",
 header=TRUE)
 ###


 And it spits out “More columns than column names” output.


 Been searching this, and I can find some “solutions” for read.table, but
 nothing specific to read.xls


 Many thanks,


 Benjamin Baker




[R] Categorizing by month

2015-03-27 Thread lychang
Hi everyone,

I'm trying to categorize by month in R. How can I do this if my dates are in
date/month/year form?

Thanks,
Lois





[R] vif in package car: there are aliased coefficients in the model

2015-03-27 Thread Rodolfo Pelinson
Hello. I'm trying to use the function vif from package car on an lm fit. However,
it returns the following error:
Error in vif.default(lm(MDescores.sitescores ~ hidroperiodo + localizacao
+  : there are aliased coefficients in the model

When I exclude any predictor from the model, it returns this warning
message:
Warning message: In cov2cor(v) : diag(.) had 0 or NA entries; non-finite
result is doubtful

When I exclude any other predictor from the model, vif finally works. I
can't figure out what the problem is. These are the results that R returns:

> vif(lm(MDescores.sitescores ~ hidroperiodo + localizacao + area +
+ profundidade + NTVM +  NTVI + PCs...c.1.., data = MDVIF))
Error in vif.default(lm(MDescores.sitescores ~ hidroperiodo + localizacao +
 :   there are aliased coefficients in the model

> vif(lm(MDescores.sitescores ~ localizacao + area + profundidade + NTVM +
+ NTVI + PCs...c.1.., data = MDVIF))
             GVIF Df GVIF^(1/(2*Df))
localizacao   NaN  2             NaN
area          NaN  1             NaN
profundidade  NaN  1             NaN
NTVM          NaN  1             NaN
NTVI          NaN  1             NaN
PCs...c.1..   NaN  1             NaN
Warning message:
In cov2cor(v) : diag(.) had 0 or NA entries; non-finite result is doubtful
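
An "aliased coefficients" error generally means that at least one predictor is an exact linear combination of the others. The following is a minimal sketch of how that situation can be detected with base R; the data frame d and its columns are simulated for illustration and have nothing to do with the MDVIF data above.

```r
set.seed(42)
n <- 30
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$x3 <- d$x1 + d$x2                  # exact linear combination of x1 and x2
d$y  <- 1 + d$x1 - d$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3, data = d)

coef(fit)              # the aliased term (x3) is reported as NA
alias(fit)$Complete    # shows which terms x3 is a combination of

# dropping the aliased term gives a model with no NA coefficients
fit2 <- update(fit, . ~ . - x3)
coef(fit2)
```

Checking coef() for NA entries, or calling alias() on the fitted model, is one way to find which predictor to drop before calling vif.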

Thanks.
-- 
Rodolfo Mei Pelinson.



Re: [R] matrix manipulation question

2015-03-27 Thread Henric Winell

On 2015-03-27 11:41, Stéphane Adamowicz wrote:


Well, it seems to work with me.

Y <- as.matrix(airquality)
head(Y, n=8)
     Ozone Solar.R Wind Temp Month Day
[1,]    41     190  7.4   67     5   1
[2,]    36     118  8.0   72     5   2
[3,]    12     149 12.6   74     5   3
[4,]    18     313 11.5   62     5   4
[5,]    NA      NA 14.3   56     5   5
[6,]    28      NA 14.9   66     5   6
[7,]    23     299  8.6   65     5   7
[8,]    19      99 13.8   59     5   8

Z <- Y[,complete.cases(t(Y))==T]


Peter's point, I guess, is that

1. complete.cases(t(Y)) is already a vector of logicals
2. T (and F) can be redefined, so what if T <- FALSE?


Henric Winell






Re: [R] Having trouble with gdata read in

2015-03-27 Thread Benjamin Baker
Jim,




Thanks, XLConnect with proper syntax works great for both types of files.


—
Sent from Mailbox

On Thu, Mar 26, 2015 at 5:15 AM, jim holtman jholt...@gmail.com wrote:

 My suggestion is to use XLConnect to read the file:
 > x <- "C:\\Users\\jh52822\\AppData\\Local\\Temp\\Rtmp6nVgFC\\file385c632aba3.xls"
 > require(XLConnect)
 Loading required package: XLConnect
 Loading required package: XLConnectJars
 XLConnect 0.2-10 by Mirai Solutions GmbH [aut],
   Martin Studer [cre],
   The Apache Software Foundation [ctb, cph] (Apache POI, Apache Commons Codec),
   Stephen Colebourne [ctb, cph] (Joda-Time Java library)
 http://www.mirai-solutions.com ,
 http://miraisolutions.wordpress.com
 > input <- f.readXLSheet(x, 1)
 > str(input)
 'data.frame':   2266 obs. of  51 variables:
  $ EIA              : num  34 59 87 97 108 118 123 149 150 157 ...
  $ Entity.Name      : chr  "City of Abbeville" "City of Abbeville" "City of Ada" "Adams Electric Cooperative" ...
  $ State            : chr  "SC" "LA" "MN" "IL" ...
  $ NERC.Region      : chr  "SERC" "SPP" "MRO" "SERC" ...
  $ Filing.Order     : num  12 11 1237 392 252 ...
  $ Q5.MultRegion    : chr  ...
  $ Q6.OwnMeters.    : chr  "Yes" "Yes" "Yes" "Yes" ...
  $ Q7.ResMeters     : num  3051 4253 857 8154 33670 ...
  $ Q7.ComMeters     : num  531 972 132 155 1719 ...
  $ Q7.IntMeters     : num  0 19 32 NA 626 NA 29 0 2 NA ...
  $ Q7.TransMeters   : num  0 NA NA NA NA NA NA 0 0 NA ...
  $ Q7.OtherMeters   : num  0 NA NA 57 NA NA NA 0 0 NA ...
  $ Q7...total.meters: num  3582 5244 1021 8366 36015 ...
  $ Q8.15Min.ResAMI  : num  0 NA NA NA NA NA NA NA NA NA ...
  $ Q8.15Min.ComAMI  : num  0 NA NA 155 NA NA NA NA NA NA ...
  $ Q8.15Min.IndAMI  : num  0 NA NA NA NA NA NA NA NA NA ...
  $ Q8.15Min.TransAMI: num  0 NA NA NA NA NA NA NA NA NA ...
  $ Q8.15Min.OtherAMI: num  0 NA NA NA NA NA NA NA NA NA ...
  $ Q8.15Min.TotalAMI: num  0 0 0 155 0 0 0 0 0 0 ...
  $ Q8.Hourly.ResAMI : num  0 NA NA NA 16100 NA NA NA NA NA ...
  $ Q8.Hourly.ComAMI : num  0 NA NA NA 1600 NA NA NA NA NA ...
 
 Jim Holtman
 Data Munger Guru
 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.

Re: [R] matrix manipulation question

2015-03-27 Thread Jatin Kala

Thanks Richard,
This works, rather obvious now that I think of it!
=)

On 27/03/2015 4:30 pm, Richard M. Heiberger wrote:

just reverse what you did before.

newdata <- data
newdata[] <- NA
newdata[,!apply(is.na(data), 2, any)] <- myfunction(data_no_NA)
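
The same idea on a small, self-contained case can be sketched as follows. The 4 by 5 matrix and the body of myfunction are made up for illustration; only the pattern of indexing by the complete columns comes from this thread.

```r
set.seed(1)
data <- matrix(rnorm(20), nrow = 4, ncol = 5)
data[2, 2] <- NA                      # column 2 is incomplete
data[3, 4] <- NA                      # column 4 is incomplete

# logical index of the columns without any NA
keep <- !apply(is.na(data), 2, any)
data_no_NA <- data[, keep, drop = FALSE]

# hypothetical stand-in for the real function; returns a matrix of the same size
myfunction <- function(m) m * 2

# empty matrix of the original size, then fill the kept columns in place
newdata <- data
newdata[] <- NA
newdata[, keep] <- myfunction(data_no_NA)
newdata
```

Storing the logical index once in keep means the same columns are used for extraction and for putting the results back.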

On Fri, Mar 27, 2015 at 1:13 AM, Jatin Kala jatin.kala...@gmail.com wrote:

Hi,
I've got a rather large matrix of about 800 rows and 60 columns.
Each column is a time-series 800 long.

Out of these 60 time series, some have missing values (NA).
I want to strip out all columns that have one or more NA values, i.e., only
want full time series.

This should do the trick:
data_no_NA <- data[,!apply(is.na(data), 2, any)]

I now use data_no_NA as input to a function, which returns output as a
matrix of the same size as data_no_NA.

The trick is that I now need to put these columns back into a new, empty
800 by 60 matrix, at their original locations.
Any suggestions on how to do that? Hopefully without having to use loops.
I'm using R/3.0.3

Cheers,
Jatin.



Re: [R] Why can't I access this type?

2015-03-27 Thread Henric Winell

On 2015-03-27 09:19, Patrick Connolly wrote:


[...]

On Sun, 22-Mar-2015 at 08:06AM -0800, John Kane wrote:

| Well, first off, you have no variable called Name.  You have lost
| the state names as they are rownames in the matrix state.x77 and
| not a variable.

If you did this:


all.states <- within(as.data.frame(state.x77), Name <- rownames(state.x77))

instead of

all.states <- within(as.data.frame(state.x77), state <- rownames(state.x77))


Alternatively, since 'data.frame()' coerces internally, one could do

all.states <- data.frame(state.x77, Name = rownames(state.x77))


Henric Winell





then this would work:

cold.states <- all.states[all.states$Frost > 150, c("Name", "Frost")]


Modify the above to match where my guess at what you tried is in error.
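
Put together, the suggestion can be sketched as below. It uses the built-in state.x77 matrix and assumes the intended comparison is "greater than 150", which is a guess based on the name cold.states.

```r
# state.x77 is a matrix; the state names are its row names, not a column
all.states <- data.frame(state.x77, Name = rownames(state.x77))

# keep the states with more than 150 frost days, and only two columns
cold.states <- all.states[all.states$Frost > 150, c("Name", "Frost")]
head(cold.states)
```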


HTH





Re: [R] Using and abusing %>% (was Re: Why can't I access this type?)

2015-03-27 Thread Henric Winell

On 2015-03-26 07:48, Patrick Connolly wrote:


On Wed, 25-Mar-2015 at 03:14PM +0100, Henric Winell wrote:

...

| Well...  Opinions may perhaps differ, but apart from '%>%' being
| butt-ugly it's also fairly slow:

Beauty, it is said, is in the eye of the beholder.  I'm impressed by
the way using %>% reduces or eliminates complicated nested brackets.


I didn't dispute whether '%>%' may be useful -- I just pointed out that 
it is slow.  However, it is only part of the problem: 'filter()' and 
'select()', although aesthetically pleasing, also seem to be slow:


> library(dplyr)            # provides filter(), select() and %>%
> library(microbenchmark)

> all.states <- data.frame(state.x77, Name = rownames(state.x77))

> f1 <- function()
+ all.states[all.states$Frost > 150, c("Name", "Frost")]

> f2 <- function()
+ subset(all.states, Frost > 150, select = c(Name, Frost))

> f3 <- function() {
+ filt <- subset(all.states, Frost > 150)
+ subset(filt, select = c(Name, Frost))
+ }

> f4 <- function()
+ all.states %>% subset(Frost > 150) %>%
+ subset(select = c(Name, Frost))

> f5 <- function()
+ select(filter(all.states, Frost > 150), Name, Frost)

> f6 <- function()
+ all.states %>% filter(Frost > 150) %>% select(Name, Frost)

> mb <- microbenchmark(
+ f1(), f2(), f3(), f4(), f5(), f6(),
+ times = 1000L
+ )
> print(mb, signif = 3L)
Unit: microseconds
 expr min   lq      mean median   uq  max neval cld
 f1() 115  124  134.8812    129  134 1500  1000 a
 f2() 128  141  147.4694    145  151 1520  1000 a
 f3() 303  328  344.3175    338  348 1740  1000  b
 f4() 458  494  518.0830    510  523 1890  1000   c
 f5() 806  848  887.7270    875  894 3510  1000    d
 f6() 971 1010 1056.5659   1040 1060 3110  1000     e

So, using '%>%', but leaving 'filter()' and 'select()' out of the 
equation, as in 'f4()' is only half as bad as the full 'dplyr' idiom 
in 'f6()'.  In this case, since we're talking microseconds, the speed-up 
is negligible but that *is* beside the point.



In this tiny example it's not obvious but it's very clear if the
objective is to sort the dataframe by three or four columns and
various lots of aggregation then returning a largish number of
consecutive columns, omitting the rest.  It's very easy to see what's
going on without the need for intermediate objects.


Why are you opposed to using intermediate objects?  In this case, as can 
be seen from 'f3()', it will also have the benefit of being faster than 
either '%>%' or the full 'dplyr' idiom.



| [...]

It's no surprise that instructing a computer in something closer to
human language is an order of magnitude slower.


Certainly not true, at least for compiled languages.  In any case, 
judging from off-list correspondence, it definitely came as a surprise 
to some R users...


Given that '%>%' is so heavily marketed through 'dplyr', where the 
latter is said to provide "blazing fast performance for in-memory data 
by writing key pieces in C++" and "a fast, consistent tool for working 
with data frame like objects, both in memory and out of memory", I don't 
think it's far-fetched to expect that it should be more performant than 
base R.



I'm sure you'd get something even quicker using machine code.


Don't be ridiculous.  We're mainly discussing

all.states[all.states$Frost > 150, c("state", "Frost")]

vs.

all.states %>% filter(Frost > 150) %>% select(state, Frost)

i.e., pure R code.


I spend 3 or 4 orders of magnitude more time writing code than running it.


You and me both.  But that doesn't mean speed is of no or little importance.


It's much more important to me to be able to read and modify than

 it is to have it run at optimum speed.

Good for you.  But surely, if this is your goal, nothing beats 
intermediate objects.  And like I said, it may still be faster than the 
'dplyr' idiom.



| Of course, this doesn't matter for interactive one-off use.  But
| lately I've seen examples of the '%>%' operator creeping into
| functions in packages.

That could indicate that %>% is seductively easy to use.  It's
probably true that there are places where it should be done the hard
way.


We all know how easy it is to write ugly and sluggish code in R.  But 
'foo[i,j]' is neither ugly nor sluggish and certainly not the hard way.



|  However, it would be nice to see a fast pipe operator as part of
| base R.


Heck, it doesn't even have to be fast as long as it's a bit more elegant 
than '%>%'.



Henric Winell







Re: [R] Having trouble with gdata read in

2015-03-27 Thread Benjamin Baker
Jim,




I’m not seeing the command f.readXLSheet in the documentation, nor is it 
executing in my code.





[R] hash - extract key values

2015-03-27 Thread Brian Smith
Hi,

I was trying to use hash, but can't seem to get the keys from the hash.
According to the hash documentation ('hash' package PDF), the following
should work:

> hx <- hash( c('a','b','c'), 1:3 )
> class(hx)
[1] "hash"
attr(,"package")
[1] "hash"
> hx$a
[1] 1
> keys(hx)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘keys’ for signature ‘"hash"’

How can I get the keys for my hash?

thanks!


Re: [R] Categorizing by month

2015-03-27 Thread Ben Tupper
Hi,

On Mar 27, 2015, at 12:06 PM, lychang lych...@emory.edu wrote:

 Hi everyone,
 
 I'm trying to categorize by month in R. How can I do this if my dates are in
 date/month/year form?
 

I'm not sure about the date form you describe, but if you have the dates as 
POSIXct you can extract the month as character and categorize with that.

x <- seq(from = as.POSIXct("2000/1/1", format="%Y/%m/%d"),
         to = as.POSIXct("2009/12/1", format="%Y/%m/%d"), by = 'month')
mon <- format(x, '%m')
xx <- split(x, mon)
str(xx)
List of 12
 $ 01: POSIXct[1:10], format: "2000-01-01" "2001-01-01" "2002-01-01" "2003-01-01" ...
 $ 02: POSIXct[1:10], format: "2000-02-01" "2001-02-01" "2002-02-01" "2003-02-01" ...
 $ 03: POSIXct[1:10], format: "2000-03-01" "2001-03-01" "2002-03-01" "2003-03-01" ...
 $ 04: POSIXct[1:10], format: "2000-04-01" "2001-04-01" "2002-04-01" "2003-04-01" ...
 $ 05: POSIXct[1:10], format: "2000-05-01" "2001-05-01" "2002-05-01" "2003-05-01" ...
 $ 06: POSIXct[1:10], format: "2000-06-01" "2001-06-01" "2002-06-01" "2003-06-01" ...
 $ 07: POSIXct[1:10], format: "2000-07-01" "2001-07-01" "2002-07-01" "2003-07-01" ...
 $ 08: POSIXct[1:10], format: "2000-08-01" "2001-08-01" "2002-08-01" "2003-08-01" ...
 $ 09: POSIXct[1:10], format: "2000-09-01" "2001-09-01" "2002-09-01" "2003-09-01" ...
 $ 10: POSIXct[1:10], format: "2000-10-01" "2001-10-01" "2002-10-01" "2003-10-01" ...
 $ 11: POSIXct[1:10], format: "2000-11-01" "2001-11-01" "2002-11-01" "2003-11-01" ...
 $ 12: POSIXct[1:10], format: "2000-12-01" "2001-12-01" "2002-12-01" "2003-12-01" ...

Does that help?
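
If the dates arrive as character strings in day/month/year order (an assumption about the original question), they can be parsed first and then grouped the same way. A minimal sketch with made-up dates:

```r
# hypothetical input: day/month/year as text
dates_chr <- c("27/03/2015", "01/04/2015", "15/03/2014")

# parse with an explicit format so day and month are not swapped
d <- as.Date(dates_chr, format = "%d/%m/%Y")

# two-digit month as the grouping key
mon <- format(d, "%m")

by_month <- split(d, mon)   # list with one element per month
table(mon)                  # counts per month
```

Using "%Y-%m" instead of "%m" in format() would keep the same month of different years in separate groups.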

Ben




Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org



Re: [R] Having trouble with gdata read in

2015-03-27 Thread jim holtman
Pardon me, f.readXLSheet is my own function, which is just a call to
readWorksheetFromFile.
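
A wrapper of that kind might look like the sketch below. This is a hypothetical reconstruction, not the actual helper: the argument names and the default sheet are assumptions, and only readWorksheetFromFile itself comes from XLConnect.

```r
# Hypothetical reconstruction of the helper. XLConnect (and a working Java
# installation) is needed only when the function is actually called.
f.readXLSheet <- function(file, sheet = 1) {
  XLConnect::readWorksheetFromFile(file, sheet = sheet)
}

# Usage (the file path is a placeholder):
# input <- f.readXLSheet("ami_survey_responses.xls", 1)
# str(input)
```

Calling XLConnect::readWorksheetFromFile(file, sheet = 1) directly does the same thing without the wrapper.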


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Mar 27, 2015 at 3:52 PM, Benjamin Baker bba...@reed.edu wrote:

  Jim,

 I’m not seeing the command f.readXLSheet in the documentation, nor is it
 executing in my code.



 On Thursday, Mar 26, 2015 at 5:15 AM, jim holtman jholt...@gmail.com,
 wrote:

 My suggestion is to use XLConnect to read the file:


 > x <- "C:\\Users\\jh52822\\AppData\\Local\\Temp\\Rtmp6nVgFC\\file385c632aba3.xls"
 > require(XLConnect)
 Loading required package: XLConnect
 Loading required package: XLConnectJars
 XLConnect 0.2-10 by Mirai Solutions GmbH [aut],
   Martin Studer [cre],
   The Apache Software Foundation [ctb, cph] (Apache POI, Apache Commons Codec),
   Stephen Colebourne [ctb, cph] (Joda-Time Java library)
 http://www.mirai-solutions.com ,
 http://miraisolutions.wordpress.com
 > input <- f.readXLSheet(x, 1)
 > str(input)
 'data.frame':   2266 obs. of  51 variables:
  $ EIA              : num  34 59 87 97 108 118 123 149 150 157 ...
  $ Entity.Name      : chr  "City of Abbeville" "City of Abbeville" "City of Ada" "Adams Electric Cooperative" ...
  $ State            : chr  "SC" "LA" "MN" "IL" ...
  $ NERC.Region      : chr  "SERC" "SPP" "MRO" "SERC" ...
  $ Filing.Order     : num  12 11 1237 392 252 ...
  $ Q5.MultRegion    : chr  ...
  $ Q6.OwnMeters.    : chr  "Yes" "Yes" "Yes" "Yes" ...
  $ Q7.ResMeters     : num  3051 4253 857 8154 33670 ...
  $ Q7.ComMeters     : num  531 972 132 155 1719 ...
  $ Q7.IntMeters     : num  0 19 32 NA 626 NA 29 0 2 NA ...
  $ Q7.TransMeters   : num  0 NA NA NA NA NA NA 0 0 NA ...
  $ Q7.OtherMeters   : num  0 NA NA 57 NA NA NA 0 0 NA ...
  $ Q7...total.meters: num  3582 5244 1021 8366 36015 ...
  $ Q8.15Min.ResAMI  : num  0 NA NA NA NA NA NA NA NA NA ...
  $ Q8.15Min.ComAMI  : num  0 NA NA 155 NA NA NA NA NA NA ...
  $ Q8.15Min.IndAMI  : num  0 NA NA NA NA NA NA NA NA NA ...
  $ Q8.15Min.TransAMI: num  0 NA NA NA NA NA NA NA NA NA ...
  $ Q8.15Min.OtherAMI: num  0 NA NA NA NA NA NA NA NA NA ...
  $ Q8.15Min.TotalAMI: num  0 0 0 155 0 0 0 0 0 0 ...
  $ Q8.Hourly.ResAMI : num  0 NA NA NA 16100 NA NA NA NA NA ...
  $ Q8.Hourly.ComAMI : num  0 NA NA NA 1600 NA NA NA NA NA ...
  



 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.

 On Wed, Mar 25, 2015 at 5:01 PM, Benjamin Baker bba...@reed.edu wrote:

 Trying to read and clean up the FERC data on Advanced Metering
 infrastructure. Of course it is in XLS for the first two survey years and
 then converts to XLSX for the final two. Bad enough that it is all in
 excel, they had to change the survey design and data format as well. Still,
 I’m sorting through it. However, when I try and read in the 2008 data, I’m
 getting this error:
 ###
 Wide character in print at
 /Library/Frameworks/R.framework/Versions/3.1/Resources/library/gdata/perl/
 xls2csv.pl line 270.
 Warning message:
 In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
   EOF within quoted string
 ###



 Here is the code I’m running to get the data:
 ###
 install.packages("gdata")
 library(gdata)
 fileUrl <- "http://www.ferc.gov/industries/electric/indus-act/demand-response/2008/survey/ami_survey_responses.xls"
 download.file(fileUrl, destfile = "./ami.data/ami-data2008.xls")
 list.files("ami.data")
 dateDown.2008 <- date()
 ami.data2008 <- read.xls("./ami.data/ami-data2008.xls", sheet = 1,
                          header = TRUE)
 ###


 Reviewed the data in the XLS file, and both “” and # are present within
 it. Don’t know how to get the read.xls to ignore them so I can read all the
 data into my data frame. Tried :
 ###
 ami.data2008 <- read.xls("./ami.data/ami-data2008.xls", sheet = 1,
                          quote = "", header = TRUE)
 ###


 And it spits out “More columns than column names” output.


 Been searching this, and I can find some “solutions” for read.table, but
 nothing specific to read.xls


 Many thanks,


 Benjamin Baker





Re: [R] hash - extract key values

2015-03-27 Thread Boris Steipe
Works for me :

> library(hash)
hash-2.2.6 provided by Decision Patterns

> hx <- hash( c('a','b','c'), 1:3 )
> class(hx)
[1] "hash"
attr(,"package")
[1] "hash"
> hx$a
[1] 1
> keys(hx)
[1] "a" "b" "c"


Maybe restart your session? Clear your workspace? Upgrade?
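
The error you quote comes from S4 method dispatch, which would be consistent with another attached package defining its own keys() that masks the one from hash. That is only a guess on my part; nothing in your message confirms it. A minimal sketch of how one might check, using utils::find() and a namespace-qualified call:

```r
# Which attached packages (or the global environment) define something
# called "keys"? More than one entry suggests masking.
where_keys <- find("keys")
print(where_keys)

# Calling the function through its namespace bypasses the search path.
# Guarded so the sketch also runs where 'hash' is not installed.
if (requireNamespace("hash", quietly = TRUE)) {
  hx <- hash::hash(c("a", "b", "c"), 1:3)
  print(hash::keys(hx))
}
```

If find() lists a package ahead of hash, either detach it or keep using hash::keys().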

B.





On Mar 27, 2015, at 7:39 PM, Brian Smith bsmith030...@gmail.com wrote:

 Hi,
 
 I was trying to use hash, but can't seem to get the keys from the hash.
 According to the hash documentation (the 'hash' package PDF), the following
 should work:
 
 > hx <- hash( c('a','b','c'), 1:3 )
 > class(hx)
 [1] "hash"
 attr(,"package")
 [1] "hash"
 > hx$a
 [1] 1
 > keys(hx)
 Error in (function (classes, fdef, mtable)  :
   unable to find an inherited method for function ‘keys’ for signature
   ‘"hash"’
 
 How can I get the keys for my hash?
 
 thanks!


Re: [R] vif in package car: there are aliased coefficients in the model

2015-03-27 Thread John Fox
Dear Rodolfo,

It's apparently the case that at least one of the columns of the model
matrix for your model is perfectly collinear with others. 

There's not nearly enough information here to figure out exactly what the
problem is, and the information that you provided certainly falls short of
allowing me or anyone else to reproduce your problem and diagnose it
properly. It's not even clear from your message exactly what the structure
of the model is, although localizacao  is apparently a factor with 3 levels.


If you look at the summary() output for your model or just print it, you
should at least see which coefficients are aliased, and that might help you
understand what went wrong.
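
As an illustration only, here is a small sketch with made-up data (the variables x1, x2, x3 and y are hypothetical and have nothing to do with your data set). One predictor is an exact linear combination of two others, so lm() reports its coefficient as NA, and alias() shows the dependency.

```r
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + d$x2                  # exact linear combination of x1 and x2
d$y  <- 1 + d$x1 - d$x2 + rnorm(20)

fit <- lm(y ~ x1 + x2 + x3, data = d)

# Aliased coefficients are the ones lm() could not estimate (NA)
aliased <- names(which(is.na(coef(fit))))
print(aliased)

# alias() expresses each aliased term through the estimable ones
print(alias(fit)$Complete)
```

Dropping or recoding the terms listed there should let vif() run.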

I hope this helps,
 John

---
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/


 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rodolfo
 Pelinson
 Sent: March-27-15 3:07 PM
 To: r-help@r-project.org
 Subject: [R] vif in package car: there are aliased coefficients in the model
 
 Hello. I'm trying to use the function vif from package car on an lm.
 However, it returns the following error:
 
 Error in vif.default(lm(MDescores.sitescores ~ hidroperiodo + localizacao
 +  : there are aliased coefficients in the model
 
 When I exclude any predictor from the model, it returns this warning
 message:
 
 Warning message: In cov2cor(v) : diag(.) had 0 or NA entries; non-finite
 result is doubtful
 
 When I exclude any other predictor from the model, vif finally works. I
 can't figure out what the problem is. These are the results that R returns:
 
 > vif(lm(MDescores.sitescores ~ hidroperiodo + localizacao + area +
 + profundidade + NTVM + NTVI + PCs...c.1.., data = MDVIF))
 Error in vif.default(lm(MDescores.sitescores ~ hidroperiodo + localizacao +  :
   there are aliased coefficients in the model
 
 > vif(lm(MDescores.sitescores ~ localizacao + area + profundidade + NTVM +
 + NTVI + PCs...c.1.., data = MDVIF))
              GVIF Df GVIF^(1/(2*Df))
 localizacao   NaN  2             NaN
 area          NaN  1             NaN
 profundidade  NaN  1             NaN
 NTVM          NaN  1             NaN
 NTVI          NaN  1             NaN
 PCs...c.1..   NaN  1             NaN
 Warning message:
 In cov2cor(v) : diag(.) had 0 or NA entries; non-finite result is doubtful
 
 Thanks.
 --
 Rodolfo Mei Pelinson.


Re: [R] Using and abusing %>% (was Re: Why can't I access this type?)

2015-03-27 Thread Hadley Wickham
 I didn't dispute whether '%>%' may be useful -- I just pointed out that it
 is slow.  However, it is only part of the problem: 'filter()' and
 'select()', although aesthetically pleasing, also seem to be slow:

 > all.states <- data.frame(state.x77, Name = rownames(state.x77))

 > f1 <- function()
 + all.states[all.states$Frost > 150, c("Name", "Frost")]

 > f2 <- function()
 + subset(all.states, Frost > 150, select = c(Name, Frost))

 > f3 <- function() {
 + filt <- subset(all.states, Frost > 150)
 + subset(filt, select = c(Name, Frost))
 + }

 > f4 <- function()
 + all.states %>% subset(Frost > 150) %>%
 + subset(select = c(Name, Frost))

 > f5 <- function()
 + select(filter(all.states, Frost > 150), Name, Frost)

 > f6 <- function()
 + all.states %>% filter(Frost > 150) %>% select(Name, Frost)

 > mb <- microbenchmark(
 + f1(), f2(), f3(), f4(), f5(), f6(),
 + times = 1000L
 + )
 > print(mb, signif = 3L)
 Unit: microseconds
  expr min   lq      mean median   uq  max neval   cld
  f1() 115  124  134.8812    129  134 1500  1000 a
  f2() 128  141  147.4694    145  151 1520  1000 a
  f3() 303  328  344.3175    338  348 1740  1000  b
  f4() 458  494  518.0830    510  523 1890  1000   c
  f5() 806  848  887.7270    875  894 3510  1000    d
  f6() 971 1010 1056.5659   1040 1060 3110  1000     e

 So, using '%>%', but leaving 'filter()' and 'select()' out of the equation,
 as in 'f4()' is only half as bad as the full 'dplyr' idiom in 'f6()'.  In
 this case, since we're talking microseconds, the speed-up is negligible but
 that *is* beside the point.

When benchmarking it's important to consider both the relative and
absolute difference and to think about how the cost scales as the data
grows - the cost of using %>% is fixed, and 500 µs doesn't seem
like a huge performance penalty to me.
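
A rough way to see the scaling for yourself is sketched below. It assumes magrittr is installed; the helper time_call(), the two data sizes and the repetition count are arbitrary choices of mine, and system.time() is far cruder than microbenchmark, so treat the printed numbers as indicative only.

```r
# Average elapsed seconds per call of f(), over `reps` calls
time_call <- function(f, reps = 50L) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]] / reps
}

if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  for (n in c(1e2, 1e6)) {
    d <- data.frame(a = runif(n))
    plain <- time_call(function() subset(d, a > 0.5))
    piped <- time_call(function() d %>% subset(a > 0.5))
    # If the pipe's cost is fixed, the difference should stay roughly
    # constant while the per-call time grows with n
    cat(sprintf("n = %g: plain %.2e s, piped %.2e s, difference %.2e s\n",
                n, plain, piped, piped - plain))
  }
}
```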

Hadley

-- 
http://had.co.nz/
