Re: [R] An important question about running MCMC
On Thu, Dec 13, 2012 at 3:22 AM, Chenyi Pan cp...@virginia.edu wrote:

> Dear officer, I have a question concerning running R while doing my research. Can you help me figure it out? I am running an MCMC iteration in R, but it always gets stuck in some loop. This causes big problems for my research. So I want to know whether we can skip the current dataset and move on to the next simulated dataset when the iteration gets stuck? Alternatively, can the MCMC chain skip the current iteration when it gets stuck and automatically start another chain with different starting values? I am looking forward to your reply.

What do you mean by 'stuck in some loop'? Is it so stuck that it isn't generating proposals, because of a bug in your code? Or do you mean it is generating proposals but never accepting them? In that case maybe you need to look at your proposal distribution and make your proposal scheme adaptive. Or is it looping around a set of accepted values? Or what?

Remember, for true convergence of an MCMC you have to loop an infinite number of times - have you tried that?

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
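One way to get the "skip the stuck dataset" behaviour the poster asks about is a per-run time budget. The sketch below is not from the thread: `run_mcmc` and `datasets` are hypothetical stand-ins, and base R's setTimeLimit() is used to turn an over-budget run into a catchable error so the loop can move on.

```r
## Minimal sketch (assumptions: `run_mcmc` and `datasets` are placeholders
## for the real sampler and simulated datasets).
run_mcmc <- function(d) {           # stand-in for the real MCMC run
  if (d$hard) Sys.sleep(10)         # simulate a run that gets "stuck"
  mean(d$x)
}

datasets <- list(list(x = rnorm(10), hard = FALSE),
                 list(x = rnorm(10), hard = TRUE))

results <- lapply(datasets, function(d) {
  tryCatch({
    setTimeLimit(elapsed = 2, transient = TRUE)  # 2-second budget per run
    run_mcmc(d)
  },
  error   = function(e) NA,                      # timeout raises an error
  finally = setTimeLimit(elapsed = Inf))         # clear the limit again
})
```

Runs that finish in time return their value; a stuck run is abandoned after the budget and recorded as NA, so the loop over simulated datasets keeps going.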
[R] remove NA in df results in NA, NA.1 ... rows
Good morning!

I have the following data frame (df):

    X.outer  Y.outer   X.PAD1   Y.PAD1   X.PAD2 Y.PAD2   X.PAD3 Y.PAD3   X.PAD4 Y.PAD4
73 574690.0 179740.0 574690.2 179740.0 574618.3 179650 574729.2 179674 574747.1 179598
74 574680.6 179737.0 574693.4 179740.0 574719.0 179688 574831.8 179699 574724.9 179673
75 574671.0 179734.0 574696.2 179740.0 574719.0 179688 574807.8 179787 574729.2 179674
76 574663.6 179736.0 574699.1 179734.0 574723.5 179678 574703.4 179760 574831.8 179699
77 574649.9 179734.0 574704.7 179724.0 574724.9 179673 574702.4 179755 574852.3 179626
78 574647.3 179742.0 574706.9 179719.0 574747.1 179598 574702.0 179754 574747.1 179598
79 574633.6 179739.0 574711.4 179710.0 574641.8 179570 574698.0 179747       NA     NA
80 574634.9 179732.0 574716.6 179698.0 574639.6 179573 574700.2 179738       NA     NA
81 574616.5 179728.6 574716.7 179695.0 574618.3 179650 574704.4 179729       NA     NA
82 574615.4 179731.0 574718.2 179690.0       NA     NA 574708.1 179724       NA     NA
83 574614.4 179733.6 574719.1 179688.0       NA     NA 574709.3 179720       NA     NA
...
44 574702.0 179754.0       NA       NA       NA     NA       NA     NA       NA     NA
45 574695.1 179751.0       NA       NA       NA     NA       NA     NA       NA     NA
46 574694.4 179752.0       NA       NA       NA     NA       NA     NA       NA     NA

which I subset with

df2 <- df[, c("X.PAD2", "Y.PAD2")]

df2
     X.PAD2 Y.PAD2
73 574618.3 179650
74 574719.0 179688
75 574719.0 179688
76 574723.5 179678
77 574724.9 179673
78 574747.1 179598
79 574641.8 179570
80 574639.6 179573
81 574618.3 179650
82       NA     NA
83       NA     NA
...
44       NA     NA
45       NA     NA
46       NA     NA

followed by removing the NA's using

df2 <- df2[!is.na(df2), ]

If I now call df2, I get:

       X.PAD2 Y.PAD2
73   574618.3 179650
74   574719.0 179688
75   574719.0 179688
76   574723.5 179678
77   574724.9 179673
78   574747.1 179598
79   574641.8 179570
80   574639.6 179573
81   574618.3 179650
NA         NA     NA
NA.1       NA     NA
NA.2       NA     NA
NA.3       NA     NA
NA.4       NA     NA
NA.5       NA     NA
NA.6       NA     NA
NA.7       NA     NA
NA.8       NA     NA

It seems there are still NA's in my data frame. How can I get rid of them? And what is the meaning of the rows numbered NA, NA.1 and so on? Thanks for any hints.
Best regards
Raphael Felber
[R] how to aggregate the data ?
HI, now I have this dataset:

  Product Price_LC.1 Price_LC.2 Price_elasticity.1 Price_elasticity.2 Mean_Price Mean_Price_elasticity Trade_Price_Band Country
1     100   357580.1   527483.6         -4.1498383         -2.8459564   473934.0            -3.6935476         0-542811      VN
5    1208   436931.9   536143.9         -3.9432305         -3.4570170   469330.2            -3.6595372         0-542811      VN
6    1280   419666.6   520936.3         -1.7357983         -0.7689443   461367.0            -1.2848528         0-542811      VN
2     101   629371.0   735167.2         -5.2289933         -3.0364372   676059.9            -3.8059064    542812-904779      VN
7    1616   576816.1   663369.6         -4.5528840         -3.9523261   614864.5            -4.3181914    542812-904779      VN
8    1661   583587.9   689853.0         -5.0948101         -4.3427497   650680.0            -4.8109781    542812-904779      VN

I want to get the following dataset:

Product          VN Price_band
100      -3.6935476   0-542811
         [357580.1, 527483.6]   [-2.8459564, 473934.0]
1208     -3.6595372   0-542811
         [436931.9, 536143.9]   [-3.9432305, -3.4570170]

How do I get this in R?? Thanks.

Kind regards,
Lingyi
Re: [R] remove NA in df results in NA, NA.1 ... rows
Hi Raphael,

see below.

> I have the following data frame (df):
> [...]
> followed by removing the NA's using
> df2 <- df2[!is.na(df2), ]
> [...]

is.na(df2) produces a logical matrix (!), and you are then indexing the rows of your data frame with a matrix, which is converted into a vector of its elements, producing far too many logical indices for your task (so to say). I assume you should be using na.omit(df2) instead.

Hth -- Gerrit

-
Dr. Gerrit Eichner                  Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de  Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104            Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109            http://www.uni-giessen.de/cms/eichner
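Gerrit's point is easy to see on a tiny example. The 3-row data frame below is made up for illustration: is.na() on a data frame returns a logical matrix, so using it as a row index supplies more logical values than there are rows, which is exactly what produces the junk NA, NA.1, ... rows.

```r
## Hypothetical 3-row data frame with some NAs.
df2 <- data.frame(x = c(1, 2, NA), y = c(10, NA, 30))

is.na(df2)          # a 3x2 logical MATRIX, not a vector of row indices
df2[!is.na(df2), ]  # 6 logical indices against 3 rows -> phantom NA rows

## Row-wise alternatives that do what was intended:
na.omit(df2)                     # drop every row containing any NA
df2[complete.cases(df2), ]       # equivalent: keep only complete rows
df2[rowSums(is.na(df2)) == 0, ]  # the same idea spelled out by hand
```

All three row-wise forms keep only row 1 here, since rows 2 and 3 each contain an NA.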
Re: [R] remove NA in df results in NA, NA.1 ... rows
df2 <- df2[!is.na(df2), ] isn't doing what you want it to do, because df2 is a data.frame and not a vector. To solve your problem, review
http://stackoverflow.com/questions/4862178/r-remove-rows-with-nas-in-data-frame

On Thu, Dec 13, 2012 at 3:20 AM, raphael.fel...@art.admin.ch wrote:

> Good morning! I have the following data frame (df):
> [...]
> which I subset to df2 <- df[, c("X.PAD2", "Y.PAD2")], followed by removing the NA's using df2 <- df2[!is.na(df2), ].
> It seems there are still NA's in my data frame. How can I get rid of them? What is the meaning of the rows numbered NA, NA.1 and so on? Thanks for any hints.
>
> Best regards, Raphael Felber
[R] How to aggregate the dataset?
HI,

I want to transform the following dataset

Product Price_LC.1 Price_LC.2 Price_elasticity.1 Price_elasticity.2 Mean_Price Mean_Price_elasticity Trade_Price_Band Country
100             35         52              -4.14              -2.84         47                 -3.69         0-542811      VN
1208            43         53              -3.94              -3.45         47                 -3.65         0-542811      VN

into:

Product     VN Price_Band
100      -3.69   0-542811
         [35,52]   [43,53]
1208     -3.65   0-542811
         [43,53]   [-3.94,-3.45]

How do I get it in R? I have a large dataset like this, and I need to create a mechanism to transform it.

Thanks.

Kind regards,
Lingyi
Re: [R] remove NA in df results in NA, NA.1 ... rows
is.na(df2) is not doing what you think it is doing. Perhaps you should read ?na.omit.

---
Jeff Newmiller
Sent from my phone. Please excuse my brevity.

raphael.fel...@art.admin.ch wrote:

> Good morning! I have the following data frame (df):
> [...]
> followed by removing the NA's using df2 <- df2[!is.na(df2), ].
> It seems there are still NA's in my data frame. How can I get rid of them?
> What is the meaning of the rows numbered NA, NA.1 and so on? Thanks for any hints.
Re: [R] create a color palette with custom ranges between colors
Thank you Nicole! I did it with the color.palette function in the link you gave me. I then added a sequence in my levelplot call via the at argument:

at = seq(-40, 40, 1)

and it works quite well. Thanks again, Nicole. Thanks to you too, Pascal, and long live the CRC as well as the great C. C.! ;)
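For the archive, here is a minimal sketch of the pattern described above: pairing a custom palette with explicit breaks in lattice::levelplot(). The data matrix `z` and the anchor colors are assumptions, not the poster's actual setup.

```r
## Sketch: custom color ramp + explicit breaks in levelplot().
library(lattice)

z <- matrix(seq(-40, 40, length.out = 100), nrow = 10)  # toy data

## A ramp between anchor colors; uneven anchors give custom ranges.
pal <- colorRampPalette(c("blue", "white", "red"))

brks <- seq(-40, 40, 1)   # one break per unit, as in the post
levelplot(z, at = brks, col.regions = pal(length(brks) - 1))
```

With n breaks there are n-1 intervals, hence pal(length(brks) - 1) colors.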
Re: [R] possible bug in function 'mclapply' of package parallel
Mailing list ate the attachment. Can you send it plain text (if short) or post it somewhere online?

Michael

On Dec 13, 2012, at 1:54 AM, Asis Hallab asis.hal...@gmail.com wrote:

> Dear parallel users and developers,
>
> I might have encountered a bug in the function 'mclapply' of package 'parallel'. I construct a matrix using the same input data and code with a single difference: once I use mclapply and the other time lapply. Shockingly, the result is NOT the same.
>
> To evaluate, please unpack the attached archive and execute: Rscript mclapply_test.R
>
> I put the two simple functions I wrote inside the R script, along with the serialized input matrix. My function is executed once using mclapply and once using lapply internally - there's an argument lapply.funk one can set to mclapply. The results are checked for identity, with a striking FALSE.
>
> Any hints on my misuse and/or misunderstanding of mclapply, or verification of a true bug, will be much appreciated.
>
> Kind regards!
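Since the poster's script was in the lost attachment, this is only a guess, but a frequent cause of lapply/mclapply disagreements is random number generation in the forked workers. Deterministic functions should give identical results; anything using RNG needs the "L'Ecuyer-CMRG" generator plus mc.set.seed to get well-defined parallel streams (which will still differ from a serial lapply run).

```r
## Sketch (Unix-alikes only: mclapply() forks; on Windows use mc.cores = 1).
library(parallel)

f <- function(i) i^2   # deterministic: serial and parallel should agree
identical(lapply(1:10, f), mclapply(1:10, f, mc.cores = 2L))

## With RNG, use reproducible parallel streams:
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
r1 <- mclapply(1:4, function(i) rnorm(1), mc.set.seed = TRUE, mc.cores = 2L)
set.seed(1)
r2 <- mclapply(1:4, function(i) rnorm(1), mc.set.seed = TRUE, mc.cores = 2L)
identical(r1, r2)   # reproducible across runs with the same seed
```

If the poster's functions are deterministic, a genuine mismatch would indeed be worth reporting; if they draw random numbers, a FALSE from identical() is expected.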
[R] how to aggregate the dataset
HI,

Sorry for messing up.. I want to transform the following dataset:

product min_price max_price mean_price country price_band
11             34        50         40      VN      0-300
22             10        30         15      VN      0-300

into:

product  VN price_band
11       40      0-300   [34,50]
22       15      0-300   [10,30]

How can I do this in R? I have a large dataset like this; I want to transform all of it into that form. Thanks a lot.

Kind regards,
Tammy
Re: [R] how to aggregate the dataset
Really sorry for messing up. I want to transform:

product min_price max_price mean_price country price_band
11             34        50         40      VN      0-300
22             10        30         15      VN      0-300

into:

product  VN price_band
11       40      0-300   [34,50]
22       15      0-300   [10,30]

How can I do this in R?

Kind regards,
Tammy

From: metal_lical...@live.com
Subject: RE: [R] how to aggregate the dataset
Date: Thu, 13 Dec 2012 14:22:54 +0300

> [... earlier messages in this thread quoted; snipped ...]
Re: [R] How do you use agrep inside a loop
Thank you all, it worked after I checked the length of agrep's result :)

On Tue, Dec 11, 2012 at 6:11 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

> Hello, inline.
>
> Em 11-12-2012 12:04, surekha nagabhushan escreveu:
>
>> Rui, I have initialized it... doesn't seem to help...
>>
>> result_vector <- vector()
>
> No! This must be just before the loop in 'j'.
>
>> result <- vector("list", (length(test1)-1))
>> for(i in 1:(length(test1)-1)) {
>>   for(j in (i+1):length(test1)) {
>>     result_vector[j-i] <- agrep(test1[i], test1[j], ignore.case = TRUE,
>>                                 value = TRUE, max.distance = 0.1)
>>   }
>>   result[[i]] <- result_vector
>> }
>>
>> Whenever agrep does not find a match it returns character(0), length zero;
>> do you suppose it has anything to do with that?
>
> Yes, without testing for length zero it throws an error,
> "replacement has length zero".
>
> Hope this helps,
> Rui Barradas
>
> Earlier, on Tue, Dec 11, 2012 at 5:13 PM, Rui Barradas wrote:
>
>> Hello,
>> See if this is it. You must reinitialize 'result_vector' just before the
>> loop that constructs it.
>>
>> test1 <- c("Vashi", "Vashi,navi Mumbai", "Thane", "Vashi,new Mumbai",
>>            "Thana", "Surekha", "Thane(w)", "surekhaN")
>> result <- vector("list", (length(test1)-1))
>> for(i in 1:(length(test1)-1)){
>>   result_vector <- vector()
>>   for(j in (i+1):length(test1)){
>>     tmp <- agrep(test1[i], test1[j], ignore.case = TRUE,
>>                  value = TRUE, max.distance = 0.1)
>>     if(length(tmp) > 0) result_vector[j-i] <- tmp
>>   }
>>   result[[i]] <- result_vector
>> }
>> result
>>
>> Hope this helps,
>> Rui Barradas
>>
>> Em 11-12-2012 11:23, surekha nagabhushan escreveu:
>>
>>> Pascal, [same loop code as above, with result_vector initialized only
>>> once, before the loop in 'i']
>>> I'm not sure what the problem is with the dimension/length of result,
>>> which is a list. But I just use the second line:
>>> result <- vector("list", (length(test1)-1)). What am I missing?
>>> Thank you Rui Barradas.
On Tue, Dec 11, 2012 at 4:25 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

> Hello,
> And another error in line 2. It should be
>
> for(j in (i+1):length(test1))
>
> Hope this helps,
> Rui Barradas
>
> Em 11-12-2012 07:54, Pascal Oettli escreveu:
>
>> Hi,
>> There is a mistake in the first line. It should be:
>>
>> for(i in 1:(length(test1)-1))
>>
>> Regards,
>> Pascal
>>
>> Le 11/12/2012 16:01, surekha nagabhushan a écrit :
>>
>>> Hi all. This is my first message at R-help... so I'm hoping I have some
>>> beginner's luck and get some good help for my problem! FYI I have just
>>> started using R recently, so my knowledge of R is pretty preliminary.
>>>
>>> Okay, here is what I need help with - I need to know how to use agrep in
>>> a for loop. I need to compare elements of a vector of names with other
>>> elements of the same vector. However, if I use something like this:
>>>
>>> for(i in 1:length(test1)-1) {
>>>   for(j in i+1:length(test1)) {
>>>     result[[i]][j] <- agrep(test1[i], test1[j], ignore.case = TRUE,
>>>                             value = TRUE, max.distance = 0.1)
>>>   }
>>> }
>>>
>>> I get an error message saying "invalid 'pattern' argument":
>>>
>>> Error in agrep(test1[i], test1[j], ignore.case = TRUE, value = TRUE,
>>>   max.distance = 0.1) : invalid 'pattern' argument
>>>
>>> test1 being:
>>>
>>> test1 <- c("Vashi", "Vashi,navi Mumbai", "Thane", "Vashi,new Mumbai",
>>>            "Thana", "Surekha", "Thane(w)", "surekhaN")
>>>
>>> This is the first time I'm using agrep, and I do not fully understand how
>>> it works... Kindly help... Thank you.
>>>
>>> Su.
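Putting the whole thread together, the working pattern is: parenthesize the loop ranges, reset the per-row result vector inside the outer loop, and guard the assignment with a length check, because agrep() returns character(0) when there is no match and assigning a zero-length value into a vector element is an error.

```r
## The corrected loop from this thread, assembled into one runnable piece.
test1 <- c("Vashi", "Vashi,navi Mumbai", "Thane", "Vashi,new Mumbai",
           "Thana", "Surekha", "Thane(w)", "surekhaN")

result <- vector("list", length(test1) - 1)
for (i in 1:(length(test1) - 1)) {
  result_vector <- character(0)          # reset for each i, before the j loop
  for (j in (i + 1):length(test1)) {
    tmp <- agrep(test1[i], test1[j], ignore.case = TRUE,
                 value = TRUE, max.distance = 0.1)
    if (length(tmp) > 0)                 # skip character(0) non-matches
      result_vector[j - i] <- tmp
  }
  result[[i]] <- result_vector
}
result   # result[[i]] holds the fuzzy matches of test1[i] against later names
```

Note the classic precedence traps fixed here: `1:length(test1)-1` means `(1:length(test1)) - 1` (it starts at 0), and `i+1:length(test1)` means `i + (1:length(test1))` (it overruns the vector), which is what produced the "invalid 'pattern' argument" error.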
Re: [R] how to aggregate the dataset
Hello,

Maybe something like this?

range <- with(dat, paste0("[", min_price, ",", max_price, "]"))
dat2 <- with(dat, data.frame(product = product, VN = mean_price,
                             range = range, price_band = price_band))

Unless it's a printing problem and you really want the range below VN.

Hope this helps,
Rui Barradas

Em 13-12-2012 11:24, Tammy Ma escreveu:

> Really sorry for messing up. I want to transform:
> [...]
> How can I do this in R?
>
> Kind regards,
> Tammy
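Rui's suggestion, made runnable with made-up data matching the numbers in the post (column names taken from the thread):

```r
## Toy data reproducing Tammy's example.
dat <- data.frame(product    = c(11, 22),
                  min_price  = c(34, 10),
                  max_price  = c(50, 30),
                  mean_price = c(40, 15),
                  country    = c("VN", "VN"),
                  price_band = c("0-300", "0-300"))

## Build "[min,max]" strings, then assemble the reshaped frame.
range <- with(dat, paste0("[", min_price, ",", max_price, "]"))
dat2  <- with(dat, data.frame(product = product, VN = mean_price,
                              range = range, price_band = price_band))
dat2
## two rows: product 11 / VN 40 / [34,50] / 0-300
##           product 22 / VN 15 / [10,30] / 0-300
```

For many countries at once, the same paste0() trick can be combined with reshape() or split() per country, but the one-country case above is the core of it.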
Re: [R] Running MCMC in R
Why don't you use one of the existing MCMC packages? There are many to choose from...

On Wed, Dec 12, 2012 at 10:49 PM, Chenyi Pan cp...@virginia.edu wrote:

> Dear all,
> I am running an MCMC iteration in R, but it always gets stuck in some loop. This causes big problems for my research. So I want to know whether we can skip the current dataset and move on to the next simulated dataset when the iteration gets stuck? Alternatively, can the MCMC chain skip the current iteration when it gets stuck and automatically start another chain with different starting values? I am looking forward to your reply.
>
> Best,
> Chenyi
>
> --
> Chenyi Pan
> Department of Statistics
> Graduate School of Arts and Sciences, University of Virginia
> Tel: 434-466-9209
[R] max_prepared_stmt_count exceeded using RODBC + 64-bit win7
Hi,

I am running R 2.15.2 64-bit on Windows 7, using RODBC 1.3-6, MySQL 5.5.20 and MySQL Connector 5.5.2 - these are the latest 64-bit versions AFAIK.

sqlQuery and sqlSave work fine as expected, but in a long session with a few sqlSave() calls I get an error, for example:

Error in sqlSave(channel = channel, dat = USArrests[, 1, drop = FALSE], :
  HY000 1461 [MySQL][ODBC 5.2(w) Driver][mysqld-5.5.20]Can't create more than
  max_prepared_stmt_count statements (current value: 16384)
  [RODBC] ERROR: Could not SQLPrepare 'INSERT INTO `usarrests` ( `murder` ) VALUES ( ? )'

In my setup the MySQL global variable max_prepared_stmt_count has the default setting of 16K. If I set the variable higher, I can run a while longer, but this is not a permanent solution. Digging around for a solution, I see that the following may cast some light:

show global status like 'com_stmt%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Com_stmt_close          | 0     |
| Com_stmt_execute        | 49931 |
| Com_stmt_fetch          | 0     |
| Com_stmt_prepare        | 36    |
| Com_stmt_reprepare      | 0     |
| Com_stmt_reset          | 36    |
| Com_stmt_send_long_data | 0     |
+-------------------------+-------+

If I understand right, the number of Com_stmt_close should be 'close to or equal to' Com_stmt_execute, but it is not. Rolling back to all-32-bit R 2.13.2 etc. does work, with all entries in the table above remaining at zero. The number Com_stmt_execute increases with each row written using sqlSave(), but does not increase if I use sqlQuery():

# This causes Com_stmt_execute to increase by 50:
sqlQuery(channel = channel, query = "DROP TABLE IF EXISTS USArrests")
sqlSave(channel = channel, dat = USArrests[, 1, drop = FALSE], rownames = FALSE)

# This causes no change in Com_stmt_execute:
sqlQuery(channel = channel, query = "INSERT INTO USArrests (murder) values (1)")

This behaviour did not occur with R 2.13.2 / RODBC 1.3-3 32-bit. I could just revert one thing at a time to narrow it down, but if anyone can offer a shortcut I'd be delighted.
Thanks
Giles Heywood
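One hedged workaround worth trying (an assumption consistent with the symptoms, not a confirmed fix from the thread): sqlSave()'s default fast = TRUE sends a parametrized INSERT, i.e. a server-side prepared statement per call, and if the driver never closes these they accumulate against max_prepared_stmt_count. With fast = FALSE, RODBC builds literal INSERT statements instead, avoiding server-side prepares entirely (at some speed cost).

```r
## Sketch; "mysql_dsn" is a hypothetical ODBC data source name.
library(RODBC)

channel <- odbcConnect("mysql_dsn")

## fast = FALSE writes rows as plain literal INSERTs, so no prepared
## statements are created server-side and Com_stmt_execute stays flat.
sqlSave(channel, USArrests[, 1, drop = FALSE],
        rownames = FALSE, fast = FALSE)

odbcClose(channel)
```

If fast = FALSE makes the error disappear, that would point at the driver (or RODBC's use of it) leaking prepared statement handles under the 64-bit stack.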
[R] Installing Packages from a Local Repository
Hi everyone,

I've followed the instructions from R-Admin Section 6.6 for creating a local repository. I've modified my Rprofile.site file to add the local repository to my repos, but I haven't been able to successfully install my package from the repo. Here's the code that I've run:

##
sessionInfo()
getOption("repos")
setwd("Q:/Integrated Planning/R")
list.files(path = ".", recursive = TRUE)
tools::write_PACKAGES("bin/windows/contrib/2.15", type = "win.binary")
list.files(path = ".", recursive = TRUE)
install.packages("RTIO")
install.packages("RTIO", repos = "Q:/Integrated Planning/R")
install.packages("RTIO", repos = "Q:/Integrated Planning/R", type = "win.binary")
unlink(c("bin/windows/contrib/2.15/PACKAGES", "bin/windows/contrib/2.15/PACKAGES.gz"))
##

And here it is with output included:

###
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C LC_TIME=English_Australia.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.15.1

> getOption("repos")
                            CRAN                           CRANextra
"http://cran.ms.unimelb.edu.au/" "http://www.stats.ox.ac.uk/pub/RWin"
                         MyLocal
"file://Q:/Integrated Planning/R"

> setwd("Q:/Integrated Planning/R")
> list.files(path = ".", recursive = TRUE)
[1] "bin/windows/contrib/2.15/RTIO_0.1-2.zip"
> tools::write_PACKAGES("bin/windows/contrib/2.15", type = "win.binary")
> list.files(path = ".", recursive = TRUE)
[1] "bin/windows/contrib/2.15/PACKAGES"
    "bin/windows/contrib/2.15/PACKAGES.gz"
    "bin/windows/contrib/2.15/RTIO_0.1-2.zip"
> install.packages("RTIO")
Installing package(s) into C:/Program Files/R/R-2.15.1/library (as lib is unspecified)
Warning in install.packages :
  cannot open compressed file '//Q:/Integrated Planning/R/bin/windows/contrib/2.15/PACKAGES',
  probable reason 'No such file or directory'
Error in install.packages : cannot open the connection
> install.packages("RTIO", repos = "Q:/Integrated Planning/R")
Installing package(s) into C:/Program Files/R/R-2.15.1/library (as lib is unspecified)
Warning in install.packages :
  unable to access index for repository Q:/Integrated Planning/R/bin/windows/contrib/2.15
Warning in install.packages :
  package RTIO is not available (for R version 2.15.1)
> install.packages("RTIO", repos = "Q:/Integrated Planning/R", type = "win.binary")
Installing package(s) into C:/Program Files/R/R-2.15.1/library (as lib is unspecified)
Warning in install.packages :
  unable to access index for repository Q:/Integrated Planning/R/bin/windows/contrib/2.15
Warning in install.packages :
  package RTIO is not available (for R version 2.15.1)
> unlink(c("bin/windows/contrib/2.15/PACKAGES", "bin/windows/contrib/2.15/PACKAGES.gz"))
###

I'd really like to be able to use install.packages("RTIO") without having to specify the repo, as this will make it easy for our other less experienced R users. Any ideas why I get the warning "cannot open compressed file" and the error "cannot open the connection"? As far as I can tell, I've followed the R-Admin 6.6 instructions exactly. If it matters, Q: is a mapped network drive.
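One plausible explanation, judging by the '//Q:/...' path in the warning: a file URI for a Windows drive letter needs three slashes. With only two ("file://Q:/..."), "Q:" is parsed as the host part and the path becomes //Q:/..., which does not exist. A hedged sketch of a corrected Rprofile.site entry (repository name and path assumed from the post):

```r
## In Rprofile.site: register the local repo with a file:/// URI.
local({
  r <- getOption("repos")
  r["MyLocal"] <- "file:///Q:/Integrated Planning/R"  # note the THREE slashes
  options(repos = r)
})

## Afterwards, a plain call should find the local repository too:
## install.packages("RTIO")
```

If Q: being a mapped network drive still causes trouble, the UNC form of the same URI ("file://server/share/...") is the usual fallback.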
Re: [R] neural net
Thanks for your reply. I have compared my data with some other data which works and I cannot see the difference... The structure of my data is shown below:

str(data)
'data.frame': 19 obs. of 7 variables:
 $ drug  : Factor w/ 19 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ param1: int 111 347 335 477 863 737 390 209 376 262 ...
 $ param2: int 15 13 9 37 24 28 63 93 72 16 ...
 $ param3: int 125 280 119 75 180 150 167 200 201 205 ...
 $ param4: int 40 55 89 2 10 15 12 48 45 49 ...
 $ param5: num 0.5 3 -40 0 5 6 0 45 -60 25 ...
 $ Class : int 1 2 1 1 2 2 3 3 3 3 ...

summary(data)
 drug        param1          param2        param3          param4           param5             Class
 A : 1   Min.   :111.0   Min.   : 2.0   Min.   : 75.0   Min.   :-20.00   Min.   :-60.000   Min.   :1.000
 B : 1   1st Qu.:253.5   1st Qu.:15.0   1st Qu.:132.5   1st Qu.: 12.00   1st Qu.:  0.000   1st Qu.:1.000
 C : 1   Median :335.0   Median :28.0   Median :164.0   Median : 40.00   Median :  6.000   Median :2.000
 D : 1   Mean   :383.0   Mean   :33.0   Mean   :166.0   Mean   : 35.26   Mean   :  4.447   Mean   :1.895
 E : 1   3rd Qu.:433.5   3rd Qu.:42.5   3rd Qu.:200.5   3rd Qu.: 54.00   3rd Qu.: 20.500   3rd Qu.:2.000
 F : 1   Max.   :863.0   Max.   :93.0   Max.   :280.0   Max.   : 89.00   Max.   : 45.000   Max.   :3.000
 (Other):13

The structure of the example data which worked is shown below:

str(infert)
'data.frame': 248 obs. of 8 variables:
 $ education     : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 1 1 1 1 2 2 2 2 2 2 ...
 $ age           : num 26 42 39 34 35 36 23 32 21 28 ...
 $ parity        : num 6 1 6 4 3 4 1 2 1 2 ...
 $ induced       : num 1 1 2 2 1 2 0 0 0 0 ...
 $ case          : num 1 1 1 1 1 1 1 1 1 1 ...
 $ spontaneous   : num 2 0 0 0 1 1 0 0 1 0 ...
 $ stratum       : int 1 2 3 4 5 6 7 8 9 10 ...
 $ pooled.stratum: num 3 1 4 2 32 36 6 22 5 19 ...

summary(infert)
 education        age            parity         induced          case         spontaneous        stratum      pooled.stratum
 0-5yrs : 12   Min.   :21.00   Min.   :1.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   : 1.00   Min.   : 1.00
 6-11yrs:120   1st Qu.:28.00   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:21.00   1st Qu.:19.00
 12+ yrs:116   Median :31.00   Median :2.000   Median :0.0000   Median :0.0000   Median :0.0000   Median :42.00   Median :36.00
               Mean   :31.50   Mean   :2.093   Mean   :0.5726   Mean   :0.3347   Mean   :0.5766   Mean   :41.87   Mean   :33.58
               3rd Qu.:35.25   3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:62.25   3rd Qu.:48.25
               Max.   :44.00   Max.   :6.000   Max.   :2.0000   Max.   :1.0000   Max.   :2.0000   Max.   :83.00   Max.   :63.00

So I am still not sure how to solve the problem. -- View this message in context: http://r.789695.n4.nabble.com/neural-net-tp4652927p4652984.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
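One visible difference between the two data sets is that drug is a factor with 19 levels for 19 observations, i.e. a per-row identifier rather than a usable predictor, and Class is an integer rather than a factor. A minimal sketch of preparing such data for a neural-net fit (the data frame and values below are illustrative stand-ins, not the poster's actual data):

```r
# Sketch only: 'data' mimics the poster's structure. A factor that is
# unique per row (like 'drug') carries no information for a neural net
# and is best dropped; genuine categorical variables can be expanded to
# 0/1 dummy columns with model.matrix().
data <- data.frame(
  drug   = factor(LETTERS[1:19]),   # one level per row: an ID, not a predictor
  param1 = c(111, 347, 335, 477, 863, 737, 390, 209, 376, 262,
             254, 300, 420, 515, 199, 288, 350, 610, 450),
  Class  = factor(rep(1:3, length.out = 19))
)

# Drop the ID column and build a numeric design matrix for the inputs:
X <- model.matrix(~ param1, data = data)[, -1, drop = FALSE]

# One-hot encode the response (one indicator column per class):
Y <- model.matrix(~ Class - 1, data = data)

dim(X)  # 19 rows, 1 predictor column
dim(Y)  # 19 rows, 3 class-indicator columns
```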
[R] How to create multiple country's data into multiple sheets of one excel
Hi, I have a large dataset covering many countries. I have written a program that loops over each country and generates one output per country. I want to arrange the output so that each country's output goes on its own sheet. How do I achieve this in R? I have tried this:

library(xlsx)
write.xlsx(nnn, "vn.xlsx", sheetName="Sheet1")   # [1]

but when I change sheetName to "Sheet2" to add another country as a second sheet, it automatically deletes what I wrote in [1].

index <- unique(dataset$country)
for (i in 1:length(index)){
  data <- dataset[dataset$country==index[i],]
  (...)
  output <- dd
  # then how do I write each country's output to one sheet of one Excel file?
}

Kind regards, Tammy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove NA in df results in NA, NA.1 ... rows
You can use complete.cases:

df <- df[complete.cases(df), ]

On Thu, Dec 13, 2012 at 3:20 AM, raphael.fel...@art.admin.ch wrote: Good morning! I have the following data frame (df) [...] followed by removing the NA's using df2 <- df2[!is.na(df2),] [...] It seems there are still NA's in my data frame. How can I get rid of them? What is the meaning of the rows numbered NA, NA.1 and so on? Thanks for any hints. Best regards Raphael Felber -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
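The complete.cases() suggestion works because !is.na(df2) is a logical *matrix*, not a per-row indicator: used as a row index it has nrow*ncol entries, so it selects positions past the end of the data frame, and those come back as all-NA rows named NA, NA.1, and so on. A small self-contained sketch of the effect:

```r
# Why df2[!is.na(df2), ] leaves NA rows behind.
df2 <- data.frame(X = c(1, 2, NA), Y = c(10, NA, NA))

idx <- !is.na(df2)   # a 3x2 logical matrix: 6 entries, not 3
sum(idx)             # 3 TRUEs, but at vector positions 1, 2 and 4

# Position 4 is past nrow(df2), so an all-NA row is returned for it:
bad  <- df2[idx, ]
rownames(bad)        # last row name is "NA"

# complete.cases() gives one logical per row, keeping rows with no NA:
good <- df2[complete.cases(df2), ]
nrow(good)           # 1: only the first row is complete
```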
Re: [R] How to create multiple country's data into multiple sheets of one excel
Use append = TRUE inside your write.xlsx() call. On Thu, Dec 13, 2012 at 7:52 AM, Tammy Ma metal_lical...@live.com wrote: [...] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
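The append = TRUE suggestion can be sketched as a loop over countries (assuming the xlsx package; the file name and the 'dataset' object are the poster's, so this is an untested sketch rather than a verified solution):

```r
# Sketch: one sheet per country in a single workbook, using xlsx.
# 'dataset' is assumed to have a 'country' column, as in the question.
library(xlsx)

out_file <- "vn.xlsx"
for (cty in unique(dataset$country)) {
  one <- dataset[dataset$country == cty, ]
  # append = TRUE adds a new sheet instead of overwriting the file;
  # the first iteration (no file yet) creates the workbook.
  write.xlsx(one, out_file,
             sheetName = as.character(cty),
             append    = file.exists(out_file),
             row.names = FALSE)
}
```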
Re: [R] How to create multiple country's data into multiple sheets of one excel
I use the XLConnect package to write out multiple sheets to an Excel workbook. On Thu, Dec 13, 2012 at 7:52 AM, Tammy Ma metal_lical...@live.com wrote: [...] -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
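The XLConnect approach might look like the following (an untested sketch; the file name and 'dataset' object are assumptions carried over from the question):

```r
# Sketch: build the whole workbook in memory with XLConnect, one sheet
# per country, then write it to disk in a single save.
library(XLConnect)

wb <- loadWorkbook("countries.xlsx", create = TRUE)  # hypothetical path
for (cty in unique(dataset$country)) {
  sheet <- as.character(cty)
  createSheet(wb, name = sheet)
  writeWorksheet(wb, dataset[dataset$country == cty, ], sheet = sheet)
}
saveWorkbook(wb)  # nothing is written to disk until this call
```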
Re: [R] VarimpAUC in Party Package
Error: could not find function varimpAUC Was this function NOT included in the Windows binary I downloaded and installed? Which windows binary are you talking about? The R installer, the Party .zip or something else? S Ellison *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Running MCMC in R
?try
?tryCatch

(if the suggestion to use an MCMC package does not fix your problem). -- Bert On Wed, Dec 12, 2012 at 7:49 PM, Chenyi Pan cp...@virginia.edu wrote: [...] -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
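The try()/tryCatch() idea above can be sketched as a loop that fits one chain per simulated dataset and skips to the next dataset when the current fit fails (run_mcmc() is a hypothetical stand-in for the poster's sampler, not a real function):

```r
# Skeleton: record NA for a failing fit and move on to the next dataset.
run_mcmc <- function(d) {
  if (d$bad) stop("sampler got stuck") else mean(d$y)  # toy stand-in
}

datasets <- list(list(y = 1:5,   bad = FALSE),
                 list(y = 6:10,  bad = TRUE),   # this one "gets stuck"
                 list(y = 11:15, bad = FALSE))

results <- lapply(datasets, function(d)
  tryCatch(run_mcmc(d),
           error = function(e) NA_real_))  # catch the error, keep looping

unlist(results)  # 3, NA, 13
```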
[R] Webinar: Advances in Gradient Boosting: the Power of Post-Processing. TOMORROW, 10-11 a.m., PST
Webinar: Advances in Gradient Boosting: the Power of Post-Processing TOMORROW: December 14, 10-11 a.m., PST Webinar Registration: http://2.salford-systems.com/gradientboosting-and-post-processing/ Course Outline: I. Gradient Boosting and Post-Processing: o What is missing from Gradient Boosting? o Why post-processing techniques are used? II. Applications Benefiting from Post-Processing: Examples from a variety of industries. o Financial Services o Biomedical o Environmental o Manufacturing o Adserving III. Typical Post-Processing Steps IV. Techniques: o Generalized Path Seeker (GPS): modern high-speed LASSO-style regularized regression. o Importance Sampled Learning Ensembles (ISLE): identify and reweight the most influential trees. o Rulefit: ISLE on steroids. Identify the most influential nodes and rules. V. Case Study Example: o Output/Results without Post-Processing o Output/Results with Post-Processing o Demo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Running MCMC in R
I am now running a MCMC iteration in the R program. But it is always stucked in some loop. I never like it when folk just say Please read and follow the posting guide referenced in every R help email but ... please, read and follow the posting guide referenced in every R help email if you want any useful kind of answer. In the mean time, it would have helped a lot to say i) Is this a problem in your own code or in a contributed or core package? ii) If not your own, in which package and which function? iii) What does 'stuck' mean? Failing to converge (and according to what criterion)? Failing to complete a prescribed number of iterations? Failing to start? and also if you could have provided data and an example that would let someone see what is going on instead of trying to guess. Failing that, however, the answers to your questions are So I want to know whether we can skip the current dataset and move to next simulated data when the iteration is stucked? Yes, though the method of doing so will depend entirely on which function and which package you are using. and Alternatively, can the MCMC chain skip the current iteration when it is stucked and automatically to start another chain with different starting values. Possibly, though the method of doing so will depend entirely on which function and which package you are using. S *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] abline of an lm fit not correct
Hello fellow R-users, I'm stuck with something I think is pretty stupid, but I can't find out where I'm wrong, and it's driving me crazy! I am doing a very simple linear regression with Northing/Easting data, then I plot the data as well as the regression line:

plot(x=Dataset$EASTING, y=Dataset$NORTHING)
fit <- lm(formula = NORTHING ~ EASTING, data = Dataset)
abline(fit)
fit

Call: lm(formula = NORTHING ~ EASTING, data = Dataset)

Coefficients:
(Intercept)      EASTING
  5.376e+05    4.692e-02

Later on, when I use the command 'abline' with the coefficients provided by summary(fit), the line is not the same as abline(fit)! To summarize, these two lines are different:

abline(fit)
abline(5.376e+05, 4.692e-02)

The 'b' coefficients appear equal, but the intercepts are different. Where am I missing something? Thanks [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] abline of an lm fit not correct
You don't provide a reproducible example, but my first guess is that the print method is rounding what appears on the screen, so you aren't actually using the slope and intercept. See ?print.default and the digits argument under ?options for more. Why do you need to copy and paste the coefficients? Just to check your understanding? Sarah On Thursday, December 13, 2012, Robert U wrote: [...] -- Sarah Goslee http://www.stringpage.com http://www.sarahgoslee.com http://www.functionaldiversity.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
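The rounding explanation can be demonstrated with simulated data of the same magnitude as the poster's (the coordinates and coefficients below are illustrative): with an intercept around 5.4e5, the four significant digits shown by print() round to the nearest hundred, so re-typing them shifts the line by up to ~50 units.

```r
# Sketch: why abline(5.376e+05, 4.692e-02) differs from abline(fit).
set.seed(1)
x <- seq(574600, 574900, length.out = 50)       # Easting-sized values
y <- 537600.123 + 0.04692 * x + rnorm(50, sd = 5)
fit <- lm(y ~ x)

full    <- coef(fit)             # full-precision coefficients
rounded <- signif(coef(fit), 4)  # roughly what print(fit) displays

abs(full[1] - rounded[1])        # intercept error from rounding alone

# So pass the fit object (or its exact coefficients), never the
# printed numbers:
# abline(fit)        # correct
# abline(coef(fit))  # also correct
```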
[R] Pairwise deletion in a linear regression and in a GLM ?
Dear useRs, In a thesis I found a mention of the use of pairwise deletion in a linear regression and in a GLM (binomial family). The author said that he used R to do the statistics, but I did not find an option allowing pairwise deletion in either the lm or the glm function. Is there a package somewhere that allows this? Thanks, Arnaud [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] recursion depth limitations
Hello List, I am aware that one can raise the recursion depth with options(expressions = n), but it has a 500K limit. Why do we have a 500K limit on this? Some algorithms are only feasibly expressed with recursion, and 500K is not that much: graph algorithms, for example dependency trees with many nodes, easily reach that number. Best, -m __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
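One common way around the limit, rather than raising options(expressions=), is to rewrite the recursion with an explicit stack; a minimal sketch on a toy tree structure (the node representation here is an illustrative choice, not a standard one):

```r
# Tree depth two ways: recursively (bounded by R's recursion limit)
# and iteratively with an explicit stack (bounded only by memory).
depth_recursive <- function(node) {
  if (length(node$children) == 0) return(1)
  1 + max(vapply(node$children, depth_recursive, numeric(1)))
}

depth_iterative <- function(root) {
  stack <- list(list(node = root, d = 1))  # (node, depth) pairs
  best <- 0
  while (length(stack) > 0) {
    top   <- stack[[length(stack)]]
    stack <- stack[-length(stack)]          # pop
    best  <- max(best, top$d)
    for (ch in top$node$children)           # push children
      stack[[length(stack) + 1]] <- list(node = ch, d = top$d + 1)
  }
  best
}

# A chain of 3 nodes: both versions agree, but only the iterative one
# would survive a chain hundreds of thousands of nodes deep.
chain <- list(children = list(list(children = list(list(children = list())))))
depth_recursive(chain)  # 3
depth_iterative(chain)  # 3
```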
Re: [R] Pairwise deletion in a linear regression and in a GLM ?
Hi Arnaud, A quick help search of lm or glm tells you that 'the factory-fresh default is na.omit'. If you then look up 'na.omit', you'll read that it 'returns the object with incomplete cases removed'. So, pairwise deletion is the default option in both lm and glm. On a related note, it goes without saying that pairwise deletion is not good practice in most cases, and that R has ways to impute these missing cases depending on assumptions regarding the cause or nature of their missingness. Regards, José José Iparraguirre Chief Economist Age UK -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arnaud Mosnier Sent: 13 December 2012 15:40 To: r-help@r-project.org Subject: [R] Pairwise deletion in a linear regression and in a GLM ? [...] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to select a subset data to do a barplot in ggplot2
Hi, everybody. I have a dataframe like this:

FID IID  STATUS
1   4621 live
1   4628 dead
2   4631 live
2   4632 live
2   4633 live
2   4634 live
6   4675 live
6   4679 dead
10  4716 dead
10  4719 live
10  4721 dead
11  4726 live
11  4728 nosperm
11  4730 nosperm
12  4732 live
17  4783 live
17  4783 live
17  4784 live

I just want a barplot counting live or dead within every FID, with the bars filled by different colours. I tried this code:

p <- ggplot(data, aes(x=FID))
p + geom_bar(aes(x=factor(FID), y=..count.., fill=STATUS))

But how could I exclude nosperm or other levels just in the call to ggplot2, without generating another dataframe? Thanks a lot Yao He Master candidate in 2nd year Department of Animal genetics & breeding Room 436, College of Animal Science & Technology, China Agriculture University, Beijing, 100193 E-mail: yao.h.1...@gmail.com ming...@vt.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
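One way to avoid a named intermediate data frame is to subset inline in the data argument; a sketch on a cut-down version of the data above (the ggplot2 call itself is shown commented, since it needs the package installed):

```r
# Keep only the 'live'/'dead' rows, without creating a separate
# data-frame object in the workspace.
data <- data.frame(
  FID    = c(1, 1, 2, 2, 11, 11),
  STATUS = c("live", "dead", "live", "live", "nosperm", "nosperm")
)

keep <- subset(data, STATUS %in% c("live", "dead"))
nrow(keep)  # 4: the 'nosperm' rows are gone

# With ggplot2 installed, the same subset can go straight into ggplot():
# library(ggplot2)
# ggplot(subset(data, STATUS %in% c("live", "dead")),
#        aes(x = factor(FID), fill = STATUS)) +
#   geom_bar()
```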
Re: [R] remove NA in df results in NA, NA.1 ... rows
Hi, You could use either:

?na.omit()  # the option that was already suggested

# or
df2[complete.cases(df2), ]

# In this case, this should also work:
sapply(df2, function(x) x[!is.na(x)])
# or
apply(df2, 2, function(x) x[!is.na(x)])
# If the NAs are not in the same rows, the output will be a list whose elements differ in length.

A.K.

----- Original Message ----- From: raphael.fel...@art.admin.ch To: r-help@r-project.org Sent: Thursday, December 13, 2012 3:20 AM Subject: [R] remove NA in df results in NA, NA.1 ... rows [...] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to aggregate the dataset
Hi, You could try this:

dat3 <- read.table(text="
product min_price max_price mean_price country price_band
11 34 50 40 VN 0-300
22 10 30 15 VN 0-300
", header=TRUE, stringsAsFactors=FALSE)

library(reshape2)
SubsetPrice <- dat3[grep("price", names(dat3))]
dat3$newPrice <- paste(SubsetPrice[,3],
                       paste("[", SubsetPrice[,1], ",", SubsetPrice[,2], "]", sep=""),
                       sep=" ")
dcast(dat3, product + price_band ~ country, value.var="newPrice")
#   product price_band         VN
# 1      11      0-300 40 [34,50]
# 2      22      0-300 15 [10,30]

A.K.

----- Original Message ----- From: Tammy Ma metal_lical...@live.com To: r-help@r-project.org r-help@r-project.org Sent: Thursday, December 13, 2012 5:42 AM Subject: [R] how to aggregate the dataset HI, Sorry for messing up.. I want to transform the following dataset:

product min_price max_price mean_price country price_band
11 34 50 40 VN 0-300
22 10 30 15 VN 0-300

Into:

product VN price_band
11 40 0-300 [34,50]
22 15 0-300 [10,30]

How can I do this in R? I have a large dataset like this and want to transform all of it in this way. Thanks a lot. Kind regards, Tammy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Repeat elements of matrix based on vector counts
I have two dataframes (df) that share a column header (plot.id). In the 1st df, plot.id records are repeated a variable number of times, based on the number of trees monitored within each plot. The 2nd df has only a single record for each plot.id, and contains a variable named load that is collected at the plot level and is only listed once per plot record. *OBJECTIVE:* I need to repeat the load values from the 2nd df based on how many times plot.id is repeated in the 1st df (all plots are repeated a different number of times). My example dfs are below:

df1 <- data.frame(plot.id  = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"),
                  load    = c(17, 6, 24))

I have gotten close to solving this, but alas I'm on day 2 of problem-shooting and can't get it! Thanks for any help you might provide. --Sarah [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
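Two standard base-R ways to do this expansion, using the example data frames from the question (a sketch; either line on its own solves the stated objective):

```r
# Repeat each plot-level 'load' once per tree in that plot.
df1 <- data.frame(plot.id  = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"),
                  load    = c(17, 6, 24))

# 1. Look up each tree's plot in df2:
df1$load <- df2$load[match(df1$plot.id, df2$plot.id)]

# 2. Or let merge() do the join (same values, rows possibly reordered):
m <- merge(df1[, c("plot.id", "tree.tag")], df2, by = "plot.id")

df1$load  # 17 17 17 6 6 24 24 24 24 24
```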
[R] How do I make a loop to extract a column from multiple lists and then bind them together to make a new matrix?
Hi! I am new to looping and R in general, and I have spent way too much time on this one problem; I am about a hair away from doing it manually for the next two days. So, there is a package that, while calculating the statistic, creates lists (that look like matrices) in the background. Each item (there are 10 items) has one of these matrix-looking lists that I need to extract data from. The list has 5 rows that represent 5 groups, and 8 columns. I need to extract 3 of the columns (Lo Score = [,2], Hi Score = [,3], and Mean = [,7]) for each of the items. I then want to turn the extracted data into 3 matrices (Lo Score, Hi Score, and Mean) where the rows are the 5 groups and the columns are items 1-10. This is how I can create the mean matrix by hand; MDD.mean.s10 is the matrix I want in the end. (Notice the first bracket after $results is the only part that changes, 1 through 10, to represent the 10 items, and the last bracket is [,7] to represent the mean located in column 7.)

m.1a <- MC_MDD.noNA$results[[1]][[2]][,7]
m.2b <- MC_MDD.noNA$results[[2]][[2]][,7]
m.3c <- MC_MDD.noNA$results[[3]][[2]][,7]
m.4d <- MC_MDD.noNA$results[[4]][[2]][,7]
m.5e <- MC_MDD.noNA$results[[5]][[2]][,7]
m.6f <- MC_MDD.noNA$results[[6]][[2]][,7]
m.7g <- MC_MDD.noNA$results[[7]][[2]][,7]
m.8h <- MC_MDD.noNA$results[[8]][[2]][,7]
m.9i <- MC_MDD.noNA$results[[9]][[2]][,7]
m.10j <- MC_MDD.noNA$results[[10]][[2]][,7]

MDD.mean.s10 <- cbind(m.1a, m.2b, m.3c, m.4d, m.5e, m.6f, m.7g, m.8h, m.9i, m.10j)
MDD.mean.s10
          m.1a      m.2b      m.3c      m.4d      m.5e      m.6f      m.7g      m.8h      m.9i     m.10j
[1,] 0.8707865 0.7393939 0.7769231 0.7591241 0.853     0.7925926 0.8258065 0.8410596 0.8843931 0.5638298
[2,] 0.8323353 0.7302632 0.5913978 0.5868263 0.6923077 0.6182796 0.6964286 0.6839080 0.7911392 0.3212121
[3,] 0.8726115 0.7159763 0.7117647 0.6163522 0.7987805 0.7105263 0.7613636 0.7674419 0.8034682 0.4011299
[4,] 0.9024390 0.7894737 0.7795276 0.6530612 0.8593750 0.7112676 0.8672566 0.8629032 0.9152542 0.4834437
[5,] 0.986     0.9102564 0.8452381 0.8160920 0.9726027 0.8658537 0.8352941 0.9342105 0.947     0.6454545

But I can't do this by hand every time, as this comes up over and over again in multiple lists. I have figured out how to loop this procedure and name the vector as it goes along:

for(i in 1:10){
  assign(paste("m", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7])
}

m1
[1] 0.8707865 0.8323353 0.8726115 0.9024390 0.986
m2
[1] 0.7393939 0.7302632 0.7159763 0.7894737 0.9102564
m3
[1] 0.7769231 0.5913978 0.7117647 0.7795276 0.8452381
m4
[1] 0.7591241 0.5868263 0.6163522 0.6530612 0.8160920
m5
[1] 0.853 0.6923077 0.7987805 0.8593750 0.9726027
m6
[1] 0.7925926 0.6182796 0.7105263 0.7112676 0.8658537
m7
[1] 0.8258065 0.6964286 0.7613636 0.8672566 0.8352941
m8
[1] 0.8410596 0.6839080 0.7674419 0.8629032 0.9342105
m9
[1] 0.8843931 0.7911392 0.8034682 0.9152542 0.947
m10
[1] 0.5638298 0.3212121 0.4011299 0.4834437 0.6454545

Now here is where I get stuck: how do I cbind these vectors without typing them out explicitly? I.e.

mean.MDD <- cbind(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10)

Everything I have tried keeps overwriting the data instead of building a matrix. Basically, I start with a matrix (5x10) of zeros; then I wind up with a few values in the beginning, but the rest is still zeros.

Example of terrible code:

fo <- matrix(0, 5, 10)
colnames(fo) <- paste('f', 1:10, sep = "")
fo
     f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
[1,]  0  0  0  0  0  0  0  0  0   0
[2,]  0  0  0  0  0  0  0  0  0   0
[3,]  0  0  0  0  0  0  0  0  0   0
[4,]  0  0  0  0  0  0  0  0  0   0
[5,]  0  0  0  0  0  0  0  0  0   0

for(i in 1:10){
  fo <- assign(paste("f", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7])
}
fo
[1] 0.5638298 0.3212121 0.4011299 0.4834437 0.6454545

(after resetting fo to the zero matrix as above)

for(i in 1:10){
  fo <- cbind(assign(paste("f", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7]))
}
fo
          [,1]
[1,] 0.5638298
[2,] 0.3212121
[3,] 0.4011299
[4,] 0.4834437
[5,] 0.6454545

Thanks for your help in advance!!! (c: [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
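This loop-free pattern does what the question asks without any assign()/cbind() bookkeeping: sapply() over the item index returns the 5 x 10 matrix directly. The mock object below only mimics the structure of MC_MDD.noNA$results[[i]][[2]] described in the question (a 5 x 8 matrix per item), so the numbers are illustrative.

```r
# Build a mock MC_MDD.noNA with 10 items, each holding a 5x8 matrix in
# its second slot, matching the structure described in the question.
set.seed(42)
mock_results <- lapply(1:10, function(i)
  list(NULL, matrix(runif(5 * 8), nrow = 5, ncol = 8)))
MC_MDD.noNA <- list(results = mock_results)

# One line per statistic; columns are items 1..10, rows are the 5 groups:
MDD.mean.s10 <- sapply(1:10, function(i) MC_MDD.noNA$results[[i]][[2]][, 7])
MDD.lo.s10   <- sapply(1:10, function(i) MC_MDD.noNA$results[[i]][[2]][, 2])
MDD.hi.s10   <- sapply(1:10, function(i) MC_MDD.noNA$results[[i]][[2]][, 3])
colnames(MDD.mean.s10) <- paste0("m", 1:10)

dim(MDD.mean.s10)  # 5 10
```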
Re: [R] Running MCMC in R
And if by stuck you mean taking too long a time you can generate an error at a given time limit by using setTimeLimit() and tryCatch() or try() can catch that error. E.g. timeOut - function (expr, cpu = Inf, elapsed = Inf) { setTimeLimit(cpu = cpu, elapsed = elapsed, transient = TRUE) on.exit(setTimeLimit()) # should not be needed, but averts future error message expr } timeOut({s-0 ; for(i in 1:1e7)s - s + 1/i ; s}, elapsed=1) Error: reached elapsed time limit timeOut({s-0 ; for(i in 1:1e7)s - s + 1/i ; s}, elapsed=10) # log(1e7) + gamma [1] 16.69531 tryCatch(timeOut({s-0 ; for(i in 1:1e7)s - s + 1/i ; s}, elapsed=1), + error = function(e) NA_real_) [1] NA Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Thursday, December 13, 2012 6:21 AM To: Chenyi Pan Cc: R-help@r-project.org Subject: Re: [R] Running MCMC in R ?try ?tryCatch (if the suggestion to use an MCMC package does not fix your problem). -- Bert On Wed, Dec 12, 2012 at 7:49 PM, Chenyi Pan cp...@virginia.edu wrote: Dear all I am now running a MCMC iteration in the R program. But it is always stucked in some loop. This cause big problems for my research. So I want to know whether we can skip the current dataset and move to next simulated data when the iteration is stucked? Alternatively, can the MCMC chain skip the current iteration when it is stucked and automatically to start another chain with different starting values. I am looking forward to your reply. Best, Chenyi -- Chenyi Pan Department of Statisitics Graduate School of Arts and Sciences, University of Virginia Tel: 434-466-9209 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Bert Gunter, Genentech Nonclinical Biostatistics
Re: [R] Pairwise deletion in a linear regression and in a GLM ?
Hi Jose,

To my understanding, na.omit is different from pairwise deletion. With na.omit, a case is omitted entirely if it has a missing value for any of the variables in the model. With pairwise deletion, a case with some missing values is kept, and the values that are not missing are used in the calculations. However, I agree that pairwise deletion is not good practice (so it would be surprising if it were the default in lm!). I just want to be able to recalculate the statistics given in this thesis.

Arnaud

2012/12/13 Jose Iparraguirre jose.iparragui...@ageuk.org.uk:
Hi Arnaud, A quick help search of lm or glm tells you that 'the factory-fresh default is na.omit'. If you then look up 'na.omit', you'll read that it 'returns the object with incomplete cases removed'. So, pairwise deletion is the default option in both lm and glm. On a related note, it goes without saying that pairwise deletion is not good practice in most cases, and that R has ways to impute these missing cases depending on assumptions regarding the cause or nature of their missingness. Regards, José
José Iparraguirre, Chief Economist, Age UK

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arnaud Mosnier
Sent: 13 December 2012 15:40
To: r-help@r-project.org
Subject: [R] Pairwise deletion in a linear regression and in a GLM ?

Dear useRs, In a thesis, I found a mention of the use of pairwise deletion in linear regression and GLM (binomial family). The author said that he used R to do the statistics, but I did not find an option allowing pairwise deletion in either the lm or glm function. Is there a package that allows that? Thanks, Arnaud
Re: [R] Repeat elements of matrix based on vector counts
Hi Sarah,

If I understand your requirements correctly, the easiest thing to do is approach it from a different direction:

df3a <- merge(df1, df2)

But you can also use rep for this simple example, because plot.id in df2 is sorted:

nindex <- table(df1$plot.id)
df3b <- df2[rep(1:length(nindex), times = nindex), ]

Thanks for the reproducible example,
Sarah

On Thu, Dec 13, 2012 at 9:15 AM, Sarah Haas haaszool...@gmail.com wrote:
I have two dataframes (df) that share a column header (plot.id). In the 1st df, plot.id records are repeated a variable number of times based on the number of trees monitored within each plot. The 2nd df only has a single record for each plot.id, and contains a variable named load that is collected at the plot level and is only listed once per plot record. *OBJECTIVE:* I need to repeat the load values from the 2nd df based on how many times plot.id is repeated in the 1st df (all plots are repeated a different number of times). My example dfs are below:

df1 <- data.frame(plot.id = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"), load = c(17, 6, 24))

I have gotten close to solving this, but alas I'm on day 2 of problem-shooting and can't get it! Thanks for any help you might provide.
--Sarah

-- Sarah Goslee http://www.functionaldiversity.org
Re: [R] Repeat elements of matrix based on vector counts
Hello,

Something like this?

rep(df2$load, table(df1$plot.id))

Hope this helps,
Rui Barradas

Em 13-12-2012 14:15, Sarah Haas escreveu:
I have two dataframes (df) that share a column header (plot.id). In the 1st df, plot.id records are repeated a variable number of times based on the number of trees monitored within each plot. The 2nd df only has a single record for each plot.id, and contains a variable named load that is collected at the plot level and is only listed once per plot record. *OBJECTIVE:* I need to repeat the load values from the 2nd df based on how many times plot.id is repeated in the 1st df (all plots are repeated a different number of times). My example dfs are below:

df1 <- data.frame(plot.id = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"), load = c(17, 6, 24))

I have gotten close to solving this, but alas I'm on day 2 of problem-shooting and can't get it! Thanks for any help you might provide.
--Sarah
Re: [R] Repeat elements of matrix based on vector counts
Hi,

Try ?merge(), or ?join() from library(plyr):

res <- merge(df1, df2, by = "plot.id")
head(res, 6)
#   plot.id tree.tag load
# 1   plot1      111   17
# 2   plot1      112   17
# 3   plot1      113   17
# 4   plot2      222    6
# 5   plot2      223    6
# 6   plot3      333   24

A.K.

----- Original Message -----
From: Sarah Haas haaszool...@gmail.com
To: r-help@r-project.org
Sent: Thursday, December 13, 2012 9:15 AM
Subject: [R] Repeat elements of matrix based on vector counts

I have two dataframes (df) that share a column header (plot.id). In the 1st df, plot.id records are repeated a variable number of times based on the number of trees monitored within each plot. The 2nd df only has a single record for each plot.id, and contains a variable named load that is collected at the plot level and is only listed once per plot record. *OBJECTIVE:* I need to repeat the load values from the 2nd df based on how many times plot.id is repeated in the 1st df (all plots are repeated a different number of times). My example dfs are below:

df1 <- data.frame(plot.id = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"), load = c(17, 6, 24))

I have gotten close to solving this, but alas I'm on day 2 of problem-shooting and can't get it! Thanks for any help you might provide.
--Sarah
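The suggestions in this thread (merge(), indexing with rep(), and rep() on the load vector directly) all produce the same expanded values for this example, because df1 and df2 are both sorted by plot.id. A quick check using the example dataframes from the question:

```r
df1 <- data.frame(plot.id = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"),
                  load = c(17, 6, 24))

m <- merge(df1, df2)                    # one row per tree, load attached
v <- rep(df2$load, table(df1$plot.id))  # just the expanded load vector
identical(m$load, v)                    # TRUE: same values, same order
```

Note that merge() sorts by the `by` column by default, so if df1 were not already ordered by plot.id the two approaches would align the values differently.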
Re: [R] Pairwise deletion in a linear regression and in a GLM ?
Sorry, Arnaud, I misinterpreted the question. There isn't a built-in option in lm or glm to run pairwise deletion, but with the 'psych' package you can run regressions on covariance matrices rather than on raw data. So, first, you obtain a covariance matrix with cov() and the option use = "pairwise.complete.obs" -- or, within 'psych', set.cor(..., use = "pairwise"), which will give you the correlations pairwise -- and then you use the function mat.regress on the pairwise matrix.

Hope this helps,
José

From: Arnaud Mosnier [mailto:a.mosn...@gmail.com]
Sent: 13 December 2012 16:13
To: Jose Iparraguirre
Cc: r-help@r-project.org
Subject: Re: [R] Pairwise deletion in a linear regression and in a GLM ?

Hi Jose, To my understanding, na.omit is different from pairwise deletion. With na.omit, a case is omitted entirely if it has a missing value for any of the variables in the model. With pairwise deletion, a case with some missing values is kept, and the values that are not missing are used in the calculations. However, I agree that pairwise deletion is not good practice (so it would be surprising if it were the default in lm!). I just want to be able to recalculate the statistics given in this thesis. Arnaud

2012/12/13 Jose Iparraguirre jose.iparragui...@ageuk.org.uk:
Hi Arnaud, A quick help search of lm or glm tells you that 'the factory-fresh default is na.omit'. If you then look up 'na.omit', you'll read that it 'returns the object with incomplete cases removed'. So, pairwise deletion is the default option in both lm and glm. On a related note, it goes without saying that pairwise deletion is not good practice in most cases, and that R has ways to impute these missing cases depending on assumptions regarding the cause or nature of their missingness.
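For illustration, the pairwise-complete covariance step can be seen in base R alone; the small data frame below is made up for the example, and the contrast with listwise (na.omit-style) deletion is the point.

```r
set.seed(42)
d <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
d$x[2] <- NA   # missing values scattered across different cases
d$y[5] <- NA

# Listwise: every entry uses only the 8 fully complete cases.
cov_listwise <- cov(na.omit(d))

# Pairwise: each entry uses every case complete for THAT pair of variables,
# so cov(x, z) is based on 9 cases here, and cov(y, z) on a different 9.
cov_pairwise <- cov(d, use = "pairwise.complete.obs")
```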
Regards, José
José Iparraguirre, Chief Economist, Age UK

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arnaud Mosnier
Sent: 13 December 2012 15:40
To: r-help@r-project.org
Subject: [R] Pairwise deletion in a linear regression and in a GLM ?

Dear useRs, In a thesis, I found a mention of the use of pairwise deletion in linear regression and GLM (binomial family). The author said that he used R to do the statistics, but I did not find an option allowing pairwise deletion in either the lm or glm function. Is there a package that allows that? Thanks, Arnaud
Re: [R] Pairwise deletion in a linear regression and in a GLM ?
Thanks Jose, but I doubt that the author of these analyses used such a complex approach.

Arnaud

2012/12/13 Jose Iparraguirre jose.iparragui...@ageuk.org.uk:
Sorry, Arnaud, I misinterpreted the question. There isn't a built-in option in lm or glm to run pairwise deletion, but with the 'psych' package you can run regressions on covariance matrices rather than on raw data. So, first, you obtain a covariance matrix with cov() and the option use = "pairwise.complete.obs" -- or, within 'psych', set.cor(..., use = "pairwise"), which will give you the correlations pairwise -- and then you use the function mat.regress on the pairwise matrix. Hope this helps, José

From: Arnaud Mosnier [mailto:a.mosn...@gmail.com]
Sent: 13 December 2012 16:13
To: Jose Iparraguirre
Cc: r-help@r-project.org
Subject: Re: [R] Pairwise deletion in a linear regression and in a GLM ?

Hi Jose, To my understanding, na.omit is different from pairwise deletion. With na.omit, a case is omitted entirely if it has a missing value for any of the variables in the model. With pairwise deletion, a case with some missing values is kept, and the values that are not missing are used in the calculations. However, I agree that pairwise deletion is not good practice (so it would be surprising if it were the default in lm!). I just want to be able to recalculate the statistics given in this thesis. Arnaud

2012/12/13 Jose Iparraguirre jose.iparragui...@ageuk.org.uk:
Hi Arnaud, A quick help search of lm or glm tells you that 'the factory-fresh default is na.omit'. If you then look up 'na.omit', you'll read that it 'returns the object with incomplete cases removed'. So, pairwise deletion is the default option in both lm and glm. On a related note, it goes without saying that pairwise deletion is not good practice in most cases, and that R has ways to impute these missing cases depending on assumptions regarding the cause or nature of their missingness.
Regards, José
José Iparraguirre, Chief Economist, Age UK

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arnaud Mosnier
Sent: 13 December 2012 15:40
To: r-help@r-project.org
Subject: [R] Pairwise deletion in a linear regression and in a GLM ?

Dear useRs, In a thesis, I found a mention of the use of pairwise deletion in linear regression and GLM (binomial family). The author said that he used R to do the statistics, but I did not find an option allowing pairwise deletion in either the lm or glm function. Is there a package that allows that? Thanks, Arnaud
[R] subsetting time series
Hello, my series of dates looks like

 [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
 [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
 [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
 [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
 [9] 2012-05-31 02:30:00 UTC 2012-05-31 00:30:00 UTC
[11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC
[13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
[15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
[17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
[19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
...

I'd like to subset this to four series:

1)
 [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
 [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
 [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
 [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
 [9] 2012-05-31 02:30:00 UTC
[10] 2012-05-31 18:30:00 UTC 2012-05-31 19:30:00 UTC
...

2)
2012-05-31 00:30:00 UTC - [1]
[11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC - [2,3]
[13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
[15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
[17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
[10] 2012-06-01 00:30:00 UTC

3)
[19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
...

so that I can plot data for each of the series separately, without e.g. data at hour 2012-05-31 02:30:00 UTC connecting in the figure to 2012-05-31 00:30:00 UTC. Basically, cycling through the series with period 9.

Thanks for any suggestions/help,
Mark
Re: [R] [R-sig-hpc] recursion depth limitations
On Dec 13, 2012, at 10:45 AM, Suzen, Mehmet wrote:

> Hello List, I am aware that one can set the recursion depth with 'options(expressions = #)', but it has a 500K limit. Why do we have a 500K limit on this?

Because it's far beyond what you can handle without changing a lot of other things. 500k expressions will require at least about 320Mb of stack (!) in the eval() chain alone -- compare that to the 8Mb stack size which is the default in most OSes, so you'll hit the wall way before that limit is reached.

> Some algorithms are only feasibly solvable with recursion, and 500K does not sound like too much, e.g. for graph algorithms: dependency trees with large node counts easily reach that number.

I don't see how large node counts have anything to do with this, since we are talking about expression depth, not about sizes of any kind. Again, in any realistic example you'll hit other limits first anyway.

Cheers,
Simon
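As a concrete illustration of why raising the limit is rarely the right fix: a deeply recursive function nests one evaluation frame per step and fails long before 500k, while an iterative rewrite of the same computation runs at constant stack depth. The toy functions below are illustrative only.

```r
# Recursive version: one nested eval frame per step.
depth_rec <- function(n) if (n == 0L) 0L else 1L + depth_rec(n - 1L)
# depth_rec(1e6)  # would stop with "evaluation nested too deeply" / C stack error

# Iterative version: constant stack depth, no recursion limit involved.
depth_iter <- function(n) {
  d <- 0L
  while (n > 0L) { d <- d + 1L; n <- n - 1L }
  d
}
depth_iter(1e6)  # 1000000
```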
Re: [R] How to select a subset data to do a barplot in ggplot2
Hi,

Maybe this:

p <- ggplot(subset(dat1, STATUS != "nosperm"), aes(x = FID))
p + geom_bar(aes(x = factor(FID), y = ..count.., fill = STATUS))

A.K.

----- Original Message -----
From: Yao He yao.h.1...@gmail.com
To: r-help@r-project.org
Sent: Thursday, December 13, 2012 7:38 AM
Subject: [R] How to select a subset data to do a barplot in ggplot2

Hi everybody, I have a dataframe like this:

FID  IID  STATUS
1    4621 live
1    4628 dead
2    4631 live
2    4632 live
2    4633 live
2    4634 live
6    4675 live
6    4679 dead
10   4716 dead
10   4719 live
10   4721 dead
11   4726 live
11   4728 nosperm
11   4730 nosperm
12   4732 live
17   4783 live
17   4783 live
17   4784 live

I just want a barplot that counts live or dead in every FID, and fills the bars with different colours. I tried these lines:

p <- ggplot(data, aes(x = FID))
p + geom_bar(aes(x = factor(FID), y = ..count.., fill = STATUS))

But how could I exclude "nosperm" or other levels within the ggplot2 call, without generating another dataframe?

Thanks a lot,
Yao He

-- Master candidate, Department of Animal Genetics & Breeding, College of Animal Science & Technology, China Agriculture University, Beijing, 100193
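For reference, the filtering-and-counting step can be checked without ggplot2 at all; `dat1` below is a made-up fragment shaped like the poster's data.

```r
dat1 <- data.frame(FID    = c(1, 1, 2, 11, 11, 11),
                   STATUS = c("live", "dead", "live", "live", "nosperm", "nosperm"))

# subset() drops the unwanted rows; droplevels() removes "nosperm" from any
# factor levels so it gets no bar (or legend entry) of its own.
kept <- droplevels(subset(dat1, STATUS != "nosperm"))
table(kept$FID, kept$STATUS)  # the counts the stacked bars would display
```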
Re: [R] subsetting time series
Is this a one-off or not? Why not do it manually? If you need to write a function, some example data would be helpful.

On Thu, Dec 13, 2012 at 10:52 AM, m p mzp3...@gmail.com wrote:
Hello, my series of dates looks like

 [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
 [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
 [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
 [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
 [9] 2012-05-31 02:30:00 UTC 2012-05-31 00:30:00 UTC
[11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC
[13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
[15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
[17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
[19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
...

I'd like to subset this to four series, so that I can plot data for each of the series separately, without e.g. data at hour 2012-05-31 02:30:00 UTC connecting in the figure to 2012-05-31 00:30:00 UTC. Basically, cycling through the series with period 9. Thanks for any suggestions/help, Mark
-- Stephen Sefick, Auburn University Biological Sciences, 331 Funchess Hall, Auburn, Alabama 36849, sas0...@auburn.edu, http://www.auburn.edu/~sas0025
Re: [R] subsetting time series
On Dec 13, 2012, at 8:52 AM, m p wrote:

> Hello, my series of dates look like
> [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
> [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
> [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
> [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
> [9] 2012-05-31 02:30:00 UTC 2012-05-31 00:30:00 UTC
> [11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC
> [13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
> [15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
> [17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
> [19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
> ...

Better would have been:

series <- seq(as.POSIXct("2012-05-30 18:30:00", tz = "UTC"),
              length.out = 40, by = "1 hour")

> I'd like to subset this to four series

Although you later describe the problem differently, so I am following that description. See if this split approach with integer division is helpful:

split(series, (seq_along(series) - 1) %/% 9)

-- David.

> 1)
> [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
> [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
> [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
> [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
> [9] 2012-05-31 02:30:00 UTC
> [10] 2012-05-31 18:30:00 UTC 2012-05-31 19:30:00 UTC
> ...
> 2)
> 2012-05-31 00:30:00 UTC - [1]
> [11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC - [2,3]
> [13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
> [15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
> [17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
> [10] 2012-06-01 00:30:00 UTC
> 3)
> [19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
> ...
> so that I can plot data for each of the series separately without e.g.
> data at hour 2012-05-31 02:30:00 UTC connecting in the figure to 2012-05-31 00:30:00 UTC. Basically, cycling through the series with period 9. Thanks for any suggestions/help, Mark

David Winsemius, MD
Alameda, CA, USA
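A self-contained version of the split() suggestion, assuming the stamps really do cycle with period 9 (the 27-element series below is constructed for the example):

```r
series <- seq(as.POSIXct("2012-05-30 18:30:00", tz = "UTC"),
              length.out = 27, by = "1 hour")

# Integer division of the 0-based position assigns 9 consecutive stamps to
# each group, so each group can be plotted as its own unconnected line.
blocks <- split(series, (seq_along(series) - 1) %/% 9)
length(blocks)   # 3 blocks
lengths(blocks)  # 9 9 9
```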
Re: [R] More efficient use of reshape?
On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:

> Hi all, I have played a bit with the reshape package and function, along with melt and cast, but I feel I still don't have a good handle on how to use them efficiently. Below I have included an application of reshape that is rather clunky, and I'm hoping someone can offer advice on how to use reshape (or melt/cast) more efficiently.

You do realize that the 'reshape' function is _not_ in the reshape package, right? And also that the reshape package has been superseded by the reshape2 package?

-- David.

> #For this example I am using climate change data available on-line
> file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
> clim.data <- read.csv(file, header = TRUE)
>
> library(lubridate)
> library(reshape)
>
> #I've been playing with the lubridate package a bit to work with dates, but as the climate
> #dataset only uses year and month I have added a day to each entry in the yr_mn column and
> #then used dym from lubridate to generate the POSIXlt formatted dates in a new column clim.data$date
> clim.data$yr_mn <- paste("01", clim.data$yr_mn, sep = "")
> clim.data$date <- dym(clim.data$yr_mn)
>
> #Now to the reshape. The dataframe is in a wide format. The columns GISS, HAD, NOAA, RSS,
> #and UAH are all different sources from which the global temperature anomaly has been
> #calculated since 1880 (actually only 1978 for RSS and UAH). What I would like to do is plot
> #the temperature anomaly vs date and use ggplot to facet by the different data source (GISS,
> #HAD, etc.). Thus I need the data in long format with a date column, a temperature anomaly
> #column, and a data source column. The code below works, but it's really very clunky and I'm
> #sure I am not using these tools as efficiently as I can.
>
> #The varying=list(3:7) specifies the columns in the dataframe that correspond to the sources
> #(GISS, etc.), though then in the resulting reshaped dataframe the sources are numbered 1-5,
> #so I have to reassign their names.
> #In addition, the original dataframe has additional data columns I do not want, and so after
> #reshaping I create another(!) dataframe with just the columns I need, and then I have to
> #rename them so that I can keep track of what everything is. Whew! Not the most elegant of code.
>
> d <- reshape(clim.data, varying = list(3:7), idvar = "date",
>              v.names = "anomaly", direction = "long")
> d$time <- ifelse(d$time == 1, "GISS", d$time)
> d$time <- ifelse(d$time == 2, "HAD", d$time)
> d$time <- ifelse(d$time == 3, "NOAA", d$time)
> d$time <- ifelse(d$time == 4, "RSS", d$time)
> d$time <- ifelse(d$time == 5, "UAH", d$time)
> new.data <- data.frame(d$date, d$time, d$anomaly)
> names(new.data) <- c("date", "source", "anomaly")
>
> I realize this is a mess, though it works. I think with just some help on how better to work this example I'll probably get over the learning hump and actually figure out how to use these data manipulation functions more cleanly. Any advice or assistance would be appreciated. Thanks, Nate

David Winsemius, MD
Alameda, CA, USA
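For comparison with the reshape() call quoted above: with reshape2 the whole wide-to-long step, including keeping the source names, collapses to a single melt() call. The toy data frame below stands in for the climate data, since the posted URL may no longer resolve.

```r
library(reshape2)  # assumes reshape2 is installed

# Toy stand-in for clim.data: one id column plus several source columns.
clim <- data.frame(date = as.Date("2000-01-01") + 0:2,
                   GISS = rnorm(3), HAD = rnorm(3), NOAA = rnorm(3))

# id.vars stays fixed; every other column becomes (source, anomaly) pairs,
# with the original column names preserved -- no ifelse() relabelling needed.
long <- melt(clim, id.vars = "date",
             variable.name = "source", value.name = "anomaly")
head(long)  # columns: date, source, anomaly
```

Unwanted columns can simply be excluded before melting, e.g. `melt(clim[, c("date", "GISS", "HAD")], id.vars = "date", ...)`, which removes the need for the follow-up data.frame() and names() steps.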
Re: [R] More efficient use of reshape?
Sorry David, In my attempt to simplify the example and include just the code I felt was necessary, I left out the loading of ggplot2, which imports reshape2 and which was actually used in the code I provided. Sorry for the mistake and my misunderstanding of where the reshape function was coming from. I should have checked that more carefully. Thanks, Nate On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius dwinsem...@comcast.net wrote: On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote: [original example snipped] You do realize that the 'reshape' function is _not_ in the reshape package, right? And also that the reshape package has been superseded by the reshape2 package? -- David.
David Winsemius, MD Alameda, CA, USA
[R] CPOS from cwhmisc package not found
Hi: I wonder if anyone can help me with a 'function not found' error for cpos: path.package("cwhmisc", quiet = FALSE) [1] "C:/Users/slee/Documents/R/win-library/2.15/cwhmisc" So I have the package cwhmisc, which contains the cpos function. But I got an error: cpos("ab", "b", 1) Error: could not find function "cpos" Then I tried to install at the R prompt but got this error message: install.packages("cwhmisc", lib="C:/Program Files/R/R-2.15.2/library/") Warning message: package 'cwhmisc' is not available (for R version 2.15.2) So I don't understand why the package is not available for the version of R I am running. Does anyone have any idea? --- Shirley Lee
Re: [R] subsetting time series
Hi, Try this: seq1 <- seq(from=as.POSIXct("2012-05-30 18:30:00", tz="UTC"), to=as.POSIXct("2012-05-31 02:30:00", tz="UTC"), by="1 hour") seq2 <- seq(from=as.POSIXct("2012-05-31 00:30:00", tz="UTC"), to=as.POSIXct("2012-05-31 08:30:00", tz="UTC"), by="1 hour") seq3 <- seq(from=as.POSIXct("2012-05-31 06:30:00", tz="UTC"), to=as.POSIXct("2012-05-31 07:30:00", tz="UTC"), by="1 hour") Sys.setenv(TZ="UTC") Series1 <- c(seq1, seq2, seq3) split(Series1, rep(1:3, each=9)) #or individually if it is a small dataset Series1[1:9] Series1[10:18] etc. A.K. - Original Message - From: m p mzp3...@gmail.com To: r-h...@stat.math.ethz.ch Cc: Sent: Thursday, December 13, 2012 11:52 AM Subject: [R] subsetting time series Hello, my series of dates look like [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC [9] 2012-05-31 02:30:00 UTC 2012-05-31 00:30:00 UTC [11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC [13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC [15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC [17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC [19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC ... I'd like to subset this to four series 1) [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC [9] 2012-05-31 02:30:00 UTC [10] 2012-05-31 18:30:00 UTC 2012-05-31 19:30:00 UTC ... 2) 2012-05-31 00:30:00 UTC - [1] [11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC - [2,3] [13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC [15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC [17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC [10] 2012-06-01 00:30:00 UTC 3) [19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC ... so that I can plot data for each of the series separately without e.g.
data at hour 2012-05-31 02:30:00 UTC connecting in the figure to 2012-05-31 00:30:00 UTC. Basically, cycling through the series with period 9. Thanks for any suggestions/help, thanks, Mark
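The "cycling with period 9" above generalizes without hand-building the grouping vector: an index computed from the position splits any vector into consecutive blocks. A small self-contained sketch (toy vector, not the poster's dates):

```r
# Split a vector into consecutive blocks of length 9; the last block
# is shorter if the length is not a multiple of 9.
x <- 1:20
blocks <- split(x, ceiling(seq_along(x) / 9))
lengths <- sapply(blocks, length)  # 9, 9, 2
```

Unlike rep(1:3, each = 9), this needs no prior knowledge of how many blocks there are.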
Re: [R] abline of an lm fit not correct
Easting and northing data use numbers requiring more digits than R's default of 7. In all the years I've used R, the only time I've needed to adjust the default digits is with easting and northing data. Try something like options(digits = 11) HTH On Thu, 13-Dec-2012 at 03:22PM +, Robert U wrote: | Hello fellow | R-users, | | I'm stuck | with something I think is pretty stupid, but I can't find out where I'm wrong, | and it's turning me crazy! | | I am doing | a very simple linear regression with Northing/Easting data, then I plot the | data as well as the regression line: | | plot(x=Dataset$EASTING, | y=Dataset$NORTHING) | fit <- lm(formula = NORTHING ~ EASTING, | data = Dataset) | abline(fit) | fit | | Call: | lm(formula = NORTHING ~ EASTING, data = | Dataset) | | Coefficients: | (Intercept)  EASTING | 5.376e+05  4.692e-02 | | Later on, when I use the | command 'abline' with the coefficients provided by summary(fit), the line is | not the same as abline(fit)! | | To summarize, | those two lines are different: | | abline(fit) | | abline(5.376e+05, 4.692e-02) | | The 'b' coefficients | appear equal, but the intercepts are different. | | Where am I missing | something? | | Thanks -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) . Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
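The mismatch in this thread comes from retyping rounded output: print() shows the coefficients to only a few significant digits, and at easting magnitudes (~5.7e5) that rounding moves the intercept visibly. A self-contained sketch with simulated coordinates (all values made up):

```r
# Simulated data on an easting/northing scale (invented numbers).
set.seed(1)
EASTING  <- 574600 + runif(50, 0, 300)
NORTHING <- 537600 + 0.05 * EASTING + rnorm(50, sd = 2)
fit <- lm(NORTHING ~ EASTING)

plot(EASTING, NORTHING)
abline(fit)                          # full-precision coefficients
abline(coef(fit)[1], coef(fit)[2])   # identical line, no retyping
options(digits = 11)                 # print enough digits if you must copy
coef(fit)
```

Passing the fitted object (or coef(fit)) avoids the transcription loss entirely; options(digits = 11) only matters when numbers are copied by hand.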
Re: [R] How do I make a loop to extract a column from multiple lists and then bind them together to make a new matrix?
Try this ... MDD.mean.s10 <- sapply(MC_MDD.noNA$results, function(x) x[[2]][, 7]) Jean On Thu, Dec 13, 2012 at 8:31 AM, Corinne Lapare corinnelap...@gmail.com wrote: Hi! I am new to looping and R in general, and I have spent way too much time on this one problem and am about a hair away from doing it manually for the next two days. So, there is a package that, while calculating the statistic, creates lists (that look like matrices) in the background. Each item (there are 10 items) has one of these matrix-looking lists that I need to extract data from. The list has 5 rows that represent 5 groups, and 8 columns. I need to extract 3 of the columns (Lo Score=[,2], Hi Score=[,3], and Mean=[,7]) for each of the items. I then want to turn the extracted data into 3 matrices (Lo Score, Hi Score, and Mean) where the rows are the 5 groups and the columns are items 1-10. This is how I can create the mean matrix by hand. MDD.mean.s10 is the matrix I want in the end. (Notice the first bracket after $results is the only part that changes, 1-10, to represent the 10 items, and the last bracket is [,7] to represent the mean located in column 7.) m.1a <- MC_MDD.noNA$results[[1]][[2]][,7] m.2b <- MC_MDD.noNA$results[[2]][[2]][,7] m.3c <- MC_MDD.noNA$results[[3]][[2]][,7] m.4d <- MC_MDD.noNA$results[[4]][[2]][,7] m.5e <- MC_MDD.noNA$results[[5]][[2]][,7] m.6f <- MC_MDD.noNA$results[[6]][[2]][,7] m.7g <- MC_MDD.noNA$results[[7]][[2]][,7] m.8h <- MC_MDD.noNA$results[[8]][[2]][,7] m.9i <- MC_MDD.noNA$results[[9]][[2]][,7] m.10j <- MC_MDD.noNA$results[[10]][[2]][,7] MDD.mean.s10 <- cbind(m.1a, m.2b, m.3c, m.4d, m.5e, m.6f, m.7g, m.8h, m.9i, m.10j) MDD.mean.s10 m.1a m.2b m.3c m.4d m.5e m.6f m.7g m.8h m.9i m.10j [1,] 0.8707865 0.7393939 0.7769231 0.7591241 0.853 0.7925926 0.8258065 0.8410596 0.8843931 0.5638298 [2,] 0.8323353 0.7302632 0.5913978 0.5868263 0.6923077 0.6182796 0.6964286 0.6839080 0.7911392 0.3212121 [3,] 0.8726115 0.7159763 0.7117647 0.6163522 0.7987805 0.7105263 0.7613636 0.7674419 0.8034682
0.4011299 [4,] 0.9024390 0.7894737 0.7795276 0.6530612 0.8593750 0.7112676 0.8672566 0.8629032 0.9152542 0.4834437 [5,] 0.986 0.9102564 0.8452381 0.8160920 0.9726027 0.8658537 0.8352941 0.9342105 0.947 0.6454545 But I can't do this by hand every time, as this comes up over and over again in multiple lists. I have figured out how to loop this procedure and name the vector as it goes along: for(i in 1:10){ + assign(paste("m", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7]) + } m1 [1] 0.8707865 0.8323353 0.8726115 0.9024390 0.986 m2 [1] 0.7393939 0.7302632 0.7159763 0.7894737 0.9102564 m3 [1] 0.7769231 0.5913978 0.7117647 0.7795276 0.8452381 m4 [1] 0.7591241 0.5868263 0.6163522 0.6530612 0.8160920 m5 [1] 0.853 0.6923077 0.7987805 0.8593750 0.9726027 m6 [1] 0.7925926 0.6182796 0.7105263 0.7112676 0.8658537 m7 [1] 0.8258065 0.6964286 0.7613636 0.8672566 0.8352941 m8 [1] 0.8410596 0.6839080 0.7674419 0.8629032 0.9342105 m9 [1] 0.8843931 0.7911392 0.8034682 0.9152542 0.947 m10 [1] 0.5638298 0.3212121 0.4011299 0.4834437 0.6454545 Now here's where I get stuck: how do I cbind these vectors without typing them out explicitly? i.e. mean.MDD <- cbind(m1,m2,m3,m4,m5,m6,m7,m8,m9,m10) Everything I have tried keeps overwriting the data instead of building a matrix. Basically, I start with a matrix (5x10) of zeros. Then I wind up with a few values in the beginning, but the rest is still zeros.
Example of terrible code: fo <- matrix(0,5,10) colnames(fo) <- paste('f', 1:10, sep = "") fo f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 [1,] 0 0 0 0 0 0 0 0 0 0 [2,] 0 0 0 0 0 0 0 0 0 0 [3,] 0 0 0 0 0 0 0 0 0 0 [4,] 0 0 0 0 0 0 0 0 0 0 [5,] 0 0 0 0 0 0 0 0 0 0 for(i in 1:10){ + fo <- assign(paste("f", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7]) + } fo [1] 0.5638298 0.3212121 0.4011299 0.4834437 0.6454545 fo <- matrix(0,5,10) colnames(fo) <- paste('f', 1:10, sep = "") fo f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 [1,] 0 0 0 0 0 0 0 0 0 0 [2,] 0 0 0 0 0 0 0 0 0 0 [3,] 0 0 0 0 0 0 0 0 0 0 [4,] 0 0 0 0 0 0 0 0 0 0 [5,] 0 0 0 0 0 0 0 0 0 0 for(i in 1:10){ + fo <- cbind(assign(paste("f", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7])) + } fo [,1] [1,] 0.5638298 [2,] 0.3212121 [3,] 0.4011299 [4,] 0.4834437 [5,] 0.6454545 Thanks for your help in advance!!! (c:
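Jean's sapply() one-liner can be seen in action on a mock version of the poster's structure (the list shape is assumed from the description; all numbers are random stand-ins for MC_MDD.noNA$results):

```r
# Mock structure: a list of 10 results, each holding a 5x8 matrix in
# its second element, like MC_MDD.noNA$results in the post.
set.seed(1)
results <- replicate(10, list(NULL, matrix(runif(40), 5, 8)),
                     simplify = FALSE)

# sapply extracts column 7 from each item and binds the 10 vectors
# column-wise, giving a 5 x 10 matrix: rows = groups, columns = items.
MDD.mean <- sapply(results, function(x) x[[2]][, 7])
dim(MDD.mean)  # 5 10
```

Because each extracted vector has the same length (5), sapply simplifies the result to a matrix automatically, which is exactly the cbind the loop was trying to build.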
Re: [R] subsetting time series
That works perfectly, thanks a lot, Mark On Thu, Dec 13, 2012 at 11:34 AM, arun smartpink...@yahoo.com wrote: [quoted solution snipped]
[R] Available Memory
I have a large database on SQL Server 2012 Developer edition, Windows 7 Ultimate edition; some of my tables are as large as 10GB. I am running R 2.15.2 with a 64-bit build. I have been connecting fine to the database and extracting info, but it seems this was the first time I tried to pull a large (1/2 GB) amount of data in one query. The query didn't have anything fancy; it was code that always worked! R dropped the work without providing an error message. I got the hourglass for a couple of seconds, as if R had started communication with the database, but then nothing. I looked at my Windows task manager, and CPU utilization was at zero. I ran the memory.size() function to confirm availability of memory and it read 24 thousand and something (I don't remember the rest); I have 24GB of RAM on my computer. The size of the other R objects in memory was around 2GB. I used RODBC to connect to the database. I understand the number you get when you run memory.size is in MB, so a read of 24,000 means 24GB, which is consistent with the amount of RAM in my machine. Is there anything that I missed? Is there another way to check availability of memory, or allocated memory for an R session? Are there issues with RODBC which might cause a failure of data transfer when the amount of data requested is large?
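One way to sidestep a single huge transfer is to pull the result in chunks via RODBC's sqlGetResults(). This is an untested sketch: the DSN name and table are placeholders, and whether chunking actually helps depends on the driver, not on R memory:

```r
library(RODBC)

ch <- odbcConnect("mydsn")                 # hypothetical DSN
odbcQuery(ch, "SELECT * FROM bigtable")    # send query, fetch nothing yet

chunks <- list()
repeat {
  # Fetch up to 100,000 rows at a time from the pending result set.
  part <- sqlGetResults(ch, max = 100000)
  if (!is.data.frame(part) || nrow(part) == 0) break
  chunks[[length(chunks) + 1]] <- part
  if (nrow(part) < 100000) break           # last, partial chunk
}
big <- do.call(rbind, chunks)
close(ch)
```

Fetching in pieces also localizes a failure: if the driver chokes, you learn after which chunk, instead of getting a silent empty return.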
Re: [R] CPOS from cwhmisc package not found
On Dec 13, 2012, at 10:56 AM, Shirley Lee wrote: Hi: I wonder if anyone can help me with a 'function not found' error for cpos: path.package("cwhmisc", quiet = FALSE) [1] "C:/Users/slee/Documents/R/win-library/2.15/cwhmisc" So I have the package cwhmisc, which contains the cpos function. But I got an error: cpos("ab", "b", 1) Error: could not find function "cpos" Then I tried to install at the R prompt but got this error message: install.packages("cwhmisc", lib="C:/Program Files/R/R-2.15.2/library/") Warning message: package 'cwhmisc' is not available (for R version 2.15.2) So I don't understand why the package is not available for the version of R I am running. Does anyone have any idea? http://cran.r-project.org/web/packages/cwhmisc/index.html It has been withdrawn for some reason. You are advised in the Posting Guide (that no one seems to read) to contact the maintainer (or search the Archives) for such questions. -- David Winsemius, MD Alameda, CA, USA
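A quick sanity check in such cases (a sketch; the result depends on the repositories configured in your session and requires network access) is whether the package appears in the repository index at all for your R version:

```r
# FALSE here means no source/binary is offered for this R version on
# the repositories in getOption("repos") -- installation cannot succeed.
"cwhmisc" %in% rownames(available.packages())
```

If it returns FALSE, the warning from install.packages() is about CRAN, not about your library path.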
Re: [R] [R-sig-hpc] recursion depth limitations
On 13 December 2012 17:52, Simon Urbanek simon.urba...@r-project.org wrote: Because it's far beyond what you can handle without changing a lot of other things. 500k expressions will require at least about 320Mb of stack (!) in the eval() chain alone -- compare that to the 8Mb stack size which is the default in most OSes, so you'll hit the wall way before that limit is reached. Thank you for the explanation. Sorry to be a dummy on this, but why does one need a stack? I thought pointing to itself has no memory cost for a function. Is it about how compilers are designed, or about R being a dynamic language? Some algorithms are only feasibly solvable with recursion, and 500K does not sound like too much; in graph algorithms, for example, dependency trees with many nodes easily reach that number. I don't see how large nodes have anything to do with this since we are talking about expression depth, not about sizes of any kind. Again, in any realistic example you'll hit other limits first anyway. I was thinking about a very big tree with large depth, so each recursion step may correspond to one leaf. Well, not sure what application would have a depth of a million; maybe a genetic algorithm. Cheers, Mehmet
[R] changing character strings with hash marks
Hi R users, I am quite new to R and I don't know how to deal with this (surely) easy issue. I need to replace words in sentences with as many hash marks as the number of characters per each word, as in the following example: Mary plays football #### ##### ######## Any suggestion about the function to be used? Thanks a lot. S.
Re: [R] changing character strings with hash marks
On 13.12.2012 22:30, simona mancini wrote: Hi R users, I am quite new to R and I don't know how to deal with this (surely) easy issue. I need to replace words in sentences with as many hash marks as the number of characters per each word, as in the following example: Mary plays football #### ##### ######## gsub("[[:alpha:]]", "#", "Mary plays football") Uwe Ligges Any suggestion about the function to be used? Thanks a lot. S.
Re: [R] changing character strings with hash marks
Simona: If you intend to work with text, you need to learn about regular expressions. There are many tutorials on this topic on the web. Go search. Then learn about how R handles them via: ?regex ## at the R prompt Then ask your question more clearly, although by this time you'll probably have figured it out yourself: for example, you failed to specify whether punctuation could appear in the sentences or what language (and character set) is used. Finally, an answer (there are others) to the question you posed -- which is probably not going to be sufficient -- is: gsub("[^ ]", "#", "Mary plays football") [1] "#### ##### ########" Cheers, Bert On Thu, Dec 13, 2012 at 1:30 PM, simona mancini mancinisim...@yahoo.it wrote: [original question snipped] -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] [R-sig-hpc] recursion depth limitations
Hello. Inline. On 13-12-2012 21:31, Suzen, Mehmet wrote: On 13 December 2012 17:52, Simon Urbanek simon.urba...@r-project.org wrote: Because it's far beyond what you can handle without changing a lot of other things. 500k expressions will require at least about 320Mb of stack (!) in the eval() chain alone -- compare that to the 8Mb stack size which is the default in most OSes, so you'll hit the wall way before that limit is reached. Thank you for the explanation. Sorry to be a dummy on this, but why does one need a stack? I thought pointing to itself has no memory cost for a function. But it does: each recursive call will load another copy of the function, and another copy of the variables used. In fact, the cost can become quite large since everything is loaded in memory again. Hope this helps, Rui Barradas [rest of quoted message snipped]
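The standard escape from the stack-depth wall discussed above is to replace the call stack with an explicit stack on the heap: an iterative loop then handles arbitrary depth without growing the eval() chain. A sketch with a hypothetical tree represented as nested lists (not code from the thread):

```r
# Count leaves of a nested-list tree iteratively: the vector 'stack'
# lives on the heap, so depth is limited by memory, not by the C stack
# or options(expressions).
count_leaves <- function(tree) {
  stack <- list(tree)
  leaves <- 0L
  while (length(stack) > 0) {
    node <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL      # pop
    if (is.list(node)) {
      stack <- c(stack, node)           # push children
    } else {
      leaves <- leaves + 1L
    }
  }
  leaves
}

count_leaves(list(1, list(2, 3), list(list(4), 5)))  # 5
```

This trades the elegance of self-reference for predictable memory use: each pending node costs one list slot rather than a whole evaluation frame.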
[R] replace parenthetical phrases in a string
R-helpers, I have a vector of character strings in which I would like to replace each parenthetical phrase with a single space, " ". For example, if I start with x, I would like to end up with y. x <- c("My toast=bog(keep=3 no=4) and eggs(er34)omit=32", "dogs have ears, cats have tails (and ears, too!)") y <- c("My toast=bog and eggs omit=32", "dogs have ears, cats have tails ") I'm guessing that this can be done with gsub(), but I have never mastered the mysteries of regular expressions. I would greatly appreciate any pointers. Thanks. Jean P.S. I'm using R version 2.15.2 on Windows 7.
Re: [R] replace parenthetical phrases in a string
My apologies. I sent too soon! I did a bit more digging and found a solution in the R-help archives. y <- gsub(" *\\([^)]*\\) *", " ", x) Jean On Thu, Dec 13, 2012 at 4:53 PM, Adams, Jean jvad...@usgs.gov wrote: [original question snipped]
[R] duplicated.data.frame() and POSIXct with DST shift
Hi, I encountered the behavior that the duplicated method for data.frames gives false positives if there are columns of class POSIXct with a clock shift from DST to standard time. time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60) time [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET" df <- data.frame(time, text="foo") duplicated(df) [1] FALSE TRUE This is because the timezone is lost after calling paste(): do.call(paste, c(df, sep = "\r")) [1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo" I can't really figure out if this behavior is desired or not. If so, a short warning in ?duplicated could be helpful. It is mentioned how duplicated.data.frame() works, but I didn't find a hint to properly handle POSIXct objects. My particular problem was to cast a data.frame like this one with cast() (which calls reshape1(), which calls duplicated()): df2 <- data.frame(time, time1=as.numeric(time), lab=rep(1:3, each=2), value=101:106, text=rep(c("foo", "bar"), each=3)) library(reshape2) Using the column of class POSIXct as a variable in the formula gives: cast(lab*time~text, data=df2, value="value") Aggregation requires fun.aggregate: length used as default lab time bar foo 1 1 2012-10-28 02:00:00 0 2 2 2 2012-10-28 02:00:00 1 1 3 3 2012-10-28 02:00:00 2 0 Converting to numeric, casting and converting back works as expected, although the timezone is not visible, because print.data.frame() calls format.POSIXct() with usetz = FALSE: y <- cast(lab*time1~text, data=df2, value="value") y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1) Can anyone suggest a more elegant solution? Best, Tobias
Re: [R] changing character strings with hash marks
Hi, You could also use: gsub("\\w", "#", "Mary plays football") #[1] "#### ##### ########" #or gsub("[A-Za-z]", "#", "Mary plays football") A.K. - Original Message - From: Uwe Ligges lig...@statistik.tu-dortmund.de To: simona mancini mancinisim...@yahoo.it Cc: r-help@r-project.org Sent: Thursday, December 13, 2012 5:38 PM Subject: Re: [R] changing character strings with hash marks [quoted message snipped]
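The answers in this thread differ only in their character class, which matters once punctuation appears. A short comparison (outputs shown were checked):

```r
s <- "Mary plays football."

# Letters only: spaces and punctuation survive.
gsub("[[:alpha:]]", "#", s)   # "#### ##### ########."

# Every non-space character: the full stop is hashed too.
gsub("[^ ]", "#", s)          # "#### ##### #########"
```

Note also that \\w matches digits and underscore in addition to letters, so on input like "2nd try" it behaves differently from [[:alpha:]].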
[R] Combined Marimekko/heatmap
Hi all, I'm trying to figure out a way to create a data graphic that I haven't ever seen an example of before, but hopefully there's an R package out there for it. The idea is to essentially create a heatmap, but to allow each column and/or row to be a different width, rather than having uniform column widths and row heights. This is sort of like a Marimekko chart in appearance, except that rather than use a single color to represent the category, the color represents a value, and all the y-axis heights in each column line up with each other. That way color represents one variable, while the area of the cell represents another. In my application, my heatmap has discrete categorical data rather than continuous. Rows are countries, columns are appliances, and I want to scale the width and height of each column to be the fraction of global energy consumed by the country and the fraction of energy use consumed by that appliance type. The color coding would then indicate whether or not that appliance is regulated in that country. Any ideas how to make such a chart, or even what it might be called? Neal Humphrey nhumph...@clasponline.org
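Even without a dedicated package, the chart described reduces to colored rectangles at cumulative offsets, which base graphics can draw directly. A minimal sketch with entirely invented fractions and a made-up regulated/unregulated coding:

```r
# Invented data: 3 column widths and 2 row heights, both summing to 1,
# plus a 2x3 matrix coding regulated (1) vs unregulated (2).
w   <- c(0.5, 0.3, 0.2)                        # e.g. country shares
h   <- c(0.6, 0.4)                             # e.g. appliance shares
reg <- matrix(c(1, 2, 2, 1, 1, 2), nrow = 2)   # categorical value
cols <- c("tomato", "steelblue")

x <- c(0, cumsum(w))   # cell boundaries on the x axis
y <- c(0, cumsum(h))   # cell boundaries on the y axis

plot(0:1, 0:1, type = "n", xlab = "", ylab = "", axes = FALSE)
for (i in seq_along(h))
  for (j in seq_along(w))
    rect(x[j], y[i], x[j + 1], y[i + 1], col = cols[reg[i, j]])
```

Because every row uses the same y boundaries, the heights line up across columns, which is the property that distinguishes this from a plain mosaic plot.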
Re: [R] neural net
Hi, Thanks for your reply. I have compared my data with some other data which works and I cannot see the difference... The structure of my data is shown below: str(data) 'data.frame': 19 obs. of 7 variables: $ drug : Factor w/ 19 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ... $ param1 : int 111 347 335 477 863 737 390 209 376 262 ... $ param2 : int 15 13 9 37 24 28 63 93 72 16 ... $ param3 : int 125 280 119 75 180 150 167 200 201 205 ... $ param4 : int 40 55 89 2 10 15 12 48 45 49 ... $ param5 : num 0.5 3 -40 0 5 6 0 45 -60 25 ... $ Class : int 1 2 1 1 2 2 3 3 3 3 ... summary(data) drug param1 param2 param3 param4 param5 Class A : 1 Min. :111.0 Min. : 2.0 Min. : 75.0 Min. :-20.00 Min. :-60.000 Min. :1.000 B : 1 1st Qu.:253.5 1st Qu.:15.0 1st Qu.:132.5 1st Qu.: 12.00 1st Qu.: 0.000 1st Qu.:1.000 C : 1 Median :335.0 Median :28.0 Median :164.0 Median : 40.00 Median : 6.000 Median :2.000 D : 1 Mean :383.0 Mean :33.0 Mean :166.0 Mean : 35.26 Mean : 4.447 Mean :1.895 E : 1 3rd Qu.:433.5 3rd Qu.:42.5 3rd Qu.:200.5 3rd Qu.: 54.00 3rd Qu.: 20.500 3rd Qu.:2.000 F : 1 Max. :863.0 Max. :93.0 Max. :280.0 Max. : 89.00 Max. : 45.000 Max. :3.000 (Other):13 The structure of the example data which worked is shown below: str(infert) 'data.frame': 248 obs. of 8 variables: $ education : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 1 1 1 1 2 2 2 2 2 2 ... $ age : num 26 42 39 34 35 36 23 32 21 28 ... $ parity : num 6 1 6 4 3 4 1 2 1 2 ... $ induced : num 1 1 2 2 1 2 0 0 0 0 ... $ case : num 1 1 1 1 1 1 1 1 1 1 ... $ spontaneous : num 2 0 0 0 1 1 0 0 1 0 ... $ stratum : int 1 2 3 4 5 6 7 8 9 10 ... $ pooled.stratum: num 3 1 4 2 32 36 6 22 5 19 ... summary(infert) education age parity induced case spontaneous stratum pooled.stratum 0-5yrs : 12 Min. :21.00 Min. :1.000 Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. : 1.00 Min. : 1.00 6-11yrs:120 1st Qu.:28.00 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:21.00 1st Qu.:19.00 12+ yrs:116 Median :31.00 Median :2.000 Median :0.0000 Median :0.0000 Median :0.0000
Median :42.00 Median :36.00 Mean :31.50 Mean :2.093 Mean :0.5726 Mean :0.3347 Mean :0.5766 Mean :41.87 Mean :33.58 3rd Qu.:35.25 3rd Qu.:3.000 3rd Qu.:1. 3rd Qu.:1. 3rd Qu.:1. 3rd Qu.:62.25 3rd Qu.:48.25 Max. :44.00 Max. :6.000 Max. :2. Max. :1. Max. :2. Max. :83.00 Max. :63.00 So still not sure how to solve the problem _ From: PIKAL Petr [petr.pi...@precheza.cz] Sent: 13 December 2012 07:16 To: dada; r-help@r-project.org Subject: RE: [R] neural net Hi -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of dada Sent: Thursday, December 13, 2012 12:41 AM To: r-help@r-project.org Subject: [R] neural net Hi I would like to do neural netowrk analysis on my data. It look like this: drug param1 param2 param3 param4 param5 class A 111 15 125 40 0.5 1 B 347 13 280 55 3 2 C 335 9 119 89 -40 1 D 477 37 75 2 0 1 E 863 24 180 10 5 2 F 737 28 150 15 6 2 G 390 63 167 12 0 3 H 209 93 200 48 45 3 I 376 72 201 45 -60 3 J 262 16 205 49 25 3 K 273 39 267 53 11 1 L 192 33 164 19 15 2 M 282 2 213 86 30 1 N 111 11 198 68 -21 1 O 387 20 143 12 16 2 P 674 15 78 -20 -17 2 R 734 54 140 24 7 2 S 272 46 159 57 28 2 T 245 37 90 6 31 2 I have entered the code below: nn - neuralnet( + class~param1+param2+param3+param4+param5+param5, + data=mydata, hidden=2, err.fct=ce, + linear.output=FALSE) However the error appeared: Error in model.frame.default(formula.reverse, data) : object is not a matrix I changed the data frame to matrix: mydata.mat=as.matrix(mydata) This is not very wise. It changes all numeric values to character. From documentation and your data frame there is nothing obviously wrong. However you
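The warning above about as.matrix() is easy to demonstrate: a matrix can hold only one storage type, so converting a data frame that contains a character or factor column coerces every column to character. A small sketch with made-up values:

```r
# A data frame mixing a character column with numeric ones
mydata <- data.frame(drug = c("A", "B"), param1 = c(111L, 347L), class = c(1L, 2L))

m <- as.matrix(mydata)  # everything is coerced to the common type: character
typeof(m)               # "character"
```

This is why as.matrix() on such a frame does not help neuralnet(): the numeric predictors stop being numeric.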
Re: [R] duplicated.data.frame() and POSIXct with DST shift
On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:

Hi, I encountered the behavior that the duplicated method for data frames gives false positives if there are columns of class POSIXct with a clock shift from DST to standard time.

time <- as.POSIXct("2012-10-28 02:00", tz = "Europe/Vienna") + c(0, 60*60)
time
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
df <- data.frame(time, text = "foo")
duplicated(df)
[1] FALSE  TRUE

In this instance this is because the timezone is lost after calling paste():

do.call(paste, c(df, sep = "\r"))
[1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"

I suspect the problem arises when 'paste' coerces to character:

as.character(time)
[1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"

I think that as.character might get missed since the 'paste' operation is done internally.

as.character(time, usetz = TRUE)
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"

-- David.

I can't really figure out if this behavior is desired or not. If so, a short warning in ?duplicated could be helpful. It is mentioned how duplicated.data.frame() works, but I didn't find a hint on how to properly handle POSIXct objects.
There is no duplicated.POSIXct method.

My particular problem was to cast a data.frame like this one with cast() (which calls reshape1(), which calls duplicated()):

df2 <- data.frame(time, time1 = as.numeric(time), lab = rep(1:3, each = 2),
                  value = 101:106, text = rep(c("foo", "bar"), each = 3))
library(reshape2)

Using the column of class POSIXct as a variable in the formula gives:

cast(lab * time ~ text, data = df2, value = "value")
Aggregation requires fun.aggregate: length used as default
  lab                time bar foo
1   1 2012-10-28 02:00:00   0   2
2   2 2012-10-28 02:00:00   1   1
3   3 2012-10-28 02:00:00   2   0

Converting to numeric, casting and converting back works as expected, although the timezone is not visible, because print.data.frame() calls format.POSIXct() with usetz = FALSE:

y <- cast(lab * time1 ~ text, data = df2, value = "value")
y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1)

Can anyone suggest a more elegant solution?

Best, Tobias

David Winsemius, MD
Alameda, CA, USA
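One base-R variant of the numeric round-trip above that avoids the hard-coded "1970-01-01 01:00" offset and keeps the timezone: convert back with origin= and pass tz= explicitly. A sketch, assuming the "Europe/Vienna" zone exists in the local tz database:

```r
time <- as.POSIXct("2012-10-28 02:00", tz = "Europe/Vienna") + c(0, 60*60)

# the two instants differ even though they print alike without a tz suffix
secs <- as.numeric(time)
duplicated(secs)        # FALSE FALSE

# round-trip back to POSIXct, reattaching the timezone explicitly
time2 <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/Vienna")
```

Working on the numeric representation sidesteps the lossy as.character() step entirely, which is why casting on time1 behaves while casting on time does not.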
Re: [R] How to select a subset data to do a barplot in ggplot2
Hi:

The simplest way to do it is to modify the input data frame by taking out the records not having status "live" or "dead" and then redefining the factor in the new data frame to get rid of the removed levels. Calling your input data frame DF rather than data,

DF <- structure(list(FID = c(1L, 1L, 2L, 2L, 2L, 2L, 6L, 6L, 10L, 10L,
10L, 11L, 11L, 11L, 12L, 17L, 17L, 17L), IID = c(4621L, 4628L,
4631L, 4632L, 4633L, 4634L, 4675L, 4679L, 4716L, 4719L, 4721L,
4726L, 4728L, 4730L, 4732L, 4783L, 4783L, 4784L), STATUS = structure(c(2L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 2L, 2L,
2L), .Label = c("dead", "live", "nosperm"), class = "factor")),
.Names = c("FID", "IID", "STATUS"), class = "data.frame",
row.names = c(NA, -18L))

# The right hand side above came from dput(DF), where DF was created by
# DF <- read.table(textConnection("your posted data"), header = TRUE)
# Consider using dput() to represent your data in the future.

# Retain the records with status "live" or "dead" only
DF2 <- DF[DF$STATUS %in% c("live", "dead"), ]
# This does not get rid of the original levels...
levels(DF2$STATUS)
# ...so redefine the factor
DF2$STATUS <- factor(DF2$STATUS)

str(DF2)
'data.frame': 16 obs. of 3 variables:
 $ FID   : int 1 1 2 2 2 2 6 6 10 10 ...
 $ IID   : int 4621 4628 4631 4632 4633 4634 4675 4679 4716 4719 ...
 $ STATUS: Factor w/ 2 levels "dead","live": 2 1 2 2 2 2 2 1 1 2 ...

# now plot:
# (1) FID numeric
ggplot(DF2, aes(x = FID, fill = STATUS)) + geom_bar()
# (2) FID factor
ggplot(DF2, aes(x = factor(FID), fill = STATUS)) + geom_bar()

The second one makes more sense to me, but you may have reasons to prefer the first.
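As an aside, base R's droplevels() (available since R 2.12.0) collapses the subset-then-refactor steps described above into one call; the toy frame below just mimics the STATUS column:

```r
DF <- data.frame(FID = c(1, 1, 11, 11),
                 STATUS = factor(c("live", "dead", "nosperm", "nosperm")))

# subset and drop the now-unused "nosperm" level in one step
DF2 <- droplevels(DF[DF$STATUS %in% c("live", "dead"), ])
levels(DF2$STATUS)   # "dead" "live"
```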
Dennis

On Thu, Dec 13, 2012 at 4:38 AM, Yao He yao.h.1...@gmail.com wrote:

FID  IID  STATUS
 1  4621  live
 1  4628  dead
 2  4631  live
 2  4632  live
 2  4633  live
 2  4634  live
 6  4675  live
 6  4679  dead
10  4716  dead
10  4719  live
10  4721  dead
11  4726  live
11  4728  nosperm
11  4730  nosperm
12  4732  live
17  4783  live
17  4783  live
17  4784  live
[R] How can I read the following complicated table
Hello,

I have a table (in a txt file) which looks like this:

Monday 12 78 89
Tuesday 34 44 67
Wednesday 78 98 2
Thursday 34 55 4

Then the table repeats: Monday, Tuesday, ... followed by several numbers.

My goal is to read the values that follow each day name. My problem is a little more complicated, but I just present a simpler case for ease of illustration. Is there any way to ask R to read several numbers after it sees the word 'Monday' and store them somewhere, and read several numbers after it sees the word 'Tuesday' and store them somewhere else?

Thanks, miao
Re: [R] duplicated.data.frame() and POSIXct with DST shift
On Dec 13, 2012, at 5:01 PM, David Winsemius wrote:

On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:

Hi, I encountered the behavior that the duplicated method for data frames gives false positives if there are columns of class POSIXct with a clock shift from DST to standard time.

time <- as.POSIXct("2012-10-28 02:00", tz = "Europe/Vienna") + c(0, 60*60)
time
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
df <- data.frame(time, text = "foo")
duplicated(df)
[1] FALSE  TRUE

In this instance this is because the timezone is lost after calling paste():

do.call(paste, c(df, sep = "\r"))
[1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"

I suspect the problem arises when 'paste' coerces to character:

as.character(time)
[1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"

I think that as.character might get missed since the 'paste' operation is done internally.

as.character(time, usetz = TRUE)
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"

This would work as intended if you pre-processed the argument to duplicated with:

data.frame(lapply(df, as.character, usetz = TRUE))
                      time text
1 2012-10-28 02:00:00 CEST  foo
2  2012-10-28 02:00:00 CET  foo

duplicated(data.frame(lapply(df, as.character, usetz = TRUE)))
[1] FALSE FALSE

-- David.
David Winsemius
Alameda, CA, USA
Re: [R] How can I read the following complicated table
What have you tried so far that did not work, and what do you want the result of reading the text file to look like? What is "store somewhere"? Why does

myDF <- read.table("myData.txt")

which gives you

myDF
         V1 V2 V3 V4
1    Monday 12 78 89
2   Tuesday 34 44 67
3 Wednesday 78 98  2
4  Thursday 34 55  4

as a starting point, not suffice?

Rgds, Rainer
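Building on that read.table() starting point, split() collects the numbers for every day at once rather than filtering day by day (file contents inlined here via text= so the sketch is self-contained):

```r
myDF <- read.table(text = "Monday 12 78 89
Tuesday 34 44 67
Wednesday 78 98 2
Thursday 34 55 4
Monday 18 75 56", stringsAsFactors = FALSE)

# one sub-data-frame of numbers per day name
byDay <- split(myDF[-1], myDF$V1)
byDay$Monday$V2      # 12 18
```

Each element of byDay holds all rows for one day, which covers the "store somewhere" requirement even when the day names repeat.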
Re: [R] replace parenthetical phrases in a string
Hi,

I guess there are some problems with spaces in this solution.

y
[1] "My toast=bog and eggs omit=32" "dogs have ears"
[3] "cats have tails "

gsub(" *\\([^)]*\\) *", "", x)
#[1] "My toast=bogand eggsomit=32" "dogs have ears"
#[3] "cats have tails"

You could try this:

gsub("(\\(.*\\))+?", " ", x)
#[1] "My toast=bog and eggs omit=32" "dogs have ears"
#[3] "cats have tails"

A.K.

- Original Message -
From: Adams, Jean jvad...@usgs.gov
To: r-help@r-project.org
Sent: Thursday, December 13, 2012 6:03 PM
Subject: Re: [R] replace parenthetical phrases in a string

My apologies. I sent too soon! I did a bit more digging, and found a solution in the R-help archives.

y <- gsub(" *\\([^)]*\\) *", " ", x)

Jean

On Thu, Dec 13, 2012 at 4:53 PM, Adams, Jean jvad...@usgs.gov wrote:

R-helpers,

I have a vector of character strings in which I would like to replace each parenthetical phrase with a single space, " ". For example, if I start with x, I would like to end up with y.

x <- c("My toast=bog(keep=3 no=4) and eggs(er34)omit=32",
       "dogs have ears",
       "cats have tails (and ears, too!)")
y <- c("My toast=bog and eggs omit=32",
       "dogs have ears",
       "cats have tails ")

I'm guessing that this can be done with gsub(), but I have never mastered the mysteries of regular expressions. I would greatly appreciate any pointers. Thanks.

Jean

P.S. I'm using R version 2.15.2 on Windows 7.
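If the only leftover blemish in the archive solution is a stray trailing blank ("cats have tails "), wrapping it in trimws() (base R since 3.2.0) cleans that up:

```r
x <- c("My toast=bog(keep=3 no=4) and eggs(er34)omit=32",
       "dogs have ears",
       "cats have tails (and ears, too!)")

# drop parenthetical phrases, then trim leading/trailing whitespace
y <- trimws(gsub(" *\\([^)]*\\) *", " ", x))
y
# [1] "My toast=bog and eggs omit=32" "dogs have ears"
# [3] "cats have tails"
```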
Re: [R] neural net
Hi,

I tried your dataset. I couldn't reproduce the Error: message. Instead,

mydata <- read.table(text="
drug param1 param2 param3 param4 param5 class
A 111 15 125 40 0.5 1
B 347 13 280 55 3 2
C 335 9 119 89 -40 1
D 477 37 75 2 0 1
E 863 24 180 10 5 2
F 737 28 150 15 6 2
G 390 63 167 12 0 3
H 209 93 200 48 45 3
I 376 72 201 45 -60 3
J 262 16 205 49 25 3
K 273 39 267 53 11 1
L 192 33 164 19 15 2
M 282 2 213 86 30 1
N 111 11 198 68 -21 1
O 387 20 143 12 16 2
P 674 15 78 -20 -17 2
R 734 54 140 24 7 2
S 272 46 159 57 28 2
T 245 37 90 6 31 2
", sep="", header=TRUE, stringsAsFactors=TRUE)

library(neuralnet)
nn <- neuralnet(
  class ~ param1 + param2 + param3 + param4 + param5 + param5,  # param5 is duplicated (typo?)
  data = mydata, hidden = 2, err.fct = "ce",
  linear.output = FALSE)
# Warning message:
# 'err.fct' was automatically set to sum of squared error (sse), because the response is not binary
nn
# Call: neuralnet(formula = class ~ param1 + param2 + param3 + param4 + param5 + param5,
#     data = mydata, hidden = 2, err.fct = "ce", linear.output = FALSE)
#
# 1 repetition was calculated.
#
#          Error Reached Threshold Steps
# 1 12.50980687    0.009804371657    30
plot(nn)

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] neuralnet_1.31 MASS_7.3-16 stringr_0.6 reshape_0.8.4 plyr_1.7.1

loaded via a namespace (and not attached):
[1] tools_2.15.0

A.K.

- Original Message -
From: Katarzyna Nurzynska pa...@nottingham.ac.uk
To: PIKAL Petr petr.pi...@precheza.cz; r-help@r-project.org
Sent: Thursday, December 13, 2012 10:56 AM
Subject: Re: [R] neural net
Re: [R] How can I read the following complicated table
Hi,

If it is a data frame with four columns:

dat1 <- read.table(text = "
Monday 12 78 89
Tuesday 34 44 67
Wednesday 78 98 2
Thursday 34 55 4
Friday 14 25 13
Monday 18 75 56
Tuesday 28 42 65
", header = FALSE, stringsAsFactors = FALSE)

dat1Mon <- dat1[, -1][dat1[, 1] == "Monday", ]   # rows with first column Monday
dat1Tue <- dat1[, -1][dat1[, 1] == "Tuesday", ]  # rows with first column Tuesday
dat1Tue
#   V2 V3 V4
# 2 34 44 67
# 7 28 42 65
# You can repeat that for other days

# If the table is read as plain lines:
vec1 <- readLines(textConnection("Monday 12 78 89
Tuesday 34 44 67
Wednesday 78 98 2
Thursday 34 55 4
Friday 14 25 13
Monday 18 75 56
Tuesday 28 42 65"))

vec1Mon <- unlist(strsplit(gsub("\\D+", " ", vec1[grep("Monday", vec1)]), split = " "))
vec1Mon <- as.numeric(vec1Mon[vec1Mon != ""])
vec1Mon
#[1] 12 78 89 18 75 56
vec1Tue <- unlist(strsplit(gsub("\\D+", " ", vec1[grep("Tuesday", vec1)]), split = " "))
vec1Tue <- as.numeric(vec1Tue[vec1Tue != ""])
vec1Tue
#[1] 34 44 67 28 42 65
#etc.

A.K.

- Original Message -
From: jpm miao miao...@gmail.com
To: r-help r-help@r-project.org
Sent: Thursday, December 13, 2012 9:50 PM
Subject: [R] How can I read the following complicated table