Re: [R] An important question about running MCMC
On Thu, Dec 13, 2012 at 3:22 AM, Chenyi Pan cp...@virginia.edu wrote:

> Dear officer, I have a question concerning running R while doing my research. Can you help me figure it out? I am running an MCMC iteration in R, but it always gets stuck in some loop. This causes big problems for my research. So I want to know whether we can skip the current dataset and move on to the next simulated dataset when the iteration gets stuck? Alternatively, can the MCMC chain skip the current iteration when it gets stuck and automatically start another chain with different starting values? I am looking forward to your reply.

What do you mean by 'stuck in some loop'? Is it so stuck that it isn't generating proposals, because of a bug in your code? Or do you mean it is generating proposals but never accepting them? In that case maybe you need to look at your proposal distribution and make your proposal scheme adaptive. Or is it looping around a set of accepted values? Or what?

Remember, for true convergence of an MCMC you have to loop an infinite number of times - have you tried that?

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
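One way to get the "skip the stuck dataset" behaviour the poster asks about is a per-run time budget. The sketch below is not from the thread: `run_mcmc` and `datasets` are hypothetical stand-ins, and base R's setTimeLimit() is used to turn an over-budget run into a catchable error so the loop can move on.

```r
## Minimal sketch (assumptions: `run_mcmc` and `datasets` are placeholders
## for the real sampler and simulated datasets).
run_mcmc <- function(d) {           # stand-in for the real MCMC run
  if (d$hard) Sys.sleep(10)         # simulate a run that gets "stuck"
  mean(d$x)
}

datasets <- list(list(x = rnorm(10), hard = FALSE),
                 list(x = rnorm(10), hard = TRUE))

results <- lapply(datasets, function(d) {
  tryCatch({
    setTimeLimit(elapsed = 2, transient = TRUE)  # 2-second budget per run
    run_mcmc(d)
  },
  error   = function(e) NA,                      # timeout raises an error
  finally = setTimeLimit(elapsed = Inf))         # clear the limit again
})
```

Runs that finish in time return their value; a stuck run is abandoned after the budget and recorded as NA, so the loop over simulated datasets keeps going.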
[R] remove NA in df results in NA, NA.1 ... rows
Good morning!

I have the following data frame (df):

    X.outer  Y.outer   X.PAD1   Y.PAD1   X.PAD2 Y.PAD2   X.PAD3 Y.PAD3   X.PAD4 Y.PAD4
73 574690.0 179740.0 574690.2 179740.0 574618.3 179650 574729.2 179674 574747.1 179598
74 574680.6 179737.0 574693.4 179740.0 574719.0 179688 574831.8 179699 574724.9 179673
75 574671.0 179734.0 574696.2 179740.0 574719.0 179688 574807.8 179787 574729.2 179674
76 574663.6 179736.0 574699.1 179734.0 574723.5 179678 574703.4 179760 574831.8 179699
77 574649.9 179734.0 574704.7 179724.0 574724.9 179673 574702.4 179755 574852.3 179626
78 574647.3 179742.0 574706.9 179719.0 574747.1 179598 574702.0 179754 574747.1 179598
79 574633.6 179739.0 574711.4 179710.0 574641.8 179570 574698.0 179747       NA     NA
80 574634.9 179732.0 574716.6 179698.0 574639.6 179573 574700.2 179738       NA     NA
81 574616.5 179728.6 574716.7 179695.0 574618.3 179650 574704.4 179729       NA     NA
82 574615.4 179731.0 574718.2 179690.0       NA     NA 574708.1 179724       NA     NA
83 574614.4 179733.6 574719.1 179688.0       NA     NA 574709.3 179720       NA     NA
...
44 574702.0 179754.0       NA       NA       NA     NA       NA     NA       NA     NA
45 574695.1 179751.0       NA       NA       NA     NA       NA     NA       NA     NA
46 574694.4 179752.0       NA       NA       NA     NA       NA     NA       NA     NA

which I subset with

df2 <- df[, c("X.PAD2", "Y.PAD2")]

df2
     X.PAD2 Y.PAD2
73 574618.3 179650
74 574719.0 179688
75 574719.0 179688
76 574723.5 179678
77 574724.9 179673
78 574747.1 179598
79 574641.8 179570
80 574639.6 179573
81 574618.3 179650
82       NA     NA
83       NA     NA
...
44       NA     NA
45       NA     NA
46       NA     NA

followed by removing the NA's using

df2 <- df2[!is.na(df2), ]

If I now call df2, I get:

       X.PAD2 Y.PAD2
73   574618.3 179650
74   574719.0 179688
75   574719.0 179688
76   574723.5 179678
77   574724.9 179673
78   574747.1 179598
79   574641.8 179570
80   574639.6 179573
81   574618.3 179650
NA         NA     NA
NA.1       NA     NA
NA.2       NA     NA
NA.3       NA     NA
NA.4       NA     NA
NA.5       NA     NA
NA.6       NA     NA
NA.7       NA     NA
NA.8       NA     NA

It seems there are still NA's in my data frame. How can I get rid of them? And what is the meaning of the rows numbered NA, NA.1 and so on? Thanks for any hints.
Best regards
Raphael Felber
[R] how to aggregate the data ?
HI, now I have this dataset:

  Product Price_LC.1 Price_LC.2 Price_elasticity.1 Price_elasticity.2 Mean_Price Mean_Price_elasticity Trade_Price_Band Country
1     100   357580.1   527483.6         -4.1498383         -2.8459564   473934.0            -3.6935476         0-542811      VN
5    1208   436931.9   536143.9         -3.9432305         -3.4570170   469330.2            -3.6595372         0-542811      VN
6    1280   419666.6   520936.3         -1.7357983         -0.7689443   461367.0            -1.2848528         0-542811      VN
2     101   629371.0   735167.2         -5.2289933         -3.0364372   676059.9            -3.8059064    542812-904779      VN
7    1616   576816.1   663369.6         -4.5528840         -3.9523261   614864.5            -4.3181914    542812-904779      VN
8    1661   583587.9   689853.0         -5.0948101         -4.3427497   650680.0            -4.8109781    542812-904779      VN

I want to get the following dataset:

Product          VN Price_band
100      -3.6935476   0-542811
         [357580.1, 527483.6]   [-2.8459564, 473934.0]
1208     -3.6595372   0-542811
         [436931.9, 536143.9]   [-3.9432305, -3.4570170]

How do I get this in R?? Thanks.

Kind regards,
Lingyi
Re: [R] remove NA in df results in NA, NA.1 ... rows
Hi Raphael,

see below.

> I have the following data frame (df):
> [...]
> followed by removing the NA's using
> df2 <- df2[!is.na(df2), ]
> [...]

is.na(df2) produces a logical matrix (!), and you are then indexing the rows of your data frame with a matrix, which is converted into a vector of its elements, producing far too many logical indices for your task (so to say). I assume you should be using na.omit(df2) instead.

Hth -- Gerrit

-
Dr. Gerrit Eichner                  Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de  Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104            Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109            http://www.uni-giessen.de/cms/eichner
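Gerrit's point is easy to see on a tiny example. The 3-row data frame below is made up for illustration: is.na() on a data frame returns a logical matrix, so using it as a row index supplies more logical values than there are rows, which is exactly what produces the junk NA, NA.1, ... rows.

```r
## Hypothetical 3-row data frame with some NAs.
df2 <- data.frame(x = c(1, 2, NA), y = c(10, NA, 30))

is.na(df2)          # a 3x2 logical MATRIX, not a vector of row indices
df2[!is.na(df2), ]  # 6 logical indices against 3 rows -> phantom NA rows

## Row-wise alternatives that do what was intended:
na.omit(df2)                     # drop every row containing any NA
df2[complete.cases(df2), ]       # equivalent: keep only complete rows
df2[rowSums(is.na(df2)) == 0, ]  # the same idea spelled out by hand
```

All three row-wise forms keep only row 1 here, since rows 2 and 3 each contain an NA.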
Re: [R] remove NA in df results in NA, NA.1 ... rows
df2 <- df2[!is.na(df2), ] isn't doing what you want it to do, because df2 is a data.frame and not a vector. To solve your problem, review
http://stackoverflow.com/questions/4862178/r-remove-rows-with-nas-in-data-frame

On Thu, Dec 13, 2012 at 3:20 AM, raphael.fel...@art.admin.ch wrote:

> Good morning! I have the following data frame (df):
> [...]
> which I subset to df2 <- df[, c("X.PAD2", "Y.PAD2")], followed by removing the NA's using df2 <- df2[!is.na(df2), ].
> It seems there are still NA's in my data frame. How can I get rid of them? What is the meaning of the rows numbered NA, NA.1 and so on? Thanks for any hints.
>
> Best regards, Raphael Felber
[R] How to aggregate the dataset?
HI,

I want to transform the following dataset

Product Price_LC.1 Price_LC.2 Price_elasticity.1 Price_elasticity.2 Mean_Price Mean_Price_elasticity Trade_Price_Band Country
100             35         52              -4.14              -2.84         47                 -3.69         0-542811      VN
1208            43         53              -3.94              -3.45         47                 -3.65         0-542811      VN

into:

Product     VN Price_Band
100      -3.69   0-542811
         [35,52]   [43,53]
1208     -3.65   0-542811
         [43,53]   [-3.94,-3.45]

How do I get it in R? I have a large dataset like this, and I need to create a mechanism to transform it.

Thanks.

Kind regards,
Lingyi
Re: [R] remove NA in df results in NA, NA.1 ... rows
is.na(df2) is not doing what you think it is doing. Perhaps you should read ?na.omit.

---
Jeff Newmiller
Sent from my phone. Please excuse my brevity.

raphael.fel...@art.admin.ch wrote:

> Good morning! I have the following data frame (df):
> [...]
> followed by removing the NA's using df2 <- df2[!is.na(df2), ].
> It seems there are still NA's in my data frame. How can I get rid of them?
> What is the meaning of the rows numbered NA, NA.1 and so on? Thanks for any hints.
Re: [R] create a color palette with custom ranges between colors
Thank you Nicole! I did it with the color.palette function in the link you gave me. I then added a sequence in my levelplot call via the at argument:

at = seq(-40, 40, 1)

and it works quite well. Thanks again, Nicole. Thanks to you too, Pascal, and long live the CRC as well as the great C. C.! ;)
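For the archive, here is a minimal sketch of the pattern described above: pairing a custom palette with explicit breaks in lattice::levelplot(). The data matrix `z` and the anchor colors are assumptions, not the poster's actual setup.

```r
## Sketch: custom color ramp + explicit breaks in levelplot().
library(lattice)

z <- matrix(seq(-40, 40, length.out = 100), nrow = 10)  # toy data

## A ramp between anchor colors; uneven anchors give custom ranges.
pal <- colorRampPalette(c("blue", "white", "red"))

brks <- seq(-40, 40, 1)   # one break per unit, as in the post
levelplot(z, at = brks, col.regions = pal(length(brks) - 1))
```

With n breaks there are n-1 intervals, hence pal(length(brks) - 1) colors.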
Re: [R] possible bug in function 'mclapply' of package parallel
Mailing list ate the attachment. Can you send it plain text (if short) or post it somewhere online?

Michael

On Dec 13, 2012, at 1:54 AM, Asis Hallab asis.hal...@gmail.com wrote:

> Dear parallel users and developers,
>
> I might have encountered a bug in the function 'mclapply' of package 'parallel'. I construct a matrix using the same input data and code with a single difference: once I use mclapply and the other time lapply. Shockingly, the result is NOT the same.
>
> To evaluate, please unpack the attached archive and execute: Rscript mclapply_test.R
>
> I put the two simple functions I wrote inside the R script, along with the serialized input matrix. My function is executed once using mclapply and once using lapply internally - there's an argument lapply.funk one can set to mclapply. The results are checked for identity, with a striking FALSE.
>
> Any hints on my misuse and/or misunderstanding of mclapply, or verification of a true bug, will be much appreciated.
>
> Kind regards!
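Since the poster's script was in the lost attachment, this is only a guess, but a frequent cause of lapply/mclapply disagreements is random number generation in the forked workers. Deterministic functions should give identical results; anything using RNG needs the "L'Ecuyer-CMRG" generator plus mc.set.seed to get well-defined parallel streams (which will still differ from a serial lapply run).

```r
## Sketch (Unix-alikes only: mclapply() forks; on Windows use mc.cores = 1).
library(parallel)

f <- function(i) i^2   # deterministic: serial and parallel should agree
identical(lapply(1:10, f), mclapply(1:10, f, mc.cores = 2L))

## With RNG, use reproducible parallel streams:
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
r1 <- mclapply(1:4, function(i) rnorm(1), mc.set.seed = TRUE, mc.cores = 2L)
set.seed(1)
r2 <- mclapply(1:4, function(i) rnorm(1), mc.set.seed = TRUE, mc.cores = 2L)
identical(r1, r2)   # reproducible across runs with the same seed
```

If the poster's functions are deterministic, a genuine mismatch would indeed be worth reporting; if they draw random numbers, a FALSE from identical() is expected.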
[R] how to aggregate the dataset
HI,

Sorry for messing up.. I want to transform the following dataset:

product min_price max_price mean_price country price_band
11             34        50         40      VN      0-300
22             10        30         15      VN      0-300

into:

product  VN price_band
11       40      0-300   [34,50]
22       15      0-300   [10,30]

How can I do this in R? I have a large dataset like this; I want to transform all of it into that form. Thanks a lot.

Kind regards,
Tammy
Re: [R] how to aggregate the dataset
Really sorry for messing up. I want to transform:

product min_price max_price mean_price country price_band
11             34        50         40      VN      0-300
22             10        30         15      VN      0-300

into:

product  VN price_band
11       40      0-300   [34,50]
22       15      0-300   [10,30]

How can I do this in R?

Kind regards,
Tammy

From: metal_lical...@live.com
Subject: RE: [R] how to aggregate the dataset
Date: Thu, 13 Dec 2012 14:22:54 +0300

> [... earlier messages in this thread quoted; snipped ...]
Re: [R] How do you use agrep inside a loop
Thank you all, it worked after I checked the length of agrep's result :)

On Tue, Dec 11, 2012 at 6:11 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

> Hello, inline.
>
> Em 11-12-2012 12:04, surekha nagabhushan escreveu:
>
>> Rui, I have initialized it... doesn't seem to help...
>>
>> result_vector <- vector()
>
> No! This must be just before the loop in 'j'.
>
>> result <- vector("list", (length(test1)-1))
>> for(i in 1:(length(test1)-1)) {
>>   for(j in (i+1):length(test1)) {
>>     result_vector[j-i] <- agrep(test1[i], test1[j], ignore.case = TRUE,
>>                                 value = TRUE, max.distance = 0.1)
>>   }
>>   result[[i]] <- result_vector
>> }
>>
>> Whenever agrep does not find a match it returns character(0), length zero;
>> do you suppose it has anything to do with that?
>
> Yes, without testing for length zero it throws an error,
> "replacement has length zero".
>
> Hope this helps,
> Rui Barradas
>
> Earlier, on Tue, Dec 11, 2012 at 5:13 PM, Rui Barradas wrote:
>
>> Hello,
>> See if this is it. You must reinitialize 'result_vector' just before the
>> loop that constructs it.
>>
>> test1 <- c("Vashi", "Vashi,navi Mumbai", "Thane", "Vashi,new Mumbai",
>>            "Thana", "Surekha", "Thane(w)", "surekhaN")
>> result <- vector("list", (length(test1)-1))
>> for(i in 1:(length(test1)-1)){
>>   result_vector <- vector()
>>   for(j in (i+1):length(test1)){
>>     tmp <- agrep(test1[i], test1[j], ignore.case = TRUE,
>>                  value = TRUE, max.distance = 0.1)
>>     if(length(tmp) > 0) result_vector[j-i] <- tmp
>>   }
>>   result[[i]] <- result_vector
>> }
>> result
>>
>> Hope this helps,
>> Rui Barradas
>>
>> Em 11-12-2012 11:23, surekha nagabhushan escreveu:
>>
>>> Pascal, [same loop code as above, with result_vector initialized only
>>> once, before the loop in 'i']
>>> I'm not sure what the problem is with the dimension/length of result,
>>> which is a list. But I just use the second line:
>>> result <- vector("list", (length(test1)-1)). What am I missing?
>>> Thank you Rui Barradas.
On Tue, Dec 11, 2012 at 4:25 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

> Hello,
> And another error in line 2. It should be
>
> for(j in (i+1):length(test1))
>
> Hope this helps,
> Rui Barradas
>
> Em 11-12-2012 07:54, Pascal Oettli escreveu:
>
>> Hi,
>> There is a mistake in the first line. It should be:
>>
>> for(i in 1:(length(test1)-1))
>>
>> Regards,
>> Pascal
>>
>> Le 11/12/2012 16:01, surekha nagabhushan a écrit :
>>
>>> Hi all. This is my first message at R-help... so I'm hoping I have some
>>> beginner's luck and get some good help for my problem! FYI I have just
>>> started using R recently, so my knowledge of R is pretty preliminary.
>>>
>>> Okay, here is what I need help with - I need to know how to use agrep in
>>> a for loop. I need to compare elements of a vector of names with other
>>> elements of the same vector. However, if I use something like this:
>>>
>>> for(i in 1:length(test1)-1) {
>>>   for(j in i+1:length(test1)) {
>>>     result[[i]][j] <- agrep(test1[i], test1[j], ignore.case = TRUE,
>>>                             value = TRUE, max.distance = 0.1)
>>>   }
>>> }
>>>
>>> I get an error message saying "invalid 'pattern' argument":
>>>
>>> Error in agrep(test1[i], test1[j], ignore.case = TRUE, value = TRUE,
>>>   max.distance = 0.1) : invalid 'pattern' argument
>>>
>>> test1 being:
>>>
>>> test1 <- c("Vashi", "Vashi,navi Mumbai", "Thane", "Vashi,new Mumbai",
>>>            "Thana", "Surekha", "Thane(w)", "surekhaN")
>>>
>>> This is the first time I'm using agrep, and I do not fully understand how
>>> it works... Kindly help... Thank you.
>>>
>>> Su.
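Putting the whole thread together, the working pattern is: parenthesize the loop ranges, reset the per-row result vector inside the outer loop, and guard the assignment with a length check, because agrep() returns character(0) when there is no match and assigning a zero-length value into a vector element is an error.

```r
## The corrected loop from this thread, assembled into one runnable piece.
test1 <- c("Vashi", "Vashi,navi Mumbai", "Thane", "Vashi,new Mumbai",
           "Thana", "Surekha", "Thane(w)", "surekhaN")

result <- vector("list", length(test1) - 1)
for (i in 1:(length(test1) - 1)) {
  result_vector <- character(0)          # reset for each i, before the j loop
  for (j in (i + 1):length(test1)) {
    tmp <- agrep(test1[i], test1[j], ignore.case = TRUE,
                 value = TRUE, max.distance = 0.1)
    if (length(tmp) > 0)                 # skip character(0) non-matches
      result_vector[j - i] <- tmp
  }
  result[[i]] <- result_vector
}
result   # result[[i]] holds the fuzzy matches of test1[i] against later names
```

Note the classic precedence traps fixed here: `1:length(test1)-1` means `(1:length(test1)) - 1` (it starts at 0), and `i+1:length(test1)` means `i + (1:length(test1))` (it overruns the vector), which is what produced the "invalid 'pattern' argument" error.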
Re: [R] how to aggregate the dataset
Hello,

Maybe something like this?

range <- with(dat, paste0("[", min_price, ",", max_price, "]"))
dat2 <- with(dat, data.frame(product = product, VN = mean_price,
                             range = range, price_band = price_band))

Unless it's a printing problem and you really want the range below VN.

Hope this helps,
Rui Barradas

Em 13-12-2012 11:24, Tammy Ma escreveu:

> Really sorry for messing up. I want to transform:
> [...]
> How can I do this in R?
>
> Kind regards,
> Tammy
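Rui's suggestion, made runnable with made-up data matching the numbers in the post (column names taken from the thread):

```r
## Toy data reproducing Tammy's example.
dat <- data.frame(product    = c(11, 22),
                  min_price  = c(34, 10),
                  max_price  = c(50, 30),
                  mean_price = c(40, 15),
                  country    = c("VN", "VN"),
                  price_band = c("0-300", "0-300"))

## Build "[min,max]" strings, then assemble the reshaped frame.
range <- with(dat, paste0("[", min_price, ",", max_price, "]"))
dat2  <- with(dat, data.frame(product = product, VN = mean_price,
                              range = range, price_band = price_band))
dat2
## two rows: product 11 / VN 40 / [34,50] / 0-300
##           product 22 / VN 15 / [10,30] / 0-300
```

For many countries at once, the same paste0() trick can be combined with reshape() or split() per country, but the one-country case above is the core of it.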
Re: [R] Running MCMC in R
Why don't you use one of the existing MCMC packages? There are many to choose from...

On Wed, Dec 12, 2012 at 10:49 PM, Chenyi Pan cp...@virginia.edu wrote:

> Dear all,
> I am running an MCMC iteration in R, but it always gets stuck in some loop. This causes big problems for my research. So I want to know whether we can skip the current dataset and move on to the next simulated dataset when the iteration gets stuck? Alternatively, can the MCMC chain skip the current iteration when it gets stuck and automatically start another chain with different starting values? I am looking forward to your reply.
>
> Best,
> Chenyi
>
> --
> Chenyi Pan
> Department of Statistics
> Graduate School of Arts and Sciences, University of Virginia
> Tel: 434-466-9209
[R] max_prepared_stmt_count exceeded using RODBC + 64-bit win7
Hi,

I am running R 2.15.2 64-bit on Windows 7, using RODBC 1.3-6, MySQL 5.5.20 and MySQL Connector 5.5.2 - these are the latest 64-bit versions AFAIK.

sqlQuery and sqlSave work fine as expected, but in a long session with a few sqlSave() calls I get an error, for example:

Error in sqlSave(channel = channel, dat = USArrests[, 1, drop = FALSE], :
  HY000 1461 [MySQL][ODBC 5.2(w) Driver][mysqld-5.5.20]Can't create more than
  max_prepared_stmt_count statements (current value: 16384)
  [RODBC] ERROR: Could not SQLPrepare 'INSERT INTO `usarrests` ( `murder` ) VALUES ( ? )'

In my setup the MySQL global variable max_prepared_stmt_count has the default setting of 16K. If I set the variable higher, I can run a while longer, but this is not a permanent solution. Digging around for a solution, I see that the following may cast some light:

show global status like 'com_stmt%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Com_stmt_close          | 0     |
| Com_stmt_execute        | 49931 |
| Com_stmt_fetch          | 0     |
| Com_stmt_prepare        | 36    |
| Com_stmt_reprepare      | 0     |
| Com_stmt_reset          | 36    |
| Com_stmt_send_long_data | 0     |
+-------------------------+-------+

If I understand right, the number of Com_stmt_close should be 'close to or equal to' Com_stmt_execute, but it is not. Rolling back to all-32-bit R 2.13.2 etc. does work, with all entries in the table above remaining at zero. The number Com_stmt_execute increases with each row written using sqlSave(), but does not increase if I use sqlQuery():

# This causes Com_stmt_execute to increase by 50:
sqlQuery(channel = channel, query = "DROP TABLE IF EXISTS USArrests")
sqlSave(channel = channel, dat = USArrests[, 1, drop = FALSE], rownames = FALSE)

# This causes no change in Com_stmt_execute:
sqlQuery(channel = channel, query = "INSERT INTO USArrests (murder) values (1)")

This behaviour did not occur with R 2.13.2 / RODBC 1.3-3 32-bit. I could just revert one thing at a time to narrow it down, but if anyone can offer a shortcut I'd be delighted.
Thanks
Giles Heywood
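One hedged workaround worth trying (an assumption consistent with the symptoms, not a confirmed fix from the thread): sqlSave()'s default fast = TRUE sends a parametrized INSERT, i.e. a server-side prepared statement per call, and if the driver never closes these they accumulate against max_prepared_stmt_count. With fast = FALSE, RODBC builds literal INSERT statements instead, avoiding server-side prepares entirely (at some speed cost).

```r
## Sketch; "mysql_dsn" is a hypothetical ODBC data source name.
library(RODBC)

channel <- odbcConnect("mysql_dsn")

## fast = FALSE writes rows as plain literal INSERTs, so no prepared
## statements are created server-side and Com_stmt_execute stays flat.
sqlSave(channel, USArrests[, 1, drop = FALSE],
        rownames = FALSE, fast = FALSE)

odbcClose(channel)
```

If fast = FALSE makes the error disappear, that would point at the driver (or RODBC's use of it) leaking prepared statement handles under the 64-bit stack.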
[R] Installing Packages from a Local Repository
Hi everyone,

I've followed the instructions from R-Admin Section 6.6 for creating a local repository. I've modified my Rprofile.site file to add the local repository to my repos, but I haven't been able to successfully install my package from the repo. Here's the code that I've run:

##
sessionInfo()
getOption("repos")
setwd("Q:/Integrated Planning/R")
list.files(path = ".", recursive = TRUE)
tools::write_PACKAGES("bin/windows/contrib/2.15", type = "win.binary")
list.files(path = ".", recursive = TRUE)
install.packages("RTIO")
install.packages("RTIO", repos = "Q:/Integrated Planning/R")
install.packages("RTIO", repos = "Q:/Integrated Planning/R", type = "win.binary")
unlink(c("bin/windows/contrib/2.15/PACKAGES", "bin/windows/contrib/2.15/PACKAGES.gz"))
##

And here it is with output included:

###
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C LC_TIME=English_Australia.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.15.1

> getOption("repos")
                            CRAN                           CRANextra
"http://cran.ms.unimelb.edu.au/" "http://www.stats.ox.ac.uk/pub/RWin"
                         MyLocal
"file://Q:/Integrated Planning/R"

> setwd("Q:/Integrated Planning/R")
> list.files(path = ".", recursive = TRUE)
[1] "bin/windows/contrib/2.15/RTIO_0.1-2.zip"
> tools::write_PACKAGES("bin/windows/contrib/2.15", type = "win.binary")
> list.files(path = ".", recursive = TRUE)
[1] "bin/windows/contrib/2.15/PACKAGES"
    "bin/windows/contrib/2.15/PACKAGES.gz"
    "bin/windows/contrib/2.15/RTIO_0.1-2.zip"
> install.packages("RTIO")
Installing package(s) into C:/Program Files/R/R-2.15.1/library (as lib is unspecified)
Warning in install.packages :
  cannot open compressed file '//Q:/Integrated Planning/R/bin/windows/contrib/2.15/PACKAGES',
  probable reason 'No such file or directory'
Error in install.packages : cannot open the connection
> install.packages("RTIO", repos = "Q:/Integrated Planning/R")
Installing package(s) into C:/Program Files/R/R-2.15.1/library (as lib is unspecified)
Warning in install.packages :
  unable to access index for repository Q:/Integrated Planning/R/bin/windows/contrib/2.15
Warning in install.packages :
  package RTIO is not available (for R version 2.15.1)
> install.packages("RTIO", repos = "Q:/Integrated Planning/R", type = "win.binary")
Installing package(s) into C:/Program Files/R/R-2.15.1/library (as lib is unspecified)
Warning in install.packages :
  unable to access index for repository Q:/Integrated Planning/R/bin/windows/contrib/2.15
Warning in install.packages :
  package RTIO is not available (for R version 2.15.1)
> unlink(c("bin/windows/contrib/2.15/PACKAGES", "bin/windows/contrib/2.15/PACKAGES.gz"))
###

I'd really like to be able to use install.packages("RTIO") without having to specify the repo, as this will make it easy for our other less experienced R users. Any ideas why I get the warning "cannot open compressed file" and the error "cannot open the connection"? As far as I can tell, I've followed the R-Admin 6.6 instructions exactly. If it matters, Q: is a mapped network drive.
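One plausible explanation, judging by the '//Q:/...' path in the warning: a file URI for a Windows drive letter needs three slashes. With only two ("file://Q:/..."), "Q:" is parsed as the host part and the path becomes //Q:/..., which does not exist. A hedged sketch of a corrected Rprofile.site entry (repository name and path assumed from the post):

```r
## In Rprofile.site: register the local repo with a file:/// URI.
local({
  r <- getOption("repos")
  r["MyLocal"] <- "file:///Q:/Integrated Planning/R"  # note the THREE slashes
  options(repos = r)
})

## Afterwards, a plain call should find the local repository too:
## install.packages("RTIO")
```

If Q: being a mapped network drive still causes trouble, the UNC form of the same URI ("file://server/share/...") is the usual fallback.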
Re: [R] neural net
Thanks for your reply. I have compared my data with some other data which works and I cannot see the difference... The structure of my data is shown below:

str(data)
'data.frame': 19 obs. of 7 variables:
 $ drug  : Factor w/ 19 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ param1: int 111 347 335 477 863 737 390 209 376 262 ...
 $ param2: int 15 13 9 37 24 28 63 93 72 16 ...
 $ param3: int 125 280 119 75 180 150 167 200 201 205 ...
 $ param4: int 40 55 89 2 10 15 12 48 45 49 ...
 $ param5: num 0.5 3 -40 0 5 6 0 45 -60 25 ...
 $ Class : int 1 2 1 1 2 2 3 3 3 3 ...

summary(data)
 drug        param1          param2        param3          param4           param5             Class
 A : 1   Min.   :111.0   Min.   : 2.0   Min.   : 75.0   Min.   :-20.00   Min.   :-60.000   Min.   :1.000
 B : 1   1st Qu.:253.5   1st Qu.:15.0   1st Qu.:132.5   1st Qu.: 12.00   1st Qu.:  0.000   1st Qu.:1.000
 C : 1   Median :335.0   Median :28.0   Median :164.0   Median : 40.00   Median :  6.000   Median :2.000
 D : 1   Mean   :383.0   Mean   :33.0   Mean   :166.0   Mean   : 35.26   Mean   :  4.447   Mean   :1.895
 E : 1   3rd Qu.:433.5   3rd Qu.:42.5   3rd Qu.:200.5   3rd Qu.: 54.00   3rd Qu.: 20.500   3rd Qu.:2.000
 F : 1   Max.   :863.0   Max.   :93.0   Max.   :280.0   Max.   : 89.00   Max.   : 45.000   Max.   :3.000
 (Other):13

The structure of the example data which worked is shown below:

str(infert)
'data.frame': 248 obs. of 8 variables:
 $ education     : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 1 1 1 1 2 2 2 2 2 2 ...
 $ age           : num 26 42 39 34 35 36 23 32 21 28 ...
 $ parity        : num 6 1 6 4 3 4 1 2 1 2 ...
 $ induced       : num 1 1 2 2 1 2 0 0 0 0 ...
 $ case          : num 1 1 1 1 1 1 1 1 1 1 ...
 $ spontaneous   : num 2 0 0 0 1 1 0 0 1 0 ...
 $ stratum       : int 1 2 3 4 5 6 7 8 9 10 ...
 $ pooled.stratum: num 3 1 4 2 32 36 6 22 5 19 ...

summary(infert)
 education        age            parity         induced          case         spontaneous        stratum      pooled.stratum
 0-5yrs : 12   Min.   :21.00   Min.   :1.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   : 1.00   Min.   : 1.00
 6-11yrs:120   1st Qu.:28.00   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:21.00   1st Qu.:19.00
 12+ yrs:116   Median :31.00   Median :2.000   Median :0.0000   Median :0.0000   Median :0.0000   Median :42.00   Median :36.00
               Mean   :31.50   Mean   :2.093   Mean   :0.5726   Mean   :0.3347   Mean   :0.5766   Mean   :41.87   Mean   :33.58
               3rd Qu.:35.25   3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:62.25   3rd Qu.:48.25
               Max.   :44.00   Max.   :6.000   Max.   :2.0000   Max.   :1.0000   Max.   :2.0000   Max.   :83.00   Max.   :63.00

So I am still not sure how to solve the problem. -- View this message in context: http://r.789695.n4.nabble.com/neural-net-tp4652927p4652984.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
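One visible difference between the two data sets is that drug is a factor with 19 levels for 19 observations, i.e. a per-row identifier rather than a usable predictor, and Class is an integer rather than a factor. A minimal sketch of preparing such data for a neural-net fit (the data frame and values below are illustrative stand-ins, not the poster's actual data):

```r
# Sketch only: 'data' mimics the poster's structure. A factor that is
# unique per row (like 'drug') carries no information for a neural net
# and is best dropped; genuine categorical variables can be expanded to
# 0/1 dummy columns with model.matrix().
data <- data.frame(
  drug   = factor(LETTERS[1:19]),   # one level per row: an ID, not a predictor
  param1 = c(111, 347, 335, 477, 863, 737, 390, 209, 376, 262,
             254, 300, 420, 515, 199, 288, 350, 610, 450),
  Class  = factor(rep(1:3, length.out = 19))
)

# Drop the ID column and build a numeric design matrix for the inputs:
X <- model.matrix(~ param1, data = data)[, -1, drop = FALSE]

# One-hot encode the response (one indicator column per class):
Y <- model.matrix(~ Class - 1, data = data)

dim(X)  # 19 rows, 1 predictor column
dim(Y)  # 19 rows, 3 class-indicator columns
```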
[R] How to create multiple country's data into multiple sheets of one excel
Hi, I have a large dataset covering many countries. I have written a program that loops over each country and generates one output per country. I want to arrange the output so that each country's output goes on its own sheet. How do I achieve this in R? I have tried this:

library(xlsx)
write.xlsx(nnn, "vn.xlsx", sheetName="Sheet1")   # [1]

but when I change sheetName to "Sheet2" to add another country as a second sheet, it automatically deletes what I wrote in [1].

index <- unique(dataset$country)
for (i in 1:length(index)){
  data <- dataset[dataset$country==index[i],]
  (...)
  output <- dd
  # then how do I write each country's output to one sheet of one Excel file?
}

Kind regards, Tammy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove NA in df results in NA, NA.1 ... rows
You can use complete.cases:

df <- df[complete.cases(df), ]

On Thu, Dec 13, 2012 at 3:20 AM, raphael.fel...@art.admin.ch wrote: Good morning! I have the following data frame (df) [...] followed by removing the NA's using df2 <- df2[!is.na(df2),] [...] It seems there are still NA's in my data frame. How can I get rid of them? What is the meaning of the rows numbered NA, NA.1 and so on? Thanks for any hints. Best regards Raphael Felber -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
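The complete.cases() suggestion works because !is.na(df2) is a logical *matrix*, not a per-row indicator: used as a row index it has nrow*ncol entries, so it selects positions past the end of the data frame, and those come back as all-NA rows named NA, NA.1, and so on. A small self-contained sketch of the effect:

```r
# Why df2[!is.na(df2), ] leaves NA rows behind.
df2 <- data.frame(X = c(1, 2, NA), Y = c(10, NA, NA))

idx <- !is.na(df2)   # a 3x2 logical matrix: 6 entries, not 3
sum(idx)             # 3 TRUEs, but at vector positions 1, 2 and 4

# Position 4 is past nrow(df2), so an all-NA row is returned for it:
bad  <- df2[idx, ]
rownames(bad)        # last row name is "NA"

# complete.cases() gives one logical per row, keeping rows with no NA:
good <- df2[complete.cases(df2), ]
nrow(good)           # 1: only the first row is complete
```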
Re: [R] How to create multiple country's data into multiple sheets of one excel
Use append = TRUE inside your write.xlsx() call. On Thu, Dec 13, 2012 at 7:52 AM, Tammy Ma metal_lical...@live.com wrote: [...] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
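The append = TRUE suggestion can be sketched as a loop over countries (assuming the xlsx package; the file name and the 'dataset' object are the poster's, so this is an untested sketch rather than a verified solution):

```r
# Sketch: one sheet per country in a single workbook, using xlsx.
# 'dataset' is assumed to have a 'country' column, as in the question.
library(xlsx)

out_file <- "vn.xlsx"
for (cty in unique(dataset$country)) {
  one <- dataset[dataset$country == cty, ]
  # append = TRUE adds a new sheet instead of overwriting the file;
  # the first iteration (no file yet) creates the workbook.
  write.xlsx(one, out_file,
             sheetName = as.character(cty),
             append    = file.exists(out_file),
             row.names = FALSE)
}
```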
Re: [R] How to create multiple country's data into multiple sheets of one excel
I use the XLConnect package to write out multiple sheets to an Excel workbook. On Thu, Dec 13, 2012 at 7:52 AM, Tammy Ma metal_lical...@live.com wrote: [...] -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
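The XLConnect approach might look like the following (an untested sketch; the file name and 'dataset' object are assumptions carried over from the question):

```r
# Sketch: build the whole workbook in memory with XLConnect, one sheet
# per country, then write it to disk in a single save.
library(XLConnect)

wb <- loadWorkbook("countries.xlsx", create = TRUE)  # hypothetical path
for (cty in unique(dataset$country)) {
  sheet <- as.character(cty)
  createSheet(wb, name = sheet)
  writeWorksheet(wb, dataset[dataset$country == cty, ], sheet = sheet)
}
saveWorkbook(wb)  # nothing is written to disk until this call
```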
Re: [R] VarimpAUC in Party Package
Error: could not find function varimpAUC Was this function NOT included in the Windows binary I downloaded and installed? Which windows binary are you talking about? The R installer, the Party .zip or something else? S Ellison *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Running MCMC in R
?try
?tryCatch

(if the suggestion to use an MCMC package does not fix your problem). -- Bert On Wed, Dec 12, 2012 at 7:49 PM, Chenyi Pan cp...@virginia.edu wrote: [...] -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
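The try()/tryCatch() idea above can be sketched as a loop that fits one chain per simulated dataset and skips to the next dataset when the current fit fails (run_mcmc() is a hypothetical stand-in for the poster's sampler, not a real function):

```r
# Skeleton: record NA for a failing fit and move on to the next dataset.
run_mcmc <- function(d) {
  if (d$bad) stop("sampler got stuck") else mean(d$y)  # toy stand-in
}

datasets <- list(list(y = 1:5,   bad = FALSE),
                 list(y = 6:10,  bad = TRUE),   # this one "gets stuck"
                 list(y = 11:15, bad = FALSE))

results <- lapply(datasets, function(d)
  tryCatch(run_mcmc(d),
           error = function(e) NA_real_))  # catch the error, keep looping

unlist(results)  # 3, NA, 13
```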
[R] Webinar: Advances in Gradient Boosting: the Power of Post-Processing. TOMORROW, 10-11 a.m., PST
Webinar: Advances in Gradient Boosting: the Power of Post-Processing TOMORROW: December 14, 10-11 a.m., PST Webinar Registration: http://2.salford-systems.com/gradientboosting-and-post-processing/ Course Outline: I. Gradient Boosting and Post-Processing: o What is missing from Gradient Boosting? o Why post-processing techniques are used? II. Applications Benefiting from Post-Processing: Examples from a variety of industries. o Financial Services o Biomedical o Environmental o Manufacturing o Adserving III. Typical Post-Processing Steps IV. Techniques: o Generalized Path Seeker (GPS): modern high-speed LASSO-style regularized regression. o Importance Sampled Learning Ensembles (ISLE): identify and reweight the most influential trees. o Rulefit: ISLE on steroids. Identify the most influential nodes and rules. V. Case Study Example: o Output/Results without Post-Processing o Output/Results with Post-Processing o Demo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Running MCMC in R
I am now running a MCMC iteration in the R program. But it is always stucked in some loop. I never like it when folk just say Please read and follow the posting guide referenced in every R help email but ... please, read and follow the posting guide referenced in every R help email if you want any useful kind of answer. In the mean time, it would have helped a lot to say i) Is this a problem in your own code or in a contributed or core package? ii) If not your own, in which package and which function? iii) What does 'stuck' mean? Failing to converge (and according to what criterion)? Failing to complete a prescribed number of iterations? Failing to start? and also if you could have provided data and an example that would let someone see what is going on instead of trying to guess. Failing that, however, the answers to your questions are So I want to know whether we can skip the current dataset and move to next simulated data when the iteration is stucked? Yes, though the method of doing so will depend entirely on which function and which package you are using. and Alternatively, can the MCMC chain skip the current iteration when it is stucked and automatically to start another chain with different starting values. Possibly, though the method of doing so will depend entirely on which function and which package you are using. S *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] abline of an lm fit not correct
Hello fellow R-users, I'm stuck with something I think is pretty stupid, but I can't find out where I'm wrong, and it's driving me crazy! I am doing a very simple linear regression with Northing/Easting data, then I plot the data as well as the regression line:

plot(x=Dataset$EASTING, y=Dataset$NORTHING)
fit <- lm(formula = NORTHING ~ EASTING, data = Dataset)
abline(fit)
fit

Call: lm(formula = NORTHING ~ EASTING, data = Dataset)

Coefficients:
(Intercept)      EASTING
  5.376e+05    4.692e-02

Later on, when I use the command 'abline' with the coefficients provided by summary(fit), the line is not the same as abline(fit)! To summarize, these two lines are different:

abline(fit)
abline(5.376e+05, 4.692e-02)

The 'b' coefficients appear equal, but the intercepts are different. Where am I missing something? Thanks [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] abline of an lm fit not correct
You don't provide a reproducible example, but my first guess is that the print method is rounding what appears on the screen, so you aren't actually using the slope and intercept. See ?print.default and the digits argument under ?options for more. Why do you need to copy and paste the coefficients? Just to check your understanding? Sarah On Thursday, December 13, 2012, Robert U wrote: [...] -- Sarah Goslee http://www.stringpage.com http://www.sarahgoslee.com http://www.functionaldiversity.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
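The rounding explanation can be demonstrated with simulated data of the same magnitude as the poster's (the coordinates and coefficients below are illustrative): with an intercept around 5.4e5, the four significant digits shown by print() round to the nearest hundred, so re-typing them shifts the line by up to ~50 units.

```r
# Sketch: why abline(5.376e+05, 4.692e-02) differs from abline(fit).
set.seed(1)
x <- seq(574600, 574900, length.out = 50)       # Easting-sized values
y <- 537600.123 + 0.04692 * x + rnorm(50, sd = 5)
fit <- lm(y ~ x)

full    <- coef(fit)             # full-precision coefficients
rounded <- signif(coef(fit), 4)  # roughly what print(fit) displays

abs(full[1] - rounded[1])        # intercept error from rounding alone

# So pass the fit object (or its exact coefficients), never the
# printed numbers:
# abline(fit)        # correct
# abline(coef(fit))  # also correct
```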
[R] Pairwise deletion in a linear regression and in a GLM ?
Dear useRs, In a thesis I found a mention of the use of pairwise deletion in a linear regression and in a GLM (binomial family). The author said that he used R to do the statistics, but I did not find an option allowing pairwise deletion in either the lm or the glm function. Is there a package somewhere that allows this? Thanks, Arnaud [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] recursion depth limitations
Hello List, I am aware that one can raise the recursion depth with options(expressions = n), but it has a 500K limit. Why do we have a 500K limit on this? Some algorithms are only feasibly expressed with recursion, and 500K is not that much: graph algorithms, for example dependency trees with many nodes, easily reach that number. Best, -m __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
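One common way around the limit, rather than raising options(expressions=), is to rewrite the recursion with an explicit stack; a minimal sketch on a toy tree structure (the node representation here is an illustrative choice, not a standard one):

```r
# Tree depth two ways: recursively (bounded by R's recursion limit)
# and iteratively with an explicit stack (bounded only by memory).
depth_recursive <- function(node) {
  if (length(node$children) == 0) return(1)
  1 + max(vapply(node$children, depth_recursive, numeric(1)))
}

depth_iterative <- function(root) {
  stack <- list(list(node = root, d = 1))  # (node, depth) pairs
  best <- 0
  while (length(stack) > 0) {
    top   <- stack[[length(stack)]]
    stack <- stack[-length(stack)]          # pop
    best  <- max(best, top$d)
    for (ch in top$node$children)           # push children
      stack[[length(stack) + 1]] <- list(node = ch, d = top$d + 1)
  }
  best
}

# A chain of 3 nodes: both versions agree, but only the iterative one
# would survive a chain hundreds of thousands of nodes deep.
chain <- list(children = list(list(children = list(list(children = list())))))
depth_recursive(chain)  # 3
depth_iterative(chain)  # 3
```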
Re: [R] Pairwise deletion in a linear regression and in a GLM ?
Hi Arnaud, A quick help search of lm or glm tells you that 'the factory-fresh default is na.omit'. If you then look up 'na.omit', you'll read that it 'returns the object with incomplete cases removed'. So, pairwise deletion is the default option in both lm and glm. On a related note, it goes without saying that pairwise deletion is not good practice in most cases, and that R has ways to impute these missing cases depending on assumptions regarding the cause or nature of their missingness. Regards, José José Iparraguirre Chief Economist Age UK -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arnaud Mosnier Sent: 13 December 2012 15:40 To: r-help@r-project.org Subject: [R] Pairwise deletion in a linear regression and in a GLM ? [...] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to select a subset data to do a barplot in ggplot2
Hi, everybody. I have a dataframe like this:

FID IID  STATUS
1   4621 live
1   4628 dead
2   4631 live
2   4632 live
2   4633 live
2   4634 live
6   4675 live
6   4679 dead
10  4716 dead
10  4719 live
10  4721 dead
11  4726 live
11  4728 nosperm
11  4730 nosperm
12  4732 live
17  4783 live
17  4783 live
17  4784 live

I just want a barplot counting live or dead within every FID, with the bars filled by different colours. I tried this code:

p <- ggplot(data, aes(x=FID))
p + geom_bar(aes(x=factor(FID), y=..count.., fill=STATUS))

But how could I exclude nosperm or other levels just in the call to ggplot2, without generating another dataframe? Thanks a lot Yao He Master candidate in 2nd year Department of Animal genetics & breeding Room 436, College of Animal Science & Technology, China Agriculture University, Beijing, 100193 E-mail: yao.h.1...@gmail.com ming...@vt.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
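One way to avoid a named intermediate data frame is to subset inline in the data argument; a sketch on a cut-down version of the data above (the ggplot2 call itself is shown commented, since it needs the package installed):

```r
# Keep only the 'live'/'dead' rows, without creating a separate
# data-frame object in the workspace.
data <- data.frame(
  FID    = c(1, 1, 2, 2, 11, 11),
  STATUS = c("live", "dead", "live", "live", "nosperm", "nosperm")
)

keep <- subset(data, STATUS %in% c("live", "dead"))
nrow(keep)  # 4: the 'nosperm' rows are gone

# With ggplot2 installed, the same subset can go straight into ggplot():
# library(ggplot2)
# ggplot(subset(data, STATUS %in% c("live", "dead")),
#        aes(x = factor(FID), fill = STATUS)) +
#   geom_bar()
```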
Re: [R] remove NA in df results in NA, NA.1 ... rows
Hi, You could use either:

?na.omit()  # the option that was already suggested

# or
df2[complete.cases(df2), ]

# In this case, this should also work:
sapply(df2, function(x) x[!is.na(x)])
# or
apply(df2, 2, function(x) x[!is.na(x)])
# If the NAs are not in the same rows, the output will be a list whose elements differ in length.

A.K.

----- Original Message ----- From: raphael.fel...@art.admin.ch To: r-help@r-project.org Sent: Thursday, December 13, 2012 3:20 AM Subject: [R] remove NA in df results in NA, NA.1 ... rows [...] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to aggregate the dataset
Hi, You could try this:

dat3 <- read.table(text="
product min_price max_price mean_price country price_band
11 34 50 40 VN 0-300
22 10 30 15 VN 0-300
", header=TRUE, stringsAsFactors=FALSE)

library(reshape2)
SubsetPrice <- dat3[grep("price", names(dat3))]
dat3$newPrice <- paste(SubsetPrice[,3],
                       paste("[", SubsetPrice[,1], ",", SubsetPrice[,2], "]", sep=""),
                       sep=" ")
dcast(dat3, product + price_band ~ country, value.var="newPrice")
#   product price_band         VN
# 1      11      0-300 40 [34,50]
# 2      22      0-300 15 [10,30]

A.K.

----- Original Message ----- From: Tammy Ma metal_lical...@live.com To: r-help@r-project.org r-help@r-project.org Sent: Thursday, December 13, 2012 5:42 AM Subject: [R] how to aggregate the dataset HI, Sorry for messing up.. I want to transform the following dataset:

product min_price max_price mean_price country price_band
11 34 50 40 VN 0-300
22 10 30 15 VN 0-300

Into:

product VN price_band
11 40 0-300 [34,50]
22 15 0-300 [10,30]

How can I do this in R? I have a large dataset like this and want to transform all of it in this way. Thanks a lot. Kind regards, Tammy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Repeat elements of matrix based on vector counts
I have two dataframes (df) that share a column header (plot.id). In the 1st df, plot.id records are repeated a variable number of times, based on the number of trees monitored within each plot. The 2nd df has only a single record for each plot.id, and contains a variable named load that is collected at the plot level and is only listed once per plot record. *OBJECTIVE:* I need to repeat the load values from the 2nd df based on how many times plot.id is repeated in the 1st df (all plots are repeated a different number of times). My example dfs are below:

df1 <- data.frame(plot.id  = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"),
                  load    = c(17, 6, 24))

I have gotten close to solving this, but alas I'm on day 2 of problem-shooting and can't get it! Thanks for any help you might provide. --Sarah [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
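Two standard base-R ways to do this expansion, using the example data frames from the question (a sketch; either line on its own solves the stated objective):

```r
# Repeat each plot-level 'load' once per tree in that plot.
df1 <- data.frame(plot.id  = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"),
                  load    = c(17, 6, 24))

# 1. Look up each tree's plot in df2:
df1$load <- df2$load[match(df1$plot.id, df2$plot.id)]

# 2. Or let merge() do the join (same values, rows possibly reordered):
m <- merge(df1[, c("plot.id", "tree.tag")], df2, by = "plot.id")

df1$load  # 17 17 17 6 6 24 24 24 24 24
```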
[R] How do I make a loop to extract a column from multiple lists and then bind them together to make a new matrix?
Hi! I am new to looping and R in general, and I have spent way too much time on this one problem; I am about a hair away from doing it manually for the next two days. So, there is a package that, while calculating the statistic, creates lists (that look like matrices) in the background. Each item (there are 10 items) has one of these matrix-looking lists that I need to extract data from. The list has 5 rows that represent 5 groups, and 8 columns. I need to extract 3 of the columns (Lo Score = [,2], Hi Score = [,3], and Mean = [,7]) for each of the items. I then want to turn the extracted data into 3 matrices (Lo Score, Hi Score, and Mean) where the rows are the 5 groups and the columns are items 1-10. This is how I can create the mean matrix by hand; MDD.mean.s10 is the matrix I want in the end. (Notice the first bracket after $results is the only part that changes, 1 through 10, to represent the 10 items, and the last bracket is [,7] to represent the mean located in column 7.)

m.1a <- MC_MDD.noNA$results[[1]][[2]][,7]
m.2b <- MC_MDD.noNA$results[[2]][[2]][,7]
m.3c <- MC_MDD.noNA$results[[3]][[2]][,7]
m.4d <- MC_MDD.noNA$results[[4]][[2]][,7]
m.5e <- MC_MDD.noNA$results[[5]][[2]][,7]
m.6f <- MC_MDD.noNA$results[[6]][[2]][,7]
m.7g <- MC_MDD.noNA$results[[7]][[2]][,7]
m.8h <- MC_MDD.noNA$results[[8]][[2]][,7]
m.9i <- MC_MDD.noNA$results[[9]][[2]][,7]
m.10j <- MC_MDD.noNA$results[[10]][[2]][,7]

MDD.mean.s10 <- cbind(m.1a, m.2b, m.3c, m.4d, m.5e, m.6f, m.7g, m.8h, m.9i, m.10j)
MDD.mean.s10
          m.1a      m.2b      m.3c      m.4d      m.5e      m.6f      m.7g      m.8h      m.9i     m.10j
[1,] 0.8707865 0.7393939 0.7769231 0.7591241 0.853     0.7925926 0.8258065 0.8410596 0.8843931 0.5638298
[2,] 0.8323353 0.7302632 0.5913978 0.5868263 0.6923077 0.6182796 0.6964286 0.6839080 0.7911392 0.3212121
[3,] 0.8726115 0.7159763 0.7117647 0.6163522 0.7987805 0.7105263 0.7613636 0.7674419 0.8034682 0.4011299
[4,] 0.9024390 0.7894737 0.7795276 0.6530612 0.8593750 0.7112676 0.8672566 0.8629032 0.9152542 0.4834437
[5,] 0.986     0.9102564 0.8452381 0.8160920 0.9726027 0.8658537 0.8352941 0.9342105 0.947     0.6454545

But I can't do this by hand every time, as this comes up over and over again in multiple lists. I have figured out how to loop this procedure and name the vector as it goes along:

for(i in 1:10){
  assign(paste("m", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7])
}

m1
[1] 0.8707865 0.8323353 0.8726115 0.9024390 0.986
m2
[1] 0.7393939 0.7302632 0.7159763 0.7894737 0.9102564
m3
[1] 0.7769231 0.5913978 0.7117647 0.7795276 0.8452381
m4
[1] 0.7591241 0.5868263 0.6163522 0.6530612 0.8160920
m5
[1] 0.853 0.6923077 0.7987805 0.8593750 0.9726027
m6
[1] 0.7925926 0.6182796 0.7105263 0.7112676 0.8658537
m7
[1] 0.8258065 0.6964286 0.7613636 0.8672566 0.8352941
m8
[1] 0.8410596 0.6839080 0.7674419 0.8629032 0.9342105
m9
[1] 0.8843931 0.7911392 0.8034682 0.9152542 0.947
m10
[1] 0.5638298 0.3212121 0.4011299 0.4834437 0.6454545

Now here is where I get stuck: how do I cbind these vectors without typing them out explicitly? I.e.

mean.MDD <- cbind(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10)

Everything I have tried keeps overwriting the data instead of building a matrix. Basically, I start with a matrix (5x10) of zeros; then I wind up with a few values in the beginning, but the rest is still zeros.

Example of terrible code:

fo <- matrix(0, 5, 10)
colnames(fo) <- paste('f', 1:10, sep = "")
fo
     f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
[1,]  0  0  0  0  0  0  0  0  0   0
[2,]  0  0  0  0  0  0  0  0  0   0
[3,]  0  0  0  0  0  0  0  0  0   0
[4,]  0  0  0  0  0  0  0  0  0   0
[5,]  0  0  0  0  0  0  0  0  0   0

for(i in 1:10){
  fo <- assign(paste("f", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7])
}
fo
[1] 0.5638298 0.3212121 0.4011299 0.4834437 0.6454545

(after resetting fo to the zero matrix as above)

for(i in 1:10){
  fo <- cbind(assign(paste("f", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7]))
}
fo
          [,1]
[1,] 0.5638298
[2,] 0.3212121
[3,] 0.4011299
[4,] 0.4834437
[5,] 0.6454545

Thanks for your help in advance!!! (c: [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
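This loop-free pattern does what the question asks without any assign()/cbind() bookkeeping: sapply() over the item index returns the 5 x 10 matrix directly. The mock object below only mimics the structure of MC_MDD.noNA$results[[i]][[2]] described in the question (a 5 x 8 matrix per item), so the numbers are illustrative.

```r
# Build a mock MC_MDD.noNA with 10 items, each holding a 5x8 matrix in
# its second slot, matching the structure described in the question.
set.seed(42)
mock_results <- lapply(1:10, function(i)
  list(NULL, matrix(runif(5 * 8), nrow = 5, ncol = 8)))
MC_MDD.noNA <- list(results = mock_results)

# One line per statistic; columns are items 1..10, rows are the 5 groups:
MDD.mean.s10 <- sapply(1:10, function(i) MC_MDD.noNA$results[[i]][[2]][, 7])
MDD.lo.s10   <- sapply(1:10, function(i) MC_MDD.noNA$results[[i]][[2]][, 2])
MDD.hi.s10   <- sapply(1:10, function(i) MC_MDD.noNA$results[[i]][[2]][, 3])
colnames(MDD.mean.s10) <- paste0("m", 1:10)

dim(MDD.mean.s10)  # 5 10
```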
Re: [R] Running MCMC in R
And if by stuck you mean taking too long a time you can generate an error at a given time limit by using setTimeLimit() and tryCatch() or try() can catch that error. E.g. timeOut - function (expr, cpu = Inf, elapsed = Inf) { setTimeLimit(cpu = cpu, elapsed = elapsed, transient = TRUE) on.exit(setTimeLimit()) # should not be needed, but averts future error message expr } timeOut({s-0 ; for(i in 1:1e7)s - s + 1/i ; s}, elapsed=1) Error: reached elapsed time limit timeOut({s-0 ; for(i in 1:1e7)s - s + 1/i ; s}, elapsed=10) # log(1e7) + gamma [1] 16.69531 tryCatch(timeOut({s-0 ; for(i in 1:1e7)s - s + 1/i ; s}, elapsed=1), + error = function(e) NA_real_) [1] NA Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Thursday, December 13, 2012 6:21 AM To: Chenyi Pan Cc: R-help@r-project.org Subject: Re: [R] Running MCMC in R ?try ?tryCatch (if the suggestion to use an MCMC package does not fix your problem). -- Bert On Wed, Dec 12, 2012 at 7:49 PM, Chenyi Pan cp...@virginia.edu wrote: Dear all I am now running a MCMC iteration in the R program. But it is always stucked in some loop. This cause big problems for my research. So I want to know whether we can skip the current dataset and move to next simulated data when the iteration is stucked? Alternatively, can the MCMC chain skip the current iteration when it is stucked and automatically to start another chain with different starting values. I am looking forward to your reply. Best, Chenyi -- Chenyi Pan Department of Statisitics Graduate School of Arts and Sciences, University of Virginia Tel: 434-466-9209 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Bert Gunter, Genentech Nonclinical Biostatistics
Re: [R] Pairwise deletion in a linear regression and in a GLM ?
Hi Jose,

To my understanding, na.omit is different from pairwise deletion. With na.omit, a case is omitted entirely if it has a missing value for any of the variables in the model. With pairwise deletion, a case with some missing values is kept, and the values that are not missing are used in the calculations. However, I agree that pairwise deletion is not good practice (so it would be surprising if it were the default in lm!). I just want to be able to recalculate the statistics given in this thesis.

Arnaud

2012/12/13 Jose Iparraguirre jose.iparragui...@ageuk.org.uk:
Hi Arnaud, A quick help search of lm or glm tells you that 'the factory-fresh default is na.omit'. If you then look up 'na.omit', you'll read that it 'returns the object with incomplete cases removed'. So, pairwise deletion is the default option in both lm and glm. On a related note, it goes without saying that pairwise deletion is not good practice in most cases, and that R has ways to impute these missing cases depending on assumptions regarding the cause or nature of their missingness. Regards, José
José Iparraguirre, Chief Economist, Age UK

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arnaud Mosnier
Sent: 13 December 2012 15:40
To: r-help@r-project.org
Subject: [R] Pairwise deletion in a linear regression and in a GLM ?

Dear useRs, In a thesis, I found a mention of the use of pairwise deletion in linear regression and GLM (binomial family). The author said that he used R to do the statistics, but I did not find an option allowing pairwise deletion in either the lm or glm function. Is there a package that allows that? Thanks, Arnaud
Re: [R] Repeat elements of matrix based on vector counts
Hi Sarah,

If I understand your requirements correctly, the easiest thing to do is approach it from a different direction:

df3a <- merge(df1, df2)

But you can also use rep for this simple example, because plot.id in df2 is sorted:

nindex <- table(df1$plot.id)
df3b <- df2[rep(1:length(nindex), times = nindex), ]

Thanks for the reproducible example,
Sarah

On Thu, Dec 13, 2012 at 9:15 AM, Sarah Haas haaszool...@gmail.com wrote:
I have two dataframes (df) that share a column header (plot.id). In the 1st df, plot.id records are repeated a variable number of times based on the number of trees monitored within each plot. The 2nd df only has a single record for each plot.id, and contains a variable named load that is collected at the plot level and is only listed once per plot record. *OBJECTIVE:* I need to repeat the load values from the 2nd df based on how many times plot.id is repeated in the 1st df (all plots are repeated a different number of times). My example dfs are below:

df1 <- data.frame(plot.id = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"), load = c(17, 6, 24))

I have gotten close to solving this, but alas I'm on day 2 of problem-shooting and can't get it! Thanks for any help you might provide.
--Sarah

-- Sarah Goslee http://www.functionaldiversity.org
Re: [R] Repeat elements of matrix based on vector counts
Hello,

Something like this?

rep(df2$load, table(df1$plot.id))

Hope this helps,
Rui Barradas

Em 13-12-2012 14:15, Sarah Haas escreveu:
I have two dataframes (df) that share a column header (plot.id). In the 1st df, plot.id records are repeated a variable number of times based on the number of trees monitored within each plot. The 2nd df only has a single record for each plot.id, and contains a variable named load that is collected at the plot level and is only listed once per plot record. *OBJECTIVE:* I need to repeat the load values from the 2nd df based on how many times plot.id is repeated in the 1st df (all plots are repeated a different number of times). My example dfs are below:

df1 <- data.frame(plot.id = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"), load = c(17, 6, 24))

I have gotten close to solving this, but alas I'm on day 2 of problem-shooting and can't get it! Thanks for any help you might provide.
--Sarah
Re: [R] Repeat elements of matrix based on vector counts
Hi,

Try ?merge(), or ?join() from library(plyr):

res <- merge(df1, df2, by = "plot.id")
head(res, 6)
#   plot.id tree.tag load
# 1   plot1      111   17
# 2   plot1      112   17
# 3   plot1      113   17
# 4   plot2      222    6
# 5   plot2      223    6
# 6   plot3      333   24

A.K.

----- Original Message -----
From: Sarah Haas haaszool...@gmail.com
To: r-help@r-project.org
Sent: Thursday, December 13, 2012 9:15 AM
Subject: [R] Repeat elements of matrix based on vector counts

I have two dataframes (df) that share a column header (plot.id). In the 1st df, plot.id records are repeated a variable number of times based on the number of trees monitored within each plot. The 2nd df only has a single record for each plot.id, and contains a variable named load that is collected at the plot level and is only listed once per plot record. *OBJECTIVE:* I need to repeat the load values from the 2nd df based on how many times plot.id is repeated in the 1st df (all plots are repeated a different number of times). My example dfs are below:

df1 <- data.frame(plot.id = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"), load = c(17, 6, 24))

I have gotten close to solving this, but alas I'm on day 2 of problem-shooting and can't get it! Thanks for any help you might provide.
--Sarah
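The suggestions in this thread (merge(), indexing with rep(), and rep() on the load vector directly) all produce the same expanded values for this example, because df1 and df2 are both sorted by plot.id. A quick check using the example dataframes from the question:

```r
df1 <- data.frame(plot.id = rep(c("plot1", "plot2", "plot3"), c(3, 2, 5)),
                  tree.tag = c(111, 112, 113, 222, 223, 333, 334, 335, 336, 337))
df2 <- data.frame(plot.id = c("plot1", "plot2", "plot3"),
                  load = c(17, 6, 24))

m <- merge(df1, df2)                    # one row per tree, load attached
v <- rep(df2$load, table(df1$plot.id))  # just the expanded load vector
identical(m$load, v)                    # TRUE: same values, same order
```

Note that merge() sorts by the `by` column by default, so if df1 were not already ordered by plot.id the two approaches would align the values differently.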
Re: [R] Pairwise deletion in a linear regression and in a GLM ?
Sorry, Arnaud, I misinterpreted the question. There isn't a built-in option in lm or glm to run pairwise deletion, but with the 'psych' package you can run regressions on covariance matrices rather than on raw data. So, first, you obtain a covariance matrix with cov() and the option use = "pairwise.complete.obs" -- or, within 'psych', set.cor(..., use = "pairwise"), which will give you the correlations pairwise -- and then you use the function mat.regress on the pairwise matrix.

Hope this helps,
José

From: Arnaud Mosnier [mailto:a.mosn...@gmail.com]
Sent: 13 December 2012 16:13
To: Jose Iparraguirre
Cc: r-help@r-project.org
Subject: Re: [R] Pairwise deletion in a linear regression and in a GLM ?

Hi Jose, To my understanding, na.omit is different from pairwise deletion. With na.omit, a case is omitted entirely if it has a missing value for any of the variables in the model. With pairwise deletion, a case with some missing values is kept, and the values that are not missing are used in the calculations. However, I agree that pairwise deletion is not good practice (so it would be surprising if it were the default in lm!). I just want to be able to recalculate the statistics given in this thesis. Arnaud

2012/12/13 Jose Iparraguirre jose.iparragui...@ageuk.org.uk:
Hi Arnaud, A quick help search of lm or glm tells you that 'the factory-fresh default is na.omit'. If you then look up 'na.omit', you'll read that it 'returns the object with incomplete cases removed'. So, pairwise deletion is the default option in both lm and glm. On a related note, it goes without saying that pairwise deletion is not good practice in most cases, and that R has ways to impute these missing cases depending on assumptions regarding the cause or nature of their missingness.
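For illustration, the pairwise-complete covariance step can be seen in base R alone; the small data frame below is made up for the example, and the contrast with listwise (na.omit-style) deletion is the point.

```r
set.seed(42)
d <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
d$x[2] <- NA   # missing values scattered across different cases
d$y[5] <- NA

# Listwise: every entry uses only the 8 fully complete cases.
cov_listwise <- cov(na.omit(d))

# Pairwise: each entry uses every case complete for THAT pair of variables,
# so cov(x, z) is based on 9 cases here, and cov(y, z) on a different 9.
cov_pairwise <- cov(d, use = "pairwise.complete.obs")
```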
Regards, José
José Iparraguirre, Chief Economist, Age UK

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arnaud Mosnier
Sent: 13 December 2012 15:40
To: r-help@r-project.org
Subject: [R] Pairwise deletion in a linear regression and in a GLM ?

Dear useRs, In a thesis, I found a mention of the use of pairwise deletion in linear regression and GLM (binomial family). The author said that he used R to do the statistics, but I did not find an option allowing pairwise deletion in either the lm or glm function. Is there a package that allows that? Thanks, Arnaud
Re: [R] Pairwise deletion in a linear regression and in a GLM ?
Thanks Jose, but I doubt that the author of these analyses used such a complex approach.

Arnaud

2012/12/13 Jose Iparraguirre jose.iparragui...@ageuk.org.uk:
Sorry, Arnaud, I misinterpreted the question. There isn't a built-in option in lm or glm to run pairwise deletion, but with the 'psych' package you can run regressions on covariance matrices rather than on raw data. So, first, you obtain a covariance matrix with cov() and the option use = "pairwise.complete.obs" -- or, within 'psych', set.cor(..., use = "pairwise"), which will give you the correlations pairwise -- and then you use the function mat.regress on the pairwise matrix. Hope this helps, José

From: Arnaud Mosnier [mailto:a.mosn...@gmail.com]
Sent: 13 December 2012 16:13
To: Jose Iparraguirre
Cc: r-help@r-project.org
Subject: Re: [R] Pairwise deletion in a linear regression and in a GLM ?

Hi Jose, To my understanding, na.omit is different from pairwise deletion. With na.omit, a case is omitted entirely if it has a missing value for any of the variables in the model. With pairwise deletion, a case with some missing values is kept, and the values that are not missing are used in the calculations. However, I agree that pairwise deletion is not good practice (so it would be surprising if it were the default in lm!). I just want to be able to recalculate the statistics given in this thesis. Arnaud

2012/12/13 Jose Iparraguirre jose.iparragui...@ageuk.org.uk:
Hi Arnaud, A quick help search of lm or glm tells you that 'the factory-fresh default is na.omit'. If you then look up 'na.omit', you'll read that it 'returns the object with incomplete cases removed'. So, pairwise deletion is the default option in both lm and glm. On a related note, it goes without saying that pairwise deletion is not good practice in most cases, and that R has ways to impute these missing cases depending on assumptions regarding the cause or nature of their missingness.
Regards, José
José Iparraguirre, Chief Economist, Age UK

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arnaud Mosnier
Sent: 13 December 2012 15:40
To: r-help@r-project.org
Subject: [R] Pairwise deletion in a linear regression and in a GLM ?

Dear useRs, In a thesis, I found a mention of the use of pairwise deletion in linear regression and GLM (binomial family). The author said that he used R to do the statistics, but I did not find an option allowing pairwise deletion in either the lm or glm function. Is there a package that allows that? Thanks, Arnaud
[R] subsetting time series
Hello, my series of dates looks like

 [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
 [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
 [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
 [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
 [9] 2012-05-31 02:30:00 UTC 2012-05-31 00:30:00 UTC
[11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC
[13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
[15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
[17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
[19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
...

I'd like to subset this to four series:

1)
 [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
 [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
 [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
 [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
 [9] 2012-05-31 02:30:00 UTC
[10] 2012-05-31 18:30:00 UTC 2012-05-31 19:30:00 UTC
...

2)
2012-05-31 00:30:00 UTC - [1]
[11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC - [2,3]
[13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
[15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
[17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
[10] 2012-06-01 00:30:00 UTC

3)
[19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
...

so that I can plot data for each of the series separately, without e.g. data at hour 2012-05-31 02:30:00 UTC connecting in the figure to 2012-05-31 00:30:00 UTC. Basically, cycling through the series with period 9.

Thanks for any suggestions/help,
Mark
Re: [R] [R-sig-hpc] recursion depth limitations
On Dec 13, 2012, at 10:45 AM, Suzen, Mehmet wrote:

> Hello List, I am aware that one can set the recursion depth with 'options(expressions = #)', but it has a 500K limit. Why do we have a 500K limit on this?

Because it's far beyond what you can handle without changing a lot of other things. 500k expressions will require at least about 320Mb of stack (!) in the eval() chain alone -- compare that to the 8Mb stack size which is the default in most OSes, so you'll hit the wall way before that limit is reached.

> Some algorithms are only feasibly solvable with recursion, and 500K does not sound like too much, e.g. for graph algorithms: dependency trees with large node counts easily reach that number.

I don't see how large node counts have anything to do with this, since we are talking about expression depth, not about sizes of any kind. Again, in any realistic example you'll hit other limits first anyway.

Cheers,
Simon
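As a concrete illustration of why raising the limit is rarely the right fix: a deeply recursive function nests one evaluation frame per step and fails long before 500k, while an iterative rewrite of the same computation runs at constant stack depth. The toy functions below are illustrative only.

```r
# Recursive version: one nested eval frame per step.
depth_rec <- function(n) if (n == 0L) 0L else 1L + depth_rec(n - 1L)
# depth_rec(1e6)  # would stop with "evaluation nested too deeply" / C stack error

# Iterative version: constant stack depth, no recursion limit involved.
depth_iter <- function(n) {
  d <- 0L
  while (n > 0L) { d <- d + 1L; n <- n - 1L }
  d
}
depth_iter(1e6)  # 1000000
```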
Re: [R] How to select a subset data to do a barplot in ggplot2
Hi,

Maybe this:

p <- ggplot(subset(dat1, STATUS != "nosperm"), aes(x = FID))
p + geom_bar(aes(x = factor(FID), y = ..count.., fill = STATUS))

A.K.

----- Original Message -----
From: Yao He yao.h.1...@gmail.com
To: r-help@r-project.org
Sent: Thursday, December 13, 2012 7:38 AM
Subject: [R] How to select a subset data to do a barplot in ggplot2

Hi everybody, I have a dataframe like this:

FID  IID  STATUS
1    4621 live
1    4628 dead
2    4631 live
2    4632 live
2    4633 live
2    4634 live
6    4675 live
6    4679 dead
10   4716 dead
10   4719 live
10   4721 dead
11   4726 live
11   4728 nosperm
11   4730 nosperm
12   4732 live
17   4783 live
17   4783 live
17   4784 live

I just want a barplot that counts live or dead in every FID, and fills the bars with different colours. I tried these lines:

p <- ggplot(data, aes(x = FID))
p + geom_bar(aes(x = factor(FID), y = ..count.., fill = STATUS))

But how could I exclude "nosperm" or other levels within the ggplot2 call, without generating another dataframe?

Thanks a lot,
Yao He

-- Master candidate, Department of Animal Genetics & Breeding, College of Animal Science & Technology, China Agriculture University, Beijing, 100193
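For reference, the filtering-and-counting step can be checked without ggplot2 at all; `dat1` below is a made-up fragment shaped like the poster's data.

```r
dat1 <- data.frame(FID    = c(1, 1, 2, 11, 11, 11),
                   STATUS = c("live", "dead", "live", "live", "nosperm", "nosperm"))

# subset() drops the unwanted rows; droplevels() removes "nosperm" from any
# factor levels so it gets no bar (or legend entry) of its own.
kept <- droplevels(subset(dat1, STATUS != "nosperm"))
table(kept$FID, kept$STATUS)  # the counts the stacked bars would display
```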
Re: [R] subsetting time series
Is this a one-off or not? Why not do it manually? If you need to write a function, some example data would be helpful.

On Thu, Dec 13, 2012 at 10:52 AM, m p mzp3...@gmail.com wrote:
Hello, my series of dates looks like

 [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
 [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
 [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
 [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
 [9] 2012-05-31 02:30:00 UTC 2012-05-31 00:30:00 UTC
[11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC
[13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
[15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
[17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
[19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
...

I'd like to subset this to four series, so that I can plot data for each of the series separately, without e.g. data at hour 2012-05-31 02:30:00 UTC connecting in the figure to 2012-05-31 00:30:00 UTC. Basically, cycling through the series with period 9. Thanks for any suggestions/help, Mark
-- Stephen Sefick, Auburn University Biological Sciences, 331 Funchess Hall, Auburn, Alabama 36849, sas0...@auburn.edu, http://www.auburn.edu/~sas0025
Re: [R] subsetting time series
On Dec 13, 2012, at 8:52 AM, m p wrote:

> Hello, my series of dates look like
> [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
> [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
> [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
> [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
> [9] 2012-05-31 02:30:00 UTC 2012-05-31 00:30:00 UTC
> [11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC
> [13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
> [15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
> [17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
> [19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
> ...

Better would have been:

series <- seq(as.POSIXct("2012-05-30 18:30:00", tz = "UTC"),
              length.out = 40, by = "1 hour")

> I'd like to subset this to four series

Although you later describe the problem differently, so I am following that description. See if this split approach with integer division is helpful:

split(series, (seq_along(series) - 1) %/% 9)

-- David.

> 1)
> [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC
> [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC
> [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC
> [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC
> [9] 2012-05-31 02:30:00 UTC
> [10] 2012-05-31 18:30:00 UTC 2012-05-31 19:30:00 UTC
> ...
> 2)
> 2012-05-31 00:30:00 UTC - [1]
> [11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC - [2,3]
> [13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC
> [15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC
> [17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC
> [10] 2012-06-01 00:30:00 UTC
> 3)
> [19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC
> ...
> so that I can plot data for each of the series separately without e.g.
> data at hour 2012-05-31 02:30:00 UTC connecting in the figure to 2012-05-31 00:30:00 UTC. Basically, cycling through the series with period 9. Thanks for any suggestions/help, Mark

David Winsemius, MD
Alameda, CA, USA
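A self-contained version of the split() suggestion, assuming the stamps really do cycle with period 9 (the 27-element series below is constructed for the example):

```r
series <- seq(as.POSIXct("2012-05-30 18:30:00", tz = "UTC"),
              length.out = 27, by = "1 hour")

# Integer division of the 0-based position assigns 9 consecutive stamps to
# each group, so each group can be plotted as its own unconnected line.
blocks <- split(series, (seq_along(series) - 1) %/% 9)
length(blocks)   # 3 blocks
lengths(blocks)  # 9 9 9
```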
Re: [R] More efficient use of reshape?
On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:

> Hi all, I have played a bit with the reshape package and function, along with melt and cast, but I feel I still don't have a good handle on how to use them efficiently. Below I have included an application of reshape that is rather clunky, and I'm hoping someone can offer advice on how to use reshape (or melt/cast) more efficiently.

You do realize that the 'reshape' function is _not_ in the reshape package, right? And also that the reshape package has been superseded by the reshape2 package?

-- David.

> #For this example I am using climate change data available on-line
> file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
> clim.data <- read.csv(file, header = TRUE)
>
> library(lubridate)
> library(reshape)
>
> #I've been playing with the lubridate package a bit to work with dates, but as the climate
> #dataset only uses year and month I have added a day to each entry in the yr_mn column and
> #then used dym from lubridate to generate the POSIXlt formatted dates in a new column clim.data$date
> clim.data$yr_mn <- paste("01", clim.data$yr_mn, sep = "")
> clim.data$date <- dym(clim.data$yr_mn)
>
> #Now to the reshape. The dataframe is in a wide format. The columns GISS, HAD, NOAA, RSS,
> #and UAH are all different sources from which the global temperature anomaly has been
> #calculated since 1880 (actually only 1978 for RSS and UAH). What I would like to do is plot
> #the temperature anomaly vs date and use ggplot to facet by the different data source (GISS,
> #HAD, etc.). Thus I need the data in long format with a date column, a temperature anomaly
> #column, and a data source column. The code below works, but it's really very clunky and I'm
> #sure I am not using these tools as efficiently as I can.
>
> #The varying=list(3:7) specifies the columns in the dataframe that correspond to the sources
> #(GISS, etc.), though then in the resulting reshaped dataframe the sources are numbered 1-5,
> #so I have to reassign their names.
> #In addition, the original dataframe has additional data columns I do not want, and so after
> #reshaping I create another(!) dataframe with just the columns I need, and then I have to
> #rename them so that I can keep track of what everything is. Whew! Not the most elegant of code.
>
> d <- reshape(clim.data, varying = list(3:7), idvar = "date",
>              v.names = "anomaly", direction = "long")
> d$time <- ifelse(d$time == 1, "GISS", d$time)
> d$time <- ifelse(d$time == 2, "HAD", d$time)
> d$time <- ifelse(d$time == 3, "NOAA", d$time)
> d$time <- ifelse(d$time == 4, "RSS", d$time)
> d$time <- ifelse(d$time == 5, "UAH", d$time)
> new.data <- data.frame(d$date, d$time, d$anomaly)
> names(new.data) <- c("date", "source", "anomaly")
>
> I realize this is a mess, though it works. I think with just some help on how better to work this example I'll probably get over the learning hump and actually figure out how to use these data manipulation functions more cleanly. Any advice or assistance would be appreciated. Thanks, Nate

David Winsemius, MD
Alameda, CA, USA
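For comparison with the reshape() call quoted above: with reshape2 the whole wide-to-long step, including keeping the source names, collapses to a single melt() call. The toy data frame below stands in for the climate data, since the posted URL may no longer resolve.

```r
library(reshape2)  # assumes reshape2 is installed

# Toy stand-in for clim.data: one id column plus several source columns.
clim <- data.frame(date = as.Date("2000-01-01") + 0:2,
                   GISS = rnorm(3), HAD = rnorm(3), NOAA = rnorm(3))

# id.vars stays fixed; every other column becomes (source, anomaly) pairs,
# with the original column names preserved -- no ifelse() relabelling needed.
long <- melt(clim, id.vars = "date",
             variable.name = "source", value.name = "anomaly")
head(long)  # columns: date, source, anomaly
```

Unwanted columns can simply be excluded before melting, e.g. `melt(clim[, c("date", "GISS", "HAD")], id.vars = "date", ...)`, which removes the need for the follow-up data.frame() and names() steps.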
Re: [R] More efficient use of reshape?
Sorry David, In my attempt to simplify the example and include just the code I felt was necessary, I left out the loading of ggplot2, which imports reshape2 and which was actually used in the code I provided. Sorry for the mistake and my misunderstanding of where the reshape function was coming from. I should have checked that more carefully. Thanks, Nate On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius dwinsem...@comcast.net wrote: On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote: [original example snipped] You do realize that the 'reshape' function is _not_ in the reshape package, right? And also that the reshape package has been superseded by the reshape2 package? -- David.
David Winsemius, MD Alameda, CA, USA
[R] CPOS from cwhmisc package not found
Hi: I wonder if anyone can help me with a 'function not found' error for cpos: path.package("cwhmisc", quiet = FALSE) [1] "C:/Users/slee/Documents/R/win-library/2.15/cwhmisc" So I have the package cwhmisc, which contains the cpos function. But I got an error: cpos("ab", "b", 1) Error: could not find function "cpos" Then I tried to install at the R prompt but got this error message: install.packages("cwhmisc", lib="C:/Program Files/R/R-2.15.2/library/") Warning message: package 'cwhmisc' is not available (for R version 2.15.2) So I don't understand why the package is not available for the version of R I am running. Does anyone have any idea? --- Shirley Lee
Re: [R] subsetting time series
Hi, Try this: seq1 <- seq(from=as.POSIXct("2012-05-30 18:30:00", tz="UTC"), to=as.POSIXct("2012-05-31 02:30:00", tz="UTC"), by="1 hour") seq2 <- seq(from=as.POSIXct("2012-05-31 00:30:00", tz="UTC"), to=as.POSIXct("2012-05-31 08:30:00", tz="UTC"), by="1 hour") seq3 <- seq(from=as.POSIXct("2012-05-31 06:30:00", tz="UTC"), to=as.POSIXct("2012-05-31 07:30:00", tz="UTC"), by="1 hour") Sys.setenv(TZ="UTC") Series1 <- c(seq1, seq2, seq3) split(Series1, rep(1:3, each=9)) #or individually if it is a small dataset Series1[1:9] Series1[10:18] etc. A.K. - Original Message - From: m p mzp3...@gmail.com To: r-h...@stat.math.ethz.ch Cc: Sent: Thursday, December 13, 2012 11:52 AM Subject: [R] subsetting time series Hello, my series of dates look like [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC [9] 2012-05-31 02:30:00 UTC 2012-05-31 00:30:00 UTC [11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC [13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC [15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC [17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC [19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC ... I'd like to subset this to four series 1) [1] 2012-05-30 18:30:00 UTC 2012-05-30 19:30:00 UTC [3] 2012-05-30 20:30:00 UTC 2012-05-30 21:30:00 UTC [5] 2012-05-30 22:30:00 UTC 2012-05-30 23:30:00 UTC [7] 2012-05-31 00:30:00 UTC 2012-05-31 01:30:00 UTC [9] 2012-05-31 02:30:00 UTC [10] 2012-05-31 18:30:00 UTC 2012-05-31 19:30:00 UTC ... 2) 2012-05-31 00:30:00 UTC - [1] [11] 2012-05-31 01:30:00 UTC 2012-05-31 02:30:00 UTC - [2,3] [13] 2012-05-31 03:30:00 UTC 2012-05-31 04:30:00 UTC [15] 2012-05-31 05:30:00 UTC 2012-05-31 06:30:00 UTC [17] 2012-05-31 07:30:00 UTC 2012-05-31 08:30:00 UTC [10] 2012-06-01 00:30:00 UTC 3) [19] 2012-05-31 06:30:00 UTC 2012-05-31 07:30:00 UTC ... so that I can plot data for each of the series separately without e.g.
data at hour 2012-05-31 02:30:00 UTC connecting in the figure to 2012-05-31 00:30:00 UTC. Basically, cycling through the series with period 9. Thanks for any suggestions/help, thanks, Mark
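The "cycling with period 9" above generalizes without hand-building the grouping vector: an index computed from the position splits any vector into consecutive blocks. A small self-contained sketch (toy vector, not the poster's dates):

```r
# Split a vector into consecutive blocks of length 9; the last block
# is shorter if the length is not a multiple of 9.
x <- 1:20
blocks <- split(x, ceiling(seq_along(x) / 9))
lengths <- sapply(blocks, length)  # 9, 9, 2
```

Unlike rep(1:3, each = 9), this needs no prior knowledge of how many blocks there are.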
Re: [R] abline of an lm fit not correct
Easting and northing data use numbers requiring more digits than R's default of 7. In all the years I've used R, the only time I've needed to adjust the default digits is with easting and northing data. Try something like options(digits = 11) HTH On Thu, 13-Dec-2012 at 03:22PM +, Robert U wrote: | Hello fellow | R-users, | | I'm stuck | with something I think is pretty stupid, but I can't find out where I'm wrong, | and it's turning me crazy! | | I am doing | a very simple linear regression with Northing/Easting data, then I plot the | data as well as the regression line: | | plot(x=Dataset$EASTING, | y=Dataset$NORTHING) | fit <- lm(formula = NORTHING ~ EASTING, | data = Dataset) | abline(fit) | fit | | Call: | lm(formula = NORTHING ~ EASTING, data = | Dataset) | | Coefficients: | (Intercept)  EASTING | 5.376e+05  4.692e-02 | | Later on, when I use the | command 'abline' with the coefficients provided by summary(fit), the line is | not the same as abline(fit)! | | To summarize, | those two lines are different: | | abline(fit) | | abline(5.376e+05, 4.692e-02) | | The 'b' coefficients | appear equal, but the intercepts are different. | | Where am I missing | something? | | Thanks -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) . Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
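The mismatch in this thread comes from retyping rounded output: print() shows the coefficients to only a few significant digits, and at easting magnitudes (~5.7e5) that rounding moves the intercept visibly. A self-contained sketch with simulated coordinates (all values made up):

```r
# Simulated data on an easting/northing scale (invented numbers).
set.seed(1)
EASTING  <- 574600 + runif(50, 0, 300)
NORTHING <- 537600 + 0.05 * EASTING + rnorm(50, sd = 2)
fit <- lm(NORTHING ~ EASTING)

plot(EASTING, NORTHING)
abline(fit)                          # full-precision coefficients
abline(coef(fit)[1], coef(fit)[2])   # identical line, no retyping
options(digits = 11)                 # print enough digits if you must copy
coef(fit)
```

Passing the fitted object (or coef(fit)) avoids the transcription loss entirely; options(digits = 11) only matters when numbers are copied by hand.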
Re: [R] How do I make a loop to extract a column from multiple lists and then bind them together to make a new matrix?
Try this ... MDD.mean.s10 <- sapply(MC_MDD.noNA$results, function(x) x[[2]][, 7]) Jean On Thu, Dec 13, 2012 at 8:31 AM, Corinne Lapare corinnelap...@gmail.com wrote: Hi! I am new to looping and R in general, and I have spent way too much time on this one problem and am about a hair away from doing it manually for the next two days. So, there is a package that, while calculating the statistic, creates lists (that look like matrices) in the background. Each item (there are 10 items) has one of these matrix-looking lists that I need to extract data from. The list has 5 rows that represent 5 groups, and 8 columns. I need to extract 3 of the columns (Lo Score=[,2], Hi Score=[,3], and Mean=[,7]) for each of the items. I then want to turn the extracted data into 3 matrices (Lo Score, Hi Score, and Mean) where the rows are the 5 groups and the columns are items 1-10. This is how I can create the mean matrix by hand. MDD.mean.s10 is the matrix I want in the end. (Notice the first bracket after $results is the only part that changes, 1-10, to represent the 10 items, and the last bracket is [,7] to represent the mean located in column 7.) m.1a <- MC_MDD.noNA$results[[1]][[2]][,7] m.2b <- MC_MDD.noNA$results[[2]][[2]][,7] m.3c <- MC_MDD.noNA$results[[3]][[2]][,7] m.4d <- MC_MDD.noNA$results[[4]][[2]][,7] m.5e <- MC_MDD.noNA$results[[5]][[2]][,7] m.6f <- MC_MDD.noNA$results[[6]][[2]][,7] m.7g <- MC_MDD.noNA$results[[7]][[2]][,7] m.8h <- MC_MDD.noNA$results[[8]][[2]][,7] m.9i <- MC_MDD.noNA$results[[9]][[2]][,7] m.10j <- MC_MDD.noNA$results[[10]][[2]][,7] MDD.mean.s10 <- cbind(m.1a, m.2b, m.3c, m.4d, m.5e, m.6f, m.7g, m.8h, m.9i, m.10j) MDD.mean.s10 m.1a m.2b m.3c m.4d m.5e m.6f m.7g m.8h m.9i m.10j [1,] 0.8707865 0.7393939 0.7769231 0.7591241 0.853 0.7925926 0.8258065 0.8410596 0.8843931 0.5638298 [2,] 0.8323353 0.7302632 0.5913978 0.5868263 0.6923077 0.6182796 0.6964286 0.6839080 0.7911392 0.3212121 [3,] 0.8726115 0.7159763 0.7117647 0.6163522 0.7987805 0.7105263 0.7613636 0.7674419 0.8034682
0.4011299 [4,] 0.9024390 0.7894737 0.7795276 0.6530612 0.8593750 0.7112676 0.8672566 0.8629032 0.9152542 0.4834437 [5,] 0.986 0.9102564 0.8452381 0.8160920 0.9726027 0.8658537 0.8352941 0.9342105 0.947 0.6454545 But I can't do this by hand every time, as this comes up over and over again in multiple lists. I have figured out how to loop this procedure and name the vector as it goes along: for(i in 1:10){ + assign(paste("m", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7]) + } m1 [1] 0.8707865 0.8323353 0.8726115 0.9024390 0.986 m2 [1] 0.7393939 0.7302632 0.7159763 0.7894737 0.9102564 m3 [1] 0.7769231 0.5913978 0.7117647 0.7795276 0.8452381 m4 [1] 0.7591241 0.5868263 0.6163522 0.6530612 0.8160920 m5 [1] 0.853 0.6923077 0.7987805 0.8593750 0.9726027 m6 [1] 0.7925926 0.6182796 0.7105263 0.7112676 0.8658537 m7 [1] 0.8258065 0.6964286 0.7613636 0.8672566 0.8352941 m8 [1] 0.8410596 0.6839080 0.7674419 0.8629032 0.9342105 m9 [1] 0.8843931 0.7911392 0.8034682 0.9152542 0.947 m10 [1] 0.5638298 0.3212121 0.4011299 0.4834437 0.6454545 Now here's where I get stuck: how do I cbind these vectors without typing them out explicitly? i.e. mean.MDD <- cbind(m1,m2,m3,m4,m5,m6,m7,m8,m9,m10) Everything I have tried keeps overwriting the data instead of building a matrix. Basically, I start with a matrix (5x10) of zeros. Then I wind up with a few values in the beginning, but the rest is still zeros.
Example of terrible code: fo <- matrix(0,5,10) colnames(fo) <- paste('f', 1:10, sep = "") fo f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 [1,] 0 0 0 0 0 0 0 0 0 0 [2,] 0 0 0 0 0 0 0 0 0 0 [3,] 0 0 0 0 0 0 0 0 0 0 [4,] 0 0 0 0 0 0 0 0 0 0 [5,] 0 0 0 0 0 0 0 0 0 0 for(i in 1:10){ + fo <- assign(paste("f", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7]) + } fo [1] 0.5638298 0.3212121 0.4011299 0.4834437 0.6454545 fo <- matrix(0,5,10) colnames(fo) <- paste('f', 1:10, sep = "") fo f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 [1,] 0 0 0 0 0 0 0 0 0 0 [2,] 0 0 0 0 0 0 0 0 0 0 [3,] 0 0 0 0 0 0 0 0 0 0 [4,] 0 0 0 0 0 0 0 0 0 0 [5,] 0 0 0 0 0 0 0 0 0 0 for(i in 1:10){ + fo <- cbind(assign(paste("f", i, sep = ""), MC_MDD.noNA$results[[i]][[2]][,7])) + } fo [,1] [1,] 0.5638298 [2,] 0.3212121 [3,] 0.4011299 [4,] 0.4834437 [5,] 0.6454545 Thanks for your help in advance!!! (c:
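Jean's sapply() one-liner can be seen in action on a mock version of the poster's structure (the list shape is assumed from the description; all numbers are random stand-ins for MC_MDD.noNA$results):

```r
# Mock structure: a list of 10 results, each holding a 5x8 matrix in
# its second element, like MC_MDD.noNA$results in the post.
set.seed(1)
results <- replicate(10, list(NULL, matrix(runif(40), 5, 8)),
                     simplify = FALSE)

# sapply extracts column 7 from each item and binds the 10 vectors
# column-wise, giving a 5 x 10 matrix: rows = groups, columns = items.
MDD.mean <- sapply(results, function(x) x[[2]][, 7])
dim(MDD.mean)  # 5 10
```

Because each extracted vector has the same length (5), sapply simplifies the result to a matrix automatically, which is exactly the cbind the loop was trying to build.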
Re: [R] subsetting time series
That works perfectly, thanks a lot, Mark On Thu, Dec 13, 2012 at 11:34 AM, arun smartpink...@yahoo.com wrote: [quoted solution snipped]
[R] Available Memory
I have a large database on SQL Server 2012 Developer edition, Windows 7 Ultimate edition; some of my tables are as large as 10GB. I am running R 2.15.2 with a 64-bit build. I have been connecting fine to the database and extracting info, but it seems this was the first time I tried to pull a large (1/2 GB) amount of data in one query. The query didn't have anything fancy; it was code that always worked! R dropped the work without providing an error message. I got the hourglass for a couple of seconds, as if R had started communication with the database, but then nothing. I looked at my Windows task manager, and CPU utilization was at zero. I ran the memory.size() function to confirm availability of memory and it read 24 thousand and something (I don't remember the rest); I have 24GB of RAM on my computer. The size of the other R objects in memory was around 2GB. I used RODBC to connect to the database. I understand the number you get when you run memory.size is in MB, so a read of 24,000 means 24GB, which is consistent with the amount of RAM in my machine. Is there anything that I missed? Is there another way to check availability of memory, or allocated memory for an R session? Are there issues with RODBC which might cause a failure of data transfer when the amount of data requested is large?
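One way to sidestep a single huge transfer is to pull the result in chunks via RODBC's sqlGetResults(). This is an untested sketch: the DSN name and table are placeholders, and whether chunking actually helps depends on the driver, not on R memory:

```r
library(RODBC)

ch <- odbcConnect("mydsn")                 # hypothetical DSN
odbcQuery(ch, "SELECT * FROM bigtable")    # send query, fetch nothing yet

chunks <- list()
repeat {
  # Fetch up to 100,000 rows at a time from the pending result set.
  part <- sqlGetResults(ch, max = 100000)
  if (!is.data.frame(part) || nrow(part) == 0) break
  chunks[[length(chunks) + 1]] <- part
  if (nrow(part) < 100000) break           # last, partial chunk
}
big <- do.call(rbind, chunks)
close(ch)
```

Fetching in pieces also localizes a failure: if the driver chokes, you learn after which chunk, instead of getting a silent empty return.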
Re: [R] CPOS from cwhmisc package not found
On Dec 13, 2012, at 10:56 AM, Shirley Lee wrote: Hi: I wonder if anyone can help me with a 'function not found' error for cpos: path.package("cwhmisc", quiet = FALSE) [1] "C:/Users/slee/Documents/R/win-library/2.15/cwhmisc" So I have the package cwhmisc, which contains the cpos function. But I got an error: cpos("ab", "b", 1) Error: could not find function "cpos" Then I tried to install at the R prompt but got this error message: install.packages("cwhmisc", lib="C:/Program Files/R/R-2.15.2/library/") Warning message: package 'cwhmisc' is not available (for R version 2.15.2) So I don't understand why the package is not available for the version of R I am running. Does anyone have any idea? http://cran.r-project.org/web/packages/cwhmisc/index.html It has been withdrawn for some reason. You are advised in the Posting Guide (that no one seems to read) to contact the maintainer (or search the Archives) for such questions. -- David Winsemius, MD Alameda, CA, USA
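A quick sanity check in such cases (a sketch; the result depends on the repositories configured in your session and requires network access) is whether the package appears in the repository index at all for your R version:

```r
# FALSE here means no source/binary is offered for this R version on
# the repositories in getOption("repos") -- installation cannot succeed.
"cwhmisc" %in% rownames(available.packages())
```

If it returns FALSE, the warning from install.packages() is about CRAN, not about your library path.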
Re: [R] [R-sig-hpc] recursion depth limitations
On 13 December 2012 17:52, Simon Urbanek simon.urba...@r-project.org wrote: Because it's far beyond what you can handle without changing a lot of other things. 500k expressions will require at least about 320Mb of stack (!) in the eval() chain alone -- compare that to the 8Mb stack size which is the default in most OSes, so you'll hit the wall way before that limit is reached. Thank you for the explanation. Sorry to be a dummy on this, but why does one need a stack? I thought pointing to itself has no memory cost for a function. Is it about how compilers are designed, or about R being a dynamic language? Some algorithms are only feasibly solvable with recursion, and 500K does not sound like too much; in graph algorithms, for example, dependency trees with many nodes easily reach that number. I don't see how large nodes have anything to do with this since we are talking about expression depth, not about sizes of any kind. Again, in any realistic example you'll hit other limits first anyway. I was thinking about a very big tree with large depth, so each recursion step may correspond to one leaf. Well, not sure what application would have a depth of a million; maybe a genetic algorithm. Cheers, Mehmet
[R] changing character strings with hash marks
Hi R users, I am quite new to R and I don't know how to deal with this (surely) easy issue. I need to replace words in sentences with as many hash marks as the number of characters per each word, as in the following example: Mary plays football #### ##### ######## Any suggestion about the function to be used? Thanks a lot. S.
Re: [R] changing character strings with hash marks
On 13.12.2012 22:30, simona mancini wrote: Hi R users, I am quite new to R and I don't know how to deal with this (surely) easy issue. I need to replace words in sentences with as many hash marks as the number of characters per each word, as in the following example: Mary plays football #### ##### ######## gsub("[[:alpha:]]", "#", "Mary plays football") Uwe Ligges Any suggestion about the function to be used? Thanks a lot. S.
Re: [R] changing character strings with hash marks
Simona: If you intend to work with text, you need to learn about regular expressions. There are many tutorials on this topic on the web. Go search. Then learn about how R handles them via: ?regex ## at the R prompt Then ask your question more clearly, although by this time you'll probably have figured it out yourself: for example, you failed to specify whether punctuation could appear in the sentences or what language (and character set) is used. Finally, an answer (there are others) to the question you posed -- which is probably not going to be sufficient -- is: gsub("[^ ]", "#", "Mary plays football") [1] "#### ##### ########" Cheers, Bert On Thu, Dec 13, 2012 at 1:30 PM, simona mancini mancinisim...@yahoo.it wrote: [original question snipped] -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] [R-sig-hpc] recursion depth limitations
Hello. Inline. On 13-12-2012 21:31, Suzen, Mehmet wrote: On 13 December 2012 17:52, Simon Urbanek simon.urba...@r-project.org wrote: Because it's far beyond what you can handle without changing a lot of other things. 500k expressions will require at least about 320Mb of stack (!) in the eval() chain alone -- compare that to the 8Mb stack size which is the default in most OSes, so you'll hit the wall way before that limit is reached. Thank you for the explanation. Sorry to be a dummy on this, but why does one need a stack? I thought pointing to itself has no memory cost for a function. But it does: each recursive call will load another copy of the function, and another copy of the variables used. In fact, the cost can become quite large since everything is loaded in memory again. Hope this helps, Rui Barradas [rest of quoted message snipped]
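The standard escape from the stack-depth wall discussed above is to replace the call stack with an explicit stack on the heap: an iterative loop then handles arbitrary depth without growing the eval() chain. A sketch with a hypothetical tree represented as nested lists (not code from the thread):

```r
# Count leaves of a nested-list tree iteratively: the vector 'stack'
# lives on the heap, so depth is limited by memory, not by the C stack
# or options(expressions).
count_leaves <- function(tree) {
  stack <- list(tree)
  leaves <- 0L
  while (length(stack) > 0) {
    node <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL      # pop
    if (is.list(node)) {
      stack <- c(stack, node)           # push children
    } else {
      leaves <- leaves + 1L
    }
  }
  leaves
}

count_leaves(list(1, list(2, 3), list(list(4), 5)))  # 5
```

This trades the elegance of self-reference for predictable memory use: each pending node costs one list slot rather than a whole evaluation frame.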
[R] replace parenthetical phrases in a string
R-helpers, I have a vector of character strings in which I would like to replace each parenthetical phrase with a single space, " ". For example, if I start with x, I would like to end up with y. x <- c("My toast=bog(keep=3 no=4) and eggs(er34)omit=32", "dogs have ears, cats have tails (and ears, too!)") y <- c("My toast=bog and eggs omit=32", "dogs have ears, cats have tails ") I'm guessing that this can be done with gsub(), but I have never mastered the mysteries of regular expressions. I would greatly appreciate any pointers. Thanks. Jean P.S. I'm using R version 2.15.2 on Windows 7.
Re: [R] replace parenthetical phrases in a string
My apologies. I sent too soon! I did a bit more digging and found a solution in the R-help archives. y <- gsub(" *\\([^)]*\\) *", " ", x) Jean On Thu, Dec 13, 2012 at 4:53 PM, Adams, Jean jvad...@usgs.gov wrote: [original question snipped]
[R] duplicated.data.frame() and POSIXct with DST shift
Hi, I encountered the behavior that the duplicated method for data.frames gives false positives if there are columns of class POSIXct with a clock shift from DST to standard time. time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60) time [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET" df <- data.frame(time, text="foo") duplicated(df) [1] FALSE TRUE This is because the timezone is lost after calling paste(): do.call(paste, c(df, sep = "\r")) [1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo" I can't really figure out if this behavior is desired or not. If so, a short warning in ?duplicated could be helpful. It is mentioned how duplicated.data.frame() works, but I didn't find a hint to properly handle POSIXct objects. My particular problem was to cast a data.frame like this one with cast() (which calls reshape1(), which calls duplicated()): df2 <- data.frame(time, time1=as.numeric(time), lab=rep(1:3, each=2), value=101:106, text=rep(c("foo", "bar"), each=3)) library(reshape2) Using the column of class POSIXct as a variable in the formula gives: cast(lab*time~text, data=df2, value="value") Aggregation requires fun.aggregate: length used as default lab time bar foo 1 1 2012-10-28 02:00:00 0 2 2 2 2012-10-28 02:00:00 1 1 3 3 2012-10-28 02:00:00 2 0 Converting to numeric, casting and converting back works as expected, although the timezone is not visible, because print.data.frame() calls format.POSIXct() with usetz = FALSE: y <- cast(lab*time1~text, data=df2, value="value") y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1) Can anyone suggest a more elegant solution? Best, Tobias
Re: [R] changing character strings with hash marks
Hi, You could also use: gsub("\\w", "#", "Mary plays football") #[1] "#### ##### ########" #or gsub("[A-Za-z]", "#", "Mary plays football") A.K. - Original Message - From: Uwe Ligges lig...@statistik.tu-dortmund.de To: simona mancini mancinisim...@yahoo.it Cc: r-help@r-project.org Sent: Thursday, December 13, 2012 5:38 PM Subject: Re: [R] changing character strings with hash marks [quoted message snipped]
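The answers in this thread differ only in their character class, which matters once punctuation appears. A short comparison (outputs shown were checked):

```r
s <- "Mary plays football."

# Letters only: spaces and punctuation survive.
gsub("[[:alpha:]]", "#", s)   # "#### ##### ########."

# Every non-space character: the full stop is hashed too.
gsub("[^ ]", "#", s)          # "#### ##### #########"
```

Note also that \\w matches digits and underscore in addition to letters, so on input like "2nd try" it behaves differently from [[:alpha:]].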
[R] Combined Marimekko/heatmap
Hi all, I'm trying to figure out a way to create a data graphic that I haven't ever seen an example of before, but hopefully there's an R package out there for it. The idea is to essentially create a heatmap, but to allow each column and/or row to be a different width, rather than having uniform column widths and row heights. This is sort of like a Marimekko chart in appearance, except that rather than use a single color to represent the category, the color represents a value, and all the y-axis heights in each column line up with each other. That way color represents one variable, while the area of the cell represents another. In my application, my heatmap has discrete categorical data rather than continuous. Rows are countries, columns are appliances, and I want to scale the width and height of each column to be the fraction of global energy consumed by the country and the fraction of energy use consumed by that appliance type. The color coding would then indicate whether or not that appliance is regulated in that country. Any ideas how to make such a chart, or even what it might be called? Neal Humphrey nhumph...@clasponline.org
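Even without a dedicated package, the chart described reduces to colored rectangles at cumulative offsets, which base graphics can draw directly. A minimal sketch with entirely invented fractions and a made-up regulated/unregulated coding:

```r
# Invented data: 3 column widths and 2 row heights, both summing to 1,
# plus a 2x3 matrix coding regulated (1) vs unregulated (2).
w   <- c(0.5, 0.3, 0.2)                        # e.g. country shares
h   <- c(0.6, 0.4)                             # e.g. appliance shares
reg <- matrix(c(1, 2, 2, 1, 1, 2), nrow = 2)   # categorical value
cols <- c("tomato", "steelblue")

x <- c(0, cumsum(w))   # cell boundaries on the x axis
y <- c(0, cumsum(h))   # cell boundaries on the y axis

plot(0:1, 0:1, type = "n", xlab = "", ylab = "", axes = FALSE)
for (i in seq_along(h))
  for (j in seq_along(w))
    rect(x[j], y[i], x[j + 1], y[i + 1], col = cols[reg[i, j]])
```

Because every row uses the same y boundaries, the heights line up across columns, which is the property that distinguishes this from a plain mosaic plot.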
Re: [R] neural net
Hi, Thanks for your reply. I have compared my data with some other data which works and I cannot see the difference... The structure of my data is shown below: str(data) 'data.frame': 19 obs. of 7 variables: $ drug : Factor w/ 19 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ... $ param1 : int 111 347 335 477 863 737 390 209 376 262 ... $ param2 : int 15 13 9 37 24 28 63 93 72 16 ... $ param3 : int 125 280 119 75 180 150 167 200 201 205 ... $ param4 : int 40 55 89 2 10 15 12 48 45 49 ... $ param5 : num 0.5 3 -40 0 5 6 0 45 -60 25 ... $ Class : int 1 2 1 1 2 2 3 3 3 3 ... summary(data) drug param1 param2 param3 param4 param5 Class A : 1 Min. :111.0 Min. : 2.0 Min. : 75.0 Min. :-20.00 Min. :-60.000 Min. :1.000 B : 1 1st Qu.:253.5 1st Qu.:15.0 1st Qu.:132.5 1st Qu.: 12.00 1st Qu.: 0.000 1st Qu.:1.000 C : 1 Median :335.0 Median :28.0 Median :164.0 Median : 40.00 Median : 6.000 Median :2.000 D : 1 Mean :383.0 Mean :33.0 Mean :166.0 Mean : 35.26 Mean : 4.447 Mean :1.895 E : 1 3rd Qu.:433.5 3rd Qu.:42.5 3rd Qu.:200.5 3rd Qu.: 54.00 3rd Qu.: 20.500 3rd Qu.:2.000 F : 1 Max. :863.0 Max. :93.0 Max. :280.0 Max. : 89.00 Max. : 45.000 Max. :3.000 (Other):13 The structure of the example data which worked is shown below: str(infert) 'data.frame': 248 obs. of 8 variables: $ education : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 1 1 1 1 2 2 2 2 2 2 ... $ age : num 26 42 39 34 35 36 23 32 21 28 ... $ parity : num 6 1 6 4 3 4 1 2 1 2 ... $ induced : num 1 1 2 2 1 2 0 0 0 0 ... $ case : num 1 1 1 1 1 1 1 1 1 1 ... $ spontaneous : num 2 0 0 0 1 1 0 0 1 0 ... $ stratum : int 1 2 3 4 5 6 7 8 9 10 ... $ pooled.stratum: num 3 1 4 2 32 36 6 22 5 19 ... summary(infert) education age parity induced case spontaneous stratum pooled.stratum 0-5yrs : 12 Min. :21.00 Min. :1.000 Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. : 1.00 Min. : 1.00 6-11yrs:120 1st Qu.:28.00 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:21.00 1st Qu.:19.00 12+ yrs:116 Median :31.00 Median :2.000 Median :0.0000 Median :0.0000 Median :0.0000
Median :42.00 Median :36.00 Mean :31.50 Mean :2.093 Mean :0.5726 Mean :0.3347 Mean :0.5766 Mean :41.87 Mean :33.58 3rd Qu.:35.25 3rd Qu.:3.000 3rd Qu.:1. 3rd Qu.:1. 3rd Qu.:1. 3rd Qu.:62.25 3rd Qu.:48.25 Max. :44.00 Max. :6.000 Max. :2. Max. :1. Max. :2. Max. :83.00 Max. :63.00 So still not sure how to solve the problem _ From: PIKAL Petr [petr.pi...@precheza.cz] Sent: 13 December 2012 07:16 To: dada; r-help@r-project.org Subject: RE: [R] neural net Hi -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of dada Sent: Thursday, December 13, 2012 12:41 AM To: r-help@r-project.org Subject: [R] neural net Hi I would like to do neural netowrk analysis on my data. It look like this: drug param1 param2 param3 param4 param5 class A 111 15 125 40 0.5 1 B 347 13 280 55 3 2 C 335 9 119 89 -40 1 D 477 37 75 2 0 1 E 863 24 180 10 5 2 F 737 28 150 15 6 2 G 390 63 167 12 0 3 H 209 93 200 48 45 3 I 376 72 201 45 -60 3 J 262 16 205 49 25 3 K 273 39 267 53 11 1 L 192 33 164 19 15 2 M 282 2 213 86 30 1 N 111 11 198 68 -21 1 O 387 20 143 12 16 2 P 674 15 78 -20 -17 2 R 734 54 140 24 7 2 S 272 46 159 57 28 2 T 245 37 90 6 31 2 I have entered the code below: nn - neuralnet( + class~param1+param2+param3+param4+param5+param5, + data=mydata, hidden=2, err.fct=ce, + linear.output=FALSE) However the error appeared: Error in model.frame.default(formula.reverse, data) : object is not a matrix I changed the data frame to matrix: mydata.mat=as.matrix(mydata) This is not very wise. It changes all numeric values to character. From documentation and your data frame there is nothing obviously wrong. However you
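The warning above about as.matrix() is easy to demonstrate: a matrix can hold only one storage type, so converting a data frame that contains a character or factor column coerces every column to character. A small sketch with made-up values:

```r
# A data frame mixing a character column with numeric ones
mydata <- data.frame(drug = c("A", "B"), param1 = c(111L, 347L), class = c(1L, 2L))

m <- as.matrix(mydata)  # everything is coerced to the common type: character
typeof(m)               # "character"
```

This is why as.matrix() on such a frame does not help neuralnet(): the numeric predictors stop being numeric.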
Re: [R] duplicated.data.frame() and POSIXct with DST shift
On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:

Hi, I encountered the behavior that the duplicated method for data frames gives false positives if there are columns of class POSIXct with a clock shift from DST to standard time.

time <- as.POSIXct("2012-10-28 02:00", tz = "Europe/Vienna") + c(0, 60*60)
time
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
df <- data.frame(time, text = "foo")
duplicated(df)
[1] FALSE  TRUE

In this instance this is because the timezone is lost after calling paste():

do.call(paste, c(df, sep = "\r"))
[1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"

I suspect the problem arises when 'paste' coerces to character:

as.character(time)
[1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"

I think that as.character might get missed since the 'paste' operation is done internally.

as.character(time, usetz = TRUE)
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"

-- David.

I can't really figure out if this behavior is desired or not. If so, a short warning in ?duplicated could be helpful. It is mentioned how duplicated.data.frame() works, but I didn't find a hint on how to properly handle POSIXct objects.
There is no duplicated.POSIXct method.

My particular problem was to cast a data.frame like this one with cast() (which calls reshape1(), which calls duplicated()):

df2 <- data.frame(time, time1 = as.numeric(time), lab = rep(1:3, each = 2),
                  value = 101:106, text = rep(c("foo", "bar"), each = 3))
library(reshape2)

Using the column of class POSIXct as a variable in the formula gives:

cast(lab * time ~ text, data = df2, value = "value")
Aggregation requires fun.aggregate: length used as default
  lab                time bar foo
1   1 2012-10-28 02:00:00   0   2
2   2 2012-10-28 02:00:00   1   1
3   3 2012-10-28 02:00:00   2   0

Converting to numeric, casting and converting back works as expected, although the timezone is not visible, because print.data.frame() calls format.POSIXct() with usetz = FALSE:

y <- cast(lab * time1 ~ text, data = df2, value = "value")
y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1)

Can anyone suggest a more elegant solution?

Best, Tobias

David Winsemius, MD
Alameda, CA, USA
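One base-R variant of the numeric round-trip above that avoids the hard-coded "1970-01-01 01:00" offset and keeps the timezone: convert back with origin= and pass tz= explicitly. A sketch, assuming the "Europe/Vienna" zone exists in the local tz database:

```r
time <- as.POSIXct("2012-10-28 02:00", tz = "Europe/Vienna") + c(0, 60*60)

# the two instants differ even though they print alike without a tz suffix
secs <- as.numeric(time)
duplicated(secs)        # FALSE FALSE

# round-trip back to POSIXct, reattaching the timezone explicitly
time2 <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/Vienna")
```

Working on the numeric representation sidesteps the lossy as.character() step entirely, which is why casting on time1 behaves while casting on time does not.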
Re: [R] How to select a subset data to do a barplot in ggplot2
Hi:

The simplest way to do it is to modify the input data frame by taking out the records not having status "live" or "dead" and then redefining the factor in the new data frame to get rid of the removed levels. Calling your input data frame DF rather than data,

DF <- structure(list(FID = c(1L, 1L, 2L, 2L, 2L, 2L, 6L, 6L, 10L, 10L,
10L, 11L, 11L, 11L, 12L, 17L, 17L, 17L), IID = c(4621L, 4628L,
4631L, 4632L, 4633L, 4634L, 4675L, 4679L, 4716L, 4719L, 4721L,
4726L, 4728L, 4730L, 4732L, 4783L, 4783L, 4784L), STATUS = structure(c(2L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 2L, 2L,
2L), .Label = c("dead", "live", "nosperm"), class = "factor")),
.Names = c("FID", "IID", "STATUS"), class = "data.frame",
row.names = c(NA, -18L))

# The right hand side above came from dput(DF), where DF was created by
# DF <- read.table(textConnection("your posted data"), header = TRUE)
# Consider using dput() to represent your data in the future.

# Retain the records with status "live" or "dead" only
DF2 <- DF[DF$STATUS %in% c("live", "dead"), ]
# This does not get rid of the original levels...
levels(DF2$STATUS)
# ...so redefine the factor
DF2$STATUS <- factor(DF2$STATUS)

str(DF2)
'data.frame': 16 obs. of 3 variables:
 $ FID   : int 1 1 2 2 2 2 6 6 10 10 ...
 $ IID   : int 4621 4628 4631 4632 4633 4634 4675 4679 4716 4719 ...
 $ STATUS: Factor w/ 2 levels "dead","live": 2 1 2 2 2 2 2 1 1 2 ...

# now plot:
# (1) FID numeric
ggplot(DF2, aes(x = FID, fill = STATUS)) + geom_bar()
# (2) FID factor
ggplot(DF2, aes(x = factor(FID), fill = STATUS)) + geom_bar()

The second one makes more sense to me, but you may have reasons to prefer the first.
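As an aside, base R's droplevels() (available since R 2.12.0) collapses the subset-then-refactor steps described above into one call; the toy frame below just mimics the STATUS column:

```r
DF <- data.frame(FID = c(1, 1, 11, 11),
                 STATUS = factor(c("live", "dead", "nosperm", "nosperm")))

# subset and drop the now-unused "nosperm" level in one step
DF2 <- droplevels(DF[DF$STATUS %in% c("live", "dead"), ])
levels(DF2$STATUS)   # "dead" "live"
```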
Dennis

On Thu, Dec 13, 2012 at 4:38 AM, Yao He yao.h.1...@gmail.com wrote:

FID  IID  STATUS
 1  4621  live
 1  4628  dead
 2  4631  live
 2  4632  live
 2  4633  live
 2  4634  live
 6  4675  live
 6  4679  dead
10  4716  dead
10  4719  live
10  4721  dead
11  4726  live
11  4728  nosperm
11  4730  nosperm
12  4732  live
17  4783  live
17  4783  live
17  4784  live
[R] How can I read the following complicated table
Hello,

I have a table (in a txt file) which looks like this:

Monday 12 78 89
Tuesday 34 44 67
Wednesday 78 98 2
Thursday 34 55 4

Then the table repeats: Monday, Tuesday, ... followed by several numbers.

My goal is to read the values that follow each day name. My problem is a little more complicated, but I just present a simpler case for ease of illustration. Is there any way to ask R to read several numbers after it sees the word 'Monday' and store them somewhere, and read several numbers after it sees the word 'Tuesday' and store them somewhere else?

Thanks, miao
Re: [R] duplicated.data.frame() and POSIXct with DST shift
On Dec 13, 2012, at 5:01 PM, David Winsemius wrote:

On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:

Hi, I encountered the behavior that the duplicated method for data frames gives false positives if there are columns of class POSIXct with a clock shift from DST to standard time.

time <- as.POSIXct("2012-10-28 02:00", tz = "Europe/Vienna") + c(0, 60*60)
time
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
df <- data.frame(time, text = "foo")
duplicated(df)
[1] FALSE  TRUE

In this instance this is because the timezone is lost after calling paste():

do.call(paste, c(df, sep = "\r"))
[1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"

I suspect the problem arises when 'paste' coerces to character:

as.character(time)
[1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"

I think that as.character might get missed since the 'paste' operation is done internally.

as.character(time, usetz = TRUE)
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"

This would work as intended if you pre-processed the argument to duplicated with:

data.frame(lapply(df, as.character, usetz = TRUE))
                      time text
1 2012-10-28 02:00:00 CEST  foo
2  2012-10-28 02:00:00 CET  foo

duplicated(data.frame(lapply(df, as.character, usetz = TRUE)))
[1] FALSE FALSE

-- David.
David Winsemius
Alameda, CA, USA
Re: [R] How can I read the following complicated table
What have you tried so far that did not work, and what do you want the result of reading the text file to look like? What is "store somewhere"? Why does

myDF <- read.table("myData.txt")

which gives you

myDF
         V1 V2 V3 V4
1    Monday 12 78 89
2   Tuesday 34 44 67
3 Wednesday 78 98  2
4  Thursday 34 55  4

as a starting point, not suffice?

Rgds, Rainer
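Building on that read.table() starting point, split() collects the numbers for every day at once rather than filtering day by day (file contents inlined here via text= so the sketch is self-contained):

```r
myDF <- read.table(text = "Monday 12 78 89
Tuesday 34 44 67
Wednesday 78 98 2
Thursday 34 55 4
Monday 18 75 56", stringsAsFactors = FALSE)

# one sub-data-frame of numbers per day name
byDay <- split(myDF[-1], myDF$V1)
byDay$Monday$V2      # 12 18
```

Each element of byDay holds all rows for one day, which covers the "store somewhere" requirement even when the day names repeat.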
Re: [R] replace parenthetical phrases in a string
Hi,

I guess there are some problems with spaces in this solution.

y
[1] "My toast=bog and eggs omit=32" "dogs have ears"
[3] "cats have tails "

gsub(" *\\([^)]*\\) *", "", x)
#[1] "My toast=bogand eggsomit=32" "dogs have ears"
#[3] "cats have tails"

You could try this:

gsub("(\\(.*\\))+?", " ", x)
#[1] "My toast=bog and eggs omit=32" "dogs have ears"
#[3] "cats have tails"

A.K.

- Original Message -
From: Adams, Jean jvad...@usgs.gov
To: r-help@r-project.org
Sent: Thursday, December 13, 2012 6:03 PM
Subject: Re: [R] replace parenthetical phrases in a string

My apologies. I sent too soon! I did a bit more digging, and found a solution in the R-help archives.

y <- gsub(" *\\([^)]*\\) *", " ", x)

Jean

On Thu, Dec 13, 2012 at 4:53 PM, Adams, Jean jvad...@usgs.gov wrote:

R-helpers,

I have a vector of character strings in which I would like to replace each parenthetical phrase with a single space, " ". For example, if I start with x, I would like to end up with y.

x <- c("My toast=bog(keep=3 no=4) and eggs(er34)omit=32",
       "dogs have ears",
       "cats have tails (and ears, too!)")
y <- c("My toast=bog and eggs omit=32",
       "dogs have ears",
       "cats have tails ")

I'm guessing that this can be done with gsub(), but I have never mastered the mysteries of regular expressions. I would greatly appreciate any pointers. Thanks.

Jean

P.S. I'm using R version 2.15.2 on Windows 7.
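If the only leftover blemish in the archive solution is a stray trailing blank ("cats have tails "), wrapping it in trimws() (base R since 3.2.0) cleans that up:

```r
x <- c("My toast=bog(keep=3 no=4) and eggs(er34)omit=32",
       "dogs have ears",
       "cats have tails (and ears, too!)")

# drop parenthetical phrases, then trim leading/trailing whitespace
y <- trimws(gsub(" *\\([^)]*\\) *", " ", x))
y
# [1] "My toast=bog and eggs omit=32" "dogs have ears"
# [3] "cats have tails"
```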
Re: [R] neural net
Hi,

I tried your dataset. I couldn't reproduce the Error: message. Instead,

mydata <- read.table(text="
drug param1 param2 param3 param4 param5 class
A 111 15 125 40 0.5 1
B 347 13 280 55 3 2
C 335 9 119 89 -40 1
D 477 37 75 2 0 1
E 863 24 180 10 5 2
F 737 28 150 15 6 2
G 390 63 167 12 0 3
H 209 93 200 48 45 3
I 376 72 201 45 -60 3
J 262 16 205 49 25 3
K 273 39 267 53 11 1
L 192 33 164 19 15 2
M 282 2 213 86 30 1
N 111 11 198 68 -21 1
O 387 20 143 12 16 2
P 674 15 78 -20 -17 2
R 734 54 140 24 7 2
S 272 46 159 57 28 2
T 245 37 90 6 31 2
", sep="", header=TRUE, stringsAsFactors=TRUE)

library(neuralnet)
nn <- neuralnet(
  class ~ param1 + param2 + param3 + param4 + param5 + param5,  # param5 is duplicated (typo?)
  data = mydata, hidden = 2, err.fct = "ce",
  linear.output = FALSE)
# Warning message:
# 'err.fct' was automatically set to sum of squared error (sse), because the response is not binary
nn
# Call: neuralnet(formula = class ~ param1 + param2 + param3 + param4 + param5 + param5,
#     data = mydata, hidden = 2, err.fct = "ce", linear.output = FALSE)
#
# 1 repetition was calculated.
#
#          Error Reached Threshold Steps
# 1 12.50980687    0.009804371657    30
plot(nn)

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] neuralnet_1.31 MASS_7.3-16 stringr_0.6 reshape_0.8.4 plyr_1.7.1

loaded via a namespace (and not attached):
[1] tools_2.15.0

A.K.

- Original Message -
From: Katarzyna Nurzynska pa...@nottingham.ac.uk
To: PIKAL Petr petr.pi...@precheza.cz; r-help@r-project.org
Sent: Thursday, December 13, 2012 10:56 AM
Subject: Re: [R] neural net
Re: [R] How can I read the following complicated table
Hi,

If it is a data frame with four columns:

dat1 <- read.table(text = "
Monday 12 78 89
Tuesday 34 44 67
Wednesday 78 98 2
Thursday 34 55 4
Friday 14 25 13
Monday 18 75 56
Tuesday 28 42 65
", header = FALSE, stringsAsFactors = FALSE)

dat1Mon <- dat1[, -1][dat1[, 1] == "Monday", ]   # rows with first column Monday
dat1Tue <- dat1[, -1][dat1[, 1] == "Tuesday", ]  # rows with first column Tuesday
dat1Tue
#   V2 V3 V4
# 2 34 44 67
# 7 28 42 65
# You can repeat that for other days

# If the table is read as plain lines:
vec1 <- readLines(textConnection("Monday 12 78 89
Tuesday 34 44 67
Wednesday 78 98 2
Thursday 34 55 4
Friday 14 25 13
Monday 18 75 56
Tuesday 28 42 65"))

vec1Mon <- unlist(strsplit(gsub("\\D+", " ", vec1[grep("Monday", vec1)]), split = " "))
vec1Mon <- as.numeric(vec1Mon[vec1Mon != ""])
vec1Mon
#[1] 12 78 89 18 75 56
vec1Tue <- unlist(strsplit(gsub("\\D+", " ", vec1[grep("Tuesday", vec1)]), split = " "))
vec1Tue <- as.numeric(vec1Tue[vec1Tue != ""])
vec1Tue
#[1] 34 44 67 28 42 65
#etc.

A.K.

- Original Message -
From: jpm miao miao...@gmail.com
To: r-help r-help@r-project.org
Sent: Thursday, December 13, 2012 9:50 PM
Subject: [R] How can I read the following complicated table