[R] Constructing a list using a function...
Hi All,

I have a data frame:

    myframe <- data.frame(ID=c("first","second"), x=c(1,2), y=c(3,4))

And I have a function myfun:

    myfun <- function(x,y) x+y

I would like to write a function myfun2 that takes myframe and myfun as parameters and returns a list as below:

    mylist
    $first
    [1] 4

    $second
    [2] 6

Could you please help me with this? It doesn't seem like the apply family of functions was intended for this case.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constructing a list using a function...
Small typo in the list representation. It should have been the one below. Apologies.

    mylist
    $first
    [1] 4

    $second
    [1] 6

On Mon, Jul 2, 2012 at 7:02 AM, Onur Uncu onuru...@gmail.com wrote:
> Hi All,
>
> I have a data frame:
>
>     myframe <- data.frame(ID=c("first","second"), x=c(1,2), y=c(3,4))
>
> And I have a function myfun:
>
>     myfun <- function(x,y) x+y
>
> I would like to write a function myfun2 that takes myframe and myfun as
> parameters and returns a list as below:
>
>     mylist
>     $first
>     [1] 4
>
>     $second
>     [2] 6
>
> Could you please help me with this? It doesn't seem like the apply family
> of functions was intended for this case.
Re: [R] Insert row in specific location between data frames
First, I have predict_SO2_a, which contains 24 values. I want to insert NA as the 11th element, so that predict_SO2_a then has 25 values. After inserting it, I want to use the with function to add it to the data frame:

    groupA$predict_SO2 <- with(groupA, predict_SO2_a)

    dput(predict_SO2_a)
    c(39.7932308121176, 30.25257753285, 32.4675835451901, 31.9415094289634,
    27.9083195877186, 11.5941369504695, 9.36812510512633, 12.3190926962636,
    ...
    14.9134904913096, 33.8462160039482, 16.6586503422101, 11.0312717522444,
    22.3102431270508, 15.1408236735915, 10.6875527887638, 11.3294850253127,
    13.9037966719703, 28.6603710864312)

    dput(groupA)
    structure(list(Date = structure(c(15L, 21L, 23L, 20L, 9L, 10L,
    2L, 11L, 22L, 6L, 7L, 16L, 17L, 24L, 26L, 12L, 18L, 19L, 14L,
    8L, 25L, 3L, 4L, 5L, 13L), .Label = c("", "1/9/2001", "10/9/2010",
    "11/9/2010", "12/9/2010", "14/9/2002", "15/9/2002", "15/9/2009",
    "19/9/1999", "2/9/2000", "2/9/2001", "2/9/2008", "21/9/2010",
    "24/9/2008", "3/9/1997", "3/9/2003", "3/9/2005", "3/9/2008",
    "5/9/2008", "6/9/1998", "7/9/1997", "7/9/2001", "8/9/1997",
    "8/9/2006", "8/9/2010", "9/9/2006"), class = "factor"),
    pressure = c(-8.110989011, -5.910989011, -3.510989011, -4.732967033,
    -5.737362637, -7.607692308, -9.675824176, -9.075824176, -5.575824176,
    -6.169230769, -8.169230769, -9.207692308, -9.197802198, -4.884615385,
    -3.684615385, -3.132967033, -3.332967033, -3.232967033, -9.532967033,
    -8.537362637, -6.869230769, -6.869230769, -3.869230769, -2.069230769,
    -5.369230769), maxtemp = c(2.056043956, 0.756043956, 1.556043956,
    2.216483516, 1.995604396, 2.346153846, 1.97032967, 0.17032967,
    1.57032967, 0.747252747, -0.352747253, 0.672527473, 1.985714286,
    1.452747253, 0.352747253, 1.568131868, 3.068131868, 1.368131868,
    0.168131868, 1.987912088, 5.187912088, 3.987912088, -0.812087912,
    1.587912088, -1.112087912), avetemp = c(2.540659341, 0.440659341,
    1.340659341, 1.287912088, 2.278021978, 2.2, 1.962637363, 0.962637363,
    1.562637363, 1.482417582, 0.682417582, 1.089010989, 2.103296703,
    1.989010989, 0.589010989, 2.087912088, 2.287912088, 1.787912088,
    1.287912088, 1.330769231, 5.237362637, 3.43736263 . ratio = c(1.53920929073912,
    ...
    2.08020364225369, 2.5845449621267, 4.68646633242131, 0.93343593089835,
    1.18698605729367, 1.19133323040343, 1.9902213063946, 2.09049362040035
    )), .Names = c("Date", "pressure", "maxtemp", "avetemp", "mintemp",
    "RH", "solar", "windspeed", "transport", "angle", "rainfall", "RSP",
    "Ozone", "NO2", "NOX", "SO2", "CO", "newRSP", "newOzone", "newNO2",
    "newNOX", "newSO2", "predict_RSP", "predict_NO2", "predict_NOX",
    "ratio"), row.names = c(NA, 25L), class = "data.frame")

Finally, I want to combine groupA with groupB. groupB contains:

    dput(groupB)
    structure(list(Date = structure(c(1L, 16L, 20L, 27L, 32L, 34L,
    35L, 7L, 11L, 21L, 30L, 17L, 8L, 2L, 28L, 3L, 18L, 22L, 24L,
    29L, 31L, 23L, 25L, 4L, 26L, 12L, 13L, 15L, 19L, 5L, 6L, 33L,
    9L, 10L, 14L), .Label = c("1/9/1997", "1/9/2004", "1/9/2006",
    "1/9/2008", "10/11/2009", "11/11/2009", "11/9/1999", "12/10/2003",
    "13/9/2010", "17/9/2010", "18/9/1999", "18/9/2008", "18/9/2009",
    "18/9/2010", "19/9/2009", "2/9/1997", "2/9/2002", "2/9/2006",
    "20/9/2009", "26/11/1997", "3/10/2000", "3/9/2006", "3/9/2007",
    "4/9/2006", "4/9/2007", "4/9/2008", "5/9/1998", "5/9/2004",
    "5/9/2006", "6/9/2001", "6/9/2006", "7/9/1998", "7/9/2010",
    "8/9/1998", "9/9/1998"), class = "factor"), pressure = c(-8.310989011,
    -8.710989011, -1.710989011, -4.732967033, -2.932967033, -2.732967033,
    -5.432967033, -6.637362637, -7.237362637, -1.707692308, -6.475824176,
    -3.869230769, -3.507692308, -8.098901099, -10.6989011, -7.184615385,
    ratio = c(1.94158182541644, 2.12248234979731, 1.87302150800523,
    2.61289013672199, 2.97067043253228, 2.85053235533923, 2.51886435993509,
    1.87829582620638, 2.9380496638884, 1.40686764084479, 0.858666346292962
    )), .Names = c("Date", "pressure", "maxtemp", "avetemp", "mintemp",
    "RH", "solar", "windspeed", "transport", "angle", "rainfall", "RSP",
    "Ozone", "NO2", "NOX", "SO2", "CO", "newRSP", "newOzone", "newNO2",
    "newNOX", "newSO2", "predict_RSP", "predict_NO2", "predict_NOX",
    "predict_SO2", "ratio"), row.names = c(NA, -35L), class = "data.frame")

I used the merge function to combine groupA and groupB.
In total they contain 60 rows:

    mab <- merge(groupA, groupB)

However, it shows:

    mab
     [1] Date        pressure    maxtemp     avetemp     mintemp     RH          solar       windspeed   transport   angle       rainfall    RSP         Ozone
    [14] NO2         NOX         SO2         CO          newRSP      newOzone    newNO2      newNOX      newSO2      predict_RSP predict_NO2 predict_NOX predict_SO2
    [27] ratio
    <0 rows> (or 0-length row.names)

--
View this message in context: http://r.789695.n4.nabble.com/Insert-row-in-specific-location-between-data-frames-tp4634905p4635071.html
Sent from the R help mailing list archive at Nabble.com.
[R] Fitting and Plotting the fitted distributions
Dear all,

I have written some sample code that lets me quickly fit many distributions and check which of the fits performs best. My sample code (which you can of course execute) looks like this:

    distrList <- list("exponential", "geometric", "log-normal", "normal", "Poisson")
    fitfunction <- function(Type, x) {
      return(list(Type, fitdistr(x, Type)))
    }
    require(MASS)
    On <- round(abs(rnorm(1,sd=100))+5, digits=0)
    storeOn <- lapply(distrList, fitdistr, x=On)
    plot(ecdf(On))
    str(storeOn)

What I am looking for now is to plot all the fitted distributions in the same window as the initial dataset, plot(ecdf(On)). I am not sure, though, whether there is some straightforward way (i.e. the same random distribution generator) to draw the distributions with the fitted parameters over the existing plot(ecdf(On)).

Could you please help me with that?

Regards
Alex
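
One possible way to overlay fitted distributions on the empirical CDF is sketched below. It is only an illustration, not an answer taken from the thread: the sample size of 1000, the seed, the object names fit_exp, fit_norm and fit_lnorm, and the choice of three continuous families are all assumptions. Each fitted CDF is drawn with curve(..., add = TRUE) using the matching p-function and the estimates returned by fitdistr().

```r
library(MASS)  # for fitdistr()

set.seed(1)                                     # assumed seed, for reproducibility
On <- round(abs(rnorm(1000, sd = 100)) + 5)     # assumed sample size of 1000

# Fit three continuous families; estimates are named numeric vectors
fit_exp   <- fitdistr(On, "exponential")        # estimate: rate
fit_norm  <- fitdistr(On, "normal")             # estimate: mean, sd
fit_lnorm <- fitdistr(On, "log-normal")         # estimate: meanlog, sdlog

# Empirical CDF first, then each fitted CDF on top of it
plot(ecdf(On), main = "Empirical and fitted CDFs")
curve(pexp(x, rate = fit_exp$estimate["rate"]),
      add = TRUE, col = "red")
curve(pnorm(x, mean = fit_norm$estimate["mean"], sd = fit_norm$estimate["sd"]),
      add = TRUE, col = "blue")
curve(plnorm(x, meanlog = fit_lnorm$estimate["meanlog"],
             sdlog = fit_lnorm$estimate["sdlog"]),
      add = TRUE, col = "darkgreen")
legend("bottomright", legend = c("exponential", "normal", "log-normal"),
       col = c("red", "blue", "darkgreen"), lty = 1)
```

Discrete families such as the geometric or Poisson could probably be added the same way with pgeom() or ppois(), drawn as steps rather than smooth curves.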
Re: [R] How to adjust the start of a series to zero? (i.e. subtract the first value from the sequence)
Hi Kristiina:

That's because I gave you the wrong function. It should be with, not within, as:

    sort2v4$adj_mean <- with(sort2v4, ave(mean, point, FUN=function(x) x-x[1]))

within(dat, ...) returns dat suitably modified. with(dat, ...) just returns the result of the function. I frequently use within() when I mean with(), unfortunately. Sorry for the error.

-- Bert

On Sun, Jul 1, 2012 at 6:39 PM, Kristiina Hurme kristiina.hu...@uconn.edu wrote:
> Thanks everyone. I tried them all, and got all to work except for the last
> one. I tried
>
>     sort2v4$adj_mean <- within(sort2v4, ave(mean, point, FUN=function(x) x-x[1]))
>     head(sort2v4)
>       point time     mean        sd adj_mean.point adj_mean.time adj_mean.mean adj_mean.sd
>     1     1    1 52.50100 1.5073927              1             1      52.50100   1.5073927
>     3     1    2 54.50182 0.8510329              1             2      54.50182   0.8510329
>     4     1    3 56.60174 1.5787222              1             3      56.60174   1.5787222
>     5     1    4 57.2     1.2292726              1             4      57.2       1.2292726
>     6     1    5 59.3     2.2632327              1             5      59.3       2.2632327
>     7     1    6 57.80089 1.4745218              1             6      57.80089   1.4745218
>
> but I am getting the columns in duplicate, rather than it performing the
> function. Any advice?
>
> Thanks again,
> Kristiina
>
> --
> View this message in context: http://r.789695.n4.nabble.com/How-to-adjust-the-start-of-a-series-to-zero-i-e-subtract-the-first-value-from-the-sequence-tp4634999p4635062.html

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
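
The with() versus within() distinction can be seen on a small made-up data frame. The sketch below uses a hypothetical dat with the same column names as in the thread (point, time, mean); the values are invented for illustration.

```r
# Hypothetical data: two series ("point" 1 and 2) measured over time
dat <- data.frame(point = c(1, 1, 1, 2, 2),
                  time  = c(1, 2, 3, 1, 2),
                  mean  = c(52.5, 54.5, 56.6, 40, 43))

# with() evaluates the expression inside dat and returns just the vector,
# so it can be assigned to a single new column.
# ave() applies FUN within each level of 'point'; here it subtracts the
# first value of each series so that every series starts at zero.
dat$adj_mean <- with(dat, ave(mean, point, FUN = function(x) x - x[1]))

# within() would instead return a whole (modified) data frame, which is why
# assigning its result to one column produced the duplicated columns above.
print(dat)
```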
Re: [R] Heat Maps
Thanks Joseph, but I am not able to get heat maps with this code. Can you please give me the full code to generate a heat map on the same graph where I have drawn the contour lines?
Re: [R] Constructing a list using a function...
Hello,

Try

    myfun2 <- function(DF, FUN){
      x <- lapply(as.data.frame(t(DF[-1])), function(x) FUN(x[1], x[2]))
      names(x) <- levels(DF[[1]])[ DF[[1]] ]
      x
    }

    myfun2(myframe, myfun)

Hope this helps,

Rui Barradas

On 02-07-2012 07:02, Onur Uncu wrote:
> Hi All,
>
> I have a data frame:
>
>     myframe <- data.frame(ID=c("first","second"), x=c(1,2), y=c(3,4))
>
> And I have a function myfun:
>
>     myfun <- function(x,y) x+y
>
> I would like to write a function myfun2 that takes myframe and myfun as
> parameters and returns a list as below:
>
>     mylist
>     $first
>     [1] 4
>
>     $second
>     [2] 6
>
> Could you please help me with this? It doesn't seem like the apply family
> of functions was intended for this case.
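
An alternative sketch of the same idea, using Map() and setNames() from base R, is given below. It is not the code from the reply above. It assumes the data frame has columns literally named ID, x and y, and it uses as.character() for the names so that it works whether ID is a factor or a character column (since R 4.0.0, data.frame() no longer converts strings to factors by default, in which case levels() returns NULL).

```r
myframe <- data.frame(ID = c("first", "second"), x = c(1, 2), y = c(3, 4))
myfun   <- function(x, y) x + y

# Apply FUN row by row to the x and y columns and name the results by ID.
# Assumes DF has columns called ID, x and y.
myfun2 <- function(DF, FUN) {
  res <- Map(FUN, DF$x, DF$y)          # list with one element per row
  setNames(res, as.character(DF$ID))   # works for factor or character IDs
}

mylist <- myfun2(myframe, myfun)
print(mylist)
```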
Re: [R] Insert row in specific location between data frames
Hello,

When I asked you to dput() your datasets, I meant all of the output of dput(), for us to copy and paste into an R session. It is the easiest way of recreating exact copies of the objects. As posted, with parts elided, it is unusable.

Now, as far as I can see, you have a data.frame called groupA with 25 rows and a vector of 24 elements. After including an NA in the 11th position, the vector length becomes 25. This part was already answered. Then you want to put that vector in as a column of groupA. You do NOT need 'with', this will do:

    groupA$predict_SO2 <- predict_SO2_a

Then you want to merge the resulting data.frame with another data.frame, groupB, right? But merge returns a data.frame with 0 rows. What went wrong? The common columns combined don't have the same values. Why not? Because column 'Date' is a factor, not a date. The labels, i.e., the date values, might be equal, but the factors, how they are coded, are not. Use the following:

    x <- with(groupA, levels(Date)[Date])
    x <- as.Date(x, format="%d/%m/%Y")
    groupA$Date <- x

And the same for groupB. Only then try to merge them. And next time paste the output of dput(), ALL of it.

Hope this helps,

Rui Barradas

On 02-07-2012 05:39, pigpigmeow wrote:
> First, I have predict_SO2_a, which contains 24 values. I want to insert NA
> as the 11th element, so that predict_SO2_a then has 25 values. After
> inserting it, I want to use the with function to add it to the data frame:
>
>     groupA$predict_SO2 <- with(groupA, predict_SO2_a)
>
>     dput(predict_SO2_a)
>     c(39.7932308121176, 30.25257753285, 32.4675835451901, 31.9415094289634,
>     27.9083195877186, 11.5941369504695, 9.36812510512633, 12.3190926962636,
>     ...
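
The steps discussed in this thread can be tried out on small made-up stand-ins, since the real groupA and groupB are only partly shown. In the sketch below, pred, to_date, the SO2 values and the four dates are all hypothetical. It uses append() to insert the NA and converts the factor dates as suggested above. It also assumes that the goal is to keep all rows of both frames (25 + 35 = 60 in the original data): by default merge() keeps only rows that match on every shared column, so all = TRUE is passed to keep the unmatched rows as well.

```r
# Insert an NA at a given position (here position 3 of a toy vector;
# for the thread's data it would be after = 10 to make NA the 11th element)
pred    <- c(1.5, 2.5, 3.5, 4.5)
pred_na <- append(pred, NA, after = 2)

# Toy stand-ins for groupA and groupB with factor dates in d/m/Y format
groupA <- data.frame(Date = factor(c("3/9/1997", "7/9/1997")), SO2 = c(10, 20))
groupB <- data.frame(Date = factor(c("1/9/1997", "2/9/1997")), SO2 = c(30, 40))

# Convert factor labels to real Date objects (hypothetical helper)
to_date <- function(f) as.Date(as.character(f), format = "%d/%m/%Y")
groupA$Date <- to_date(groupA$Date)
groupB$Date <- to_date(groupB$Date)

inner  <- merge(groupA, groupB)              # inner join: only matching rows
merged <- merge(groupA, groupB, all = TRUE)  # full join: keeps every row
print(merged)
```

If both data frames have exactly the same columns, rbind(groupA, groupB) would stack them as well, without sorting or de-duplicating.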
Re: [R] table function in a matrix
Dear Petr,

Thanks for your help. Sorry, one more query: one of my datasets has NAs (missing genotypes). Is there any way in which I can count the NAs?

Many thanks!
Sarah

From: Sarah Auburn saub...@yahoo.com
To: Petr Savicky savi...@cs.cas.cz
Cc: r-help@r-project.org
Sent: Thursday, 7 June 2012, 23:24
Subject: Re: [R] table function in a matrix

Perfect, thank you!

From: Petr Savicky savi...@cs.cas.cz
To: r-help@r-project.org
Sent: Thursday, 7 June 2012, 19:42
Subject: Re: [R] table function in a matrix

On Wed, Jun 06, 2012 at 11:02:46PM -0700, Sarah Auburn wrote:
> Hi,
> I am trying to get a summary of the counts of different variables for each
> sample in a matrix of the form m below to generate an output as shown.
> (Ultimately I want to generate a stacked barchart for each sample.) I am
> only able to get the table function to work on one sample (column) at a
> time. Any help appreciated.
> Thank you
> Sarah
>
>     a <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", "D", "C", "A", "C")
>     m <- matrix(a, nrow=4)
>     m
>          [,1] [,2] [,3] [,4]
>     [1,] "A"  "C"  "A"  "D"
>     [2,] "A"  "A"  "D"  "C"
>     [3,] "B"  "C"  "C"  "A"
>     [4,] "B"  "D"  "A"  "C"
>
> output needed (so that I can use the barplot(t(output)) function):
>
>          A B C D
>     [,1] 2 2 0 0
>     [,2] 1 0 2 1
>     [,3] 2 0 1 1
>     [,4] 1 0 2 1

Hi.

Try the following.

    a <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", "D", "C", "A", "C")
    m <- matrix(a, nrow=4)
    tab <- function(x) { table(factor(x, levels=LETTERS[1:4])) }
    t(apply(m, 2, tab))

         A B C D
    [1,] 2 2 0 0
    [2,] 1 0 2 1
    [3,] 2 0 1 1
    [4,] 1 0 2 1

Factors are used to ensure that all the tables have the same length, even if some letters are missing.

Hope this helps.

Petr Savicky.
Re: [R] Heat Maps
The image function should do it, something like this:

    image(x, y, outer(x, y, u), col=[some vector of colors])

You can add a breaks parameter, but you need one more break than you have colors (include the start and end point). Or just let it be automatic. I have used colorpanel() from the gplots library to generate graduated color shades. This is based on actual code I'm using for one of my heatmaps:

    library(gplots)  # for colorpanel
    z <- outer(x, y, u)
    image(x, y, z, col=colorpanel(10, "steelblue", "white"), breaks=quantile(z, seq(0, 1, by=0.1)))
    box()  # for appearances
    par(new=TRUE)  # you want the lines on top of the colors, so do the contour plot second
    contour(...)

It will produce ten colors, one shade for each decile, running from steelblue for the lowest values to white for the highest (colorpanel() takes the low color first). You can omit the breaks term.

> I have drawn indifference curves using the program below (Contour Plot):
>
>     u <- function(x, y) x^0.5 + y^0.5
>     x <- seq(0, 1000, by=1)
>     y <- seq(0, 1000, by=1)
>     a <- c(10, 20, 30)
>     contour(x, y, outer(x, y, u), levels=a, col="blue")
>
> Now can anybody please tell me how to draw heat maps, and that too on the
> same indifference curve plot (contour)?
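
For completeness, a self-contained version of the heat map plus contour idea is sketched below using base graphics only. It is an illustration, not the code referred to above: the coarser grid step of 10 and the white-to-steelblue palette built with colorRampPalette() are assumptions, and contour(..., add = TRUE) is used instead of par(new=TRUE) so that the lines share the heat map's axes.

```r
# Utility function and grid from the original question (coarser step assumed)
u <- function(x, y) x^0.5 + y^0.5
x <- seq(0, 1000, by = 10)
y <- seq(0, 1000, by = 10)
z <- outer(x, y, u)                     # utility evaluated on the grid

# Ten graduated shades: white for low utility, steelblue for high
cols <- colorRampPalette(c("white", "steelblue"))(10)

# Heat map first, then the indifference curves drawn on top of it
image(x, y, z, col = cols, xlab = "x", ylab = "y")
contour(x, y, z, levels = c(10, 20, 30), col = "blue", add = TRUE)
box()
```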
Re: [R] table function in a matrix
Hello,

See the difference.

    a <- b <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", "D", "C", "A", "C")
    a[3] <- NA
    table(a)
    table(a, exclude=NULL)  # always include NA
    table(b, exclude=NULL)  # always include NA

    # more flexible
    table(b, useNA="always")
    table(b, useNA="ifany")

Hope this helps,

Rui Barradas

On 02-07-2012 07:27, Sarah Auburn wrote:
> Dear Petr,
>
> Thanks for your help. Sorry, one more query: one of my datasets has NAs
> (missing genotypes). Is there any way in which I can count the NAs?
>
> Many thanks!
> Sarah
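
Putting the two replies in this thread together, a per-column count that also reports missing values might look like the sketch below. The position of the NA and the name counts are made up for illustration; useNA = "always" is spelled out so that every column gets an NA entry even when it has no missing values, which keeps all the tables the same length.

```r
a <- c("A", "A", "B", "B", "C", "A", "C", "D",
       "A", "D", "C", "A", "D", "C", "A", "C")
m <- matrix(a, nrow = 4)
m[3, 1] <- NA                    # hypothetical missing genotype

# Fixed factor levels plus an explicit NA category for every column
tab <- function(x) table(factor(x, levels = LETTERS[1:4]), useNA = "always")

# One row per sample (column of m), one column per category including NA
counts <- t(apply(m, 2, tab))
print(counts)
```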
Re: [R] Binary Quadratic Opt?
Hi Menkes,

Thanks for the reply, but a free academic license alone won't work for me. A GNU license or something more permissive is required. The rest is fine.

Khris.

On Jun 30, 2012, at 7:21 PM, menkes [via R] wrote:
> Hi Khris,
>
> If all your variables are binary then you may want to check CPLEX and/or
> Gurobi (both provide a free academic license).
>
> http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/
> http://www.gurobi.com/products/additional-products-using-gurobi/r
>
> The algorithms that CPLEX and Gurobi use for quadratic programming are
> designed to work with convex objective functions, with the one exception
> when all variables are binary. In that case CPLEX and Gurobi apply some
> transformation that in certain cases will allow you to solve binary
> quadratic optimization problems.
>
> Regards,
> Menkes

--
View this message in context: http://r.789695.n4.nabble.com/Binary-Quadratic-Opt-tp4633521p4635082.html
Re: [R] About Error message
> Hi again! I have a question about R. I fitted GAMs in a previous version
> of R with the mgcv package and saved the workspace. This workspace contains
> different models, and I want to do predictions with these GAMs. However, I
> have installed a new version of R and am using the same workspace. When I
> type summary(models), this error message is shown:
>
>     Error in Predict.matrix.cr.smooth(object, dk$data) :
>       F is missing from cr smooth - refit model with current mgcv.
>
> This workspace worked normally with the previous version of R. What's
> wrong?!

Hi,

Maybe some packages are missing (not installed) in the new installation. Try installing all the packages you used in your previous work and then start R again.

Regards
Petr

> Thanks in advance.
>
> --
> View this message in context: http://r.789695.n4.nabble.com/About-Error-message-tp4634955.html
Re: [R] geom_boxplot
Hi

In the new ggplot2 version the following works too:

    p + geom_boxplot(aes(fill = factor(cyl))) + labs(fill = "Cylinders") + ylab("Miles per Gallon") + xlab("Number of Cylinders")

Regards
Petr

> Yes, you can do all of the things you want. Below is a start, to give you
> an idea of how to approach some of it.
>
>     library(ggplot2)
>     p <- ggplot(mtcars, aes(factor(cyl), mpg))
>     p <- p + geom_boxplot(aes(fill = factor(cyl))) + labs(fill = "Cylinders") +
>       scale_y_continuous("Miles per Gallon") +
>       scale_x_discrete("Number of Cylinders")
>     p
>
> Have a look at
> stackoverflow.com/questions/3606697/how-to-set-x-axis-limits-in-ggplot2-r-plots
> for x and y axes limits.
>
> It took me a while to realise it but, generally, I find that it is not too
> hard to find examples of what you need by just googling something like
> "ggplot2 set x and y limits" or "ggplot2 geom_bar colour" and so on. The
> terms ggplot2 and geom_XXX are pretty unique on the internet, and the
> search results are usually not too bad.
>
> You may also want to subscribe to the ggplot2 group on Google Groups.
>
> Best wishes
> John Kane
> Kingston ON Canada
>
>> -----Original Message-----
>> From: hannah@gmail.com
>> Sent: Sun, 1 Jul 2012 08:39:20 -0400
>> To: r-help@r-project.org
>> Subject: Re: [R] geom_boxplot
>>
>> Also, is it possible to change ylim as well?
>>
>> 2012/7/1 li li hannah@gmail.com
>>> Dear all,
>>> I have a few questions regarding the boxplot output from the
>>> geom_boxplot function. Attached is the output I get. Below are my
>>> questions:
>>>
>>> 1. How can I define the xlab and ylab myself? Also, I would like to
>>>    remove the factor(variable) line on the right side.
>>> 2. How can I define the colors of the boxplots myself? For example, I
>>>    want to use blue for LR, green for pair and purple for BR1.
>>>
>>> Thanks so much!
>>> Hannah
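
The remaining questions in this thread (custom box colors and y limits) are not answered explicitly above. A possible sketch with current ggplot2 is shown below; it reuses the mtcars example from the replies rather than the poster's data, and the color assignments and the limits c(5, 40) are arbitrary choices for illustration. scale_fill_manual() maps each factor level to a chosen color, labs() sets the axis and legend titles, and coord_cartesian() zooms the y axis without dropping data.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(fill = factor(cyl))) +
  # one color per level of factor(cyl); the names must match the levels
  scale_fill_manual(values = c("4" = "blue", "6" = "green", "8" = "purple")) +
  labs(x = "Number of Cylinders", y = "Miles per Gallon", fill = "Cylinders") +
  # zoom the y axis (arbitrary limits) without removing observations
  coord_cartesian(ylim = c(5, 40))

print(p)
```

To drop the legend entirely rather than rename it, adding theme(legend.position = "none") should work.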
Re: [R] VIM package - how to get the underlying code
On 01.07.2012 21:19, Mathias Worni wrote:
> Dear R-users,
>
> I am using the latest version of R (2.15.1) on a Mac, working with
> RStudio. To perform multiple imputation for a dataset with some missing
> values, I am using the VIM package (http://goo.gl/rfGfr). While everything
> is working fine, also with the GUI, I wonder if anybody knows how to get
> the code for the diagrams you can create using the GUI.

Just download the source version of the VIM package and take a look into the code?

Best,
Uwe Ligges

> Thanks,
>
> Mathias
Re: [R] sqlSave()
Hi,

I tried your example but I get an error message:

    sqlUpdate(channel = con, dat = tbl, tablename = "myNewTable", index = "ID")
    Error in sqlUpdate(channel = con, dat = tbl, tablename = "myNewTable", :
      [RODBC] Failed exec in Update

I work with this:

    sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: i386-pc-mingw32/i386 (32-bit)

    attached base packages:
    [1] tcltk     stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] sqldf_0.4-6.4         gsubfn_0.6-3          proto_0.3-9.2
     [4] chron_2.3-42          RSQLite.extfuns_0.0.1 RSQLite_0.11.1
     [7] RODBC_1.3-5           RJDBC_0.2-0           rJava_0.9-3
    [10] DBI_0.2-5

Do you know what the problem is?

Thank you for your answer

--
View this message in context: http://r.789695.n4.nabble.com/sqlSave-tp892040p4635087.html
[R] ggplot: dodge positions
Dear all,

I want to get a series of boxplots (grouped by two factors) and I want to overlay the original observations. The following code does almost what I want:

    library(ggplot)
    ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y = runif(120,0,10), grp = factor(rep(rep(1:3, 10), 4)))
    ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()

Yet the position of the points and the position of the boxes on the x-axis are not the same. I would like the points to be shifted accordingly, such that they line up with the boxplots. I tried position_dodge:

    ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point(aes(ymax=max(y)), position = position_dodge(width=.75))

but that did not really help, as all points are now dodged and I just want to have a fixed offset for each subgroup of points such that the boxplot and the points are aligned.

Any ideas?

Kind Regards,

Thorn Thaler
Mathematician
Applied Mathematics
Nestec Ltd, Nestlé Research Center
PO Box 44
CH-1000 Lausanne 26
Phone: +41 21 785 8220
Fax: +41 21 785 9486
[R] Getting R In Torrents instead of Download Zip Files
Dear List,

Sorry for the intrusion. I live in an area with erratic internet download speeds. Can we get R as torrents instead of only as downloadable zip files, so that we can resume downloads when they break?

Sincerely,
A Ohri

Websites- Technology http://decisionstats.com
Re: [R] ggplot: dodge positions
Hello,

Though I'm not the most fluent user of ggplot, I've seen no problem with the graph: each subgroup of points is over its boxplot. Maybe what made the difference was that I used ggplot2, not ggplot.

Hope this helps,

Rui Barradas

On 02-07-2012 10:43, Thaler,Thorn,LAUSANNE,Applied Mathematics wrote:
> Dear all,
>
> I want to get a series of boxplots (grouped by two factors) and I want to
> overlay the original observations. The following code does almost what I
> want:
>
>     library(ggplot)
>     ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y = runif(120,0,10), grp = factor(rep(rep(1:3, 10), 4)))
>     ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()
>
> Yet the position of the points and the position of the boxes on the x-axis
> are not the same. I would like the points to be shifted accordingly, such
> that they line up with the boxplots. I tried position_dodge:
>
>     ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point(aes(ymax=max(y)), position = position_dodge(width=.75))
>
> but that did not really help, as all points are now dodged and I just want
> to have a fixed offset for each subgroup of points such that the boxplot
> and the points are aligned.
>
> Any ideas?
>
> Kind Regards,
>
> Thorn Thaler
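
With a current ggplot2, a commonly used way to line the points up with their boxes is to give geom_point() the same dodge width that geom_boxplot() uses by default (0.75). The sketch below reuses the poster's toy data; the seed is an assumption added for reproducibility, and the ymax aesthetic from the original attempt is left out because geom_point() does not use it.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only so the random data are reproducible
ddf <- data.frame(x   = factor(rep(LETTERS[1:4], each = 30)),
                  y   = runif(120, 0, 10),
                  grp = factor(rep(rep(1:3, 10), 4)))

p <- ggplot(ddf, aes(x, y, colour = grp)) +
  geom_boxplot() +
  # dodge the points by the same width as the boxes so each subgroup
  # of points sits on top of its own boxplot
  geom_point(position = position_dodge(width = 0.75))

print(p)
```

If the points overlap too much, position_jitterdodge() is another option that adds a little horizontal jitter within each dodged group.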
[R] Quantitative analysis of treatment effects
Hello! I need to analyse a biological study. The study design is rather simple. It includes 3 groups. 1 control group and 2 test groups. The two test groups are treated with the same drug but different doses. Each group has approximately 14 observations and I look at around 600 variables. I already used an one way ANOVA to determine the significant hits between the groups. However, I now want to know if there is a quantitative effect between the treatment doses. In other words, is the treatment effect (the difference between the control group and a test group) bigger when you use a lower or a higher dose of the same drug. I also need to include 1 or 2 co-variates in the analysis. Unfortunately, I do not know if there is a statistical test/technique in R which I can use to answer this question. Any advise is very much appreciated. syrvn -- View this message in context: http://r.789695.n4.nabble.com/Quantitative-analysis-of-treatment-effects-tp4635094.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot.prcomp() call/eval
Then lets go with this: http://pastebin.com/6dtGCrpA as an example of what i do. If you've got a better idea lets hear it :) On 29.06.2012, at 17:30, Joshua Wiley wrote: On Fri, Jun 29, 2012 at 1:20 AM, Jessica Streicher j.streic...@micromata.de wrote: Hm.. i attached a file with the code, but it doesn't show up somehow.. non text files are scrubbed, and only certain file extensions are allowed (I forget which, I know that .R is *not* allowed (although I think that .txt and maybe .log are). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Quantitative analysis of treatment effects
I found a nice presentation which I think addresses my question/problem. See slide 3 here: http://www.ispor.org/meetings/atlanta0510/presentations/IP1-CookJohnR.pdf -- View this message in context: http://r.789695.n4.nabble.com/Quantitative-analysis-of-treatment-effects-tp4635094p4635095.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot: dodge positions
Yes indeed. Sorry for the typo; I added the library(ggplot) line afterwards and did not check the spelling. It should read library(ggplot2), and with that the issue is still unresolved. Thanks for pointing that out.

KR,
-Thorn

> -----Original Message-----
> From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
> Sent: Monday, 2 July 2012 12:20
> To: Thaler,Thorn,LAUSANNE,Applied Mathematics
> Cc: r-help@r-project.org
> Subject: Re: [R] ggplot: dodge positions
>
> Hello,
>
> Though I'm not the most fluent user of ggplot, I saw no problem with the graph: each subgroup of points sits over its boxplot. Maybe what made the difference was the use of ggplot2, not ggplot.
>
> Hope this helps,
> Rui Barradas
Re: [R] Getting R In Torrents instead of Download Zip Files
On Jul 2, 2012, at 12:17, Ajay Ohri wrote:

> Dear List,
>
> Sorry for intrusion. I live in an area of erratic internet download speeds. Can we get R in torrents instead of just downloadable Zip files, so we can resume downloads when broken?

Only if someone is willing to put up the manpower to create the server-side infrastructure, I expect. However, don't browsers and FTP clients have stop/resume functionality built in these days? OSX Safari certainly does.

-pd

> Sincerely,
> A Ohri
> Websites- Technology http://decisionstats.com

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
[R] error to convert a Compute A^-1 B from Matlab to R using solve(A, B)
Dear Researchers,

I need to convert the following expression from Matlab to R:

    a = [x y ones(size(x))];
    b = [-(x.^2+y.^2)];
    a\b

    ans =
       -9.9981
      -16.4966
       -7.6646

My solution in R is:

    a = cbind(x, y, rep(1, length(x)))
    b = cbind(-(x^2 + y^2))

    head(a)
                x        y
    [1,] 14.45319 5.065726 1
    [2,] 14.99478 5.173893 1
    [3,] 14.64158 5.616916 1
    [4,] 14.61803 6.624069 1
    [5,] 14.19997 6.794587 1
    [6,] 15.08174 8.224843 1

    head(b)
              [,1]
    [1,] -234.5564
    [2,] -251.6125
    [3,] -245.9255
    [4,] -257.5652
    [5,] -247.8057
    [6,] -295.1068

Following the MATLAB / R Reference http://cran.r-project.org/doc/contrib/Hiebeler-matlabR.pdf, a\b can be converted to solve(a,b), but I get the following error:

    solve(a,b)
    Error in solve.default(a, b) : 'b' must be compatible with 'a'

Thanks for any help,
Gianni
Re: [R] error to convert a Compute A^-1 B from Matlab to R using solve(A, B)
Have a look at ?solve and see:

    a    a square numeric or complex matrix containing the coefficients of the linear system.

Your a isn't square. The help also mentions qr.solve for non-square matrices.

Greetings,
Jessi

On 02.07.2012, at 13:38, gianni lavaredo wrote:

> Dear Researchers,
>
> I need to convert the following expression from Matlab to R:
>
>     a = [x y ones(size(x))];
>     b = [-(x.^2+y.^2)];
>     a\b
>
>     ans =
>        -9.9981
>       -16.4966
>        -7.6646
>
> My solution in R is:
>
>     a = cbind(x, y, rep(1, length(x)))
>     b = cbind(-(x^2 + y^2))
>
>     head(a)
>                 x        y
>     [1,] 14.45319 5.065726 1
>     [2,] 14.99478 5.173893 1
>     [3,] 14.64158 5.616916 1
>     [4,] 14.61803 6.624069 1
>     [5,] 14.19997 6.794587 1
>     [6,] 15.08174 8.224843 1
>
>     head(b)
>               [,1]
>     [1,] -234.5564
>     [2,] -251.6125
>     [3,] -245.9255
>     [4,] -257.5652
>     [5,] -247.8057
>     [6,] -295.1068
>
> Following the MATLAB / R Reference http://cran.r-project.org/doc/contrib/Hiebeler-matlabR.pdf, a\b can be converted to solve(a,b), but I get the following error:
>
>     solve(a,b)
>     Error in solve.default(a, b) : 'b' must be compatible with 'a'
>
> Thanks for any help,
> Gianni
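The qr.solve suggestion can be sketched as follows. For a tall (non-square) system MATLAB's a\b returns the least-squares solution, and qr.solve does the same in R. The data below are made up for illustration (points on a circle with centre (5, 8) and radius 3); they are not Gianni's x and y, and the name sol is only a placeholder.

```r
# Hypothetical data: 50 points on a circle with centre (5, 8) and radius 3
theta <- seq(0, 2 * pi, length.out = 50)
x <- 5 + 3 * cos(theta)
y <- 8 + 3 * sin(theta)

a <- cbind(x, y, 1)      # 50 x 3 coefficient matrix, so solve(a, b) would fail
b <- -(x^2 + y^2)        # right-hand side, as in the original post

# Least-squares solution of a %*% sol = b via the QR decomposition
sol <- qr.solve(a, b)

# For an exact circle x^2 + y^2 + c1*x + c2*y + c3 = 0 the coefficients are
# c1 = -2*cx, c2 = -2*cy, c3 = cx^2 + cy^2 - r^2
print(sol)
```

With real, noisy data the three numbers are the least-squares estimates rather than exact values, which is presumably what the Matlab output in the question shows.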
Re: [R] Getting R In Torrents instead of Download Zip Files
Dear Prof D No server side architecture is needed for bit torrents. It runs on peer to peer. You just host one file (torrent file) and your torrent searches for peers (other people having that torrent file/data). To kickstart- you need to host some files on one server, as the mother seed. That also will need just a modification in the search engine list within your local computer 's bit torrent engine(as in add- cran.at in list of sources instead of other sources) You may want to ask Ubuntu /Debian how /why they do it? I may be completely wrong here on the technical code- but I think it does help people from developing world with erratic bandwidth, which is where I come from. Sincerely, Ajay Ohri Websites- Technology http://decisionstats.com On Mon, Jul 2, 2012 at 4:55 PM, peter dalgaard pda...@gmail.com wrote: On Jul 2, 2012, at 12:17 , Ajay Ohri wrote: Dear List, Sorry for intrusion. I live in area of erratic internet download speeds. Can we get R in torrents instead of just download Zip files so we can resume downloads when broken Only if someone is willing to put up the manpower to create the server-side infrastructure, I expect. However, don't browsers and FTP client have stop/resume functionality built in these days? OSX Safari certainly does. -pd Sincrely, A Ohri Websites- Technology http://decisionstats.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] SPATSTAT: Minimum points for a Ripley K to be sensible?
Sebastian Pucilowski s.pucilow...@student.unimelb.edu.au writes:

> What is the minimum number of points in a point pattern before a clustering analysis using a Ripley K function loses any meaning?

It depends on your definition of `meaningful'. The K-function doesn't become meaningless (undefined) until there are fewer than 2 points. But if your dataset contained only 10 points, for example, you could type

    plot(envelope(runifpoint(10), Kest, nsim=1000, nrank=25))

to see the pointwise 95% prediction intervals for the K-function (as grey shading) from a Poisson process with a mean of 10 points. To be detectably different from a Poisson process, a dataset of 10 points would need a K-function that goes outside these intervals somewhere. So it would need to be an extremely clustered pattern.

Try different values in place of 10 and adjust to suit your definition.

Adrian Baddeley
[R] enquiry
Hi,

I am new to R. Could you please tell me how to read dates in the format 1951-52, 1952-52 in R?
[R] Rgraphviz package question
Hi all,

I have a question regarding the Rgraphviz package. Does anybody know how to add an x-axis and a y-axis with a scale to the plot generated by renderGraph()? Or is there an alternative way to do so using plot() instead?

Thanks,
--
Jiayi Hou
Ph.D Candidate
Department of Biostatistics, School of Medicine
Virginia Commonwealth University
Tel: (804)-828-2879 (office) (804)-274-8757 (cell)
Re: [R] Adjusting length of series
Hi David and AK, I have been trying to implement your suggestions since yesterday, but I encountered some challenges.  As for David's suggestions, I could only implement it after some modifications. Using an abridged version of my data, I dpud my dataset and then show my steps below.  dput(ydata) structure(c(68.10004, -34.80002, 90.39996, 54.60004, -172.3, 51.80002, 175, 79.80002, -35.70007, 130.5, 116.8, -67.5, 164.5, 514.8, -326.1, 98.40005, 160.2, 53.19998, 283.6, -111.6, 127.8, -17.30002, 286.3, NA, NA, -102.9001, 125.2, -35.79993, -226.9001, 224.1, 123.2, -95.19998, -115.5001, 166.2001, -13.69998, -184.3, 232, 350.3, -840.9001, 424.5001, 61.79993, -107, 230.4001, -395.2001, 239.4001, -145.1, 303.6, NA, NA, NA, 228.1, -160., -191.1001, 451.0001, -100.9001, -218.4, -20.30011, 281.7002, -179.9001, -170.6, 416.3, 118.3, -1191.2, 1265.4, -362.7002, -168.7999, 337.4001, -625.6001, 634.6001, -384.5001, 448.7001, NA, NA, -164.45784099, 17.079353995, 95.976788009, 680.23816699, -491.34869099, -274.694009, -256.332907, 469.62296, -146.431891, -41.077201995, -106.970104, 757.68826399, -1689.214533, 2320.098952, -1446.97942, 516.384521, -375.27765099, 293.86702999, 417.845195, 278.198807, -968.59203399, -314.195986, NA, NA, NA, 181.53719499, 78.897434013, 584.26137898, -1171.586858, 216.65468199, 18.361101998, 725.955867, -616.054851, 105.35468901, -65.892902005, 864.65836799, -2446.902797, 4009.313485, -3767.078372, 1963.363941, -891.66217199, 669.14468099, 123.978165, -139.646388, -1246.790841, 654.396048, NA, 4937, 5005.1, 4970.3, 5060.7, 5115.3, 4943, 4994.8, 5169.8, 5249.6, 5213.9, 5344.4, 5461.2, 5393.7, 5558.2, 6073, 5746.9, 5845.3, 6005.5, 6058.7, 6342.3, 6230.7, 6358.5, 6341.2, 6627.5, 4187.5, 4296.004835, 4240.051829, 4201.178177, 4258.281313, 4995.622616, 5241.615228, 5212.913831, 4927.879527, 5112.468183, 5150.624948, 5147.704511, 5037.81397, 5685.611693, 4644.194883, 5922.877025, 5754.579747, 6102.66699, 6075.476582, 6342.153204, 7026.675021, 7989.395645, 
7983.524235, 7663.456839), .Dim = c(24L, 7L), .Dimnames = list(    NULL, c(DCred1, DCred2, DCred3, DBoBC2, DBoBC3,    CredL1, BoBCL1)), .Tsp = c(2001.083, 2003, 12 ), class = c(mts, ts))  NB: the NAs in the dataset emanated from lagging or differencing the series  David's suggestion  df-data.frame(DCred1,DCred2,DCred3,DBoBC2,DBoBC3,CredL1,BoBCL1) Error in data.frame(DCred1, DCred2, DCred3, DBoBC2, DBoBC3, CredL1, BoBCL1) :  arguments imply differing number of rows: 23, 22, 21, 24 So I modified as follows: length(DCred3) # finding the minimum length of various series [1] 21 # Then dataframe construction dframe- data.frame(Dcre1=DCred1[1:21],Dcre2=DCred2[1:21],Dcre3=DCred3[1:21], + Dbobc2=DBoBC2[1:21],Dbobc3=DBoBC3[1:21],CredL=CredL1[1:21],BoBCL=BoBCL1[1:21]) # Then estimated regression regCred- lm(Dcre1~Dcre2+Dcre3+Dbobc2+Dbobc3+CredL+BoBCL, data=dframe) summary(regCred) # Worked well as shown by results below Call: lm(formula = Dcre1 ~ Dcre2 + Dcre3 + Dbobc2 + Dbobc3 + CredL +    BoBCL, data = dframe) Residuals:    Min     1Q Median     3Q    Max -69.516 -27.695 -8.085 13.851 107.276 Coefficients:             Estimate Std. Error t value Pr(|t|)   (Intercept) 159.32304 157.15209  1.014 0.327873   Dcre2       -0.75527   0.17262 -4.375 0.000634 *** Dcre3       -0.21006   0.08656 -2.427 0.029329 * Dbobc2       0.05111   0.06565  0.779 0.449197   Dbobc3       0.03106   0.03510  0.885 0.391108   CredL       -0.10967   0.04933 -2.223 0.043177 * BoBCL        0.09756   0.03097  3.150 0.007087 ** --- Signif. codes: 0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1 Residual standard error: 52.3 on 14 degrees of freedom Multiple R-squared: 0.9331,    Adjusted R-squared: 0.9044 F-statistic: 32.55 on 6 and 14 DF, p-value: 1.911e-07  This is good, but couldn't I code the process for my 15 variable model? Perhaps that is where the use of Dcr- lapply(..., function(x) ...) comes in?  
AK, if you spare some minutes, please use my dput data to illustrate the suggestion you made, I searched the lapply function (using ??lapply) but could not get a handle of how to use it in my case. My dput data is as shown below.          DCred1 DCred2 DCred3     DBoBC2     DBoBC3 CredL1  BoBCL1 Feb 2001  68.1    NA     NA         NA         NA 4937.0 4187.500 Mar 2001 -34.8
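One way to avoid typing [1:21] for every series is to pad each lagged series with NA so that all columns keep the same length, build the columns in a loop, and let lm() drop the incomplete rows. This is only a sketch under stated assumptions: the series cred is simulated, and lag_pad and the dcred_lag column names are hypothetical helpers, not part of the poster's data or of any package.

```r
set.seed(42)
cred  <- cumsum(rnorm(30))   # hypothetical level series, not the poster's data
dcred <- diff(cred)          # first difference, length 29

# Shift a vector back by k steps, padding the front with NA so the
# length stays the same as the input
lag_pad <- function(x, k) c(rep(NA, k), head(x, -k))

# Build all lag columns in one call instead of typing each one by hand
lags <- sapply(1:3, function(k) lag_pad(dcred, k))
colnames(lags) <- paste0("dcred_lag", 1:3)

# Every column has 29 rows, so data.frame() no longer complains
dframe <- data.frame(dcred = dcred, lags)

# lm() drops rows that contain NA by default (na.action = na.omit)
fit <- lm(dcred ~ ., data = dframe)
summary(fit)
```

The same pattern extends to more variables: create each block of lags with sapply(), cbind the blocks into one data frame, and keep the formula as `response ~ .`.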
[R] R sub query
Hello,

I would like to substitute a substring of characters defined by a specific start and end sequence. For example, in the matrix below I would like to substitute ".:X:" with "", where X varies:

    m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56",
                  ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow=2)

Output required:

         [,1] [,2]  [,3] [,4]
    [1,] 0,0  193,1 50,8 114,0
    [2,] 0,2  0,56  0,13 75,0

Thank you for any help,
Sarah
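A minimal sketch of one way to do this with a regular expression, assuming the matrix entries are character strings of the form ".:X:rest" (the quotes were lost in the archived post, so that is an assumption). The name out is a placeholder.

```r
m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56",
              ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow = 2)

# "^\\.:"   a literal ".:" at the start of the string
# "[0-9]+:" one or more digits followed by a colon (the varying X)
out <- m
out[] <- sub("^\\.:[0-9]+:", "", m)   # out[] keeps the matrix dimensions

print(out)
```

Assigning into out[] matters here: sub() on its own returns a plain character vector and the 2 x 4 shape would be lost.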
[R] degree of freedom GLM
Hi,

I have a problem with the df. I read in a big csv file:

    Tabelle <- read.csv("C:\\Users\\Public\\Documents\\Bachelorarbeit\\eingabe8_durchnummeriert.csv",
                        header = T, sep = ";")

Then I try this:

    ygamma <- glm(Tabelle$sb_ek_ber ~ 1 + Tabelle$FAHRL_C + Tabelle$NUTZKREIS + Tabelle$schw_drittel_c,
                  family = Gamma)
    anova(ygamma, test = "Chisq")

    Analysis of Deviance Table

    Model: Gamma, link: inverse

    Response: Tabelle$sb_ek_ber

    Terms added sequentially (first to last)

                           Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
    NULL                                 1236805   35451551
    Tabelle$FAHRL_C         1    33987   1236804   35417564 0.0018493 **
    Tabelle$NUTZKREIS       1    48903   1236803   35368661 0.0001880 ***
    Tabelle$schw_drittel_c  1    47328   1236802   35321334 0.0002388 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    str(Tabelle)
    'data.frame':  1236806 obs. of  9 variables:
     $ Alter_Jüngster_C_inkl_AlterNutz: int  1 1 1 1 1 1 1 1 1 1 ...
     $ ALTERKAU_C                     : int  1 2 2 1 3 3 3 4 1 1 ...
     $ FAHRL_C                        : int  1 2 1 3 4 3 3 1 5 1 ...
     $ NUTZKREIS                      : int  1 2 2 2 2 2 2 1 1 2 ...
     $ RKL_U12                        : int  1 1 1 2 3 4 4 3 5 6 ...
     $ SF_Sonder_aufgefüllt           : int  1 2 3 4 4 4 4 5 6 7 ...
     $ schw_drittel_c                 : int  1 2 3 4 3 3 3 3 1 1 ...
     $ sb_ek_ber                      : num  0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ...
     $ JE_gewichtet                   : num  0.384 3.952 3.952 2.81 3.952 ...

I don't understand why the df are always 1. It would be great if you could help me.
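A likely explanation, judging from the str() output: the predictors are stored as int, so glm() treats each one as a numeric covariate with a single slope, which costs one degree of freedom. If the values are really category codes (an assumption, since the post does not say), wrapping them in factor() gives one parameter per level beyond the first. The sketch below uses simulated data; only the column name FAHRL_C is borrowed from the post.

```r
set.seed(123)
n <- 500
d <- data.frame(FAHRL_C = sample(1:5, n, replace = TRUE))  # integer codes 1..5
# Hypothetical positive response whose mean depends on the code
d$y <- rgamma(n, shape = 2, rate = 2 / (1 + 0.2 * d$FAHRL_C))

# Integer column used as is: one slope, hence 1 df
fit_num <- glm(y ~ FAHRL_C, family = Gamma, data = d)

# Same column as a factor: 5 levels, hence 4 df
fit_fac <- glm(y ~ factor(FAHRL_C), family = Gamma, data = d)

df_num <- anova(fit_num, test = "Chisq")$Df[2]
df_fac <- anova(fit_fac, test = "Chisq")$Df[2]
print(c(numeric = df_num, factor = df_fac))
```

Passing data = d and using bare column names also keeps the term labels in the anova table shorter than the Tabelle$... form.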
Re: [R] table function in a matrix
Thank you From: Rui Barradas ruipbarra...@sapo.pt To: Sarah Auburn saub...@yahoo.com Cc: r-help@r-project.org Sent: Monday, 2 July 2012, 17:39 Subject: Re: [R] table function in a matrix Hello, See the difference. a - b - c(A, A, B, B, C, A, C, D, A, D, C, A, D, C, A, C) a[3] - NA table(a) table(a, exclude=NULL) # always include NA table(b, exclude=NULL) # always include NA # more flexible table(b, useNA=always) table(b, useNA=ifany) Hope this helps, Rui Barradas Em 02-07-2012 07:27, Sarah Auburn escreveu: Dear Petr, Thanks for your help. Sorry one more query for one of my datasets which has NAs (missing genotypes). Is there any way in which I can count NAs? Many thanks! Sarah From: Sarah Auburn saub...@yahoo.com To: Petr Savicky savi...@cs.cas.cz Cc: r-help@r-project.org r-help@r-project.org Sent: Thursday, 7 June 2012, 23:24 Subject: Re: [R] table function in a matrix Perfect, thank you! From: Petr Savicky savi...@cs.cas.cz To: r-help@r-project.org Sent: Thursday, 7 June 2012, 19:42 Subject: Re: [R] table function in a matrix On Wed, Jun 06, 2012 at 11:02:46PM -0700, Sarah Auburn wrote: Hi, I am trying to get a summary of the counts of different variables for each sample in a matrix of the form m below to generate an output as shown. (Ultimately I want to generate a stacked barchart for each sample). I am only able to get the table function to work on one sample (column) at a time. Any help appreciated. Thank you Sarah ? a-c(A, A, B, B, C, A, C, D, A, D, C, A, D, C, A, C) m-matrix(a, nrow=4) m [,1] [,2] [,3] [,4] [1,] A? C? A? D [2,] A? A? D? C [3,] B? C? C? A [4,] B? D? A? C output needed (so that I can use the barplot(t(output)) function): A B C D [,1] 2 2 0 0 [,2] 1 0 2 1 [,3] 2 0 1 1 [,4] 1 0 2 1 Hi. Try the following. 
    a <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", "D", "C", "A", "C")
    m <- matrix(a, nrow=4)
    tab <- function(x) { table(factor(x, levels=LETTERS[1:4])) }
    t(apply(m, 2, tab))

         A B C D
    [1,] 2 2 0 0
    [2,] 1 0 2 1
    [3,] 2 0 1 1
    [4,] 1 0 2 1

Factors are used to ensure that all the tables have the same length, even if some letters are missing.

Hope this helps.
Petr Savicky.
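The two answers in this thread (fixed factor levels per column, and counting NA with useNA) can be combined in one sketch. The missing value inserted below is hypothetical, and the column label "missing" is just a placeholder name.

```r
a <- c("A", "A", "B", "B", "C", "A", "C", "D",
       "A", "D", "C", "A", "D", "C", "A", "C")
m <- matrix(a, nrow = 4)
m[3, 1] <- NA   # hypothetical missing genotype in sample 1

# Fixed levels keep every per-column table the same length;
# useNA = "always" adds an NA count even when a column has none
tab <- function(x) table(factor(x, levels = LETTERS[1:4]), useNA = "always")

counts <- t(apply(m, 2, tab))
colnames(counts)[is.na(colnames(counts))] <- "missing"
print(counts)
```

Each row of counts is one sample (one column of m), so barplot(t(counts)) should still give one stacked bar per sample, now with an extra segment for the missing calls.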
Re: [R] Pie Chart in map
The package mapplots lets you plot pie charts on a map and vary their size. http://r.789695.n4.nabble.com/file/n4635089/pie.png -- View this message in context: http://r.789695.n4.nabble.com/Pie-Chart-in-map-tp2318816p4635089.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] interpolation to new points between geo coordinates
Hi,

I have a data set with geo coordinates and a value for each coordinate. I want to interpolate the values to new positions on a finer grid, also given as geo coordinates. I have looked at the fields package (interp.surface) and the akima package (interp) but can't quite figure out what I am doing wrong, or whether these functions suit my needs.

I have the two data sets:

grid_1:

      lat   lon    value
    1 56.5  11.1   53
    2 56.6  11.1   53.1
    3 56.7  11.12  52.1
    4 56.5  11.2   52.9
    ...etc.

and a new grid, grid_2:

      lat    lon
    1 55.52  11.11
    2 55.53  11.115
    3 55.54  11.12
    ...etc.

And I want interpolated values for grid_2. Any ideas?

/Jan
[R] Decrete value check in a matrix
Hi All,

I have a data frame (or matrix) like this:

    MyMatrix
    ABC   XYZ
    1     2.5
    3.4   4
    5     6
    5.6   6.7

I need to check whether each value in each column is a discrete value or not. The outcome for each column should go into a corresponding result column as TRUE/FALSE. Finally, I need the result as a data frame (or matrix) like this:

    ABC   XYZ   ABC_RESULT   XYZ_RESULT
    1     2.5   TRUE         FALSE
    3.4   4     FALSE        TRUE
    5     6     TRUE         TRUE
    5.6   6.7   FALSE        FALSE

Can anyone help me quickly?

Antony.

-- View this message in context: http://r.789695.n4.nabble.com/Decrete-value-check-in-a-matrix-tp4635090.html
Re: [R] zero dimension error while soft thresholding
    powers = c(c(1:12), seq(from = 10, to = 20, by = 2))
    sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)

    pickSoftThreshold: calculating connectivity for given powers...
       ..working on genes 1 through 1000 of 631066
    Error in cor(data, data[, c(startG:endG)], use = "p") :
      'x' has a zero dimension.
    In addition: Warning message:
    In is.na(cols) : is.na() applied to non-(list or vector) of type 'NULL'

I am getting the error shown above while using the soft thresholding code. Please help me.
Re: [R] Plotting bar graph over a geographical map
You could try the library mapplots, see example below: http://r.789695.n4.nabble.com/file/n4635091/xy.png -- View this message in context: http://r.789695.n4.nabble.com/Plotting-bar-graph-over-a-geographical-map-tp4346925p4635091.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error() model is singular - what does that mean
Hello I have some test data that looks like that from a within subject experiment. Subject Task-KindData-Kind Time-Taken Correct 1A Data1 5 1 1A Data1 3 0 1A Data1 1 1 1A Data2 8 1 1A Data2 7 0 1A Data2 5 0 1A Data3 2 1 1A Data3 7 0 1A Data350 1A Data360 1B Data1 3 1 1B Data1 1 1 1B Data1 3 0 1B Data2 9 0 1B Data2 8 1 1B Data2 5 0 1B Data3 2 1 1B Data3 7 2 1B Data353 1B Data360 1C Data1 3 1 1C Data1 1 1 1C Data1 3 0 1C Data2 9 0 1C Data2 8 1 1C Data2 5 0 1C Data3 2 1 1C Data3 7 2 1C Data353 1C Data360 2A Data1 5 1 2A Data1 3 0 2A Data1 1 1 2A Data2 8 1 2A Data2 7 0 2A Data2 5 0 2A Data3 2 1 2A Data3 7 0 2A Data350 2A Data360 2B Data1 3 1 2B Data1 1 1 2B Data1 3 0 2B Data2 9 0 2B Data2 8 1 2 B Data2 5 0 2B Data3 2 1 2B Data3 7 2 2B Data353 2B Data360 2C Data1 3 1 2C Data1 1 1 2C Data1 3 0 2C Data2 9 0 2C Data2 8 1 2C Data2 5 0 2C Data3 2 1 2C Data3 7 2 2C Data353 2C Data360 . . . some notes: there are 20 subjects there are 5 different kinds of tasks There are 5 different kinds of data and there are several different variations for a certain kind of task and kind of data which is why for Subject = 1 Task-Kind=A and Data-Kind=Data1 we have 3 different results. The measured parameters are time to complete the task and whether it was correct or not (0 implies correct and 1 implies not correct) I am computing the anova as follows: aov.ex = aov(Correct~Task-Kind*Data-Kind+Error(Subject/(Task-Kind*Data-Kind)),data=allDataRaw.xp) since I want to see how the result is affected by the different kinds of data as well as the the kind of task and I get a warning message saying: Error() model is singular I would be very grateful if someone could please tell me what does this mean. Thanks Pascal -- View this message in context: http://r.789695.n4.nabble.com/Error-model-is-singular-what-does-that-mean-tp4635103.html Sent from the R help mailing list archive at Nabble.com. 
[R] apply with multiple conditions
Hello all,

I have written a for loop to act on a dataframe with close to 3 million rows and 6 columns, and I would like to pass it to apply() to speed the process up (I let the loop run for 2 days before stopping it and it had only gone through 200,000 rows), but I am really struggling to find a way to pass the arguments. Below are the loop and the head of the dataframe I am working on. Any hints would be much appreciated, thank you! (I have searched for this but could not find any other posts doing quite what I want.)

Paul

    x <- as.numeric(all.tf7[1,2])
    for (i in 2:nrow(all.tf7)) {
      if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x) < 115341)
        all.tf7[i,6] <- all.tf7[i-1,6]
      else if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x) >= 115341) {
        all.tf7[i,6] <- (all.tf7[i-1,6]+1)
        x <- as.numeric(all.tf7[i,2])
      } else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
        all.tf7[i,6] <- (all.tf7[i-1,6]+1)
        x <- as.numeric(all.tf7[i,2])
      }
    }
    # the aim here is to attribute a bin number to each row so that I can
    # then split the dataframe according to those bins.

    chrom chromStart chromEnd name         cumsum bin
    chr1  10089      10309    ZBTB33       10089  1
    chr1  10132      10536    TAF7_(SQ-8)  20221  1
    chr1  10133      10362    Pol2-4H8     30354  1
    chr1  10148      10418    MafF_(M8194) 40502  1
    chr1  10382      10578    ZBTB33       50884  1
    chr1  16132      16352    CTCF         67016  1

-- View this message in context: http://r.789695.n4.nabble.com/apply-with-multiple-conditions-tp4635098.html
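Because each bin depends on where the previous bin started, this loop does not map cleanly onto apply(). Much of the time is probably spent reading and writing single cells of a data frame inside the loop, so one option is to run the same logic on plain vectors and assign the result back in one step. The sketch below assumes the comparison operators lost from the archived post were `<` and `>=` with a threshold of 115341; assign_bins and the toy data are hypothetical.

```r
# Same binning rule as the loop in the post, but on plain vectors.
# Assumption: a new bin starts when the chromosome changes or when the
# start position is at least `width` beyond the start of the current bin.
assign_bins <- function(chrom, start, width = 115341) {
  n <- length(start)
  bin <- integer(n)
  bin[1] <- 1L
  x <- start[1]                      # start position of the current bin
  for (i in seq_len(n)[-1]) {
    if (chrom[i] == chrom[i - 1] && (start[i] - x) < width) {
      bin[i] <- bin[i - 1]           # still inside the current bin
    } else {
      bin[i] <- bin[i - 1] + 1L      # open a new bin
      x <- start[i]
    }
  }
  bin
}

# Toy data standing in for all.tf7 (hypothetical values)
toy <- data.frame(chrom = c("chr1", "chr1", "chr1", "chr2"),
                  chromStart = c(10089, 10132, 130000, 500),
                  stringsAsFactors = FALSE)
toy$bin <- assign_bins(toy$chrom, toy$chromStart)
print(toy)
```

On the real data the call would be something like all.tf7$bin <- assign_bins(as.character(all.tf7$chrom), all.tf7$chromStart), after which split(all.tf7, all.tf7$bin) gives one piece per bin.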
Re: [R] Decrete value check in a matrix
You are not asking for a "Decrete" [sic] (discrete) value check but rather whether the numbers are integers. Try this:

    # from the ?is.integer help page
    is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol

    aa <- data.frame(na = c(1, 3.4, 5, 5.6), nb = c(2.4, 4, 6, 6.7))
    ww <- data.frame(is.wholenumber(aa))
    cbind(aa, ww)

John Kane
Kingston ON Canada

> -----Original Message-----
> From: antony.akk...@ge.com
> Sent: Mon, 2 Jul 2012 03:04:48 -0700 (PDT)
> To: r-help@r-project.org
> Subject: [R] Decrete value check in a matrix
>
> Hi All,
>
> I have a data frame (or matrix) like this:
>
>     MyMatrix
>     ABC   XYZ
>     1     2.5
>     3.4   4
>     5     6
>     5.6   6.7
>
> I need to check whether each value in each column is a discrete value or not. The outcome for each column should go into a corresponding result column as TRUE/FALSE. Finally, I need the result as a data frame (or matrix) like this:
>
>     ABC   XYZ   ABC_RESULT   XYZ_RESULT
>     1     2.5   TRUE         FALSE
>     3.4   4     FALSE        TRUE
>     5     6     TRUE         TRUE
>     5.6   6.7   FALSE        FALSE
>
> Can anyone help me quickly?
>
> Antony.
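The whole-number check can be sketched end to end with the values and column names from Antony's example, producing the ABC_RESULT / XYZ_RESULT layout he asked for. The names res and out are placeholders.

```r
# Tolerance-based whole-number test, as given on the ?is.integer help page
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}

aa <- data.frame(ABC = c(1, 3.4, 5, 5.6),
                 XYZ = c(2.5, 4, 6, 6.7))

# Apply the check column by column and label the result columns
res <- as.data.frame(lapply(aa, is.wholenumber))
names(res) <- paste0(names(aa), "_RESULT")

out <- cbind(aa, res)
print(out)
```

The tolerance is there because values such as 0.1 * 30 are not stored exactly; a strict x == round(x) test can miss numbers that are whole for practical purposes.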
Re: [R] Error() model is singular - what does that mean
Just looking at it, I would try renaming Task-Kind, Data-Kind and Time-Taken. Those are ambiguous in the formula: Task-Kind vs Task - Kind. Though that might not be the error at hand :)

On 02.07.2012, at 14:15, zetwal wrote:

> Hello
>
> I have some test data from a within-subject experiment that looks like this:
>
> Subject  Task-Kind  Data-Kind  Time-Taken  Correct
> 1        A          Data1      5           1
> 1        A          Data1      3           0
> 1        A          Data1      1           1
> 1        A          Data2      8           1
> 1        A          Data2      7           0
> 1        A          Data2      5           0
> 1        A          Data3      2           1
> 1        A          Data3      7           0
> 1        A          Data3      5           0
> 1        A          Data3      6           0
> 1        B          Data1      3           1
> 1        B          Data1      1           1
> 1        B          Data1      3           0
> 1        B          Data2      9           0
> 1        B          Data2      8           1
> 1        B          Data2      5           0
> 1        B          Data3      2           1
> 1        B          Data3      7           2
> 1        B          Data3      5           3
> 1        B          Data3      6           0
> 1        C          Data1      3           1
> 1        C          Data1      1           1
> 1        C          Data1      3           0
> 1        C          Data2      9           0
> 1        C          Data2      8           1
> 1        C          Data2      5           0
> 1        C          Data3      2           1
> 1        C          Data3      7           2
> 1        C          Data3      5           3
> 1        C          Data3      6           0
> 2        A          Data1      5           1
> 2        A          Data1      3           0
> 2        A          Data1      1           1
> 2        A          Data2      8           1
> 2        A          Data2      7           0
> 2        A          Data2      5           0
> 2        A          Data3      2           1
> 2        A          Data3      7           0
> 2        A          Data3      5           0
> 2        A          Data3      6           0
> 2        B          Data1      3           1
> 2        B          Data1      1           1
> 2        B          Data1      3           0
> 2        B          Data2      9           0
> 2        B          Data2      8           1
> 2        B          Data2      5           0
> 2        B          Data3      2           1
> 2        B          Data3      7           2
> 2        B          Data3      5           3
> 2        B          Data3      6           0
> 2        C          Data1      3           1
> 2        C          Data1      1           1
> 2        C          Data1      3           0
> 2        C          Data2      9           0
> 2        C          Data2      8           1
> 2        C          Data2      5           0
> 2        C          Data3      2           1
> 2        C          Data3      7           2
> 2        C          Data3      5           3
> 2        C          Data3      6           0
> . . .
>
> Some notes: there are 20 subjects, 5 different kinds of tasks and 5 different kinds of data, and there are several different variations for a certain kind of task and kind of data, which is why for Subject = 1, Task-Kind = A and Data-Kind = Data1 we have 3 different results. The measured parameters are time to complete the task and whether it was correct or not (0 implies correct and 1 implies not correct).
>
> I am computing the anova as follows:
>
> aov.ex = aov(Correct~Task-Kind*Data-Kind+Error(Subject/(Task-Kind*Data-Kind)),data=allDataRaw.xp)
>
> since I want to see how the result is affected by the different kinds of data as well as the kind of task, and I get a warning message saying:
>
> Error() model is singular
>
> I would be very grateful if someone could please tell me what this means.
>
> Thanks
> Pascal
>
> -- View this message in context: http://r.789695.n4.nabble.com/Error-model-is-singular-what-does-that-mean-tp4635103.html Sent from the R help mailing list archive at Nabble.com.
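The renaming suggested in the reply above can be sketched as follows. This is a minimal illustration, not the poster's actual data: the small data frame `d` is made up, and `make.names()` is just one convenient way to get syntactically valid names (hyphens become dots). It only addresses the ambiguity in the formula; whether it also removes the "Error() model is singular" warning is not something this sketch can confirm.

```r
# Made-up data with the posted column names; check.names = FALSE keeps the hyphens.
d <- data.frame(Subject = c(1, 1, 2, 2),
                `Task-Kind` = c("A", "B", "A", "B"),
                `Data-Kind` = c("Data1", "Data2", "Data1", "Data2"),
                `Time-Taken` = c(5, 3, 1, 8),
                Correct = c(1, 0, 1, 1),
                check.names = FALSE)

# In a formula, Task-Kind is parsed as "Task minus Kind", i.e. two variables.
ambiguous <- all.vars(Correct ~ Task-Kind)
print(ambiguous)

# Replace the names with syntactically valid ones.
names(d) <- make.names(names(d))
print(names(d))

# The formula now refers to single, existing columns.
f <- Correct ~ Task.Kind * Data.Kind
print(all.vars(f))
```

With the renamed columns the model call would use `Task.Kind` and `Data.Kind` throughout, including inside `Error()`.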
Re: [R] binary tree
On 6/27/2012 3:20 AM, Peppino wrote:

> Hi, I am new to R. I have to build a binary tree with R. I'm very confused and was wondering if anyone had any R sample code they would share. Can anybody help me?

You might want to look at the R Task View for phylogenetics: http://cran.r-project.org/web/views/Phylogenetics.html. The ape package may be some help depending on what you want to do.

Rob

> Bye
> Giuseppe
>
> -- View this message in context: http://r.789695.n4.nabble.com/binary-tree-tp4634593.html Sent from the R help mailing list archive at Nabble.com.
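The question does not say what kind of binary tree is needed, so the following is only a guess at one common case: a binary search tree stored as nested lists in base R. The function names `bst_insert` and `bst_inorder` are made up for this sketch, and it is unrelated to the ape package mentioned above.

```r
# Insert a value into a binary search tree represented as nested lists.
bst_insert <- function(node, value) {
  if (is.null(node)) {
    # Empty position: create a leaf; list() keeps the NULL children.
    return(list(value = value, left = NULL, right = NULL))
  }
  if (value < node$value) {
    node$left <- bst_insert(node$left, value)
  } else {
    node$right <- bst_insert(node$right, value)
  }
  node
}

# In-order traversal: left subtree, then the node, then the right subtree.
bst_inorder <- function(node) {
  if (is.null(node)) return(NULL)
  c(bst_inorder(node$left), node$value, bst_inorder(node$right))
}

tree <- NULL
for (v in c(5, 3, 8, 1, 4)) tree <- bst_insert(tree, v)

sorted <- bst_inorder(tree)
print(sorted)
```

Because R copies lists on modification, each insert returns a new tree rather than changing the old one in place; that is fine for small examples but may be slow for large trees.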
Re: [R] Decrete value check in a matrix
It need not be that complicated:

aa == round(aa)
        na    nb
[1,]  TRUE FALSE
[2,] FALSE  TRUE
[3,]  TRUE  TRUE
[4,] FALSE FALSE

cbind(aa, Result = aa == round(aa))
   na  nb Result.na Result.nb
1 1.0 2.4      TRUE     FALSE
2 3.4 4.0     FALSE      TRUE
3 5.0 6.0      TRUE      TRUE
4 5.6 6.7     FALSE     FALSE

Regards,
Marc Schwartz
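The two approaches in this thread agree on the posted data but can differ for values that come out of floating-point arithmetic. A small sketch of the difference, using `sqrt(2)^2` as a made-up test value (it is 2 up to rounding error):

```r
# Whole-number test with a tolerance, as in the ?is.integer examples.
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol

x <- c(1, 3.4, sqrt(2)^2)

exact    <- x == round(x)      # exact comparison
tolerant <- is.wholenumber(x)  # comparison within a small tolerance

print(exact)
print(tolerant)
```

For values typed in or read from a file, such as the ones in the question, the exact comparison is usually enough; the tolerance matters mainly for computed values.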
Re: [R] R sub query
Will this do:

m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow=2)
m
     [,1]      [,2]          [,3]        [,4]
[1,] ".:0:0,0" ".:194:193,1" ".:58:50,8" ".:114:114,0"
[2,] ".:2:0,2" ".:56:0,56"   ".:13:0,13" ".:75:75,0"

sub("^\\.:[^:]*:", "", m)
     [,1]  [,2]    [,3]   [,4]
[1,] "0,0" "193,1" "50,8" "114,0"
[2,] "0,2" "0,56"  "0,13" "75,0"

On Mon, Jul 2, 2012 at 4:15 AM, Sarah Auburn saub...@yahoo.com wrote:

> Hello,
>
> I would like to substitute a substring of characters defined by a specific start and end sequence, i.e. in the example matrix below, I would like to substitute ".:X:" with "", where X varies in sequence...
>
> m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow=2)
>
> Output required:
>
>      [,1]  [,2]    [,3]   [,4]
> [1,] "0,0" "193,1" "50,8" "114,0"
> [2,] "0,2" "0,56"  "0,13" "75,0"
>
> Thank you for any help
> Sarah

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
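For readers less familiar with regular expressions, here is the same call again with the pattern taken apart piece by piece. Nothing new is assumed beyond the matrix and pattern from the message above.

```r
m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56",
              ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow = 2)

# ^      anchor the match at the start of each string
# \\.:   a literal dot followed by a colon
# [^:]*  any run of characters that are not a colon (the varying X)
# :      the colon that ends the prefix
out <- sub("^\\.:[^:]*:", "", m)

print(out)
print(dim(out))   # sub() keeps the attributes of its input, so this is still 2 x 4
```

`sub()` replaces only the first match in each element, which is all that is needed here because the pattern is anchored at the start.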
Re: [R] ggplot: dodge positions
Can you expand a bit on what is wrong with the dodge option? From what I see, it looks lovely, with the points exactly lined up with the boxplots for each group, but perhaps I don't understand exactly what you want.

John Kane
Kingston ON Canada

-----Original Message-----
From: thorn.tha...@rdls.nestle.com
Sent: Mon, 2 Jul 2012 11:43:03 +0200
To: r-help@r-project.org
Subject: [R] ggplot: dodge positions

Dear all,

I want to get a series of boxplots (grouped by two factors) and I want to overlay the original observations. The following code does almost what I want:

library(ggplot)
ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y = runif(120,0,10), grp = factor(rep(rep(1:3, 10), 4)))
ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()

Yet the position of the points and the position of the boxes on the x-axis are not the same. I would like the points to be shifted accordingly, such that they line up with the boxplots. I tried position_dodge:

ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point(aes(ymax=max(y)), position = position_dodge(width=.75))

but that did not really help, as all points are now dodged and I just want to have a fixed offset for each subgroup of points such that the boxplot and the points are aligned. Any ideas?

Kind Regards,
Thorn Thaler
Mathematician, Applied Mathematics
Nestec Ltd, Nestlé Research Center
PO Box 44, CH-1000 Lausanne 26
Phone: +41 21 785 8220
Fax: +41 21 785 9486
Re: [R] enquiry
Try this:

as.Date('1951-52', format = "%Y-%j")
[1] "1951-02-21"

On Mon, Jul 2, 2012 at 5:44 AM, Karan Anand anand.kara...@gmail.com wrote:

> Hi, I am new to using R, so could you please tell me how to read the 1951-52, 1952-52 date format in R?

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
Re: [R] Error() model is singular - what does that mean
Also, try googling for - R model is singular -; there seem to have been a lot of people with that particular error.
Re: [R] geom_boxplot
Thanks Petr,

I suppose this means I have to reread that set of changes again. I think I noticed it and promptly forgot it.

John Kane
Kingston ON Canada

-----Original Message-----
From: petr.pi...@precheza.cz
Sent: Mon, 2 Jul 2012 10:41:20 +0200
To: jrkrid...@inbox.com
Subject: Re: [R] geom_boxplot

Hi

In the new ggplot2 version the following works too:

p + geom_boxplot(aes(fill = factor(cyl))) + labs(fill = "Cylinders") + ylab("Miles per Gallon") + xlab("Number of Cylinders")

Regards
Petr

> Yes, you can do all of the things you want. Below is a start, to give you an idea of how to approach some of it.
>
> library(ggplot2)
> p <- ggplot(mtcars, aes(factor(cyl), mpg))
> p <- p + geom_boxplot(aes(fill = factor(cyl))) + labs(fill = "Cylinders") + scale_y_continuous("Miles per Gallon") + scale_x_discrete("Number of Cylinders")
> p
>
> Have a look at stackoverflow.com/questions/3606697/how-to-set-x-axis-limits-in-ggplot2-r-plots for x and y axes limits.
>
> It took me a while to realise it but, generally, I find that it is not too hard to find examples of what you need by just googling something like "ggplot2 set x and y limits" or "ggplot2 geom_bar colour" and so on. The terms ggplot2 and geom_XXX are pretty unique on the internet and search results usually are not too bad.
>
> You may also want to subscribe to the ggplot2 group on Google Groups.
>
> Best wishes
> John Kane
> Kingston ON Canada
>
> -----Original Message-----
> From: hannah@gmail.com
> Sent: Sun, 1 Jul 2012 08:39:20 -0400
> To: r-help@r-project.org
> Subject: Re: [R] geom_boxplot
>
> Also, is it possible to change ylim?
>
> 2012/7/1 li li hannah@gmail.com
>
> Dear all,
>
> I have a few questions regarding the boxplot output from the geom_boxplot function. Attached is the output I get. Below are my questions:
>
> 1. How can I define the xlab and ylab myself? Also, I would like to remove the factor(variable) line on the right side.
> 2. How can I define the colors of the boxplots myself? For example, I want to use blue for LR, green for pair and purple for BR1.
>
> Thanks so much!
> Hannah
Re: [R] ggplot: dodge positions
I guess it works with ggplot but not with ggplot2. I'm using only the latter but had a typo in my first post. So the code (which does not do what I want) is:

library(ggplot2)
ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y = runif(120,0,10), grp = factor(rep(rep(1:3, 10), 4)))
ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()

Thinking about it, I would need to find out which offset ggplot uses to dodge the nested factors. If I knew the exact quantity, I could do something like

geom_point(aes(x = offset.used.by.geom_boxplot))

So how are the exact positions on the x-axis for geom_boxplot determined? Any ideas?

Thanks for the help, anyway.

KR,
-Thorn
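As a rough guide to the question of which offsets a dodge produces, the sketch below computes them in base R under a stated assumption: the total dodge width (0.75 here, the value used in the thread) is divided into equal slots, one per group, and each group is centred in its slot. Whether a particular ggplot2 version places boxplots by exactly this rule is not confirmed here, and `dodge_offsets` is a made-up helper, not a ggplot2 function.

```r
# Hypothetical helper: x offsets of n dodged groups sharing a total width.
dodge_offsets <- function(n, width = 0.75) {
  k <- seq_len(n)
  # Slot k covers [(k - 1) / n, k / n] of the width; take its midpoint and
  # shift so that the offsets are centred on zero.
  width * ((k - 0.5) / n - 0.5)
}

offsets <- dodge_offsets(3)
print(offsets)       # approximately -0.25, 0, 0.25

# Discrete x values sit at 1, 2, 3, ... so under this assumption the three
# groups for category "A" (position 1) would be centred at:
print(1 + offsets)
```

Under this assumption, giving the points the same dodge width and the same grouping variable as the boxplots should put them at the same x positions, which is what the `position_dodge(width = .75)` call in the thread attempts.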
Re: [R] R sub query
Hi

I am not at all an expert in regular expressions, but

gsub("^[[:punct:]]+[[:digit:]]+:", "", m)

gives the output you want. Maybe by chance :-)

Regards
Petr
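To see when this pattern and the `sub("^\\.:[^:]*:", "", m)` pattern posted elsewhere in this thread behave differently, here is a small comparison. The second test string, with `NA` in place of the number, is invented for illustration and does not appear in the original data.

```r
x <- c(".:194:193,1", ".:NA:0,2")

# Pattern 1: a dot, a colon, then anything up to the next colon.
a <- sub("^\\.:[^:]*:", "", x)

# Pattern 2: leading punctuation, then digits, then a colon.
b <- gsub("^[[:punct:]]+[[:digit:]]+:", "", x)

print(a)
print(b)
```

Both patterns strip the prefix when X is a number, which is the case in the posted matrix. Only the first one also strips it when X contains non-digit characters.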
Re: [R] apply with multiple conditions
Paul,

My interpretation is that you are trying to assign a new bin number to a row every time the variable chrom changes and every time the variable chromStart changes by 115341 or more. Is that right? If so, you don't need a loop at all. Check out the code below. I made a couple of changes to the all.tf7 example data frame so that it would have two changes in bin number, one based on the chrom variable and one based on the chromStart variable.

Jean

all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
  name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)", "ZBTB33", "CTCF"),
  cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
  bin = rep(NA, 6),
  stringsAsFactors = FALSE   # keep chrom as character so the comparison below is on the names
)

# assign a new bin every time chrom changes and every time chromStart changes by 115341 or more
L <- nrow(all.tf7)
prev.chrom <- c(NA, all.tf7$chrom[-L])
delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom | delta.start >= 115341
all.tf7$bin <- cumsum(new.bin)
all.tf7
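One difference is worth spelling out: the vectorised reply compares each row with the previous row, whereas the loop in the original post compares each row with the first row of the current bin (the variable `x`). If the second rule is the intended one, each bin depends on where the previous bin started, so some sequential pass is hard to avoid. The sketch below keeps the loop but runs it on plain vectors instead of indexing the data frame on every step, which is usually far cheaper. The name `assign_bins` and the five-row test data are made up for illustration.

```r
# Hypothetical helper: same binning rule as the original loop, on plain vectors.
assign_bins <- function(chrom, start, width = 115341) {
  n <- length(start)
  bin <- integer(n)
  if (n == 0L) return(bin)
  bin[1] <- 1L
  bin_start <- start[1]              # start position of the current bin's first row
  for (i in seq_len(n)[-1]) {
    if (chrom[i] != chrom[i - 1] || start[i] - bin_start >= width) {
      bin[i] <- bin[i - 1] + 1L      # chromosome changed or bin is full: open a new bin
      bin_start <- start[i]
    } else {
      bin[i] <- bin[i - 1]           # stay in the current bin
    }
  }
  bin
}

chrom <- c("chr1", "chr1", "chr1", "chr2", "chr2")
start <- c(10089, 10132, 130000, 10133, 10148)

bins <- assign_bins(chrom, start)
print(bins)
```

On the real data this would be called as `assign_bins(as.character(all.tf7$chrom), all.tf7$chromStart)`, and the result could then be passed to `split()`.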
Re: [R] enquiry
In order to do any conversion, you have to know the format of the data that is being input. So which is it: does 52 represent the day of the year, or the week of the year? It does make a big difference, but until you know what 1952-52 means, it is hard to specify which way you should do the conversion.

On Mon, Jul 2, 2012 at 9:12 AM, arun smartpink...@yahoo.com wrote:

> Hi Jim,
>
> I tried,
>
> dat2 <- as.Date(dat1, format="%Y-%V")
> dat2
> [1] "1951-07-02" "1952-07-02"
>
> But, if the format is for -wk or -yy, then, not sure how this will help.
>
> A.K.

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
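To make the point about interpretation concrete, here is a sketch that parses the same string in two ways. The day-of-year reading uses `%j`, as suggested earlier in the thread. The week reading uses a deliberately simple convention that is an assumption of this sketch, not something stated by the poster: week 1 starts on 1 January and every week is 7 days long.

```r
s <- "1951-52"

# Interpretation 1: "52" is the day of the year (%j).
d_day <- as.Date(s, format = "%Y-%j")
print(d_day)

# Interpretation 2 (assumed convention): "52" is a week number, with week 1
# starting on 1 January and each week lasting 7 days.
parts <- as.integer(strsplit(s, "-")[[1]])
d_week <- as.Date(sprintf("%d-01-01", parts[1])) + 7 * (parts[2] - 1)
print(d_week)
```

The two readings are about ten months apart. As far as I understand the `strptime` documentation, `%V` is accepted but ignored on input, which would explain why the `%Y-%V` attempt quoted above returned the current month and day in each year.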
Re: [R] ggplot: dodge positions
I don't think I was clear. Sorry. What I was referring to was

ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point(aes(ymax=max(y)), position = position_dodge(width=.75))

which is giving me http://www.mediafire.com/i/?fdurpq6e6l8cu35, which is what I thought you wanted.

I have no idea how the x-axis positions of the boxplots are determined. It may be relatively clear in the code, but I don't really have the knowledge to ferret it out. Sorry that I cannot be of more help.

John Kane
Kingston ON Canada
Re: [R] Decrete value check in a matrix
Hi,

Try this:

dat1 <- read.table(text="
ABC XYZ
1 2.5
3.4 4
5 6
5.6 6.7
", sep="", header=TRUE)
dat1[dat1$ABC == as.integer(dat1$ABC), "ABC_RESULT"] <- TRUE
dat1[dat1$XYZ == as.integer(dat1$XYZ), "XYZ_RESULT"] <- TRUE
dat1[is.na(dat1)] <- FALSE
dat1
  ABC XYZ ABC_RESULT XYZ_RESULT
1 1.0 2.5       TRUE      FALSE
2 3.4 4.0      FALSE       TRUE
3 5.0 6.0       TRUE       TRUE
4 5.6 6.7      FALSE      FALSE

A.K.

From: Akkara, Antony (GE Energy, Non-GE) antony.akk...@ge.com
To: arun smartpink...@yahoo.com
Sent: Monday, July 2, 2012 7:29 AM
Subject: Decrete value check in a matrix

Hi Arun,

Can you please help me? Here I have a data frame (or matrix) like this:

MyMatrix
ABC   XYZ
---   ---
1     2.5
3.4   4
5     6
5.6   6.7

I need to check whether each column value is a decrete value or not. Depending on whether that particular column value is decrete, the result column should contain TRUE or FALSE respectively. Finally, I need to get the result in data frame (or matrix) form like this:

ABC   XYZ   ABC_RESULT   XYZ_RESULT
---   ---   ----------   ----------
1     2.5   TRUE         FALSE
3.4   4     FALSE        TRUE
5     6     TRUE         TRUE
5.6   6.7   FALSE        FALSE

Can anyone provide a solution fast? It's urgent, that's why.

Antony.
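The same result table can also be built for any number of columns without naming each one. This is a sketch under the assumption that every column is numeric; it uses `round()` rather than `as.integer()`, since `as.integer()` returns NA (with a warning) for values beyond the integer range.

```r
dat1 <- data.frame(ABC = c(1, 3.4, 5, 5.6), XYZ = c(2.5, 4, 6, 6.7))

# One logical column per input column: TRUE where the value is a whole number.
res <- as.data.frame(lapply(dat1, function(col) col == round(col)))
names(res) <- paste0(names(dat1), "_RESULT")

out <- cbind(dat1, res)
print(out)
```

The column names `ABC_RESULT` and `XYZ_RESULT` are generated from the input names, so adding a third column to `dat1` would add a matching result column automatically.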
Re: [R] ggplot: dodge positions
Unfortunately I can't see your example, as the page is blocked by our firewall. Anyway, if I try the dodge code, the points are shifted, yet each of them is shifted by a different offset. As a result the green points, for instance, are indeed closer to the green boxplot, yet they are not aligned: all green points seem to have a different position on the x-axis, while all the green points for x == A should align exactly with A. Am I clearer now?

KR,
-Thorn
Re: [R] Decrete value check in a matrix
Good. It's working fine. Thank you, John!

From: John Kane [via R] [mailto:ml-node+s789695n4635113...@n4.nabble.com]
Sent: Monday, July 02, 2012 6:18 PM
To: Akkara, Antony (GE Energy, Non-GE)
Subject: Re: Decrete value check in a matrix

You are not asking for a "Decrete" [sic] (discrete) value check, but rather whether the numbers are integers. Try this:

    # from the ?is.integer help page
    is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol

    aa <- data.frame(na = c(1, 3.4, 5, 5.6), nb = c(2.4, 4, 6, 6.7))
    ww <- data.frame(is.wholenumber(aa))
    cbind(aa, ww)

John Kane Kingston ON Canada

-----Original Message-----
From: [hidden email]
Sent: Mon, 2 Jul 2012 03:04:48 -0700 (PDT)
To: [hidden email]
Subject: [R] Decrete value check in a matrix

Hi All,

Here I have a data frame (or matrix) like this:

    MyMatrix <-

    ABC   XYZ
    ---   ---
    1     2.5
    3.4   4
    5     6
    5.6   6.7

I need to check whether each column value is a discrete value or not. Depending on whether that particular column value is discrete, the result column should contain TRUE or FALSE respectively. Finally, I need to get the result as a data frame (or matrix) like this:

    ABC   XYZ   ABC_RESULT   XYZ_RESULT
    ---   ---   ----------   ----------
    1     2.5   TRUE         FALSE
    3.4   4     FALSE        TRUE
    5     6     TRUE         TRUE
    5.6   6.7   FALSE        FALSE

Can anyone help me fast?

Antony.
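A minimal sketch of the same check that also produces the column names asked for in the original post. It assumes the input is a data frame called MyMatrix with numeric columns ABC and XYZ; the "_RESULT" suffix is taken from the requested output, and is.wholenumber is the helper from the ?is.integer help page quoted above.

```r
# Helper from the ?is.integer help page: TRUE when x is (numerically) a whole number.
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol

# Example data from the original question.
MyMatrix <- data.frame(ABC = c(1, 3.4, 5, 5.6), XYZ = c(2.5, 4, 6, 6.7))

# Apply the check column by column and name the result columns <name>_RESULT.
res <- as.data.frame(lapply(MyMatrix, is.wholenumber))
names(res) <- paste0(names(MyMatrix), "_RESULT")

out <- cbind(MyMatrix, res)
print(out)
```

Using lapply() over the columns keeps the original column names available, so the result columns can be labelled without hard-coding them.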
[R] Specify model with polynomial interaction terms up to degree n
I would like to specify a model with all polynomial interaction terms between two variables, say, up to degree 6. For example, terms like a^6 + (a^5 * b^1) + (a^4 * b^2) + ... and so on. The documentation states "The ^ operator indicates crossing to the specified degree", so I would expect a model specified as

    y ~ (a+b)^6

to produce these terms. However, doing this only returns four coefficients: Intercept, a, b, and a:b. Does anyone know how to produce the desired result? Thanks in advance.
Re: [R] Binary Quadratic Opt?
Hi, Petr,

> Hi Khris: If I understand the problem correctly, you have a list of (x,y) coordinates where some sensor is located, but you do not know which sensor is there. The database contains data for each sensor, identified in some way, but you do not know the mapping between the sensor identifiers from the database and the (x,y) coordinates. Is this correct?

Yes.

>> So I modelled the problem as an inexact match between 2 graphs. Since the best package on graphs, i.e. iGraph, does not have any function for graph matching

> I think the problem is close to http://en.wikipedia.org/wiki/Graph_isomorphism
> You have estimates of the distances between the sensors using identifiers from the database. So you know which pairs of sensors are close. This is one graph. The other graph is the graph of closeness between the known (x,y) coordinates. You want to find a mapping between the vertices of these two graphs which preserves edges.

Yes, I agree the problem is more in the graph-theoretic domain. To be more precise, it is inexact graph matching, of which graph isomorphism is a special case, so the problem is more general than graph isomorphism.

Let me define the problem more formally. We have 2 weighted undirected graphs. In one graph I know the distance of every vertex from every other vertex, whereas in the other graph I know only which vertices are close to a given vertex. So I know the neighbouring vertices of a given vertex, and the distance matrix of the other graph is incompletely known. The question is whether I can find the best alignment between the 2 graphs.

Example: for G1 I know the complete distance matrix. For G2, if there are four vertices, say (v1, v2, v3, v4), then I know the edge weights (v1,v2) and (v1,v3) but have no information on the edge weight (v1,v4). Similarly, I know about (v2,v3) but have no information about the edge weights (v2,v4) or (v3,v4).

So I was thinking of not modelling it as a general inexact graph matching problem, since the complexity would then be n^4. It seems the best way to model the solution is to consider only edges which are at a distance of 1 unit, i.e. the closest edges from every vertex, and not every edge from the given vertex. This will bring the complexity down from n^4 to 6*6*n^2, assuming every vertex has at most 6 neighbouring vertices. Quadratic complexity seems manageable. Of course, the solution now becomes a lot more sensitive to errors in graph G2. In the best case, if I have no errors in G2, i.e. for every vertex I correctly know its closest neighbours in the rectangular grid, then optimizing the distance between G1 and G2 should give me the correct alignment.

This seems to be the best approach under the current circumstances. As far as implementation goes, I think I still have to use an optimization package, since there are no readily and freely available functions for inexact graph matching.

Petr, how do you feel about it? I appreciate your feedback.

Regards
Khris.

>> I converted the inexact graph matching problem to a Binary Quadratic Opt problem. Since there is no specialized package for Binary Quadratic Opt, based on your input I converted it into a Binary Linear Opt problem.

> The problem of graph isomorphism is hard in general, but if one of the graphs is a rectangular grid, which does not have too many automorphisms, the problem is not too hard. Try, for example, the following approach.
>
> Look for small groups of sensors which form connected subgraphs that have the form of small pieces of the rectangular grid. If you have such a small subgraph, look for nodes which can be added to the subgraph to make it a larger piece of the grid.
>
> To start, the algorithm can choose any sensor, say S_0. Find all its neighbours. There should be at most 4 neighbours (in an ideal grid). Call the group of these neighbours S_1. Then find sensors which are neighbours to at least two members of S_1. Call them group S_2. The connections between S_0, S_1 and S_2 should form a pattern like
>
>     2 - 1 - 2
>     |   |   |
>     1 - 0 - 1
>     |   |   |
>     2 - 1 - 2
>
> The digits 0, 1, 2 distinguish elements of S_0, S_1, S_2. Continue this in order to enlarge the recognised pattern. If the grid is not ideal, the process may need to maintain several candidate connected patterns, keep those which can be extended with further sensors, and discard those which cannot.
>
> Another approach is as follows. Choose a random mapping between the sensors and (x,y). Define a measure of the quality of the mapping, for example the number of matching edges minus the number of non-matching edges. Then use local search to maximize the quality. For example, in each step, exchange two sensors in a way which increases the quality.
>
> Do you think that some of these approaches are applicable to your situation?
>
> Petr.
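The second approach Petr describes (random mapping, quality measure, pairwise exchanges) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: both graphs are given as symmetric 0/1 adjacency matrices, the function names mapping_quality and local_search are hypothetical, and a "non-matching edge" is taken to mean a vertex pair on which the two graphs disagree.

```r
# Quality of a mapping: matching edges minus non-matching pairs.
# perm[i] is the grid position assigned to sensor i (an assumed encoding).
mapping_quality <- function(perm, sensor_adj, grid_adj) {
  mapped <- grid_adj[perm, perm]           # grid adjacency seen through the mapping
  upper <- upper.tri(sensor_adj)           # count each unordered pair once
  matches <- sum(sensor_adj[upper] == 1 & mapped[upper] == 1)
  mismatches <- sum(sensor_adj[upper] != mapped[upper])
  matches - mismatches
}

# Local search: start from a random mapping and accept any swap of two
# sensors that increases the quality; stop when a full sweep finds none.
local_search <- function(sensor_adj, grid_adj, max_sweeps = 100) {
  n <- nrow(sensor_adj)
  perm <- sample(n)
  best <- mapping_quality(perm, sensor_adj, grid_adj)
  for (sweep in seq_len(max_sweeps)) {
    improved <- FALSE
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        cand <- perm
        cand[c(i, j)] <- cand[c(j, i)]     # exchange two sensors
        q <- mapping_quality(cand, sensor_adj, grid_adj)
        if (q > best) {
          perm <- cand
          best <- q
          improved <- TRUE
        }
      }
    }
    if (!improved) break
  }
  list(perm = perm, quality = best)
}

# Toy example: a 2 x 2 grid whose positions have been shuffled.
set.seed(1)
grid_adj <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(3, 4), c(1, 3), c(2, 4))
grid_adj[edges] <- 1
grid_adj[edges[, 2:1]] <- 1
truth <- c(3, 1, 4, 2)                     # hidden sensor-to-position mapping
sensor_adj <- grid_adj[truth, truth]

res <- local_search(sensor_adj, grid_adj)
print(res)
```

A search like this can stop in a local optimum, so in practice it would probably need several random restarts; because of the grid's symmetries, a perfect score also does not identify the mapping uniquely.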
Re: [R] enquiry
Hi Jim,

I tried:

    dat2 <- as.Date(dat1, format = "%Y-%V")
    dat2
    [1] "1951-07-02" "1952-07-02"

But if the format is for -wk or -yy, then I am not sure how this will help.

A.K.

----- Original Message -----
From: jim holtman jholt...@gmail.com
To: Karan Anand anand.kara...@gmail.com
Cc: r-help@r-project.org
Sent: Monday, July 2, 2012 9:04 AM
Subject: Re: [R] enquiry

try this:

    as.Date('1951-52', format = "%Y-%j")
    [1] "1951-02-21"

On Mon, Jul 2, 2012 at 5:44 AM, Karan Anand anand.kara...@gmail.com wrote:

> hi, i am new to using r. so if you can pls tell me how to read 1951-52, 1952-52 date format in r

-- Jim Holtman Data Munger Guru
What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
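The thread never establishes what "1951-52" is meant to encode, so the sketch below only illustrates two possible readings; both are assumptions. The first is Jim's reading (year plus day of the year, via %j). The second treats the string as a span of years such as 1951 to 1952. Note that ?strptime documents %V as accepted but ignored on input, which would explain why the %V attempt above simply returned the current month and day. The example strings and variable names are illustrative.

```r
x <- c("1951-52", "1952-53")

# Reading 1: <year>-<day of year>, as in Jim Holtman's suggestion.
as_day_of_year <- as.Date(x, format = "%Y-%j")

# Reading 2: a span of years, e.g. "1951-52" meaning 1951 to 1952.
start_year <- as.integer(substr(x, 1, 4))
end_yy     <- as.integer(substr(x, 6, 7))
end_year   <- (start_year %/% 100L) * 100L + end_yy
# Roll over the century when the two-digit end year wraps (e.g. "1999-00").
end_year   <- ifelse(end_year < start_year, end_year + 100L, end_year)

print(as_day_of_year)
print(data.frame(x, start_year, end_year))
```

Which reading is appropriate depends entirely on where the data come from, so it is worth confirming the meaning of the second field before converting anything to Date.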
Re: [R] ggplot: dodge positions
Damn firewalls. However, as I see the graph, the points are lined up exactly on the centreline of each boxplot. So, for example, the lowest outlier on the grp1 boxplot for A is exactly where it should be, at the end of the whisker. Real outliers for grp1 for D are exactly above the 'non-existent' whisker for that boxplot. So it looks to me as if my version of the plot is what you want. Any chance that I can email you directly and get an attachment through?

John Kane Kingston ON Canada
Re: [R] Decrete value check in a matrix
Glad it works. So far we seem to have at least three ways to do it. R is amazing!

John Kane Kingston ON Canada
Re: [R] apply with multiple conditions
Thanks for your reply, Jean. I think your interpretation is correct, but when I run your code I end up with the data frame below, and obviously the bins created there don't correspond to a chromStart change of 115341:

      chrom chromStart chromEnd         name cumsum bin
    1  chr1      10089    10309       ZBTB33  10089   1
    2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
    3  chr2      10133    10362     Pol2-4H8  30354   3
    4  chr2      10148    10418 MafF_(M8194)  40502   4
    5  chr2     210382   210578       ZBTB33  50884   5
    6  chr2     216132   216352         CTCF  67016   6

The first two rows should have the same bin number (same chrom, <115341 diff), then rows 3 & 4 should be in another bin (different chrom from rows 1 & 2, <115341 diff), and rows 5 & 6 in another one (same chrom but >115341 difference between row 4 and row 5). It seems the new.bin line of your code isn't quite doing what it should, but I can't pinpoint the error there...

Paul

On 2 July 2012 14:19, Jean V Adams jvad...@usgs.gov wrote:

Paul,

My interpretation is that you are trying to assign a new bin number to a row every time the variable chrom changes and every time the variable chromStart changes by 115341 or more. Is that right? If so, you don't need a loop at all. Check out the code below. I made a couple of changes to the all.tf7 example data frame so that it would have two changes in bin number, one based on the chrom variable and one based on the chromStart variable.

Jean

    all.tf7 <- data.frame(
      chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
      chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
      chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
      name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)", "ZBTB33", "CTCF"),
      cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
      bin = rep(NA, 6)
    )

    # assign a new bin every time chrom changes and
    # every time chromStart changes by 115341 or more
    L <- nrow(all.tf7)
    prev.chrom <- c(NA, all.tf7$chrom[-L])
    delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
    new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom | delta.start >= 115341
    all.tf7$bin <- cumsum(new.bin)
    all.tf7

pguilha paul.guilha...@gmail.com wrote on 07/02/2012 06:25:13 AM:

Hello all,

I have written a for loop to act on a data frame with close to 3 million rows and 6 columns, and I would like to pass it to apply() to speed the process up (I let the loop run for 2 days before stopping it, and it had only gone through 200,000 rows), but I am really struggling to find a way to pass the arguments. Below are the loop and the head of the data frame I am working on. Any hints would be much appreciated, thank you! (I have searched for this but could not find any other posts doing quite what I want.)

Paul

    x <- as.numeric(all.tf7[1,2])
    for (i in 2:nrow(all.tf7)) {
      if (all.tf7[i,1] == all.tf7[i-1,1] && (all.tf7[i,2] - x) < 115341)
        all.tf7[i,6] <- all.tf7[i-1,6]
      else if (all.tf7[i,1] == all.tf7[i-1,1] && (all.tf7[i,2] - x) >= 115341) {
        all.tf7[i,6] <- (all.tf7[i-1,6] + 1)
        x <- as.numeric(all.tf7[i,2])
      }
      else if (all.tf7[i,1] != all.tf7[i-1,1]) {
        all.tf7[i,6] <- (all.tf7[i-1,6] + 1)
        x <- as.numeric(all.tf7[i,2])
      }
    }
    # the aim here is to attribute a bin number to each row so that I can
    # then split the dataframe according to those bins.

    chrom chromStart chromEnd         name cumsum bin
     chr1      10089    10309       ZBTB33  10089   1
     chr1      10132    10536  TAF7_(SQ-8)  20221   1
     chr1      10133    10362     Pol2-4H8  30354   1
     chr1      10148    10418 MafF_(M8194)  40502   1
     chr1      10382    10578       ZBTB33  50884   1
     chr1      16132    16352         CTCF  67016   1
Re: [R] ggplot: dodge positions
Well, that is exactly what I wanted to have. And you were right, it had something to do with package versions. So I updated R and the plot looks exactly the way I wanted it. Thanks a lot for your help and time.

KR, -Thorn

-----Original Message-----
From: John Kane [mailto:jrkrid...@inbox.com]
Sent: Monday, 2 July 2012 15:58
To: Thaler,Thorn,LAUSANNE,Applied Mathematics
Subject: RE: [R] ggplot: dodge positions

Let's hope it makes it. Just in case my version is okay, let me give you my sessionInfo, in case we have some subtle difference in settings that is having an effect. Of course it may be the wrong graph.

    sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] grid stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
    [1] RColorBrewer_1.0-5 plyr_1.7.1 reshape2_1.2.1 scales_0.2.1
    [5] ggplot2_0.9.1

    loaded via a namespace (and not attached):
    [1] colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 labeling_0.1
    [5] MASS_7.3-18 memoise_0.1 munsell_0.3 proto_0.3-9.2
    [9] stringr_0.6 tools_2.15.1

John Kane Kingston ON Canada

-----Original Message-----
From: thorn.tha...@rdls.nestle.com
Sent: Mon, 2 Jul 2012 15:53:50 +0200
To: jrkrid...@inbox.com
Subject: RE: [R] ggplot: dodge positions

Yes, that would be nice if you could send the graph directly to me. Thanks a billion for your help and your time.

KR, -Thorn
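For reference, a sketch of the overlay discussed in this thread, written for a reasonably recent ggplot2 (an assumption; the thread itself used ggplot2 0.9.1 and was resolved by updating). The width of 0.75 comes from the code posted above. Passing the same position object to both layers is an illustrative choice, intended to make explicit that boxes and points are dodged by the same amount.

```r
library(ggplot2)

set.seed(1)
ddf <- data.frame(
  x   = factor(rep(LETTERS[1:4], each = 30)),
  y   = runif(120, 0, 10),
  grp = factor(rep(rep(1:3, 10), 4))
)

# One shared dodge specification, so both layers are offset identically.
dodge <- position_dodge(width = 0.75)

p <- ggplot(ddf, aes(x, y, colour = grp)) +
  geom_boxplot(position = dodge) +
  geom_point(position = dodge)

print(p)
```

The aes(ymax = max(y)) workaround from the original posts is left out here on the assumption that current versions no longer need it.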
Re: [R] Adjusting length of series
On Jul 2, 2012, at 5:13 AM, Lekgatlhamang, lexi Setlhare wrote:

> Hi David and AK,
> I have been trying to implement your suggestions since yesterday, but I encountered some challenges. As for David's suggestion, I could only implement it after some modifications. Using an abridged version of my data, I dput my dataset and then show my steps below.

Well, your initial question (why the $ referencing did not work) is now answered. This is not a data frame but rather a 'ts' classed object, and there is no `$` method for such objects. They are really matrices with some extra attributes.

> ydata$BoBCL1
> Error in ydata$BoBCL1 : $ operator is invalid for atomic vectors

As I understood it, you were able to get useful analyses using the formula methods for lm on these objects, but were just having difficulty with the $ operator. So the answer is: don't do that.

-- David.

> dput(ydata)
> structure(c(68.10004, -34.80002, 90.39996, 54.60004, -172.3, 51.80002, 175,
> 79.80002, -35.70007, 130.5, 116.8, -67.5, 164.5, 514.8, -326.1, 98.40005,
> 160.2, 53.19998, 283.6, -111.6, 127.8, -17.30002, 286.3, NA, NA, -102.9001,
> 125.2, -35.79993, -226.9001, 224.1, 123.2, -95.19998, -115.5001, 166.2001,
> -13.69998, -184.3, 232, 350.3, -840.9001, 424.5001, 61.79993, -107,
> 230.4001, -395.2001, 239.4001, -145.1, 303.6, NA, NA, NA, 228.1, -160.,
> -191.1001, 451.0001, -100.9001, -218.4, -20.30011, 281.7002, -179.9001,
> -170.6, 416.3, 118.3, -1191.2, 1265.4, -362.7002, -168.7999, 337.4001,
> -625.6001, 634.6001, -384.5001, 448.7001, NA, NA, -164.45784099,
> 17.079353995, 95.976788009, 680.23816699, -491.34869099, -274.694009,
> -256.332907, 469.62296, -146.431891, -41.077201995, -106.970104,
> 757.68826399, -1689.214533, 2320.098952, -1446.97942, 516.384521,
> -375.27765099, 293.86702999, 417.845195, 278.198807, -968.59203399,
> -314.195986, NA, NA, NA, 181.53719499, 78.897434013, 584.26137898,
> -1171.586858, 216.65468199, 18.361101998, 725.955867, -616.054851,
> 105.35468901, -65.892902005, 864.65836799, -2446.902797, 4009.313485,
> -3767.078372, 1963.363941, -891.66217199, 669.14468099, 123.978165,
> -139.646388, -1246.790841, 654.396048, NA, 4937, 5005.1, 4970.3, 5060.7,
> 5115.3, 4943, 4994.8, 5169.8, 5249.6, 5213.9, 5344.4, 5461.2, 5393.7,
> 5558.2, 6073, 5746.9, 5845.3, 6005.5, 6058.7, 6342.3, 6230.7, 6358.5,
> 6341.2, 6627.5, 4187.5, 4296.004835, 4240.051829, 4201.178177, 4258.281313,
> 4995.622616, 5241.615228, 5212.913831, 4927.879527, 5112.468183,
> 5150.624948, 5147.704511, 5037.81397, 5685.611693, 4644.194883,
> 5922.877025, 5754.579747, 6102.66699, 6075.476582, 6342.153204,
> 7026.675021, 7989.395645, 7983.524235, 7663.456839), .Dim = c(24L, 7L),
> .Dimnames = list(NULL, c("DCred1", "DCred2", "DCred3", "DBoBC2", "DBoBC3",
> "CredL1", "BoBCL1")), .Tsp = c(2001.083, 2003, 12), class = c("mts", "ts"))
>
> NB: the NAs in the dataset emanated from lagging or differencing the series.
>
> David's suggestion:
>
> df <- data.frame(DCred1, DCred2, DCred3, DBoBC2, DBoBC3, CredL1, BoBCL1)
> Error in data.frame(DCred1, DCred2, DCred3, DBoBC2, DBoBC3, CredL1, BoBCL1) :
>   arguments imply differing number of rows: 23, 22, 21, 24
>
> So I modified it as follows:
>
> length(DCred3)  # finding the minimum length of the various series
> [1] 21
>
> # Then dataframe construction
> dframe <- data.frame(Dcre1 = DCred1[1:21], Dcre2 = DCred2[1:21], Dcre3 = DCred3[1:21],
>                      Dbobc2 = DBoBC2[1:21], Dbobc3 = DBoBC3[1:21],
>                      CredL = CredL1[1:21], BoBCL = BoBCL1[1:21])
>
> # Then estimated regression
> regCred <- lm(Dcre1 ~ Dcre2 + Dcre3 + Dbobc2 + Dbobc3 + CredL + BoBCL, data = dframe)
> summary(regCred)  # Worked well, as shown by the results below
>
> Call:
> lm(formula = Dcre1 ~ Dcre2 + Dcre3 + Dbobc2 + Dbobc3 + CredL + BoBCL, data = dframe)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -69.516 -27.695  -8.085  13.851 107.276
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) 159.32304  157.15209   1.014 0.327873
> Dcre2        -0.75527    0.17262  -4.375 0.000634 ***
> Dcre3        -0.21006    0.08656  -2.427 0.029329 *
> Dbobc2        0.05111    0.06565   0.779 0.449197
> Dbobc3        0.03106    0.03510   0.885 0.391108
> CredL        -0.10967    0.04933  -2.223 0.043177 *
> BoBCL         0.09756    0.03097   3.150 0.007087 **
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 52.3 on 14 degrees of freedom
> Multiple R-squared: 0.9331, Adjusted R-squared: 0.9044
> F-statistic: 32.55 on 6 and 14 DF, p-value: 1.911e-07
>
> This is good, but couldn't I code the process for my 15 variable model? Perhaps that is where the use of Dcr <- lapply(..., function(x) ...) comes in? AK, if you spare some minutes,
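One way the manual [1:21] truncation above might be generalised to many series is sketched below. It is an illustration only: the list called series and its short numeric vectors are hypothetical stand-ins for DCred1, DCred2 and so on, and cutting every series to the first n values mirrors the poster's own approach, which assumes the series are aligned at their start.

```r
# Hypothetical series of unequal length (stand-ins for DCred1, DCred2, ...).
series <- list(
  Dcre1 = c(68.1, -34.8, 90.4, 54.6, -172.3),
  Dcre2 = c(-102.9, 125.2, -35.8, -226.9),
  Dcre3 = c(228.1, -160.0, -191.1)
)

# Shortest length across all series, found programmatically.
n <- min(lengths(series))

# Truncate every series to the first n observations and bind into a data frame.
dframe <- as.data.frame(lapply(series, function(s) s[seq_len(n)]))

# Build the model formula from the column names instead of typing 15 terms.
fit_formula <- reformulate(setdiff(names(dframe), "Dcre1"), response = "Dcre1")

print(dframe)
print(fit_formula)
```

With real data the formula could then be passed to lm(fit_formula, data = dframe). If the shorter series lost observations at the beginning (through lagging or differencing), keeping the last n values, or relying on the NA-padded ts matrix, may be the better alignment.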
Re: [R] R sub query
On Jul 2, 2012, at 4:15 AM, Sarah Auburn wrote:

> Hello,
> I would like to substitute a substring of characters defined by a specific start and end sequence, i.e. in the example matrix below, I would like to substitute ".:X:" with "", where X varies in sequence...
>
> m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56",
>               ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow=2)

    sub("\\..+\\:", "", m)
         [,1]  [,2]    [,3]   [,4]
    [1,] "0,0" "193,1" "50,8" "114,0"
    [2,] "0,2" "0,56"  "0,13" "75,0"

You should also look at Holtman's, since he is better at this than I am, but I didn't really understand how his version worked. Mine is really in three parts. The first entry, '\\.', matches the leading dot, and it could have been '^\\.' to avoid any confusion with decimal points. The second entry is '.+', which is anything up to the third entry, '\\:', which ends up matching the last ':' since these are greedy expressions. You could also have done it with "\\.\\:.+\\:"

(Now that I look at his again, "^\\.:[^:]*:", I find that I can learn something from it, as often happens when I read his contributions. To my surprise the ':' character does not need to be escaped, but can be, and the interior of his expression, '[^:]', is a negated character class. It matches anything other than ':', and the '*' following it lets that anything be of any length. And then he didn't need to escape the trailing ':'.)

-- David.

> output required:
>
>      [,1] [,2]  [,3] [,4]
> [1,] 0,0  193,1 50,8 114,0
> [2,] 0,2  0,56  0,13 75,0
>
> Thank you for any help
> Sarah

-- David Winsemius, MD West Hartford, CT
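The two patterns discussed above can be compared side by side. This is a small sketch using the matrix from the question; the extra string ".:1:2:3" is a made-up input, added only to show where the greedy and the anchored pattern differ.

```r
m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56",
              ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow = 2)

# Greedy pattern: a dot, then as much as possible up to the LAST colon.
greedy <- sub("\\..+\\:", "", m)

# Anchored pattern: ".:" at the start, then non-colon characters up to the NEXT colon.
anchored <- sub("^\\.:[^:]*:", "", m)

print(greedy)
print(anchored)

# Hypothetical input with an extra colon, where the two patterns disagree.
extra <- ".:1:2:3"
print(sub("\\..+\\:", "", extra))      # greedy removes everything through the last colon
print(sub("^\\.:[^:]*:", "", extra))   # anchored removes only the ".:1:" prefix
```

On the data in the question both give the same result; the anchored form is the safer choice if the part after the prefix could itself contain a colon or a dot.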
Re: [R] Specify model with polynomial interaction terms up to degree n
On Jul 2, 2012, at 9:29 AM, YTP wrote:

> I would like to specify a model with all polynomial interaction terms between two variables, say, up to degree 6. For example, terms like a^6 + (a^5 * b^1) + (a^4 * b^2) + ... and so on. The documentation states "The ^ operator indicates crossing to the specified degree", so I would expect a model specified as y ~ (a+b)^6 to produce these terms. However, doing this only returns four coefficients: Intercept, a, b, and a:b. Does anyone know how to produce the desired result? Thanks in advance.

You might try:

    poly(a,6)*poly(b,6)

(untested ... and it looks somewhat dangerous to me.)

--
David Winsemius, MD
West Hartford, CT
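As background to the question above: inside a model formula, ^ means crossing rather than arithmetic powers, and crossing a variable with itself adds nothing new. A minimal sketch that inspects the expanded terms without fitting anything (the formula is the one from the question; `labels` and `labels_pow` are illustrative names):

```r
# In a formula, (a + b)^6 expands to main effects and interactions up to
# order 6, but a:a collapses to a, so two variables never go past a:b.
f <- y ~ (a + b)^6
labels <- attr(terms(f), "term.labels")
print(labels)

# Arithmetic powers have to be protected with I() (or built with poly()).
labels_pow <- attr(terms(y ~ a + I(a^2)), "term.labels")
print(labels_pow)
```

This accounts for the four coefficients reported: the intercept plus the three terms a, b and a:b.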
Re: [R] apply with multiple conditions
Paul,

Are you submitting the exact code that I included in my previous e-mail? When I submit that code, I get this ...

      chrom chromStart chromEnd         name cumsum bin
    1  chr1      10089    10309       ZBTB33  10089   1
    2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
    3  chr2      10133    10362     Pol2-4H8  30354   2
    4  chr2      10148    10418 MafF_(M8194)  40502   2
    5  chr2     210382   210578       ZBTB33  50884   3
    6  chr2     216132   216352         CTCF  67016   3

Jean

Paul Guilhamon paul.guilha...@gmail.com wrote on 07/02/2012 08:59:00 AM:

> Thanks for your reply Jean,
> I think your interpretation is correct, but when I run your code I end up with the dataframe below, and obviously the bins created there don't correspond to a chromStart change of 115341:
>
>   chrom chromStart chromEnd         name cumsum bin
> 1  chr1      10089    10309       ZBTB33  10089   1
> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
> 3  chr2      10133    10362     Pol2-4H8  30354   3
> 4  chr2      10148    10418 MafF_(M8194)  40502   4
> 5  chr2     210382   210578       ZBTB33  50884   5
> 6  chr2     216132   216352         CTCF  67016   6
>
> The first two rows should have the same bin number (same chrom, <115341 diff), then rows 3 & 4 should be in another bin (different chrom from rows 1 & 2, <115341 diff), and rows 5 & 6 in another one (same chrom but >115341 difference between row 4 and row 5). It seems the new.bin line of your code isn't quite doing what it should, but I can't pinpoint the error there...
> Paul
>
> On 2 July 2012 14:19, Jean V Adams jvad...@usgs.gov wrote:
>> Paul,
>> My interpretation is that you are trying to assign a new bin number to a row every time the variable chrom changes and every time the variable chromStart changes by 115341 or more. Is that right? If so, you don't need a loop at all. Check out the code below. I made a couple of changes to the all.tf7 example data frame so that it would have two changes in bin number, one based on the chrom variable and one based on the chromStart variable.
>> Jean
>>
>> all.tf7 <- data.frame(
>>   chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>>   chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>>   chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>>   name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)", "ZBTB33", "CTCF"),
>>   cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
>>   bin = rep(NA, 6)
>> )
>>
>> # assign a new bin every time chrom changes and every time chromStart changes by 115341 or more
>> L <- nrow(all.tf7)
>> prev.chrom <- c(NA, all.tf7$chrom[-L])
>> delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
>> new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom | delta.start >= 115341
>> all.tf7$bin <- cumsum(new.bin)
>> all.tf7
>>
>> pguilha paul.guilha...@gmail.com wrote on 07/02/2012 06:25:13 AM:
>>> Hello all,
>>> I have written a for loop to act on a dataframe with close to 3 million rows and 6 columns, and I would like to pass it to apply() to speed the process up (I let the loop run for 2 days before stopping it, and it had only gone through 200,000 rows), but I am really struggling to find a way to pass the arguments. Below are the loop and the head of the dataframe I am working on. Any hints would be much appreciated, thank you! (I have searched for this but could not find any other posts doing quite what I want.)
>>> Paul
>>>
>>> x <- as.numeric(all.tf7[1,2])
>>> for (i in 2:nrow(all.tf7)) {
>>>   if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
>>>     all.tf7[i,6] <- all.tf7[i-1,6]
>>>   else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
>>>     all.tf7[i,6] <- (all.tf7[i-1,6]+1)
>>>     x <- as.numeric(all.tf7[i,2])
>>>   } else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
>>>     all.tf7[i,6] <- (all.tf7[i-1,6]+1)
>>>     x <- as.numeric(all.tf7[i,2])
>>>   }
>>> }
>>> # the aim here is to attribute a bin number to each row so that I can then split the dataframe according to those bins.
>>>
>>> chrom chromStart chromEnd         name cumsum bin
>>>  chr1      10089    10309       ZBTB33  10089   1
>>>  chr1      10132    10536  TAF7_(SQ-8)  20221   1
>>>  chr1      10133    10362     Pol2-4H8  30354   1
>>>  chr1      10148    10418 MafF_(M8194)  40502   1
>>>  chr1      10382    10578       ZBTB33  50884   1
>>>  chr1      16132    16352         CTCF  67016   1
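One possible reason the same code gives bins 1,1,2,2,3,3 in one session and 1 to 6 in another is the storage type of chrom. This is an assumption; the thread does not confirm it. If chrom is a factor, c(NA, chrom[-L]) turns it into integer codes, and comparing a factor with those codes is TRUE on every row. The sketch below reproduces both outcomes on a cut-down copy of the example data; `bins_from()` is a hypothetical helper, not code from the thread.

```r
# Cut-down copy of the example data, with chrom stored as a factor
all.tf7 <- data.frame(
  chrom      = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  stringsAsFactors = TRUE
)

# Hypothetical helper: new bin when chrom changes or chromStart jumps
# by 115341 or more relative to the previous row
bins_from <- function(chrom, start) {
  L <- length(start)
  prev.chrom  <- c(NA, chrom[-L])
  delta.start <- c(NA, diff(start))
  new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= 115341
  cumsum(new.bin)
}

# Factor column: levels are compared with integer codes, so every row
# looks like a chrom change
bin_factor <- bins_from(all.tf7$chrom, all.tf7$chromStart)

# Character column: the comparison behaves as intended
bin_char <- bins_from(as.character(all.tf7$chrom), all.tf7$chromStart)

print(bin_factor)
print(bin_char)
```

Note that this vectorised rule compares each row with the previous row, whereas the original loop compares with the first chromStart of the current bin, so the two can differ on long runs of closely spaced rows.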
Re: [R] interpolation to new points between geo coordinates
Jan,

There are a lot of packages that can help you. The best one depends on your needs (with or without prediction uncertainty, format of results, different options) and the size of your problem. CRAN has a Spatial Task View, http://cran.r-project.org/web/views/Spatial.html, with a short description of most packages dealing with spatial data.

I think the functions you mentioned should be able to solve your problem, but I don't have experience with either of them. It is impossible to know what you are doing wrong, as you did not post any error messages.

For increasing the resolution of your data, you can also try disaggregate or resample in the raster package. gstat, with automap or intamap as simpler interfaces, can also be used for geostatistical interpolation to the higher-resolution grid, and also gives you a prediction uncertainty. In general you should be careful when interpolating lat-lon data; consider using spTransform to get projected coordinates if you use any of the geostatistical methods.

For spatial questions you will generally get a quicker response from the r-sig-geo mailing list.

Best wishes,
Jon

On 02-Jul-12 10:47, Jan Näs wrote:

> Hi
> I have a data set with geo coordinates and values for each coordinate. I want to interpolate the values to new positions on a finer grid, also geo coordinates. I have looked at the fields package (interp.surface) and the akima package (interp) but can't quite figure out what I am doing wrong, or whether these functions suit my needs.
>
> I have the two data sets:
>
> grid_1:
>    lat   lon value
> 1 56.5 11.1   53
> 2 56.6 11.1   53.1
> 3 56.7 11.12  52.1
> 4 56.5 11.2   52.9
> ...etc.
>
> and a new grid, grid_2:
>     lat    lon
> 1 55.52 11.11
> 2 55.53 11.115
> 3 55.54 11.12
> ...etc.
>
> And I want interpolated values for grid_2. Any ideas?
> /Jan

--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Land Resource Management Unit
Via Fermi 2749, TP 440, I-21027 Ispra (VA), ITALY
jon.sko...@jrc.ec.europa.eu
Tel: +39 0332 789206

Disclaimer: Views expressed in this email are those of the individual and do not necessarily represent official views of the European Commission.
Re: [R] Specify model with polynomial interaction terms up to degree n
On Jul 2, 2012, at 10:51 AM, David Winsemius wrote:

> On Jul 2, 2012, at 9:29 AM, YTP wrote:
>> I would like to specify a model with all polynomial interaction terms between two variables, say, up to degree 6. For example, terms like a^6 + (a^5 * b^1) + (a^4 * b^2) + ... and so on. The documentation states "The ^ operator indicates crossing to the specified degree", so I would expect a model specified as y ~ (a+b)^6 to produce these terms. However, doing this only returns four coefficients: Intercept, a, b, and a:b. Does anyone know how to produce the desired result? Thanks in advance.
>
> You might try:
>
>     poly(a,6)*poly(b,6)
>
> (untested ... and it looks somewhat dangerous to me.)

Well, now it's tested and succeeds, at least numerically. Also tested ( poly(a,6) + poly(b,6) )^2, with identical results. Whether this is wise practice remains in doubt:

    dfrm <- data.frame(out=rnorm(100), a=rnorm(100), b=rnorm(100) )
    anova(lm( out ~ (poly(a,6) + poly(b,6) )^2, data=dfrm) )
    #---
    Analysis of Variance Table

    Response: out
                          Df Sum Sq Mean Sq F value  Pr(>F)
    poly(a, 6)             6 12.409 2.06810  3.0754 0.01202 *
    poly(b, 6)             6  5.321 0.88675  1.3187 0.26596
    poly(a, 6):poly(b, 6) 36 41.091 1.14142  1.6974 0.04069 *
    Residuals             51 34.295 0.67246
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

--
David Winsemius, MD
West Hartford, CT
[R] using na.locf from package zoo to fill NA gaps
Hi everybody,

I have a small question about the function na.locf from the package zoo. I saw in the help that this function is able to fill NA gaps with the last value before the gap (or with the next value after it). But is it possible to fill my NA gaps according to the last AND the next value at the same time? I want R to fill a gap using the na.locf method only if the last value before the gap and the next value after the gap are identical.

Here's an example. Imagine this small DF:

    df <- data.frame(x1=c(1:3,NA,NA,NA,6:9))

In this case, the last value before the NAs (3) and the next value after them (6) are different, so I don't want the gap to be filled. But if I have a DF like this:

    df2 <- data.frame(x2=c(1:3,NA,NA,NA,3:6))

the last and next values (3) are identical, so in this case I want the gap to be filled with 3, as the na.locf function would do:

    na.locf(df2)

As you will have understood, I want to do this only if the last and next values are identical. If they're not, I want to keep my NA gap. Do you have any idea how I can do this (maybe something to add to na.locf, or maybe another, better function)?

Thank you very much!
Re: [R] using na.locf from package zoo to fill NA gaps
On Mon, Jul 2, 2012 at 11:17 AM, jeff6868 geoffrey_kl...@etu.u-bourgogne.fr wrote:

> Hi everybody,
> I have a small question about the function na.locf from the package zoo. I saw in the help that this function is able to fill NA gaps with the last value before the gap (or with the next value after it). But is it possible to fill my NA gaps according to the last AND the next value at the same time? I want R to fill a gap using the na.locf method only if the last value before the gap and the next value after the gap are identical.
>
> Here's an example. Imagine this small DF:
>
>     df <- data.frame(x1=c(1:3,NA,NA,NA,6:9))
>
> In this case, the last value before the NAs (3) and the next value after them (6) are different, so I don't want the gap to be filled. But if I have a DF like this:
>
>     df2 <- data.frame(x2=c(1:3,NA,NA,NA,3:6))
>
> the last and next values (3) are identical, so in this case I want the gap to be filled with 3, as the na.locf function would do:
>
>     na.locf(df2)
>
> I want to do this only if the last and next values are identical. If they're not, I want to keep my NA gap. Do you have any idea how I can do this (maybe something to add to na.locf, or maybe another, better function)?

Try doing it forwards and backwards, and only replacing if the two results are the same:

    library(zoo)
    na.locf.ifeq <- function(x) {
      ix <- na.locf(x) == na.locf(x, fromLast = TRUE) & is.na(x)
      replace(x, ix, na.locf(x)[ix])
    }

    # test 1
    x1 <- c(1, 2, 3, NA, NA, NA, 6, 7, 8, 9)
    na.locf.ifeq(x1)

    # test 2
    x2 <- c(1, 2, 3, NA, NA, NA, 3, 4, 5, 6)
    na.locf.ifeq(x2)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
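For readers without zoo installed, the same rule can be sketched in base R by walking over the runs of NA. This is not the code from the reply above; `fill_if_equal()` is a hypothetical name, and the sketch assumes a plain atomic vector.

```r
# Fill an NA gap only when the values on both sides of it are equal.
# Gaps at the very start or end of x have only one neighbour and are left.
fill_if_equal <- function(x) {
  runs   <- rle(is.na(x))
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  for (k in which(runs$values)) {      # loop over the NA runs only
    lo <- starts[k] - 1                # last position before the gap
    hi <- ends[k] + 1                  # first position after the gap
    if (lo >= 1 && hi <= length(x) && x[lo] == x[hi]) {
      x[starts[k]:ends[k]] <- x[lo]
    }
  }
  x
}

x1 <- c(1, 2, 3, NA, NA, NA, 6, 7, 8, 9)   # neighbours 3 and 6: gap kept
x2 <- c(1, 2, 3, NA, NA, NA, 3, 4, 5, 6)   # neighbours 3 and 3: gap filled
print(fill_if_equal(x1))
print(fill_if_equal(x2))
```

The explicit loop runs once per gap rather than once per element, so it should stay cheap when gaps are few.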
Re: [R] Heat Maps
Something like this?

    image(x, y, outer(x, y, u), breaks=c(0, a), col=heat.colors(3))
    contour(x, y, outer(x, y, u), levels=a, col="blue", add=TRUE)

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Akhil dua
Sent: Monday, July 02, 2012 2:26 AM
To: Joseph Clark
Cc: r-help@r-project.org
Subject: Re: [R] Heat Maps

Thanks Joseph, but I am not able to get heat maps with this code. Can you please give me the full code to generate a heat map on the same graph where I have drawn the contour lines?
[R] Fit circle with R
Dear Researchers,

I wrote two functions to fit a circle to noisy data.

1- fitCircle() is derived from the MATLAB code of Izhak Bucher, from the link http://www.mathworks.com/matlabcentral/fileexchange/5557-circle-fit/content/circfit.m

2- CircleFitByPratt() is derived from the MATLAB code of Nikolai Chernov, from the link http://www.mathworks.com/matlabcentral/fileexchange/22643-circle-fit-pratt-method/content/CircleFitByPratt.m, based on: V. Pratt, "Direct least-squares fitting of algebraic surfaces", Computer Graphics, Vol. 21, pages 145-152 (1987)

I am looking for new methods to compare and to improve my analysis, because the error increases as the number of points used in the functions decreases.

Thanks for all suggestions
Gianni

Here are the functions, with an example:

    # fitCircle, returns:
    # xf,yf = centre of the fitted circle
    # Rf = radius of the fitted circle
    # Cf = circumference of the fitted circle
    # Af = area of the fitted circle
    fitCircle <- function(x,y){
      a = qr.solve(cbind(x,y,rep(1,length(x))),cbind(-(x^2+y^2)))
      xf = -.5*a[1]
      yf = -.5*a[2]
      Rf = sqrt((a[1]^2+a[2]^2)/4-a[3])
      Cf = 2*pi*Rf
      Af = pi*(Rf^2)
      m <- cbind(xf,yf,Rf,Cf,Af)
      return(m)
    }

    # CircleFitByPratt, returns:
    # [,1] and [,2] = centre of the fitted circle
    # [,3] = radius of the fitted circle
    CircleFitByPratt <- function(x,y){
      n <- length(x)
      centroid <- cbind(mean(x),mean(y))
      Mxx=0; Myy=0; Mxy=0; Mxz=0; Myz=0; Mzz=0;
      # moments of the centred data
      for(i in 1:n){
        Xi <- x[[i]] - centroid[1]
        Yi <- y[[i]] - centroid[2]
        Zi <- (Xi*Xi) + (Yi*Yi)
        Mxy = Mxy + Xi*Yi;
        Mxx = Mxx + Xi*Xi;
        Myy = Myy + Yi*Yi;
        Mxz = Mxz + Xi*Zi;
        Myz = Myz + Yi*Zi;
        Mzz = Mzz + Zi*Zi;
      }
      Mxx = Mxx/n
      Myy = Myy/n
      Mxy = Mxy/n
      Mxz = Mxz/n
      Myz = Myz/n
      Mzz = Mzz/n
      # computing the coefficients of the characteristic polynomial
      Mz = Mxx + Myy;
      Cov_xy = Mxx*Myy - Mxy*Mxy;
      Mxz2 = Mxz*Mxz;
      Myz2 = Myz*Myz;
      A2 = 4*Cov_xy - 3*Mz*Mz - Mzz;
      A1 = Mzz*Mz + 4*Cov_xy*Mz - Mxz2 - Myz2 - Mz*Mz*Mz;
      A0 = Mxz2*Myy + Myz2*Mxx - Mzz*Cov_xy - 2*Mxz*Myz*Mxy + Mz*Mz*Cov_xy;
      A22 = A2 + A2;
      # Newton's method starting at x=0
      epsilon=1e-12; ynew=1e+20; IterMax=20; xnew = 0;
      iter=1:IterMax
      for (i in 1:IterMax){
        yold = ynew;
        ynew = A0 + xnew*(A1 + xnew*(A2 + 4.*xnew*xnew));
        if (abs(ynew) > abs(yold)){
          print('Newton-Pratt goes wrong direction: |ynew| > |yold|')
          xnew = 0;
          break
        }
        Dy = A1 + xnew*(A22 + 16*xnew*xnew);
        xold = xnew;
        xnew = xold - ynew/Dy;
        if (abs((xnew-xold)/xnew) < epsilon) {break}
        if(iter[[i]] >= IterMax){
          print('Newton-Pratt will not converge');
          xnew = 0;
        }
        if(xnew < 0.){
          # print() takes a single object, so build the message with paste()
          print(paste('Newton-Pratt negative root: x=',xnew));
        }
      }
      # computing the circle parameters
      DET = xnew*xnew - xnew*Mz + Cov_xy;
      Center = cbind(Mxz*(Myy-xnew)-Myz*Mxy , Myz*(Mxx-xnew)-Mxz*Mxy)/DET/2;
      # the MATLAB original uses Center*Center', i.e. both squared offsets;
      # the posted R version used only Center[2]*Center[2]
      Par = cbind(Center+centroid , sqrt(sum(Center^2)+Mz+2*xnew));
      return(Par)
    }

    # EXAMPLE
    library(plotrix)
    # Create a circle of radius=10, centre=5,5
    R = 10; x_c = 5; y_c = 5;
    thetas = seq(0,pi,(pi/64));
    xs = x_c + R*cos(thetas)
    ys = y_c + R*sin(thetas)
    # Now add some random noise
    mult = 0.5;
    xs = xs+mult*rnorm(length(xs));
    ys = ys+mult*rnorm(length(ys));
    plot(xs,ys,pch=19,cex=0.5,col="red",xlim=c(-10,20),ylim=c(-10,20),asp=1)
    # real circle
    draw.circle(x_c,y_c,radius=10,border="black")
    points(x_c,y_c,pch=4,col="black")
    CPrat <- CircleFitByPratt(xs,ys)
    draw.circle(CPrat[1],CPrat[2],radius=CPrat[3],border="blue")
    points(CPrat[1],CPrat[2],pch=4,col="blue")
    MyC <- fitCircle(xs,ys)
    draw.circle(MyC[1],MyC[2],radius=MyC[3],border="green")
    points(MyC[1],MyC[2],pch=4,col="green")
    # Select fewer points
    points(xs[20:49],ys[20:49])
    MyC1 <- fitCircle(xs[20:49],ys[20:49])
    draw.circle(MyC1[1],MyC1[2],radius=MyC1[3],border="blue",lty=2,lwd=2)
    CPrat1 <- CircleFitByPratt(xs[20:49],ys[20:49])
    draw.circle(CPrat1[1],CPrat1[2],radius=CPrat1[3],border="green",lty=2,lwd=2)
    points(CPrat[1],CPrat[2],pch=4,col="red")
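The algebraic fit used in fitCircle() rests on rewriting the circle as x^2 + y^2 + a1*x + a2*y + a3 = 0, which is linear in (a1, a2, a3); the centre is then (-a1/2, -a2/2) and the radius is sqrt(a1^2/4 + a2^2/4 - a3). A minimal self-contained sketch that checks this on noise-free points and then prints fits for a noisy half circle and a shorter noisy arc. The name `fit_circle_ls()`, the seed, and the noise level are assumptions made for illustration.

```r
# Least-squares solution of  a1*x + a2*y + a3 = -(x^2 + y^2)
fit_circle_ls <- function(x, y) {
  a  <- unname(qr.solve(cbind(x, y, 1), -(x^2 + y^2)))
  xc <- -a[1] / 2
  yc <- -a[2] / 2
  c(xc = xc, yc = yc, r = sqrt(xc^2 + yc^2 - a[3]))
}

# Exact points on a half circle of radius 10 centred at (5, 5)
theta <- seq(0, pi, length.out = 65)
x <- 5 + 10 * cos(theta)
y <- 5 + 10 * sin(theta)
exact <- fit_circle_ls(x, y)
print(exact)

# Same points with noise, full half circle versus a shorter arc
set.seed(1)
xn <- x + rnorm(length(x), sd = 0.5)
yn <- y + rnorm(length(y), sd = 0.5)
print(fit_circle_ls(xn, yn))
print(fit_circle_ls(xn[20:49], yn[20:49]))
```

Comparing the last two printed fits with the true values (5, 5, 10) over several seeds is one simple way to quantify how the estimate degrades as the arc gets shorter.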
[R] Undocumented behavior around daylight savings time?
Apologies for the intrusion. I am a lurker on the list.

I have been working to convert a digitized signal from a matlab file into R for analysis and other applications. R.matlab is working fine, and it is easy to convert the matlab date-time number (days since year 0) into R date-time numbers (seconds since 1970-01-01). Unfortunately, when I cast the R date-time number into POSIXct format it seems to adjust silently by one hour to reflect daylight savings time, and I have been unable to suppress that behavior. (The problem is that the matlab date is already in MDT, and I don't want to have to write my own code to suppress the added hour only when local DST rules apply.)

To use the example of the R date-time number 1340717324, this is the behavior I observe:

    as.POSIXct(1340717324,origin='1970-01-01')
    [1] "2012-06-26 14:28:44 MDT"
    as.POSIXct(1340717324,origin='1970-01-01',tz='')
    [1] "2012-06-26 14:28:44 MDT"
    as.POSIXct(1340717324,origin='1970-01-01',tz='America/Denver')
    [1] "2012-06-26 14:28:44 MDT"
    as.POSIXct(1340717324,origin='1970-01-01',tz='MST')
    [1] "2012-06-26 13:28:44 MST"
    as.POSIXct(1340717324,origin='1970-01-01',tz='UTC')
    [1] "2012-06-26 13:28:44 UTC"

I was ultimately able to solve the problem by casting into and out of a character string, but that seems risky and error prone:

    rdates='%Y-%m-%d %H:%M:%S'
    as.POSIXct(strptime(as.character(as.POSIXct(1340717324,origin='1970-01-01',tz='UTC')),format=rdates))
    [1] "2012-06-26 13:28:44 MDT"

I have read the various help entries and even investigated the lubridate package, but none indicate why exactly the extra hour is being added or how to suppress it. (Note that I tried as.POSIXlt with various settings of isdst, none of which worked.)

I'm using R version 2.13.1 (2011-07-08) on x86_64-apple-darwin9.8.0.

--
Samuel Brown
Re: [R] using na.locf from package zoo to fill NA gaps
Seems to work very well! Thank you very much Gabor!
[R] save conditions in a list
Hi how would you save conditions like a = day 100; b = val 50; c = year == 2012 in a list? I like to have variables like day, val, year and a list of conditions list(a,b,c). Then I want to check if a b c is true or if a | b | c is true or similar things. Greetings Christof __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
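One way to keep conditions in a list is to store them unevaluated with quote() and evaluate them later against whatever values are current. The sketch below is only one option. The comparison operators in the question were lost in the archive, so `>` and `<` here are assumptions, as are the example values in `vars`.

```r
# Conditions stored as unevaluated expressions (operators are assumed)
conds <- list(
  a = quote(day > 100),
  b = quote(val < 50),
  c = quote(year == 2012)
)

# Current values of the variables (made-up example values)
vars <- list(day = 120, val = 30, year = 2011)

# Evaluate each stored condition against the current values
res <- vapply(conds, function(e) eval(e, vars), logical(1))
print(res)

all_true <- all(res)   # same as a & b & c for single values
any_true <- any(res)   # same as a | b | c for single values
print(c(all = all_true, any = any_true))
```

Because the expressions are evaluated on demand, changing `vars` and re-running the vapply() line re-checks the same list of conditions. For vector-valued variables, Reduce(`&`, ...) over the evaluated list would give element-wise results instead.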
Re: [R] Specify model with polynomial interaction terms up to degree n
Inline below.

-- Bert

On Mon, Jul 2, 2012 at 8:04 AM, David Winsemius dwinsem...@comcast.net wrote:

> On Jul 2, 2012, at 10:51 AM, David Winsemius wrote:
>> On Jul 2, 2012, at 9:29 AM, YTP wrote:
>>> I would like to specify a model with all polynomial interaction terms between two variables, say, up to degree 6. For example, terms like a^6 + (a^5 * b^1) + (a^4 * b^2) + ... and so on. The documentation states "The ^ operator indicates crossing to the specified degree", so I would expect a model specified as y ~ (a+b)^6 to produce these terms. However, doing this only returns four coefficients: Intercept, a, b, and a:b. Does anyone know how to produce the desired result? Thanks in advance.
>>
>> You might try:
>>
>>     poly(a,6)*poly(b,6)
>>
>> (untested ... and it looks somewhat dangerous to me.)
>
> Well, now it's tested and succeeds, at least numerically. Also tested ( poly(a,6) + poly(b,6) )^2, with identical results. Whether this is wise practice remains in doubt:

No it doesn't. It isn't.

-- Bert

> dfrm <- data.frame(out=rnorm(100), a=rnorm(100), b=rnorm(100) )
> anova(lm( out ~ (poly(a,6) + poly(b,6) )^2, data=dfrm) )
> #---
> Analysis of Variance Table
>
> Response: out
>                       Df Sum Sq Mean Sq F value  Pr(>F)
> poly(a, 6)             6 12.409 2.06810  3.0754 0.01202 *
> poly(b, 6)             6  5.321 0.88675  1.3187 0.26596
> poly(a, 6):poly(b, 6) 36 41.091 1.14142  1.6974 0.04069 *
> Residuals             51 34.295 0.67246
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> --
> David Winsemius, MD
> West Hartford, CT

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Specify model with polynomial interaction terms up to degree n
Hello,

Another way is to cbind the vectors 'a' and 'b', but this needs the argument 'raw' set to TRUE.

    poly(cbind(a, b), 6, raw=TRUE)

To the OP: is this time series related? With 6 being a lag or test (e.g., Tsay, 1986) order? I'm asking because package nlts has a function for this test up to order 5, and it uses poly().

Hope this helps,
Rui Barradas

On 02-07-2012 16:04, David Winsemius wrote:

> On Jul 2, 2012, at 10:51 AM, David Winsemius wrote:
>> On Jul 2, 2012, at 9:29 AM, YTP wrote:
>>> I would like to specify a model with all polynomial interaction terms between two variables, say, up to degree 6. For example, terms like a^6 + (a^5 * b^1) + (a^4 * b^2) + ... and so on. The documentation states "The ^ operator indicates crossing to the specified degree", so I would expect a model specified as y ~ (a+b)^6 to produce these terms. However, doing this only returns four coefficients: Intercept, a, b, and a:b. Does anyone know how to produce the desired result? Thanks in advance.
>>
>> You might try:
>>
>>     poly(a,6)*poly(b,6)
>>
>> (untested ... and it looks somewhat dangerous to me.)
>
> Well, now it's tested and succeeds, at least numerically. Also tested ( poly(a,6) + poly(b,6) )^2, with identical results. Whether this is wise practice remains in doubt:
>
> dfrm <- data.frame(out=rnorm(100), a=rnorm(100), b=rnorm(100) )
> anova(lm( out ~ (poly(a,6) + poly(b,6) )^2, data=dfrm) )
> #---
> Analysis of Variance Table
>
> Response: out
>                       Df Sum Sq Mean Sq F value  Pr(>F)
> poly(a, 6)             6 12.409 2.06810  3.0754 0.01202 *
> poly(b, 6)             6  5.321 0.88675  1.3187 0.26596
> poly(a, 6):poly(b, 6) 36 41.091 1.14142  1.6974 0.04069 *
> Residuals             51 34.295 0.67246
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
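The two suggestions in this thread do not build the same set of terms, which a quick column count makes visible. In this sketch the data are simulated and the degree is passed by name (`degree = 6`), which is a choice made here rather than the exact call from the reply above.

```r
set.seed(42)
a <- rnorm(100)
b <- rnorm(100)

# Joint polynomial in a and b: all monomials a^i * b^j with 1 <= i + j <= 6
P <- poly(cbind(a, b), degree = 6, raw = TRUE)
print(ncol(P))

# Product of two separate degree-6 polynomials: intercept, 6 + 6 main
# columns, and 36 products, reaching terms as high as a^6 * b^6
X <- model.matrix(~ poly(a, 6) * poly(b, 6))
print(ncol(X))
```

The joint form has 2 + 3 + ... + 7 = 27 columns and matches the "total degree up to 6" reading of the question, while the product form has 1 + 6 + 6 + 36 = 49 columns and includes terms of total degree up to 12. Either way, a degree-6 surface in two variables is a lot of parameters for modest data, as the replies above caution.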
Re: [R] Undocumented behavior around daylight savings time?
Set your default local timezone (at least while converting to POSIXt types):

    Sys.setenv(TZ="Etc/GMT+7")

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Samuel Brown samuelbr...@gmail.com wrote:

> Apologies for the intrusion. I am a lurker on the list.
>
> I have been working to convert a digitized signal from a matlab file into R for analysis and other applications. R.matlab is working fine, and it is easy to convert the matlab date-time number (days since year 0) into R date-time numbers (seconds since 1970-01-01). Unfortunately, when I cast the R date-time number into POSIXct format it seems to adjust silently by one hour to reflect daylight savings time, and I have been unable to suppress that behavior. (The problem is that the matlab date is already in MDT, and I don't want to have to write my own code to suppress the added hour only when local DST rules apply.)
>
> To use the example of the R date-time number 1340717324, this is the behavior I observe:
>
>     as.POSIXct(1340717324,origin='1970-01-01')
>     [1] "2012-06-26 14:28:44 MDT"
>     as.POSIXct(1340717324,origin='1970-01-01',tz='')
>     [1] "2012-06-26 14:28:44 MDT"
>     as.POSIXct(1340717324,origin='1970-01-01',tz='America/Denver')
>     [1] "2012-06-26 14:28:44 MDT"
>     as.POSIXct(1340717324,origin='1970-01-01',tz='MST')
>     [1] "2012-06-26 13:28:44 MST"
>     as.POSIXct(1340717324,origin='1970-01-01',tz='UTC')
>     [1] "2012-06-26 13:28:44 UTC"
>
> I was ultimately able to solve the problem by casting into and out of a character string, but that seems risky and error prone:
>
>     rdates='%Y-%m-%d %H:%M:%S'
>     as.POSIXct(strptime(as.character(as.POSIXct(1340717324,origin='1970-01-01',tz='UTC')),format=rdates))
>     [1] "2012-06-26 13:28:44 MDT"
>
> I have read the various help entries and even investigated the lubridate package, but none indicate why exactly the extra hour is being added or how to suppress it. (Note that I tried as.POSIXlt with various settings of isdst, none of which worked.)
>
> I'm using R version 2.13.1 (2011-07-08) on x86_64-apple-darwin9.8.0.
>
> --
> Samuel Brown
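The output quoted above is consistent with the origin being read as midnight in the session time zone (MST in January) and the result then being printed under summer rules (MDT). That reading is an interpretation and is not confirmed in the thread. An alternative to changing TZ for the whole session is to split the conversion into two explicit steps: read the number as a UTC clock reading, then rebuild that same clock reading in the target zone. A minimal sketch, assuming the "America/Denver" zone is available on the system; `secs`, `lt` and `local` are illustrative names.

```r
secs <- 1340717324   # example value from the post

# Step 1: treat the number as seconds since 1970-01-01 UTC and break it
# into calendar fields, with no local-time rules involved
lt <- as.POSIXlt(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"),
                 tz = "UTC")

# Step 2: rebuild the same wall-clock reading in the target time zone
local <- ISOdatetime(lt$year + 1900, lt$mon + 1, lt$mday,
                     lt$hour, lt$min, lt$sec, tz = "America/Denver")

print(format(local, "%Y-%m-%d %H:%M:%S"))
print(as.numeric(local) - secs)   # offset in seconds between the two instants
```

Passing `tz` explicitly in both steps keeps the result independent of the session setting, which is the part of the original approach that seemed fragile.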
Re: [R] Insert row in specific location between data frames
I have already followed your steps, but it still does not work. When I merge groupA and groupB, this error message is shown:

    Error in rbind(deparse.level, ...) : replacement has length zero
[R] Specifying Transfer Function in Time series Intervention model
Hi Team,

I am running ARIMAX with the TSA package. My code is:

    fit2 <- arimax(yseries, order = c(1,0,1), xtransf = data.frame(X1var), transfer = list(c(1,0)))

My questions are:

1st Q -- If I need to take the difference of X1var, what should I do? What I am doing is submitting the R code

    X1vard <- diff(X1var)

and then including X1vard in xtransf. At the same time, if I need to take the difference of yseries,

    yseriesd <- diff(yseries)

then I have to keep the ARIMAX order = c(1,0,1), because order = c(1,1,1) gives an error. I hope this is the correct procedure.

2nd Q -- In transfer = list(c(1,0)), 1 is the autoregressive operator order and 0 is the moving average operator order. I am not able to change the value of the MA operator. I receive the error message:

    Error in optim(init[mask], armaCSS, method = "BFGS", hessian = FALSE, : initial value in 'vmmin' is not finite

I am using R version 2.15.1.

3rd Q -- If I need to use lagged values of X1var, how do I incorporate them in the model?

Warm regards,
Subhadip
Re: [R] Adjusting length of series
Hello,

The class of your data is not data.frame. Suppose I call your data ydat1:

    str(ydat1)
     mts [1:24, 1:7] 68.1 -34.8 90.4 54.6 -172.3 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:7] "DCred1" "DCred2" "DCred3" "DBoBC2" ...
     - attr(*, "tsp")= num [1:3] 2001 2003 12
     - attr(*, "class")= chr [1:2] "mts" "ts"

    ydat2 <- data.frame(ydat1)
    str(ydat2)
    'data.frame': 24 obs. of 7 variables:
     $ DCred1: num 68.1 -34.8 90.4 54.6 -172.3 ...
     $ DCred2: num NA -102.9 125.2 -35.8 -226.9 ...
     $ DCred3: num NA NA 228 -161 -191 ...
     $ DBoBC2: num NA -164.5 17.1 96 680.2 ...
     $ DBoBC3: num NA NA 181.5 78.9 584.3 ...
     $ CredL1: num 4937 5005 4970 5061 5115 ...
     $ BoBCL1: num 4188 4296 4240 4201 4258 ...

    # Since you only wanted to run lm on these columns, I guess it doesn't really matter whether you have month and year in the dataset.

    # With NAs
    regCred <- lm(DCred1~DCred2+DCred3+DBoBC2+DBoBC3+CredL1+BoBCL1, data=ydat2)
    summary(regCred)

    Call:
    lm(formula = DCred1 ~ DCred2 + DCred3 + DBoBC2 + DBoBC3 + CredL1 + BoBCL1, data = ydat2)

    Residuals:
            Min          1Q      Median          3Q         Max
    -124.988463  -33.133975    7.971083   23.607953   76.813601

    Coefficients:
                     Estimate    Std. Error  t value   Pr(>|t|)
    (Intercept) -538.61375718  205.91179535 -2.61575   0.020344 *
    DCred2         0.96401908    0.15623660  6.17025 2.4337e-05 ***
    DCred3        -0.25720355    0.08983607 -2.86303   0.012524 *
    DBoBC2        -0.11222347    0.07828182 -1.43358   0.173646
    DBoBC3         0.04564621    0.03825169  1.19331   0.252578
    CredL1         0.18499925    0.06565456  2.81777   0.013693 *
    BoBCL1        -0.07682710    0.03406916 -2.25503   0.040666 *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 54.44479 on 14 degrees of freedom
      (3 observations deleted due to missingness)
    Multiple R-squared: 0.9324472, Adjusted R-squared: 0.903496
    F-statistic: 32.20757 on 6 and 14 DF, p-value: 2.046024e-07

    # Without NAs
    ydat3 <- na.omit(ydat2)
    regCred <- lm(DCred1~DCred2+DCred3+DBoBC2+DBoBC3+CredL1+BoBCL1, data=ydat3)
    summary(regCred)

    Call:
    lm(formula = DCred1 ~ DCred2 + DCred3 + DBoBC2 + DBoBC3 + CredL1 + BoBCL1, data = ydat3)

    Residuals:
            Min          1Q      Median          3Q         Max
    -124.988463  -33.133975    7.971083   23.607953   76.813601

    Coefficients:
                     Estimate    Std. Error  t value   Pr(>|t|)
    (Intercept) -538.61375718  205.91179535 -2.61575   0.020344 *
    DCred2         0.96401908    0.15623660  6.17025 2.4337e-05 ***
    DCred3        -0.25720355    0.08983607 -2.86303   0.012524 *
    DBoBC2        -0.11222347    0.07828182 -1.43358   0.173646
    DBoBC3         0.04564621    0.03825169  1.19331   0.252578
    CredL1         0.18499925    0.06565456  2.81777   0.013693 *
    BoBCL1        -0.07682710    0.03406916 -2.25503   0.040666 *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 54.44479 on 14 degrees of freedom
    Multiple R-squared: 0.9324472, Adjusted R-squared: 0.903496
    F-statistic: 32.20757 on 6 and 14 DF, p-value: 2.046024e-

    # Same result

I am not sure what you meant by "This is good, but couldn't I code the process for my 15 variable model?"

A.K.

From: Lekgatlhamang, lexi Setlhare lexisetlh...@yahoo.com
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Monday, July 2, 2012 5:13 AM
Subject: Re: [R] Adjusting length of series

Hi David and AK,

I have been trying to implement your suggestions since yesterday, but I encountered some challenges. As for David's suggestion, I could only implement it after some modifications. Using an abridged version of my data, I dput my dataset and then show my steps below.
dput(ydata) structure(c(68.10004, -34.80002, 90.39996, 54.60004, -172.3, 51.80002, 175, 79.80002, -35.70007, 130.5, 116.8, -67.5, 164.5, 514.8, -326.1, 98.40005, 160.2, 53.19998, 283.6, -111.6, 127.8, -17.30002, 286.3, NA, NA, -102.9001, 125.2, -35.79993, -226.9001, 224.1, 123.2, -95.19998, -115.5001, 166.2001, -13.69998, -184.3, 232, 350.3, -840.9001, 424.5001, 61.79993, -107, 230.4001, -395.2001, 239.4001, -145.1, 303.6, NA, NA, NA, 228.1, -160., -191.1001, 451.0001, -100.9001, -218.4, -20.30011, 281.7002, -179.9001, -170.6, 416.3, 118.3, -1191.2, 1265.4, -362.7002, -168.7999, 337.4001, -625.6001, 634.6001, -384.5001, 448.7001, NA, NA, -164.45784099, 17.079353995, 95.976788009, 680.23816699, -491.34869099, -274.694009, -256.332907, 469.62296, -146.431891, -41.077201995, -106.970104, 757.68826399, -1689.214533, 2320.098952, -1446.97942,
Re: [R] turning R expressions into functions?
Dear Thomas,

Many thanks for your answer.

On Sat, Jun 30, 2012 at 10:22:52AM +0900, Thomas Lumley wrote:
>> 1) good: If I run the following using Rscript
>>
>>   test1 <- function(e1) {
>>     e1 <- substitute(e1)
>>     FuncIt(100, e1)
>>   }
>>   f <- test1(rnorm(1))
>>   print(f)
>>
>> then I get the following output:
>>
>>   function () { for (funcit.i in 1:100) { rnorm(1) } }
>>   <environment: 0x102260c28>
>>
>> This is what I want. But why do I need the extra substitute in test1? I only found by experiment that this is needed.
>
> You don't. You need an extra quote() in the argument.
> [...]
> You can get around this using substitute(), which extracts the unevaluated code from the formal argument, but it's probably a bad idea, since the user of the function should expect all the arguments to be evaluated.

I want my final function to work like system.time, i.e. the user should not have to type quote() all the time when calling the top-level function of my measuring mechanism. Is there a way to do the quoting inside the top-level function call?

Many thanks, Jochen Voss
-- http://seehuhn.de/
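One common way to get system.time-like behaviour, sketched here under assumptions, is to call substitute() exactly once in the user-facing function and pass the captured code to helpers that take an already-quoted expression. The names make_loop_q and repeat_expr are hypothetical and are not part of the timeit.R code discussed in this thread.

```r
# Build a zero-argument function that runs a quoted expression k times.
# 'quoted_expr' must already be a language object; 'env' is where the
# generated function looks up variables.
make_loop_q <- function(k, quoted_expr, env) {
  loop_body <- substitute(
    for (.loop_i in seq_len(k)) e,
    list(k = k, e = quoted_expr)
  )
  fn <- function() NULL
  body(fn) <- loop_body
  environment(fn) <- env
  fn
}

# User-facing wrapper: the only place where substitute() is called,
# so callers write repeat_expr(rnorm(1)) without quote().
repeat_expr <- function(expr, k = 100) {
  make_loop_q(k, substitute(expr), parent.frame())
}

f <- repeat_expr(rnorm(1), k = 100)
print(f)
```

Keeping the quoting in one place means the inner helper follows ordinary evaluation rules, which is the concern raised in the quoted reply about users expecting arguments to be evaluated.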
Re: [R] apply with multiple conditions
Jean,

that's exactly what it should be, but yes, I copied and pasted from your email, so I don't see how I could have introduced an error in there.

paul

On 2 July 2012 15:57, Jean V Adams [via R] ml-node+s789695n4635144...@n4.nabble.com wrote:

Paul, are you submitting the exact code that I included in my previous e-mail? When I submit that code, I get this ...

    chrom chromStart chromEnd name         cumsum bin
  1 chr1       10089    10309 ZBTB33        10089   1
  2 chr1       10132    10536 TAF7_(SQ-8)   20221   1
  3 chr2       10133    10362 Pol2-4H8      30354   2
  4 chr2       10148    10418 MafF_(M8194)  40502   2
  5 chr2      210382   210578 ZBTB33        50884   3
  6 chr2      216132   216352 CTCF          67016   3

Jean

Paul Guilhamon [hidden email] wrote on 07/02/2012 08:59:00 AM:

Thanks for your reply Jean, I think your interpretation is correct, but when I run your code I end up with the dataframe below, and obviously the bins created there don't correspond to a chromStart change of 115341:

    chrom chromStart chromEnd name         cumsum bin
  1 chr1       10089    10309 ZBTB33        10089   1
  2 chr1       10132    10536 TAF7_(SQ-8)   20221   2
  3 chr2       10133    10362 Pol2-4H8      30354   3
  4 chr2       10148    10418 MafF_(M8194)  40502   4
  5 chr2      210382   210578 ZBTB33        50884   5
  6 chr2      216132   216352 CTCF          67016   6

The first two rows should have the same bin number (same chrom, <115341 diff), then rows 3&4 should be in another bin (different chrom from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom but >115341 difference between row 4 and row 5). It seems the new.bin line of your code isn't quite doing what it should, but I can't pinpoint the error there...

Paul

On 2 July 2012 14:19, Jean V Adams [hidden email] wrote:

Paul, my interpretation is that you are trying to assign a new bin number to a row every time the variable chrom changes and every time the variable chromStart changes by 115341 or more. Is that right? If so, you don't need a loop at all. Check out the code below.
I made a couple of changes to the all.tf7 example data frame so that it would have two changes in bin number, one based on the chrom variable and one based on the chromStart variable.

Jean

  all.tf7 <- data.frame(
    chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
    chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
    chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
    name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)", "ZBTB33", "CTCF"),
    cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
    bin = rep(NA, 6)
  )

  # assign a new bin every time chrom changes and every time chromStart changes by 115341 or more
  L <- nrow(all.tf7)
  prev.chrom <- c(NA, all.tf7$chrom[-L])
  delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
  new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom | delta.start >= 115341
  all.tf7$bin <- cumsum(new.bin)
  all.tf7

pguilha [hidden email] wrote on 07/02/2012 06:25:13 AM:

Hello all,

I have written a for loop to act on a dataframe with close to 3 million rows and 6 columns, and I would like to pass it to apply() to speed the process up (I let the loop run for 2 days before stopping it, and it had only gone through 200,000 rows), but I am really struggling to find a way to pass the arguments. Below are the loop and the head of the dataframe I am working on. Any hints would be much appreciated, thank you! (I have searched for this but could not find any other posts doing quite what I want.)

Paul

  x <- as.numeric(all.tf7[1,2])
  for (i in 2:nrow(all.tf7)) {
    if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
      all.tf7[i,6] <- all.tf7[i-1,6]
    else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
      all.tf7[i,6] <- (all.tf7[i-1,6]+1)
      x <- as.numeric(all.tf7[i,2])
    }
    else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
      all.tf7[i,6] <- (all.tf7[i-1,6]+1)
      x <- as.numeric(all.tf7[i,2])
    }
  }

  # the aim here is to attribute a bin number to each row so that I can then split the dataframe according to those bins.
  chrom chromStart chromEnd name         cumsum bin
  chr1       10089    10309 ZBTB33        10089   1
  chr1       10132    10536 TAF7_(SQ-8)   20221   1
  chr1       10133    10362 Pol2-4H8      30354   1
  chr1       10148    10418 MafF_(M8194)  40502   1
  chr1       10382    10578 ZBTB33        50884   1
  chr1       16132    16352 CTCF          67016   1
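A runnable sketch of the loop-free binning discussed in this thread follows. It rests on one assumption that is not confirmed anywhere in the thread: that chrom is stored as character rather than factor. If chrom is a factor, c(NA, all.tf7$chrom[-L]) yields integer codes, the comparison with the factor is then unequal on every row, and every row starts a new bin, which is one possible explanation for the 1 to 6 result reported above. Note also that this version compares each row with the previous row, whereas the original loop compares with the first row of the current bin.

```r
all.tf7 <- data.frame(
  chrom      = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  stringsAsFactors = FALSE   # assumption: keep chrom as character
)

L <- nrow(all.tf7)
prev.chrom  <- c(NA, all.tf7$chrom[-L])        # chrom of the previous row
delta.start <- c(NA, diff(all.tf7$chromStart)) # gap to the previous row

# A row opens a new bin if it is the first row, if chrom changed,
# or if chromStart jumped by 115341 or more.
new.bin <- is.na(prev.chrom) |
  all.tf7$chrom != prev.chrom |
  delta.start >= 115341

all.tf7$bin <- cumsum(new.bin)
all.tf7
```

With the bins in place, split(all.tf7, all.tf7$bin) would give one data frame per bin.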
Re: [R] turning R expressions into functions?
Dear Greg,

many thanks for your answer.

On Sat, Jun 30, 2012 at 11:39:07AM -0600, Greg Snow wrote:

Look at the replicate function, it takes an expression (does not need a function) and runs that expression the specified number of times. Will that accomplish what you want without needing to worry about substitute, quote, eval, etc.?

Yes, this is very similar to what I want to achieve. One of the main differences is that 'replicate' builds up a list of all call results, and for my measurements I want to avoid the resulting (time and memory) overhead. But I did look at the implementation of 'replicate', and this is where I took the trick of using eval.parent and substitute from.

All the best, Jochen
-- http://seehuhn.de/
Re: [R] turning R expressions into functions?
Dear Dirk,

On Sat, Jun 30, 2012 at 01:28:13PM -0500, Dirk Eddelbuettel wrote:

And also look at the existing benchmark packages 'rbenchmark' and 'microbenchmark':

Many thanks for pointing out these packages, I wasn't aware of these.

  R> library(microbenchmark)
  R> x <- 5; microbenchmark( 1/x, x^-1 )
  Unit: nanoseconds
    expr min    lq median    uq  max
  1  1/x 296 322.5    341 364.0 6298
  2 x^-1 516 548.5    570 591.5 5422

My own code (current version attached, comments would be very welcome) is much more chatty:

  R> source("timeit.R")
  R> x <- 5; TimeIt(1/x, x^-1)
  tuning ...
  measuring 10*1466753 samples for each expression ...
  |==| 100%
  execution time comparison:
    1/x  (0.000571 ± 1.48e-05) ms/call
    x^-1 (0.000864 ± 9.69e-06) ms/call
  CI for difference: [-0.00031, -0.000275] ms/call
  '1/x' is about 33.9% faster (p=2.75e-11)

One of the things I would love to add to my package would be the ability to compare more than two expressions in one call. But unfortunately, I haven't found out so far whether (and if so, how) it is possible to extract the elements of a ... object without evaluating them.

Many thanks, Jochen
-- http://seehuhn.de/

# timeit.R - pairwise comparison for the execution time of R expressions
#
# Copyright (c) 2012 Jochen Voss v...@seehuhn.de
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see http://www.gnu.org/licenses/.
#
# --
#
# This file provides the R command 'TimeIt' to compare the execution
# time of two R expressions.
FuncIt <- function(k, expr) {
  # Return a function which executes an expression k times.
  #
  # Args:
  #   k: The number of times 'expr' is executed.
  #   expr: An R expression.
  #
  # Returns:
  #   An R function, executing 'expr' in a loop.
  k <- as.numeric(k)
  expr <- eval.parent(substitute(expr))
  fn <- eval(substitute(function() { for (funcit.i in 1:k) { expr } }))
  return(fn)
}

TuneIt <- function(expr, max.seconds=1) {
  # Determine the approximate cost of calling an R expression in a
  # loop. This function tries loops of different length and then uses
  # a linear fit to get the result.
  #
  # Args:
  #   expr: The R expression to test.
  #   max.seconds: How much time (approximately) to use for
  #     measurements, in seconds. This should be much
  #     larger than the resolution of 'system.time'.
  #     Default is 1.
  #
  # Returns:
  #   A vector 'x' of length 2, such that the execution time for 'k'
  #   iterations is approximately 'x[1] + k * x[2]'.
  kk <- c()
  tt <- c()
  k <- 1
  repeat {
    f <- FuncIt(k, expr)
    t <- system.time(f())[1]
    kk <- c(kk, k)
    tt <- c(tt, t)
    if (t > max.seconds / 3) break
    k <- 2 * k
  }
  if (k > 1) {
    fit <- lm(tt ~ kk)
    return(coefficients(fit))
  } else {
    return(c(0, tt))
  }
  return(k)
}

TimeIt <- function(ex1, ex2, total.time=30, verbose=T) {
  # Compare the execution time of two R expressions.
  #
  # Args:
  #   ex1: The first R expression to evaluate
  #   ex2: The second R expression to evaluate
  #   total.time: How much time (approximately) to spend on
  #     measuring, in seconds. Longer times lead to more
  #     accurate measurements and allow to detect smaller
  #     differences in run time. Default is 30 seconds.
  #
  # Returns:
  #   An object of class 'TimeIt', summarising the difference in
  #   execution time of the two expressions.
  start <- proc.time()[1]
  ex1 <- substitute(ex1)
  ex2 <- substitute(ex2)
  if (verbose) {
    cat("tuning ...\n")
  }
  # Use at most 20% or 10 seconds (whatever is smaller) of our time
  # budget for tuning.
  tune.time <- min(.2*total.time, 10)
  c1 <- TuneIt(ex1, tune.time / 2)
  c2 <- TuneIt(ex2, tune.time / 2)
  mid <- proc.time()[1] - start
  total.time <- total.time - mid
  block.min <- 1
  block.target <- total.time^(1/4) * block.min^(3/4)
  c <- c1 + c2
  block.k <- max(round((block.target - c[1]) / c[2]), 1)
  f1 <- FuncIt(block.k, ex1)
  f2 <- FuncIt(block.k, ex2)
  ex1.time <- c1[1] + block.k * c1[2]
  ex2.time <- c2[1] + block.k * c2[2]
  pair.time <- ex1.time + ex2.time
  n <- max(round(total.time / pair.time), 2)
  if (verbose) {
    cat("measuring ", n, "*", block.k, " samples for each expression ...\n", sep="")
    flush.console()
    progress <-
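On the question raised in this message, whether the elements of a ... argument can be extracted without evaluating them: one widely used base R idiom is substitute(list(...)), which returns the call with its arguments still unevaluated. The sketch below is an illustration only; capture_dots and dots_labels are hypothetical names and are not part of timeit.R.

```r
# Return the arguments passed through ... as a list of unevaluated
# language objects (calls, symbols or constants).
capture_dots <- function(...) {
  as.list(substitute(list(...)))[-1L]   # drop the leading 'list' symbol
}

# Turn each captured expression into a label for reporting.
dots_labels <- function(...) {
  vapply(capture_dots(...),
         function(e) paste(deparse(e), collapse = " "),
         character(1))
}

# 'x' need not exist here, because nothing is evaluated.
exprs <- capture_dots(1/x, x^-1, sqrt(x))
length(exprs)
```

Each element of the returned list could then be handed to a loop-building helper in turn, which is what comparing more than two expressions in one call would need. match.call(expand.dots = FALSE)$... is an alternative way to reach the same unevaluated arguments.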
[R] vectorization with subset?
Hello,

I have a data frame (68,000 rows) of scores (V4) for a series of [genomic] coordinate ranges (V2 to V3). I also have a data frame (1.2 million rows) of single [genomic] coordinates. For each genomic coordinate (in coord), I would like to determine the average of all scores whose genomic ranges (in scores) encompass the coordinate (in coord). To accomplish this, I tried:

The function works, but is extremely slow. It would take about 4 days for this to finish for a single data set, and I have 64 data sets. Why does the rate at which coordinate averages are calculated increase when coord is smaller, but not when scores is smaller? How can I accomplish the same thing more efficiently?

Thanks, Dan

-- View this message in context: http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156.html Sent from the R help mailing list archive at Nabble.com.
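The poster's own function did not survive in the archive, so the following is not a reconstruction of it. It is a sketch of one vectorised way to compute, for every coordinate, the mean score of all ranges that contain it, assuming ranges are closed at both ends. The function name mean_covering_score and the toy data are hypothetical; the left.open argument of findInterval() needs R 3.3.0 or later. The idea is that the ranges covering a position are those that have started (start <= pos) minus those that have already ended (end < pos), and the same subtraction works on cumulative score sums.

```r
mean_covering_score <- function(pos, start, end, score) {
  by_start <- order(start)
  by_end   <- order(end)

  # Cumulative score sums, with a leading 0 for "no ranges yet".
  cum_started <- c(0, cumsum(score[by_start]))
  cum_ended   <- c(0, cumsum(score[by_end]))

  # Number of ranges with start <= pos, and with end < pos.
  n_started <- findInterval(pos, start[by_start])
  n_ended   <- findInterval(pos, end[by_end], left.open = TRUE)

  n_cover <- n_started - n_ended
  total   <- cum_started[n_started + 1] - cum_ended[n_ended + 1]

  # Positions covered by no range get NA.
  ifelse(n_cover > 0, total / n_cover, NA_real_)
}

# Toy data standing in for the V2 (start), V3 (end), V4 (score) columns.
scores <- data.frame(V2 = c(1, 5, 20), V3 = c(10, 15, 30), V4 = c(2, 4, 6))
coord  <- c(3, 7, 10, 17, 25)

mean_covering_score(coord, scores$V2, scores$V3, scores$V4)
```

The cost is dominated by two sorts and two binary searches, so it scales roughly with n log n rather than with the product of the two table sizes. Because sums are obtained by subtracting cumulative totals, very long inputs may accumulate small floating-point differences compared with averaging each subset directly.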
Re: [R] Adjusting length of series
Noted David, and thanks very much.

Lexi

From: David Winsemius dwinsem...@comcast.net Sent: Monday, July 2, 2012 4:26 PM Subject: Re: [R] Adjusting length of series

On Jul 2, 2012, at 5:13 AM, Lekgatlhamang, lexi Setlhare wrote:

Hi David and AK, I have been trying to implement your suggestions since yesterday, but I encountered some challenges. As for David's suggestion, I could only implement it after some modifications. Using an abridged version of my data, I dput my dataset and then show my steps below.

Well, your initial question (why the $ referencing did not work) is now answered. This is not a dataframe but rather a 'ts' classed object, and there is no `$` method for such objects. They are really matrices with some extra attributes.

  ydata$BoBCL1
  Error in ydata$BoBCL1 : $ operator is invalid for atomic vectors

As I understood it, you were able to get useful analyses using the formula methods for lm on these objects, but were just having difficulty with the $ operator. So the answer is: don't do that.

--David.
dput(ydata) structure(c(68.10004, -34.80002, 90.39996, 54.60004, -172.3, 51.80002, 175, 79.80002, -35.70007, 130.5, 116.8, -67.5, 164.5, 514.8, -326.1, 98.40005, 160.2, 53.19998, 283.6, -111.6, 127.8, -17.30002, 286.3, NA, NA, -102.9001, 125.2, -35.79993, -226.9001, 224.1, 123.2, -95.19998, -115.5001, 166.2001, -13.69998, -184.3, 232, 350.3, -840.9001, 424.5001, 61.79993, -107, 230.4001, -395.2001, 239.4001, -145.1, 303.6, NA, NA, NA, 228.1, -160., -191.1001, 451.0001, -100.9001, -218.4, -20.30011, 281.7002, -179.9001, -170.6, 416.3, 118.3, -1191.2, 1265.4, -362.7002, -168.7999, 337.4001, -625.6001, 634.6001, -384.5001, 448.7001, NA, NA, -164.45784099, 17.079353995, 95.976788009, 680.23816699, -491.34869099, -274.694009, -256.332907, 469.62296, -146.431891, -41.077201995, -106.970104, 757.68826399, -1689.214533, 2320.098952, -1446.97942, 516.384521, -375.27765099, 293.86702999, 417.845195, 278.198807, -968.59203399, -314.195986, NA, NA, NA, 181.53719499, 78.897434013, 584.26137898, -1171.586858, 216.65468199, 18.361101998, 725.955867, -616.054851, 105.35468901, -65.892902005, 864.65836799, -2446.902797, 4009.313485, -3767.078372, 1963.363941, -891.66217199, 669.14468099, 123.978165, -139.646388, -1246.790841, 654.396048, NA, 4937, 5005.1, 4970.3, 5060.7, 5115.3, 4943, 4994.8, 5169.8, 5249.6, 5213.9, 5344.4, 5461.2, 5393.7, 5558.2, 6073, 5746.9, 5845.3, 6005.5, 6058.7, 6342.3, 6230.7, 6358.5, 6341.2, 6627.5, 4187.5, 4296.004835, 4240.051829, 4201.178177, 4258.281313, 4995.622616, 5241.615228, 5212.913831, 4927.879527, 5112.468183, 5150.624948, 5147.704511, 5037.81397, 5685.611693, 4644.194883, 5922.877025, 5754.579747, 6102.66699, 6075.476582, 6342.153204, 7026.675021, 7989.395645, 7983.524235, 7663.456839), .Dim = c(24L, 7L), .Dimnames = list(   NULL, c(DCred1, DCred2, DCred3, DBoBC2, DBoBC3,   CredL1, BoBCL1)), .Tsp = c(2001.083, 2003, 12 ), class = c(mts, ts)) NB: the NAs in the dataset emanated from lagging or differencing the series David's suggestion  
  df <- data.frame(DCred1,DCred2,DCred3,DBoBC2,DBoBC3,CredL1,BoBCL1)
  Error in data.frame(DCred1, DCred2, DCred3, DBoBC2, DBoBC3, CredL1, BoBCL1) :
    arguments imply differing number of rows: 23, 22, 21, 24

So I modified as follows:

  length(DCred3) # finding the minimum length of various series
  [1] 21
  # Then dataframe construction
  dframe <- data.frame(Dcre1=DCred1[1:21],Dcre2=DCred2[1:21],Dcre3=DCred3[1:21],
  + Dbobc2=DBoBC2[1:21],Dbobc3=DBoBC3[1:21],CredL=CredL1[1:21],BoBCL=BoBCL1[1:21])
  # Then estimated regression
  regCred <- lm(Dcre1~Dcre2+Dcre3+Dbobc2+Dbobc3+CredL+BoBCL, data=dframe)
  summary(regCred) # Worked well as shown by results below

  Call:
  lm(formula = Dcre1 ~ Dcre2 + Dcre3 + Dbobc2 + Dbobc3 + CredL + BoBCL, data = dframe)

  Residuals:
      Min      1Q  Median      3Q     Max
  -69.516 -27.695  -8.085  13.851 107.276

  Coefficients:
               Estimate Std. Error t value Pr(>|t|)
  (Intercept) 159.32304  157.15209   1.014 0.327873
  Dcre2        -0.75527    0.17262  -4.375 0.000634 ***
  Dcre3        -0.21006    0.08656  -2.427 0.029329 *
  Dbobc2        0.05111    0.06565   0.779 0.449197
  Dbobc3        0.03106    0.03510   0.885 0.391108
  CredL        -0.10967    0.04933  -2.223 0.043177 *
  BoBCL         0.09756    0.03097   3.150 0.007087 **
  ---
  Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Residual standard
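The two points made in this thread, that `$` does not work on a multivariate 'ts' object and that lm() drops incomplete rows by itself, can be checked on a small invented series. This is only an illustration; ydemo and its values are made up and are not the poster's data.

```r
# A small multivariate time series with a leading NA, as produced by lagging.
ydemo <- ts(
  matrix(c(1, 2.5, 3, 4.2, 5,
           NA, 6, 7.5, 8, 9.1),
         ncol = 2, dimnames = list(NULL, c("DCred1", "DCred2"))),
  start = c(2001, 2), frequency = 12
)

# ydemo$DCred1 would fail: an mts object is a matrix with extra attributes.
col_by_name <- ydemo[, "DCred1"]      # matrix-style indexing works

# Converting once gives ordinary data-frame columns.
ydf <- data.frame(ydemo)
ydf$DCred2

# lm() removes rows with NA by default, so both fits use the same rows.
fit_with_na <- lm(DCred1 ~ DCred2, data = ydf)
fit_omitted <- lm(DCred1 ~ DCred2, data = na.omit(ydf))
all.equal(coef(fit_with_na), coef(fit_omitted))
```

Because the rows are matched inside the data frame, there is no need to trim each series to a common length by hand, which also avoids misaligning observations when the NAs sit at the start of a column.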
Re: [R] R sub query
Hi,

Either of these should work:

  m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow=2)

  gsub("^\\.:[[:digit:]]+:", "", m)
       [,1]  [,2]    [,3]   [,4]
  [1,] "0,0" "193,1" "50,8" "114,0"
  [2,] "0,2" "0,56"  "0,13" "75,0"

  gsub("^\\.:\\d+:", "", m)
       [,1]  [,2]    [,3]   [,4]
  [1,] "0,0" "193,1" "50,8" "114,0"
  [2,] "0,2" "0,56"  "0,13" "75,0"

A.K.

- Original Message - From: Sarah Auburn saub...@yahoo.com To: r-help@r-project.org r-help@r-project.org Cc: Sent: Monday, July 2, 2012 4:15 AM Subject: [R] R sub query

Hello,

I would like to substitute a substring of characters defined by a specific start and end sequence, i.e. in the example matrix below, I would like to substitute ".:X:" with "", where X varies in sequence...

  m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow=2)

output required:

       [,1]  [,2]    [,3]   [,4]
  [1,] "0,0" "193,1" "50,8" "114,0"
  [2,] "0,2" "0,56"  "0,13" "75,0"

Thank you for any help

Sarah
Re: [R] residuals from lm
FYI: As you are likely thinking, this doesn't belong here (it just occurred to me). I am questioning what I did, not the output of lm(). But if someone knows why I am wrong, please let me know.

chuck.01 wrote

Hi, I was playing around with something else and I noticed this matrix code for residuals in a linear model doesn't say what lm() says. Please tell me if I am completely misguided here.

  data(mtcars)
  Y <- as.matrix(mtcars[,1])
  X <- as.matrix(mtcars[,c(2:11)])
  # shouldn't this:
  H <- X %*% solve(t(X) %*% X) %*% t(X)
  (diag(dim(H)[1]) - H) %*% Y
  # be equal to this:
  residuals(lm(Y~X))
  # ???
  # thanks

-- View this message in context: http://r.789695.n4.nabble.com/residuals-from-lm-tp4635155p4635161.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Adjusting length of series
Thanks very much A.K. I have to admit that my problem was not clearly stated, with the structure of my data provided. Now all is well.

Cheers, Lexi
Re: [R] Adjusting length of series
Hi,

One more thing.

  # ydat1: original dataset
  ydat2 <- data.frame(ydat1)

# Not sure how you did this step on the original data:

  dframe <- data.frame(Dcre1=DCred1[1:21],Dcre2=DCred2[1:21],Dcre3=DCred3[1:21], Dbobc2=DBoBC2[1:21],Dbobc3=DBoBC3[1:21],CredL=CredL1[1:21],BoBCL=BoBCL1[1:21])

because I am getting errors for that step when I use ydat1.

  head(ydat1)
  [1] 68.1 -34.8 90.4 54.6 -172.3 51.8

  head(ydat2)
    DCred1 DCred2 DCred3     DBoBC2      DBoBC3 CredL1   BoBCL1
  1   68.1     NA     NA         NA          NA 4937.0 4187.500
  2  -34.8 -102.9     NA -164.45784          NA 5005.1 4296.005
  3   90.4  125.2  228.1   17.07935   181.53719 4970.3 4240.052
  4   54.6  -35.8 -161.0   95.97679    78.89743 5060.7 4201.178
  5 -172.3 -226.9 -191.1  680.23817   584.26138 5115.3 4258.281
  6   51.8  224.1  451.0 -491.34869 -1171.58686 4943.0 4995.623

# I analyzed [1:21] again in ydat2.

  dframe <- data.frame(Dcre1=ydat2$DCred1[1:21],Dcre2=ydat2$DCred2[1:21],Dcre3=ydat2$DCred3[1:21],Dbobc2=ydat2$DBoBC2[1:21],Dbobc3=ydat2$DBoBC3[1:21],CredL=ydat2$CredL1[1:21],BoBCL=ydat2$BoBCL1[1:21])

But the results are a bit different from those in my earlier post, because here the NAs are still present in different rows, so the observations in those rows are deleted when the model is fitted.

  regCred <- lm(Dcre1~Dcre2+Dcre3+Dbobc2+Dbobc3+CredL+BoBCL, data=dframe)
  summary(regCred)

  Call:
  lm(formula = Dcre1 ~ Dcre2 + Dcre3 + Dbobc2 + Dbobc3 + CredL + BoBCL, data = dframe)

  Residuals:
       Min       1Q   Median       3Q      Max
  -118.687  -25.568   -5.334   35.035   69.992

  Coefficients:
                Estimate Std. Error t value Pr(>|t|)
  (Intercept) -485.42427  209.47952  -2.317 0.038958 *
  Dcre2          0.95097    0.18156   5.238 0.000209 ***
  Dcre3         -0.28676    0.10787  -2.658 0.020852 *
  Dbobc2        -0.09512    0.09334  -1.019 0.328278
  Dbobc3         0.03199    0.04933   0.648 0.528936
  CredL          0.14825    0.07193   2.061 0.061645 .
  BoBCL         -0.04844    0.04333  -1.118 0.285540
  ---

A.K.
[R] residuals from lm
Hi,

I was playing around with something else and I noticed this matrix code for residuals in a linear model doesn't say what lm() says. Please tell me if I am completely misguided here.

  data(mtcars)
  Y <- as.matrix(mtcars[,1])
  X <- as.matrix(mtcars[,c(2:11)])
  # shouldn't this:
  H <- X %*% solve(t(X) %*% X) %*% t(X)
  (diag(dim(H)[1]) - H) %*% Y
  # be equal to this:
  residuals(lm(Y~X))
  # ???
  # thanks

-- View this message in context: http://r.789695.n4.nabble.com/residuals-from-lm-tp4635155.html Sent from the R help mailing list archive at Nabble.com.
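A likely reason for the mismatch is that lm(Y ~ X) adds an intercept column to the design matrix, while the hat matrix in the post is built from X alone. The sketch below adds a column of ones before forming H; the variable names X1, res_hat and res_lm are introduced here for illustration only.

```r
data(mtcars)
y <- mtcars[, 1]                  # mpg, kept as a plain vector
X <- as.matrix(mtcars[, 2:11])

# Design matrix as lm() sees it: a column of ones plus the predictors.
X1 <- cbind(Intercept = 1, X)

# Hat matrix and residual-maker applied to y.
H <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)
res_hat <- drop((diag(nrow(H)) - H) %*% y)

# Residuals from lm(), stripped of names for comparison.
res_lm <- unname(residuals(lm(y ~ X)))

max(abs(res_hat - res_lm))        # should be tiny, up to rounding error
```

Conversely, the hat matrix built from X without the column of ones corresponds to a fit without intercept, i.e. lm(y ~ X - 1). Inverting t(X1) %*% X1 directly is fine for a small example like this, although lm() itself uses a QR decomposition, which is numerically more stable.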
Re: [R] predicting expected number of events using a coxph model
Peter Dalgaard-2 wrote

I fit a coxph model:

  coxphfit <- coxph(Surv(sampledLifetime, !sampledCensoredQ) ~ curpbc6 + prevpbc6, sampledTimeSeries)

Now I'm trying to predict the expected number of events using a new dataset. The documentation suggests that

  coxPred <- predict(coxphfit, newdata = testTimeSeries, type="expected")

will do what I want, but I get the error

  Error in model.frame.default(data = testTimeSeries, formula = Surv(sampledLifetime, : variable lengths differ (found for 'curpbc6')

when I do this. The dataframes sampledTimeSeries and testTimeSeries were constructed by taking rows from a larger dataframe, so they have the same data. What am I doing incorrectly?

Most likely referring to a variable not in testTimeSeries. (I kind of suspect that unlike predict.lm, predict.coxph does not ignore the left hand side of formulas. Does testTimeSeries contain a sampledLifetime column?)

No, I did not have the lifetime and censored data in the dataframe. Per your idea, I put the sampledLifetime and sampledCensoredQ variables in the dataframe sampledTimeSeries and left the rest of the code the same. Now when I try with the new data set,

  coxPred <- predict(coxphfit, newdata = testTimeSeries, type="expected")

I get different errors. If I use testTimeSeries without the lifetime and censor indicator columns (which shouldn't be required for prediction), then I get the same error as before. If I put in these columns, then I get the error

  Error in predict.coxph(coxphfit, newdata = testTimeSeries, type = "expected", : object 'x' not found

-- View this message in context: http://r.789695.n4.nabble.com/predicting-expected-number-of-events-using-a-coxph-model-tp4634935p4635168.html Sent from the R help mailing list archive at Nabble.com.
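For comparison, here is a small sketch of a prediction of type "expected" where the new data carry both the covariates and the response variables named in the model formula, since the expected number of events depends on the follow-up time. It uses the lung data set shipped with the survival package rather than the poster's data, and it does not attempt to reproduce or explain the "object 'x' not found" error, whose cause cannot be determined from the post.

```r
library(survival)

# Fit on a data frame; time and status appear on the left-hand side.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# New data include the same response columns as well as the covariates.
new_rows <- lung[1:5, c("time", "status", "age", "sex")]

# Expected number of events for each subject over their follow-up time.
expected_events <- predict(fit, newdata = new_rows, type = "expected")
expected_events
```

Fitting with data = ... and using the same column names in newdata keeps all variables the same length, which is the kind of mismatch the first error message in the post points to.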
Re: [R] Error() model is singular - what does that mean
WARNING: Not tested in the absence of data provided by dput() to allow easy input into R

1. Change the names of the inputs by removing the dash: this is not a legitimate R name and could/should be causing problems in the aov() call since the names are not quoted.

2. The model specification is wrong. It should be:

aov(Correct~TaskKind*DataKind+Error(Subject),data=allDataRaw.xp)

-- Bert

On Mon, Jul 2, 2012 at 6:04 AM, Jessica Streicher <j.streic...@micromata.de> wrote:

Also, try googling for - R model is singular - , there seem to have been a lot of people with that particular error.

On 02.07.2012, at 14:56, Jessica Streicher wrote:

Just looking at it I would try renaming Task-Kind, Data-Kind and Time-Taken. Those are ambiguous in the formula: Task-Kind vs Task - Kind. Though that might not be the error at hand :)

On 02.07.2012, at 14:15, zetwal wrote:

Hello

I have some test data that looks like this, from a within-subject experiment.

Subject  Task-Kind  Data-Kind  Time-Taken  Correct
1        A          Data1      5           1
1        A          Data1      3           0
1        A          Data1      1           1
1        A          Data2      8           1
1        A          Data2      7           0
1        A          Data2      5           0
1        A          Data3      2           1
1        A          Data3      7           0
1        A          Data3      5           0
1        A          Data3      6           0
1        B          Data1      3           1
1        B          Data1      1           1
1        B          Data1      3           0
1        B          Data2      9           0
1        B          Data2      8           1
1        B          Data2      5           0
1        B          Data3      2           1
1        B          Data3      7           2
1        B          Data3      5           3
1        B          Data3      6           0
1        C          Data1      3           1
1        C          Data1      1           1
1        C          Data1      3           0
1        C          Data2      9           0
1        C          Data2      8           1
1        C          Data2      5           0
1        C          Data3      2           1
1        C          Data3      7           2
1        C          Data3      5           3
1        C          Data3      6           0
2        A          Data1      5           1
2        A          Data1      3           0
2        A          Data1      1           1
2        A          Data2      8           1
2        A          Data2      7           0
2        A          Data2      5           0
2        A          Data3      2           1
2        A          Data3      7           0
2        A          Data3      5           0
2        A          Data3      6           0
2        B          Data1      3           1
2        B          Data1      1           1
2        B          Data1      3           0
2        B          Data2      9           0
2        B          Data2      8           1
2        B          Data2      5           0
2        B          Data3      2           1
2        B          Data3      7           2
2        B          Data3      5           3
2        B          Data3      6           0
2        C          Data1      3           1
2        C          Data1      1           1
2        C          Data1      3           0
2        C          Data2      9           0
2        C          Data2      8           1
2        C          Data2      5           0
2        C          Data3      2           1
2        C          Data3      7           2
2        C          Data3      5           3
2        C          Data3      6           0
. . .
Some notes: there are 20 subjects, 5 different kinds of tasks, and 5 different kinds of data, and there are several different variations for a given kind of task and kind of data, which is why for Subject = 1, Task-Kind = A and Data-Kind = Data1 we have 3 different results. The measured parameters are the time to complete the task and whether it was correct or not (0 implies correct and 1 implies not correct).

I am computing the anova as follows:

aov.ex = aov(Correct~Task-Kind*Data-Kind+Error(Subject/(Task-Kind*Data-Kind)),data=allDataRaw.xp)

since I want to see how the result is affected by the different kinds of data as well as the kind of task, and I get a warning message saying:

Error() model is singular

I would be very grateful if someone could please tell me what this means.

Thanks
Pascal

-- View this message in context: http://r.789695.n4.nabble.com/Error-model-is-singular-what-does-that-mean-tp4635103.html

-- Bert Gunter Genentech Nonclinical Biostatistics
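Bert's two suggestions can be sketched as follows. The data are simulated, because allDataRaw.xp was never posted in a form that can be read into R; the object raw, the number of subjects, and the random Correct values are assumptions made only to get a runnable example. Only the renaming step and the Error(Subject) formula come from the thread.

```r
set.seed(1)

# Simulated stand-in for the poster's data: 4 subjects, 3 task kinds,
# 3 data kinds, 3 repetitions per cell (all hypothetical).
raw <- expand.grid(Subject  = 1:4,
                   TaskKind = c("A", "B", "C"),
                   DataKind = c("Data1", "Data2", "Data3"),
                   Rep      = 1:3)
raw$Correct <- rbinom(nrow(raw), size = 1, prob = 0.5)
names(raw)[2:3] <- c("Task-Kind", "Data-Kind")  # mimic the dashed names

# 1. Remove the dashes so the column names are syntactic R names.
names(raw) <- gsub("-", "", names(raw), fixed = TRUE)

# 2. Treat the design variables as factors and use Subject as the
#    error stratum, as in Bert's formula.
raw$Subject  <- factor(raw$Subject)
raw$TaskKind <- factor(raw$TaskKind)
raw$DataKind <- factor(raw$DataKind)

aov.ex <- aov(Correct ~ TaskKind * DataKind + Error(Subject), data = raw)
summary(aov.ex)
```

Quoting the original names with backticks inside the formula would also avoid Task-Kind being read as Task minus Kind, but renaming the columns once is usually less error-prone.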
[R] 'init.win' error when installing from source
Dear R People:

I'm installing R 2.15.1 on a Windows 32 bit machine from source. I'm getting a strange error about init.win (please see below).

Does this look familiar to anyone, please?

Thanks,
Erin

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

c:\R\R-2.15.1\src\gnuwin32>make all recommended
make all recommended
make[1]: `MkRules' is up to date.
make[4]: Nothing to be done for `svnonly'.
installing C headers
make[2]: Nothing to be done for `all'.
make[2]: `libRblas.dll.a' is up to date.
make[5]: Nothing to be done for `svnonly'.
installing C headers
make --no-print-directory -C ../extra/intl CFLAGS='-O3 -Wall -pedantic -mtune=core2' -f Makefile.win
make --no-print-directory -C ../appl CFLAGS='-O3 -Wall -pedantic -mtune=core2' FFLAGS='-O3 -mtune=core2' -f Makefile.win
make --no-print-directory -C ../nmath CFLAGS='-O3 -Wall -pedantic -mtune=core2' FFLAGS='-O3 -mtune=core2' -f Makefile.win
make --no-print-directory -C ../main CFLAGS='-O3 -Wall -pedantic -mtune=core2' FFLAGS='-O3 -mtune=core2' malloc-DEFS='-DLEA_MALLOC' -f Makefile.win
make --no-print-directory -C ./getline CFLAGS='-O3 -Wall -pedantic -mtune=core2'
make[4]: `gl.a' is up to date.
make -f Makefile.win makeMakedeps
make -f Makefile.win libpcre.a
make[5]: `libpcre.a' is up to date.
make[4]: Nothing to be done for `all'.
make -f Makefile.win makeMakedeps
make -f Makefile.win libtre.a
make[5]: `libtre.a' is up to date.
make[4]: Nothing to be done for `all'.
make[5]: `stamp' is up to date.
make[5]: `liblzma.a' is up to date.
make[3]: `R.dll' is up to date.
cp R.dll ../../bin/i386
make[3]: Nothing to be done for `all'.
make --no-print-directory -C front-ends
make[2]: `COPYRIGHTS' is up to date.
make --no-print-directory -C ../modules -f Makefile.win \
  CFLAGS='-O3 -Wall -pedantic -mtune=core2' FFLAGS='-O3 -mtune=core2'
make[5]: *** No rule to make target `init_win.o', needed by `../../../bin/i386/Rlapack.dll'. Stop.
make[4]: *** [all] Error 2
make[3]: *** [all] Error 1
make[2]: *** [rmodules] Error 2
make[1]: *** [rbuild] Error 2
make: *** [all] Error 2

c:\R\R-2.15.1\src\gnuwin32>

Thanks,
Erin

-- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com
Re: [R] vectorization with subset?
On Jul 2, 2012, at 12:15 PM, dlv04c wrote:

Hello, I have a data frame (68,000 rows) of scores (V4) for a series of [genomic] coordinate ranges (V2 to V3). I also have a data frame (1.2 million rows) of single [genomic] coordinates. For each genomic coordinate (in coord), I would like to determine the average of all scores whose genomic ranges (in scores) encompass the coordinate (in coord). To accomplish this, I tried:

The function works, but is extremely slow. It would take about 4 days for this to finish for a single data set, and I have 64 data sets. Why does the rate at which coordinate averages are calculated increase when coord is smaller, but not when scores is smaller? How can I accomplish the same thing more efficiently?

You probably need to start by reading the vignettes for the IRanges package. It's difficult to be sure since you did not show the code for what you were doing currently.

-- David Winsemius, MD West Hartford, CT
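Since the original code was not posted, what follows is only a sketch of one vectorized alternative in base R; it is not the IRanges approach David recommends. It assumes every range has start <= end, that a coordinate sitting on a range boundary counts as inside, and it uses made-up names (mean_score_at, start, end, score) standing in for columns V2, V3, and V4. The idea is that the ranges covering a point p are those that have started (start <= p) minus those that have already ended (end < p), and both the counts and the score totals can be read off sorted vectors with findInterval() and cumsum().

```r
# Average score of all ranges [start, end] that contain each coordinate.
# Hypothetical helper; argument names are illustrative, not from the thread.
mean_score_at <- function(coord, start, end, score) {
  by_start <- order(start)
  by_end   <- order(end)

  # Running score totals in start order and in end order (leading 0 = "none yet").
  cum_started <- c(0, cumsum(score[by_start]))
  cum_ended   <- c(0, cumsum(score[by_end]))

  # Number of ranges with start <= coord.
  n_started <- findInterval(coord, start[by_start])
  # Number of ranges with end < coord (left.open needs R >= 3.3.0).
  n_ended   <- findInterval(coord, end[by_end], left.open = TRUE)

  total <- cum_started[n_started + 1] - cum_ended[n_ended + 1]
  count <- n_started - n_ended

  # Coordinates covered by no range get NA rather than 0/0.
  ifelse(count > 0, total / count, NA_real_)
}

# Tiny made-up example: three ranges, five query coordinates.
res <- mean_score_at(coord = c(3, 5, 6, 11, 25),
                     start = c(1, 5, 10),
                     end   = c(6, 12, 20),
                     score = c(2, 4, 6))
res
# 2 3 3 5 NA
```

The work is dominated by two sorts and two binary searches per coordinate, so it avoids subsetting the scores table once per coordinate. Because totals are obtained by subtracting cumulative sums, very long inputs with widely varying scores may show small floating-point differences compared with averaging each subset directly.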
Re: [R] apply with multiple conditions
Paul,

Try this (I changed some of the object names, but the meat of the code is the same):

df <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
  name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)", "ZBTB33", "CTCF"),
  cumsum = c(10089, 20221, 30354, 40502, 50884, 67016)
)
# assign a new bin every time chrom changes and every time chromStart changes by 115341 or more
L <- nrow(df)
prev.chrom <- c(NA, df$chrom[-L])
delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])
new.bin <- is.na(prev.chrom) | df$chrom != prev.chrom | delta.start >= 115341
df$bin <- cumsum(new.bin)
df

pguilha <paul.guilha...@gmail.com> wrote on 07/02/2012 10:23:36 AM:

Jean, that's exactly what it should be, but yes I copied and pasted from your email so I don't see how I could have introduced an error in there.

paul

On 2 July 2012 15:57, Jean V Adams [via R] <ml-node+s789695n4635144...@n4.nabble.com> wrote:

Paul,

Are you submitting the exact code that I included in my previous e-mail? When I submit that code, I get this ...
  chrom chromStart chromEnd         name cumsum bin
1  chr1      10089    10309       ZBTB33  10089   1
2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
3  chr2      10133    10362     Pol2-4H8  30354   2
4  chr2      10148    10418 MafF_(M8194)  40502   2
5  chr2     210382   210578       ZBTB33  50884   3
6  chr2     216132   216352         CTCF  67016   3

Jean

Paul Guilhamon [hidden email] wrote on 07/02/2012 08:59:00 AM:

Thanks for your reply Jean,

I think your interpretation is correct, but when I run your code I end up with the below dataframe, and obviously the bins created there don't correspond to a chromStart change of 115341:

  chrom chromStart chromEnd         name cumsum bin
1  chr1      10089    10309       ZBTB33  10089   1
2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
3  chr2      10133    10362     Pol2-4H8  30354   3
4  chr2      10148    10418 MafF_(M8194)  40502   4
5  chr2     210382   210578       ZBTB33  50884   5
6  chr2     216132   216352         CTCF  67016   6

The first two rows should have the same bin number (same chrom, <115341 diff), then rows 3 and 4 should be in another bin (different chrom from rows 1 and 2, <115341 diff), and rows 5 and 6 in another one (same chrom but >115341 difference between row 4 and row 5). It seems the new.bin line of your code isn't quite doing what it should, but I can't pinpoint the error there...

Paul

On 2 July 2012 14:19, Jean V Adams [hidden email] wrote:

Paul,

My interpretation is that you are trying to assign a new bin number to a row every time the variable chrom changes and every time the variable chromStart changes by 115341 or more. Is that right? If so, you don't need a loop at all. Check out the code below. I made a couple of changes to the all.tf7 example data frame so that it would have two changes in bin number, one based on the chrom variable and one based on the chromStart variable.
Jean

all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
  name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)", "ZBTB33", "CTCF"),
  cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
  bin = rep(NA, 6)
)
# assign a new bin every time chrom changes and every time chromStart changes by 115341 or more
L <- nrow(all.tf7)
prev.chrom <- c(NA, all.tf7$chrom[-L])
delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom | delta.start >= 115341
all.tf7$bin <- cumsum(new.bin)
all.tf7

pguilha [hidden email] wrote on 07/02/2012 06:25:13 AM:

Hello all,

I have written a for loop to act on a dataframe with close to 3 million rows and 6 columns, and I would like to pass it to apply() to speed the process up (I let the loop run for 2 days before stopping it and it had only gone through 200,000 rows), but I am really struggling to find a way to pass the arguments. Below are the loop and the head of the dataframe I am working on. Any hints would be much appreciated, thank you! (I have searched for this but could not find any other posts doing quite what I want.)

Paul

x <- as.numeric(all.tf7[1,2])
for (i in 2:nrow(all.tf7)) {
  if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341) all.tf7[i,6] <- all.tf7[i-1,6]
  else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
    all.tf7[i,6] <- (all.tf7[i-1,6]+1)
    x <- as.numeric(all.tf7[i,2])
  } else if