Re: [R] (no subject)
AJ: This is something to learn a lesson from. A question, starved of preparation, can't help anybody to help you.

James

On Mar 25, 2012 9:00 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:

On 26/03/12 00:18, Anjana Thampi wrote:

How do you decompose inequality in R, say by gender?

This has to be one of the most meaningless and ill-expressed questions I've ever seen on this list. And that's a high hurdle to clear.

cheers,
Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] help in replacing for llop
Hi,

I have records like this:

X1  X2   State
34  72   state1
9   63   state1
49  31   state1
60  34   state1
80  73   state1
60  20   state2
59  87   state2
88  20   state2
71  66   state2
65  56   state2
59  16   state1
60  100  state2

I want to get summary values such as the mean, the median, and a histogram for X1 and X2, grouped by state. I'm using a for loop for this. Is there any way to remove the for loop and use apply or an alternative?

Thanks in advance,
Arun

--
View this message in context: http://r.789695.n4.nabble.com/help-in-replacing-for-llop-tp4507939p4507939.html
Sent from the R help mailing list archive at Nabble.com.
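One possible loop-free approach to the question above is base R's aggregate() for the numeric summaries and split() plus lapply() for the histograms. This is only a sketch: the data frame name dat and the result names are made up for illustration, and the values are copied from the message.

```r
# Example data copied from the message; the name `dat` is an assumption.
dat <- data.frame(
  X1 = c(34, 9, 49, 60, 80, 60, 59, 88, 71, 65, 59, 60),
  X2 = c(72, 63, 31, 34, 73, 20, 87, 20, 66, 56, 16, 100),
  State = c(rep("state1", 5), rep("state2", 5), "state1", "state2")
)

# Mean and median of X1 and X2 within each state, without an explicit loop.
group_means   <- aggregate(cbind(X1, X2) ~ State, data = dat, FUN = mean)
group_medians <- aggregate(cbind(X1, X2) ~ State, data = dat, FUN = median)

# One histogram object per state for X1 (plot = FALSE only computes the bins;
# drop that argument to draw the histograms instead).
x1_hists <- lapply(split(dat$X1, dat$State), hist, plot = FALSE)

print(group_means)
print(group_medians)
```

tapply(dat$X1, dat$State, mean) would give the same per-state means for a single column; aggregate() is used here because it handles X1 and X2 in one call.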
[R] Plot of function seems to cut off near edge of domain
Hello helpful R folks,

I am simply trying to graph a quarter circle centered at the origin in the first quadrant. When I set the xlim of the plot to the radius of the circle, the plot appears correct. However, I'd like to see a slight extension of the axes beyond the domain of the function itself. When I do this, a portion of the plot seems to be missing by the edge of the domain. Here is the code for both of the plots:

dev.off()
plot.new()
#Set up two-figure plot
par(mfrow=c(1,2),pty='s')
g <- function(x){sqrt(2500-x^2)}
#Figure 1, with xlim at the radius of the circle
plot(g,axes=F,xlim=c(0,50),ylim=c(0,50))
axis(1,pos=0)
axis(2,pos=0)
#Figure 2, with xlim beyond the radius of the circle
plot(g,axes=F,xlim=c(0,60),ylim=c(0,60))
axis(1,pos=0)
axis(2,pos=0)

Notice that the second graph doesn't appear to intersect the x-axis, while the first one does. Any ideas why that might be the case? Here's an image of what I see in case that's useful:
http://r.789695.n4.nabble.com/file/n4507954/Cut_off_Quarter_Circle.png

Thanks in advance for the help!
-Chad Mills

--
View this message in context: http://r.789695.n4.nabble.com/Plot-of-function-seems-to-cut-off-near-edge-of-domain-tp4507954p4507954.html
Re: [R] y needing more than 2 functions
Another approach:

foo <- function(t){
    bm <- ceiling(t/15)
    s <- cut(t,breaks=15*(0:bm),labels=1:bm)
    s <- as.numeric(levels(s)[s])
    t^(s+1)
}

This idea can be generalised.

cheers,
Rolf Turner

On 27/03/12 15:31, R. Michael Weylandt wrote:

One way is to simply nest your ifelse()s:

y <- ifelse(t < 15, t^2, ifelse(t < 30, t^3, t^4))

Michael

On Mon, Mar 26, 2012 at 7:48 PM, Aimee Jones al...@hoyamail.georgetown.edu wrote:

Dear all,

I'm aware that if y has two separate functions (depending on the conditions of x) you can use the ifelse function to separate y into two separate functions depending on input. How do you do this if there are multiple different conditions for x? For example,

y fits the following between t>0 & t<15 <- function(t) t^2,
y fits the following between t>15 & t<30 <- function(t) t^3,
y fits the following between t>30 & t<45 <- function(t) t^4
etc.

Thanks for any help you are able to give,
yours sincerely,
Aimee
Re: [R] y needing more than 2 functions
On Mon, Mar 26, 2012 at 07:48:07PM -0400, Aimee Jones wrote:

Dear all,

I'm aware that if y has two separate functions (depending on the conditions of x) you can use the ifelse function to separate y into two separate functions depending on input. How do you do this if there are multiple different conditions for x? For example,

y fits the following between t>0 & t<15 <- function(t) t^2,
y fits the following between t>15 & t<30 <- function(t) t^3,
y fits the following between t>30 & t<45 <- function(t) t^4
etc.

Hi.

Try the following.

bounds <- c(0, 15, 30, 45)
x <- seq(4, 44, length=51)
valfunc <- cbind(x^2, x^3, x^4)
indfunc <- findInterval(x, bounds)
y <- valfunc[cbind(1:nrow(valfunc), indfunc)]

# verify
range(x[y == x^2])
# [1]  4.0 14.4
range(x[y == x^3])
# [1] 15.2 29.6
range(x[y == x^4])
# [1] 30.4 44.0

Hope this helps.

Petr Savicky.
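The findInterval() idea in this thread can be wrapped into a single vectorised function, sketched below. The function name piecewise_power and the choice to return NA outside [0, 45) are assumptions made for illustration, not something stated in the thread; the sketch also relies on the exponent happening to be the interval index plus one, as in the example.

```r
# Hypothetical helper: t^2 on [0, 15), t^3 on [15, 30), t^4 on [30, 45).
piecewise_power <- function(t, bounds = c(0, 15, 30, 45)) {
  idx <- findInterval(t, bounds)                 # interval number for each t
  idx[idx < 1 | idx >= length(bounds)] <- NA     # outside the bounds: no value
  t^(idx + 1)                                    # exponent is interval index + 1
}

t <- c(4, 14.9, 15, 29, 30, 44)
y <- piecewise_power(t)

# Cross-check against the nested ifelse() form on the same points.
nested <- ifelse(t < 15, t^2, ifelse(t < 30, t^3, t^4))
print(all.equal(y, nested))
```

For pieces that are not simple powers, the matrix-indexing form shown in the message (one column per candidate function, one index per row) generalises more directly.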
Re: [R] circles()
On 2012-03-26 05:09, John D. Muccigrosso wrote:

I cannot for the life of me figure this out: What's the parameter to fill in with color circles made with circles()? col changes the line color, but all I see in the help is a reference to additional graphic parameters, and no examples via google.

Hi John,

Perhaps the draw.circle function (plotrix) will do what you want.

Jim
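As an alternative to the plotrix suggestion, filled circles can also be drawn with base graphics. The sketch below uses symbols() from the graphics package, where fg sets the outline colour and bg the fill; this is a swapped-in technique, not the circles() call from the question, and the wrapper name draw_filled_circles and the example coordinates are assumptions.

```r
# Hypothetical wrapper around base graphics::symbols().
draw_filled_circles <- function() {
  x <- c(1, 2, 3)
  y <- c(1, 2, 1)
  r <- c(0.2, 0.4, 0.3)
  # inches = FALSE: radii are in x-axis units; bg is the fill colour.
  symbols(x, y, circles = r, inches = FALSE,
          fg = "black", bg = "steelblue",
          xlim = c(0, 4), ylim = c(0, 3))
  invisible(TRUE)
}

draw_filled_circles()
```

With plotrix installed, draw.circle() takes a col argument for the fill and a border argument for the outline.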
Re: [R] Export Created Variables to SPSS/.csv
Strassburger, Daniel Daniel_Strassburger at baylor.edu writes:

I haven't been successful in converting my colleagues to the world of R, yet they wish to share collected data so that they may analyze it in SPSS. I know how to write to an SPSS file and it opens fine, but my problem is that it only includes the existing data - none of the variables I created within R.

Daniel,

Apologies if this is too obvious, but have you appended the new variables that you created in R to the original dataset before exporting in SPSS or .csv format? This doesn't happen automatically in R. Something like:

all.data <- data.frame(imported.data, created.variable1, created.variable2, etc)

Michael Bibo
Queensland Health
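The append-then-export step described above can be sketched end to end as follows. The data values and variable names are invented for illustration, and the export shown is the .csv route via write.csv() only.

```r
# Stand-in for the imported dataset (values are made up).
imported_data <- data.frame(id = 1:3, score = c(10, 20, 30))

# Variables created in R live outside the data frame until they are added to it.
created_variable1 <- imported_data$score * 2
created_variable2 <- log(imported_data$score)

# Bind the new variables to the original data, then export the combined frame.
all_data <- data.frame(imported_data, created_variable1, created_variable2)
out_file <- tempfile(fileext = ".csv")
write.csv(all_data, out_file, row.names = FALSE)

# Reading the file back shows that the created columns were exported too.
print(names(read.csv(out_file)))
```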
[R] How to test for the difference of means in population, please help
Dear all,

I am a novice in statistics. I have 2 experimental conditions. Each condition has ~400 points as its response. Each condition is done in 4 repeats (so I have 2 x 400 x 4 points). I want to compare the means of the two conditions and test whether they are the same or not. Which test should I use?

#populations
c = matrix(sample(1:20, 1600, replace = TRUE), 400, 4)
b = matrix(sample(1:20, 1600, replace = TRUE), 400, 4)
#means of repeats
c.mean = apply(c, 2, mean)
b.mean = apply(b, 2, mean)
#mean of experiment
c.mean.all = mean(c)
b.mean.all = mean(b)

--
View this message in context: http://r.789695.n4.nabble.com/How-to-test-for-the-difference-of-means-in-population-please-help-tp4508089p4508089.html
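One possible starting point, offered only as a sketch and not as the definitive answer to the question, is to treat each repeat as one experimental unit and compare the four per-repeat means of each condition with a Welch two-sample t-test. Whether that is appropriate depends on the design (for instance on whether the repeats are independent), which the message does not say. The names cond_c and cond_b are assumptions, chosen to avoid masking base::c.

```r
set.seed(1)  # simulated data, as in the message

cond_c <- matrix(sample(1:20, 1600, replace = TRUE), 400, 4)
cond_b <- matrix(sample(1:20, 1600, replace = TRUE), 400, 4)

# One mean per repeat (column), so each condition contributes four values.
c_means <- colMeans(cond_c)
b_means <- colMeans(cond_b)

# Welch two-sample t-test on the repeat means (var.equal = FALSE is the default).
res <- t.test(c_means, b_means)
print(res)
```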
[R] Testing for difference of distribution of two population(newbie)
Dear all,

I am a novice in statistics. I have two matrices (results from my experiments), each having ~400 points. I want to test whether the points come from the same distribution or not. Further, I have the results for each matrix (each experiment) in 4 replicates. What should I do?

# Repeats of experiments are in different columns
exp.1 = matrix(sample(1:20, 1600, replace = TRUE), 400, 4)
exp.2 = matrix(sample(1:20, 1600, replace = TRUE), 400, 4)

Thanks in advance

--
View this message in context: http://r.789695.n4.nabble.com/Testing-for-difference-of-distribution-of-two-population-newbie-tp4508092p4508092.html
Re: [R] Plot of function seems to cut off near edge of domain
Dear Chad,

your problem is linked to (1) the function returning NaNs for x values greater than 50, and (2) the fact that the function is evaluated at a predefined number of points.

Calling plot for a function object is basically a wrapper for curve(). Your function g() is evaluated on the whole xlim domain, which will return NaN values for x > 50 (try g(60)). In addition, curve() splits the x interval (here from 0 to 60) into a predefined number of points (n=101 is the default, see help(curve)) at which the function is evaluated. In your code, the function is evaluated at the values x <- seq(0, 60, length=101), and the g(x) that are not NaN are plotted. The largest x value (from the sequence) that doesn't return a NaN is max(x[!is.nan(g(x))]), which is 49.8.

One way to solve it is to explicitly specify the domain used to evaluate the function, by using the from and to arguments that are passed to curve():

#Figure 2, with xlim beyond the radius of the circle
plot(g,axes=F,from=0, to =50, xlim=c(0, 60), ylim=c(0,60))
axis(1,pos=0)
axis(2,pos=0)

HTH

Matthieu

Matthieu Dubois
Post-doctoral researcher
Psychology Department
Université Libre de Bruxelles
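The evaluation grid described above can be reproduced directly, which makes the gap at the edge of the domain visible. A minimal sketch, assuming the default of 101 points mentioned in the message; the names x, y and last_ok are illustrative.

```r
g <- function(x) sqrt(2500 - x^2)

# The same grid curve() would use for xlim = c(0, 60) with n = 101 points.
x <- seq(0, 60, length.out = 101)
y <- suppressWarnings(g(x))       # NaN (with a warning) wherever x > 50

# Largest grid point at which g() is still defined: the curve stops here,
# short of x = 50 where it would meet the x-axis.
last_ok <- max(x[!is.nan(y)])
print(last_ok)

# Restricting the evaluation range puts a grid point exactly at x = 50:
# plot(g, from = 0, to = 50, xlim = c(0, 60), ylim = c(0, 60))
```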
Re: [R] row, col function but for a list (probably very easy question, cannot seem to find it though)
On Mar 26, 2012, at 17:33 , David Winsemius wrote:

The usual approach to that problem is to use sapply:

x <- list()
x <- sapply(1:10, function(z) x[[z]] <- 1:z )

Yikes! If that works, it is only by coincidence. (The pre-assignment to x only serves the purpose of allowing the [[-assignment inside the anonymous function, but the assignment is to a local copy which is deleted on exit, and the return value is the rhs of the assignment.)

Please:

x <- lapply(1:10, function(z) 1:z)

or even

x <- lapply(1:10, seq_len)

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
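The point about the local copy can be checked with a small sketch. The variable names x_global, res, a and b are made up for illustration.

```r
# The [[<- inside the anonymous function modifies a local copy only.
x_global <- list()
res <- sapply(1:3, function(z) x_global[[z]] <- 1:z)

# The global list is untouched; `res` holds the right-hand sides that the
# assignment expressions returned.
print(length(x_global))
print(res)

# The two recommended forms give identical results.
a <- lapply(1:10, function(z) 1:z)
b <- lapply(1:10, seq_len)
print(identical(a, b))
```

seq_len() has the added benefit of returning an empty vector for a length of 0, whereas 1:0 counts down to c(1, 0).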
Re: [R] normalization of multi-value string variable
Thank you so much, Jessica,

The specific of my case is that I have a very detailed variable 'Interests' which may have several thousands of possible values. Usually each customer has 3-10 different interests. For example:

customer_id | ... | interests
1001        | ... | cycling, swimming, cooking
1002        | ... | cooking, singing, dancing

The total number of possible distinct values is several thousands. I'm curious how to use these interests in SVM (represent them as a vector of real numbers with several thousands of elements?). If you have any ideas please let me know.

Thank you,
-Alex

From: Jessica Streicher [j.streic...@micromata.de]
Sent: 27 March 2012 11:18
To: Alekseiy Beloshitskiy
Subject: Re: [R] normalization of multi-value string variable

Well, not sure what you mean by scaling and normalizing strings, but if you want to represent the interests as numbers, you can do something like this:

n <- seq(1,length(unique(my_strings)))[factor(my_strings)]

On 26.03.2012 at 18:50, Alekseiy Beloshitskiy wrote:

Hi All,

I need to normalize/scale a string variable which represents interests of customers (e.g., 'cycling, rollerblading, swimming' etc). Does anybody know how to do this? I then want to use it along with other numeric variables for SVM classification.

I appreciate any advice.
-Alex
[R] SVM. How to use categorical attributes?
Hi All,

Here is the case. I want to build a classification model (SVM). Some of the variables for this model are categorical attributes which represent words (usually 3-10 words - a query for a search in google). For example:

search_id | query_words                | .. | result
----------+----------------------------+----+-------
1         | how,to,grow,tree           | .. | 4
2         | smartfone,htc,buy,price    | .. | 7
3         | buy,house,realty,london    | .. | 6
4         | where,to,go,weekend,cinema | .. | 4
...

As you can see, words in the query are unordered and may occur in different queries. The total number of unique words for all queries is several thousands. The question is how to represent this variable (query_words) for use in SVM.

Thank you for any advice!
Alex
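One common representation for such a variable, sketched here as a suggestion rather than as the list's answer, is a binary indicator ("bag of words") matrix with one column per distinct word and a 1 wherever the word occurs in the query. The object names below are assumptions; the four queries are copied from the message. With several thousand distinct words a sparse matrix would likely be preferable to the dense one built here.

```r
queries <- c("how,to,grow,tree",
             "smartfone,htc,buy,price",
             "buy,house,realty,london",
             "where,to,go,weekend,cinema")

# Split each query into its words and collect the vocabulary.
tokens <- strsplit(queries, ",", fixed = TRUE)
vocab  <- unique(unlist(tokens))

# One row per query, one column per word; 1 if the word occurs in the query.
X <- t(vapply(tokens,
              function(tk) as.numeric(vocab %in% tk),
              numeric(length(vocab))))
colnames(X) <- vocab

print(dim(X))
print(X[, c("buy", "to")])
```

The resulting numeric matrix can be column-bound to the other numeric predictors before fitting the classifier.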
Re: [R] Completely Off Topic:Link to IOM report on use of -omics tests in clinical trials
Thanks, I had totally missed this controversy, but from a quick read of the summary the impact on open source analysis was unclear. Can you explain the punchline? I think many users of R have concluded that the biggest problem in most analyses is first getting the data and then verifying any results you derive, both issues that sound related to your post. (The quoted text below arrived as a jumble of character entities, which is illustrative of what hotmail has been doing with plain text; getting plain data without all the formatting junk is a recurring problem LOL.)

> Date: Mon, 26 Mar 2012 22:38:56 +0100
> From: iaingallagher@btopenworld.com
> To: gunter.berton@gene.com; r-help@r-project.org
> Subject: Re: [R] Completely Off Topic:Link to IOM report on use of "-omics" tests in clinical trials
>
> I followed this case while it was ongoing.
>
> It was a very interesting example of basic mistakes but also (for me) of journal politicking.
>
> Keith Baggerly and Kevin Coombes wrote a great paper - "DERIVING CHEMOSENSITIVITY FROM CELL LINES: FORENSIC BIOINFORMATICS AND REPRODUCIBLE RESEARCH IN HIGH-THROUGHPUT BIOLOGY" in The Annals of Applied Statistics (2009, Vol. 3, No. 4, 1309–1334) which explains some of the background and investigative work they had to do to bring those mistakes to light.
>
> Best
>
> iain
>
> - Original Message -
> From: Bert Gunter <gunter.berton@gene.com>
> To: r-help@r-project.org
> Cc:
> Sent: Monday, 26 March 2012, 19:12
> Subject: [R] Completely Off Topic:Link to IOM report on use of "-omics" tests in clinical trials
>
> Warning: This has little directly to do with R, although R and related
> tools (e.g. sweave and other reproducible research tools) have a
> natural role to play.
>
> The IOM report:
>
> http://www.iom.edu/Reports/2012/Evolution-of-Translational-Omics.aspx
>
> that arose out of the Duke Univ. genomics testing scandal has been
> released. My thanks to Keith Baggerly for forwarding this. I believe
> that many R users in the medical research community will find this
> interesting, and I hope I do not venture too far out of line by
> passing on the link to readers of this list. It **will** have an
> important impact on so-called Personalized Health Care (which I guess
> affects all of us), and open source analytical (statistical)
> methodology is a central issue.
>
> For those interested, try the summary first.
>
> Best to all,
> Bert
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] How to enable Arial font for postcript/pdf figure on Windows?
Hi Agnes and Camille (and help-list),

In Ubuntu 11.10 I needed superuser (su) permissions to copy and gzip the *.afm files manually into /usr/lib/R/library/grDevices/afm/ to get the Arial embedding to work in R for postscript. I.e., after following the instructions by Agnes and Camille, I did

sudo cp arial*.afm /usr/lib/R/library/grDevices/afm/
gzip /usr/lib/R/library/grDevices/afm/arial*.afm

Then the postscript toy example in this thread worked.

Leo

--
View this message in context: http://r.789695.n4.nabble.com/How-to-enable-Arial-font-for-postcript-pdf-figure-on-Windows-tp3017809p4508266.html
[R] Data indexing issue...
Dear R-help,

My dataset (which is a data frame, called 'Calender' here) includes 365 rows representing the 365 days of a year. One column ('Season') contains factor data representing seasons, e.g. spring, summer, autumn and winter. Another column (called 'Day') contains data representing whether the day is a working day (I use 'Wd' for short here) or a weekend (I use 'Wkend' for short here).

I want to separate the indices of the working days and weekends for each season. I have used the R command which() before for one criterion. For example, if I use...

WdIndex <- which(Calender$Day=='Wd')

that gives the set of indices of working days in the year.

I wonder whether in R I could use a combination of something such as 'AND', 'OR' (e.g. as in MySQL) to set 'multi-criteria' when selecting data. So for example...

WinterWdIndex <- which(Calender$Day=='Wd' AND Calender$Season=="Winter")

I know the above syntax is wrong. I checked '?which', which did not give me an answer, and also tried '?AND', but it seems it doesn't exist at all...

Many thanks!
HJ
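The multi-criteria selection asked about above can be sketched with R's elementwise logical operators, & for AND and | for OR, which combine two logical vectors before which() is applied. The six-row data frame below is a made-up stand-in for the 365-row 'Calender'.

```r
# Tiny stand-in for the real data frame (values are invented).
Calender <- data.frame(
  Day    = c("Wd", "Wd", "Wkend", "Wd", "Wkend", "Wd"),
  Season = c("Winter", "Winter", "Winter", "Summer", "Summer", "Spring")
)

# AND: working days that fall in winter.
WinterWdIndex <- which(Calender$Day == "Wd" & Calender$Season == "Winter")

# OR: days that are weekends or fall in winter (or both).
WkendOrWinterIndex <- which(Calender$Day == "Wkend" | Calender$Season == "Winter")

print(WinterWdIndex)
print(WkendOrWinterIndex)
```

The doubled forms && and || are meant for single TRUE/FALSE values in if() conditions, so the single-character forms are the ones to use for indexing. Their help page is reached with ?"&" or ?Logic.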
Re: [R] Exporting a data.frame to excel using sqlSave - adds a character ' to values
Hello Tal,

I have the same problem with the ' added to all my cells when exported into Excel. I can drop them manually, but only one by one (Find & Replace does not work)... So in the end the exported Excel file cannot actually be used by scientists to draw graphs or whatever! Did you find a solution to this problem?

Thanks,
Juliette

--
View this message in context: http://r.789695.n4.nabble.com/Exporting-a-data-frame-to-excel-using-sqlSave-adds-a-character-to-values-tp1016523p4508239.html
[R] detecting time out on download.file command
Hi,

I'm working with a legacy R script which makes use of the download.file command. We're having a problem that occasionally we get a time out from a particular FTP site, but the function that does this doesn't pass that information back to the main function that calls it. I'm aware that it is possible to set a timeout using the options command, but I don't know how to check whether a timeout has occurred. If I put the command into a try block, could I get the information there?

All the best,
Hugh

--
Hugh Shanahan
Lecturer in Bioinformatics
Department of Computer Science
Room 246 McCrea Building
Royal Holloway, University of London
Egham, Surrey TW20 0EX
England, U.K.
E-mail : hugh.shana...@rhul.ac.uk
Web : http://www.shanahanlab.org
Tel : +44 (0)1784 443433
Fax : +44 (0)1784 439786
PGP Key http://www.cs.rhul.ac.uk/~hugh/PGP/public_key.asc
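One way the try-block idea above might be sketched is to wrap download.file() in tryCatch() and return a status that the calling function can inspect. The wrapper name safe_download and the shape of its return value are assumptions. Note also that R signals a failed download as a warning or an error without a dedicated timeout class, so telling a timeout apart from other failures would mean inspecting the message text, which may differ between platforms and download methods.

```r
# Hypothetical wrapper: returns list(ok = TRUE/FALSE, message = <text or NA>).
safe_download <- function(url, destfile, timeout = 60) {
  old <- options(timeout = timeout)       # timeout in seconds for this call
  on.exit(options(old), add = TRUE)       # restore the previous setting
  tryCatch({
    status <- download.file(url, destfile, quiet = TRUE)
    # download.file() returns 0 (invisibly) on success.
    list(ok = identical(as.integer(status), 0L), message = NA_character_)
  },
  warning = function(w) list(ok = FALSE, message = conditionMessage(w)),
  error   = function(e) list(ok = FALSE, message = conditionMessage(e)))
}

# Usage sketch (the URL is a placeholder, not a real site):
# res <- safe_download("ftp://ftp.example.org/some/file.txt", tempfile())
# if (!res$ok) message("Download failed: ", res$message)
```

Returning a list instead of stopping lets the main function decide whether to retry, skip the file, or abort.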
[R] Discretization Package MDLP
Dear All,

I have a dataset of eight variables with 156 records which I wish to discretize using the MDLP algorithm. My issue is that I want to dictate the number of bins the algorithm splits the data into (around 5), rather than just allowing the algorithm to dictate this using the mdlp(data) command. Any help would be greatly appreciated.

Kind Regards,
Khaled Taalab

--
View this message in context: http://r.789695.n4.nabble.com/Discretization-Package-MDLP-tp4508501p4508501.html
Re: [R] Data indexing issue...
Hi HJ,

Take a look at ?"&"; this is probably what you're looking for.

What you could also do is:

Calender[Calender$Day=='Wd' & Calender$Season=="Winter", ] # notice the last comma

This will subset directly without using which(); it might be helpful to you.

HTH,
Ivan

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06
ivan.calan...@u-bourgogne.fr
http://biogeosciences.u-bourgogne.fr/calandra
Re: [R] Data indexing issue...
Why not use 'split' and get all the groups at once:

result <- split(Calender, list(Calender$Day, Calender$Season), drop = TRUE)

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
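The split() suggestion can be sketched on a small stand-in data frame. Splitting the row numbers rather than the rows gives the per-group indices the original question asked for. The data values are invented, and the names index_by_group and rows_by_group are assumptions.

```r
# Tiny stand-in for the real data frame (values are invented).
Calender <- data.frame(
  Day    = c("Wd", "Wd", "Wkend", "Wd", "Wkend", "Wd"),
  Season = c("Winter", "Winter", "Winter", "Summer", "Summer", "Spring")
)

# Row indices for every Day/Season combination that actually occurs;
# drop = TRUE leaves out empty combinations such as Wkend.Spring.
index_by_group <- split(seq_len(nrow(Calender)),
                        list(Calender$Day, Calender$Season),
                        drop = TRUE)

# Splitting the data frame itself returns the rows instead of the indices.
rows_by_group <- split(Calender, list(Calender$Day, Calender$Season), drop = TRUE)

print(index_by_group[["Wd.Winter"]])
print(names(index_by_group))
```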
Re: [R] Exporting a data.frame to excel using sqlSave - adds a character ' to values
I don't see any problem here; there is no data and no indication as to the actual problem you are having that calls for a Find & Replace. I export to Excel all the time and don't have any problems. So provide some data and an indication of the problem.

On Tue, Mar 27, 2012 at 4:37 AM, Juliette Fabre juliette_fa...@yahoo.fr wrote:

Hello Tal,

I have the same problem with the ' added to all my cells when exported into Excel. I can drop them manually, but only one by one (Find & Replace does not work)... So in the end the exported Excel file cannot actually be used by scientists to draw graphs or whatever! Did you find a solution to this problem?

Thanks,
Juliette

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
[R] Standard error terms from gfcure
Dear R-help,

I am using R 2.14.1 on Windows 7 with the 'gfcure' package (cure rate model). I have included the treatment variable in the cure part of the model as shown below:

ref_treat <- gfcure(Surv(rem.Remtime,rem.Rcens)~1,~1+strata(drpa)+factor(treat(delcure)),data=delcure,dist="loglogistic")

From that I can obtain the coefficients, standard errors etc. as per alternative models (with covariates only fitted to the survival part of the model, say).

summary(ref_treat)

However, only one standard error is output:

Log-logistic mixture model
The maximum loglikelihood is -927.0449

Terms in the accelerated failure time model:
             Coefficients  Std.err   z-score  p-value
Log(scale)      -0.894528   0.0236  -37.8324    0.000
(Intercept)      6.929351   0.0151  460.4157    0.000

Terms in the logistic model:
                          Coefficients   Std.err  z-score    p-value
(Intercept)                   2.542726
strata(drpa)drpa=2           18.76
factor(treat(delcure))2       0.184192
factor(treat(delcure))3       0.472809
factor(treat(delcure))4       0.255565  953.6876   0.0003  0.9997862
factor(treat(delcure))5       0.401713

Warning message:
In sqrt(diag(solve(object$infomat))) : NaNs produced

Can anyone explain why this is the case?

Very many thanks,
Laura
Re: [R] read.csv and field containing single quotes
Thanks Henrique... giving it a try now, but it'll take a good while, given the file size. Cheers, b On 27 March 2012 02:35, Henrique Dallazuanna www...@gmail.com wrote: Benilton, Try this: read.table(textConnection(gsub(',', ',', gsub('^\|\$', ', readLines('../teste.csv', sep = ',', quote = ', header = TRUE) On Mon, Mar 26, 2012 at 8:09 PM, Benilton Carvalho beniltoncarva...@gmail.com wrote: I need to read in csv files, created by 3rd party, with fields containing single quotes (as shown below). header1,header2,header3,header4 field1r1,field2r1,field3r1,field4r1 field1r2,field2r2,field3r2PartA), field3r2PartB Very Long,field4r2 field1r3,field2r3,field3r3,field4r3 read.csv(filename, quote=\', header=TRUE) won't read the file represented above, unless the 3rd line has Very (double quotes) instead of Very (single quotes)... and this is documented (scan() man page). Assuming that the creation of such csv files is something I'm not in a position to interfere with, are there (preferably, all in R) suggestions on how to handle such task? For the moment, I'm using my poor man's solution (below), but any tricks that would simplify this task would be great. Thank you very much, benilton parser - function(fname, header=TRUE, stringsAsFactors=FALSE){ txt - readLines(fname) txt - gsub(^\|\$, , txt) txt - strsplit(txt, \,\) txt - do.call(rbind, lapply(txt, function(x) gsub(\, \\, x))) if (header){ nms - txt[1,] txt - txt[-1,] } txt - as.data.frame(txt, stringsAsFactors=stringsAsFactors) if (header) names(txt) - nms txt } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
Re: [R] read.csv and field containing single quotes
On 27/03/12 01:09, Benilton Carvalho wrote: I need to read in csv files, created by 3rd party, with fields containing single quotes (as shown below). header1,header2,header3,header4 field1r1,field2r1,field3r1,field4r1 field1r2,field2r2,field3r2PartA), field3r2PartB Very Long,field4r2 field1r3,field2r3,field3r3,field4r3 You could try under your OS, to 1) replace , with ', (assuming that the csv does not contain any' 2) read into R with sep=\' If the file is huge, some in OS solution would be the best. Cheers, Rainer read.csv(filename, quote=\', header=TRUE) won't read the file represented above, unless the 3rd line has Very (double quotes) instead of Very (single quotes)... and this is documented (scan() man page). Assuming that the creation of such csv files is something I'm not in a position to interfere with, are there (preferably, all in R) suggestions on how to handle such task? For the moment, I'm using my poor man's solution (below), but any tricks that would simplify this task would be great. Thank you very much, benilton parser - function(fname, header=TRUE, stringsAsFactors=FALSE){ txt - readLines(fname) txt - gsub(^\|\$, , txt) txt - strsplit(txt, \,\) txt - do.call(rbind, lapply(txt, function(x) gsub(\, \\, x))) if (header){ nms - txt[1,] txt - txt[-1,] } txt - as.data.frame(txt, stringsAsFactors=stringsAsFactors) if (header) names(txt) - nms txt } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. 
(Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel : +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax : +33 - (0)9 58 10 27 44 Fax (D):+49 - (0)3 21 21 25 22 44 email: rai...@krugs.de Skype: RMkrug
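For readers following this thread, here is a minimal all-in-R sketch in the spirit of the "poor man's" parser quoted above. The archive has stripped the quote characters from the sample data, so the layout is an assumption: every field is wrapped in double quotes, fields are separated by commas, and a field may contain single quotes and commas. The sample lines and the function name `parse_quoted` are invented for illustration.

```r
# Invented sample lines; in practice these would come from readLines(fname).
txt_lines <- c('"header1","header2","header3"',
               '"a1","b1","c1"',
               '"a2","b2 \'Very\' long, with a comma","c2"')

parse_quoted <- function(txt, header = TRUE) {
  txt <- gsub('^"|"$', "", txt)                # drop the outer double quotes
  parts <- strsplit(txt, '","', fixed = TRUE)  # split on the "," delimiter
  # Every line must yield the same number of fields.
  stopifnot(length(unique(lengths(parts))) == 1)
  m <- do.call(rbind, parts)
  body <- if (header) m[-1, , drop = FALSE] else m
  df <- as.data.frame(body, stringsAsFactors = FALSE)
  if (header) names(df) <- m[1, ]
  df
}

dat <- parse_quoted(txt_lines)
print(dat)
```

Splitting on the three-character delimiter rather than on the bare comma is what keeps embedded commas and single quotes inside their field. It will misparse a field that itself contains the sequence `","`.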
Re: [R] normalization of multi-value string variable
Right, I was also thinking about that, but since I have a few thousand unique words I'm not quite sure how it will work. I just posted my question with a more detailed description here: http://stats.stackexchange.com/questions/25355/multi-value-categorical-attributes-how-r

Really interesting case :)

Thank you, -Alex

From: Jessica Streicher [j.streic...@micromata.de] Sent: 27 March 2012 15:24 To: Alekseiy Beloshitskiy Cc: r-help@r-project.org Subject: Re: [R] normalization of multi-value string variable

Hm.. so what you need is either
- one new feature for each activity that has a binary value, e.g.:
  cust_id, cycling, swimming, cooking
  1001   , 1      , 0       , 1
- one new feature that has a value corresponding to a certain combination of activities, so if you had just the three activities you would have 2^3 possible values. I'm not sure how useful that would be for the classification, though.

(Would need to think about how to compute this, I'm new to R as well. Would probably just iterate over the data.)

If you make one feature per activity, and you end up having too many to properly compute the SVM, you might try to reduce it by other methods. PCA comes to mind, for example, though I have never used that on binary data before.

On 27.03.2012 at 11:34, Alekseiy Beloshitskiy wrote:

Thank you so much, Jessica. The specifics of my case are that I have a very detailed variable 'Interests' which may have several thousand possible values. Usually each customer has 3-10 different interests. For example:

customer_id|...|interests
1001       |...| cycling, swimming, cooking
1002       |...| cooking, singing, dancing

The total number of possible distinct values is several thousand. I'm curious how to use these interests in an SVM (represent them as a vector of real numbers with several thousand elements?). If you have any ideas please let me know.

Thank you, -Alex

From: Jessica Streicher [j.streic...@micromata.de] Sent: 27 March 2012 11:18 To: Alekseiy Beloshitskiy Subject: Re: [R] normalization of multi-value string variable

Well, not sure what you mean by scaling and normalizing strings, but if you want to represent the interests as numbers, you can do something like this:

n <- seq(1,length(unique(my_strings)))[factor(my_strings)]

On 26.03.2012 at 18:50, Alekseiy Beloshitskiy wrote:

Hi All, I need to normalize/scale a string variable which represents the interests of customers (e.g., 'cycling, rollerblading, swimming' etc). Does anybody know how to do this? I then want to use it along with other numeric variables for SVM classification. I would appreciate any advice. -Alex
[R] RSqlite UPDATE command problem
All:

I am using RSqlite and want to be able to update individual values in a record, such as with this simple example:

library(RSQLite)
drv <- dbDriver("SQLite")
con <- dbConnect(drv, "test.db")
my.data <- data.frame(countries=c("US","UK","Canada","Australia","NewZealand"), vals=c(52,36,74,10,98))
dbWriteTable(con, "testtable", my.data)
q <- dbReadTable(con, "testtable")
q
   countries vals
1         US   52
2         UK   36
3     Canada   74
4  Australia   10
5 NewZealand   98

So, say, I want to change the value for NewZealand from '98' to '21'. I've tried something like this:

sql <- "UPDATE testtable SET vals=21 WHERE countries='NewZealand'"
dbBeginTransaction(con)
dbGetPreparedQuery(con, sql)   <== I get an error here
dbCommit(con)

Using a different example for an INSERT command with a data frame 'data', this construct is accepted:

dbGetPreparedQuery(con, sql, bind.data=data)

What do I need to do differently to use the UPDATE command?

Regards, Tom

-- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: thomas.ad...@noaa.gov VOICE: 937-383-0528 FAX: 937-383-0033
Re: [R] normalization of multi-value string variable
Hm.. so what you need is either
- one new feature for each activity that has a binary value, e.g.:
  cust_id, cycling, swimming, cooking
  1001   , 1      , 0       , 1
- one new feature that has a value corresponding to a certain combination of activities, so if you had just the three activities you would have 2^3 possible values. I'm not sure how useful that would be for the classification, though.

(Would need to think about how to compute this, I'm new to R as well. Would probably just iterate over the data.)

If you make one feature per activity, and you end up having too many to properly compute the SVM, you might try to reduce it by other methods. PCA comes to mind, for example, though I have never used that on binary data before.

On 27.03.2012 at 11:34, Alekseiy Beloshitskiy wrote:

Thank you so much, Jessica. The specifics of my case are that I have a very detailed variable 'Interests' which may have several thousand possible values. Usually each customer has 3-10 different interests. For example:

customer_id|...|interests
1001       |...| cycling, swimming, cooking
1002       |...| cooking, singing, dancing

The total number of possible distinct values is several thousand. I'm curious how to use these interests in an SVM (represent them as a vector of real numbers with several thousand elements?). If you have any ideas please let me know.

Thank you, -Alex

From: Jessica Streicher [j.streic...@micromata.de] Sent: 27 March 2012 11:18 To: Alekseiy Beloshitskiy Subject: Re: [R] normalization of multi-value string variable

Well, not sure what you mean by scaling and normalizing strings, but if you want to represent the interests as numbers, you can do something like this:

n <- seq(1,length(unique(my_strings)))[factor(my_strings)]

On 26.03.2012 at 18:50, Alekseiy Beloshitskiy wrote:

Hi All, I need to normalize/scale a string variable which represents the interests of customers (e.g., 'cycling, rollerblading, swimming' etc). Does anybody know how to do this? I then want to use it along with other numeric variables for SVM classification. I would appreciate any advice. -Alex
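The "one binary feature per activity" idea from this thread can be sketched in base R as follows. The two customers are the ones from Alex's example; the object names (`customers`, `vocab`, `indicators`) are invented here, and the approach is only one possible encoding, not something the posters settled on.

```r
# The two example customers from the thread.
customers <- data.frame(
  customer_id = c(1001, 1002),
  interests   = c("cycling, swimming, cooking", "cooking, singing, dancing"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a vector of trimmed interests.
interest_list <- lapply(strsplit(customers$interests, ","), trimws)

# The vocabulary: every distinct interest, in alphabetical order.
vocab <- sort(unique(unlist(interest_list)))

# One row per customer, one 0/1 column per interest.
indicators <- t(vapply(interest_list,
                       function(x) as.numeric(vocab %in% x),
                       numeric(length(vocab))))
colnames(indicators) <- vocab
rownames(indicators) <- customers$customer_id

print(indicators)
```

With several thousand distinct interests this dense matrix becomes wide and mostly zero; a sparse representation may be preferable, though that is beyond what the thread discusses.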
[R] Supperscript, subscript and double lines in the main/sub title and using greek letters
Dear R-help,

I am trying to express myself as best as I can here. If you also use LaTeX to edit math reports, or other languages with a similar editing method, you'll see what I'm talking about. My sincere apologies if my question is not clear enough to some extent; I am also not able to provide my code here because I don't know which one I can use...

When editing the title in R plots, such as using 'plot', or 'xyplot' in 'lattice', what method do you use to write Greek letters and make use of superscripts and subscripts, e.g. to write mathematical expressions like these in LaTeX:

\sigma^2
\tau^{2s}
\mu_i
\pi_{2s}

I would also like to learn how to make two lines in the main title or subtitle if the text I need is too long to put on a single line. For example, is there some R code/syntax that lets me do something like in LaTeX, where '//' or '\\' separates the two parts of the text I want to put on two lines?

I heard about using something like plot(x,y, main=expression()), but in neither '?plot' nor '?expression' could I find comprehensive information about what I need...

Many thanks! HJ
Re: [R] Data indexing issue...
Hi Jim! Thank you so much for the very helpful hints!! I am learning 'split' now and it seems very useful.. HJ

On Tue, Mar 27, 2012 at 12:58 PM, jim holtman jholt...@gmail.com wrote:

Why not use 'split' and get all the groups at once:

result <- split(Calender, list(Calender$Day, Calender$Season), drop = TRUE)

On Tue, Mar 27, 2012 at 7:43 AM, Ivan Calandra ivan.calan...@u-bourgogne.fr wrote:

Hi HJ, Take a look at ?"&"; this is probably what you're looking for. What you could also do is:

Calender[Calender$Day=='Wd' & Calender$Season=="Winter", ] # notice the last comma

This will subset directly without using which(); it might be helpful to you. HTH, Ivan

-- Ivan CALANDRA Université de Bourgogne UMR CNRS/uB 6282 Biogéosciences 6 Boulevard Gabriel 21000 Dijon, FRANCE +33(0)3.80.39.63.06 ivan.calan...@u-bourgogne.fr http://biogeosciences.u-bourgogne.fr/calandra

On 27/03/12 12:32, HJ YAN wrote:

Dear R-help, My dataset (which is a data frame, called 'Calender' here) includes 365 rows representing the 365 days of a year. One column ('Season') contains factor data representing seasons, e.g. spring, summer, autumn and winter. Another column (called 'Day') contains data representing whether the day is a working day (I use 'Wd' for short here) or a weekend (I use 'Wkend' for short here). I want to separate the indices of the working days and weekends for each season. I have used the R command 'which' before for one criterion; for example, if I use...

WdIndex <- which(Calender$Day=='Wd')

that gives the set of indices of working days in the year. I wonder whether in R I could use a combination of something such as 'AND', 'OR' (e.g. as in MySQL) to set 'multi-criteria' when selecting data. So for example...

WinterWdIndex <- which(Calender$Day=='Wd' AND Calender$Season=="Winter")

I know the above syntax is wrong, and I checked '?which', which did not give me an answer, and also tried '?AND', but it seems it doesn't exist at all...

Many thanks! HJ

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
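The two suggestions in this thread (the element-wise `&` operator and `split`) can be tried on a toy version of the data. The six-row `Calender` data frame below is invented for the example; only the column names and the codes 'Wd' and 'Wkend' come from HJ's description.

```r
# Toy stand-in for the 365-row data frame described above.
Calender <- data.frame(
  Day    = c("Wd", "Wkend", "Wd", "Wd", "Wkend", "Wd"),
  Season = c("Winter", "Winter", "Summer", "Winter", "Summer", "Summer"),
  stringsAsFactors = FALSE
)

# & is the element-wise AND (| is the element-wise OR).
WinterWdIndex <- which(Calender$Day == "Wd" & Calender$Season == "Winter")

# Logical subsetting without which(); note the trailing comma.
winter_wd <- Calender[Calender$Day == "Wd" & Calender$Season == "Winter", ]

# Row indices for every Day x Season combination at once.
idx_by_group <- split(seq_len(nrow(Calender)),
                      list(Calender$Day, Calender$Season),
                      drop = TRUE)

print(WinterWdIndex)
print(idx_by_group)
```

`split` names each group by joining the two values with a dot, so the winter working days are available as `idx_by_group[["Wd.Winter"]]`.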
Re: [R] row, col function but for a list (probably very easy question, cannot seem to find it though)
Thanks guys for all the replies. It is an urban myth that using 'apply' functions will deliver better performance than 'for' loops. It may even worsen performance or create obstacles when it is improperly used with dataframes. Most of the benefits come from improving readability and maintainability. This is what I had to learn the hard way: apply functions made it go slower :) I do understand them much better now, also in the light of some of these ways of using them. In the end my program became much faster by making the data frames matrices, and even more by finally seeing the light (courtesy of a colleague for getting me to think in the right direction) and making much more of it into a matrix operation. I'm very happy with the results :). So consider me helped! Regards, Mark -- View this message in context: http://r.789695.n4.nabble.com/row-col-function-but-for-a-list-probably-very-easy-question-cannot-seem-to-find-it-though-tp4504216p4508816.html Sent from the R help mailing list archive at Nabble.com.
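The progression Mark describes (loop, then apply, then a whole-matrix operation) can be illustrated with row sums. This is a generic sketch on invented data, not Mark's program, and it makes no claim about timings on any particular machine; it only shows that the three forms compute the same thing.

```r
set.seed(1)
m <- matrix(runif(2000), nrow = 200)   # invented 200 x 10 numeric matrix

# 1. Explicit for loop over the rows.
loop_sums <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  loop_sums[i] <- sum(m[i, ])
}

# 2. apply(): same work, arguably more readable, still one call per row.
apply_sums <- apply(m, 1, sum)

# 3. A single vectorised matrix operation.
vec_sums <- rowSums(m)

print(all.equal(loop_sums, vec_sums))
```

To compare speed on real data, each variant can be wrapped in system.time().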
Re: [R] RSqlite UPDATE command problem
You probably want:

sql <- "UPDATE testtable SET vals=21 WHERE countries='NewZealand'"
dbGetQuery(con, sql)

instead... b

On 27 March 2012 14:18, Thomas Adams thomas.ad...@noaa.gov wrote:

All:

I am using RSqlite and want to be able to update individual values in a record, such as with this simple example:

library(RSQLite)
drv <- dbDriver("SQLite")
con <- dbConnect(drv, "test.db")
my.data <- data.frame(countries=c("US","UK","Canada","Australia","NewZealand"), vals=c(52,36,74,10,98))
dbWriteTable(con, "testtable", my.data)
q <- dbReadTable(con, "testtable")
q
   countries vals
1         US   52
2         UK   36
3     Canada   74
4  Australia   10
5 NewZealand   98

So, say, I want to change the value for NewZealand from '98' to '21'. I've tried something like this:

sql <- "UPDATE testtable SET vals=21 WHERE countries='NewZealand'"
dbBeginTransaction(con)
dbGetPreparedQuery(con, sql)   <== I get an error here
dbCommit(con)

Using a different example for an INSERT command with a data frame 'data', this construct is accepted:

dbGetPreparedQuery(con, sql, bind.data=data)

What do I need to do differently to use the UPDATE command?

Regards, Tom

-- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: thomas.ad...@noaa.gov VOICE: 937-383-0528 FAX: 937-383-0033
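For readers on a current installation, the same UPDATE can be sketched with the present-day DBI interface. This assumes recent DBI and RSQLite packages, where `dbExecute()` runs a statement that returns no rows; it is not the 2012 API used in the thread. An in-memory database stands in for test.db, and the `?` placeholders are bound through `params`.

```r
library(DBI)

# In-memory database instead of the thread's "test.db" (an assumption made
# so the example leaves no file behind).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

my.data <- data.frame(
  countries = c("US", "UK", "Canada", "Australia", "NewZealand"),
  vals      = c(52, 36, 74, 10, 98),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "testtable", my.data)

# dbExecute() returns the number of rows affected by the statement.
n_changed <- dbExecute(
  con,
  "UPDATE testtable SET vals = ? WHERE countries = ?",
  params = list(21, "NewZealand")
)

updated <- dbGetQuery(con, "SELECT vals FROM testtable WHERE countries = 'NewZealand'")
dbDisconnect(con)

print(n_changed)
print(updated)
```

Binding the values as parameters avoids pasting them into the SQL string, which matters once the values come from user input.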
Re: [R] Supperscript, subscript and double lines in the main/subtitle and using greekletters
Hi, HJ,

see ?plotmath

Hth -- Gerrit

- Dr. Gerrit Eichner Mathematical Institute, Room 212 gerrit.eich...@math.uni-giessen.de Justus-Liebig-University Giessen Tel: +49-(0)641-99-32104 Arndtstr. 2, 35392 Giessen, Germany Fax: +49-(0)641-99-32109 http://www.uni-giessen.de/cms/eichner -

On Tue, 27 Mar 2012, HJ YAN wrote:

Dear R-help,

I am trying to express myself as best as I can here. If you also use LaTeX to edit math reports, or other languages with a similar editing method, you'll see what I'm talking about. My sincere apologies if my question is not clear enough to some extent; I am also not able to provide my code here because I don't know which one I can use...

When editing the title in R plots, such as using 'plot', or 'xyplot' in 'lattice', what method do you use to write Greek letters and make use of superscripts and subscripts, e.g. to write mathematical expressions like these in LaTeX:

\sigma^2
\tau^{2s}
\mu_i
\pi_{2s}

I would also like to learn how to make two lines in the main title or subtitle if the text I need is too long to put on a single line. For example, is there some R code/syntax that lets me do something like in LaTeX, where '//' or '\\' separates the two parts of the text I want to put on two lines?

I heard about using something like plot(x,y, main=expression()), but in neither '?plot' nor '?expression' could I find comprehensive information about what I need...

Many thanks! HJ
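To make the pointer to ?plotmath concrete, here is a small base-graphics sketch covering the four expressions HJ lists. The data and the title wording are invented. In plotmath, `^` gives a superscript, `[ ]` a subscript, `*` juxtaposes pieces, and `atop()` stacks two expressions, which is one way to get a two-line title; a plain character string can simply contain "\n".

```r
# Two-line title built with plotmath: sigma^2 and tau^(2s) on the first
# line, mu_i and pi_(2s) on the second.
main_expr <- expression(atop("Estimates of " * sigma^2 * " and " * tau^{2*s},
                             "with " * mu[i] * " and " * pi[2*s]))

# A plain-text subtitle can be split with a newline character.
sub_text <- "A long subtitle that is\nsplit over two lines"

pdf(NULL)            # null device, so the example writes no file
x <- 1:10
plot(x, x^2, main = main_expr, sub = sub_text)
invisible(dev.off())
```

Replace `pdf(NULL)` with your usual device (or drop it in an interactive session) to see the result. Newlines inside an expression are generally not rendered well, which is why `atop()` is used for the mathematical title.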
Re: [R] RSqlite UPDATE command problem
Benilton,

Thank you, you are quite right!!

Regards, Tom

On Tue, Mar 27, 2012 at 9:35 AM, Benilton Carvalho beniltoncarva...@gmail.com wrote:

You probably want:

sql <- "UPDATE testtable SET vals=21 WHERE countries='NewZealand'"
dbGetQuery(con, sql)

instead... b

-- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: thomas.ad...@noaa.gov VOICE: 937-383-0528 FAX: 937-383-0033
Re: [R] copy the columns based on the code
:) yes! I agree!

On Mon, Mar 26, 2012 at 10:51:17AM -0700, Bert Gunter wrote:

Fortunes candidate?! -- Bert

On Mon, Mar 26, 2012 at 10:24 AM, Sarah Goslee sarah.gos...@gmail.com wrote:

The OP wrote: The problem is that it gives the result that I want.

Sarah's reply: That's a new sort of problem.

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

-- :: Igor Sosa Mayor :: joseleopoldo1...@gmail.com :: :: GnuPG: 0x1C1E2890 :: http://www.gnupg.org/ :: :: jabberid: rogorido ::
Re: [R] help in replacing for llop
No idea what a mean median histogram is, but you may wish to check out ?tapply or library(plyr), both of which are designed for this split-apply-combine paradigm.

Michael

On Tue, Mar 27, 2012 at 12:51 AM, arunkumar akpbond...@gmail.com wrote:

Hi, I have records like this:

X1  X2   State
34  72   state1
9   63   state1
49  31   state1
60  34   state1
80  73   state1
60  20   state2
59  87   state2
88  20   state2
71  66   state2
65  56   state2
59  16   state1
60  100  state2

I want to get summary values like the mean, median and histogram for X1 and X2 based on state. I'm using a FOR loop for this. Is there any method to remove the for loop and use apply or any alternatives?

- Thanks in Advance Arun

-- View this message in context: http://r.789695.n4.nabble.com/help-in-replacing-for-llop-tp4507939p4507939.html Sent from the R help mailing list archive at Nabble.com.
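Using the twelve records from Arun's message, the split-apply-combine idea can be sketched in base R without an explicit loop. The object names are invented, and the sketch reads the request as three separate summaries per state (mean, median, histogram), which is an interpretation of the original wording.

```r
# The records posted in the thread.
dat <- data.frame(
  X1    = c(34, 9, 49, 60, 80, 60, 59, 88, 71, 65, 59, 60),
  X2    = c(72, 63, 31, 34, 73, 20, 87, 20, 66, 56, 16, 100),
  State = c(rep("state1", 5), rep("state2", 5), "state1", "state2")
)

# Mean and median of both columns for each state.
means   <- aggregate(cbind(X1, X2) ~ State, data = dat, FUN = mean)
medians <- aggregate(cbind(X1, X2) ~ State, data = dat, FUN = median)

# tapply() does the same for a single column and returns a named vector.
x1_mean <- tapply(dat$X1, dat$State, mean)

# One histogram object per state; plot = FALSE only computes the bins.
x1_hists <- lapply(split(dat$X1, dat$State), hist, plot = FALSE)

print(means)
print(medians)
```

Each element of `x1_hists` can be drawn later with plot(), for example plot(x1_hists$state1).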
[R] I can't open a .nc file with the cdfcont function of the clim.pact package
Hello,

I am new to using R. I would like to use the following functions of the clim.pact package: cdfcont and retrieve.nc. I have installed the package clim.pact in RStudio. I have downloaded the ncdf pack from unicar (including ncdump and ncgen). The ncdf file I'm working on is called essai2.nc.

Here is what I get when I type the command:

ncinfo <- cdfcont("essai2.nc")
ncinfo cdfcont.txt' renvoie un statut 1
2: In min(nchar(str)) : aucun argument trouvé pour min ; Inf est renvoyé

I'm sorry it's in French! If I try to translate:

Error in 1:nc: the argument has null length
Information message:
1: executing the command 'C:...' gives status 1
2: In min(nchar(str)) : no argument found for min ; Inf is sent back

Could someone please help me with this?

PS: I can open the document with the function open.ncdf of the ncdf package.

Regards

-- View this message in context: http://r.789695.n4.nabble.com/I-can-t-open-a-nc-file-with-the-cdfcont-function-of-the-clim-pact-package-tp4508950p4508950.html Sent from the R help mailing list archive at Nabble.com.
[R] two lmer questions - formula with related variables and output interpretation
Hello, I have been attempting to set up a lme and have looked at numerous posts including 'R's lmer cheat-sheet' as well as reading a number of papers and other resources including R help, but I am still a little confused on how to write my model (I thought I had it). I have asked a number of questions on different forums; most of which have been resolved. My main concern right now is whether my model is correct. I studied broods of precocial chicks and watched each chick every other day for five minutes if possible. As chicks on the same day are completely non-independent the mean was found for each brood for each day. Variables that were recorded were the behaviours during that time and the habitats used. There were seven broods. Three at one site and four at the other site. Only one site had a brood that consistently used mudflats rather than oceanfront habitats. As none of the data within a brood is truly independent, along with the very small number of broods, it became impossible to use conventional statistics to test the hypotheses and so it was suggested that mixed-effects models would be the best option as it would not only allow for all data to be used with a random effect of Brood ID to negate the pseudo-replication but also let me look at partial use of mudflats in one of the other broods that only used it periodically. So, for this part of the analysis I would like to see which factors affect the amount of time feeding. I set up a global model with ten fixed variables plus (1|Brood). Site, tide.h.l, tide.inc.out, MF.vs.OF, Human Disturbance Rate (HDr), Human Disturbance proportion of time(HDp), non-Human Disturbance (two variables as for Human Disturbance) and Age and mean.foraging.rate. 
As so:

gm1 <- lmer(Feeding~Site+tide.level+MF.vs.OF+HDr+HDp+NHDr+NHDp+Age+mean.for.rate+(1|Brood), data=AllBrood, REML=TRUE)

I wished to put all the factors together to explore which ones really did influence the time spent feeding, and used the 'dredge' command to run all possible combinations and then averaged the models with an AICc delta of 2. I was expecting that the proportion of time being disturbed (HDp and NHDp) would be the most relevant, as by default the greater the time in other behaviours, the less time for feeding. However, MF.vs.OF had a larger effect than HDp and NHDp, but this may be because MF observations did not experience HDp at all, so this may push the effect of this habitat. Surprisingly, non-human disturbance rates rather than time had a greater effect (but these are quite even among habitats). The results of the model.avg are as follows:

              Estimate Std. Error z value Pr(>|z|)
(Intercept)   102.7190     5.5300  18.575  < 2e-16 ***
HDr            -1.5495     0.3451   4.490 7.11e-06 ***
MF.vs.OF2      -7.6780     3.7507   2.047  0.04065 *
NHDp           -0.5145     0.2909   1.769  0.07695 .
NHDr           -1.4164     0.4663   3.037  0.00239 **
Site2           6.1477     2.7400   2.244  0.02485 *
tide.h.l2      -7.2546     2.6914   2.695  0.00703 **
tide.inc.out2  -5.8486     2.6187   2.233  0.02553 *
HDp            -0.3773     0.2732   1.381  0.16731
mean.for.rate  -0.3966     0.3220   1.232  0.21807
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Full model-averaged coefficients (with shrinkage):
  (Intercept)           HDr     MF.vs.OF2          NHDp          NHDr
   102.718962     -1.549499     -5.734171     -0.239550     -1.416373
        Site2     tide.h.l2 tide.inc.out2           HDp mean.for.rate
     5.336532     -7.254627     -5.848553     -0.044795     -0.081734

Relative variable importance:
  (Intercept)           Age           HDp           HDr mean.for.rate      MF.vs.OF
         1.00          0.00          0.12          1.00          0.21          0.75
         NHDp          NHDr          Site      tide.h.l  tide.inc.out
         0.47          1.00          0.87          1.00          1.00

I was wondering whether there would be a better way to formulate the model to allow for this effect, or could I just keep it as is and just infer that it may be partly affected by the amount of disturbance within these habitats, but, as it has a greater effect, that other factors are at play? That would then lead me on to the next model, which is going to explore observations that do not include disturbance, which would allow me to tease out the natural factors affecting feeding behaviour. I was going to run this second model with site still as a fixed effect and then run it with (1|Site) to remove the site effect (if one is found). I would prefer to keep it simple as I really want to use an lme, but don't have the understanding for more complex interactions.

I have also asked a question, which is yet to be answered on stats stack exchange, in regards to the output of the model.avg, as follows: I have seen the Estimates described as the effect of the variable and this is discussed in results sections as an important value to report (in regards to the size of them and their direction (+ve/-ve)). (the paper I
[R] Zero inflated GAMM
Hi all, I am planning to get Zuur et al.'s new book when it comes out, but until then I was wondering if anyone could suggest examples of zero-inflated or hurdle GAMMs. I have count data with many zeros, non-linear relationships, and site as a random effect. Thank you! Bert Harris, University of Adelaide
Re: [R] Memory Utilization on R
Thank you for the modified script! I have now tried it on different datasets and it works very well and is dramatically faster than my original script! I really appreciate the help. Kurinji

On Fri, Mar 23, 2012 at 1:33 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:

Taking a look at your script: there are some potential optimizations you can do:

    # Fine
    poi <- as.character(top.GSM396290)   # 5000 characters
    x.data <- h1[, c(1, 7:9)]            # 485577 obs of 4 variables

    # Pre-allocate the space
    x <- vector("list", 485577)          # x <- list()

    # Do the a stuff once outside the loop so you aren't doing it 485577 times
    a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")

    # Let's use an apply statement instead of a for loop
    # vapply is the fastest since we prespecify the return type.
    x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]

I think this will do what you wanted (and hopefully much faster). Note that you could probably tune this further, but I think this strikes a good balance between clarity and performance (for now). Hope this helps, Michael

On Fri, Mar 23, 2012 at 11:52 AM, Kurinji Pandiyan kurinji.pandi...@gmail.com wrote:

Thank you for the input. As it were, I realized that my script is utilizing a lot more memory than I claimed - it was initially using 3 GB but has gone up to 20.24 active but 29.63 assigned to the R session. The script has run overnight and now I don't think it is active anymore, since I keep getting the error message that I am out of startup disk space for application memory. I am attaching screen shots of my RAM usage distribution (given that there is no fluctuation in the usage by the R session I believe it is not running anymore) and of my available HD. Here is my script -

    poi <- as.character(top.GSM396290)   # 5000 characters
    x.data <- h1[, c(1, 7:9)]            # 485577 obs of 4 variables
    head(x.data)
    x <- list()
    for (i in 1:485577) {
      a <- as.character(x.data[i, "UCSC_REFGENE_NAME"])
      a <- unlist(strsplit(a, ";"))
      if (any(poi %in% a) == TRUE) { x[[i]] <- x.data[i, ] }
    }                                    # this step completed in a few hours
    x <- do.call(rbind, x)               # this step has been running overnight and is still stuck

Thanks, I really appreciate the help. Kurinji

On Thu, Mar 22, 2012 at 10:44 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:

Well... what makes you think you are hitting memory constraints then? If you have significantly less than 3GB of data, it shouldn't surprise you if R never needs more than 3GB of memory. You could just be running your scripts inefficiently...it's an extreme example, but all the memory and gigaflopping in the world can't speed this up (by much):

    for (i in seq_len(1e6)) Sys.sleep(10)

Perhaps you should look into profiling tools or parallel computation...if you can post a representative example of your scripts, we might be able to give performance pointers. Michael

On Fri, Mar 23, 2012 at 1:33 AM, Kurinji Pandiyan kurinji.pandi...@gmail.com wrote:

Yes, I am. Thank you, Kurinji

On Mar 22, 2012, at 10:27 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:

Use 64bit R? Michael

On Thu, Mar 22, 2012 at 5:22 PM, Kurinji Pandiyan kurinji.pandi...@gmail.com wrote:

Hello, I have a 32 GB RAM Mac Pro with a 2*2.4 GHz quad core processor and 2TB storage. Despite this having so much memory, I am not able to get R to utilize much more than 3 GBs. Some of my scripts take hours to run but I would think they would be much faster if more memory is utilized. How do I optimize the memory usage on R by my Mac Pro? Thank you! Kurinji
Re: [R] Using MuMIn - error message
Hello Mike, I don't think I did, but I fixed the issue by loading each package before use. The second issue was solved by removing a variable that had been used to create two other categorical variables. I think it must have been recognising this. Thanks for the help.

-- View this message in context: http://r.789695.n4.nabble.com/Using-MuMIn-error-message-tp4500236p4508901.html
[R] matrix(unlist(strsplit())) 'missing value' issue
I'm still an R noob; I've just had a couple of lectures about it in our research master's. There is a Deal or No Deal experiment that I have to write some code for. Someone wrote a website to gather the data and write it to an .xlsx file. There are separate files for separate participants, so first I have to import the separate data files. I do that like this:

    # Merge the xlsx files into one data frame
    alldata <- rbind(read.xlsx('experimentdata.xlsx', 1),
                     read.xlsx('experimentdata_1.xlsx', 1),
                     read.xlsx('experimentdata_2.xlsx', 1)
                     # etc.. # read.xlsx('filepath', 1)
                     )

The website is poorly written and some of the variables are not convenient. I have the variables 'bankoffer.1', 'bankoffer.3', 'bankoffer.5' etc. These variables look like the following:

    alldata$bankoffer.1
    [1] 246000:accepted    267000:notaccepted 20:notaccepted
    Levels: 246000:accepted 267000:notaccepted 20:notaccepted

    alldata$bankoffer.3
    [1] 999                429000:notaccepted 48000:notaccepted
    Levels: 999 429000:notaccepted 48000:notaccepted

The problem is that the values in the cells are weird; they consist of, for example, '246000:accepted'. I want to decompose that so that 246000 is in one variable and accepted in another. No problem, just do this:

    as.data.frame(matrix(unlist(strsplit(as.character(alldata$bankoffer.1), ":")), ncol = 2, byrow = TRUE))
          V1          V2
    1 246000    accepted
    2 267000 notaccepted
    3     20 notaccepted

However, when there are missing values, as in bankoffer.3, there is a problem:

    as.data.frame(matrix(unlist(strsplit(as.character(alldata$bankoffer.3), ":")), ncol = 2, byrow = TRUE))
               V1     V2
    1         999 429000
    2 notaccepted  48000
    3 notaccepted    999
    Warning message:
    In matrix(unlist(strsplit(as.character(alldata$bankoffer.3), ":")), :
      data length [5] is not a sub-multiple or multiple of the number of rows [3]

R does not encounter a ':' in the 999 and therefore places the 429000 in the second column, although it should be in the first one. Like this:

          V1          V2
    1    999         999
    2 429000 notaccepted
    3  48000 notaccepted

How can I tell R to place 999 in both columns when it encounters a 999? Any other solution to my problem is also good. For example, I thought about making R add ':999' whenever it encounters 999 as a sort of workaround, but I have no idea how to do that. I hope I have made it reasonably clear what the problem is and what I eventually want. If not, please ask. Greetings, Maarten

-- View this message in context: http://r.789695.n4.nabble.com/matrix-unlist-strsplit-missing-value-issue-tp4509065p4509065.html
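One possible way to get the layout asked for above is to split each value separately and repeat any piece that has no ':' before binding the rows. This is only a sketch: `offer` is a made-up stand-in for `alldata$bankoffer.3`, and `split_offer` is a hypothetical helper name, not part of any package.

```r
# Hypothetical stand-in for alldata$bankoffer.3 from the post
offer <- factor(c("999", "429000:notaccepted", "48000:notaccepted"))

# Split "amount:decision" values into a two-column data frame.
# Entries without a ":" (the 999 missing code) fill both columns.
split_offer <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  parts <- lapply(parts, function(p) if (length(p) == 1) rep(p, 2) else p)
  out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(out) <- c("V1", "V2")
  out
}

res <- split_offer(offer)
print(res)
```

Because every element is padded to length two before `rbind`, the recycling warning from `matrix()` no longer arises.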
Re: [R] Exporting a data.frame to excel using sqlSave - adds a character ' to values
Hello, I have encountered a situation similar to the one described by Tal above: I use the RODBC library to export multiple data frames into different sheets of an Excel file. My data frames contain Character, Date and Numeric columns.

    library(RODBC)
    channel <- odbcConnectExcel(xls.file = myXlsFile, readOnly = FALSE)
    sqlSave(channel, data, tablename = "Table1", rownames = F, colnames = T)
    odbcClose(channel)

When exported into Excel, all of my cells start with the ' character (which is different from Tal's situation, where only non-numeric cells started with the ' character). I need the columns that contain numeric data or dates to be imported in the appropriate format so that they can be manipulated (graphics etc). I found a macro that formats all the sheets in the appropriate way, but I would like to discover why even my numeric data (type Numeric in R) are imported as text. Regards, Juliette

-- View this message in context: http://r.789695.n4.nabble.com/Exporting-a-data-frame-to-excel-using-sqlSave-adds-a-character-to-values-tp1016523p4509108.html
Re: [R] row, col function but for a list (probably very easy question, cannot seem to find it though)
On Mar 27, 2012, at 3:37 AM, peter dalgaard wrote:

> On Mar 26, 2012, at 17:33, David Winsemius wrote:
>
>> The usual approach to that problem is to use sapply:
>>
>>     x <- list()
>>     x <- sapply(1:10, function(z) x[[z]] <- 1:z)
>
> Yikes! If that works, it is only by coincidence. (The pre-assignment to x only serves the purpose of allowing the [[-assignment inside the anonymous function, but the assignment is to a local copy which is deleted on exit, and the return value is the rhs of the assignment.)

Well, maybe not by pure coincidence. There are really two rhs's, and it was because of the outer assignment of the values to 'x' that it worked as intended. My error is in propagating the notion that assignments to named objects inside the function will survive outside the function.

    x <- list(); y <- list()
    y <- sapply(1:10, function(z) x[[z]] <- 1:z)
    x
    list()

> Please:
>
>     x <- lapply(1:10, function(z) 1:z)
>
> or even
>
>     x <- lapply(1:10, seq_len)

Yes, I see the error of my ways. I wonder how many times I have been in this state of sin in the past?

> -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com

David Winsemius, MD West Hartford, CT
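The scoping point in this exchange can be checked directly. The sketch below uses illustrative names (`x_outer`, `y`, `x_built`) that are not from the thread: the assignment inside the anonymous function only touches a local copy, and the list is really built from the function's return values.

```r
x_outer <- list()

# The [[<- assignment modifies a local copy of x_outer inside the function;
# what sapply collects is the value of the assignment, i.e. 1:z
y <- sapply(1:10, function(z) x_outer[[z]] <- 1:z)

# x_outer itself is untouched
length(x_outer)

# Cleaner: build the list directly from the return values
x_built <- lapply(1:10, seq_len)
identical(y, x_built)
```

Here `sapply` happens to return a list because the ten results have different lengths and cannot be simplified; `lapply` always returns a list, so it does not depend on that.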
[R] R extract parts
Good afternoon, I believe R has a more effective solution to my problem than using a loop. I have the following set of data and need to extract some sections.

    user pos communications source v_destine
       7   1            109     22        22
       7   2            100     22        22
       7   3            214     22        22
       7   4            322     22        22
       7   5          69920     22       161
       7   6             68    161        97
       7   7            196     97        97
       7   8            427     97        22
       7   9            460     22        22
       7  10            307     22        22
       7  11           9582     22        22
       7  12          55428     22        22
       7  13           9192     22        22
       7  14             19     22        22

My idea is that when a communications value greater than 1000 occurs, I can extract some data. In the example data set, the value is over 1000 at positions 11, 12 and 13. My idea is to get results like this:

    user, sector, source, destine, count, average
       7       1      22       22      4   186.25   # (109+100+214+322)
       7       2     161       97      1       68
       7       2      97       97      1      196
       7       2      97       22      1      427
       7       2      22       22      2      383

-- View this message in context: http://r.789695.n4.nabble.com/R-extract-parts-tp4509042p4509042.html
[R] Constructing Distance matrix for hclust
Hi, I have similarity values between string pairs in a MySQL database. I need to construct a distance matrix that hclust can take in order to cluster the strings. Most of the examples I have come across show how to construct the distance matrix using the dist function. How can I construct the distance matrix from the data in the MySQL database? Thanks a lot for any help.
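A possible approach to the question above, sketched under assumptions: the pairwise similarities have already been fetched into a data frame (the column names `s1`, `s2`, `sim` and the values are invented here), similarities lie in [0, 1], and distance is taken as 1 minus similarity. The database query itself is left out.

```r
# Hypothetical pairwise similarities, as they might come back from a query
sim_df <- data.frame(
  s1  = c("apple", "apple", "apply"),
  s2  = c("apply", "angle", "angle"),
  sim = c(0.9, 0.4, 0.3),
  stringsAsFactors = FALSE
)

strs <- sort(unique(c(sim_df$s1, sim_df$s2)))
n <- length(strs)

# Fill a symmetric similarity matrix; each string is fully similar to itself
S <- matrix(0, n, n, dimnames = list(strs, strs))
diag(S) <- 1
S[cbind(sim_df$s1, sim_df$s2)] <- sim_df$sim
S[cbind(sim_df$s2, sim_df$s1)] <- sim_df$sim

# Convert similarity to distance (assumes similarities in [0, 1])
d  <- as.dist(1 - S)
hc <- hclust(d, method = "average")
plot(hc)
```

`as.dist()` accepts any square matrix, so `hclust()` never needs to see the raw data. Pairs missing from the table keep a similarity of 0 here, which may or may not be appropriate for a given data set.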
[R] lasso constraint
In the package lasso2 there is a Prostate data set. To find the coefficients in the prostate cancer example we can impose an L1 constraint on the parameters. The code is:

    data(Prostate)
    p.mean <- apply(Prostate, 2, mean)
    pros <- sweep(Prostate, 2, p.mean, "-")
    p.std <- apply(pros, 2, var)
    pros <- sweep(pros, 2, sqrt(p.std), "/")
    pros[, "lpsa"] <- Prostate[, "lpsa"]
    l1ce(lpsa ~ . , pros, bound = 0.44)

I can't figure out where 0.44 comes from. The paper says it was obtained from generalized cross-validation and is the optimal choice. Paper name: Regression Shrinkage and Selection via the Lasso. Author: Robert Tibshirani.

-- View this message in context: http://r.789695.n4.nabble.com/lasso-constraint-tp4508998p4508998.html
Re: [R] Supperscript, subscript and double lines in the main/subtitle and using greekletters
On Mar 27, 2012, at 9:39 AM, Gerrit Eichner wrote:

Hi, HJ, see ?plotmath. Hth -- Gerrit - Dr. Gerrit Eichner Mathematical Institute, Room 212

On Tue, 27 Mar 2012, HJ YAN wrote:

Dear R-help, I am trying to express myself as best I can here. If you also use LaTeX to edit math reports, or other languages with a similar editing method, you'll see what I'm talking about. My sincere apologies if my question is not clear enough to some extent; I'm also not able to provide my code here because I don't know which one I can use... When editing the title in R plots, such as with 'plot', or 'xyplot' in 'lattice', what method do you use to write Greek letters and make use of superscripts and subscripts, e.g. to write mathematical expressions like these in LaTeX: \sigma^2 \tau^{2s} \mu_i \pi_{2s}

I would also like to learn how to put two lines in the main title or subtitle if the text I need is too long for a single line. Is there some R code/syntax that lets me do something like in LaTeX to make two lines in the title, for example using '//' or '\\' to separate the two parts of the text I want on two lines? I have heard about using something like plot(x, y, main = expression()), but from neither '?plot' nor '?expression' could I find comprehensive information about what I need...

The plotmath environment (not the correct term) will not accept the usual EOL \n marker for new lines. You can cobble together a substitute (at least for the two-line problem) using the plotmath `atop` function.

    plot(1, 1, main = expression(atop("laaahhh" ~ tau, "bllleeehhh" ~ epsilon)))

Notice the need for a plotmath connector such as ~ or * between the text and the unquoted Greek letters.

-- David Winsemius, MD West Hartford, CT
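To tie the `atop` suggestion back to the four LaTeX expressions in the question, here is a small sketch. The wording of the title is invented for illustration; the plotmath constructs used (`^`, `[ ]`, `{ }`, `~`, `*`, `atop`) are the ones documented under ?plotmath.

```r
# Superscripts use ^, subscripts use [ ], and { } groups without drawing
# parentheses; * juxtaposes and ~ juxtaposes with a space
main_expr <- expression(atop(sigma^2 ~ "and" ~ tau^{2*s},
                             mu[i] ~ "and" ~ pi[2*s]))

plot(1, 1, main = main_expr, sub = expression(alpha + beta[0]))
```

`atop` stacks its two arguments, which is what gives the two-line title here.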
Re: [R] How to test for the difference of means in population, please help
You should use mixed effects modeling to analyze data of this sort. This is not a topic that is generally covered in introductory classes, so you should consult a professional statistician about your problem, or educate yourself well beyond the novice level (this takes more than just reading one book; a few classes would be good to get to this level, or intense study of several books).

Since everything is balanced nicely, you could average over the 4 repeats and use a 2 sample t test (assuming the assumptions hold; your sample data would be fine) comparing the 2 sets of 400 means. This will test for a general difference in the overall means, but ignores other information and hypotheses that may be important (which is why the mixed effects model approach is much preferred).

On Tue, Mar 27, 2012 at 1:13 AM, ali_protocol mohammadianalimohammad...@gmail.com wrote:

Dear all, Novice in statistics. I have 2 experimental conditions. Each condition has ~400 points as its response. Each condition is done in 4 repeats (so I have 2 x 400 x 4 points). I want to compare the means of the two conditions and test whether they are the same or not. Which test should I use?

    # populations
    c = matrix(sample(1:20, 1600, replace = TRUE), 400, 4)
    b = matrix(sample(1:20, 1600, replace = TRUE), 400, 4)
    # means of repeats
    c.mean = apply(c, 2, mean)
    b.mean = apply(b, 2, mean)
    # mean of experiment
    c.mean.all = mean(c)
    b.mean.all = mean(b)

-- View this message in context: http://r.789695.n4.nabble.com/How-to-test-for-the-difference-of-means-in-population-please-help-tp4508089p4508089.html

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
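The simpler of the two suggestions above (average over the 4 repeats, then compare the two sets of 400 means) might look like the sketch below. The data are simulated in the same way as in the question, the object names are made up, and `t.test()` is left at its default, which is the Welch version of the two-sample test.

```r
set.seed(1)

# Simulated stand-ins: 400 points x 4 repeats per condition
cond_c <- matrix(sample(1:20, 1600, replace = TRUE), 400, 4)
cond_b <- matrix(sample(1:20, 1600, replace = TRUE), 400, 4)

# Average over the 4 repeats, giving 400 means per condition
c_means <- rowMeans(cond_c)
b_means <- rowMeans(cond_b)

# Two-sample t test on the two sets of 400 means (Welch by default)
tt <- t.test(c_means, b_means)
print(tt)
```

Note that the averaging is over rows (`rowMeans`), one mean per point, rather than over columns as in the `apply(c, 2, mean)` calls in the question, which give only 4 means per condition.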
Re: [R] Supperscript, subscript and double lines in the main/subtitle and using greekletters
The title() function also has a parameter 'line' with which you can specify the margin line on which the text should be displayed. The number of margin lines around the figure region of the plot can be specified before plotting with par(mar=c(bottom,left,top,right)), in text lines. Margin lines are also used by par(mgp=...) and mtext(). Regards!
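As a rough illustration of the margin-line approach described above, the sketch below enlarges the top margin and places two pieces of title text on different margin lines. The margin sizes, line positions and title text are arbitrary choices.

```r
# Enlarge the top margin to 6 text lines; par() returns the old settings
op <- par(mar = c(5, 4, 6, 2))

plot(1, 1)
title(main = "First line of the title", line = 3)        # margin line 3
mtext("Second line of the title", side = 3, line = 1.5)  # closer to the plot

par(op)  # restore the previous margins
```

This gives a two-line title without plotmath, which can be handy when both lines are plain text.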
Re: [R] Memory Utilization on R
Guys, let me add my two cents to your interesting discussion. I have a ~10 GB text file with training data for my model. It has about 150 million rows for 12 variables. When I load it into memory (running just this one line):

    train <- read.table(file = "/training.txt")

it takes ~28 GB of RAM while loading (and about 2 hours to finish), and once the data are loaded, rsession takes ~14 GB. I can't even imagine how much it will take when I run SVM training on this data set. Is there any optimization to decrease the time required for loading the data into memory? I use an x64 box with 32 GB of RAM. Thank you, -Alex
Re: [R] two lmer questions - formula with related variables and output interpretation
I realised that I removed the link to the question but forgot to remove the text regarding it. Sorry. I am not sure if I am supposed to link to other forums, but I can add the links as needed (as the format is clearer). I actually have one more question, though, about which data to use. If it is better to just report the estimates and CIs, should I use those with shrinkage instead? If so, does anyone know how I can get the CIs for these rather than just the regular CIs? I apologise if I am asking too many questions within one post. Rachel

-- View this message in context: http://r.789695.n4.nabble.com/two-lmer-questions-formula-with-related-variables-and-output-interpretation-tp4508876p4509334.html
Re: [R] Memory Utilization on R
Note that you can actually drop the line defining the big list x. I thought it would be needed, but it turns out to be unnecessary after cleaning up the second half: cutting off that allocation might save you even more time. Best, Michael
Re: [R] copy the columns based on the code
Hello, this code works perfectly:

    temp <- merge(travel, city, by.x = "Source", by.y = "cod")
    result <- merge(temp, city, by.x = "Destine", by.y = "cod")

The problem was in the construction of the data frame: there was an extra parenthesis in city <- rbind(city, data.frame(city = "Lisbon", cod = 3))). I tried to delete the post, but I couldn't. As I have little experience in R, I still make some mistakes. I use read.table to load the data frame; the form used in the post was just the quickest way I found to describe the problem. The forum has been a great help to me. Thanks

-- View this message in context: http://r.789695.n4.nabble.com/copy-the-columns-based-on-the-code-tp4505253p4509340.html
Re: [R] copy the columns based on the code
Yet another way:

    city <- data.frame(city = "Barcelona", cod = 1)
    city <- rbind(city, data.frame(city = "Madrid", cod = 2))
    city <- rbind(city, data.frame(city = "Lisbon", cod = 3))
    city <- rbind(city, data.frame(city = "Milan", cod = 4))
    city <- rbind(city, data.frame(city = "London", cod = 5))
    travel <- data.frame(pos = 1, Source = 1, Destine = 2)
    travel <- rbind(travel, data.frame(pos = 1, Source = 1, Destine = 3))
    travel <- rbind(travel, data.frame(pos = 2, Source = 3, Destine = 4))
    travel <- rbind(travel, data.frame(pos = 3, Source = 2, Destine = 4))
    travel <- rbind(travel, data.frame(pos = 4, Source = 1, Destine = 3))
    travel$city <- city$city[match(travel$Source, city$cod)]
    travel$city_destine <- city$city[match(travel$Destine, city$cod)]
    travel
      pos Source Destine      city city_destine
    1   1      1       2 Barcelona       Madrid
    2   1      1       3 Barcelona       Lisbon
    3   2      3       4    Lisbon        Milan
    4   3      2       4    Madrid        Milan
    5   4      1       3 Barcelona       Lisbon

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
Re: [R] Plot of function seems to cut off near edge of domain
Ah, thanks. I am new to R and was unaware of the from/to parameters for the plot function. I thought xlim and ylim served that purpose. Thanks again! -Chad

On Tue, Mar 27, 2012 at 3:31 AM, Matthieu Dubois matth...@gmail.com wrote:

Dear Chad, your problem is linked to (1) the function returning NaNs for x values greater than 50, and (2) the fact that the function is evaluated at a predefined number of points. Calling plot() on a function object is basically a wrapper for curve(). Your function g() is evaluated over the whole xlim domain, so it returns NaN for x > 50 (try g(60)). In addition, curve() splits the x interval (here from 0 to 60) into a predefined number of points (n = 101 is the default, see help(curve)) at which the function is evaluated. In your code, the function is evaluated at x <- seq(0, 60, length=101), and only the values of g(x) that are not NaN are plotted. The largest x value in that sequence that doesn't return NaN is max(x[!is.nan(g(x))]), which is 49.8. One way to solve it is to explicitly specify the domain over which the function is evaluated, using the from and to arguments that are passed to curve():

#Figure 2, with xlim beyond the radius of the circle
plot(g, axes=F, from=0, to=50, xlim=c(0, 60), ylim=c(0, 60))
axis(1, pos=0)
axis(2, pos=0)

HTH Matthieu

Matthieu Dubois Post-doctoral researcher Psychology Department Université Libre de Bruxelles
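For anyone finding this thread later, here is a minimal sketch of the diagnosis described above. It rebuilds the 101-point grid by hand to show where the curve stops, then redraws the figure with from/to. The pdf(NULL) call is my own assumption so that the sketch runs without opening a window; drop it in an interactive session.

```r
# Quarter circle of radius 50; sqrt() of a negative number gives NaN.
g <- function(x) sqrt(2500 - x^2)

# The 101 evaluation points curve() uses by default on the range 0..60.
x <- seq(0, 60, length.out = 101)
y <- suppressWarnings(g(x))

# Largest grid point where g() is still defined: the drawn curve ends here,
# short of x = 50, so it appears not to reach the x-axis.
last_ok <- max(x[!is.nan(y)])
print(last_ok)

# Restricting evaluation to [0, 50] puts a grid point exactly on x = 50.
pdf(NULL)  # assumption: null device, so no display is needed
plot(g, from = 0, to = 50, xlim = c(0, 60), ylim = c(0, 60), axes = FALSE)
axis(1, pos = 0)
axis(2, pos = 0)
dev.off()
```

Increasing n alone would only shrink the gap; from/to removes it because the last evaluation point then sits on the edge of the domain.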
Re: [R] Memory Utilization on R
It's really not suggested etiquette to thread-jack, but generally, the more you can tell read.table (particularly through the colClasses, nrows, as.is, and stringsAsFactors arguments), the faster it will be able to read things, because it can skip various otherwise necessary checks. Michael

On Tue, Mar 27, 2012 at 12:07 PM, Alekseiy Beloshitskiy abeloshits...@velti.com wrote:

Guys, let me add my 5 coins to your interesting discussion. I have a ~10 GB txt file with training data for my model. It has about 150 million rows for 12 variables. When I load it into memory (running just one line!):

train <- read.table(file="/training.txt")

it takes ~28 GB of RAM while loading (it takes about 2 hours to finish), and once the data are loaded, rsession takes ~14 GB. I can't even imagine how much it will take when I run svm training on this data set. Is there any optimization to decrease the time required for loading the data into memory? I use an x64 box with 32 GB of RAM. Thank you, -Alex

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Kurinji Pandiyan [kurinji.pandi...@gmail.com] Sent: 27 March 2012 18:14 To: R. Michael Weylandt Cc: r-help@r-project.org Subject: Re: [R] Memory Utilization on R

Thank you for the modified script! I have now tried it on different datasets and it works very well and is dramatically faster than my original script! I really appreciate the help. Kurinji

On Fri, Mar 23, 2012 at 1:33 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:

Taking a look at your script, there are some potential optimizations you can do:

# Fine
poi <- as.character(top.GSM396290) # 5000 characters
x.data <- h1[, c(1, 7:9)] # 485577 obs of 4 variables
# Pre-allocate the space
x <- vector("list", 485577) # x <- list()
# Do the "a" stuff once outside the loop so you aren't doing it 485577 times
a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")
# Let's use an apply statement instead of a for loop
# vapply is the fastest since we prespecify the return type.
x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]

I think this will do what you wanted (and hopefully much faster). Note that you could probably tune this further, but I think this strikes a good balance between clarity and performance (for now). Hope this helps, Michael

On Fri, Mar 23, 2012 at 11:52 AM, Kurinji Pandiyan kurinji.pandi...@gmail.com wrote:

Thank you for the input. As it turns out, my script is utilizing a lot more memory than I claimed: it was initially using 3 GB but has gone up to 20.24 GB active, with 29.63 GB assigned to the R session. The script has run overnight and I don't think it is active anymore, since I keep getting the error message that I am out of startup disk space for application memory. I am attaching screenshots of my RAM usage distribution (given that there is no fluctuation in the usage by the R session, I believe it is not running anymore) and of my available HD. Here is my script:

poi <- as.character(top.GSM396290) # 5000 characters
x.data <- h1[, c(1, 7:9)] # 485577 obs of 4 variables
head(x.data)
x <- list()
for(i in 1:485577){
  a <- as.character(x.data[i, "UCSC_REFGENE_NAME"])
  a <- unlist(strsplit(a, ";"))
  if(any(poi %in% a) == TRUE) {x[[i]] <- x.data[i,]}
} # this step completed in a few hours
x <- do.call(rbind, x) # this step has been running overnight and is still stuck

Thanks, I really appreciate the help. Kurinji

On Thu, Mar 22, 2012 at 10:44 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:

Well... what makes you think you are hitting memory constraints then? If you have significantly less than 3 GB of data, it shouldn't surprise you if R never needs more than 3 GB of memory. You could just be running your scripts inefficiently... it's an extreme example, but all the memory and gigaflopping in the world can't speed this up (by much):

for(i in seq_len(1e6)) Sys.sleep(10)

Perhaps you should look into profiling tools or parallel computation... if you can post a representative example of your scripts, we might be able to give performance pointers. Michael

On Fri, Mar 23, 2012 at 1:33 AM, Kurinji Pandiyan kurinji.pandi...@gmail.com wrote:

Yes, I am. Thank you, Kurinji

On Mar 22, 2012, at 10:27 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:

Use 64bit R? Michael

On Thu, Mar 22, 2012 at 5:22 PM, Kurinji Pandiyan kurinji.pandi...@gmail.com wrote:

Hello, I have a 32 GB RAM Mac Pro with a 2*2.4 GHz quad core processor and 2 TB storage. Despite the machine having so much memory, I am not able to get R to utilize much more than 3 GB. Some of my scripts take hours to run, but I would think they would be much faster if more memory were utilized. How do I optimize the memory usage of R on my Mac Pro? Thank you! Kurinji
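A small self-contained sketch of the two suggestions in this thread, for readers who want something they can paste and run. The data frame, gene names and text snippet below are toy stand-ins that I made up; only the column name UCSC_REFGENE_NAME and the general approach come from the messages above.

```r
# Toy stand-ins for the objects in the thread (hypothetical values).
x.data <- data.frame(
  id = 1:4,
  UCSC_REFGENE_NAME = c("TP53;BRCA1", "MYC", "EGFR;KRAS", ""),
  stringsAsFactors = FALSE
)
poi <- c("BRCA1", "KRAS")   # genes of interest

# Split every row's gene list once, outside any loop.
genes <- strsplit(x.data[, "UCSC_REFGENE_NAME"], ";", fixed = TRUE)

# One TRUE/FALSE per row: does the row mention any gene of interest?
keep <- vapply(genes, function(g) any(poi %in% g), logical(1))

# Subset in one step instead of growing a list and rbind-ing it afterwards.
hits <- x.data[keep, ]
print(hits)

# Telling read.table the column types up front spares it from guessing them.
txt <- "a 1.5 x\nb 2.5 y"
df <- read.table(text = txt,
                 colClasses = c("character", "numeric", "character"),
                 nrows = 2, stringsAsFactors = FALSE)
str(df)
```

The logical index avoids both the row-by-row data frame subsetting and the final do.call(rbind, ...), which is where the original script appeared to stall.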
[R] What error distribution should I use?
I'm trying to fit a GLMM to identify the relationship between insect species richness and fragment size, isolation and time (different years). I have already tried to analyse it using a Poisson error distribution, but I always get the following warning: glm.fit: fitted probabilities numerically 0 or 1 occurred. This is probably happening because my dataset has a lot of zeros. So, what error distribution should I use? -- Lívia
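Not an answer to which distribution is right, but a hedged sketch of how one might check whether a Poisson model copes with the zeros before switching families. The data are simulated by me (negative binomial counts with a made-up predictor called area), and it uses a plain glm rather than a mixed model. As a side note, that particular warning text is the one glm.fit issues for binomial fits, so it may be worth checking which family was actually passed.

```r
set.seed(1)

# Simulated (hypothetical) richness counts with many zeros.
n <- 200
area <- runif(n, 1, 10)
richness <- rnbinom(n, mu = 2, size = 0.5)

fit <- glm(richness ~ area, family = poisson)

# Pearson chi-square over residual df; values well above 1 suggest overdispersion.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# Share of zeros observed vs. expected under the fitted Poisson model.
obs_zero <- mean(richness == 0)
exp_zero <- mean(dpois(0, fitted(fit)))
print(c(observed = obs_zero, expected = exp_zero))
```

If the dispersion is far above 1 or the observed zeros clearly exceed the expected ones, a negative binomial or zero-inflated model is the usual next thing to try.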
[R] ignore error getting next result
Dear All, How do I ignore an error and still get the result of the next iteration? I am trying to run wilcox.test in a loop; when the test fails, I would like to continue with the next iteration and get its p-value. I tried tryCatch and try, but I cannot retrieve the p-value when the test does not fail. Sample code:

test2=list(numeric(0),c(10,20));
test1=list(c(1),c(1,2,3,4));
for (i in 1:2){ wtest=wilcox.test(test1[[i]],test2[[i]]) }

i=1 will fail; I want to ignore this and get the p-value for i=2. Thanks, Lin
Re: [R] lasso constraint
Hi, your code has errors: the apply function only has 1 or 2 as margin. bound is used as the tuning parameter for the sum of the absolute coefficients. The lasso is run on a grid of values of the tuning parameter, for varying strengths of shrinkage, so each value may yield a different set of coefficients. Cross-validation is used to estimate the value of the tuning parameter that gives the smallest error (MSE or deviance) on test data. Weidong Gu

On Tue, Mar 27, 2012 at 10:35 AM, yx78 yangx...@gmail.com wrote:

In the package lasso2, there is a Prostate data set. To find the coefficients in the prostate cancer example we could impose an L1 constraint on the parameters. The code is:

data(Prostate)
p.mean <- apply(Prostate, 5, mean)
pros <- sweep(Prostate, 5, p.mean, "-")
p.std <- apply(pros, 5, var)
pros <- sweep(pros, 5, sqrt(p.std), "/")
pros[, "lpsa"] <- Prostate[, "lpsa"]
l1ce(lpsa ~ . , pros, bound = 0.44)

I can't figure out where 0.44 comes from. The paper says it came from generalized cross-validation and that it is the optimal choice. Paper name: Regression Shrinkage and Selection via the Lasso. Author: Robert Tibshirani

-- View this message in context: http://r.789695.n4.nabble.com/lasso-constraint-tp4508998p4508998.html Sent from the R help mailing list archive at Nabble.com.
[R] read.octave fails with data from Octave 3.2.X
Hi, I'm afraid that the function read.octave from package foreign has some problems with the ASCII data format exported by new versions of Octave (later than 3.2.X). It fails even for a simple case as: [Octave code:] octave:1 x=1; octave:2 save -ascii testdata.mat x [Now in R:] octavedata - read.octave('testdata.mat') Mensajes de aviso perdidos In read_octave_unknown(con, type) : cannot handle unknown type '' In this simple case I guess that the problem is that new versions Octave append two blank lines after each variable, and this confuses the current implementation of read.octave() The problem is worse if the saved variables include other types as structs, or strings. The new syntax of the MAT files is not recognized by read.octave(). Of course, it's always difficult to keep this kind of functions working when the external program changes its specification for saving variables, but if would be nice if the maintainers of foreign could at least solve the issue of blank lines. That way, it would still be possible to import simple data types as scalars and matrices. Otherwise, I suppose that a workaround is saving the data in binary (matlab) format, then load it with Octave 3.2.X, and save it in text format from that version. sessionInfo() R version 2.14.2 (2012-02-29) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C [5] LC_TIME=Spanish_Spain.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] foreign_0.8-49 -- Helios de Rosario Martínez Researcher INSTITUTO DE BIOMECÁNICA DE VALENCIA Universidad Politécnica de Valencia • Edificio 9C Camino de Vera s/n • 46022 VALENCIA (ESPAÑA) Tel. +34 96 387 91 60 • Fax +34 96 387 91 69 www.ibv.org Antes de imprimir este e-mail piense bien si es necesario hacerlo. 
En cumplimiento de la Ley Orgánica 15/1999 reguladora de la Protección de Datos de Carácter Personal, le informamos de que el presente mensaje contiene información confidencial, siendo para uso exclusivo del destinatario arriba indicado. En caso de no ser usted el destinatario del mismo le informamos que su recepción no le autoriza a su divulgación o reproducción por cualquier medio, debiendo destruirlo de inmediato, rogándole lo notifique al remitente. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lasso constraint
Inline: On Tue, Mar 27, 2012 at 10:00 AM, Weidong Gu anopheles...@gmail.com wrote: Hi, your code has errors: apply function only has 1 or 2 as margin. FALSE. Please re-read the Help files. It works as expected with arbitrary higher dim arrays. -- Bert bound is used as turning parameter for summation of absolute coefficients. lasso runs on a grid of the turning parameter for varying strength of shrinkage. so each turning value may yield different sets of coefficients and values. cross validation is used to estimate the value of the turning parameter which gives the smallest errors (mse or deviance) on testing data. Weidong Gu On Tue, Mar 27, 2012 at 10:35 AM, yx78 yangx...@gmail.com wrote: In the package lasso2, there is a Prostate Data. To find coefficients in the prostate cancer example we could impose L1 constraint on the parameters. code is: data(Prostate) p.mean - apply(Prostate, 5,mean) pros - sweep(Prostate, 5, p.mean, -) p.std - apply(pros, 5, var) pros - sweep(pros, 5, sqrt(p.std),/) pros[, lpsa] - Prostate[, lpsa] l1ce(lpsa ~ . , pros, bound = 0.44) I can't figure out what dose 0.44 come from. On the paper it said it was from generalized cross-validation and it is the optimal choice. paper name: Regression Shrinkage and Selection via the Lasso author: Robert Tibshirani -- View this message in context: http://r.789695.n4.nabble.com/lasso-constraint-tp4508998p4508998.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
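To make the point about margins concrete, here is a short sketch on toy data of my own (not the Prostate data). It shows apply() over the third dimension of an array, then the centre-and-scale idiom from the quoted code on a small matrix, assuming column-wise standardisation (margin 2) is what was intended there.

```r
# A 2 x 3 x 4 array: MARGIN can be 1, 2 or 3 (or a vector such as c(1, 3)).
a <- array(1:24, dim = c(2, 3, 4))
slice_means <- apply(a, 3, mean)   # one mean per 2 x 3 slice
print(slice_means)

# A matrix or data frame has only two dimensions, so only margins 1 and 2 exist.
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)
col_mean <- apply(m, 2, mean)
centred  <- sweep(m, 2, col_mean, "-")   # subtract each column's mean
col_sd   <- apply(centred, 2, sd)
scaled   <- sweep(centred, 2, col_sd, "/")  # divide by each column's sd
print(scaled)
```

So both statements in the thread have a grain of truth: apply() accepts any margin that exists in dim(X), and for a two-dimensional object that means 1 or 2.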
Re: [R] Supperscript, subscript and double lines in the main/subtitle and using greekletters
Sorry last message was not completed before sending Please below On Tue, Mar 27, 2012 at 5:36 PM, HJ YAN yhj...@googlemail.com wrote: Thank you very much Gerrit, for the nice hints! Just done some more googling and reaserches on this and trying to answering it myself... Below is the code that works for double lines (adopted from Gerrit's hints) and some of the formats (e.g. 1 and 3, but not 2 and 4) listed below: (1) \sigma^2 (2) \tau^{2s} (3) \mu_i (4) \pi_{2s} plot(1:3, ylab = expression(Superscript in greek letters ( * mu^2 ~ m)) , xlab = expression(Subscript in greek letters ~ mu[2]* ~ pi) , main = expression(atop(Happy Easter ,to all R-Helpers))) For using greek letters, am still a bit confused when needing a * though...e.g. seems it needs a * in front of greek letter expressions, when applying 'expression (...)'. And a * seems not required when a greek letter is needed outside the double quotations, e.g. when applying just 'expression(...)'. Again, a * is needed when making subscript as shown above... It seems ~ is reserved for making spaces before/between greek letters. What if we need ~ in the title as ~ is a standard notation in statistics when expressing is from when writing down a distribution, e.g. 'X~N(0,1)'... HJ On Tue, Mar 27, 2012 at 2:39 PM, Gerrit Eichner gerrit.eich...@math.uni-giessen.de wrote: Hi, HJ, see ?plotmath Hth -- Gerrit --**--**- Dr. Gerrit Eichner Mathematical Institute, Room 212 gerrit.eich...@math.uni-**giessen.de gerrit.eich...@math.uni-giessen.de Justus-Liebig-University Giessen Tel: +49-(0)641-99-32104 Arndtstr. 2, 35392 Giessen, Germany Fax: +49-(0)641-99-32109 http://www.uni-giessen.de/cms/**eichnerhttp://www.uni-giessen.de/cms/eichner --**--**- On Tue, 27 Mar 2012, HJ YAN wrote: Dear R-help, I am trying to express myself as best as I can here. If you also use Latex to edit math reports or other languages with similar editing method, you'll see what I'm talking about. 
My sincere appologies if my question is not clear enough to some extend, as also I'm not able to provide my code here because I don`t know which one I can use... When editing the title in R plots, such as using 'plot', or 'xyplot' in 'lattic', what method do you use to write greek letters and make use of superscript and subscript, e.g. to write mathematical expressions like using Latex: \sigma^2 \tau^{2s} \mu_i \pi_{2s} Also I would like to learn how to make two lines in the main title or sub title if the text I need it too long for putting in a single line, e.g. are there some R code/syntax allowing me to do something like in Latex to make two lines in the title, for example using '//' or '\\' to seperate the two parts of the text I want to put in two lines?? I heard about using something like plot(x,y, main=expression()) but from neither '?plot' or '?expression' could I find comprehensive information about what I need... Many thanks! HJ [[alternative HTML version deleted]] __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
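A sketch covering the four LaTeX forms asked about, the two-line title, and the tilde question. As far as ?plotmath describes it, * juxtaposes two pieces with no space, ~ inserts a space, braces group a multi-symbol exponent, and %~% draws the "is distributed as" tilde. The pdf(NULL) call is my own assumption so the sketch runs without a display.

```r
pdf(NULL)  # assumption: null device, so no window is needed

# plotmath equivalents of the four LaTeX forms in the question.
labs <- list(
  sigma2 = expression(sigma^2),      # \sigma^2
  tau2s  = expression(tau^{2*s}),    # \tau^{2s}: braces group, * juxtaposes
  mu_i   = expression(mu[i]),        # \mu_i
  pi2s   = expression(pi[2*s])       # \pi_{2s}
)

plot(1:3,
     main = expression(atop("Happy Easter", "to all R-Helpers")),  # two lines
     xlab = expression("Subscript:" ~ mu[i] * "," ~ pi[2*s]),
     ylab = expression("Superscript:" ~ sigma^2 * "," ~ tau^{2*s}))

# A literal tilde between X and N(0, 1).
mtext(expression(X %~% N(0, 1)), side = 3, line = 0.2)
dev.off()
```

Quoted strings inside expression() are drawn as plain text, which is why the Greek names have to sit outside the quotes and be joined to the text with * or ~.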
[R] Rgdal package - get information
Hi, I used GDALinfo("MOD13Q1.A2001049.h13v11.005.2007002215512.250m_16_days_EVI.tif") and got these results:

rows        10
columns     11
bands       1
origin.x    150701.4
origin.y    7744897
res.x       250
res.y       250
ysign       -1
oblique.x   0
oblique.y   0
driver      GTiff
projection  +proj=utm +zone=23 +south +datum=WGS84 +units=m +no_defs
file        /MOD13Q1.A2001049.h13v11.005.2007002215512.250m_16_days_EVI.tif
apparent band summary:
  GDType   Bmin  Bmax Bmean Bsd hasNoDataValue NoDataValue
1  Int16 -32768 32767     0   0          FALSE           0
Metadata:
AREA_OR_POINT=Point
TIFFTAG_SOFTWARE=MODIS Reprojection Tool v4.1 March 2009

How can I read the GDType information? Thanks, julio
[R] installing R 2.14.2
Hello, I am trying to install a newer version of R (R 2.14.2) from this link: http://cran.r-project.org/bin/macosx/ However, I am getting an error saying that it cannot be installed on my computer. My Mac is running version 10.6.8. Can you please advise me what the problem is? I need the newer version to install the ggm package. Thanks, Heba
Re: [R] ignore error getting next result
On Mar 27, 2012, at 12:56 PM, C Lin wrote: Dear All, How do I ignore an error and still getting result of next iteration. I am trying to do wilcox.test on a loop, when the test fail, I would like to continue doing the next iteration and getting the p-value. I tried to do tryCatch or try but I cannot retrieve the p-value if the test is not fail. sample code: test2=list(numeric(0),c(10,20)); test1=list(c(1),c(1,2,3,4)); for (i in 1:2){ wtest=wilcox.test(test1[[i]],test2[[i]]) } i=1 will fail, I want to ignore this and get the pvalue for i=2. Please read the FAQ entry And you would be advise to read through the rest of the FAQ as well. http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-capture-or-ignore-errors-in-a-long-simulation_003f Thanks, Lin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help on predict.lm
Hello, I'm new here, but will try to be as specific and complete as possible. I'm trying to use “lm” to first estimate parameter values from a set of calibration measurements, and then later to use those estimates to calculate another set of values with “predict.lm”. First I have a calibration dataset of absorbance values measured from standard solutions with known concentration of Bromide:

stds
      abs conc
1 -0.0021    0
2  0.1003  200
3  0.2395  500
4  0.3293  800

On this small calibration series, I perform a linear regression to find the parameter estimates of the relationship between absorbance (abs) and concentration (conc):

linear1 <- lm(abs~conc, data=stds)
summary(linear1)

Call:
lm(formula = abs ~ conc, data = stds)

Residuals:
        1         2         3         4
-0.012600  0.006467  0.020667 -0.014533

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.050e-02  1.629e-02   0.645  0.58527
conc        4.167e-04  3.378e-05  12.333  0.00651 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02048 on 2 degrees of freedom
Multiple R-squared: 0.987, Adjusted R-squared: 0.9805
F-statistic: 152.1 on 1 and 2 DF, p-value: 0.00651

Now I come with another dataset, which contains measured absorbance values of Bromide in solution:

brom
    hours     abs
1    -1.0  0.0633
2     1.0  0.2686
3     5.0  0.2446
4    18.0  0.2274
5    29.0  0.2091
6    42.0  0.1961
7    53.0  0.1310
8    76.0  0.1504
9    91.0  0.1317
10   95.5  0.1169
11  101.0  0.0977
12  115.0  0.1023
13  123.5  0.0879
14  138.5  0.0724
15  147.5  0.0564
16  163.0  0.0495
17  171.0  0.0325
18  189.0  0.0182
19  211.0  0.0047
20  212.5      NA
21  815.5 -0.2112
22  816.5 -0.1896
23  817.5 -0.0783
24  818.5  0.2963
25  819.5  0.1448
26  839.5  0.0936
27  864.0  0.0560
28  888.0  0.0310
29  960.5  0.0056
30 1009.0 -0.0163

The values in column brom$abs, measured at 30 subsequent points in time, need to be converted to Bromide concentrations, using the previously established relationship “linear1”. At first, I thought it could be done by:

predict.lm(linear1, brom$abs)
Error in eval(predvars, data, env) : numeric 'envir' arg not of length one

But R gives the above error message. Then, after some searching around on different fora and R communities (including this one), I learned that the “newdata” in “predict.lm” actually needs to be coerced into a separate data frame. Thus:

mabs <- data.frame(Abs = brom$abs)
predict.lm(linear1, mabs)
Error in eval(expr, envir, enclos) : object 'conc' not found

Again, R gives an error... probably because I made an error, but I truly fail to see where. I hope somebody can explain to me clearly what I'm doing wrong and what I should do instead. Any help is greatly appreciated, thanks!

-- View this message in context: http://r.789695.n4.nabble.com/Help-on-predict-lm-tp4509586p4509586.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] lasso constraint
Hi,

On Tue, Mar 27, 2012 at 10:35 AM, yx78 yangx...@gmail.com wrote:

In the package lasso2, there is a Prostate data set. To find the coefficients in the prostate cancer example we could impose an L1 constraint on the parameters. The code is:

data(Prostate)
p.mean <- apply(Prostate, 5, mean)
pros <- sweep(Prostate, 5, p.mean, "-")
p.std <- apply(pros, 5, var)
pros <- sweep(pros, 5, sqrt(p.std), "/")
pros[, "lpsa"] <- Prostate[, "lpsa"]
l1ce(lpsa ~ . , pros, bound = 0.44)

I can't figure out where 0.44 comes from. The paper says it came from generalized cross-validation and that it is the optimal choice.

Yes, this is exactly how the optimal value for bound would be found. Using the lasso2 package, you'll likely have to do a grid search over possible values for `bound` in a cross-validation setting, and pick the one that fits the model best on the held-out data over all your CV folds. If I were you, I'd use the glmnet package, since it can calculate the entire regularization path without a grid search over the bound (or lambda), which makes cross-validation easier. If you're confused about how you might use cross-validation to find the optimal value of the parameter(s) of the model you are building, then it's time to pull yourself away from the keyboaRd and start doing some reading, or (as Bert will likely tell you) consult your local statistician. HTH, -steve

-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
[R] readHTLMTable help
Hello to everyone. I'm using this function to download some information from a website. This is the URL: http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007FechaIni=01-1-1980 If you go to that website you'll find a table with meteorological information. One column is called Intesidad Máxima Diaria, and that is the one I need. I've been trying to extract that column, but I'm unable to do it. First I simply tried to download the complete table and then apply some kind of filter to extract the column, but for some reason, when I call the function as a <- readHTMLTable(url), the table is downloaded in an unfriendly format and I cannot tell the columns apart. If anyone could help me I'd appreciate it. Thank you. Lucas.
[R] Convert day of year back into a date format.
Hello, I am having trouble figuring out how to convert a Day of Year integer back into a Date format. For example I have the following:

date <- c('2008-01-01','2008-01-02','2008-01-03','2008-01-04','2008-01-05','2008-01-06','2008-01-07',
          '2008-01-08','2008-01-09','2008-01-10','2008-01-11','2008-01-12','2008-01-13','2008-01-14','2008-01-15',
          '2008-01-16','2008-01-17','2008-01-18','2008-01-19','2008-01-20','2008-01-21','2008-01-22','2008-01-23')
## this is then converted into a number corresponding to the day of the year like so:
dayofyear <- strptime(date, format="%Y-%m-%d")$yday + 1
## Now my question is how do I get back to a date format (obviously omitting the year).
## The end result is that I'd like to be able to have axis labels as something like Month-Day or just Month
## instead of just integers, which aren't always intuitive for people, but I can't seem to figure out how to tell R
## to recognize an integer as a date.

Any suggestions? Many thanks in advance! Sam
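One possible approach, sketched under the assumption that the year is known (here 2008, as in the example; it matters because of leap years): treat the day number as an offset from 1 January and format the result without the year. The shortened date vector is my own.

```r
dates <- c("2008-01-01", "2008-01-15", "2008-02-29", "2008-12-31")
dayofyear <- as.POSIXlt(dates, format = "%Y-%m-%d")$yday + 1

# Day 1 is the origin itself, so subtract 1 before adding to 1 January.
back <- as.Date(dayofyear - 1, origin = "2008-01-01")

# Month-day labels; "%b-%d" gives month names instead (locale-dependent).
labels <- format(back, "%m-%d")
print(labels)

# For a plot, the labels can then be placed at the integer positions, e.g.
# axis(1, at = dayofyear, labels = labels)
```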
Re: [R] ignore error getting next result
As a matter of fact, I did read the FAQ. However, in the FAQ coef() is used to return the coefficients of lm() if it succeeded. I cannot find similar function for pvalue. CC: r-help@r-project.org From: dwinsem...@comcast.net To: bac...@hotmail.com Subject: Re: [R] ignore error getting next result Date: Tue, 27 Mar 2012 13:40:39 -0400 On Mar 27, 2012, at 12:56 PM, C Lin wrote: Dear All, How do I ignore an error and still getting result of next iteration. I am trying to do wilcox.test on a loop, when the test fail, I would like to continue doing the next iteration and getting the p-value. I tried to do tryCatch or try but I cannot retrieve the p-value if the test is not fail. sample code: test2=list(numeric(0),c(10,20)); test1=list(c(1),c(1,2,3,4)); for (i in 1:2){ wtest=wilcox.test(test1[[i]],test2[[i]]) } i=1 will fail, I want to ignore this and get the pvalue for i=2. Please read the FAQ entry And you would be advise to read through the rest of the FAQ as well. http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-capture-or-ignore-errors-in-a-long-simulation_003f Thanks, Lin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help on predict.lm
On 27-03-2012, at 19:24, Nederjaard wrote:

> Hello,
>
> I'm new here, but will try to be as specific and complete as possible. I'm trying to use “lm” to first estimate parameter values from a set of calibration measurements, and then later to use those estimates to calculate another set of values with “predict.lm”.
>
> First I have a calibration dataset of absorbance values measured from standard solutions with known concentration of Bromide:
>
>   stds
>         abs conc
>   1 -0.0021    0
>   2  0.1003  200
>   3  0.2395  500
>   4  0.3293  800
>
> On this small calibration series, I perform a linear regression to find the parameter estimates of the relationship between absorbance (abs) and concentration (conc):
>
>   linear1 <- lm(abs~conc, data=stds)
>   summary(linear1)
>
>   Call:
>   lm(formula = abs ~ conc, data = stds)
>
>   Residuals:
>           1         2         3         4
>   -0.012600  0.006467  0.020667 -0.014533
>
>   Coefficients:
>                Estimate Std. Error t value Pr(>|t|)
>   (Intercept) 1.050e-02  1.629e-02   0.645  0.58527
>   conc        4.167e-04  3.378e-05  12.333  0.00651 **
>   ---
>   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>   Residual standard error: 0.02048 on 2 degrees of freedom
>   Multiple R-squared: 0.987, Adjusted R-squared: 0.9805
>   F-statistic: 152.1 on 1 and 2 DF, p-value: 0.00651
>
> Now I come with another dataset, which contains measured absorbance values of Bromide in solution:
>
>   brom
>       hours     abs
>   1    -1.0  0.0633
>   2     1.0  0.2686
>   3     5.0  0.2446
>   4    18.0  0.2274
>   5    29.0  0.2091
>   6    42.0  0.1961
>   7    53.0  0.1310
>   8    76.0  0.1504
>   9    91.0  0.1317
>   10   95.5  0.1169
>   11  101.0  0.0977
>   12  115.0  0.1023
>   13  123.5  0.0879
>   14  138.5  0.0724
>   15  147.5  0.0564
>   16  163.0  0.0495
>   17  171.0  0.0325
>   18  189.0  0.0182
>   19  211.0  0.0047
>   20  212.5      NA
>   21  815.5 -0.2112
>   22  816.5 -0.1896
>   23  817.5 -0.0783
>   24  818.5  0.2963
>   25  819.5  0.1448
>   26  839.5  0.0936
>   27  864.0  0.0560
>   28  888.0  0.0310
>   29  960.5  0.0056
>   30 1009.0 -0.0163
>
> The values in column brom$abs, measured at 30 successive points in time, need to be converted to Bromide concentrations, using the previously established relationship “linear1”. At first, I thought it could be done by:
>
>   predict.lm(linear1, brom$abs)
>   Error in eval(predvars, data, env) : numeric 'envir' arg not of length one
>
> But R gives the above error message. Then, after some searching around on different fora and R communities (including this one), I learned that the “newdata” in “predict.lm” actually needs to be coerced into a separate data frame. Thus:
>
>   mabs <- data.frame(Abs = brom$abs)
>   predict.lm(linear1, mabs)
>   Error in eval(expr, envir, enclos) : object 'conc' not found

There is no column named conc in your data frame mabs. You regressed abs on conc. For prediction you need data for conc and not abs. So provide data for conc. Or turn the regression around: lm(conc ~ abs, data=stds), if that makes any sense.

What you did with mabs wouldn't have worked anyway because Abs is not the same as abs. And it wasn't necessary.

Berend

> Again, R gives an error... probably because I made an error, but I truly fail to see where. I hope somebody can explain to me clearly what I'm doing wrong and what I should do instead.
>
> Any help is greatly appreciated, thanks!
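For readers following the thread, here is a minimal sketch of the two routes Berend mentions, assuming the calibration table reads as abs = -0.0021, 0.1003, 0.2395, 0.3293 at conc = 0, 200, 500, 800 (which matches the residuals in the summary output). The vector new_abs and the object names linear2, conc_inverse and conc_reverse are illustrative only, not from the original post. Whether inverting the fitted line or regressing conc on abs is the appropriate calibration method is a statistical judgment this sketch does not settle; the two give slightly different numbers.

```r
# Calibration standards as reconstructed from the post
stds <- data.frame(abs  = c(-0.0021, 0.1003, 0.2395, 0.3293),
                   conc = c(0, 200, 500, 800))

# Forward calibration, as in the post: absorbance as a function of concentration
linear1 <- lm(abs ~ conc, data = stds)

# A few absorbance readings taken from brom$abs (including the missing one)
new_abs <- c(0.0633, 0.2686, NA, -0.0163)

# Route 1: invert the fitted line by hand, conc = (abs - intercept) / slope
b <- coef(linear1)
conc_inverse <- unname((new_abs - b["(Intercept)"]) / b["conc"])

# Route 2: regress the other way round and predict from a data frame
# whose column name matches the predictor in the formula ('abs')
linear2 <- lm(conc ~ abs, data = stds)
conc_reverse <- predict(linear2, newdata = data.frame(abs = new_abs))

conc_inverse
conc_reverse
```

Note that the column in newdata has to be called abs, exactly as in the formula; a missing reading simply comes back as NA.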
Re: [R] ignore error getting next result
On Mar 27, 2012, at 2:18 PM, C Lin wrote:

> As a matter of fact, I did read the FAQ. However, in the FAQ coef() is used to return the coefficients of lm() if it succeeded. I cannot find a similar function for the p-value.

So your question has nothing to do with the subject line? If you are trying to get information about the object returned by the wilcox.test function, then you should be looking at the Value section of the help page for that function.

-- David.

David Winsemius, MD West Hartford, CT
Re: [R] Convert day of year back into a date format.
There may very well be a better solution, but this works:

  format(strptime(dayofyear, format="%j"), format="%m-%d")

On Tue, Mar 27, 2012 at 11:12 AM, Sam Albers tonightstheni...@gmail.com wrote:

> Hello,
>
> I am having trouble figuring out how to convert a Day of Year integer back into a Date format. For example I have the following:
>
>   date <- c('2008-01-01','2008-01-02','2008-01-03','2008-01-04','2008-01-05','2008-01-06','2008-01-07',
>             '2008-01-08','2008-01-09','2008-01-10','2008-01-11','2008-01-12','2008-01-13','2008-01-14','2008-01-15',
>             '2008-01-16','2008-01-17','2008-01-18','2008-01-19','2008-01-20','2008-01-21','2008-01-22','2008-01-23')
>
>   ## this is then converted into a number corresponding to the day of the year like so:
>   dayofyear <- strptime(date, format="%Y-%m-%d")$yday + 1
>
>   ## Now my question is how do I get back to a date format (obviously omitting the year).
>   ## The end result is that I'd like to be able to have axis labels as something like Month-Day or just Month
>   ## instead of just integers, which aren't always intuitive for people, but I can't seem to figure out how to tell R
>   ## to recognize an integer as a date.
>
> Any suggestions? Many thanks in advance!
>
> Sam
Re: [R] installing R 2.14.2
Hi,

On Tue, Mar 27, 2012 at 1:03 PM, Heba S abehsun...@hotmail.com wrote:

> Hello, I am trying to install a newer version of R (R 2.14.2) from this link: http://cran.r-project.org/bin/macosx/
> However, I am getting an error that it cannot be installed on my computer. My Mac is version 10.6.8. Can you please advise me what the problem is? I need the newer version to install the ggm package.

If you want any meaningful help, you'll have to provide the exact error that you're getting, so please reproduce the error message (verbatim) in your follow-up email. Also let us know when during the installation process the error occurs.

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] ignore error getting next result
I'm sorry. I do appreciate that you are trying to help. However, what I am trying to do is not exactly the same as in the FAQ. If I do the following:

  test2=list(numeric(0),c(10,20));
  test1=list(c(1),c(1,2,3,4));
  for (i in 1:2){
    tryCatch(wilcox.test(test1[[i]],test2[[i]]),error = function(e) NULL);
  }

I cannot get the p-value of the test for i=2. Any other input? Anyone?

Thanks, Lin

> CC: r-help@r-project.org
> From: dwinsem...@comcast.net
> To: bac...@hotmail.com
> Subject: Re: [R] ignore error getting next result
> Date: Tue, 27 Mar 2012 14:26:40 -0400
>
> On Mar 27, 2012, at 2:18 PM, C Lin wrote:
>
> > As a matter of fact, I did read the FAQ. However, in the FAQ coef() is used to return the coefficients of lm() if it succeeded. I cannot find a similar function for the p-value.
>
> So your question has nothing to do with the subject line? If you are trying to get information about the object returned by the wilcox.test function, then you should be looking at the Value section of the help page for that function.
>
> -- David.
Re: [R] ignore error getting next result
On Mar 27, 2012, at 2:36 PM, C Lin wrote:

> I'm sorry. I do appreciate that you are trying to help. However, what I am trying to do is not exactly the same as in the FAQ. If I do the following:
>
>   test2=list(numeric(0),c(10,20));
>   test1=list(c(1),c(1,2,3,4));
>   for (i in 1:2){
>     tryCatch(wilcox.test(test1[[i]],test2[[i]]),error = function(e) NULL);
>   }
>
> I cannot get the p-value of the test for i=2.

I say again: READ THE HELP PAGE FOR wilcox.test (and I even suggested the section where you would find the answer).

  test2=list(numeric(0),c(10,20));
  test1=list(c(1),c(1,2,3,4));
  res <- list()
  for (i in 1:2){
    res <- tryCatch(wilcox.test(test1[[i]],test2[[i]])$p.value, error = function(e) NULL);
  }
  res

-- David

> Any other input? Anyone?
>
> Thanks, Lin

David Winsemius, MD West Hartford, CT
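For anyone who wants one p-value per iteration rather than only the last one, a loop along the following lines may help. This is only a sketch: the vector name pvals and the choice of NA_real_ as the placeholder for a failed test are assumptions made for illustration, not something taken from the thread's code.

```r
test1 <- list(c(1), c(1, 2, 3, 4))
test2 <- list(numeric(0), c(10, 20))

# Preallocate one slot per iteration; failed tests stay NA
pvals <- rep(NA_real_, length(test1))

for (i in seq_along(test1)) {
  # The p-value lives in the 'p.value' component of the object
  # returned by wilcox.test (see the Value section of ?wilcox.test)
  pvals[i] <- tryCatch(
    wilcox.test(test1[[i]], test2[[i]])$p.value,
    error = function(e) NA_real_
  )
}

pvals
```

Using NA rather than NULL as the error value keeps the result a plain numeric vector with the same length as the input lists, so positions still line up with the iterations.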
Re: [R] Help on predict.lm
R tries hard to keep you from committing scientific abuse. As stated, your problem seems to me akin to

1. Given that a man's age can be modelled as a function of the grayness of his hair,
2. predict a man's age from the temperature in Barcelona.

Your calibration relates 'abs' and 'conc'. Now you want to predict 'abs' from _'hours'_ (I think). I suspect that concentration is actually related to time and this is the missing link that you'll have to provide.

BTW, I'm surprised that you didn't find the requirement for 'newdata' to be a data frame on the predict.lm help page - it's pretty clearly stated there.

Peter Ehlers
Re: [R] SVM. How to use categorical attributes?
Hi,

On Tue, Mar 27, 2012 at 6:05 AM, Alekseiy Beloshitskiy abeloshits...@velti.com wrote:

> Hi All,
>
> Here is the case. I want to build a classification model (SVM). Some of the variables for this model are categorical attributes which represent words (usually 3-10 words, the query for a search in Google). For example:
>
>   search_id | query_words                | .. | result
>   ----------+----------------------------+----+-------
>   1         | how,to,grow,tree           | .. | 4
>   2         | smartfone,htc,buy,price    | .. | 7
>   3         | buy,house,realty,london    | .. | 6
>   4         | where,to,go,weekend,cinema | .. | 4
>   ...
>
> As you can see, words in the query are unordered and may occur in different queries. The total number of unique words across all queries is several thousand.
>
> The question is how to represent this variable (query_words) for use in an SVM.
>
> Thank you for any advice!

One approach is to wire up a "bag of words" type of design matrix. That is to say, the matrix has as many columns as there are unique words. Each row is an observation (query), and the words that appear in the query have a value of 1 (or you can count the number of times each word appears).

You can maybe get smarter and try to group like words together, but ... now you'll have two problems ...

Hope you have lots of data!

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
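The bag-of-words matrix described above can be sketched in base R roughly as follows. The four queries are the ones shown in the question; the object names (queries, words, vocab, X) are made up for illustration, and with several thousand unique words a sparse representation would probably be preferable to the dense matrix built here.

```r
queries <- c("how,to,grow,tree",
             "smartfone,htc,buy,price",
             "buy,house,realty,london",
             "where,to,go,weekend,cinema")

# Split each query into its words
words <- strsplit(queries, ",", fixed = TRUE)

# The vocabulary: every unique word across all queries
vocab <- sort(unique(unlist(words)))

# One row per query, one column per vocabulary word, entries are counts
X <- t(vapply(words,
              function(w) as.numeric(table(factor(w, levels = vocab))),
              numeric(length(vocab))))
colnames(X) <- vocab

dim(X)
X[, c("buy", "to", "tree")]
```

A numeric matrix like X, together with the result column as the response, is the kind of input an SVM routine such as e1071::svm accepts; that pairing is a suggestion, not something the thread specifies.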
Re: [R] Help on predict.lm
FORTUNE!!!

-- Bert

On Tue, Mar 27, 2012 at 11:44 AM, Peter Ehlers ehl...@ucalgary.ca wrote:

> R tries hard to keep you from committing scientific abuse. As stated, your problem seems to me akin to
>
> 1. Given that a man's age can be modelled as a function of the grayness of his hair,
> 2. predict a man's age from the temperature in Barcelona.
>
> ...

Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] ignore error getting next result
> test2=list(numeric(0),c(10,20));
> test1=list(c(1),c(1,2,3,4));
> for (i in 1:2){
>   tryCatch(wilcox.test(test1[[i]],test2[[i]]),error = function(e) NULL);
> }
>
> I cannot get the p-value of the test for i=2.

You didn't store the results of wilcox.test anywhere. First make it work for data that does not cause errors in wilcox.test:

  f0 <- function(list1, list2) {
    stopifnot(length(list1) == length(list2))
    sapply(seq_along(list1), function(i) wilcox.test(list1[[i]], list2[[i]])$p.value)
  }
  f0(list(1:4, 5:7), list(11:12, (4:6)+.9))
  [1] 0.133 0.700

Then add the call to tryCatch so it works when there is a problem. I use NA instead of NULL as the output of the error function so it goes into the vector of p-values. Use NULL if you are returning the whole output of wilcox.test instead of just the p.value component.

  f1 <- function(list1, list2) {
    stopifnot(length(list1) == length(list2))
    sapply(seq_along(list1), function(i) tryCatch(
      wilcox.test(list1[[i]], list2[[i]])$p.value,
      error = function(e) NA_real_))
  }
  f1(list(1:4, 5:7), list(11:12, (4:6)+.9))
  [1] 0.133 0.700
  f1(list(1:4, numeric(0), 5:7), list(11:12, 17, (4:6)+.9))
  [1] 0.133    NA 0.700

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
[R] ZAGA predictions in GAMLSS
Hello,

I am modelling positive continuous data (including zeros) using the ZAGA distribution in GAMLSS and want to use the model for predictions. My final model includes smoothers (pb()) for the mu and nu parameters. At first I blindly used the default options for prediction, but noticed that I do not get any predicted values at or close to zero. Knowing this cannot be true, I learned that I also need the predictions for the other parameters (and not only mu, as done by default), which I can extract e.g. with predictAll.

My question is how to combine all the parameter values to calculate the expected value for one observation. It seems the function 'meanZAGA' does what I want, but not for new data. I tried to calculate by hand the values I received from meanZAGA, in order to repeat the calculation for predictions with new data, but I do not understand how to do it.

I would appreciate any advice. Thank you very much!

Cheers,
Astrid
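One possible reading of the question, offered as a sketch rather than an answer from the list: if ZAGA is parameterised so that nu is the probability of an exact zero and the positive part is a gamma distribution with mean mu, then the mean of the mixture is (1 - nu) * mu, and sigma does not enter the mean. The helper zaga_mean and the example values below are hypothetical; in practice mu and nu would be the response-scale predictions for the new data, for example from predictAll with type = "response", which is assumed here rather than shown.

```r
# Mean of a zero-adjusted gamma under the assumed parameterisation:
# P(Y = 0) = nu, and Y | Y > 0 is gamma with mean mu
zaga_mean <- function(mu, nu) {
  stopifnot(all(nu >= 0 & nu <= 1), all(mu > 0))
  (1 - nu) * mu
}

# Hypothetical response-scale predictions for three new observations
mu_hat <- c(2.0, 5.0, 0.8)
nu_hat <- c(0.25, 0.10, 0.90)

expected <- zaga_mean(mu_hat, nu_hat)
expected
```

A quick sanity check is to apply the same formula to the fitted mu and nu of the original data and compare the result with what meanZAGA returns for the fitted model.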
Re: [R] read.octave fails with data from Octave 3.2.X
I wrote in my previous message the following Octave code:

  octave:1> x=1;
  octave:2> save -ascii testdata.mat x

Forget the -ascii. It should be -text or nothing (-text is the default).

By the way, read.octave() does not really fail (it does return a value), but the result is somewhat corrupted: it contains the exported x variable, plus other empty elements corresponding to the blank lines, I think.

Helios

INSTITUTO DE BIOMECÁNICA DE VALENCIA
Universidad Politécnica de Valencia • Edificio 9C
Camino de Vera s/n • 46022 VALENCIA (ESPAÑA)
Tel. +34 96 387 91 60 • Fax +34 96 387 91 69
www.ibv.org
Re: [R] Plotting patient drug timelines using ggplot2 (or some other means) -- Help!!!
Hello Dr. Winsemius,

Not sure how or if the use of NAs you describe applies to my case. I'll go back to this again when the ggplot2 book arrives. It may be that this will provide a helpful insight then.

Thanks,
Paul

--- On Fri, 3/23/12, David Winsemius dwinsem...@comcast.net wrote:

> From: David Winsemius dwinsem...@comcast.net
> Subject: Re: [R] Plotting patient drug timelines using ggplot2 (or some other means) -- Help!!!
> To: Paul Miller pjmiller...@yahoo.com
> Cc: R. Michael Weylandt michael.weyla...@gmail.com, Petr PIKAL petr.pi...@precheza.cz, Bert Gunter gunter.ber...@gene.com, r-help@r-project.org
> Received: Friday, March 23, 2012, 1:23 PM
>
> On Mar 23, 2012, at 2:15 PM, Paul Miller wrote:
>
> > Hi Michael and Petr,
> >
> > I apologize for my failure to grasp what you were saying. My code is up and running now.
> >
> > I noticed what might be a shortcoming of my ggplot code. I have some instances where a drug starts and stops and then starts and stops again. It looks like my graphs show just a single unbroken line segment though.
>
> Put in NA entries at times you do not want plotted. Not sure exactly how that gets handled in ggplot, but since plotting nothing was the usual behavior in base and lattice graphics, I would think that would have gotten carried over.
>
> > I ordered Hadley Wickham's ggplot2 book earlier today. So hopefully I'll be able to figure that out myself once the book arrives.
> >
> > Thank you Michael, Petr, and Bert for your help with this. Thanks especially to Michael for patiently answering all my questions over the last day or so.
> >
> > Paul
>
> David Winsemius, MD West Hartford, CT
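To illustrate the NA suggestion in base graphics only (the thread's ggplot2 code is not shown, so nothing here is taken from it), the following sketch uses made-up exposure data: one drug given on study days 0 to 10 and again on days 20 to 30. The NA row between the two periods is what stops the line from being drawn across the gap.

```r
# Hypothetical dosing record: on drug for days 0-10, off, then on again for days 20-30
days  <- c(0, 10, NA, 20, 30)
level <- c(1, 1, NA, 1, 1)

pdf(NULL)   # null device, so the sketch runs without opening a window or writing a file
plot(days, level, type = "l", lwd = 4, yaxt = "n",
     xlab = "Study day", ylab = "", main = "Drug A exposure")
invisible(dev.off())

# Number of separate line segments implied by the NA breaks
n_segments <- sum(rle(!is.na(days))$values)
n_segments
```

Without the NA entries the same call would draw a single unbroken line from day 0 to day 30, which is the symptom described in the quoted message.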
Re: [R] Convert day of year back into a date format.
On 27/03/2012 19:30, Justin Haynes wrote:

> There may very well be a better solution, but this works.
>
> format(strptime(dayofyear, format="%j"), format="%m-%d")

The answer depends on the year (think leap years), so I think you need

strptime(paste(2008, dayofyear), format="%Y %j")

Probably a better idea is

as.Date(dayofyear - 1, origin = "2008-01-01")

(as Jan 1 is day 1).

> On Tue, Mar 27, 2012 at 11:12 AM, Sam Albers tonightstheni...@gmail.com wrote:
>
>> Hello,
>>
>> I am having trouble figuring out how to convert a Day of Year integer back into a Date format. For example I have the following:
>>
>> date <- c('2008-01-01','2008-01-02','2008-01-03','2008-01-04','2008-01-05','2008-01-06','2008-01-07',
>>           '2008-01-08','2008-01-09','2008-01-10','2008-01-11','2008-01-12','2008-01-13','2008-01-14','2008-01-15',
>>           '2008-01-16','2008-01-17','2008-01-18','2008-01-19','2008-01-20','2008-01-21','2008-01-22','2008-01-23')
>>
>> ## this is then converted into a number corresponding to the day of the year like so:
>> dayofyear <- strptime(date, format="%Y-%m-%d")$yday + 1
>>
>> ## Now my question is how do I get back to a date format (obviously omitting the year).
>> ## The end result is that I'd like to be able to have axis labels as something like Month-Day or just Month
>> ## instead of just integers, which aren't always intuitive for people, but I can't seem to figure out how to tell R
>> ## to recognize an integer as a date.
>>
>> Any suggestions? Many thanks in advance!
>>
>> Sam

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
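A small sketch of the as.Date() approach from the reply above, combined with format() to get the Month-Day labels the original question asked for. The day numbers are illustrative values chosen here, not data from the thread; 2008 is a leap year, so day 60 falls on 29 February.

```r
# Illustrative day-of-year values (assumed, not from the original post)
dayofyear <- c(1, 32, 60, 366)

# Jan 1 is day 1, so subtract 1 before adding to the origin
dates <- as.Date(dayofyear - 1, origin = "2008-01-01")

# Month-Day labels suitable for an axis, with the year omitted
labels <- format(dates, "%m-%d")
print(labels)
```

The labels could then be passed to axis() via its labels argument, with the integer day numbers as the tick positions.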
[R] survplot function
Dear R-helpers,

I am wondering if there is an option to the survplot function in the Design package that allows for drawing Kaplan-Meier plots starting from 0 instead of 1, similar to fun = 'event' in the standard plotting function used on a survfit object. I apologize in advance if I have missed an obvious source of information, but I really didn't find anything in the documentation.

Best regards

Thorsten Raff

-- 
Thorsten Raff
2nd Medical Department, University Hospital Schleswig-Holstein, Campus Kiel
Chemnitzstraße 33
24116 Kiel
GERMANY
phone: +49 431 1697-5234
fax: +49 431 1697-1264
email: t.raffatmed2.uni-kiel.de
web: www.uk-sh.de
[R] How to change the color of tcltk widget background color
Hi,

I'm a beginner with the tcltk package. I'm building a GUI for some functions and want to change the background color, which is grey by default. Could anyone who knows how to change it please show me how? Furthermore, is there a good manual for tcltk?

Thanks.

-- View this message in context: http://r.789695.n4.nabble.com/How-to-change-the-color-of-tcltk-widget-background-color-tp4509989p4509989.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] What error distribution should I use?
Lívia Dorneles Audino livia.audino at gmail.com writes:

> I'm trying to make a glmm to identify the relationship between insect species richness and fragment size, isolation and time (different years). I already tried to analyse it using a Poisson error distribution, but I always get the following warning:
>
> glm.fit: fitted probabilities numerically 0 or 1 occurred
>
> This is probably happening because my dataset has a lot of zeros. So, what error distribution should I use?

I know you haven't gotten a lot of help on r-sig-mixed-models (sorry), but it would probably be better to post this question there.

The answer is that this is a warning, not an error, so it indicates a need for caution but not necessarily that anything is wrong. In this case, an internal call to glm.fit() has difficulty when it tries to fit a subset of the data that is all-zero or all-one. It's quite possibly OK, provided that you've looked at your results, plotted predicted values, etc., and everything seems to make sense.
Re: [R] Year of data collection for 'diamonds' dataset in ggplot2
I believe it was 2008.

Hadley

On Mon, Mar 26, 2012 at 11:46 AM, Marina Doucerain marinadoucer...@gmail.com wrote:

> Hello,
>
> I'm wondering what was the year (or year range) of collection for the data included in the 'diamonds' dataset in ggplot2. This information would be very helpful in interpreting the 'price' variable.
>
> Thank you!
>
> Marina Doucerain

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Re: [R] calling java from R and using java time series double precision array
I solved some of my problems, but the one that remains is that reading the two-dimensional arrays into R transposes the matrix. The arrays I want to read are unequal-interval multiple time series, with the first column being the times, which are converted in Java from calendar form (CnYrMoDaHrMnScDCMQ, CnYrMoDa, or anything in between) to linear time. What R programs do I use to plot and analyse this kind of time series? How can I prevent the transpose?

THIS IS THE JAVA CODE:

public class Transf2R {
    Transf2R transf2R;

    public static void main(String[] args) {
        Transf2R transf2R = new Transf2R();
        transf2R.transf2R = transf2R;
        transf2R.transf2R.main2();
    }

    public static void main2() {
        double[][] arRet = arReturnMethod();
        for (int i = 0; i < 9; i++) {
            for (int j = 0; j < 3; j++) System.out.print((int) con2Arr[i][j] + ",");
            System.out.println();
        }
        for (int i = 0; i < 9; i++) {
            for (int j = 0; j < 3; j++) System.out.print((int) arRet[i][j] + ",");
            System.out.println();
        }
    }

    public final static double con0dbl = 10001;
    public final static double[] con1Vec = new double[] { 10001, 10002, 10003, 10004, 10005, 10006 };
    public final static double[][] con2Arr = new double[][] {
        { 10001, 10002, 10003 }, { 20001, 20002, 20003 }, { 30001, 30002, 30003 },
        { 40001, 40002, 40003 }, { 50001, 50002, 50003 }, { 60001, 60002, 60003 },
        { 70001, 70002, 70003 }, { 80001, 80002, 80003 }, { 90001, 90002, 90003 } };

    public final static double[][] arReturnMethod() {
        double[][] retArr = new double[9][3];
        for (int i = 0; i < 9; i++) for (int j = 0; j < 3; j++) retArr[i][j] = (i + 1) * 1000 + j + 1;
        return (retArr);
    }

    public final static double[][] dbl2DimArRet4R(double[][] dbl2DimAr4R) { return (dbl2DimAr4R); }
    public final static double[] dbl1DimVcRet4R(double[] dbl1DimVc4R) { return (dbl1DimVc4R); }
    public final static double dblRet4R(double dbl4R) { return (dbl4R); }
    public final static double dblNum4R = Math.PI;
}

WHICH PRODUCES THIS UPON RUNNING MAIN():

10001,10002,10003,
20001,20002,20003,
30001,30002,30003,
40001,40002,40003,
50001,50002,50003,
60001,60002,60003,
70001,70002,70003,
80001,80002,80003,
90001,90002,90003,
1001,1002,1003,
2001,2002,2003,
3001,3002,3003,
4001,4002,4003,
5001,5002,5003,
6001,6002,6003,
7001,7002,7003,
8001,8002,8003,
9001,9002,9003,

I FINALLY FIGURED OUT SOME R CODE THAT DEMONSTRATES WHAT I WANT TO DO:

library(rJava)   # loads package
.jinit()         # starts JVM
[1] 0
.jaddClassPath("C:/ad/j")
print(.jclassPath())
[1] "C:\\Users\\ENVY17\\Documents\\R\\win-library\\2.13\\rJava\\java" "C:\\ad\\j"
trnsfer2R <- .jnew("Transf2R")   # creates link to java class
arj9x3Ret <- sapply(.jcall(trnsfer2R, returnSig="[[D", "arReturnMethod"), .jevalArray)
print(arj9x3Ret)   # note: row and column indices get interchanged
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1001 2001 3001 4001 5001 6001 7001 8001 9001
[2,] 1002 2002 3002 4002 5002 6002 7002 8002 9002
[3,] 1003 2003 3003 4003 5003 6003 7003 8003 9003
dblNum <- .jcall(trnsfer2R, returnSig="D", "dblRet4R", trnsfer2R$dblNum4R)
print(dblNum, digits=20)
[1] 3.141592653589793116
conn1Vec <- .jcall(trnsfer2R, returnSig="[D", "dbl1DimVcRet4R", trnsfer2R$con1Vec)   # con1Vec is java one dim array of double precision constants
print(conn1Vec)
[1] 10001 10002 10003 10004 10005 10006
conn2Arr <- sapply(.jcall(trnsfer2R, returnSig="[[D", "dbl2DimArRet4R", .jfield(trnsfer2R, "[[D", "con2Arr", convert=F)), .jevalArray)
print(conn2Arr)
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]
[1,] 10001 20001 30001 40001 50001 60001 70001 80001 90001
[2,] 10002 20002 30002 40002 50002 60002 70002 80002 90002
[3,] 10003 20003 30003 40003 50003 60003 70003 80003 90003

BUT THE TWO-DIMENSIONAL ARRAYS SEEM TO BE TRANSPOSED.

-- View this message in context: http://r.789695.n4.nabble.com/calling-java-from-R-and-using-java-time-series-double-precision-array-tp4494581p4510410.html Sent from the R help mailing list archive at Nabble.com.
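A possible explanation for the transposition, offered as a sketch rather than a statement about rJava internals: sapply() places the result for each list element in its own column, so if each element is one row of the Java array, rows end up as columns. The list below is a plain R stand-in (an assumption) for what the .jcall()/.jevalArray step yields, so that the example runs without a JVM.

```r
# Stand-in for the evaluated Java rows: one numeric vector per Java row
rows <- list(c(1001, 1002, 1003),
             c(2001, 2002, 2003),
             c(3001, 3002, 3003),
             c(4001, 4002, 4003))

# sapply() binds each element as a COLUMN, giving a 3 x 4 matrix
as_cols <- sapply(rows, identity)
print(dim(as_cols))

# Option 1: transpose the result back to the Java layout (4 x 3)
fixed_t <- t(as_cols)

# Option 2: bind the elements as rows in the first place
fixed_rbind <- do.call(rbind, rows)

print(fixed_t)
```

Applied to the session in the post, this would amount to wrapping the sapply(...) call in t(); that step is untested here because it needs the Java class on the class path.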
Re: [R] What error distribution should I use?
On 2012-03-27 15:11, Ben Bolker wrote:

> Lívia Dorneles Audino livia.audino at gmail.com writes:
>
>> I'm trying to make a glmm to identify the relationship between insect species richness and fragment size, isolation and time (different years). I already tried to analyse it using a Poisson error distribution, but I always get the following warning:
>>
>> glm.fit: fitted probabilities numerically 0 or 1 occurred
>>
>> This is probably happening because my dataset has a lot of zeros. So, what error distribution should I use?
>
> I know you haven't gotten a lot of help on r-sig-mixed-models (sorry), but it would probably be better to post this question there.
>
> The answer is that this is a warning, not an error, so it indicates a need for caution but not necessarily that anything is wrong. In this case, an internal call to glm.fit() has difficulty when it tries to fit a subset of the data that is all-zero or all-one. It's quite possibly OK, provided that you've looked at your results, plotted predicted values, etc., and everything seems to make sense.

Livia: You might also find this quite extensive recent post from Ted Harding informative:

https://stat.ethz.ch/pipermail/r-help/2012-March/307352.html

Peter Ehlers
Re: [R] Rgdal package - get information
There is a mailing list, R-Sig-Geo, which is more appropriate for questions about rgdal and related packages.

If by "read the information GDType" you mean to get that Int16 description, you can get it by delving into the attributes of the GDALinfo return value, for example:

f <- system.file("pictures/erdas_spnad83.tif", package = "rgdal")[1]
attr(GDALinfo(f), "df")["GDType"]
  GDType
1   Byte

In your case that would be

attr(GDALinfo("MOD13Q1.A2001049.h13v11.005.2007002215512.250m_16_days_EVI.tif"), "df")["GDType"]

If you just mean to read the data into R, then use readGDAL from the rgdal package. Extensions to this support that simplify some matters are available in the raster package.

Cheers, Mike.

On Wed, Mar 28, 2012 at 3:40 AM, julio cesar oliveira oliveir...@ufv.br wrote:

> Hi,
>
> I used GDALinfo("MOD13Q1.A2001049.h13v11.005.2007002215512.250m_16_days_EVI.tif") and got the results:
>
> rows        10
> columns     11
> bands       1
> origin.x    150701.4
> origin.y    7744897
> res.x       250
> res.y       250
> ysign       -1
> oblique.x   0
> oblique.y   0
> driver      GTiff
> projection  +proj=utm +zone=23 +south +datum=WGS84 +units=m +no_defs
> file        /MOD13Q1.A2001049.h13v11.005.2007002215512.250m_16_days_EVI.tif
> apparent band summary:
>   GDType   Bmin  Bmax Bmean Bsd hasNoDataValue NoDataValue
> 1  Int16 -32768 32767     0   0          FALSE           0
> Metadata:
> AREA_OR_POINT=Point
> TIFFTAG_SOFTWARE=MODIS Reprojection Tool v4.1 March 2009
>
> *How to read the information GDType?*
>
> Thanks,
>
> julio

-- 
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com
Re: [R] assigning vector or matrix sparsely (for use with mclapply)
It is (at least for me) really unclear what the problem is, or how it's related to mclapply. You say

> this works fine, except that I want to get NA's in the return positions that were not recalculated. then, I can write
>
> newdata$y <- ifelse( is.na(olddata$y), mc.byselectrows( olddata, is.na(olddata$y), fun.calc.y ), olddata$y )

Why??? Are you applying the function twice? Then why not simply

v1.1 <- mc.byselectrows( d, loc1, function(x) x[,2]^2 )

the second time? If the problem is in keeping track of which rows got calculated, why not rename with the row.names omitted after mclapply (probably a good idea anyway):

  FUN.ON.ROWS <- function(.index, ...) as.matrix(FUN(data.notdone[.index,], ...))
  soln <- mclapply( as.list(1:nrow(data.notdone)) , FUN.ON.ROWS, ... )
  rv <- do.call(rbind, soln) ## omits naming.
  if (ncol(rv)==1){ rv <- as.vector(rv) ; names(rv) <- row.names(data.notdone) } else rownames(rv) <- row.names(data.notdone)
  rv
}

And finally, you don't even need row.names for c(v1,d[loc1,2]).

Or am I missing something here?

BTW your code uses cat.stderr (which is local?) instead of cat, and has no call to multicore.

Cheers

On Mon, Mar 26, 2012 at 4:28 PM, ivo welch ivo.we...@gmail.com wrote:

> Dear R wizards---
>
> I have a wrapper on mclapply() that makes it a little easier for me to do multiprocessing. (Posting this may make life easier for other googlers.) I pass a data frame, a vector that tells me what rows should be recomputed, and the function; and I get back a vector or matrix of answers.
>
> d <- data.frame( id=1:6, val=11:16 )
> loc <- c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE)
> v1 <- mc.byselectrows( d, loc, function(x) x[,2]^2 )
> v2 <- mc.byselectrows(d, loc, function(x) cbind(x[,2]^2,x[,2]^3))
>
> mc.byselectrows <- function(data.in, recalclist, FUN, ...) {
>   data.notdone <- data.in[recalclist,]
>   cat.stderr("[mc.byselectrows: ", nrow(data.notdone), "rows to be recomputed out of", nrow(data.in), "]\n")
>   FUN.ON.ROWS <- function(.index, ...) as.matrix(FUN(data.notdone[.index,], ...))
>   soln <- mclapply( as.list(1:nrow(data.notdone)) , FUN.ON.ROWS, ... )
>   rv <- do.call(rbind, soln) ## omits naming.
>   if (ncol(rv)==1) rv <- as.vector(rv)
>   rv
> }
>
> this works fine, except that I want to get NA's in the return positions that were not recalculated. then, I can write
>
> newdata$y <- ifelse( is.na(olddata$y), mc.byselectrows( olddata, is.na(olddata$y), fun.calc.y ), olddata$y )
>
> I can do this very inelegantly, of course. I can merge recalclist into data.in and then write a loop that substitutes for the do.call to rbind. yikes. or I could do the recalclist contingency inside the FUN.ON.ROWS, but this is costly in terms of execution time.
>
> are there obvious solutions? advice appreciated.
>
> regards,
>
> /iaw
> Ivo Welch (ivo.we...@gmail.com)
[R] Is it possible to de-select with sqlQuery from the RODBC library?
Dear R-list,

I'm querying a M$ Access database with the sqlQuery function from the RODBC library. As I cannot make a working example with a database, here is an illustrative example:

library(RODBC)
mdbConnect <- odbcConnectAccess("S:/data/ ... /databse.mdb")
data <- sqlQuery(mdbConnect, "select id, DOB, V1, V2, ..., V1009, V1011, V1013 from someTable")

I want everything in the table (someTable) except 'V1010' and 'V1012', but I can't figure out how to make a negative or reverse SQL select statement. I have a lot of someTables, and I have two or three variables in each table that I do not want R to fetch. Is there a way to define a reverse select in SQL? One would imagine it would look something like this:

data <- sqlQuery(mdbConnect, "deselect V1010, V1012 from someTable")

Thanks,

Eric
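SQL itself has no "deselect" statement, so one workaround is to build the column list in R and paste it into an ordinary select. The sketch below only constructs the query string; the column vector is a hypothetical stand-in for the table's real column names, which would have to be obtained from the database first (for example from the COLUMN_NAME column returned by RODBC's sqlColumns(), which is not exercised here).

```r
# Hypothetical column names standing in for those of someTable
all_cols  <- c("id", "DOB", "V1", "V2", "V1009", "V1010", "V1011", "V1012", "V1013")

# Columns that should NOT be fetched
drop_cols <- c("V1010", "V1012")

# Keep everything else, preserving the original order
keep_cols <- setdiff(all_cols, drop_cols)

# Assemble an ordinary SELECT statement from the remaining names
query <- paste("SELECT", paste(keep_cols, collapse = ", "), "FROM someTable")
print(query)
```

The resulting string could then be passed to sqlQuery(mdbConnect, query). A simpler alternative, if fetching the unwanted columns is affordable, is to select everything and drop them afterwards with data[, !(names(data) %in% drop_cols)].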
Re: [R] assigning vector or matrix sparsely (for use with mclapply)
I wasn't thinking straight.

old.data= 11:20
recalc.please= (old.data%%2==0)
old.data[recalc.please]
[1] 12 14 16 18 20
new.data[recalc.please]= old.data[recalc.please]^2
Error in new.data[recalc.please] = old.data[recalc.please]^2 : object 'new.data' not found

# this is where I had given up, but the following works:

new.data=old.data
new.data[recalc.please]= old.data[recalc.please]^2
new.data
[1] 11 144 13 196 15 256 17 324 19 400

sorry, guys.

/iaw
Ivo Welch (ivo.we...@gmail.com)
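The same indexed-assignment idea can be folded into the wrapper so that it returns NA in the positions that were not recalculated. This is a minimal sketch under assumptions: by_select_rows is a hypothetical name, lapply() stands in for mclapply() so the example runs on any platform, and only the case where FUN returns a single number per row is handled.

```r
# Hypothetical variant of the wrapper: full-length result, NA where not recalculated
by_select_rows <- function(data, recalc, FUN, ...) {
  todo <- data[recalc, , drop = FALSE]
  # lapply() used here; mclapply() could be swapped in where forking is available
  pieces <- lapply(seq_len(nrow(todo)),
                   function(i) FUN(todo[i, , drop = FALSE], ...))
  out <- rep(NA_real_, nrow(data))   # start with NA everywhere
  out[recalc] <- unlist(pieces)      # fill only the recalculated rows
  out
}

d   <- data.frame(id = 1:6, val = 11:16)
loc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
v1  <- by_select_rows(d, loc, function(x) x[, 2]^2)
print(v1)
```

Because the result has the same length as the input, it can be combined with the old values directly, for example with ifelse(is.na(v1), old, v1).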
[R] One last thing
Dear R,

Thanks for helping me locate the source for the StructTS method from stats, but I've run into a roadblock in reverse engineering it to locate a formula for its forecasting, because it calls some compiled C code, a function called KalmanLike. I've looked through the R library in which the StructTS code is located and could not find it.

Sincerely,

Alexander Fretheim
Re: [R] Zero inflated GAMM
Bert,

Try posting on the R-sig-ME list for help with mixed models.

Cheers,

Neil

On Wed, Mar 28, 2012 at 1:16 AM, Bert Harris aramidop...@gmail.com wrote:

> Hi all,
>
> I am planning to get Zuur et al.'s new book when it comes out, but until then I was wondering if anyone could suggest examples of zero-inflated or hurdle GAMMs. I have count data with many zeros, non-linear relationships, and site as a random effect.
>
> Thank you!
>
> Bert Harris, University of Adelaide
Re: [R] R extract parts
> Hello, my idea is to get results like this:
>
> user, sector, source, destine, count, average
> 7 1 22 22 4 186.25 # (109+100+214+322)
> 7 2 161 97 1 68
> 7 2 97 97 1 196
> 7 2 97 22 1 427
> 7 2 22 22 2 383

Your second column, 'sector', comes from where? What is it? Without it, try this.

text="
user pos communications source v_destine
7 1 109 22 22
7 2 100 22 22
7 3 214 22 22
7 4 322 22 22
7 5 69920 22 161
7 6 68 161 97
7 7 196 97 97
7 8 427 97 22
7 9 460 22 22
7 10 307 22 22
7 11 9582 22 22
7 12 55428 22 22
7 13 9192 22 22
7 14 19 22 22
"
df1 <- read.table(textConnection(text), header=TRUE)

inx <- df1$comm > 1000
comm1000 <- cumsum(inx)

result <- split(df1[!inx, ], list(comm1000[!inx], df1$source[!inx], df1$v_destine[!inx]))
result <- sapply(result, function(x) c(x$user[1], x$source[1], x$v_destine[1], nrow(x), mean(x$comm)))
result <- na.exclude(t(result))
rownames(result) <- 1:nrow(result)
colnames(result) <- c("user", "source", "v_destine", "count", "average")
attr(result, "na.action") <- NULL
attr(result, "class") <- NULL
result

Hope this helps,

Rui Barradas

-- View this message in context: http://r.789695.n4.nabble.com/R-extract-parts-tp4509042p4510566.html Sent from the R help mailing list archive at Nabble.com.
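The key step in the code above is cumsum() on a logical vector: every value over the threshold bumps a running counter, so the rows between two large values share one block id. A reduced sketch with a shortened, made-up communications vector (not the full data from the post) shows the idea in isolation.

```r
# Shortened illustrative vector: values above 1000 act as block separators
comm <- c(109, 100, 69920, 68, 196, 9582, 19)

big   <- comm > 1000    # TRUE at the separator rows
block <- cumsum(big)    # running block id: 0 0 1 1 1 2 2

# Average the non-separator values within each block
block_means <- tapply(comm[!big], block[!big], mean)
print(block_means)
```

In the full example the block id is combined with source and v_destine in split(), so each block is further divided by route before counting and averaging.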
Re: [R] What error distribution should I use?
Could you please post a small example of your data and the code which gives you this warning? Your assumed error distribution sounds reasonable. I am interested as to why you have zeros... you have sites with species richness == 0?

Lívia Dorneles Audino wrote

> I'm trying to make a glmm to identify the relationship between insect species richness and fragment size, isolation and time (different years). I already tried to analyse it using a Poisson error distribution, but I always get the following warning:
>
> glm.fit: fitted probabilities numerically 0 or 1 occurred
>
> This is probably happening because my dataset has a lot of zeros. So, what error distribution should I use?
>
> --
> Lívia

-- View this message in context: http://r.789695.n4.nabble.com/What-error-distribution-should-I-use-tp4509479p4510351.html Sent from the R help mailing list archive at Nabble.com.