[R] Does replacing some values of a zoo object by NA reduce it's size ?
Dear R-helpers, Please have a look at the following. f1 is the same as f2 except that it has some values replaced by NA. But it's corresponding file is slightly bigger than the file containing f2. Could someone please tell me if this is an anomaly ? load(file1) ls() [1] f1 load(file2) ls() [1] f1 f2 dput(f1) structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, NA, NA, NA, 2, 2, 2, NA, NA, NA, NA), .Dim = c(10L, 2L), .Dimnames = list(NULL, c(v1, l1)), index = 1:10, class = zoo) dput(f2) structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), .Dim = c(10L, 2L), .Dimnames = list(NULL, c(v1, l1)), index = 1:10, class = zoo) -rw-r--r-- 1 ashimkapoor ashimkapoor 192 2011-09-27 11:08 file1 -rw-r--r-- 1 ashimkapoor ashimkapoor 179 2011-09-27 11:08 file2 Best Regards, Ashim. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] two-way anova help
Hi Paul, There should not be any problem. Here is how I visualize the data table looks like: Obs Male_type  Female_type Response 1 1 1 34 2 1 1 44 3 1 2 38 ⦠⦠⦠⦠If your data frame has the above structure, R will automatically understand that there is replication. Your model form will remain the same. Regards, Indrajit From: Austin Paul austi...@usc.edu Cc: r-help@r-project.org r-help@r-project.org Sent: Tuesday, September 27, 2011 10:57 AM Subject: Re: [R] two-way anova help Hi, Yes. As I explained, the three male and three female types were crossed in all combinations (9 ways). For each of the 9 types, I have 5 replicate tanks (45 total tanks). And from each of the 45 tanks I have 50 observations for size. So the 5 replicates are somehow nested within the two-way interaction? If there was just 1 tank for each of the 9 crosses, yes, it would be very easy to code the two-way anova. It may still be very easy, but I'm not quite sure how to account for the replicate tanks. Hope this makes more sense. Austin On Mon, Sep 26, 2011 at 10:21 PM, Indrajit Sengupta indra_cali...@yahoo.com wrote: Can you explain what do you mean by 5 replicate tanks? Doing a two way anova is very simple in R. You would need to fit a linear model (lm function). Eg.: model - lm(y ~ male + female + male:female, data =) Regards, Indrajit From: Austin Paul austi...@usc.edu To: r-help@r-project.org Sent: Tuesday, September 27, 2011 6:13 AM Subject: [R] two-way anova help Hello, I am having some trouble coding a two-way anova due to replicated treatments. I have a factorial design with three male parents and three female parents. They were mated in all combinations and their babies were grown out and measured for size. 50 babies were measured for each of the 9 crosses. If I stopped here, I would have no troubles. But I also have 5 replicate tanks for each of the 9 crosses. My question is how to I code in the 5 replicate tanks per treatment? Thanks, Austin    [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] two-way anova help
Somehow the table didn't come in the email as I had visualized. But I hope you are able to understand the columns. They are 1. Obs number 2. Male Type 3. Female Type 4. Response variable Male type can take three values 1, 2 and 3 Female type can take three values 1, 2 and 3 Note that to consider the above variables as factors and not covariates - you need to apply as.factor to those variables in R. Then you can run fit the linear model with or without interaction term and study the ANOVA table. Hope this helps. Regards, Indrajit To: Austin Paul austi...@usc.edu Cc: r-help@r-project.org r-help@r-project.org Sent: Tuesday, September 27, 2011 12:10 PM Subject: Re: [R] two-way anova help Hi Paul, There should not be any problem. Here is how I visualize the data table looks like: Obs Male_type  Female_type Response 1 1 1 34 2 1 1 44 3 1 2 38 ⦠⦠⦠⦠If your data frame has the above structure, R will automatically understand that there is replication. Your model form will remain the same. Regards, Indrajit From: Austin Paul austi...@usc.edu Cc: r-help@r-project.org r-help@r-project.org Sent: Tuesday, September 27, 2011 10:57 AM Subject: Re: [R] two-way anova help Hi, Yes. As I explained, the three male and three female types were crossed in all combinations (9 ways). For each of the 9 types, I have 5 replicate tanks (45 total tanks). And from each of the 45 tanks I have 50 observations for size. So the 5 replicates are somehow nested within the two-way interaction? If there was just 1 tank for each of the 9 crosses, yes, it would be very easy to code the two-way anova. It may still be very easy, but I'm not quite sure how to account for the replicate tanks. Hope this makes more sense. Austin On Mon, Sep 26, 2011 at 10:21 PM, Indrajit Sengupta indra_cali...@yahoo.com wrote: Can you explain what do you mean by 5 replicate tanks? Doing a two way anova is very simple in R. You would need to fit a linear model (lm function). Eg.: model - lm(y ~ male + female + male:female, data =) Regards, Indrajit From: Austin Paul austi...@usc.edu To: r-help@r-project.org Sent: Tuesday, September 27, 2011 6:13 AM Subject: [R] two-way anova help Hello, I am having some trouble coding a two-way anova due to replicated treatments. I have a factorial design with three male parents and three female parents. They were mated in all combinations and their babies were grown out and measured for size. 50 babies were measured for each of the 9 crosses. If I stopped here, I would have no troubles. But I also have 5 replicate tanks for each of the 9 crosses. My question is how to I code in the 5 replicate tanks per treatment? Thanks, Austin    [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.    [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] two-way anova help
Tanks are the experimental unit, fishes within tanks are repeated measures of the treatment, there is no nesting of replicates. You can analyze the 45 x 50 individual data values by repeated measures (mixed effects models) or by summarizing the 50 measurements per tank=treatment into a single value and then doing exactly what Indrajit said with the 45 summarized values. -- Bert On Mon, Sep 26, 2011 at 10:27 PM, Austin Paul austi...@usc.edu wrote: Hi, Yes. As I explained, the three male and three female types were crossed in all combinations (9 ways). For each of the 9 types, I have *5 replicate tanks* (45 total tanks). And from each of the 45 tanks I have 50 observations for size. So the 5 replicates are somehow nested within the two-way interaction? If there was just 1 tank for each of the 9 crosses, yes, it would be very easy to code the two-way anova. It may still be very easy, but I'm not quite sure how to account for the replicate tanks. Hope this makes more sense. Austin On Mon, Sep 26, 2011 at 10:21 PM, Indrajit Sengupta indra_cali...@yahoo.com wrote: Can you explain what do you mean by 5 replicate tanks? Doing a two way anova is very simple in R. You would need to fit a linear model (lm function). Eg.: model - lm(y ~ male + female + male:female, data =) Regards, Indrajit -- *From:* Austin Paul austi...@usc.edu *To:* r-help@r-project.org *Sent:* Tuesday, September 27, 2011 6:13 AM *Subject:* [R] two-way anova help Hello, I am having some trouble coding a two-way anova due to replicated treatments. I have a factorial design with three male parents and three female parents. They were mated in all combinations and their babies were grown out and measured for size. 50 babies were measured for each of the 9 crosses. If I stopped here, I would have no troubles. But I also have 5 replicate tanks for each of the 9 crosses. My question is how to I code in the 5 replicate tanks per treatment? Thanks, Austin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Package for Neural Network Classification Quality
Alejandro, Hi, somebody knows about one R-package which i can evaluate quality (recall, precision, accuracy, etc) of Neural network classification with more than 2 classes. I found ROCT package, http://cran.r-project.org/web/packages/ROCR/index.html, but it only workes with binary classifications, I guess that is because strictly these measures are defined for binary problems (though I expand them to multi-class situations by using class-A ./. not-class-A binary measures which comes quite naturally for my classes). In case you need something that takes soft or fuzzy class measures: I put my ideas about that into package softclassval and would much appreciate feedback. Best, Claudia Best regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Claudia Beleites Spectroscopy/Imaging Institute of Photonic Technology Albert-Einstein-Str. 9 07745 Jena Germany email: claudia.belei...@ipht-jena.de phone: +49 3641 206-133 fax: +49 2641 206-399 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] two-way anova help
Hi Indrajit and Bert, I really appeciate your help. I have coded as you mentioned, but I feel like I am losing a lot of data because I am not accounting for the replicate tanks (what if the 5 replicates of the same cross type vary more among tanks than within?). Below is what my data would look like (if it comes through) obs male female rep response 1 1 1 1 34 2 1 1 1 44 . . . . . . . . . . 51 1 1 2 37 . . . . . . . . . . 251 1 2 1 42 So if I understand correctly, the five replicate tanks of each cross cannot be treated as technical replicates? They are not exactly repeated measures in the sense they are different individuals in different replicate tanks. If I pool all 250 observations for each cross, instead of treating it as 5x50 observations, I feel like I am losing a lot of information. Austin On Mon, Sep 26, 2011 at 11:40 PM, Indrajit Sengupta indra_cali...@yahoo.com wrote: Hi Paul, There should not be any problem. Here is how I visualize the data table looks like: Obs Male_type Female_type Response 1 1 1 34 2 1 1 44 3 1 2 38 If your data frame has the above structure, R will automatically understand that there is replication. Your model form will remain the same. Regards, Indrajit -- *From:* Austin Paul austi...@usc.edu *To:* Indrajit Sengupta indra_cali...@yahoo.com *Cc:* r-help@r-project.org r-help@r-project.org *Sent:* Tuesday, September 27, 2011 10:57 AM *Subject:* Re: [R] two-way anova help Hi, Yes. As I explained, the three male and three female types were crossed in all combinations (9 ways). For each of the 9 types, I have *5 replicate tanks* (45 total tanks). And from each of the 45 tanks I have 50 observations for size. So the 5 replicates are somehow nested within the two-way interaction? If there was just 1 tank for each of the 9 crosses, yes, it would be very easy to code the two-way anova. It may still be very easy, but I'm not quite sure how to account for the replicate tanks. Hope this makes more sense. Austin On Mon, Sep 26, 2011 at 10:21 PM, Indrajit Sengupta indra_cali...@yahoo.com wrote: Can you explain what do you mean by 5 replicate tanks? Doing a two way anova is very simple in R. You would need to fit a linear model (lm function). Eg.: model - lm(y ~ male + female + male:female, data =) Regards, Indrajit -- *From:* Austin Paul austi...@usc.edu *To:* r-help@r-project.org *Sent:* Tuesday, September 27, 2011 6:13 AM *Subject:* [R] two-way anova help Hello, I am having some trouble coding a two-way anova due to replicated treatments. I have a factorial design with three male parents and three female parents. They were mated in all combinations and their babies were grown out and measured for size. 50 babies were measured for each of the 9 crosses. If I stopped here, I would have no troubles. But I also have 5 replicate tanks for each of the 9 crosses. My question is how to I code in the 5 replicate tanks per treatment? Thanks, Austin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to terminate R program if found any execution error
The original poster may also be interested in options(error) to capture the 'any execution error' requirement. From the examples in help(options): ## Not run: ## on error, terminate the R session with error status 66 options(error = quote(q(no, status=66, runLast=FALSE))) stop(test it) Allan On 27/09/2011 06:22, Duncan Murdoch wrote: On 11-09-27 12:20 AM, arunkumar wrote: Hi I want to terminate R process if there are any execution error. a=a b=10 c=try(a/b) if(class(c)[1]==try-error) { stop(Wrong Input Value) } d=c*c if c fails then it should terminate the process. Please can anyone help Keep the try(a/b), but replace the conditional with if (inherits(c, try-error)) q(no) See ?q if you want to set an error status to be returned, or do want to save the workspace, etc. (And do use inherits() rather than comparing to a particular entry: your code will probably work in this example, but it's not the right way to test the class of something, and some day try-error might not be the first entry.) Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Does replacing some values of a zoo object by NA reduce it's size ?
It is not an anomaly. The object is the same size (object.size(f1)==object.size(f2)); its file representation is different. On 27/09/2011 07:00, Ashim Kapoor wrote: Dear R-helpers, Please have a look at the following. f1 is the same as f2 except that it has some values replaced by NA. But it's corresponding file is slightly bigger than the file containing f2. Could someone please tell me if this is an anomaly ? load(file1) ls() [1] f1 load(file2) ls() [1] f1 f2 dput(f1) structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, NA, NA, NA, 2, 2, 2, NA, NA, NA, NA), .Dim = c(10L, 2L), .Dimnames = list(NULL, c(v1, l1)), index = 1:10, class = zoo) dput(f2) structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), .Dim = c(10L, 2L), .Dimnames = list(NULL, c(v1, l1)), index = 1:10, class = zoo) -rw-r--r-- 1 ashimkapoor ashimkapoor 192 2011-09-27 11:08 file1 -rw-r--r-- 1 ashimkapoor ashimkapoor 179 2011-09-27 11:08 file2 Best Regards, Ashim. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] regression with ordered arguments
Dear R listers, I am trying to be a new R user, but life is not that easy. My problem is the following one: let's assume to have 3 outcome variables (y1, y2, y3) and 3 explanatory ones (x1, x2, x3). How can I run the following three separate regressions without having to repeat the lm command three times? fit.1 - lm(y1 ~ x1) fit.2 - lm(y2 ~ x2) fit.3 - lm(y3 ~ x3) Both the y and x variables have been generated extracting random numbers from uniform distributions using a command such as: y1 - runif(100, min = 0, max = 1) I went to several introductory manuals, the manual R for stata users, econometrics in R, Introductory statistics with R and several blogs and help files, but I didn't find an answer to my question. can you please help me? In Stata I wouldn't have any problem in running this as a loop, but I really can't figure out how to do that with R. Thanks in advance for all your help. Best, f. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Odp: regression with ordered arguments
Hi Dear R listers, I am trying to be a new R user, but life is not that easy. My problem is the following one: let's assume to have 3 outcome variables (y1, y2, y3) and 3 explanatory ones (x1, x2, x3). How can I run the following three separate regressions without having to repeat the lm command three times? fit.1 - lm(y1 ~ x1) fit.2 - lm(y2 ~ x2) fit.3 - lm(y3 ~ x3) Both the y and x variables have been generated extracting random numbers from uniform distributions using a command such as: y1 - runif(100, min = 0, max = 1) I went to several introductory manuals, the manual R for stata users, econometrics in R, Introductory statistics with R and several blogs and help files, but I didn't find an answer to my question. can you please help me? In Stata I wouldn't have any problem in running this as a loop, but I really can't figure out how to do that with R. You can construct loop with naming through paste, numbers and get in R too but you will find your life much easier to use R powerfull list operations. Insted of y1 - runif(100, min = 0, max = 1) ... lll - vector(mode=list, length=3) lll - lapply(1, function(x) runif(100, min = 0, max = 1)) you can use probably mapply for doing your regression. Or you can easily access part of the list by loop for (i in 1:3) lm(lll[[i]]~xx[[i]]) (if you have your x's in list xx) Regards Petr Thanks in advance for all your help. Best, f. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mahalanobis Distance
Hi One thought would be to fit say a GARCH model to your historical data series, divide the returns by the sigma estimates and then repeat. This would have the advantage of getting the data to be closer to the same scale. Regards David -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jorgeA Sent: 26 September 2011 21:07 To: r-help@r-project.org Subject: Re: [R] Mahalanobis Distance Hello David, Thank you for the help anyway. Well answering your question However, I wonder how much value there is to computing the Mahalanobis distance with two variables that are measured on such different scales?: These two variables are subseries of the same time series. What I'm doing is using one method of forecasting time series that searches in the past of the time series, similar subseries to be an input of a forecasting function. I'm testing several distance measures, and one of that is the mahalanobis distance. But right now I'm stuck with this problem Best regards, Jorge Aikes Junior Universidade Estadual do Oeste do Paraná - Brazil. -- View this message in context: http://r.789695.n4.nabble.com/Mahalanobis-Distance-tp3844960p3845168.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Issued by UBS AG or affiliates to professional investors...{{dropped:30}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] creating a vector automatically containing numbers 1, 100, 10000, 1000000, ...
hello, i am looking for a way to get a vector containing the numbers: 1,100,1,100,... each element should have two ceros more than the one before it until the number 1e+200 without using a loop. thank you very much in advance for your help! marion [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] creating a vector automatically containing numbers 1, 100, 10000, 1000000, ...
On Tue, Sep 27, 2011 at 9:33 AM, Marion Wenty marion.we...@gmail.com wrote: hello, i am looking for a way to get a vector containing the numbers: 1,100,1,100,... each element should have two ceros more than the one before it until the number 1e+200 without using a loop. 10^(seq(0,200,by=2) Barry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Auto size of plots.
Hi I'm wondering if there is anyway to auto set the margin of a plot. Right now I'm setting it with par(mar=...) so that all the text for a bar shows up inside the plot. But I don't want to change par(mar=) for every new plot I make. So is there any way to tell R to increase the margin automatic as long as something is outside its borders? //Joel -- View this message in context: http://r.789695.n4.nabble.com/Auto-size-of-plots-tp3846560p3846560.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] two-way anova help
Hi Paul, Let me ask you one question - when you say tank does it indicate some kind of chamber or area in which all the 9 pairs are tested / treated / observed? If so - then each of the tanks would constitute a sort of block or a homogeneous group. Your data would look like this: Obs   Tank   Male_type   Female_type   Response 1   1   1   1   43 2   2   1   1   52 3   3   1   1   48 4   4   1   1   58 5   5   1   1   55 6   1   1   2   52 7   2   1   2   58 8   3   1   2   46 9   4   1   2   51 10   5   1   2   54 11   1   1   3   40 12   2   1   3   49 13   3   1   3   57 14   4   1   3   49 15   5   1   3   40 â¦Â   â¦Â   â¦Â   â¦Â   ⦠The only reason you would code it in the above way is if you believe there is some kind of effect of tank on the outcome. Then you can add the block effect / tank effect in your model and test its significance in a 3 - way ANOVA design. Your initial model will be: model1 - lm(y ~ male + female + tank + male:female, data = dataset) If the model says - your tank effect is not significant you can remove the term and look at the reduced form of the model: model2 - lm(y ~ male + female + male:female, data = dataset) Let me know if this is making sense to you. Regards, Indrajit From: Austin Paul austi...@usc.edu Cc: r-help@r-project.org r-help@r-project.org Sent: Tuesday, September 27, 2011 12:30 PM Subject: Re: [R] two-way anova help Hi Indrajit and Bert,  I really appeciate your help. I have coded as you mentioned, but I feel like I am losing a lot of data because I am not accounting for the replicate tanks (what if the 5 replicates of the same cross type vary more among tanks than within?). Below is what my data would look like (if it comes through)  obs male female rep response 1 1 1 1 34 2 1 1 1 44 . . . . . . . . . . 51 1 1 2 37 . . . . . . . . . . 251 1 2 1 42  So if I understand correctly, the five replicate tanks of each cross cannot be treated as technical replicates? They are not exactly repeated measures in the sense they are different individuals in different replicate tanks. If I pool all 250 observations for each cross, instead of treating it as 5x50 observations, I feel like I am losing a lot of information.  Austin   On Mon, Sep 26, 2011 at 11:40 PM, Indrajit Sengupta indra_cali...@yahoo.com wrote: Hi Paul, There should not be any problem. Here is how I visualize the data table looks like: Obs Male_type  Female_type Response 1 1 1 34 2 1 1 44 3 1 2 38 ⦠⦠⦠⦠If your data frame has the above structure, R will automatically understand that there is replication. Your model form will remain the same. Regards, Indrajit From: Austin Paul austi...@usc.edu Cc: r-help@r-project.org r-help@r-project.org Sent: Tuesday, September 27, 2011 10:57 AM Subject: Re: [R] two-way anova help Hi, Yes. As I explained, the three male and three female types were crossed in all combinations (9 ways). For each of the 9 types, I have 5 replicate tanks (45 total tanks). And from each of the 45 tanks I have 50 observations for size. So the 5 replicates are somehow nested within the two-way interaction? If there was just 1 tank for each of the 9 crosses, yes, it would be very easy to code the two-way anova. It may still be very easy, but I'm not quite sure how to account for the replicate tanks. Hope this makes more sense. Austin On Mon, Sep 26, 2011 at 10:21 PM, Indrajit Sengupta indra_cali...@yahoo.com wrote: Can you explain what do you mean by 5 replicate tanks? Doing a two way anova is very simple in R. You would need to fit a linear model (lm function). Eg.: model - lm(y ~ male + female + male:female, data =) Regards, Indrajit From: Austin Paul austi...@usc.edu To: r-help@r-project.org Sent: Tuesday, September 27, 2011 6:13 AM Subject: [R] two-way anova help Hello, I am having some trouble coding a two-way anova due to replicated treatments. I have a factorial design with three male parents and three female parents. They were mated in all combinations and their babies were grown out and measured for size. 50 babies were measured for each of the 9 crosses. If I stopped here, I would have no troubles. But I also have 5 replicate tanks for each of the 9 crosses. My question is how to I code in the 5 replicate tanks per treatment? Thanks, Austin    [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
Re: [R] creating a vector automatically containing numbers 1, 100, 10000, 1000000, ...
hello barry, thank you very much! i had thought of the seq function but only in regard to normal differences. marion 2011/9/27 Barry Rowlingson b.rowling...@lancaster.ac.uk On Tue, Sep 27, 2011 at 9:33 AM, Marion Wenty marion.we...@gmail.com wrote: hello, i am looking for a way to get a vector containing the numbers: 1,100,1,100,... each element should have two ceros more than the one before it until the number 1e+200 without using a loop. 10^(seq(0,200,by=2) Barry [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] two-way anova help
Austin, As you describe it, 'tank' is indeed nested within 'cross' - no tank contains the offspring from more than one cross. How you treat that depends on whether the tank effect is fixed or random. I would _guess_ it's essentially a nuisance effect; if it were the principal effect of interest, you'd not have needed to do multiple parent crosses to find out about it. If it's random and appreciable, it affects inference about the parentage, roughly because the random tank effect increases the uncertainty of the cross means. The lm model for tank nested in the cross groups would probably look something like response~male*female+male:female:tank although if tanks were numbered 1-45 you'd get away without the male:female: prefix, at least in lm - the important thing is that lm is told that each male:female cross has _different_ tanks. That should give you an anova table with the tank effect appearing as the last row but one. The classical treatment of that with tank as a random effect would be to recalculate F and p for the parent effects and interaction by calculating F from the ratio of their mean squares to the tank mean square if the tank mean square is significant.. If not you would have the usual soul-searching about whether to compare with the residual MS or to re-fit the model without the tank effect. (the results ought to be pretty similar either way unless the tank effect is close to significance - in which case I'd check against it anyway and then think about the answer). If tank is a fixed effect, you can use the table as is. A somewhat simpler way to treat the tank-as-random case in R (simoper for you and I, that is - the maths involved is not simpler but fortunately Douglas Bates did it all for us) would be to treat it as a mixed effects model, using lme from the nlme package. Assuming tank is numbered 1-45 or otherwise identified as something like tankID-factor(male:femal:tank) to make sure lme knows the tanks are different for each cross, that would look like library(nlme) mod - lme(response~male*female, random=~1|tank, ...) anova(mod) would then give you a test for the parent effects and interactions, taking into account the estimated between-tank variance whether it's large or not. and summary(mod) will tell you the between-tank component of variance. Finally, if you think the tank effect is probably real and certainly if it's large, taking the tank means and doing two-way anova on the resulting 45 mans should be pretty much the same as the more complicated stuff above. Caveat: I'm a chemist. Taking a chemist's advice about statistics is about as sensible as taking a statistician's advice about chemistry - the mileage can vary. S Ellison -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Austin Paul Sent: 27 September 2011 06:28 To: Indrajit Sengupta Cc: r-help@r-project.org Subject: Re: [R] two-way anova help Hi, Yes. As I explained, the three male and three female types were crossed in all combinations (9 ways). For each of the 9 types, I have *5 replicate tanks* (45 total tanks). And from each of the 45 tanks I have 50 observations for size. So the 5 replicates are somehow nested within the two-way interaction? If there was just 1 tank for each of the 9 crosses, yes, it would be very easy to code the two-way anova. It may still be very easy, but I'm not quite sure how to account for the replicate tanks. Hope this makes more sense. Austin On Mon, Sep 26, 2011 at 10:21 PM, Indrajit Sengupta indra_cali...@yahoo.com wrote: Can you explain what do you mean by 5 replicate tanks? Doing a two way anova is very simple in R. You would need to fit a linear model (lm function). Eg.: model - lm(y ~ male + female + male:female, data =) Regards, Indrajit -- *From:* Austin Paul austi...@usc.edu *To:* r-help@r-project.org *Sent:* Tuesday, September 27, 2011 6:13 AM *Subject:* [R] two-way anova help Hello, I am having some trouble coding a two-way anova due to replicated treatments. I have a factorial design with three male parents and three female parents. They were mated in all combinations and their babies were grown out and measured for size. 50 babies were measured for each of the 9 crosses. If I stopped here, I would have no troubles. But I also have 5 replicate tanks for each of the 9 crosses. My question is how to I code in the 5 replicate tanks per treatment? Thanks, Austin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative
[R] Workflow for binary classification problem using svm via e1071 package
Dear R-list! I am using the e1071 package in R to solve a binary classification problem in a dataset of round 180 predictor variables (blood metabolites) of two groups of subjects (patients and healthy controls). I am confused regarding the correct way to assess the classification accuracy of the trained svm. (A) The svm command allows to specificy via the 'cross=k' parameter to specify a k-fold crossvalidation. This results in k values for classification accuracy and their corresponding mean. (B) On the other hand most textbooks and tutorials I was browsing, recommend separating the data into a training and a test-set and then to assess prediction accuarcy by checking the accuracy of the trained svm when applied to the test-set. I am not sure whether (A) and (B) would be alternative ways to assess prediction accuracy? Or is option (A) only the accuracy of the svm when applied to the test set and therefore I should implement option (B) after I used option (A)? So would it be the correct way to use first (A) then do grid-search (via the tune command) to find the best hyperparameters and then test the trained svm by applying it to the test set? And in case I use a linear kernel instead of RBF, I guess I do not need to run grid-search as there are no hyperparameters to estimate? BEst, Jokel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Coercing a character zoo to a numeric
Dear R-helpers, It seems to me that a character zoo cannot be coerced to a numeric zoo. Below is a minimal example. Can someone tell me what I have done wrong? z-zoo(1:4,order.by=1:4) coredata(z)-as.character(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 coredata(z)-as.numeric(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 Many thanks, Ashim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sm.regression , one-covariate graph and regression line width
Hi all, I'm using sm library. In the two-dimensional plot of the analysis with one covariate, I'm able to modify several graphic parameters, either specifying them directly in the sm.regression() function, either through sm.options().The only graphical parameter i cannot modify is the width of the smoothed regression line. I can change its color, style, but not the width (which I must increase). The only result I was able to obtain is after specifying lwd in par(), which increases the width of not only the regression line , but of everything (axes, points, ect ect), which is definitely too much. par(lwd=3)sm.regression(a, b, col=red, lty=2) does anybody have a suggestion? Do I have to tweak the function code? Thanks in advance, D [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] change rownames
Hi All, How do I add com to the row names of a data frame, without deleting the plot numbers? Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] regression with ordered arguments
Dear Petr, thank you so much for your quick reply. I was sure that there were some smart ways to address my issue. I went through it and took some time to look at the help for lapply and mapply. However, some doubts still remain. Following your example, I did: lll -vector(mode = list, length = 3) mmm -vector(mode = list, length = 3) yyy - lapply(lll, function(x) runif(100, min =0, max = 1)) xxx - lapply(mmm, function(x) runif(100, min =0, max = 1)) but then I get stucking again. It's not clear to me how to pass a lm command to mapply. I tried to give a look at lapply and sapply, but I did not manage to go much further. It would be of big help if you could give me some more hints on this or if you could provide me with some references. I am sorry, but I find the help files quite cryptic. Is there a manual or some other source that you would advice me where I could find some more example on how to deal with similar issues? Thank you very much for your precious support, f. On 27 September 2011 10:08, Petr PIKAL petr.pi...@precheza.cz wrote: Hi Dear R listers, I am trying to be a new R user, but life is not that easy. My problem is the following one: let's assume to have 3 outcome variables (y1, y2, y3) and 3 explanatory ones (x1, x2, x3). How can I run the following three separate regressions without having to repeat the lm command three times? fit.1 - lm(y1 ~ x1) fit.2 - lm(y2 ~ x2) fit.3 - lm(y3 ~ x3) Both the y and x variables have been generated extracting random numbers from uniform distributions using a command such as: y1 - runif(100, min = 0, max = 1) I went to several introductory manuals, the manual R for stata users, econometrics in R, Introductory statistics with R and several blogs and help files, but I didn't find an answer to my question. can you please help me? In Stata I wouldn't have any problem in running this as a loop, but I really can't figure out how to do that with R. You can construct loop with naming through paste, numbers and get in R too but you will find your life much easier to use R powerfull list operations. Insted of y1 - runif(100, min = 0, max = 1) ... lll - vector(mode=list, length=3) lll - lapply(1, function(x) runif(100, min = 0, max = 1)) you can use probably mapply for doing your regression. Or you can easily access part of the list by loop for (i in 1:3) lm(lll[[i]]~xx[[i]]) (if you have your x's in list xx) Regards Petr Thanks in advance for all your help. Best, f. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coercing a character zoo to a numeric
Yes, once made into a character zoo, the core data is marked to be of mode character and most attempts to modify involve implicit coercion to that mode. The following however works: library(zoo) z - zoo(1:4, order.by=1:4) str(z) z.Str - z coredata(z.Str) - as.character(coredata(z)) str(z.Str) z.Num - z.Str mode(z.Num) - numeric str(z.Num) However, I prefer to use this sort of line of code: z.Num - zoo(as.double(z.Str), index(z.Str)) finding it a little more transparent. Hope this helps, Michael Weylandt On Tue, Sep 27, 2011 at 5:56 AM, Ashim Kapoor ashimkap...@gmail.com wrote: Dear R-helpers, It seems to me that a character zoo cannot be coerced to a numeric zoo. Below is a minimal example. Can someone tell me what I have done wrong? z-zoo(1:4,order.by=1:4) coredata(z)-as.character(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 coredata(z)-as.numeric(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 Many thanks, Ashim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coercing a character zoo to a numeric
Dear Michael, I don't think this is mentioned in the zoo FAQ. May I ask where you read this? Some references ? Thank you for your help, Ashim On Tue, Sep 27, 2011 at 4:06 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Yes, once made into a character zoo, the core data is marked to be of mode character and most attempts to modify involve implicit coercion to that mode. The following however works: library(zoo) z - zoo(1:4, order.by=1:4) str(z) z.Str - z coredata(z.Str) - as.character(coredata(z)) str(z.Str) z.Num - z.Str mode(z.Num) - numeric str(z.Num) However, I prefer to use this sort of line of code: z.Num - zoo(as.double(z.Str), index(z.Str)) finding it a little more transparent. Hope this helps, Michael Weylandt On Tue, Sep 27, 2011 at 5:56 AM, Ashim Kapoor ashimkap...@gmail.comwrote: Dear R-helpers, It seems to me that a character zoo cannot be coerced to a numeric zoo. Below is a minimal example. Can someone tell me what I have done wrong? z-zoo(1:4,order.by=1:4) coredata(z)-as.character(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 coredata(z)-as.numeric(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 Many thanks, Ashim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Pearson chi-square test
Dear all, I have some trouble understanding the chisq.test function. Take the following example: set.seed(1) A - cut(runif(100),c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE) B - cut(runif(100),c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE) C - cut(runif(100),c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE) x - table(A,B) y - table(A,C) When I calculate the test statistic by hand I get a value of approximately 75.9: http://en.wikipedia.org/wiki/Pearson's_chi-square_test#Calculating_the_test-statistic sum((x-y)^2/y) But when I do chisq.test(x,y) I get a value of 12.2 while chisq.test(y,x) gives a value of 10.3. I understand that I must be doing something wrong here, but I'm not sure what. Thanks, Michael [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coercing a character zoo to a numeric
It's just a guess from playing around with a few things. The author of the package is on this list and he could both confirm that I'm right and say why exactly it is implemented like this. My hunch is that it ultimately comes from the fact that coredata(z) != z Consider this: x = letters[1:5] x2 - x1 - x mode(x) x1[1:5] - 1:5 # Modify the values of x1 without changing the mode print(x1) x2 - 1:5 # Replace x2 print(x2) Michael Weylandt On Tue, Sep 27, 2011 at 6:41 AM, Ashim Kapoor ashimkap...@gmail.com wrote: Dear Michael, I don't think this is mentioned in the zoo FAQ. May I ask where you read this? Some references ? Thank you for your help, Ashim On Tue, Sep 27, 2011 at 4:06 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Yes, once made into a character zoo, the core data is marked to be of mode character and most attempts to modify involve implicit coercion to that mode. The following however works: library(zoo) z - zoo(1:4, order.by=1:4) str(z) z.Str - z coredata(z.Str) - as.character(coredata(z)) str(z.Str) z.Num - z.Str mode(z.Num) - numeric str(z.Num) However, I prefer to use this sort of line of code: z.Num - zoo(as.double(z.Str), index(z.Str)) finding it a little more transparent. Hope this helps, Michael Weylandt On Tue, Sep 27, 2011 at 5:56 AM, Ashim Kapoor ashimkap...@gmail.comwrote: Dear R-helpers, It seems to me that a character zoo cannot be coerced to a numeric zoo. Below is a minimal example. Can someone tell me what I have done wrong? z-zoo(1:4,order.by=1:4) coredata(z)-as.character(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 coredata(z)-as.numeric(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 Many thanks, Ashim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coercing a character zoo to a numeric
Dear Micheal, Thank you. So to make a zoo which has factors in into a numeric,we have to go from factor to character to numeric. The coredata goes fine from factor to character. In the numeric conversion we need a new zoo object. Best, Ashim On Tue, Sep 27, 2011 at 4:21 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: It's just a guess from playing around with a few things. The author of the package is on this list and he could both confirm that I'm right and say why exactly it is implemented like this. My hunch is that it ultimately comes from the fact that coredata(z) != z Consider this: x = letters[1:5] x2 - x1 - x mode(x) x1[1:5] - 1:5 # Modify the values of x1 without changing the mode print(x1) x2 - 1:5 # Replace x2 print(x2) Michael Weylandt On Tue, Sep 27, 2011 at 6:41 AM, Ashim Kapoor ashimkap...@gmail.comwrote: Dear Michael, I don't think this is mentioned in the zoo FAQ. May I ask where you read this? Some references ? Thank you for your help, Ashim On Tue, Sep 27, 2011 at 4:06 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Yes, once made into a character zoo, the core data is marked to be of mode character and most attempts to modify involve implicit coercion to that mode. The following however works: library(zoo) z - zoo(1:4, order.by=1:4) str(z) z.Str - z coredata(z.Str) - as.character(coredata(z)) str(z.Str) z.Num - z.Str mode(z.Num) - numeric str(z.Num) However, I prefer to use this sort of line of code: z.Num - zoo(as.double(z.Str), index(z.Str)) finding it a little more transparent. Hope this helps, Michael Weylandt On Tue, Sep 27, 2011 at 5:56 AM, Ashim Kapoor ashimkap...@gmail.comwrote: Dear R-helpers, It seems to me that a character zoo cannot be coerced to a numeric zoo. Below is a minimal example. Can someone tell me what I have done wrong? z-zoo(1:4,order.by=1:4) coredata(z)-as.character(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 coredata(z)-as.numeric(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 Many thanks, Ashim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] A very big matrix inside a function
Dear listers: As a part of a intermediate process, I need to use and modify a very big matrix (some 3x3) inside the body of a function. If the matrix is defined in the function, R shows a error message Cannot allocate vector of size 6.7 Gb. But if I define the matrix before the function is called (as soon as the dimension can be calculated) R can allocate it. The problem is that although my function can use the matrix, cannot modify it. The schema is the following: # OPTION 1 # Function definition Myfunction-function(parameters, ...) { ... Verybigmatrix-matrix(0, n, n) # Matrix definition inside the function ... } # Main process ... Value.returned-Myfunction(parameters, ...) # R issues an error message Cannot allocate vector of size 6.7 Gb ... # OPTION 2 # Function definition... Myfunction-function(parameters, ...) { ... Verybigmatrix[index1, index2]-newvalue # Issues an error message Cannot allocate vector of size 6.7 Gb # It assignation is made with `-` error is issued too ... } # Main process ... Verybigmatrix-matrix(0, n, n) # Definition before Myfunction is called doesn't issue an error ... Value.returned-Myfunction(parameters, ...) # R sees Verybigmatrix but if I try to modify it issues an error message Cannot allocate vector of size 6.7 Gb ... Characteristics of the machine I'm working on: Operating system: Redhat Enterprise Linux 6 Kernel and CPU: Linux 2.6.32-131.6.1.el6.x86_64 on x86_64 Processor information: Intel(R) Xeon(TM) CPU 3.60GHz, 4 cores Memory: 8 GB Thanks in advance for your help. Francisco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question concerning Box.test
Hi everyone, I've got a question concerning the function Box.test for testing autocorrelation in my data. My data consist of (daily) returns of several stocks over time (first row=time, all other rows=stock returns). I intend to perform a Box-Ljung test for my returns (for each stock). Since I have about 3000 stocks in my list, I'm not able to perform the test individually for each stock. Unfortunately the Box.test only works for univariate series. My goal is to get a list with every p-value (from the output) of the 3000 tests (that is a list with 3000 p-values). Any hint how to do this? I tried to do this with the function mapply, but it didn't work. Many thanks in advance best regards S.B. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Capturing the error information
Hi, I want to capture the error information without throwing it out. if i use try command it throws and also stores. and not able to get the error if i use tryCatch command. a=a b=10 c=try(a/b) Please help -- View this message in context: http://r.789695.n4.nabble.com/Capturing-the-error-information-tp3846624p3846624.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with zoo::window()
I have the following time series: class(CCasadesz2) [1] zoo setmanes - cut(time(CCasadesz2),breaks=weeks) CCasadeswz - aggregate(CCasadesz2,sum,by=setmanes) class(CCasadeswz) [1] zoo summary(CCasadeswz) Index CCasadeswz 2009-01-12 00:00:00: 1 Min. : 4.0 2009-01-19 00:00:00: 1 1st Qu.: 150.0 2009-01-26 00:00:00: 1 Median : 268.0 2009-02-02 00:00:00: 1 Mean : 316.6 2009-02-09 00:00:00: 1 3rd Qu.: 387.5 2009-02-23 00:00:00: 1 Max. :1435.0 (Other):93 summary(time(CCasadeswz)) shows dates for 2009 and 2010, and plot(CCasadeswz) correctly displays the data for 2009 and 2010. But when I try to select year 2010: CCasadeswz2010 = window(CCasadeswz,start=as.Date(2010-01-01), end=as.Date(2010-12-31)) Warning messages: 1: In which(in.index all.indexes = start all.indexes = end) : Incompatible methods (Ops.factor, Ops.Date) for = 2: In which(in.index all.indexes = start all.indexes = end) : Incompatible methods (Ops.factor, Ops.Date) for = summary(CCasadeswz2010) Index 2009-01-12 00:00:00:0 2009-01-19 00:00:00:0 2009-01-26 00:00:00:0 2009-02-02 00:00:00:0 2009-02-09 00:00:00:0 2009-02-16 00:00:00:0 (Other):0 (it seems no data have been selected!) plot(CCasadeswz2010) Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) : need finite 'ylim' values Calls: plot ... boxplot - boxplot.default - do.call - bxp - plot.window In addition: Warning messages: 1: In min(x) : no non-missing arguments to min; returning Inf 2: In max(x) : no non-missing arguments to max; returning -Inf 3: In min(x) : no non-missing arguments to min; returning Inf 4: In max(x) : no non-missing arguments to max; returning -Inf 5: In is.na(rows) : is.na() applied to non-(list or vector) of type 'NULL' Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) : need finite 'ylim' values Calls: plot ... boxplot - boxplot.default - do.call - bxp - plot.window why do I get the warning in window()? My command follows what is mentioned in the manual: R window(z, start = as.Date(2005-02-15), end = as.Date(2005-02-28)) Thanks Agus -- Dr. Agustin Lobo Institut de Ciencies de la Terra Jaume Almera (CSIC) Lluis Sole Sabaris S/N 08028 Barcelona Spain Tel. +34 934095410 Fax. +34 934110012 email: agustin.l...@ictja.csic.es __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coercing a character zoo to a numeric
Yes, that sounds right. Michael PS -- If you are interested, the code zoo:::`coredata-.zoo` contains the line x[] - value confirming my hunch about the old mode being inherited unless a coercion to a more general one is needed. On Tue, Sep 27, 2011 at 7:01 AM, Ashim Kapoor ashimkap...@gmail.com wrote: Dear Micheal, Thank you. So to make a zoo which has factors in into a numeric,we have to go from factor to character to numeric. The coredata goes fine from factor to character. In the numeric conversion we need a new zoo object. Best, Ashim On Tue, Sep 27, 2011 at 4:21 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: It's just a guess from playing around with a few things. The author of the package is on this list and he could both confirm that I'm right and say why exactly it is implemented like this. My hunch is that it ultimately comes from the fact that coredata(z) != z Consider this: x = letters[1:5] x2 - x1 - x mode(x) x1[1:5] - 1:5 # Modify the values of x1 without changing the mode print(x1) x2 - 1:5 # Replace x2 print(x2) Michael Weylandt On Tue, Sep 27, 2011 at 6:41 AM, Ashim Kapoor ashimkap...@gmail.comwrote: Dear Michael, I don't think this is mentioned in the zoo FAQ. May I ask where you read this? Some references ? Thank you for your help, Ashim On Tue, Sep 27, 2011 at 4:06 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Yes, once made into a character zoo, the core data is marked to be of mode character and most attempts to modify involve implicit coercion to that mode. The following however works: library(zoo) z - zoo(1:4, order.by=1:4) str(z) z.Str - z coredata(z.Str) - as.character(coredata(z)) str(z.Str) z.Num - z.Str mode(z.Num) - numeric str(z.Num) However, I prefer to use this sort of line of code: z.Num - zoo(as.double(z.Str), index(z.Str)) finding it a little more transparent. Hope this helps, Michael Weylandt On Tue, Sep 27, 2011 at 5:56 AM, Ashim Kapoor ashimkap...@gmail.comwrote: Dear R-helpers, It seems to me that a character zoo cannot be coerced to a numeric zoo. Below is a minimal example. Can someone tell me what I have done wrong? z-zoo(1:4,order.by=1:4) coredata(z)-as.character(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 coredata(z)-as.numeric(coredata(z)) str(z) zoo series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 Many thanks, Ashim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question concerning Box.test
Did you try regular apply? If you have univariate input, there's no reason to use the multivariate mapply. Or more generally: apply(P[-1,],1,function(p) Box.test(p)$p.value) Michael On Tue, Sep 27, 2011 at 4:45 AM, Samir Benzerfa benze...@gmx.ch wrote: Hi everyone, I've got a question concerning the function Box.test for testing autocorrelation in my data. My data consist of (daily) returns of several stocks over time (first row=time, all other rows=stock returns). I intend to perform a Box-Ljung test for my returns (for each stock). Since I have about 3000 stocks in my list, I'm not able to perform the test individually for each stock. Unfortunately the Box.test only works for univariate series. My goal is to get a list with every p-value (from the output) of the 3000 tests (that is a list with 3000 p-values). Any hint how to do this? I tried to do this with the function mapply, but it didn't work. Many thanks in advance best regards S.B. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Intersection between circle and line?
Thanks! In principle it is a mathematical problem. I have already found some solutions with google. I thought there is maybe an existing R function or package which handel this stuff. But it seems not to be. sos package gives me no suitable information for that. With best regards -- View this message in context: http://r.789695.n4.nabble.com/Intersection-between-circle-and-line-tp3844063p3846846.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pearson chi-square test
Not sure what you want to test here with two matrices, but reading the manual helps here as well: y a vector; ignored if x is a matrix. x and y are matrices in your example, so it comes as no surprise that you get different results. On top of that, your manual calculation is not correct if you want to test whether two samples come from the same distribution (so don't be surprised if R still gives a different value...). HTH, Michael -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Michael Haenlein Sent: Tuesday, September 27, 2011 12:45 To: r-help@r-project.org Subject: [R] Pearson chi-square test Dear all, I have some trouble understanding the chisq.test function. Take the following example: set.seed(1) A - cut(runif(100),c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE) B - cut(runif(100),c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE) C - cut(runif(100),c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE) x - table(A,B) y - table(A,C) When I calculate the test statistic by hand I get a value of approximately 75.9: http://en.wikipedia.org/wiki/Pearson's_chi- square_test#Calculating_the_test-statistic sum((x-y)^2/y) But when I do chisq.test(x,y) I get a value of 12.2 while chisq.test(y,x) gives a value of 10.3. I understand that I must be doing something wrong here, but I'm not sure what. Thanks, Michael [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Changing colour in barchart
Thanks for your help - very useful! Martyn. -- View this message in context: http://r.789695.n4.nabble.com/Changing-colour-in-barchart-tp3843209p3846866.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Capturing the error information
On 11-09-27 5:39 AM, arunkumar wrote: Hi, I want to capture the error information without throwing it out. if i use try command it throws and also stores. and not able to get the error if i use tryCatch command. a=a b=10 c=try(a/b) That's how. c has now captured the error information. It did not throw the error, it just printed it. If you don't want it to print, see ?try. Duncan Murdoch Please help -- View this message in context: http://r.789695.n4.nabble.com/Capturing-the-error-information-tp3846624p3846624.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with zoo::window()
On Tue, Sep 27, 2011 at 5:23 AM, Agustin Lobo agustin.l...@ictja.csic.es wrote: PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Provide the output of dput(CCasadesz2) or if that is very large try to cut it down to make the example minimal. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coercing a character zoo to a numeric
On Tue, Sep 27, 2011 at 5:56 AM, Ashim Kapoor ashimkap...@gmail.com wrote: Dear R-helpers, It seems to me that a character zoo cannot be coerced to a numeric zoo. Below is a minimal example. Can someone tell me what I have done wrong? z-zoo(1:4,order.by=1:4) coredata(z)-as.character(coredata(z)) str(z) ‘zoo’ series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 coredata(z)-as.numeric(coredata(z)) str(z) ‘zoo’ series from 1 to 4 Data: chr [1:4] 1 2 3 4 Index: int [1:4] 1 2 3 4 See ?zoo where it says that the zoo object may be a numeric vector, matrix or a factor. Thus character is not supported although I suspect that a number of operations continue to work anyways -- although evidently not that one. This seems to result in a numeric zoo object: aggregate(z, identity, as.numeric) although these sorts of computations with zoo objects that strictly speaking are not legal are, of course, not officially supported. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] regression with ordered arguments
Hi Francesco Dear Petr, thank you so much for your quick reply. I was sure that there were some smart ways to address my issue. I went through it and took some time to look at the help for lapply and mapply. However, some doubts still remain. Following your example, I did: lll -vector(mode = list, length = 3) mmm -vector(mode = list, length = 3) yyy - lapply(lll, function(x) runif(100, min =0, max = 1)) xxx - lapply(mmm, function(x) runif(100, min =0, max = 1)) but then I get stucking again. It's not clear to me how to pass a lm command to mapply. I tried to give a look at lapply and sapply, but I did not manage to go much further. It would be of big help if you could give me some more hints on this or if you could provide me with some references. I am sorry, but I find the help files quite cryptic. Is there a manual or some other source that you would advice me where I could find some more example on how to deal with similar issues? You can use for cycle for (i in 1:3) lll[[i]] -lm(yyy[[i]]~xxx[[i]]) put result of lm to list object lll then lapply(lll, summary) gives you summary of each lm function, if you want to use mapply mmm-mapply(function(x,y) lm(y~x), xxx, yyy, SIMPLIFY=F) you can see structure of any object by str(mmm) Both results shall be similar, for this case I believe that for loop is easier to understand. Regards Petr Thank you very much for your precious support, f. On 27 September 2011 10:08, Petr PIKAL petr.pi...@precheza.cz wrote: Hi Dear R listers, I am trying to be a new R user, but life is not that easy. My problem is the following one: let's assume to have 3 outcome variables (y1, y2, y3) and 3 explanatory ones (x1, x2, x3). How can I run the following three separate regressions without having to repeat the lm command three times? fit.1 - lm(y1 ~ x1) fit.2 - lm(y2 ~ x2) fit.3 - lm(y3 ~ x3) Both the y and x variables have been generated extracting random numbers from uniform distributions using a command such as: y1 - runif(100, min = 0, max = 1) I went to several introductory manuals, the manual R for stata users, econometrics in R, Introductory statistics with R and several blogs and help files, but I didn't find an answer to my question. can you please help me? In Stata I wouldn't have any problem in running this as a loop, but I really can't figure out how to do that with R. You can construct loop with naming through paste, numbers and get in R too but you will find your life much easier to use R powerfull list operations. Insted of y1 - runif(100, min = 0, max = 1) ... lll - vector(mode=list, length=3) lll - lapply(1, function(x) runif(100, min = 0, max = 1)) you can use probably mapply for doing your regression. Or you can easily access part of the list by loop for (i in 1:3) lm(lll[[i]]~xx[[i]]) (if you have your x's in list xx) Regards Petr Thanks in advance for all your help. Best, f. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] wgrib2 for windows 7
Hi all, I want to install wgrib2 on my windows 7 laptop. It appears that a precompiled version exists and works for many. I managed to download a precompiled version and placed it in a directory and added this directory to the PATH. However, when execute R-command, I get the error suggesting that wgrib2 is not recognized as either internal or external command. Any suggestions? Thanks [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ANOVA define as factor or not
Hi all This is probably a simple problem but somehow I am having much trouble with finding a solution, so I seek your help! I have a data-set with continuous response variables. The explanatory variably is 4xpH treatments (so 8.08, 7.94, 7.81 and 7.71) so also continuous and not technically factorial. However I have decided to do Anova's (as well as regression) to explore the effect of pH. What I don't understand is why the anova done in such a way: summary(aov(BioMass~pH)) ... gives me completely different p-values if I define the pH as factor or not. And what would be the correct approach? Help on this subject would be much appreciated! Hronn -- View this message in context: http://r.789695.n4.nabble.com/ANOVA-define-as-factor-or-not-tp3846861p3846861.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] model selection using logistf package
Hi everyone, I'm wondering how to select the best model when using logistf? AIC does not work neither does anova. I tried fitting a glm model but got the separation warning message so I tried using the logistf package but as I stepwise simplify the model I don't know if the simplification is motivated or not... Can anyone explain to me how I should approach this problem? I have data with y=0 and 1 and three x variables for 328 observations. Thanks! -- View this message in context: http://r.789695.n4.nabble.com/model-selection-using-logistf-package-tp3846958p3846958.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] binomial logistic regression question
Dear subscribers, I am looking for a function which would allow me to model the dependent variable as the number of successes in a series of Bernoulli trials. My data looks like this ID TRIALS SUCCESSESS INDEP1 INDEP2 INDEP3 1 00.273 0.055 0.156 2 98170 74 0.123 0.456 0.789 3 14548630 0.124 0.235 0.007 4 14714949 0.888 0.357 0.321 5 60585 11 0.484 0.235 0.235 6 19895343 0.295 0.123 0.856 I want to find out how independent variables influence the number of successful trials (dependent variable) I had tried to use glm formula regression.glm - glm( SUCCESSESS ~ INDEP1 + INDEP2 + INDEP3, data = data, family = binomial, weights= TRIALS) but got the following: Error in eval(expr, envir, enclos) : y values must be 0 = y = 1 but there y has to be between 0 and 1 (which does make sense, but that's why I am using weights?) I would be grateful for any hints/suggestions on how to proceed. Cheers, Juta -- View this message in context: http://r.789695.n4.nabble.com/binomial-logistic-regression-question-tp3846954p3846954.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] model selection using logistf package
Stepwise variable selection without heavy penalization is invalid. Frank mael wrote: Hi everyone, I'm wondering how to select the best model when using logistf? AIC does not work neither does anova. I tried fitting a glm model but got the separation warning message so I tried using the logistf package but as I stepwise simplify the model I don't know if the simplification is motivated or not... Can anyone explain to me how I should approach this problem? I have data with y=0 and 1 and three x variables for 328 observations. Thanks! - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/model-selection-using-logistf-package-tp3846958p3847087.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with zoo::window()
On Tue, Sep 27, 2011 at 5:23 AM, Agustin Lobo agustin.l...@ictja.csic.es wrote: I have the following time series: class(CCasadesz2) [1] zoo setmanes - cut(time(CCasadesz2),breaks=weeks) CCasadeswz - aggregate(CCasadesz2,sum,by=setmanes) cut produces a factor, not a Date. The by= argument in aggregate.zoo should not be a factor but rather should define the new index. If you want the new index to be a Date class then by= should define a Date vector. See ?aggregate.zoo . Try: setmanes - as.Date(format(setmanes)) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] regression with ordered arguments
That's cool! it works :-))) for me (as a stata user) these are quite basic things and I didn't find them anywhere for what concerns R. I can't figure out why. Really, thank you so much, f. On 27 September 2011 14:20, Petr PIKAL petr.pi...@precheza.cz wrote: Hi Francesco Dear Petr, thank you so much for your quick reply. I was sure that there were some smart ways to address my issue. I went through it and took some time to look at the help for lapply and mapply. However, some doubts still remain. Following your example, I did: lll -vector(mode = list, length = 3) mmm -vector(mode = list, length = 3) yyy - lapply(lll, function(x) runif(100, min =0, max = 1)) xxx - lapply(mmm, function(x) runif(100, min =0, max = 1)) but then I get stucking again. It's not clear to me how to pass a lm command to mapply. I tried to give a look at lapply and sapply, but I did not manage to go much further. It would be of big help if you could give me some more hints on this or if you could provide me with some references. I am sorry, but I find the help files quite cryptic. Is there a manual or some other source that you would advice me where I could find some more example on how to deal with similar issues? You can use for cycle for (i in 1:3) lll[[i]] -lm(yyy[[i]]~xxx[[i]]) put result of lm to list object lll then lapply(lll, summary) gives you summary of each lm function, if you want to use mapply mmm-mapply(function(x,y) lm(y~x), xxx, yyy, SIMPLIFY=F) you can see structure of any object by str(mmm) Both results shall be similar, for this case I believe that for loop is easier to understand. Regards Petr Thank you very much for your precious support, f. On 27 September 2011 10:08, Petr PIKAL petr.pi...@precheza.cz wrote: Hi Dear R listers, I am trying to be a new R user, but life is not that easy. My problem is the following one: let's assume to have 3 outcome variables (y1, y2, y3) and 3 explanatory ones (x1, x2, x3). How can I run the following three separate regressions without having to repeat the lm command three times? fit.1 - lm(y1 ~ x1) fit.2 - lm(y2 ~ x2) fit.3 - lm(y3 ~ x3) Both the y and x variables have been generated extracting random numbers from uniform distributions using a command such as: y1 - runif(100, min = 0, max = 1) I went to several introductory manuals, the manual R for stata users, econometrics in R, Introductory statistics with R and several blogs and help files, but I didn't find an answer to my question. can you please help me? In Stata I wouldn't have any problem in running this as a loop, but I really can't figure out how to do that with R. You can construct loop with naming through paste, numbers and get in R too but you will find your life much easier to use R powerfull list operations. Insted of y1 - runif(100, min = 0, max = 1) ... lll - vector(mode=list, length=3) lll - lapply(1, function(x) runif(100, min = 0, max = 1)) you can use probably mapply for doing your regression. Or you can easily access part of the list by loop for (i in 1:3) lm(lll[[i]]~xx[[i]]) (if you have your x's in list xx) Regards Petr Thanks in advance for all your help. Best, f. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] binomial logistic regression question
On 09/27/2011 07:53 AM, majesty wrote: Dear subscribers, I am looking for a function which would allow me to model the dependent variable as the number of successes in a series of Bernoulli trials. My data looks like this ID TRIALS SUCCESSESS INDEP1 INDEP2 INDEP3 1 00.273 0.055 0.156 2 98170 74 0.123 0.456 0.789 3 14548630 0.124 0.235 0.007 4 14714949 0.888 0.357 0.321 5 60585 11 0.484 0.235 0.235 6 19895343 0.295 0.123 0.856 I want to find out how independent variables influence the number of successful trials (dependent variable) I had tried to use glm formula regression.glm- glm( SUCCESSESS ~ INDEP1 + INDEP2 + INDEP3, data = data, family = binomial, weights= TRIALS) Try regression.glm- glm(cbind(SUCCESSESS,TRIALS) ~ INDEP1 + INDEP2 + INDEP3, data = data,family = binomial) as is specified in the details of ?glm. -- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mahalanobis Distance
David Cross is correct. Your covariance matrix is singular because you have more columns (15) than rows (2). David actually produced the covariance matrix for cbind(s.1, s.2) which is not singular, but you are using rbind(s.1, s.2). Try cor(rbind(s.1, s.2)) and you will see that the correlations are all +/-1. -- David L Carlson Associate Professor of Anthropology Texas AM University College Station, TX 77843-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of david.jes...@ubs.com Sent: Tuesday, September 27, 2011 3:20 AM To: thetrooperm...@gmail.com; r-help@r-project.org Subject: Re: [R] Mahalanobis Distance Hi One thought would be to fit say a GARCH model to your historical data series, divide the returns by the sigma estimates and then repeat. This would have the advantage of getting the data to be closer to the same scale. Regards David -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jorgeA Sent: 26 September 2011 21:07 To: r-help@r-project.org Subject: Re: [R] Mahalanobis Distance Hello David, Thank you for the help anyway. Well answering your question However, I wonder how much value there is to computing the Mahalanobis distance with two variables that are measured on such different scales?: These two variables are subseries of the same time series. What I'm doing is using one method of forecasting time series that searches in the past of the time series, similar subseries to be an input of a forecasting function. I'm testing several distance measures, and one of that is the mahalanobis distance. But right now I'm stuck with this problem Best regards, Jorge Aikes Junior Universidade Estadual do Oeste do Paraná - Brazil. -- View this message in context: http://r.789695.n4.nabble.com/Mahalanobis-Distance-tp3844960p3845168.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Issued by UBS AG or affiliates to professional =\ invest...{{dropped:10}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Matrix and list indices
Hi guys, I am trying to replace all elements of earth that are equal to zero with their corresponding elements in mars. I can do the replace with a bunch of for-loops, but I don't think this is the R way of doing things. my_list - list( earth=array(c(0,0,45,0,0,45,0,45),dim=c(2,2,2)), mars=array(c(8:1),dim=c(2,2,2))) my_list for (i in c(1:2)) { for (j in c(1:2)) { for (k in c(1:2)) { if (my_list$earth[i,j,k] == 0) { my_list$earth[i,j,k] - my_list$mars[i,j,k] } } } } my_list Do you guys have any suggestions for getting rid of the ugly for-loops? Many thanks, Fernando Álvarez Nordea e-Markets Strandgade 3 PO Box 850 DK-0900 Copenhagen C Denmark Tel.: +45 33 33 32 67 Mobile: +45 61 55 27 54 This transmission is intended solely for the person or entity to whom it is addressed. It may contain privileged and confidential information. If you are not the intended recipient, please be notified that any dissemination, distribution or copying is strictly prohibited. If you have received this transmission by mistake, please let us know and then delete it from your system. P Please consider the impact on the environment before printing this e-mail. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Counting similar rows
Startsituation: structure(c(1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1), .Dim = 4:5, .Dimnames = structure(list(subject = c(s1, s2, s3, s4), class = c(c1, c2, c3, c4, c5)), .Names = c(subject, class)), class = c(xtabs, table), call = xtabs(formula = ~subject + class, data = ia)) I want a count of subjects that match the same classes in subject_cnt and a count of the number of classes in class_cnt The result of this example should be: structure(list(subject_cnt = c(2L, 1L, 1L), class_cnt = c(3L, 2L, 3L), c1 = c(1L, 1L, 0L), c2 = c(0L, 1L, 0L), c3 = c(1L, 0L, 1L), c4 = c(0L, 0L, 1L), c5 = c(1L, 0L, 1L)), .Names = c(subject_cnt, class_cnt, c1, c2, c3, c4, c5), class = data.frame, row.names = c(NA, -3L)) How can I achieve this in R, without complicated loops? PS. Note that the number of classes and subjects are in real quite big. Cheers, Lars -- View this message in context: http://r.789695.n4.nabble.com/Counting-similar-rows-tp3847051p3847051.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How does the survfit.coxph calculate the survival value?
Sorry for a long question. I am in an urgent using R to fit the cox proportional model. My data is a data frame including 100 individuals which are software stress tests. There are time-to-failure and six covariates meaning the system resource (every 4 minutes). Is it possible to use R to fit the Cox model? I used *coxph* to estimated the coefficients for both time-independent and time-dependent covariates. And next I need to calculated the survival and hazard rate for cox model with time-independent and that with time-dependent covariates. I see many people use *survfit.coxph *to get the survival. Could anybody please tell me how is the survival value calculated? For example, what kind of procedure. Any suggestion is a great help! -- View this message in context: http://r.789695.n4.nabble.com/How-does-the-survfit-coxph-calculate-the-survival-value-tp3847179p3847179.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] M-Splines design matrix
Hi Can any one inform me an r package used to fit Cubic M-splines or r functions used to have design and penalty matrices in fitting cubic M-splines please? Thanks, Tigist [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Matrix and list indices
On Tue, Sep 27, 2011 at 9:43 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Untested, I believe this should work, though you might need to modify for floating point funny business in testing the equalities: my_list - list( earth=array(c(0,0,45,0,0,45,0,45),dim=c(2,2,2)), mars=array(c(8:1),dim=c(2,2,2))) my_list$earth[my_list$earth==0] - my_list$mars[my_list$earth==0] Hope this helps, Michael Weylandt On Tue, Sep 27, 2011 at 8:49 AM, fernando.cabr...@nordea.com wrote: Hi guys, I am trying to replace all elements of earth that are equal to zero with their corresponding elements in mars. I can do the replace with a bunch of for-loops, but I don't think this is the R way of doing things. my_list - list( earth=array(c(0,0,45,0,0,45,0,45),dim=c(2,2,2)), mars=array(c(8:1),dim=c(2,2,2))) my_list for (i in c(1:2)) { for (j in c(1:2)) { for (k in c(1:2)) { if (my_list$earth[i,j,k] == 0) { my_list$earth[i,j,k] - my_list$mars[i,j,k] } } } } my_list Do you guys have any suggestions for getting rid of the ugly for-loops? Many thanks, Fernando Álvarez Nordea e-Markets Strandgade 3 PO Box 850 DK-0900 Copenhagen C Denmark Tel.: +45 33 33 32 67 Mobile: +45 61 55 27 54 This transmission is intended solely for the person or entity to whom it is addressed. It may contain privileged and confidential information. If you are not the intended recipient, please be notified that any dissemination, distribution or copying is strictly prohibited. If you have received this transmission by mistake, please let us know and then delete it from your system. P Please consider the impact on the environment before printing this e-mail. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] change rownames
Your data frame does not seem to have come through with the formatting. Could you send a plain text example (easily created using dput() ) of what you have now and what you want? Michael On Tue, Sep 27, 2011 at 7:24 AM, charlotte.ndir...@unil.ch wrote: Hi, I already know row.names(comm)[1:5]- paste(com, 1:5, sep=) But, my data looks like e.g. plots sp1 sp2 1 0.2 0.5 2 4 7 10 I would like to add com in front of the plot numbers without changing them? Thank you - Original Message - From: R. Michael Weylandt michael.weyla...@gmail.com To: charlotte.ndir...@unil.ch Subject: Re: [R] change rownames Date: Tue, 27 Sep 2011 06:39:18 -0400 I have no idea what you mean by plot numbers, but consider this: x - data.frame(a = 1:5, b = sin(1:5) , c = letters[1:5]) print(x) rownames(x) - paste(rownames(x),com) print(x) Hope this helps, Michael Weylandt On Tue, Sep 27, 2011 at 6:33 AM, charlotte.ndir...@unil.ch wrote: Hi All, How do I add com to the row names of a data frame, without deleting the plot numbers? Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem about rbpspline-selection of knots
Hi, My problem is that i will use rbpspline{SpatialExtremes} to fit a penalized spline with radial basis functions to my data but i have two predictor variables and i cannot select knots. In R help there is an example such that n - 200 x - runif(n) fun - function(x) sin(3 * pi * x) y - fun(x) + rnorm(n, 0, sqrt(0.4)) knots - quantile(x, prob = 1:(n/4) / (n/4 + 1)) fitted - rbpspline(y, x, knots = knots, degree = 3) fitted plot(x, y) lines(fitted, col = 2) In my case, n-200 x1-runif(n) x2-runif(n) fun-function(x1,x2) 2*x1+x2^2 y - fun(x1,x2) + rnorm(n, 0, sqrt(0.4)) but at this point i do not know how to select knots. I want to take cross products of quantiles of x1 and x2. In R help it says that knots is defined as a vector that gives the coordinates of the knots. How can i give the coordinates of two dimensional knots as a vector? Thanks in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ANOVA define as factor or not
HronnE wrote on 09/27/2011 06:27:38 AM: Hi all This is probably a simple problem but somehow I am having much trouble with finding a solution, so I seek your help! I have a data-set with continuous response variables. The explanatory variably is 4xpH treatments (so 8.08, 7.94, 7.81 and 7.71) so also continuous and not technically factorial. However I have decided to do Anova's (as well as regression) to explore the effect of pH. What I don't understand is why the anova done in such a way: summary(aov(BioMass~pH)) ... gives me completely different p-values if I define the pH as factor or not. And what would be the correct approach? Help on this subject would be much appreciated! Hronn If pH is a numeric variable (see ?class) then your formula is instructing aov() to include it as a linear term with one degree of freedom, just like lm() does if you're fitting a regression. If you want to treat pH as a categorical variable, then you must make sure that it is included in the model as a factor. Then you will see that it uses three degrees of freedom. BioMass - rnorm(100) pH - sample(c(8.08, 7.94, 7.81, 7.71), size=100, replace=TRUE) pH.f - as.factor(pH) summary(aov(BioMass ~ pH)) summary(lm(BioMass ~ pH)) summary(aov(BioMass ~ pH.f)) summary(lm(BioMass ~ pH.f)) Jean [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pearson chi-square test
Just for completeness: the manual calculation you'd want is most likely sum((x-y)^2 / (x+y)) (that's one you can find on the Wikipedia link you provided). To get the same from chisq.test, try something like chisq.test(data.frame(x,y)[,c(3,6)]) (there are surely smarter ways, but at least it works here). Note that something like chisq.test(as.vector(x), as.vector(y)) will give a different test, i.e. based on a contingency table of x cross y). M. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Meyners, Michael Sent: Tuesday, September 27, 2011 13:28 To: Michael Haenlein; r-help@r-project.org Subject: Re: [R] Pearson chi-square test Not sure what you want to test here with two matrices, but reading the manual helps here as well: y a vector; ignored if x is a matrix. x and y are matrices in your example, so it comes as no surprise that you get different results. On top of that, your manual calculation is not correct if you want to test whether two samples come from the same distribution (so don't be surprised if R still gives a different value...). HTH, Michael -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Michael Haenlein Sent: Tuesday, September 27, 2011 12:45 To: r-help@r-project.org Subject: [R] Pearson chi-square test Dear all, I have some trouble understanding the chisq.test function. Take the following example: set.seed(1) A - cut(runif(100),c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE) B - cut(runif(100),c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE) C - cut(runif(100),c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE) x - table(A,B) y - table(A,C) When I calculate the test statistic by hand I get a value of approximately 75.9: http://en.wikipedia.org/wiki/Pearson's_chi- square_test#Calculating_the_test-statistic sum((x-y)^2/y) But when I do chisq.test(x,y) I get a value of 12.2 while chisq.test(y,x) gives a value of 10.3. I understand that I must be doing something wrong here, but I'm not sure what. Thanks, Michael [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Keep consecutive year observations (remove gap's) in panel data (dataframes). Difficulties in using lag(). Package plm.
Hi everyone. I have two questions. Ive found some other questions and answers similar to these but they didnt solve my problem. Im working with a panel of firm/years observations (see my reproducible example). Im using the plm package. My panel not only is unbalanced but also have some gaps in years. #reproducible example data1-data.frame(year=c(2001,2002,2003,2004,2005,2001,2002,2004,2005,2001,2 002,2003,2005), firm=c(1,1,1,1,1,2,2,2,2,3,3,3,3),x=c(11,22,32,25,26,47,85,98,101,14,87,56,1 4)) data1 #load package plm and format data data2-plm.data(data1,index=c(firm,year)) First I want to keep for each firm the longest serie of consecutive years. So I want a dataframe like this (keeping years 2001 and 2002 in firm 2) year firm x 1 20011 11 2 20021 22 3 20031 32 4 20041 25 5 20051 26 6 20012 47 7 20022 85 8 20013 14 9 20023 87 10 20033 56 Or like this (keeping years 2004 and 2005 in firm 2) year firm x 1 20011 11 2 20021 22 3 20031 32 4 20041 25 5 20051 26 6 20042 98 7 20052 101 8 20013 14 9 20023 87 10 20033 56 Second, I need to create a new variable that is the lagged value of x. I've done newdata1-transform(data1,y=lag(x,1)) But it doesn't work. I also need to create a new variable that is the opposite of lag(). I've done newdata2-transform(data1,z=lag(x,-1)) But, of course, it doesn't work neither. Thank you for all your help. Cecília Carmo (Universidade de Aveiro Portugal) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pearson chi-square test
Dear Michael, Thanks very much for your answers! The purpose of my analysis is to test whether the contingency table x is different from the contingency table y. Or, to put it differently, whether there is a significant difference between the joint distribution AB and AC. Based on your answer I'm wondering whether the best way to do this is really a chisq.test? Or is there probably a different function or package I should use altogether? Thanks, Michael -Original Message- From: Meyners, Michael [mailto:meyner...@pg.com] Sent: Dienstag, 27. September 2011 17:00 To: Michael Haenlein; r-help@r-project.org Subject: RE: [R] Pearson chi-square test Just for completeness: the manual calculation you'd want is most likely sum((x-y)^2 / (x+y)) (that's one you can find on the Wikipedia link you provided). To get the same from chisq.test, try something like chisq.test(data.frame(x,y)[,c(3,6)]) (there are surely smarter ways, but at least it works here). Note that something like chisq.test(as.vector(x), as.vector(y)) will give a different test, i.e. based on a contingency table of x cross y). M. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Meyners, Michael Sent: Tuesday, September 27, 2011 13:28 To: Michael Haenlein; r-help@r-project.org Subject: Re: [R] Pearson chi-square test Not sure what you want to test here with two matrices, but reading the manual helps here as well: y a vector; ignored if x is a matrix. x and y are matrices in your example, so it comes as no surprise that you get different results. On top of that, your manual calculation is not correct if you want to test whether two samples come from the same distribution (so don't be surprised if R still gives a different value...). HTH, Michael -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Michael Haenlein Sent: Tuesday, September 27, 2011 12:45 To: r-help@r-project.org Subject: [R] Pearson chi-square test Dear all, I have some trouble understanding the chisq.test function. Take the following example: set.seed(1) A - cut(runif(100),c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE) B - cut(runif(100),c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE) C - cut(runif(100),c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE) x - table(A,B) y - table(A,C) When I calculate the test statistic by hand I get a value of approximately 75.9: http://en.wikipedia.org/wiki/Pearson's_chi- square_test#Calculating_the_test-statistic sum((x-y)^2/y) But when I do chisq.test(x,y) I get a value of 12.2 while chisq.test(y,x) gives a value of 10.3. I understand that I must be doing something wrong here, but I'm not sure what. Thanks, Michael [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] binomial logistic regression question
On Tue, Sep 27, 2011 at 2:59 PM, Patrick Breheny patrick.breh...@uky.eduwrote: On 09/27/2011 07:53 AM, majesty wrote: Dear subscribers, I am looking for a function which would allow me to model the dependent variable as the number of successes in a series of Bernoulli trials. My data looks like this ID TRIALS SUCCESSESS INDEP1 INDEP2 INDEP3 1 00.273 0.055 0.156 2 98170 74 0.123 0.456 0.789 3 14548630 0.124 0.235 0.007 4 14714949 0.888 0.357 0.321 5 60585 11 0.484 0.235 0.235 6 19895343 0.295 0.123 0.856 I want to find out how independent variables influence the number of successful trials (dependent variable) I had tried to use glm formula regression.glm- glm( SUCCESSESS ~ INDEP1 + INDEP2 + INDEP3, data = data, family = binomial, weights= TRIALS) Try regression.glm- glm(cbind(SUCCESSESS,TRIALS) ~ INDEP1 + INDEP2 + INDEP3, data = data,family = binomial) That should be regression.glm - glm(cbind(SUCCESSESS,TRIALS - SUCCESSESS) ~ INDEP1 + INDEP2 + INDEP3, data = data,family = binomial) i.e., the second column in the response is the number of failures. as is specified in the details of ?glm. -- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Göran Broström [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SVM accuracy question
Il 27/09/11 01:58, R. Michael Weylandt ha scritto: Why exactly do you want to stabilize your results? If it's in preparation for publication/classroom demo/etc., certainly resetting the seed before each run (and hence getting the same sample() output) will make your results exactly reproducible. However, if you are looking for a clearer picture of the true efficacy of your svm and there's no real underlying order to the data set (i.e., not a time series), then a straight sample() seems better to me. I'm not particularly well read on the svm literature, but it sounds like you are worried by widely varying performance of the svm itself. If that's the case, it seems (to me at least) that there are certain data points that are strongly informative and it might be a more interesting question to look into which ones those are. I guess my answer, as a total non-savant in the field, is that it depends on your goal: repeated runs with sample will give you more information about the strength of the svm while setting the seed will give you reproducibility. Importance sampling might be of interest, particularly if it could be tied to the information content of each data point, and a quick skim of the MC variance reduction literature might just provide some fun insights. I'm not entirely sure how you mean to bootstrap the act of setting the seed (a randomly set seed seems to be the same as not setting a seed at all) but that might give you a nice middle ground. Sorry this can't be of more help, Michael On Mon, Sep 26, 2011 at 6:32 PM, Riccardo G-Mail ric.rom...@gmail.com mailto:ric.rom...@gmail.com wrote: Hi, I'm working with support vector machine for the classification purpose, and I have a problem about the accuracy of prediction. I divided my data set in train (1/3 of enteire data set) and test (2/3 of data set) using the sample function. Each time I perform the svm model I obtain different result, according with the result of the sample function. I would like to stabilize the performance of my analysis. To do this I used the set.seed function. Is there a better way to do this? Should I perform a bootstrap on my work-flow (sample and svm)? Here is an example of my workflow: ### not to run index - 1:nrow(myData) set.seed(23) testindex - sample(index, trunc(length(index)/3)) testset - myData[testindex, ] trainset - myData[-testindex, ] tune.svm() svm.model - svm(Factor ~ ., data = myData, cost = from tune.svm, gamma = from tune.svm, cross= 10, subset= testset) summary(svm.model) predict(svm.model, testset) Best Riccardo R-help@r-project.org mailto:R-help@r-project.org mailing list https://stat.ethz.ch/mailman/__listinfo/r-help https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/__posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Thanks for your suggestion, I'm agree with you about the uselessness of set.seed inside a bootstrap; the idea of bootstrap exclude the set.seed. In my mind the bootstrap could allow me to understand the distribution of the prediction accuracy of the model. My doubt stems from the fact that I'm not a statistician. Best __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting similar rows
Metronome123 wrote on 09/27/2011 07:24:50 AM: Startsituation: structure(c(1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1), .Dim = 4:5, .Dimnames = structure(list(subject = c(s1, s2, s3, s4), class = c(c1, c2, c3, c4, c5)), .Names = c(subject, class)), class = c(xtabs, table), call = xtabs(formula = ~subject + class, data = ia)) I want a count of subjects that match the same classes in subject_cnt and a count of the number of classes in class_cnt The result of this example should be: structure(list(subject_cnt = c(2L, 1L, 1L), class_cnt = c(3L, 2L, 3L), c1 = c(1L, 1L, 0L), c2 = c(0L, 1L, 0L), c3 = c(1L, 0L, 1L), c4 = c(0L, 0L, 1L), c5 = c(1L, 0L, 1L)), .Names = c(subject_cnt, class_cnt, c1, c2, c3, c4, c5), class = data.frame, row.names = c(NA, -3L)) How can I achieve this in R, without complicated loops? PS. Note that the number of classes and subjects are in real quite big. Cheers, Lars Try this: xt - structure(c(1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1), .Dim = 4:5, .Dimnames = structure(list(subject = c(s1, s2, s3, s4), class = c(c1, c2, c3, c4, c5)), .Names = c(subject, class)), class = c(xtabs, table)) df - as.data.frame(unclass(xt)) dfu - unique(df) class_cnt - apply(dfu, 1, sum) subject_cnt - tabulate(match(apply(df, 1, paste, collapse=-), apply(dfu, 1, paste, collapse=-))) data.frame(subject_cnt, class_cnt, dfu) Jean [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pearson chi-square test
I suspect that the chisquare-test might not be appropriate, as you have constraints (same number of observations for A in both contingency tables). I further suspect that there is no test readily available for that, but I might be wrong. Maybe randomization tests could help here, but it would require a bit of thinking AND programming to accomplish that. chisq.test might give you an approximate solution, but I can't say how good this will be (and it might also depend on the data, btw). Best, Michael From: Michael Haenlein Sent: Tuesday, September 27, 2011 17:05 To: r-help@r-project.org Cc: Meyners, Michael Subject: RE: [R] Pearson chi-square test Dear Michael, Thanks very much for your answers! The purpose of my analysis is to test whether the contingency table x is different from the contingency table y. Or, to put it differently, whether there is a significant difference between the joint distribution AB and AC. Based on your answer I'm wondering whether the best way to do this is really a chisq.test? Or is there probably a different function or package I should use altogether? Thanks, Michael -Original Message- From: Meyners, Michael Sent: Dienstag, 27. September 2011 17:00 To: Michael Haenlein; r-help@r-project.org Subject: RE: [R] Pearson chi-square test Just for completeness: the manual calculation you'd want is most likely sum((x-y)^2 / (x+y)) (that's one you can find on the Wikipedia link you provided). To get the same from chisq.test, try something like chisq.test(data.frame(x,y)[,c(3,6)]) (there are surely smarter ways, but at least it works here). Note that something like chisq.test(as.vector(x), as.vector(y)) will give a different test, i.e. based on a contingency table of x cross y). M. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Meyners, Michael Sent: Tuesday, September 27, 2011 13:28 To: Michael Haenlein; r-help@r-project.org Subject: Re: [R] Pearson chi-square test Not sure what you want to test here with two matrices, but reading the manual helps here as well: y a vector; ignored if x is a matrix. x and y are matrices in your example, so it comes as no surprise that you get different results. On top of that, your manual calculation is not correct if you want to test whether two samples come from the same distribution (so don't be surprised if R still gives a different value...). HTH, Michael -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Michael Haenlein Sent: Tuesday, September 27, 2011 12:45 To: r-help@r-project.org Subject: [R] Pearson chi-square test Dear all, I have some trouble understanding the chisq.test function. Take the following example: set.seed(1) A - cut(runif(100),c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE) B - cut(runif(100),c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE) C - cut(runif(100),c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE) x - table(A,B) y - table(A,C) When I calculate the test statistic by hand I get a value of approximately 75.9: http://en.wikipedia.org/wiki/Pearson's_chi- square_test#Calculating_the_test-statistic sum((x-y)^2/y) But when I do chisq.test(x,y) I get a value of 12.2 while chisq.test(y,x) gives a value of 10.3. I understand that I must be doing something wrong here, but I'm not sure what. Thanks, Michael __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ANOVA define as factor or not
Thank you very much for a quick reply! I had not realized the degrees of freedom changed. It was my lack of understanding of the aov function. I will continue defining the pH as factor for the ANOVA's. Cheers, Hronn - -- Hrönn Egilsdóttir PhD Student Marine Research Institute Skúlagata 4 121 Reykjavík ICELAND -- View this message in context: http://r.789695.n4.nabble.com/ANOVA-define-as-factor-or-not-tp3846861p3847781.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] wgrib2 for windows 7
On 27.09.2011 14:06, Bhaskaran wrote: Hi all, I want to install wgrib2 I guess you will have to explain what wgrib2 is and why it is related to R. Uwe Ligges on my windows 7 laptop. It appears that a precompiled version exists and works for many. I managed to download a precompiled version and placed it in a directory and added this directory to the PATH. However, when execute R-command, I get the error suggesting that wgrib2 is not recognized as either internal or external command. Any suggestions? Thanks [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question concerning Box.test
Send this again using dput() to give a plain text output and I'll look at it. Also, I think you should probably look into the difference between a row and a column. Michael On Tue, Sep 27, 2011 at 11:48 AM, Samir Benzerfa benze...@gmx.ch wrote: Many thanks for your hint. I tried regular apply now. However, it still doesnt work. Function apply works fine with other regular functions like sum or mean. But for the function Box.test(x, ) it gives me the following error message: ** ** *Error in Box.test( ) : * * x is not a vector or univariate time series* * * For simplicity, I tried to do the test with a simple 2x20 Matrix for 2 stocks (see below), but it still does not work. It works well if I do the test individually for each row à Box.test(x[,1], ) and Box.test(x[,2], )** ** ** ** BANK.ABC ABC.MATERIAL 1 0.00.0 2 0.00.0 3 0.00.0 4 0.003181659 -0.008194479 5 -0.006386799 -0.008352074 6 0.0280287240.008352074 7 -0.0153476920.004116566 8 -0.0159100020.016086820 9 0.0032289700.019305155 10 -0.013062473 -0.011479818 11 0.00.0 12 -0.038090050 -0.011791525 13 0.021189299 -0.008042720 14 -0.003460532 -0.008194479 15 -0.010550182 -0.012589127 16 0.0174438900.016705694 17 0.0101396310.0 18 -0.017090.012120633 19 0.0102999570.023271342 20 0.0 -0.007619397 ** ** Any other hints? My goal is to do the Box.test for each row (for each stock) separately. So I want R to take each row one by one and perform the test. ** ** ** ** *Von:* R. Michael Weylandt [mailto:michael.weyla...@gmail.com] *Gesendet:* Dienstag, 27. September 2011 13:12 *An:* Samir Benzerfa *Cc:* r-help@r-project.org *Betreff:* Re: [R] Question concerning Box.test ** ** Did you try regular apply? If you have univariate input, there's no reason to use the multivariate mapply. Or more generally: apply(P[-1,],1,function(p) Box.test(p)$p.value) Michael On Tue, Sep 27, 2011 at 4:45 AM, Samir Benzerfa benze...@gmx.ch wrote:** ** Hi everyone, I've got a question concerning the function Box.test for testing autocorrelation in my data. My data consist of (daily) returns of several stocks over time (first row=time, all other rows=stock returns). I intend to perform a Box-Ljung test for my returns (for each stock). Since I have about 3000 stocks in my list, I'm not able to perform the test individually for each stock. Unfortunately the Box.test only works for univariate series. My goal is to get a list with every p-value (from the output) of the 3000 tests (that is a list with 3000 p-values). Any hint how to do this? I tried to do this with the function mapply, but it didn't work. Many thanks in advance best regards S.B. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ** ** [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] searching several subsequences in a single string sequence
Hi all I am analyzing bird song element sequences. I would like to know how can I get how many times a given subsequence is found in single string sequence. For example: If I have this single sequence: ABCABAABABABCAB I am looking for the subsequence ABC. Want I need to get here is that the subsequence is found twice. Any idea how can I do this? Thanks in advance Marcelo Araya-Salas Ph.D. Student Avian Communication and Evolution Lab Department of Biology New Mexico State University Lab: 575-646-4863 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] searching several subsequences in a single string sequence
Hi Marcelo, Try this: x - ABCABAABABABCAB length(gregexpr(pattern=ABC, x)[[1]]) See ?gregexpr for more details (though I admit that it is not easy to understand this help page) HTH, Ivan Le 9/27/2011 18:51, Marcelo Araya a écrit : Hi all I am analyzing bird song element sequences. I would like to know how can I get how many times a given subsequence is found in single string sequence. For example: If I have this single sequence: ABCABAABABABCAB I am looking for the subsequence ABC. Want I need to get here is that the subsequence is found twice. Any idea how can I do this? Thanks in advance Marcelo Araya-Salas Ph.D. Student Avian Communication and Evolution Lab Department of Biology New Mexico State University Lab: 575-646-4863 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Dept. Mammalogy Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] searching several subsequences in a single string sequence
On Tue, Sep 27, 2011 at 5:51 PM, Marcelo Araya marcelo...@gmail.com wrote: Hi all I am analyzing bird song element sequences. I would like to know how can I get how many times a given subsequence is found in single string sequence. For example: If I have this single sequence: ABCABAABABABCAB I am looking for the subsequence ABC. Want I need to get here is that the subsequence is found twice. Any idea how can I do this? gregexpr will return the position and length of multiple matches. And you can feed it a vector. So: songs=c(ABCABAABABABCAB,ABACAB,ABABCABCBC) gregexpr(m,songs) [[1]] [1] 1 11 attr(,match.length) [1] 3 3 [[2]] [1] -1 attr(,match.length) [1] -1 [[3]] [1] 3 6 attr(,match.length) [1] 3 3 - in the first item, it was found at posn 1 and 11 - in the second it wasnt found at all - in the third, it was found at posn 3 and 6 so just do some apply-ing to the returned list and get the length of each element. Job done! Barry PS bonus points for spotting the hidden prog-rock song title. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Adding axis to an ellipse: ellipse package
Dear list members, This might be a silly question but I just can't figure it out. I am using the ellipse package on covariance matrices. I would simply like to plot my ellipses WITH its two axis ploted as well. These axis represents the 2 eigen vectors of my matrix and it is important that I can graphically show them. Is there an easy way to do so? Many thanks, Antoine -- View this message in context: http://r.789695.n4.nabble.com/Adding-axis-to-an-ellipse-ellipse-package-tp3847954p3847954.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merger two 3-d scatter plot
If you are talking about the scatterplot3d package, and we assume your data is in data.frame called dat: library(scatterplot3d) s3d - scatterplot3d(dat$obs30, dat$Cases, dat$RANK1) s3d$points3d(dat$obs30, dat$Cases, dat$RANK2, col=2) Uwe Ligges On 26.09.2011 22:08, XINLI LI wrote: Dear David and R groups: I have the data as follows, I want to plot the Rank1 ~ obs30*Cases and Rank2 ~ obs30*Cases on the same plot as one 3-D scatter plot, how to do that? Any help is highly appreciated. // ID obs30 Cases RANK1 RANK2 1 0.03175 63 82 81 2 0.0 34 1 34 3 0.0 36 2 41 4 0.0 54 3 26 5 0.0 22 4 42 6 0.00746 134 39 32 7 0.0 2 5 53 8 0.01190 168 46 31 9 0.03012 166 78 86 10 0.00775 129 43 37 11 0.01290 155 51 43 12 0.00459 218 24 6 13 0.04348 23 92 73 14 0.02198 182 66 71 15 0.01546 194 60 62 16 0.01370 73 47 40 17 0.00424 236 23 2 18 0.00735 136 31 19 19 0.03030 66 86 85 20 0.03030 33 65 58 21 0.02273 132 59 59 22 0.02439 123 68 68 23 0.08333 24 83 78 24 0.01266 79 50 45 25 0.01024 293 28 4 26 0.00926 108 29 14 27 0.03750 160 85 95 28 0.01290 155 55 55 29 0.00935 107 30 20 30 0.04598 87 89 94 31 0.01087 92 41 36 32 0.0 2 6 50 33 0.01695 118 42 23 34 0.04918 61 88 92 # my test codes print(scatterplot3d(RANK1 ~ obs30 + Cases, type = h, angle = plot.angle, color = rgb(0,0,0,0.5), pch = 20, cex.symbols=2, col.axis=gray, col.grid=gray)) # print(scatterplot3d(RANK2 ~ obs30 + Cases, type = h, angle = plot.angle, color = rgb(0,0,0,0.5), pch = 20, cex.symbols=2, col.axis=gray, col.grid=gray)) On Mon, Sep 26, 2011 at 1:33 PM, David Winsemiusdwinsem...@comcast.netwrote: On Sep 26, 2011, at 3:06 PM, XINLI LI wrote: Dear R groups: I have the data as follows, I want to plot the Rank1 ~ obs30*Cases and Rank2 ~ obs30*Cases on the same plot as one 3-D scatter plot, how to do that? Any help is highly appreciated. There is no Method in that data and your code throws an error. ID obs30 Cases RANK1 RANK2 1 0.03175 63 82 81 2 0.0 34 1 34 3 0.0 36 2 41 4 0.0 54 3 26 5 0.0 22 4 42 6 0.00746 134 39 32 7 0.0 2 5 53 8 0.01190 168 46 31 9 0.03012 166 78 86 10 0.00775 129 43 37 11 0.01290 155 51 43 12 0.00459 218 24 6 13 0.04348 23 92 73 14 0.02198 182 66 71 15 0.01546 194 60 62 16 0.01370 73 47 40 17 0.00424 236 23 2 18 0.00735 136 31 19 19 0.03030 66 86 85 20 0.03030 33 65 58 21 0.02273 132 59 59 22 0.02439 123 68 68 23 0.08333 24 83 78 24 0.01266 79 50 45 25 0.01024 293 28 4 26 0.00926 108 29 14 27 0.03750 160 85 95 28 0.01290 155 55 55 29 0.00935 107 30 20 30 0.04598 87 89 94 31 0.01087 92 41 36 32 0.0 2 6 50 33 0.01695 118 42 23 34 0.04918 61 88 92 I used the following code for one plot: print(cloud(RANK1 ~ Cases * obs30, data = xll, groups = Method, screen = list(z=20, x= -70), perspective = FALSE, key = list(title = Ranking HOSP, x = 100, y = 100, corner= c(0, 0), border = TRUE, points = Rows(trellis.par.get(**superpose.symbol), 1:2 # ##**##** ##** print(cloud(RANK2 ~ Cases * obs30, data = xll, groups = Method, screen = list(z=20, x= -70), perspective = FALSE, key = list(title = Ranking HOSP, x = 100, y = 100, corner= c(0, 0), border = TRUE, points = Rows(trellis.par.get(**superpose.symbol), 1:2 [[alternative HTML version deleted]] __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.htmlhttp://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting similar rows
Jean: Thanks! Works great! Lars Op 27 sep. 2011 (w39), om 17:22 heeft Jean V Adams [via R] het volgende geschreven: df - as.data.frame(unclass(xt)) dfu - unique(df) class_cnt - apply(dfu, 1, sum) subject_cnt - tabulate(match(apply(df, 1, paste, collapse=-), apply(dfu, 1, paste, collapse=-))) data.frame(subject_cnt, class_cnt, dfu) -- View this message in context: http://r.789695.n4.nabble.com/Counting-similar-rows-tp3847051p3847973.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survival analysis: interval censored data
halo david when I use type= 'interval' Call: survfit(formula = Surv(ingreso, fecha, estado, type = interval) ~ categoria) categoria=C time n.risk n.event survival std.err lower 95% CI upper 95% CI 2004 95.00 13.14 0.862 0.0354 0.795 0.934 2007 31.86 7.19 0.667 0.0695 0.544 0.818 2008 1.67 1.67 0.000 NaN NA NA categoria=E time n.risk n.event survival std.err lower 95% CI upper 95% CI 2004 112.0 18.47 0.835 0.0351 0.769 0.907 2005 40.5 1.06 0.813 0.0401 0.738 0.896 2007 37.5 7.46 0.651 0.0620 0.540 0.785 and when I use just Call: survfit(formula = Surv(ingreso, fecha, estado) ~ categoria) categoria=C time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2006 63 5 0 23 0.921 0.0341 0.856 0.990 2007 35 2 30 1 0.868 0.0483 0.778 0.968 2008 62 5 1 3 0.798 0.0536 0.700 0.910 2009 55 4 0 5 0.740 0.0570 0.636 0.861 2010 46 5 0 41 0.660 0.0611 0.550 0.791 categoria=E time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2005 71 7 3 0 0.901 0.0354 0.835 0.973 2006 67 2 0 22 0.875 0.0391 0.801 0.955 2007 43 1 36 1 0.854 0.0432 0.774 0.943 2008 77 5 0 8 0.799 0.0469 0.712 0.896 2009 64 4 1 3 0.749 0.0502 0.657 0.854 2010 58 7 0 51 0.658 0.0545 0.560 0.774 and I don t know why when I use type = interval it does not survival calculed for very year regards De: Daniel Malter dan...@umd.edu Para: r-help@r-project.org Enviado: martes 27 de septiembre de 2011 7:06 Asunto: Re: [R] survival analysis: interval censored data Please adhere to the posting guide (i.e., provide a sample of self contained code and provide the error message). And what does but it is not working mean? Is there an error code? rueu wrote: hello: my data looks like: time1 time2 event catagoria 2004 2006 1 C 2004 2005 0 C 2005 2010 1 E 2007 2009 1 C 2006 2007 0 E 2008 2010 0 C 2008 2010 1 E ... and the census interval is 1 year I have tried this surara-survfit(Surv(time1,time2,event,type=interval)~categoria) but it is not working [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://r.789695.n4.nabble.com/survival-analysis-interval-censored-data-tp3845269p3846205.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] searching several subsequences in a single string sequence
Barry Rowlingson wrote on 09/27/2011 12:06:21 PM: On Tue, Sep 27, 2011 at 5:51 PM, Marcelo Araya marcelo...@gmail.com wrote: Hi all I am analyzing bird song element sequences. I would like to know how can I get how many times a given subsequence is found in single string sequence. For example: If I have this single sequence: ABCABAABABABCAB I am looking for the subsequence ABC. Want I need to get here is that the subsequence is found twice. Any idea how can I do this? gregexpr will return the position and length of multiple matches. And you can feed it a vector. So: songs=c(ABCABAABABABCAB,ABACAB,ABABCABCBC) gregexpr(m,songs) [[1]] [1] 1 11 attr(,match.length) [1] 3 3 [[2]] [1] -1 attr(,match.length) [1] -1 [[3]] [1] 3 6 attr(,match.length) [1] 3 3 - in the first item, it was found at posn 1 and 11 - in the second it wasnt found at all - in the third, it was found at posn 3 and 6 so just do some apply-ing to the returned list and get the length of each element. Job done! Barry PS bonus points for spotting the hidden prog-rock song title. For example, songs - c(ABCABAABABABCAB, ABACAB, ABABCABCBC) counts - gregexpr(ABC, songs) sapply(counts, length) Jean P.S. 1981 Genesis album! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mahalanobis Distance
Hello David(s), First of all, thank you for your help. I was running some tests, and I wish to know if I have correctly understood your explanation. Well, when I use rbind(), I get the variables binded by row, and when I use cbind() I get the variables binded by column. The dist() function, as the help says, computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix, so, in that case I use rbind() (as the help example does). The mahalanobis() function help says returns the squared Mahalanobis distance of all rows in x and the vector mu = center with respect to Sigma = cov., so, here again, the calculations are done by row. Using cbind() I get one result for each row like this: mahalanobis(testeCbind, center = colMeans(testeCbind), cov=var(testeCbind)) I get as result 15 values (the number of rows). With dist(), using euclidean and rbind() I get only one value (because is calculated by row). Thinking on that way, mahalanobis distance is not so aproprietad for my kind of input data. Am I correct? Or is there a way to make the calculation of mahalanobis of all points and get only one value as the result of how distante the variables (subseries) are? Thank you all again. Best regars, Jorge Aikes Junior -- View this message in context: http://r.789695.n4.nabble.com/Mahalanobis-Distance-tp3844960p3848247.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] M-Splines design matrix
library(sos) ???'M-spline' This produces ZERO matches. If you want a nonnegative fit and you don't necessarily need M-splines, there are many ways to do that, some of which can be found with the sos package (which also includes a vignette). There are many different packages offering spline capabilities, but none seem to mention M-Splines. From the Wikipedia article, it appears that M-splines are standard B-splines with a fit constrained to be non-negative. The construction of a B-spline basis matrix is described in section 3.5 of Functional Data Analysis with R and Matlab (Ramsay, Hooker and Graves, 2009, Springer). The fda package includes script files to recreate all but one of the 76 figures in the book. The following code (patterned after section 3.5) produces a B-spline matrix: tst - create.bspline.basis() predict(tst, seq(0, 1, .1)) Hope this helps. Spencer Graves On 9/27/2011 5:25 AM, Tigisti G wrote: Hi Can any one inform me an r package used to fit Cubic M-splines or r functions used to have design and penalty matrices in fitting cubic M-splines please? Thanks, Tigist [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Spencer Graves, PE, PhD President and Chief Technology Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567 web: www.structuremonitoring.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] my R query
On Sep 27, 2011, at 1:47 PM, arindam fadikar wrote: I have made a level plot in R of a variable using the lattice package. This grid corresponds to South Asia. I am only interested in viewing the values of this variable (aerosol optical depth) for certain countries in South Asia. I have a dummy variable that takes the value 1 for the countries I am interested in and 0 otherwise. Is it possible for me to colour this part of the grid black or any other colour? Here is my R code: levelplot(aod ~ longitude + latitude | factor(day), data = aod_Jan, aspect=iso, contour = TRUE, layout=c(1,1)) This was cross-posted on StackOverflow where the user name was ridhima. I posted a response there about 45 minutes ago. Cross- posting at short intervals is deprecated on r-help. Obviously when one has not gotten an answer in some reasonable period of time, say 48 hours, it would make sense to repost in on a different list. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] normalizing a negative binomial distribution and/or incorporating variance structures in a GAMM
Meredith Jantzen mjantzen at uwo.ca writes: Hello everyone, Apologies in advance, as this is partially a stats question and partially an R question. I have been using a GAM to model the activity level of bats going into and coming out from a forested edge. I had eight microphones set up in a line transect at each of eight sites, and I am hoping to construct a model for each of 7 species. My count data has a reverse J-shaped skew and is overdispersed with a fair amount of zeros, and I haven't found any transformations that will completely normalize it (I've tried square roots and logs). Meanwhile, the variance in call numbers varies between sites and between microphones. I wanted to use a GAMM to incorporate varComb and varIdent, but these can only be applied on data with a gaussian distribution. Are there any packages I should be looking into that I don't know about that will apply a variance structure on a negative binomial distribution? Or is there some transformation that I should be using that will solve my normality issues? I've been searching the R-help boards, everything in Zuur and Woods, but I haven't found an answer yet. I'm not entirely clear about this, but this question and the previous question that Simon Wood answered (about neg binom and GAMM) suggest to me that you might be going in slightly the wrong direction. If your data are non-normally distributed, your choices are typically (1) pick an alternative family of distributions to characterize the variation (e.g. neg binomial or ZINB), (2) use some form of robust estimation (e.g. rlm in the MASS package), or (3) try to find a transformation of the data that makes the data normal (and/or homoscedastic, and/or linear with respect to the predictor variables). Among ecologists #3 is the classical approach and #1 is the most common modern approach. Combining #1 and #3 doesn't make that much sense to me. One doesn't necessarily expect the variance to be constant in a negative binomial model; are the *standardized* residuals heteroscedastic? (i.e. does the boxplot of residuals(m,type=pearson) vs site, microphone, or site*microphone combination look funky?) It's not absolutely clear whether you need zero-inflation explicitly or not. There are tests for zero-inflation and overdispersion (see ref below), but I don't know of any that are implemented in R ... your choices seem to be * negative binomial in mgcv:gam, without zero-inflation; * ZINB in pscl, without the sophisticated GAM machinery of mgcv (but you can use spline terms via splines::ns(v,n) where v is the predictor variable and n is the number of knots -- it just won't do all the slick automatic complexity selection that mgcv does) * it looks like the COZIGAM package will do zero-inflated GAMs, but it doesn't do negative binomials ... @article{deng_score_2005, title = {Score tests for zero-inflation and over-dispersion in generalized linear models}, volume = {15}, url = {http://www3.stat.sinica.edu.tw/statistica/j15n1/j15n115/j15n115.html}, journal = {Statistica Sinica}, author = {Deng, D. and Paul, {S.R.}}, year = {2005}, pages = {257–276} } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How can I check a package is installed or not?
Dear list, How can I detect a package is installed or not? If not, then install it. For example, in a script, I want to check the package DESeq is installed or not. If not, then I will using this script to install it. source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) The pseudo script would be like this: try: library(DESeq) catch: source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) Thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] inset one map on top of another map
I want to overlay a small inset map on top of another map, but I can't figure out how to do it. For example, here are two different maps: # map 1 - Ohio map(state, region= ohio) # map 2 - US with Ohio darkened map(state) map(state, region=ohio, fill=T, add=T) I would like to add map 2 as a small inset in the corner of map 1. I have tried: map(state, region= ohio) par(new=TRUE, mar=c(3, 3, 15, 15)) map(state) map(state, region=ohio, fill=T, add=T) but this seems to erase map 1 and replace it with a full size version of map 2. I can successfully overlay an unrelated plot using similar code: map(state, region= ohio) par(new=TRUE, mar=c(3, 3, 15, 15)) plot(1:10, 1:10) So, there must be something about the maps() function that I'm tripping over. Any suggestions? I am using R for Windows 2.13.0 and the maps package version 2.1-5. Jean `·.,, (((º `·.,, (((º `·.,, (((º Jean V. Adams Statistician U.S. Geological Survey Great Lakes Science Center 223 East Steinfest Road Antigo, WI 54409 USA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data import
I see what you mean. Sorry and thanks for pointing that out to me Ben. bbolker wrote: B77S bps0002 at auburn.edu writes: I have never used that function, but I know that with read.csv() you can do the following to select only the columns you want: chosen_vars - read.csv(Workbook1.csv, header=T)[c(variable1, variable3)] This is not actually selectively importing: it's importing the whole thing and *then* selecting the relevant columns. If the original poster is trying to avoid importing the whole data set because (for example) it's got a gigantic number of columns and will be very slow and/or tax their system, then this idiom won't help. Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://r.789695.n4.nabble.com/Data-import-tp3842196p3848802.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I check a package is installed or not?
On Sep 27, 2011, at 3:19 PM, Fabrice Tourre wrote: Dear list, How can I detect a package is installed or not? If not, then install it. For example, in a script, I want to check the package DESeq is installed or not. If not, then I will using this script to install it. source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) The pseudo script would be like this: try: library( Perhaps: if ( require(DESeq) ) { do-stuff } else { source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) } 'require' reports success or failure with a logical and loads the package if successful. You need to add logical.return=TRUE to a 'library' call to get avoid getting a character value back. catch: source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) Thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] compare proportions
Hi, I have a seemingly simple proportional test. here is the question I am trying to answer: There is a test running each day in the lab, the test comes out as either positive or negative. So at the end of each month, we can calculate a positive rate in that month as the proportion of positive test results. The data look like: Month # positive # total tests positive rate January 24 205 11.7% February 31 234 13.2% March 26 227 11.5% : : : August 42 241 17.4% The total # of positive before August is 182, and the total # of tests before August is 1526. It appears that from January to July, the positive rate is between 11% to 13%, the rate in August is up around 17%. So the question is whether is up in August is statistically significant? I can think of 3 ways to do this test: 1.1. Use binom.test(), set “p” as the average positive rate between January and July (=182/1526): binom.test(42,241,182/1526) Exact binomial test data: 42 and 241 number of successes = 42, number of trials = 241, p-value = 0.0125 alternative hypothesis: true probability of success is not equal to 0.1192661 95 percent confidence interval: 0.1285821 0.2281769 sample estimates: probability of success 0.1742739 2. 2. Use prop.test(), where I compare the average positive rate between January July with the positive rate in August: prop.test(c(182,42),c(1526,241)) 2-sample test for equality of proportions with continuity correction data: c(182, 42) out of c(1526, 241) X-squared = 5.203, df = 1, p-value = 0.02255 alternative hypothesis: two.sided 95 percent confidence interval: -0.107988625 -0.002026982 sample estimates: prop 1 prop 2 0.1192661 0.1742739 3. 2. 3. Use prop.test(), where I compare the average monthly positive rate between January July with the positive rate in August. The average monthly # of positives is 182/7=26, the average monthly $ of total tests is 1526/7=216: prop.test(c(26,42),c(218,241)) 2-sample test for equality of proportions with continuity correction data: c(26, 42) out of c(218, 241) X-squared = 2.3258, df = 1, p-value = 0.1272 alternative hypothesis: two.sided 95 percent confidence interval: -0.12375569 0.01374008 sample estimates: prop 1 prop 2 0.1192661 0.1742739 As you can see that the method 3 gave insignificant p value compared to method 1 2. While I understand each method is testing different hypothesis, but for the question I am trying to answer (does August have higher positive rate compare to earlier months?), which method is more relevant? Thanks for any suggestion, John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] remove NaN from element in a vector in a list
Hello, What is the best way to turn a matrix into a list removing NaN's? I'm new to R... Start: mt = matrix(c(1,4,NaN,5,3,6),2,3) mt [,1] [,2] [,3] [1,]1 NaN3 [2,]456 Desired result: lst [[1]] [1] 1 3 [[2]] [1] 4 5 6 Thanks! Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problem with switch function across R versions 2.10 and 2.13
Hello, The following piece of code works fine in R.2.10 (ubuntu): switch(distr, normal = {if (is.infinite(param[desv])) n - c(n,La desv. estándar no puede ser Inf.) if (param[desv]0) n - c(n,La desv. estándar no puede ser 0.) }, expo = {if (param[tasa]=0) n - c(n,La tasa debe ser 0.) }, gamma = {if (param[s]=0) n - c(n,El parametro s debe ser 0.) if (param[a]=0) n - c(n,El parámetro a debe ser 0.) }, unifc = {if (param[min]param[max]) n - c(n,Min. no puede ser mayor que Max.)}, binom = {if (param[p]0|param[p]1) n - c(n,p debe estar entre 0 y 1) if (param[n]0|!all.equal(param[n], trunc(param[n]))) n - c(n,Valor de n no entero o negativo.) if (param[n]max(x)) n - c(n,paste(El valor máximo de,nombre, no puede exceder el valor máximo de n.)) }, geom = {if (param[p]0|param[p]1) n - c(n,p debe estar entre 0 y 1) }, pois = {if (param[tasa]0) n - c(n,tasa no puede ser negativa)}, unifd = {if (param[min]param[max]) n - c(n,minimo no puede ser mayor que maximo) }, , stop(paste(Distribución,distr,no reconocida.)) ) However, using R2.13 (under Windows), I get the following error: Error en .local(x, distr, ...) : duplicado de interuptores por defecto: '' y 'stop(paste...' The 2.13 help on the function says the following: In the case of no match, if there is a unnamed element of ... its value is returned. (If there is more than one such argument an error is returned. Before R 2.13.0 the first one would have been used.) So it appears to be an issue of the pre or post 2.13 version being used. However, i don't see how the above code has more than one unamed element and thus how this error is produced. Can anyone enlighten me on this error? Thanks in advance, josé loreto romero palma [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove NaN from element in a vector in a list
Try this: alply(mt, 1, function(x) as.numeric(na.omit(x))) The as.numeric() addition may be necessary to strip the extra attributes na.omit() wants to add. Michael On Tue, Sep 27, 2011 at 4:02 PM, Ben qant ccqu...@gmail.com wrote: Hello, What is the best way to turn a matrix into a list removing NaN's? I'm new to R... Start: mt = matrix(c(1,4,NaN,5,3,6),2,3) mt [,1] [,2] [,3] [1,]1 NaN3 [2,]456 Desired result: lst [[1]] [1] 1 3 [[2]] [1] 4 5 6 Thanks! Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove NaN from element in a vector in a list
alply is from the plyr package. You'll need to call that if its not already loaded. M On Tue, Sep 27, 2011 at 4:07 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Try this: alply(mt, 1, function(x) as.numeric(na.omit(x))) The as.numeric() addition may be necessary to strip the extra attributes na.omit() wants to add. Michael On Tue, Sep 27, 2011 at 4:02 PM, Ben qant ccqu...@gmail.com wrote: Hello, What is the best way to turn a matrix into a list removing NaN's? I'm new to R... Start: mt = matrix(c(1,4,NaN,5,3,6),2,3) mt [,1] [,2] [,3] [1,]1 NaN3 [2,]456 Desired result: lst [[1]] [1] 1 3 [[2]] [1] 4 5 6 Thanks! Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I check a package is installed or not?
On 11-09-27 3:19 PM, Fabrice Tourre wrote: Dear list, How can I detect a package is installed or not? If not, then install it. Here's one way. This might be slow if you have all of CRAN installed, but it's quick enough if you only have a few dozen packages: installed - rownames(installed.packages()) DESeq %in% installed Another way is to use if (!require(DESeq)) ... but that will give warnings if DESeq is not there. Duncan Murdoch For example, in a script, I want to check the package DESeq is installed or not. If not, then I will using this script to install it. source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) The pseudo script would be like this: try: library(DESeq) catch: source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) Thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove NaN from element in a vector in a list
Excellent! Thank you! ben On Tue, Sep 27, 2011 at 2:07 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: alply is from the plyr package. You'll need to call that if its not already loaded. M On Tue, Sep 27, 2011 at 4:07 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Try this: alply(mt, 1, function(x) as.numeric(na.omit(x))) The as.numeric() addition may be necessary to strip the extra attributes na.omit() wants to add. Michael On Tue, Sep 27, 2011 at 4:02 PM, Ben qant ccqu...@gmail.com wrote: Hello, What is the best way to turn a matrix into a list removing NaN's? I'm new to R... Start: mt = matrix(c(1,4,NaN,5,3,6),2,3) mt [,1] [,2] [,3] [1,]1 NaN3 [2,]456 Desired result: lst [[1]] [1] 1 3 [[2]] [1] 4 5 6 Thanks! Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem with switch function across R versions 2.10 and 2.13
On 11-09-27 4:03 PM, jose romero wrote: Hello, The following piece of code works fine in R.2.10 (ubuntu): switch(distr, normal= {if (is.infinite(param[desv])) n- c(n,La desv. estándar no puede ser Inf.) if (param[desv]0) n- c(n,La desv. estándar no puede ser0.) }, expo= {if (param[tasa]=0) n- c(n,La tasa debe ser0.) }, gamma = {if (param[s]=0) n- c(n,El parametro s debe ser0.) if (param[a]=0) n- c(n,El parámetro a debe ser0.) }, unifc = {if (param[min]param[max]) n- c(n,Min. no puede ser mayor que Max.)}, binom = {if (param[p]0|param[p]1) n- c(n,p debe estar entre 0 y 1) if (param[n]0|!all.equal(param[n], trunc(param[n]))) n- c(n,Valor de n no entero o negativo.) if (param[n]max(x)) n- c(n,paste(El valor máximo de,nombre, no puede exceder el valor máximo de n.)) }, geom= {if (param[p]0|param[p]1) n- c(n,p debe estar entre 0 y 1) }, pois= {if (param[tasa]0) n- c(n,tasa no puede ser negativa)}, unifd = {if (param[min]param[max]) n- c(n,minimo no puede ser mayor que maximo) }, , You have an extra comma beyond the end of this line. Duncan Murdoch stop(paste(Distribución,distr,no reconocida.)) ) However, using R2.13 (under Windows), I get the following error: Error en .local(x, distr, ...) : duplicado de interuptores por defecto: '' y 'stop(paste...' The 2.13 help on the function says the following: In the case of no match, if there is a unnamed element of ... its value is returned. (If there is more than one such argument an error is returned. Before R 2.13.0 the first one would have been used.) So it appears to be an issue of the pre or post 2.13 version being used. However, i don't see how the above code has more than one unamed element and thus how this error is produced. Can anyone enlighten me on this error? Thanks in advance, josé loreto romero palma [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove NaN from element in a vector in a list
On Sep 27, 2011, at 4:02 PM, Ben qant wrote: Hello, What is the best way to turn a matrix into a list removing NaN's? I'm new to R... Start: mt = matrix(c(1,4,NaN,5,3,6),2,3) mt [,1] [,2] [,3] [1,]1 NaN3 [2,]456 apply(mt, 1, function(x) x[!is.nan(x)] ) [[1]] [1] 1 3 [[2]] [1] 4 5 6 The function is.finite would also remove infinities as well as the NaNs. Desired result: lst [[1]] [1] 1 3 [[2]] [1] 4 5 6 Thanks! David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I check a package is installed or not?
Thank you for your sharing I got it. On Tue, Sep 27, 2011 at 9:44 PM, David Winsemius dwinsem...@comcast.net wrote: On Sep 27, 2011, at 3:19 PM, Fabrice Tourre wrote: Dear list, How can I detect a package is installed or not? If not, then install it. For example, in a script, I want to check the package DESeq is installed or not. If not, then I will using this script to install it. source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) The pseudo script would be like this: try: library( Perhaps: if ( require(DESeq) ) { do-stuff } else { source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) } 'require' reports success or failure with a logical and loads the package if successful. You need to add logical.return=TRUE to a 'library' call to get avoid getting a character value back. catch: source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) Thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] remove NaN from element in a vector in a list
apply(mt, 1, function(x) x[!is.nan(x)] ) [[1]] [1] 1 3 [[2]] [1] 4 5 6 You need to be a little careful with apply: mt2 - matrix(c(1,4,2,5,3,6),2,3) apply(mt2, 1, function(x) x[!is.nan(x)] ) [,1] [,2] [1,]14 [2,]25 [3,]36 Depending on the input you will get a list or matrix as output. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I check a package is installed or not?
Hi I'm using this :) if (is.element('DESeq', installed.packages()[,1]) == FALSE) { install.packages('DESeq') } Robin 2011/9/27 Fabrice Tourre fabrice.c...@gmail.com Dear list, How can I detect a package is installed or not? If not, then install it. For example, in a script, I want to check the package DESeq is installed or not. If not, then I will using this script to install it. source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) The pseudo script would be like this: try: library(DESeq) catch: source(http://www.bioconductor.org/biocLite.R;) biocLite(DESeq) Thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Installing packages in R for UBUNTU
Hi. We downloaded R 2.13.1 for UBUNTU. We try to install several packages: car, maps, maptools, raster, and we found the following warning or error message: Error: **buffer overflow detected ***=/user/lib/R/bin/exec/R terminated*** We also found the usual error message that the packages are programmed in an earlier version. Do any of you know what could we do in order to install these packages successfully? Do we need to download a previous version of R for UBUNTU? Thanks Gilbert __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Is there a latex summary function in the quantreg package for just 1 tau?
Hello dear R help members, I wish to get a nice LaTeX table for a rq object. Trying to use the functions I found so far wouldn't work. I can start opening the functions up, but I am wondering if I had missed some function which is the one I should be using. Here is an example session for a bunch of possible errors: (Thanks) data(stackloss) y - stack.loss x - stack.x rq_object - rq(y ~ x, tau = .5) rq_object_summary - summary(rq_object) latex(rq_object) # Error in UseMethod(latex) : # no applicable method for 'latex' applied to an object of class rq latex(rq_object_summary) # Error in UseMethod(latex) : # no applicable method for 'latex' applied to an object of class summary.rq latex.summary.rqs(rq_object_summary) # Error in x$tau : $ operator is invalid for atomic vectors summary.rqs(rq_object) # Error in xi$coefficients[, i] : incorrect number of dimensions summary.rqs(rq_object_summary) # Error in xi$residuals[, i] : incorrect number of dimensions Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Installing packages in R for UBUNTU
Hi! 2011/9/27 gbre...@ssc.wisc.edu: We downloaded R 2.13.1 for UBUNTU. We try to install several packages: car, maps, maptools, raster, and we found the following warning or error message: I suggest to use apt-get (synaptic) as much as possible to install R packages and then use install.packages() for those that are not in the Ubuntu repositories. Henri-Paul -- Henri-Paul Indiogine Curriculum Instruction Texas AM University TutorFind Learning Centre Email: hindiog...@gmail.com Skype: hindiogine Website: http://people.cehd.tamu.edu/~sindiogine __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mahalanobis Distance
Since you are only looking at the distance between two points, they must fall on a line so no matter how many values you have for each point, their dimension is still 1. Mahalanobis distance is a way of measuring distance in multivariate space when the variables (columns) are correlated with one another. In this case, Euclidian distance (which assumes each dimension is orthogonal to all the others) is inappropriate. With two points and one dimension, all distance measures are effectively equivalent since they can be converted to one another by multiplying by an appropriate constant. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jorgeA Sent: Tuesday, September 27, 2011 12:08 PM To: r-help@r-project.org Subject: Re: [R] Mahalanobis Distance Hello David(s), First of all, thank you for your help. I was running some tests, and I wish to know if I have correctly understood your explanation. Well, when I use rbind(), I get the variables binded by row, and when I use cbind() I get the variables binded by column. The dist() function, as the help says, computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix, so, in that case I use rbind() (as the help example does). The mahalanobis() function help says returns the squared Mahalanobis distance of all rows in x and the vector mu = center with respect to Sigma = cov., so, here again, the calculations are done by row. Using cbind() I get one result for each row like this: mahalanobis(testeCbind, center = colMeans(testeCbind), cov=var(testeCbind)) I get as result 15 values (the number of rows). With dist(), using euclidean and rbind() I get only one value (because is calculated by row). Thinking on that way, mahalanobis distance is not so aproprietad for my kind of input data. Am I correct? Or is there a way to make the calculation of mahalanobis of all points and get only one value as the result of how distante the variables (subseries) are? Thank you all again. Best regars, Jorge Aikes Junior -- View this message in context: http://r.789695.n4.nabble.com/Mahalanobis-Distance-tp3844960p3848247.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] searching several subsequences in a single string sequence
On Tue, Sep 27, 2011 at 6:15 PM, Jean V Adams jvad...@usgs.gov wrote: For example, songs - c(ABCABAABABABCAB, ABACAB, ABABCABCBC) counts - gregexpr(ABC, songs) sapply(counts, length) That will still return '1' for the case where its not found, because of the -1. sapply(counts,function(x){sum(x0)}) will return 0. you might get faster performance if you do 'fixed=TRUE' on the gregexpr call as well. P.S. 1981 Genesis album! +1 bonus. Barry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.