Re: [R] cuhre usage ?? multidimensional integration
I'm not quite familiar with R. Could you recommend some relevant material? Actually, I don't intend to loop from 1 to 1; I was just trying to see if it works. It's supposed to be some len1. I'm trying to solve the following example: there are 3 types of variables (x, w, t); they are not independent and have different distributions. Within each type there is more than one variable. I need to integrate over all the variables. The joint pdf is:

    f(x,w,t) = prod_i [ f1(w_i) f2(t_i) prod_j ( f3(x_j) f4(x_j*w_i - t_i) ) ]

-- View this message in context: http://r.789695.n4.nabble.com/cuhre-usage-multidimensional-integration-tp3873478p3886287.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
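For a joint density of this product form, a deterministic cubature routine can evaluate the integral numerically. A minimal sketch using the cubature package's adaptIntegrate (an alternative interface to Cuhre-style adaptive cubature; f1–f4 below are placeholder densities, not the ones from the original post, and the integration box is an assumed approximation of the support):

```r
library(cubature)  # install.packages("cubature") if needed

# Hypothetical component densities; substitute the real f1..f4.
f1 <- function(w) dnorm(w)
f2 <- function(t) dexp(t)
f3 <- function(x) dunif(x)
f4 <- function(z) dnorm(z, sd = 2)

# Joint pdf over (x1, w1, t1), taken as a single vector argument.
joint <- function(v) {
  x <- v[1]; w <- v[2]; t <- v[3]
  f1(w) * f2(t) * f3(x) * f4(x * w - t)
}

# Integrate over a finite box approximating the support.
adaptIntegrate(joint, lowerLimit = c(0, -5, 0), upperLimit = c(1, 5, 10))
```

With more variables per type, the dimension of v grows accordingly; adaptive cubature degrades in high dimensions, which is where Monte Carlo methods become preferable.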
Re: [R] The Sets package and element equality test
By going through the code, I came to a similar conclusion: it seems the match.2.list function must be modified in the following way to make it work,

    match.2.list <- function(l1, l2) {
        if (class(l1) == "element") {
            l1 <- l1[[1]]
            l2 <- l2[[1]]
        }
        length(l1) == length(l2)
    }

because depending on whether it is called from the constructor or from the %e% function, the assumptions about the input types are not the same.

No, this is not the problem, it's really the hashing code which makes this fail. If you turn hashing off, then the original match.2.list function will work as expected:

    sets_options("hash", FALSE)
    match.2.list <- function(l1, l2) {
        length(l1) == length(l2)
    }
    s <- set(list(1,2), list(3,4))
    lset <- cset(s, matchfun = matchfun(match.2.list))
    lset
    {list(2)}
    list(1,8) %e% lset
    [1] TRUE

And also, if instead of working with standard objects, I want to work with S4 / refClass objects, I cannot build cset objects, because I have to go first with the set construction, which prompts an error:

    Error in as.vector(x, "character") :
      cannot coerce type 'environment' to vector of type 'character'

Currently, S4 objects are not supported at all as set elements. I will have a look at this. Best, David

So far, I don't know how to work around this latter issue. Thanks again for the package and your help. Regards, Johnny

On Sat, Oct 8, 2011 at 2:40 PM, David Meyer mey...@technikum-wien.at wrote:

Dear Johnny, this is a bug in the hashing code of sets. Use:

    sets_options("hash", FALSE)
    lset <- cset(s, matchfun = matchfun(match.2.list))

which will work. Thanks for pointing this out! David

Hi, I tried to use the sets package yesterday, which seems to be a very powerful package: thanks to the developers. I wanted to define sets of lists where 2 lists are considered equal if they have the same length.
So, I implemented:

    match.2.list <- function(l1, l2) {
        length(l1) == length(l2)
    }

and then defined my cset as:

    s <- set(list(1,2), list(3,4))
    lset <- cset(s, matchfun(match.2.list))

so if I now do:

    y <- list(3,4)
    y %e% lset

I get the correct answer, which is TRUE. But if I do:

    x <- list(1,8)
    x %e% lset

I now get FALSE, even though x is a list of length 2, and should thus match any of the 2 lists in lset. I must be doing something wrong; I checked with the doc, but I don't understand.

--
Priv.-Doz. Dr. David Meyer
Institut für Wirtschaftsinformatik
Fachhochschule Technikum Wien
Höchstädtplatz 5, 1200 Wien
T: +43 1 333 40 77-394
F: +43 1 333 40 77-99 394
E: david.me...@technikum-wien.at
I: www.technikum-wien.at
[R] HWEBayes, swapping the homozygotes genotype frequencies
I evaluated the Bayes factor in the k=2 allele case with a triangular prior under the null, as in the example in the help file:

    HWETriangBF2(nvec=c(88,10,2))
    [1] 0.4580336

When I swap the n11 and n22 entries of nvec, I get a totally different Bayes factor:

    HWETriangBF2(nvec=c(2,10,88))
    [1] 5.710153

In my understanding, labeling the genotype frequencies as n11 or n22 is arbitrary, so I was expecting the same Bayes factor. This is the case for the conjugate Dirichlet prior:

    DirichNormHWE(nvec=c(88,10,2), c(1,1)) / DirichNormSat(nvec=c(88,10,2), c(1,1,1))
    [1] 1.542047
    DirichNormHWE(nvec=c(2,10,88), c(1,1)) / DirichNormSat(nvec=c(2,10,88), c(1,1,1))
    [1] 1.542047

Could you explain why HWETriangBF2 is returning completely different values of the Bayes factor?
Re: [R] round() and negative digits
On 09/10/11 00:18, Duncan Murdoch wrote:
On 11-10-07 5:26 PM, Carl Witthoft wrote:

Just wondering here -- I tested and found to my delight that

    round(325.4, -2)
    [1] 300

gave me exactly what I would have expected (and wanted). Since it's not explicitly mentioned in the documentation that negative 'digits' is allowed, I just wanted to ask whether this behavior is intentional or a happy turn of events. I'm always paranoid that something not explicitly documented might disappear in future revisions.

It is intentional, and one of the regression tests confirms that it's there, so it won't disappear by mistake, and would be very unlikely to disappear intentionally.

Uh, wouldn't it be *nice* to mention this --- not completely obvious --- capability in the help file?

cheers, Rolf
Re: [R] round() and negative digits
On 09/10/11 10:39, Duncan Murdoch wrote:
On 11-10-08 5:32 PM, Rolf Turner wrote:
[earlier exchange about round(325.4, -2) and negative 'digits' snipped]

Uh, wouldn't it be *nice* to mention this --- not completely obvious --- capability in the help file?

If we told you all of R's secrets, we'd have to kill you.

Fortune nomination?

cheers, Rolf
[R] help with using last observation carried forward analysis for a clinical trial please
Hi, I have a series of ids with multiple visits and questionnaire scores. This is a clinical trial that will be analyzed using the last observation carried forward (LOCF) method. In other words, in order to comply with intent-to-treat analysis when many subjects withdraw, data points for the last visit must be generated and filled in with the last observation. The ultimate goal is to tabulate the difference in questionnaire scores between the start and the end of the trial. I am lost about how to do this. Each subject had multiple visits, up to 13. In general, they took a questionnaire at each visit. However, if a questionnaire was not completed or the visit is missing, the data point does not exist. To explain, I created a table as an analogy. My goal is to take something that looks like the following:

    ID Visit score
     1     1    10
     2     1    12
     2     3    15
     3     1     1
     3     2     6
     4     1    16
     4     2     1
     4     3     7
     4     4    17

I think I then need to change it to this in order to perform LOCF in zoo:

    ID Visit score
     1     1    10
     1     2    NA
     1     3    NA
     1     4    NA
     2     1    12
     2     2    NA
     2     3    15
     2     4    NA
     3     1     1
     3     2     6
     3     3    NA
     3     4    NA
     4     1    16
     4     2     1
     4     3     7
     4     4    17

and then to:

    ID Visit score
     1     1    10
     1     2    10
     1     3    10
     1     4    10
     2     1    12
     2     2    12
     2     3    15
     2     4    15
     3     1     1
     3     2     6
     3     3     6
     3     4     6
     4     1    16
     4     2     1
     4     3     7
     4     4    17

I would then like to subtract visit 1 from visit 4 to create the difference in the questionnaire scores during the clinical trial. I will then compare this score by t.test between the placebo and drug groups. Would anyone please have some guidance about how to do this in R? I would be grateful for assistance. Regards, Matt
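The three steps described above (expand to all ID x Visit combinations, fill forward, difference) can be sketched with base R plus zoo's na.locf. This is a sketch, not from the thread; na.rm = FALSE keeps any leading NAs so the filled vector stays the same length within each ID:

```r
library(zoo)

dat <- data.frame(ID    = c(1, 2, 2, 3, 3, 4, 4, 4, 4),
                  Visit = c(1, 1, 3, 1, 2, 1, 2, 3, 4),
                  score = c(10, 12, 15, 1, 6, 16, 1, 7, 17))

# Step 1: expand to all ID x Visit combinations; missing scores become NA.
full <- merge(expand.grid(ID = 1:4, Visit = 1:4), dat, all.x = TRUE)

# Step 2: carry the last observation forward within each ID.
full <- full[order(full$ID, full$Visit), ]
full$score <- ave(full$score, full$ID,
                  FUN = function(s) na.locf(s, na.rm = FALSE))

# Step 3: visit-4 minus visit-1 difference per subject.
diffs <- with(full, score[Visit == 4] - score[Visit == 1])
diffs  # 0 3 5 1 -- one difference per ID, ready for t.test() on group labels
```

Grouping the fill by ID matters: a plain na.locf over the whole column would carry one subject's last score into the next subject whenever that subject's first visit was missing.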
[R] R-help transform data
Re: [R] Permutation or Bootstrap to obtain p-value for one sample
On Oct 8, 2011, at 16:04 , francy wrote:

Hi, I am having trouble understanding how to approach a simulation: I have a sample of n=250 from a population of N=2,000 individuals, and I would like to use either a permutation test or the bootstrap to test whether this particular sample is significantly different from the values of any other random samples of the same population. I thought I needed to take random samples (but I am not sure how many simulations I need to do) of n=250 from the N=2,000 population and maybe do a one-sample t-test to compare the mean score of all the simulated samples, plus the one sample I am trying to prove is different from any others, to the mean value of the population. But I don't know: (1) whether this one-sample t-test would be the right way to do it, and how to go about doing it in R; (2) whether permutation-test or bootstrap methods are more appropriate. The data frame to be sampled is df, with columns x and y and 2,000 rows; sample1 is the 250-row sample from df that I would like to test for extreme values of y. For now I only have this:

    R <- 999              # number of simulations, but I don't know how many...
    t.values <- numeric(R)  # holds the result of each simulation
    for (i in 1:R) {
        sample1 <- df[sample(nrow(df), 250, replace=TRUE), ]

But I don't know how to continue the loop: do I calculate the mean for each simulation and compare it to the population mean? Any help you could give me would be very appreciated, Thank you.

The straightforward way would be a permutation test, something like this:

    msamp <- mean(sample1$y)
    mpop  <- mean(df$y)
    msim  <- replicate(10000, mean(sample(df$y, 250)))
    sum(abs(msim - mpop) >= abs(msamp - mpop)) / 10000

I don't really see a reason to do bootstrapping here.
You say you want to test whether your sample could be a random sample from the population, so just simulate that sampling (which should be without replacement, like your sample is). Bootstrapping might come in if you want a confidence interval for the mean difference between your sample and the rest. Instead of sampling means, you could put a full-blown t-test inside the replicate expression, like:

    psim <- replicate(10000, {s <- sample(1:2000, 250); t.test(df$y[s], df$y[-s])$p.value})

and then check whether the p value for your sample is small compared to the distribution of values in psim. That'll take quite a bit longer, though; t.test() is a more complex beast than mean(). It is not obvious that it has any benefits either, unless you specifically wanted to investigate the behavior of the t test. (All code untested. Caveat emptor.)

--
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] round() and negative digits
On 09-Oct-11 00:46:58, Carl Witthoft wrote:
On 10/8/11 6:11 PM, (Ted Harding) wrote:

Carl Witthoft's serendipitous discovery is a nice example of how secrets can be guessed by wondering "what if ...?". So probably you don't need to tell the secrets. Taking the negative digits to their logical extreme:

    round(654.321,  2) # [1] 654.32
    round(654.321,  1) # [1] 654.3
    round(654.321,  0) # [1] 654
    round(654.321, -1) # [1] 650
    round(654.321, -2) # [1] 700
    round(654.321, -3) # [1] 1000
    round(654.321, -4) # [1] 0

which is what you'd logically expect (but is it what you would intuitively expect?).

Oh, oh, somebody's going all metaphysical on us.

Nor should one forget the rounding rules (not OS-dependent in this case, I think ...?):

    round(5000,  -4) # [1] 0
    round(15000, -4) # [1] 20000

Ted.
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 09-Oct-11 Time: 08:59:58
-- XFMail --
[R] sapply(pred,cor,y=resp)
Hello. I am wondering why I am getting NA for all in cors <- sapply(pred, cor, y=resp). I suppose that each column in pred has NAs in it. Is there some way to fix this? Thanks

    str(pred)
    'data.frame': 200 obs. of 13 variables:
     $ mnO2: num 9.8 8 11.4 4.8 9 13.1 10.3 10.6 3.4 9.9 ...
     $ Cl  : num 60.8 57.8 40 77.4 55.4 ...
     $ NO3 : num 6.24 1.29 5.33 2.3 10.42 ...
     $ NH4 : num 578 370 346.7 98.2 233.7 ...
     $ oPO4: num 105 428.8 125.7 61.2 58.2 ...
     $ PO4 : num 170 558.8 187.1 138.7 97.6 ...
     $ Chla: num 50 1.3 15.6 1.4 10.5 ...
     $ a1  : num 0 1.4 3.3 3.1 9.2 15.1 2.4 18.2 25.4 17 ...
     $ a2  : num 0 7.6 53.6 41 2.9 14.6 1.2 1.6 5.4 0 ...
     $ a3  : num 0 4.8 1.9 18.9 7.5 1.4 3.2 0 2.5 0 ...
     $ a4  : num 0 1.9 0 0 0 0 3.9 0 0 2.9 ...
     $ a5  : num 34.2 6.7 0 1.4 7.5 22.5 5.8 5.5 0 0 ...
     $ a6  : num 8.3 0 0 0 4.1 12.6 6.8 8.7 0 0 ...

    str(y=resp)
    Error in length(object) : 'object' is missing

    str(resp)
     num [1:200] 8 8.35 8.1 8.07 8.06 8.25 8.15 8.05 8.7 7.93 ...

    cors <- sapply(pred, cor, y=resp)
    cors
    mnO2 Cl NO3 NH4 oPO4 PO4 Chla a1 a2 a3 a4 a5 a6
      NA NA  NA  NA   NA  NA   NA NA NA NA NA NA NA
Re: [R] sapply(pred,cor,y=resp)
Hi, It is probably more confusing with several steps combined, but you are correct that it is because there are NAs. It is fairly common for R functions to return NA if there are any NA values unless you explicitly set an argument for what to do with missing values. A quick look at ?cor clearly shows that there is such an argument with several options. Try adding use = "pairwise.complete.obs" to your sapply call. Hope that helps, Josh

On Sun, Oct 9, 2011 at 12:47 AM, William Claster dmfall2...@yahoo.com wrote:
[original question quoted in full; snipped]

In the future, please post in plain text, not HTML (as the posting guide requests).

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
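Josh's suggestion spelled out as a self-contained sketch (toy data standing in for pred and resp; the extra argument is forwarded through sapply's ... to cor):

```r
# Toy data with scattered NAs, standing in for pred / resp.
pred <- data.frame(a1 = c(1, 2, NA, 4, 5),
                   a2 = c(2, NA, 6, 8, 10))
resp <- c(1.1, 2.0, 2.9, 4.2, 5.1)

sapply(pred, cor, y = resp)
# a1 a2
# NA NA   -- any NA in a column poisons the default computation

sapply(pred, cor, y = resp, use = "pairwise.complete.obs")
# finite correlations, computed on the complete pairs of each column
```

"pairwise.complete.obs" drops the NA rows separately for each column, so each correlation may be based on a different subset of observations; "complete.obs" would instead drop every row with any NA before computing.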
Re: [R] help with using last observation carried forward analysis for a clinical trial please
On Oct 8, 2011, at 8:43 PM, maspitze wrote:
[question about LOCF for questionnaire scores quoted; snipped]

    require(zoo)
    dat2 <- merge(data.frame(ID = rep(1:4, each = 4), Visit = rep(1:4, 4)),
                  dat, all.x = TRUE)
    dat2$Vscr.fill <- na.locf(dat2$score)
    dat2
    lapply(split(dat2, dat2$ID), function(x) x$Vscr.fill[4] - x$Vscr.fill[1])
    # $`1`
    # [1] 0
    #
    # $`2`
    # [1] 3
    #
    # $`3`
    # [1] 5
    #
    # $`4`
    # [1] 1

-- David

David Winsemius, MD
West Hartford, CT
Re: [R] round() and negative digits
On 11-10-09 4:00 AM, (Ted Harding) wrote:
[examples of round() with negative digits snipped]

Nor should one forget the rounding rules (not OS-dependent in this case, I think ...?):

    round(5000,  -4) # [1] 0
    round(15000, -4) # [1] 20000

The intention is that those are not OS-dependent, but since they rely on exact representations, there could be differences: not all platforms support the extended-real 80-bit intermediate representations. (If you were rounding to 0 d.p., they should all agree on a round-to-even rule. Rounding to -4 d.p. involves dividing by 10^4, and that could lead to errors.)

Duncan Murdoch
Re: [R] round() and negative digits
On Sat, 8 Oct 2011, Duncan Murdoch wrote:
[question about whether negative 'digits' in round() is intentional, and Duncan's confirmation that it is covered by a regression test, snipped]

It needs careful documentation though (as do the corner cases of signif). Things like

    round(325.4, -3)
    [1] 0
    signif(325.4, -3)
    [1] 300
    signif(325.4, 0)
    [1] 300

may not be what you expect. Sometimes it is better not to document things than to try to give precise details which may get changed *and* there will be useRs who misread (and maybe even file bug reports on their misreadings). The source is the ultimate documentation.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595
Re: [R] Generalized Additive Models: How to create publication-ready regression tables
You may build your customized matrix by merging the components of the objects before calling the xtable function:

    my.matrix <- rbind(model$coefficients, [vector containing errors])
    xtable(my.matrix)

(I'm sorry, I don't know exactly where the standard errors are stored / how to compute them.) You can paste parentheses before and after the numbers with the paste function. Best, Emilio

2011/10/8 davidyeager dyea...@gmail.com

Thanks! Yes, that produces tables that are formatted in the same way as the gam output. I'm hoping to have publication-ready tables that have standard errors in parentheses listed below the coefficients. Do you know of a method to do that? David
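A sketch of the parentheses-below-coefficients layout Emilio describes, assuming an mgcv::gam fit whose parametric estimates and standard errors sit in summary()$p.table (adjust to wherever your model actually stores them; the data here are made up):

```r
library(mgcv)
library(xtable)

set.seed(1)
d   <- data.frame(x = runif(100), z = runif(100))
d$y <- 2 * d$x + sin(2 * pi * d$z) + rnorm(100, sd = 0.3)
fit <- gam(y ~ x + s(z), data = d)

pt  <- summary(fit)$p.table  # parametric terms: Estimate, Std. Error, ...

# Row 1: coefficients; row 2: standard errors wrapped in parentheses.
tab <- matrix(c(sprintf("%.3f",   pt[, "Estimate"]),
                sprintf("(%.3f)", pt[, "Std. Error"])),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("coef", "se"), rownames(pt)))
xtable(tab)  # LaTeX table with SEs in parentheses under each coefficient
```

Smooth terms have no single coefficient, so publication tables typically report them separately (e.g. the EDF and F statistic from summary(fit)$s.table).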
[R] help with statistics in R - how to measure the effect of users in groups
Hi, I'm a newbie to R. My knowledge of statistics is mostly self-taught. My problem is how to measure the effect of users in groups. I can calculate a particular attribute for a user in a group. But my hypothesis is that the users' attribute values are not independent of each other and that a user's attribute depends on the group, i.e. that a user's behaviour changes based on the group. Let me give an example:

    users  Group 1  Group 2  Group 3
    u1     10       5        n/a
    u2     6        n/a      4
    u3     5        2        3

For example, I want to be able to show that u1's behaviour is different in Group 1 than in the other groups, and that the particular thing about Group 1 is that users in Group 1 tend to have a higher value of the attribute under measurement. Hence, can I use R to test my hypothesis? I'm willing to learn, so if this is very simple, just point me in the direction of any online resources about it. At the moment I don't even know how to define this class of problems; that would be a start. Regards, Gawesh
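One common framing for this class of problems (a suggestion, not from the original post) is a repeated-measures / mixed-effects model: the attribute gets a fixed effect for group and a random effect for user, which directly encodes the idea that observations from the same user are not independent. A sketch with lme4, using made-up long-format data shaped like the table above:

```r
library(lme4)

# Hypothetical long-format data: one row per observed (user, group) cell.
d <- data.frame(user  = c("u1","u1","u2","u2","u3","u3","u3"),
                group = c("g1","g2","g1","g3","g1","g2","g3"),
                attr  = c(10, 5, 6, 4, 5, 2, 3))

# Fixed effect of group, random intercept per user.
fit <- lmer(attr ~ group + (1 | user), data = d)
summary(fit)  # group coefficients indicate whether e.g. g1 runs higher
```

With this few observations the fit is only illustrative; searching for "linear mixed models" or "multilevel models" will turn up the standard references for this class of problems.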
[R] Finding solution
Dear all, I have a system of simultaneous equations with 2 unknowns as follows:

    x*y + (1-x) = 0.05
    x*(y - .5)^2 + (1-x)*0.6 = 0.56^2

Of course I can do it manually, however I am wondering whether there is any direct way available in R to get the solution of this system? Thanks and regards,
Re: [R] Finding solution
On Oct 9, 2011, at 14:02 , Bogaso Christofer wrote:
[system of two equations in x and y quoted; snipped]

Not really (can't vouch for all 3000+ packages, though...). You can sometimes get away with converting to a minimization problem:

    f <- function(xy, x=xy[1], y=xy[2])
        (x*y + (1-x) - 0.05)^2 + (x*(y - .5)^2 + (1-x)*0.6 - 0.56^2)^2
    optim(par=c(0,0), f, method="BFGS")
    $par
    [1] 0.91674574 -0.03627402

    $value
    [1] 5.351777e-13

    $counts
    function gradient
          39       13

    $convergence
    [1] 0

    $message
    NULL

Notice that there is some risk of falling into a local minimum which has nothing to do with the solution. Always check that the minimum actually comes out as zero.

--
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
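A direct root-finding alternative (a sketch; the rootSolve package is not mentioned in the thread) is multiroot(), which solves the system without squaring the residuals, avoiding the flattened minimum that squaring creates:

```r
library(rootSolve)  # install.packages("rootSolve") if needed

# The system written as residuals that should all be zero at a solution.
model <- function(v) {
  x <- v[1]; y <- v[2]
  c(x*y + (1 - x) - 0.05,
    x*(y - 0.5)^2 + (1 - x)*0.6 - 0.56^2)
}

sol <- multiroot(f = model, start = c(0.5, 0))
sol$root         # the (x, y) solution
model(sol$root)  # residuals; should be essentially zero
```

Like optim(), the Newton iteration underneath multiroot() is sensitive to the starting point, so trying a few starts is cheap insurance against landing on a different root.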
Re: [R] Permutation or Bootstrap to obtain p-value for one sample
On Oct 9, 2011, at 12:00 , francesca casalino wrote:

Thank you very much to both Ken and Peter for the very helpful explanations. Just to understand this better (sorry for repeating, but I am also new in statistics… so please correct me where I am wrong):

Ken's method: random sampling of the mean, then using these means to construct a distribution of means (the 'null' distribution); I can then use this normal distribution and compare the population mean to my mean using, for example, a z-score. Of note: the initial distributions are not normal, so I thought I needed to base my calculations on the median, but I can use the mean to construct a normal distribution. This would be called a bootstrap test.

Peter's method: random sampling of the mean, then comparing each sampled mean with the population mean to see if it is higher than or equal to the difference between my sample and the population mean. This is a permutation test, but to actually get a CI and a p-value I would need the bootstrap method.

Did I understand this correctly? I tried to start with Ken's approach for now, and followed his steps, but: (1) I get a lot of NaN in the sampling distribution, is this normal? (2) I think I am again doing something wrong when I try to find a p-value… This is what I did:

    nreps <- 10000
    mean.dist <- rep(NA, nreps)
    for (replication in 1:nreps) {
        # Peter mentioned that this sampling should be without
        # replacement, so I went for that
        my.sample <- sample(population$Y, 250, replace=F)
        mean.for.rep <- mean(my.sample)          # mean for this replication
        mean.dist[replication] <- mean.for.rep   # store the mean
    }
    hist(mean.dist, main="Null Dist of Means", col="chartreuse")  # show the means in a nifty color
    mean_dist <- mean(mean.dist, na.rm=TRUE)
    sd_pop <- sd(mean.dist, na.rm=TRUE)
    mean_sample <- mean(population$Y, na.rm=TRUE)

You mean sample$Y, I hope.

    z_stat <- (mean_sample - mean_dist) / (sd_pop / sqrt(2089))
    p_value <- 2 * pnorm(-abs(z_stat))

Is this correct?

No.
First, notice that replicate() does all the red tape of the for loop for you. It's not a grave error to do what you're doing, but I did give you code, so maybe you should try it... Worse: the whole point of doing a permutation test is to compare the observed value directly to the actual permutation distribution. So you take your observed mean and see whether it falls in the middle or in the tails of the histogram. The p value can be estimated as the relative frequency of simulated means falling as far or further out in the tails as the observed one. You do not want to approximate the simulated distribution with a normal distribution. If that's what you wanted, you could do it with no simulation at all -- there are simple formulas relating the mean and variance of the sample mean to the mean and variance of the population, and you'd more or less wind up reinventing the two-sample t-test.

THANK YOU SO MUCH FOR ALL YOUR HELP!!

2011/10/9 Ken Hutchison vicvoncas...@gmail.com

Hi Francy, A bootstrap test would likely be sufficient for this problem, but a one-sample t-test isn't advisable or necessary in my opinion. If you use a t-test multiple times you are making assumptions about the distribution of your data; more importantly, your probability of Type I error will be increased with each test. So, a valid thing to do would be to sample (computation for this problem won't be expensive, so do a lot of reps) and compare your mean to the null distribution of means. I.e.:

    nreps <- 10000
    mean.dist <- rep(NA, nreps)
    for (replication in 1:nreps) {
        my.sample <- sample(population, 2500, replace=T)  # replace could be FALSE, depends on preference
        mean.for.rep <- mean(my.sample)          # mean for this replication
        mean.dist[replication] <- mean.for.rep   # store the mean
    }
    hist(mean.dist, main="Null Dist of Means", col="chartreuse")  # show the means in a nifty color

You can then perform various tests given the null distribution, or infer from where your sample mean lies within the distribution, or something to that effect.
Also, if the distribution is normal, which is somewhat likely since it is a distribution of means (shapiro.test, or ad.test from the nortest package, will let you know), you should be able to make inferences from it using parametric methods (once), which will fit the truth a bit better than a t-test. Hope that's helpful, Ken Hutchison

On Sat, Oct 8, 2011 at 10:04 AM, francy francy.casal...@gmail.com wrote:

Hi, I am having trouble understanding how to approach a simulation: I have a sample of n=250 from a population of N=2,000 individuals, and I would like to use either a permutation test or the bootstrap to test whether this particular sample is significantly different from the values of any other random samples of the same population. I thought I needed to take random samples (but I am not sure how many simulations I need to do) of n=250 from the N=2,000
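The tail-frequency p-value that Peter describes can be sketched in a few lines. This is a minimal illustration with made-up data (the real population and sample are not shown in the thread, so rexp() stands in for them):

```r
# Minimal sketch of the permutation-style p-value: the relative frequency of
# simulated means at least as far from the center as the observed one.
# The population and the observed sample below are made up for illustration.
set.seed(42)
population <- rexp(2000)                      # some non-normal population
observed   <- mean(sample(population, 250))   # the sample under test
null.means <- replicate(9999, mean(sample(population, 250)))
center  <- mean(null.means)
p.value <- mean(abs(null.means - center) >= abs(observed - center))
p.value   # two-sided tail frequency; no normal approximation needed
```

Note that replicate() replaces the bookkeeping of the for loop, as suggested earlier in the thread.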
[R] strucchange Nyblom-Hansen Test?
I want to apply the Nyblom-Hansen test with the strucchange package, but I don't know the correct way and what the difference is between the following two approaches (leading to different results):

data(longley)

# 1. Approach:
sctest(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley, type = "Nyblom-Hansen")
# results in:
# Score-based CUSUM test with mean L2 norm
#
# data: Employed ~ Year + GNP.deflator + GNP + Armed.Forces
# f(efp) = 0.8916, p-value = 0.4395

# 2. Approach:
sctest(gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley), functional = meanL2BB)
# results in:
# M-fluctuation test
#
# data: gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley)
# f(efp) = 0.8165, p-value = 0.3924

I could not find any examples or further remarks on the first approach with sctest(..., type = "Nyblom-Hansen"). Maybe the first approach is, unlike the second, not a joint test for all coefficients? Thank you in advance for your help!

-- View this message in context: http://r.789695.n4.nabble.com/strucchange-Nyblom-Hansen-Test-tp3887208p3887208.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Permutation or Bootstrap to obtain p-value for one sample
Thank you very much to both Ken and Peter for the very helpful explanations. Just to understand this better (sorry for repeating, but I am also new to statistics, so please correct me where I am wrong):

Ken's method: Random sampling of the mean, and then using these means to construct a distribution of means (the 'null' distribution); I can then use this normal distribution and compare the population mean to my mean using, for example, a z-score. Of note: the initial distributions are not normal, so I thought I needed to base my calculations on the median, but I can use the mean to construct a normal distribution. This would be called a bootstrap test.

Peter's method: Random sampling of the mean, and then comparing each sampled mean with the population mean to see if the difference is as large as or larger than the difference between my sample and the population mean. This is a permutation test, but to actually get a CI and a p-value I would need the bootstrap method. Did I understand this correctly?

I tried to start with Ken's approach for now, and followed his steps, but: 1) I get a lot of NaN in the sampling distribution, is this normal? 2) I think I am again doing something wrong when I try to find a p-value. This is what I did:

nreps=1
mean.dist=rep(NA,nreps)
for(replication in 1:nreps) {
  my.sample=sample(population$Y, 250, replace=F) # Peter mentioned that this sampling should be without replacement, so I went for that
  mean.for.rep=mean(my.sample) # mean for this replication
  mean.dist[replication]=mean.for.rep # store the mean
}
hist(mean.dist, main="Null Dist of Means", col="chartreuse") # show the means in a nifty color
mean_dist= mean(mean.dist, na.rm=TRUE)
sd_pop= sd(mean.dist, na.rm=TRUE)
mean_sample= mean(population$Y, na.rm=TRUE)
z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
p_value= 2*pnorm(-abs(z_stat))

Is this correct? THANK YOU SO MUCH FOR ALL YOUR HELP!!
2011/10/9 Ken Hutchison vicvoncas...@gmail.com

Hi Francy, A bootstrap test would likely be sufficient for this problem, but a one-sample t-test isn't advisable or necessary in my opinion. If you use a t-test multiple times you are making assumptions about the distribution of your data; more importantly, your probability of Type 1 error will be increased with each test. So, a valid thing to do would be to sample (computation for this problem won't be expensive, so do a lot of reps) and compare your mean to the null distribution of means. I.e.:

nreps=1
mean.dist=rep(NA,nreps)
for(replication in 1:nreps) {
  my.sample=sample(population, 2500, replace=T) # replace could be false, depends on preference
  mean.for.rep=mean(my.sample) # mean for this replication
  mean.dist[replication]=mean.for.rep # store the mean
}
hist(mean.dist, main="Null Dist of Means", col="chartreuse") # show the means in a nifty color

You can then perform various tests given the null distribution, or infer from where your sample mean lies within the distribution, or something to that effect. Also, if the distribution is normal, which is somewhat likely since it is a distribution of means (shapiro.test, or ad.test from the nortest package, will let you know), you should be able to make inferences from it using parametric methods (once), which will fit the truth a bit better than a t-test. Hope that's helpful, Ken Hutchison

On Sat, Oct 8, 2011 at 10:04 AM, francy francy.casal...@gmail.com wrote:

Hi, I am having trouble understanding how to approach a simulation: I have a sample of n=250 from a population of N=2,000 individuals, and I would like to use either a permutation test or the bootstrap to test whether this particular sample is significantly different from the values of any other random samples of the same population.
I thought I needed to take random samples (but I am not sure how many simulations I need to do) of n=250 from the N=2,000 population and maybe do a one-sample t-test to compare the mean score of all the simulated samples, plus the one sample I am trying to prove is different from any other, to the mean value of the population. But I don't know: (1) whether this one-sample t-test would be the right way to do it, and how to go about doing this in R; (2) whether permutation test or bootstrap methods are more appropriate. This is the data frame that I have, which is to be sampled:

df <- ... i.e.
x y
1 2
3 4
5 6
7 8
. .
. .
. .
2,000

I have this sample from df, and would like to test whether it has extreme values of y.

sample1 <- ... i.e.
x y
3 4
7 8
. .
. .
. .
250

For now I only have this:

R=999 # number of simulations, but I don't know how many...
t.values=numeric(R) # creates a numeric vector with 999 elements, which will hold the results of each simulation
for (i in 1:R) {
  sample1 <- df[sample(nrow(df), 250, replace=TRUE),]

But I don't know how to continue the loop: do I calculate the mean for each simulation and compare it to the population mean? Any help you could
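One way to finish the loop from the question is sketched below. This is a hedged sketch with made-up data (the real df and sample1 are not shown): each replication stores a simulated mean, and the p-value is then the fraction of simulated means at least as extreme as the observed sample mean.

```r
# Hedged sketch completing the simulation loop with stand-in data:
# store each simulated mean, then estimate a two-sided p-value as the
# fraction of simulated means at least as far from the population mean
# as the observed sample mean is.
set.seed(1)
df <- data.frame(x = 1:2000, y = rnorm(2000))        # stand-in for the real df
observed.mean <- mean(df$y[sample(nrow(df), 250)])   # stand-in for mean of sample1$y
R <- 999
sim.means <- numeric(R)
for (i in 1:R) {
  sim <- df[sample(nrow(df), 250, replace = TRUE), ]
  sim.means[i] <- mean(sim$y)
}
pop.mean <- mean(df$y)
p.value <- mean(abs(sim.means - pop.mean) >= abs(observed.mean - pop.mean))
p.value
```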
Re: [R] Finding solution
R is not the right tool for all things. This looks like a job for a computer algebra system. That said, R **does** have at least one interface to such a system. See the Ryacas package (check my capitalization, which may be wrong). HelpeRs may provide you with others. -- Bert

On Sun, Oct 9, 2011 at 5:02 AM, Bogaso Christofer bogaso.christo...@gmail.com wrote:

Dear all, I have a system of simultaneous equations with 2 unknowns as follows:

x*y + (1-x) = 0.05
x*(y - 0.5)^2 + (1-x)*0.6 = 0.56^2

Of course I can do it manually; however, I am wondering whether there is any direct way in R to get the solution of this system? Thanks and regards,
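If a numerical (rather than symbolic) answer is enough, base R can already handle this. A minimal sketch using optim() to minimize the sum of squared residuals of the two equations (the starting point is arbitrary, and a second optim() call polishes the result):

```r
# Hedged sketch: solve the two-equation system numerically by minimizing
# the sum of squared residuals with base R's optim() (no extra packages).
f <- function(p) {
  x <- p[1]; y <- p[2]
  r1 <- x * y + (1 - x) - 0.05
  r2 <- x * (y - 0.5)^2 + (1 - x) * 0.6 - 0.56^2
  r1^2 + r2^2
}
sol <- optim(c(0.5, 0.5), f)   # arbitrary starting point
sol <- optim(sol$par, f)       # restart once to polish
sol$par      # approximate (x, y) solution
f(sol$par)   # near zero if a root was found
```

Packages such as nleqslv offer dedicated nonlinear-equation solvers if more robustness is needed.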
[R] pdIdent in smoothing regression model
Hi there, I am reading the 2004 paper "Smoothing with mixed model software" in the Journal of Statistical Software, by Ngo and Wand. I tried to run their first example in Section 2.1 using R, but I had some problems. Here is the code:

library(nlme)
fossil <- read.table("fossil.dat", header=TRUE)
x <- fossil$age
y <- 10*fossil$strontium.ratio
knots <- seq(94, 121, length=25)
n <- length(x)
X <- cbind(rep(1,n), x)
Z <- outer(x, knots, "-")
Z <- Z*(Z>0)
fit <- lme(y ~ -1+X, random=pdIdent(~ -1+Z))

When I ran the line fit <- lme(y ~ -1+X, random=pdIdent(~ -1+Z)) I got an error message:

Error in getGroups.data.frame(dataMix, groups) : Invalid formula for groups

I was really puzzled. I asked Dr. Ngo and he said they did it in S-PLUS, not R. Does anyone know how to do it in R? Thanks!

Lei Liu Associate Professor Division of Biostatistics Department of Public Health Sciences University of Virginia School of Medicine http://people.virginia.edu/~ll9f/
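One commonly suggested workaround (a hedged sketch, not from the paper itself: R's lme(), unlike the S-PLUS call, wants an explicit grouping structure, so a constant dummy group putting all observations in one level is added). Since fossil.dat is not available here, simulated data of the same shape stands in:

```r
# Hedged sketch of the dummy-group workaround for lme() + pdIdent().
# fossil.dat is not available, so smooth simulated data stands in;
# the basis construction mirrors the code from the question.
library(nlme)
set.seed(1)
n <- 100
x <- sort(runif(n, 94, 121))
y <- sin(x / 5) + rnorm(n, sd = 0.2)
knots <- seq(94, 121, length = 25)
X <- cbind(rep(1, n), x)
Z <- outer(x, knots, "-")
Z <- Z * (Z > 0)
group <- rep(1, n)  # single dummy group: all rows in one level
fit <- lme(y ~ -1 + X, random = list(group = pdIdent(~ -1 + Z)))
```

The random = list(group = pdIdent(...)) form supplies the grouping that getGroups() was complaining about.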
Re: [R] Permutation or Bootstrap to obtain p-value for one sample
I may be speaking out of turn here, but I would prefer not to see R-help turn into a tutorial site for basic statistics. Such sites already exist (e.g. http://stats.stackexchange.com/). I realize that there is occasionally reason to venture down this path a way within legitimate R contexts, but this seems to me to have gone way beyond. Just my opinion, of course. -- Bert

On Sun, Oct 9, 2011 at 3:00 AM, francesca casalino francy.casal...@gmail.com wrote:

Thank you very much to both Ken and Peter for the very helpful explanations. Just to understand this better (sorry for repeating, but I am also new to statistics, so please correct me where I am wrong): Ken's method: Random sampling of the mean, and then using these means to construct a distribution of means (the 'null' distribution); I can then use this normal distribution and compare the population mean to my mean using, for example, a z-score. Of note: the initial distributions are not normal, so I thought I needed to base my calculations on the median, but I can use the mean to construct a normal distribution. This would be called a bootstrap test. Peter's method: Random sampling of the mean, and then comparing each sampled mean with the population mean to see if the difference is as large as or larger than the difference between my sample and the population mean. This is a permutation test, but to actually get a CI and a p-value I would need the bootstrap method. Did I understand this correctly? I tried to start with Ken's approach for now, and followed his steps, but: 1) I get a lot of NaN in the sampling distribution, is this normal?
2) I think I am doing again something wrong when I try to find a p-value. This is what I did:

nreps=1
mean.dist=rep(NA,nreps)
for(replication in 1:nreps) {
  my.sample=sample(population$Y, 250, replace=F) # Peter mentioned that this sampling should be without replacement, so I went for that
  mean.for.rep=mean(my.sample) # mean for this replication
  mean.dist[replication]=mean.for.rep # store the mean
}
hist(mean.dist, main="Null Dist of Means", col="chartreuse") # show the means in a nifty color
mean_dist= mean(mean.dist, na.rm=TRUE)
sd_pop= sd(mean.dist, na.rm=TRUE)
mean_sample= mean(population$Y, na.rm=TRUE)
z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
p_value= 2*pnorm(-abs(z_stat))

Is this correct? THANK YOU SO MUCH FOR ALL YOUR HELP!!

2011/10/9 Ken Hutchison vicvoncas...@gmail.com

Hi Francy, A bootstrap test would likely be sufficient for this problem, but a one-sample t-test isn't advisable or necessary in my opinion. If you use a t-test multiple times you are making assumptions about the distribution of your data; more importantly, your probability of Type 1 error will be increased with each test. So, a valid thing to do would be to sample (computation for this problem won't be expensive so do a lot of reps) and compare your mean to the null distribution of means. I.e.:

nreps=1
mean.dist=rep(NA,nreps)
for(replication in 1:nreps) {
  my.sample=sample(population, 2500, replace=T) # replace could be false, depends on preference
  mean.for.rep=mean(my.sample) # mean for this replication
  mean.dist[replication]=mean.for.rep # store the mean
}
hist(mean.dist, main="Null Dist of Means", col="chartreuse") # show the means in a nifty color

You can then perform various tests given the null distribution, or infer from where your sample mean lies within the distribution or something to that effect.
Also, if the distribution is normal, which is somewhat likely since it is a distribution of means (shapiro.test, or ad.test from the nortest package, will let you know), you should be able to make inference from that using parametric methods (once) which will fit the truth a bit better than a t.test. Hope that's helpful, Ken Hutchison

On Sat, Oct 8, 2011 at 10:04 AM, francy francy.casal...@gmail.com wrote:

Hi, I am having trouble understanding how to approach a simulation: I have a sample of n=250 from a population of N=2,000 individuals, and I would like to use either permutation test or bootstrap to test whether this particular sample is significantly different from the values of any other random samples of the same population. I thought I needed to take random samples (but I am not sure how many simulations I need to do) of n=250 from the N=2,000 population and maybe do a one-sample t-test to compare the mean score of all the simulated samples, plus the one sample I am trying to prove is different from any others, to the mean value of the population. But I don't know: (1) whether this one-sample t-test would be the right way to do it, and how to go about doing this in R; (2) whether a permutation test or bootstrap methods are more appropriate. This is the data frame that I have, which is to be sampled:

df <- ... i.e.
x y
1 2
3 4
5 6
7 8
. .
. .
. .
Re: [R] strucchange Nyblom-Hansen Test?
On Sun, 9 Oct 2011, buehlerman wrote:

I want to apply the Nyblom-Hansen test with the strucchange package, but I don't know the correct way and what the difference is between the following two approaches (leading to different results):

The difference is that sctest(formula, type = "Nyblom-Hansen") applies the Nyblom-Hansen test statistic to a model which assesses both the coefficients _and_ the error variance. The approach via functional = meanL2BB, on the other hand, allows applying the same type of test statistic to the score functions of any model. In your case, where you used the default fit = glm in gefp(), a linear regression model is used where the error variance is _not_ included as a full model parameter but only as a nuisance parameter. Hence the difference. Of course, one may also add another score function for the error variance. In example("DJIA", package = "strucchange") I provide a function normlm() with a corresponding estfun() method. If you load these, you can do:

R> sctest(gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley, fit = normlm), functional = meanL2BB)

M-fluctuation test

data: gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley, fit = normlm)
f(efp) = 0.8916, p-value = 0.4395

which leads to the same output as sctest(formula, type = "Nyblom-Hansen"). Finally, instead of using gefp(..., fit = normlm), you could also have used efp(..., type = "Score-CUSUM"):

R> sctest(efp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley, type = "Score-CUSUM"), functional = meanL2)

Score-based CUSUM test with mean L2 norm

data: efp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley, type = "Score-CUSUM")
f(efp) = 0.8916, p-value = 0.4395

I hope that this clarifies more than it adds to the confusion ;-) The reason for the various approaches is that efp() was always confined to the linear model, and gefp() then extended it to arbitrary estimating-function-based models.
And for the linear model this provides the option of treating the variance as a nuisance parameter or as a full model parameter. Hope that helps, Z
[R] Substract previous element
Dear all, I have a matrix with data and I want to subtract from every value the previous element. Let's assume that my vector (matrix) is

c <- c(1, 2, 3, 4, 5)

I want to get

remove_previous <- c(0, 1, 2, 3, 4)

How can I do that efficiently in R? I would like to thank you in advance for your help. B.R Alex
Re: [R] Generalized Additive Models: How to create publication-ready regression tables
In many cases a flexible parametric fit, using regression splines, will result in a fit that is as good as a gam, with similar regression shapes. The rms package has a latex method that will represent such fits in interpretable algebraic form. latex(fit) does that, and print(fit, latex=TRUE) will give a nice table, though not formatted in the way you described. Frank

Emilio López wrote: You may build your customized matrix by merging the components of the objects before calling the xtable function:

my.matrix <- rbind(model$coefficients, [vector containing errors])
xtable(my.matrix)

(I'm sorry, I don't know exactly where the standard errors are stored / how to compute them.) You can paste parentheses before and after the number with the paste function, before you call the xtable function. Best, Emilio

2011/10/8 davidyeager dyeager@...

Thanks! Yes, that produces tables that are formatted in the same way as the gam output. I'm hoping to have publication-ready tables that have standard errors in parentheses listed below coefficients. Do you know of a method to do that? David
- Frank Harrell Department of Biostatistics, Vanderbilt University
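The "estimate with standard error in parentheses underneath" layout that David asks for can be built by hand from any summary() coefficient matrix and then passed to xtable or similar. A hedged sketch (the model and formatting choices are illustrative only):

```r
# Hedged sketch: build a two-row-per-coefficient table (estimate above,
# standard error in parentheses below), ready for export with xtable.
# The lm() model here is just a stand-in for any fit whose summary()
# exposes a coefficient matrix with "Estimate" and "Std. Error" columns.
fit <- lm(mpg ~ wt + hp, data = mtcars)
cf  <- summary(fit)$coefficients
est <- sprintf("%.3f", cf[, "Estimate"])
se  <- sprintf("(%.3f)", cf[, "Std. Error"])
# interleave estimate and SE rows, label only the estimate rows
tab <- matrix(rbind(est, se), ncol = 1,
              dimnames = list(c(rbind(rownames(cf), "")), "Model 1"))
tab
# xtable::xtable(tab)  # then export to LaTeX, if xtable is installed
```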
Re: [R] axes3d/bbox3d - axis values not fixed
Excellent! Thank you! Ben

On Sat, Oct 8, 2011 at 3:36 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

On 11-10-08 11:04 AM, Ben qant wrote: Thank you! Sorry, I have a couple more questions: 1) How do I turn off the box shading completely? I figured out how to lighten it up to a grey color with col=c(white... . I looked in the package pdf, but I didn't see anything. 2) I read about the zunit in the package pdf, but no matter what I change it to, it doesn't seem to change anything. Here is where I am at right now:

x <- 1:10
y <- 1:10
z <- matrix(outer(x-5, y-5) + rnorm(100), 10, 10)
open3d()
persp3d(x, y, z, col="red", alpha=0.7, aspect=c(1,1,1), xlab='', ylab='', zlab='z', axes=FALSE)
bbox3d(xat=c(5, 6), xlab=c("a", "b"), yat=c(2,4,6), zunit=10, col=c("white","black"))

You need to play with the material properties of the bounding box. I think this gives what you want:

bbox3d(xat=c(5, 6), xlab=c("a", "b"), yat=c(2,4,6), zunit=10, col="black", front="line", back="line", lit=FALSE)

I see different axes for zunit=5, zunit=10, zunit=20. Not sure why you don't. Duncan

Thank you for your help! Ben

On Sat, Oct 8, 2011 at 5:09 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

On 11-10-07 2:32 PM, Ben qant wrote: Hello, I'm using the rgl package and plotting a plot with it. I'd like to have all the axes values auto-hide, but I want to plot a series of characters instead of the values of the measurement for 2 of the axes. So in the end I will have one axis (z actually) behave per normal (auto-hide) and I'd like the other two axes to be custom character vectors that auto-hide.

The axes in rgl are a little messy. It's been on my todo list for a long time to fix them, but there are a lot of details to handle, and only so much time. Essentially there are two separate systems for axes: the bbox3d system and the axis3d system. The former is the one that appears and disappears; the latter is really just lines and text added to the plot.
Example:

x <- 1:10
y <- 1:10
z <- matrix(outer(x-5, y-5) + rnorm(100), 10, 10)
open3d()
persp3d(x, y, z, col="red", alpha=0.7, aspect=c(1,1,0.5), xlab='', ylab='', zlab='', axes=FALSE)

For the above, axes=FALSE is for demonstration purposes only. Now when I call:

axes3d()

...the axis values hide/behave the way I want, but I want my own characters in there instead of the default values.

You want to use the bbox3d() call. For example,

bbox3d(xat=c(5, 10), xlab=c("V", "X"), yat=c(2,4,6), zunit=10)

for three different types of labels on the three axes. It would be nice if axis3d had the same options as bbox3d, but so far it doesn't. Duncan Murdoch

Trying again:

open3d()
persp3d(x, y, z, col="red", alpha=0.7, aspect=c(1,1,0.5), xlab='', ylab='', zlab='', axes=FALSE)
axis3d('x', labels='test')

...puts in a custom character label 'test', but I lose the behavior/hiding. Also, how do I then get the values for the z axis to populate with the default values and auto-hide along with the other two custom string axes? I'm pretty sure I need to use bbox3d(), but I'm not having any luck. I'm new'ish to R and very new to the rgl package. Hopefully that makes sense. Thanks for your help! ben
Re: [R] Substract previous element
Take a look at diff(), though offhand I don't know what it does to matrices. M

On Oct 9, 2011, at 11:00 AM, Alaios ala...@yahoo.com wrote:

Dear all, I have a matrix with data and I want to subtract from every value the previous element. Let's assume that my vector (matrix) is c <- c(1,2,3,4,5). I want to get remove_previous <- c(0,1,2,3,4). How can I do that efficiently in R? I would like to thank you in advance for your help. B.R Alex
Re: [R] Substract previous element
Vectors are not matrices in R, though matrices are special cases of vectors. See ?diff for a solution that works with vectors.

--- Jeff Newmiller jdnew...@dcn.davis.ca.us Sent from my phone. Please excuse my brevity.

Alaios ala...@yahoo.com wrote: Dear all, I have a matrix with data and I want to subtract from every value the previous element. Let's assume that my vector (matrix) is c <- c(1,2,3,4,5). I want to get remove_previous <- c(0,1,2,3,4). How can I do that efficiently in R? I would like to thank you in advance for your help. B.R Alex
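The diff() suggestion can be sketched concretely. Note a subtlety in the question: "subtract the previous element" gives successive differences, but the expected output c(0,1,2,3,4) actually corresponds to subtracting the first element. Both one-liners, with 0 padding to keep the original length:

```r
# Two readings of the question, both in base R:
x <- c(1, 2, 3, 4, 5)

# (a) subtract the previous element (successive differences); diff()
#     drops the first value, so pad with 0 to keep the length
c(0, diff(x))   # 0 1 1 1 1

# (b) subtract the first element, which matches the expected output
#     c(0, 1, 2, 3, 4) given in the question
x - x[1]        # 0 1 2 3 4
```

diff() also accepts a matrix, in which case it differences down the rows, so no explicit loop is needed there either.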
[R] variable name question
Hi All, This is surely an easy question, but somehow I am not able to get it. I am using R 2.13.2 and have a data set where variable names like this appear: pci1990, pci1991, ..., pci2009. pci1990 has data on per capita income for 1990, pci1991 has data on per capita income for 1991, and so on. I would like to create the logarithm of per capita income for each year, and could do so in Stata with the following commands:

forvalues number = 1990/2009 {
  gen lpci`number' = log(pci`number')
}

What would be the corresponding set of commands in R? Thanks a lot in advance. Deepankar
[R] convert apply to lapply
Dear all, I want to convert an apply to lapply. The reason is that there is a function mclapply that uses exactly the same interface as the lapply function. My code looks like this:

mean_power_per_tip <- function(data) {
  return(apply(data, 2, MeanTip))
}

where data is an [m,n] matrix. I would like to thank you in advance for your help. B.R Alex
Re: [R] variable name question
This is surely an easy question but somehow I am not being able to get it.

get() is the key: it takes a string and returns the object with that string as its name. assign() goes the other way. Try this:

for (i in 1990:2009) {
  varName <- paste("pci", i, sep = "")
  assign(paste("l", varName, sep = ""), log(get(varName)))
}

That said, the standard advice is that it's usually more R-ish to keep all your data in a list, data frame or, for this case, a matrix. Michael

On Sun, Oct 9, 2011 at 11:46 AM, Deepankar Basu basu...@gmail.com wrote:

Hi All, This is surely an easy question but somehow I am not able to get it. I am using R 2.13.2 and have a data set where variable names like this appear: pci1990, pci1991, ..., pci2009. pci1990 has data on per capita income for 1990, pci1991 has data on per capita income for 1991, and so on. I would like to create the logarithm of per capita income for each year, and could do so in Stata with:

forvalues number = 1990/2009 {
  gen lpci`number' = log(pci`number')
}

What would be the corresponding set of commands in R? Thanks a lot in advance. Deepankar
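The "keep it in a data frame" advice can be sketched as follows. This is a hedged illustration with two made-up pci columns; the real data would have pci1990 through pci2009:

```r
# Hedged sketch of the data-frame idiom the reply recommends: transform
# all pci* columns at once with lapply(), instead of get()/assign() on
# free-standing variables. The two columns here are made-up examples.
df <- data.frame(pci1990 = c(100, 200), pci1991 = c(110, 210))
log_cols <- lapply(df[paste0("pci", 1990:1991)], log)
names(log_cols) <- paste0("l", names(log_cols))   # lpci1990, lpci1991
df <- cbind(df, log_cols)
str(df)
```

With all years present, paste0("pci", 1990:2009) generates the full column list, mirroring the Stata forvalues loop in one expression.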
[R] fast or space-efficient lookup?
Dear R experts, I am struggling with memory and speed issues. Advice would be appreciated. I have a long data set (of financial stock returns, with stock name and trading day). All three variables, stock return, id and day, are irregular. It is about 1.3GB in object.size (200MB on disk). Now, I need to merge the main data set with some aggregate data (e.g., the SP500 market rate of return, with a day index) from the same day. This market data set is not big (object.size = 300K, 5 columns, 12000 rows). Let's say my (dumb statistical) plan is to run one grand regression, where the individual rate of return is y and the market rate of return is x. The following should work without a problem:

combined <- merge(main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE)
lm(stockreturn ~ marketreturn, data=combined)

Alas, the merge is neither space-efficient nor fast. In fact, I run out of memory on my 16GB linux machine. My guess is that by whittling it down I could make it work (perhaps doing it in chunks, and then rbinding them), but this is painful. In perl, I would define a hash with the day as key and the market return as value, and then loop over the main data set to supplement it. Is there a recommended way of doing such tasks in R, either super-fast (so that I can merge many many times) or space-efficient (so that I merge once and store the results)? sincerely, /iaw Ivo Welch (ivo.we...@gmail.com)
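The perl-hash pattern described here has a direct base-R analogue: match() looks each day up in the small table once, and the market return is copied across by indexing, with no merge() and no second copy of the big data set. A hedged sketch with tiny made-up tables (column names as in the question):

```r
# Hash-style lookup with match(): for each row of the big table, find the
# position of its day in the small aggregate table, then index into the
# market-return column. Adds one column in place; no merge() needed.
# The data frames below are tiny stand-ins for the real 1.3GB data set.
main <- data.frame(day = c(1, 1, 2, 3),
                   stockreturn = c(0.01, -0.02, 0.03, 0.00))
aggregate.data <- data.frame(day = 1:3,
                             marketreturn = c(0.005, -0.001, 0.002))
idx <- match(main$day, aggregate.data$day)   # one lookup per row
main$marketreturn <- aggregate.data$marketreturn[idx]
lm(stockreturn ~ marketreturn, data = main)
```

Days missing from the aggregate table come back as NA, which lm() drops by default, mirroring all.x=TRUE, all.y=FALSE.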
Re: [R] convert apply to lapply
Hi Alex, If data is a matrix, probably the easiest option would be: tips <- as.data.frame(data); mclapply(tips, foo) By the way, I would recommend not using 'data' (which is also a function) as the name of the object storing your data. If your data set has many columns and performance is an issue I might convert it to a list instead of a data frame. Note that if you wanted the equivalent of apply(tips, 1, foo), you could transpose your matrix first: as.data.frame(t(data)). lapply works on columns of a data frame because each column is basically an element of a list (list apply). Cheers, Josh On Sun, Oct 9, 2011 at 8:47 AM, Alaios ala...@yahoo.com wrote: Dear all, I want to convert an apply to lapply. The reason for that is that there is a function mclapply that uses exactly the same format as the lapply function. My code looks like this: mean_power_per_tip <- function(data) { return(apply(data, 2, MeanTip)) } where data is a [m,n] matrix. I would like to thank you in advance for your help B.R Alex -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
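To make the apply-to-lapply equivalence concrete, here is a small self-contained sketch (MeanTip is stood in by mean, since its definition isn't shown in the post):

```r
MeanTip <- mean                       # stand-in for the poster's function
data_mat <- matrix(1:12, nrow = 4)    # example [m, n] matrix

# Column-wise apply ...
res_apply <- apply(data_mat, 2, MeanTip)

# ... equals lapply over the columns of a data frame, because a data
# frame is a list of its columns. mclapply(tips, MeanTip) from the
# parallel package would be a drop-in replacement on a multicore box.
tips <- as.data.frame(data_mat)
res_lapply <- unlist(lapply(tips, MeanTip))
```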
[R] Multiple levelplot with title
Hi all, I'm new to R and to the mailing list, so please bear with me :-) I would like to create multiple levelplots on the same chart with a nice main title, with something like this: print(levelplot(matrix(c(1,2,3,4), 2, 2)), split=c(1, 1, 2, 1)) print(levelplot(matrix(c(1,2,3,4), 2, 2)), split=c(2, 1, 2, 1), newpage=FALSE) I found a trick: mtext("Test", outer = TRUE, cex = 1.5) here: https://stat.ethz.ch/pipermail/r-help/2008-July/168163.html but it doesn't work for me. Could anyone please give me some pointers on what I should read in order to get an insight into why this isn't working as I expect? I managed to find a workaround by using panel.text(), but I don't really like it since it requires fixed x/y coordinates and doesn't scale if the picture is resized. panel.text(x=20, y=110, "Test") Thanks in advance! Richard
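One likely reason the mtext() trick fails here is that mtext() belongs to base graphics, while lattice draws with grid, so the two never meet on the same canvas. A grid-native sketch of a shared main title (the title text and offset are illustrative):

```r
library(lattice)
library(grid)

p <- levelplot(matrix(c(1, 2, 3, 4), 2, 2))

# Print the two panels side by side, keeping the page open (more = TRUE),
# then draw the title on the grid canvas that lattice itself uses.
print(p, split = c(1, 1, 2, 1), more = TRUE)
print(p, split = c(2, 1, 2, 1), more = FALSE)
grid.text("Test", y = unit(1, "npc") - unit(1, "lines"),
          gp = gpar(cex = 1.5))
```

Because grid.text() positions in normalized parent coordinates ("npc"), the title stays put when the device is resized, unlike the panel.text() workaround.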
[R] ANOVA from imported data has only 1 degree of freedom
Hi, I'm trying to analyse some data I have imported into R from a .csv file, but when I carry out the aov command the results show only one degree of freedom when there should be 14. Does anyone know why? I'd really appreciate some help; the data is pasted below.

/The imported table looks like this:/

   Order      Transect Sample Abundance
1  Coleoptera        1      1        13
2  Coleoptera        1      2        12
3  Coleoptera        1      3        11
4  Coleoptera        1      4        13
5  Coleoptera        1      5         6
6  Coleoptera        2      1        18
7  Coleoptera        2      2        18
8  Coleoptera        2      3        16
9  Coleoptera        2      4        21
10 Coleoptera        2      5        11
11 Coleoptera        3      1        19
12 Coleoptera        3      2        16
13 Coleoptera        3      3         9
14 Coleoptera        3      4        32
15 Coleoptera        3      5        29

/The command I am using is this:/

anova2 <- aov(Abundance ~ Transect, data = beetle)

/The results come out like this:/

            Df Sum Sq Mean Sq F value  Pr(>F)
Transect     1 250.00 250.000  7.2394 0.01852 *
Residuals   13 448.93  34.533
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Cheers, Sam
Re: [R] Permutation or Bootstrap to obtain p-value for one sample
I'll concur with Peter Dalgaard that * a permutation test is the right thing to do - your problem is equivalent to a two-sample test, * don't bootstrap, and * don't bother with t-statistics but I'll elaborate a bit on why, including * two approaches to the whole problem - and how your approach relates to the usual approach, * an interesting tidbit about resampling t statistics. First, I'm assuming that your x variable is irrelevant, only y matters, and that sample1 is a proper subset of df. I'll also assume that you want to look for differences in the mean, rather than arbitrary differences (in which case you might use e.g. a Kolmogorov-Smirnov test). There are two general approaches to this problem: (1) two-sample problem, sample1$y vs df$y[rows other than sample 1] (2) the approach you outlined, thinking of sample1$y as part of df$y. Consider (1), and call the two data sets y1 and y2. The basic permutation test approach is: * compute the test statistic theta(y1, y2), e.g. mean(y1)-mean(y2) * repeat R times (999, say): draw a sample of size n1 from the pooled data, call that Y1, call the rest Y2, and compute theta(Y1, Y2) * P-value for a one-sided test is (1 + k) / (1 + R), where k is the number of permutation samples with theta(Y1,Y2) >= theta(y1,y2). The test statistic could be: mean(y1) - mean(y2), mean(y1), sum(y1), t-statistic (pooled variance), P-value for a t-test (pooled variance), mean(y1) - mean(pooled data), t-statistic (unpooled variance), P-value for a t-test (unpooled variance), median(y1) - median(y2), ... The first six of those are equivalent - they give exactly the same P-value for the permutation test. The reason is that those test statistics are monotone transformations of each other, given the data. Hence, doing the pooled-variance t calculations gains nothing. Now consider your approach (2). That is equivalent to the permutation test using the test statistic: mean(y1) - mean(pooled data). Why not a bootstrap? E.g.
pool the data and draw samples of size n1 and n2 from the pooled data, independently and with replacement. That is similar to the permutation test, but less accurate. Probably the easiest way to see this is to suppose there is 1 outlier in the pooled data. In any permutation iteration there is exactly 1 outlier among the two samples. With bootstrapping, there could be 0, 1, 2, ... The permutation test answers the question: given that there is exactly 1 outlier in my combined data, what is the probability that random chance would give a difference as large as I observed? The bootstrap would answer some other question. Tim Hesterberg NEW! Mathematical Statistics with Resampling and R, Chihara Hesterberg http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8 http://home.comcast.net/~timhesterberg (resampling, water bottle rockets, computers to Guatemala, shower = 2650 light bulbs, ...) On Oct 8, 2011, at 16:04 , francy wrote: Hi, I am having trouble understanding how to approach a simulation: I have a sample of n=250 from a population of N=2,000 individuals, and I would like to use either a permutation test or bootstrap to test whether this particular sample is significantly different from the values of any other random samples of the same population. I thought I needed to take random samples (but I am not sure how many simulations I need to do) of n=250 from the N=2,000 population and maybe do a one-sample t-test to compare the mean score of all the simulated samples, + the one sample I am trying to prove is different from any others, to the mean value of the population. But I don't know: (1) whether this one-sample t-test would be the right way to do it, and how to go about doing this in R (2) whether permutation test or bootstrap methods are more appropriate. This is the data frame that I have, which is to be sampled: df <- i.e. x y 1 2 3 4 5 6 7 8 . . . . . .
2,000 I have this sample from df, and would like to test whether it has extreme values of y. sample1 <- i.e. x y 3 4 7 8 . . . . . . 250 For now I only have this: R <- 999 # number of simulations, but I don't know how many... t.values <- numeric(R) # a numeric vector with 999 elements, which will hold the results of each simulation for (i in 1:R) { sample1 <- df[sample(nrow(df), 250, replace=TRUE),] But I don't know how to continue the loop: do I calculate the mean for each simulation and compare it to the population mean? Any help you could give me would be very appreciated, Thank you. The straightforward way would be a permutation test, something like this: msamp <- mean(sample1$y) mpop <- mean(df$y) msim <- replicate(10000, mean(sample(df$y, 250))) sum(abs(msim - mpop) >= abs(msamp - mpop))/10000 I don't really see a reason to do bootstrapping here. You say you want to test whether your sample could be a random sample from the population, so just simulate that sampling
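Putting Peter's suggestion together as one runnable permutation test, with simulated stand-ins for the poster's df and sample1:

```r
set.seed(1)

# Toy population of 2,000 and a sample of 250 (here just a random draw,
# so the test should not reject).
df <- data.frame(y = rnorm(2000))
sample1 <- df[sample(nrow(df), 250), , drop = FALSE]

R <- 999                             # number of resamples
msamp <- mean(sample1$y)
mpop  <- mean(df$y)

# Resample without replacement: a permutation test, not a bootstrap.
msim <- replicate(R, mean(sample(df$y, 250)))

# Two-sided p-value with the usual +1 correction.
pval <- (1 + sum(abs(msim - mpop) >= abs(msamp - mpop))) / (1 + R)
```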
[R] variable name question
Hi All, This is surely an easy question but somehow I am not being able to get it. I am using R 2.13.2 and have a data set where variable names like this appear: pci1990, pci1991, ..., pci2009. pci1990 has data on per capita income for 1990, pci1991 has data on per capita income for 1991, and so on. I would like to create the logarithm of per capita income for each of the years and could do so in STATA with the following commands: forvalues number = 1990/2009 { gen lpci`number' = log(pci`number') } What would be the corresponding set of commands in R? Thanks a lot in advance. Deepankar
Re: [R] cuhre usage ?? multidimensional integration
Have you taken a look at the provided introductory materials? Type help.start() to get "An Introduction to R", which I believe is provided as part of every pre-packaged version of R, so it's most likely already on your machine. Read that and then we can sort through how to correctly implement your function. Michael On Sat, Oct 8, 2011 at 7:28 PM, sevenfrost linshuan...@gmail.com wrote: I'm not quite familiar with R. Could you recommend some relevant material? Actually, I don't intend to loop from 1 to 1; I'm just trying to see if it works. It's supposed to be some len > 1. I'm trying to solve the following example: there are 3 types of variables (x, w, t); they are not independent and have different distributions. Within each type, there is more than 1 variable. I need to integrate over all the variables. The joint pdf is of the form f(x,w,t) = ∏_i f1(w_i) f2(t_i) (∏_j f3(x_j) ∏_j f4(x_j*w_i - t_i))
Re: [R] Permutation or Bootstrap to obtain p-value for one sample
Dear Peter and Tim, Thank you very much for taking the time to explain this to me! It is much clearer now. And sorry for using the space here maybe inappropriately; I really hope this is OK and gets posted. I think it is really important that non-statisticians like myself get a good idea of the concepts behind the functions of R. I am really grateful you went through this with me. -f 2011/10/9 Tim Hesterberg timhesterb...@gmail.com I'll concur with Peter Dalgaard that * a permutation test is the right thing to do, * don't bootstrap, and * don't bother with t-statistics [...]
Re: [R] Subtract previous element
Thanks a lot for the help From: Jeff Newmiller jdnew...@dcn.davis.ca.us Sent: Sunday, October 9, 2011 6:35 PM Subject: Re: [R] Subtract previous element Vectors are not matrices in R, though matrices are special cases of vectors. See ?diff for a solution that works with vectors. --- Jeff Newmiller, Research Engineer (Solar/Batteries with /Software/Embedded Controllers) Sent from my phone. Please excuse my brevity. --- Dear all, I have a matrix with data and I want to subtract from every value the previous element. Let's assume that my vector (matrix) is c <- c(1,2,3,4,5) I want to get remove_previous <- c(0,1,2,3,4). How can I do that efficiently in R? I would like to thank you in advance for your help B.R Alex
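A sketch of the diff() answer: diff() returns consecutive differences, one element shorter than the input, so prepending a 0 keeps the original length (note that for c(1,2,3,4,5) this gives c(0,1,1,1,1), each element minus its predecessor):

```r
x <- c(1, 2, 3, 4, 5)

# diff(x) has length(x) - 1 elements: the pairwise differences.
remove_previous <- c(0, diff(x))
```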
[R] apply to a matrix and insert in the middle of an array
If possible I'd like to produce a function that applies a formula to a column in a matrix (essentially calculating the mse) and then inserts it between values of an array ... confusing I know; here is an example of what I'm trying to accomplish: ## create a matrix (a <- matrix(c(3,6,4,8,5,9,12,15), nrow=4)) ## get the mean of all columns (b <- apply(a, 2, mean)) ## calculate the mse of column 1 (c <- sd(a[,1])/sqrt(length(a[,1]))) ## calculate the mse of column 2 (d <- sd(a[,2])/sqrt(length(a[,2]))) ## now create a new vector with each mean value and its corresponding ## mse right beside it (e <- c(b[1], c, b[2], d)) Would anyone have any suggestions how to accomplish this using an apply statement? Thanks a bunch, J === Dr. Jim Maas University of East Anglia
Re: [R] ANOVA from imported data has only 1 degree of freedom
On Oct 9, 2011, at 11:10 AM, shardman wrote: Hi, I'm trying to analyse some data I have imported into R from a .csv file but when I carry out the aov command the results show only one degree of freedom when there should be 14. Does anyone know why? I'd really appreciate some help, the data is pasted below. [...] To me this appears likely to be homework. If so then please read your college's rules regarding academic honesty. In any case, please read the Posting Guide. David Winsemius, MD West Hartford, CT
[R] Problem with twitteR package
Hey Guys, I just started fooling around with the twitteR package in order to get a record of all tweets from a single public account. When I run userTimeline, I get the default 20 most recent tweets just fine. However, when I specify an arbitrary number of tweets (as described in the documentation from June 14th, 2011), I get the following warning: bjaTweets <- userTimeline("BeijingAir", n=50) Warning message: In mapCurlOptNames(names(.els), asNames = TRUE) : Unrecognized CURL options: n Does anyone familiar with the twitteR package know what is going on with the options? Alternatively, are there any other simple means of getting this sort of data? Steve
Re: [R] fast or space-efficient lookup?
I think you are looking for the 'data.table' package. On 09/10/2011 17:31, ivo welch wrote: Dear R experts---I am struggling with memory and speed issues. Advice would be appreciated. I have a long data set (of financial stock returns, with stock name and trading day). [...]
-- Patrick Burns pbu...@pburns.seanet.com twitter: @portfolioprobe http://www.portfolioprobe.com/blog http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno')
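A sketch of the keyed-join idea with data.table (assuming the package is installed; table and column names are toy stand-ins for the poster's data):

```r
library(data.table)

main <- data.table(day = c(1, 1, 2, 3),
                   stockreturn = c(0.10, 0.20, 0.00, -0.10))
mkt  <- data.table(day = 1:3, marketreturn = c(0.01, -0.02, 0.005))

# Setting keys sorts the tables and enables fast binary-search joins,
# playing the role of the perl hash without copying 'main'.
setkey(main, day)
setkey(mkt, day)

combined <- mkt[main]   # one row per row of 'main', market return attached
```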
[R] two-dimensional Kolmogorov-Smirnov (2DKS) test
The reference below describes the utility of the two-dimensional Kolmogorov-Smirnov (2DKS) test for detecting relationships in bivariate data. If this test has been implemented in R I would love to know about it! Thanks, jake Garvey, J. E., E.A. Marschall, and R.A. Wright (1998). From star charts to stoneflies: detecting relationships in continuous bivariate data. Ecology 79(2): 442-447.
Re: [R] apply to a matrix and insert in the middle of an array
On Oct 9, 2011, at 1:04 PM, Maas James Dr (MED) wrote: If possible I'd like to produce a function that applies a formula to a column in a matrix (essentially calculating the mse) and then inserts it between values of an array ... confusing I know, here is an example of what I'm trying to accomplish: ## create a matrix (a <- matrix(c(3,6,4,8,5,9,12,15), nrow=4)) ## get the mean of all columns (b <- apply(a, 2, mean)) ## calculate the mse of column 1 (c <- sd(a[,1])/sqrt(length(a[,1]))) ## calculate the mse of column 2 (d <- sd(a[,2])/sqrt(length(a[,2]))) The sd function, like the var function, returns its values based on columns, so just sd(a) gives you a vector of column sds. Constructing an interleaving vector of column indices can be done with matrix manipulations: c(b, sd(a))[ c( matrix(1:(2*length(b)), 2, byrow=TRUE) # create a row-oriented matrix ) # the interleaving matrix is then extracted as a vector by columns ] #[1] 5.25 2.217356 10.25 4.272002 (You could also use the apply method of returning function(x) { c(mean(x), sd(x)) } from each column and then wrap the c() function around that:) c( apply(a, 2, function(x){ c(mean(x), sd(x)) }) ) #[1] 5.25 2.217356 10.25 4.272002 ## now create a new vector with each mean value and its corresponding ## mse right beside it (e <- c(b[1], c, b[2], d)) Would anyone have any suggestions how to accomplish this using an apply statement? Thanks a bunch, J === Dr. Jim Maas University of East Anglia David Winsemius, MD West Hartford, CT
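A compact version of the same interleaving, but using the standard error (sd/sqrt(n)) that the poster's "mse" formula actually computes, rather than the plain sd shown in the reply's printed output:

```r
a <- matrix(c(3, 6, 4, 8, 5, 9, 12, 15), nrow = 4)

se <- function(x) sd(x) / sqrt(length(x))   # the poster's "mse"

# apply() stacks c(mean, se) for each column; flattening the result
# column-by-column with c() interleaves mean and se as requested.
e <- c(apply(a, 2, function(x) c(mean(x), se(x))))
```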
Re: [R] variable name question
Hello, In R you just need to take the log of the whole data.frame where you have your pci* columns and store it in a new variable. You do not need to use a for loop: log.df <- log(your_data_frame) Regards, Carlos Ortega www.qualityexcellence.es On Sun, Oct 9, 2011 at 5:34 PM, deepankar db...@econs.umass.edu wrote: Hi All, This is surely an easy question but somehow I am not being able to get it. [...]
Re: [R] two-dimensional Kolmogorov-Smirnov (2DKS) test
On Oct 9, 2011, at 1:46 PM, beaulieu.j...@epamail.epa.gov wrote: The reference below describes the utility of the two-dimensional Kolmogorov-Smirnov (2DKS) test for detecting relationships in bivariate data. If this test has been implemented in R I would love to know about it! I have not been able to find an R implementation. (Tried RSiteSearch and RSeek with "2d kolmogorov" and pursued promising links and citations.) This 2005 article suggests that it is not a simple (or even a well-defined) problem: http://bura.brunel.ac.uk/bitstream/2438/1166/1/acat2007.pdf This effort at a parallelized solution reports little improvement, perhaps supporting the Lopes et al. conclusion that the Cooks algorithm is not O(n*log(n)) after all. And this is a page with a link to some Matlab code: http://www.subcortex.net/research/code/testing-for-differences-in-multidimensional-distributions Thanks, jake Garvey, J. E., E.A. Marschall, and R.A. Wright (1998). From star charts to stoneflies: detecting relationships in continuous bivariate data. Ecology 79(2): 442-447. -- David Winsemius, MD West Hartford, CT
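For readers who want something to experiment with, here is a rough, unvalidated sketch in the spirit of the Fasano-Franceschini variant of the 2-D KS statistic (compare the quadrant fractions around each data point); the p-value calibration discussed in the linked articles is the hard part and is not attempted here.

```r
# Fraction of sample (x, y) in each of the four quadrants around (px, py).
quadrant_fractions <- function(px, py, x, y) {
  c(sum(x <= px & y <= py), sum(x <= px & y > py),
    sum(x >  px & y <= py), sum(x >  px & y > py)) / length(x)
}

# Two-sample statistic: largest quadrant-fraction discrepancy over all
# data points of both samples (O(n^2), fine for small samples).
ks2d_stat <- function(x1, y1, x2, y2) {
  d <- 0
  for (i in seq_along(x1)) {
    d <- max(d, abs(quadrant_fractions(x1[i], y1[i], x1, y1) -
                    quadrant_fractions(x1[i], y1[i], x2, y2)))
  }
  for (i in seq_along(x2)) {
    d <- max(d, abs(quadrant_fractions(x2[i], y2[i], x1, y1) -
                    quadrant_fractions(x2[i], y2[i], x2, y2)))
  }
  d
}

set.seed(42)
d <- ks2d_stat(rnorm(50), rnorm(50), rnorm(60) + 2, rnorm(60))
```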
Re: [R] variable name question
On Oct 9, 2011, at 2:24 PM, Carlos Ortega wrote: Hello, In R you just need to take the log of the whole data.frame where you have your pci* columns and store it in a new variable. You do not need to use a for loop: log.df <- log(your_data_frame) Possibly with a selection for the column names in question: log.pci.df <- log( your_data_frame[ , grep("^pci", names(your_data_frame)) ] ) You can also cbind() your identifier and covariate columns to the result. -- David. Regards, Carlos Ortega www.qualityexcellence.es On Sun, Oct 9, 2011 at 5:34 PM, deepankar db...@econs.umass.edu wrote: Hi All, This is surely an easy question but somehow I am not being able to get it. [...]
David Winsemius, MD West Hartford, CT
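A sketch that mirrors the STATA loop exactly (new lpci* columns added alongside the old pci* ones), combining a grep() selection with paste0(); the data frame is a toy stand-in with only 3 of the 20 pci* columns:

```r
dat <- data.frame(id = 1:2,
                  pci1990 = c(100, 200),
                  pci1991 = c(110, 210),
                  pci1992 = c(120, 220))

# Find the pci* columns and add an lpci* column for each of them.
pci.names <- grep("^pci", names(dat), value = TRUE)
dat[paste0("l", pci.names)] <- lapply(dat[pci.names], log)
```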
Re: [R] ANOVA from imported data has only 1 degree of freedom
Hi David, Thanks for your message. I can assure you this is not homework. I'm working on an ecology project and am trying to analyse the results from the fieldwork. I don't want other people to do the work for me; I was just hoping someone might be able to spot where I have made a mistake. I'm still teaching myself to use R after having used SPSS for a long time. I did read the posting guidelines but couldn't see any reason not to ask for help. I'm sorry if I've got it wrong, Yours, Sam
Re: [R] barplots
Hi, Another way to do that is with function barchart() in package lattice. barchart() takes a formula relating your variables, with the option to specify groups. Check the examples (they are under the xyplot help page) to apply them to your case. Regards, Carlos Ortega www.qualityexcellence.es On Thu, Oct 6, 2011 at 11:16 PM, Daniel Winkler winkler...@gmail.com wrote: Hello, I have a somewhat weird data set and am attempting to create a barplot with it. I have 8 columns with different variables and their percentages. I have 1 column representing 4 different treatments the variables undergo. I also have 1 column with the year the data was recorded. I want to create a bar plot that plots all 8 variables grouped by treatment and year. I've tried creating subsets of each of the variables, with no luck. My variables are arranged similar to this:

Vascular.plants  Solid.rock  Gravel.and.cobbles
72.5             1.0         5
67.5             2.0         6
67.5             2.5         9
59.0             2.0         25
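For comparison, grouped bars can also be drawn with base graphics; a minimal sketch with invented values and hypothetical year labels:

```r
# Minimal base-graphics sketch (values and year labels invented):
# barplot() draws the columns of a matrix side by side when beside = TRUE
m <- rbind(Vascular.plants = c(72.5, 67.5),
           Solid.rock      = c(1.0, 2.0),
           Gravel.cobbles  = c(5, 6))
colnames(m) <- c("2010", "2011")
bp <- barplot(m, beside = TRUE, legend.text = rownames(m))
```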
Re: [R] ANOVA from imported data has only 1 degree of freedom
On Oct 9, 2011, at 2:08 PM, shardman wrote: Hi David, Thanks for your message. I can assure you this is not homework. I'm working on an ecology project and am trying to analyse the results from the fieldwork. I don't want other people to do the work for me; I was just hoping someone might be able to spot where I have made a mistake. I'm still teaching myself to use R after having used SPSS for a long time. I did read the posting guidelines but couldn't see any reason not to ask for help. I'm sorry if I've got it wrong. Please read the Posting Guide again. Nabble is not the standard venue where people read this list, and you seem to have missed the part where the PG asks you to include relevant context in replies. It also asks you to describe your scientific domain (which you have now done) and academic position (at least in part so the list can remain non-homework oriented). My memory from your initial posting is that you had numeric identifiers for your groups and did not wrap the group variable name in factor(), so it was treated as a continuous variable. Doesn't SPSS have some mechanism to flag a variable as discrete? Perhaps (from memory) something along these lines: aov(outcome ~ factor(group), data=dat) (Further comments from memory of the earlier posting.) You were also asking why there were not 14 df in the output, but there were, since 1 + 13 = 14. Surely you were not expecting 14 df to be associated with the grouping variable when there were only 3 groups? Yours, Sam
David Winsemius, MD West Hartford, CT
Re: [R] Multiple levelplot with title
Hi, Use function ltext() instead, also available in the lattice package. Regards, Carlos Ortega www.qualityexcellence.es On Sun, Oct 9, 2011 at 6:30 PM, Richard O. Legendi richard.lege...@gmail.com wrote: Hi all, I'm new to R and to the mailing list, so please bear with me :-) I would like to create multiple levelplots on the same chart with a nice main title, with something like this: print(levelplot(matrix(c(1,2,3,4), 2, 2)), split=c(1, 1, 2, 1)) print(levelplot(matrix(c(1,2,3,4), 2, 2)), split=c(2, 1, 2, 1), newpage=FALSE) I found a trick: mtext("Test", outer = TRUE, cex = 1.5) here: https://stat.ethz.ch/pipermail/r-help/2008-July/168163.html but it doesn't work for me. Could anyone please show me some pointers on what I should read to understand why this isn't working as I expect? I managed to find a workaround using panel.text(), but I don't really like it since it requires fixed x/y coordinates and does not scale if the picture is resized: panel.text(x=20, y=110, "Test") Thanks in advance! Richard
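As a possible alternative to both mtext() and panel.text(), a sketch using grid.text() from the grid package (lattice is built on grid); the coordinates here are normalised device coordinates, so the title stays centred when the device is resized:

```r
# Sketch of an overall title via grid.text(); the two levelplots are placeholders
library(lattice)
library(grid)
p1 <- levelplot(matrix(1:4, 2, 2))
p2 <- levelplot(matrix(1:4, 2, 2))
print(p1, split = c(1, 1, 2, 1), more = TRUE)
print(p2, split = c(2, 1, 2, 1), more = FALSE)
# draw the title near the top of the whole device, in npc units
grid.text("Test", x = 0.5, y = 0.98, gp = gpar(cex = 1.5))
```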
Re: [R] Bartlett's Test of Sphericity
You could also check this function I implemented a while back: http://www.fernandohrosa.com.br/en/P/sphericity-test-for-covariance-matrices-in-r-sphericity-test/ On Fri, Jun 17, 2011 at 4:43 PM, thibault grava thibault.gr...@gmail.com wrote: Hello Dear R user, I want to conduct a principal components analysis and I need to run two tests to check whether I can do it or not. I found how to run the KMO test; however, I cannot find an R function for Bartlett's test of sphericity. Does somebody know if it exists? Thanks for your help! Thibault -- Though this be randomness, yet there is structure in't. Fernando H Rosa - Statistician http://www.fernandohrosa.com.br @fhrosa
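For reference, the statistic itself is short enough to compute directly. A sketch of the standard textbook formula, chi-square = -(n - 1 - (2p + 5)/6) * ln(det(R)) on p(p-1)/2 degrees of freedom; the helper name below is made up, and psych::cortest.bartlett offers a packaged equivalent:

```r
# Hand-rolled Bartlett sphericity test (function name invented for illustration)
bartlett.sphericity <- function(R, n) {
  p <- ncol(R)                                        # number of variables
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))   # test statistic
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}
set.seed(1)
x <- matrix(rnorm(200), ncol = 4)       # toy data: 50 obs, 4 variables
res <- bartlett.sphericity(cor(x), nrow(x))
```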
Re: [R] Connecting points over missing observations in Lattice
On Sat, Oct 8, 2011 at 5:59 AM, Allan Sikk a.s...@ucl.ac.uk wrote: Hello, I'm trying to plot connected time series of two variables in a lattice plot: xyplot(y1 + y2 ~ t, data=size, type="b") y2 has missing data for some of the observations and some points are therefore not connected. It would make theoretical sense to connect the points - is there a way of doing that? (Without filling the observations using package 'zoo'.) Break it up into lines and points and draw each separately (first three approaches) or use a custom panel (approach 4):

# set up data
library(zoo)
library(lattice)
DF <- with(anscombe, data.frame(x = 1:11, y1, y2))
DF[2, 2] <- DF[3, 3] <- NA
z <- read.zoo(DF, FUN = identity)

# approach 1
xyplot(cbind(na.approx(z), z), type = list("l", "l", "p", "p"), screen = 1, col = 1:2)

# approach 2
xyplot(na.approx(z), screen = 1, col = 1:2)
trellis.focus()
panel.points(z[, 1], col = 1)
panel.points(z[, 2], col = 2)
trellis.unfocus()

# approach 3
xyplot(z, screen = 1, type = "n")
trellis.focus()
panel.points(na.omit(z[, 1]), col = 1, type = "o")
panel.points(na.omit(z[, 2]), col = 2, type = "o")
trellis.unfocus()

# approach 4
xyplot(y1 + y2 ~ x, DF, panel = panel.superpose,
       panel.groups = function(x, y, subscripts, groups, ..., group.number)
         with(na.omit(data.frame(x, y)),
              panel.lines(x, y, type = "o", col = group.number)))

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] expression set (Bioconductor) problem
Note that exprs() returns a matrix, so we can manipulate that just as we would any other matrix. There is also a Bioconductor mailing list, which may be helpful. On Thu, Oct 6, 2011 at 4:56 AM, Clayton K Collings ccoll...@purdue.edu wrote: Hello R people, dim(exprs(estrogenrma)) I have an expressionSet with 8 samples and 12695 features (genes). estrogenrma$estrogen: present present absent absent present present absent absent. estrogenrma$time.h: 10 10 10 10 48 48 48 48.

present <- grep("present", as.character(estrogenrma$estrogen))
absent <- grep("absent", as.character(estrogenrma$estrogen))
ten <- grep("10", as.character(estrogenrma$time.h))
fortyeight <- grep("48", as.character(estrogenrma$time.h))
present.10 <- estrogenrma[, intersect(present, ten)]
present.48 <- estrogenrma[, intersect(present, fortyeight)]
absent.10 <- estrogenrma[, intersect(absent, ten)]
absent.48 <- estrogenrma[, intersect(absent, fortyeight)]

present.10, present.48, absent.10, and absent.48 are four expression sets with two samples and 12695 features each. How can I make 2 new expressionSets, each with 12695 features and one sample, where expressionset1 = (present.10 + present.48) / 2 and expressionset2 = (absent.10 + absent.48) / 2? Thanks, Clayton - Original Message - From: Tal Galili tal.gal...@gmail.com To: SML paral...@lafn.org Cc: r-help@r-project.org Sent: Thursday, October 6, 2011 4:09:43 AM Subject: Re: [R] Mean(s) from values in different row? One way to do it would be to combine the columns using paste and then use tapply to get the means. For example:

set.seed(32341)
a1 <- sample(c("a", "b"), 100, replace = TRUE)
a2 <- sample(c("a", "b"), 100, replace = TRUE)
y <- rnorm(100)
tapply(y, paste(a1, a2), mean)

Contact Details: tal.gal...@gmail.com | 972-52-7275845 | www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) On Thu, Oct 6, 2011 at 8:40 AM, SML paral...@lafn.org wrote: Hello: Is there a way to get a mean from values stored in different rows?
The data looks like this:

YEAR-1, JAN, FEB, ..., DEC
YEAR-2, JAN, FEB, ..., DEC
YEAR-3, JAN, FEB, ..., DEC

What I want is the mean(s) for just the consecutive winter months:

YEAR-1.DEC, YEAR-2.JAN, YEAR-2.FEB
YEAR-2.DEC, YEAR-3.JAN, YEAR-3.FEB

etc. Thanks.
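One way to sketch the winter-means computation, assuming the one-row-per-year layout described above (all values below are invented): flatten the monthly columns into a single series, then average each Dec/Jan/Feb triple.

```r
# Sketch: one row per year, columns JAN..DEC (data invented)
set.seed(7)
dat <- data.frame(YEAR = 2001:2004, matrix(rnorm(48), nrow = 4))
names(dat)[-1] <- month.abb
monthly <- as.vector(t(as.matrix(dat[, -1])))     # Jan-2001, Feb-2001, ..., Dec-2004
dec.idx <- seq(12, length(monthly) - 2, by = 12)  # every Dec that has a following Jan/Feb
winter.means <- sapply(dec.idx, function(i) mean(monthly[i:(i + 2)]))
```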
[R] Mittag-Leffler function
Dear colleagues, Do you know of any R code implementing the Mittag-Leffler function (a generalisation of the exponential function, from fractional calculus)? Thanks in advance, -- Dr Eric B Ferreira Exact Sciences Department Federal University of Alfenas Brazil
Re: [R] variable name question
On 10/10/11 04:53, R. Michael Weylandt wrote: SNIP Try this: for (i in 1990:2009) { varName = paste("pci", i, collapse = ""); assign(varName, log(get(varName))) } SNIP I believe that ``sep = ""'' is needed here rather than collapse. Try: paste("junk", 42, collapse = "") You get [1] "junk 42" with a space in it. Here paste is using the default value of sep, namely " ", and is not using collapse at all, since there is nothing to collapse (the value is a scalar to start with). Whereas paste("junk", 42, sep = "") gives [1] "junk42" which is what is wanted. cheers, Rolf Turner
Re: [R] variable name question
Thanks for both comments. Indeed the sep = "" is needed. On Sun, Oct 9, 2011 at 9:43 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote: On 10/10/11 04:53, R. Michael Weylandt wrote: SNIP Try this: for (i in 1990:2009) { varName = paste("pci", i, collapse = ""); assign(varName, log(get(varName))) } SNIP I believe that ``sep = ""'' is needed here rather than collapse. Try: paste("junk", 42, collapse = "") You get [1] "junk 42" with a space in it. Here paste is using the default value of sep, namely " ", and is not using collapse at all, since there is nothing to collapse (the value is a scalar to start with). Whereas paste("junk", 42, sep = "") gives [1] "junk42" which is what is wanted. cheers, Rolf Turner
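The sep/collapse distinction discussed above, in compact form: sep joins the arguments within one paste() call, while collapse then joins the elements of the resulting vector into a single string.

```r
p1 <- paste("pci", 1990, sep = "")                       # "pci1990"
p2 <- paste("pci", 1990:1992, sep = "")                  # "pci1990" "pci1991" "pci1992"
p3 <- paste("pci", 1990:1992, sep = "", collapse = ",")  # "pci1990,pci1991,pci1992"
```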
Re: [R] ANOVA from imported data has only 1 degree of freedom
shardman samuelhardman at hotmail.com writes: Hi, I'm trying to analyse some data I have imported into R from a .csv file, but when I carry out the aov command the results show only one degree of freedom when there should be 14. Does anyone know why? I'd really appreciate some help; the data is pasted below. /The imported table looks like this:/ Order Transect Sample Abundance 1 Coleoptera1 113 [snip] 15 Coleoptera3 529 /The command I am using is this:/ anova2 <- aov(Abundance ~ Transect, data=beetle) I was going to tell you to read up on the distinction between factors and numeric variables in statistical models, but I can't immediately find a reference online (this is bound to be in Peter Dalgaard's book or any other basic-stats-with-R text). str() will tell you whether the columns are numeric or factors. Try beetle <- transform(beetle, Transect=factor(Transect)) [or see ?read.csv and use colClasses explicitly to specify the classes of the columns] before running your anova.
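The factor-versus-numeric point shows up directly in the degrees of freedom; a sketch with invented data (three transects coded as plain numbers):

```r
# Toy data (invented): numeric group codes, Poisson counts
set.seed(1)
beetle <- data.frame(Transect = rep(1:3, each = 5),
                     Abundance = rpois(15, 10))
m.num <- aov(Abundance ~ Transect, data = beetle)          # numeric slope: 1 df
m.fac <- aov(Abundance ~ factor(Transect), data = beetle)  # 3 groups: 2 df
summary(m.num)
summary(m.fac)
```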
Re: [R] fast or space-efficient lookup?
hi patrick. thanks. I think you are right.

combined <- merge(main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE)
lm(stockreturn ~ marketreturn, data=combined)

becomes something like

main <- as.data.table(main)
setkey(main, mmdd)
aggregate.data <- as.data.table(aggregate.data)
setkey(aggregate.data, mmdd)
main <- main[aggregate.data]

this is fast and memory efficient. for me, on the plus side, a data.table is a data.frame, so it can be used easily elsewhere. also on the plus side, data.table is super-efficient. on the minus side, data.table often has very strange syntax. for example, main[aggregate.data] is counterintuitive. passing functions in a slot that should be a tensor index is also strange. I would much prefer it if all non-tensor functionality was in functions, and not in arguments following [ ]. I have written this before: given that applied end users of statistics typically use data.frame as their main container for data sets, data.frame should be as efficient and tuned as possible. cell assignments should be fast. indexing and copying should be fast. it would give R a whole lot more appeal. the functionality in data.table should be core functionality, not requiring an add-on with strange syntax. just my 5 cents... of course, the R developers are saints, putting in a lot of effort with no compensation, so complaining is unfair. and thanks to Matthew Doyle for writing data.table, without which I couldn't do this AT ALL. regards, /iaw Ivo Welch (ivo.we...@gmail.com) On Sun, Oct 9, 2011 at 10:42 AM, Patrick Burns pbu...@pburns.seanet.com wrote: I think you are looking for the 'data.table' package. On 09/10/2011 17:31, ivo welch wrote: Dear R experts---I am struggling with memory and speed issues. Advice would be appreciated. I have a long data set (of financial stock returns, with stock name and trading day). All three variables, stock return, id and day, are irregular. About 1.3GB in object.size (200MB on disk).
now, I need to merge the main data set with some aggregate data (e.g., the SP500 market rate of return, with a day index) from the same day. this market data set is not a big data set (object.size=300K, 5 columns, 12000 rows). let's say my (dumb statistical) plan is to run one grand regression, where the individual rate of return is y and the market rate of return is x. the following should work without a problem:

combined <- merge(main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE)
lm(stockreturn ~ marketreturn, data=combined)

alas, the merge is neither space-efficient nor fast. in fact, I run out of memory on my 16GB linux machine. my guess is that by whittling it down, I could work it (perhaps doing it in chunks, and then rbinding it), but this is painful. in perl, I would define a hash with the day as key and the market return as value, and then loop over the main data set to supplement it. is there a recommended way of doing such tasks in R, either super-fast (so that I merge many many times) or space efficient (so that I merge once and store the results)? sincerely, /iaw Ivo Welch (ivo.we...@gmail.com) -- Patrick Burns pbu...@pburns.seanet.com twitter: @portfolioprobe http://www.portfolioprobe.com/blog http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno')
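The keyed-join pattern described above, as a tiny self-contained sketch; the column names (mmdd, ret, mkt) and values are invented, and the data.table package is assumed to be installed:

```r
# Keyed join: every row of main matched against the small aggregate table
library(data.table)
main <- data.table(mmdd = c(101, 101, 102), ret = c(0.01, -0.02, 0.03))
agg  <- data.table(mmdd = c(101, 102), mkt = c(0.005, 0.007))
setkey(main, mmdd)
setkey(agg, mmdd)
combined <- agg[main]   # X[Y]: one output row per row of main, matched on the key
```

The join avoids the intermediate copies a full merge() would make, which is why it stays within memory on large inputs.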
Re: [R] fast or space-efficient lookup?
Hi Ivo, I'll just be brief, but regarding data.table's syntax: one person's strange is another's intuitive :-) Note also that data.table provides a `merge.data.table` function which works pretty much as you expect merge.data.frame to. The exception is that its `by` argument will, by default, work on the intersection of the keys of the two data.tables being merged, rather than all columns of the same name between the two tables. Hope that's helpful. -steve On Sun, Oct 9, 2011 at 10:44 PM, ivo welch ivo.we...@gmail.com wrote: hi patrick. thanks. I think you are right. combined <- merge(main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE); lm(stockreturn ~ marketreturn, data=combined) becomes something like main <- as.data.table(main); setkey(main, mmdd); aggregate.data <- as.data.table(aggregate.data); setkey(aggregate.data, mmdd); main <- main[aggregate.data] this is fast and memory efficient. for me, on the plus side, a data.table is a data.frame, so it can be used easily elsewhere. also on the plus side, data.table is super-efficient. on the minus side, data.table often has very strange syntax. for example, main[aggregate.data] is counterintuitive. passing functions in a slot that should be a tensor index is also strange. I would much prefer it if all non-tensor functionality was in functions, and not in arguments following [ ]. I have written this before: given that applied end users of statistics typically use data.frame as their main container for data sets, data.frame should be as efficient and tuned as possible. cell assignments should be fast. indexing and copying should be fast. it would give R a whole lot more appeal. the functionality in data.table should be core functionality, not requiring an add-on with strange syntax. just my 5 cents... of course, the R developers are saints, putting in a lot of effort with no compensation, so complaining is unfair. and thanks to Matthew Doyle for writing data.table, without which I couldn't do this AT ALL.
-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] HWEBayes, swapping the homozygotes genotype frequencies
Without really knowing this code, I can guess that it may be the triangular prior at work. Bayes factors are notorious for being sensitive to the prior. Presumably, the prior somehow prefers to see the rarer allele as the BB, and not the AA, homozygous genotype (this is a common assumption: that AA is the reference, and thus the major, more frequent, allele). -Aaron On Sat, Oct 8, 2011 at 7:52 PM, stat999 yumik091...@gmail.com wrote: I evaluated the Bayes factor in the k=2 allele case with a triangular prior under the null, as in the example in the help file: HWETriangBF2(nvec=c(88,10,2)) [1] 0.4580336 When I swap the n11 entry and the n22 entry of nvec, I get a totally different Bayes factor: HWETriangBF2(nvec=c(2,10,88)) [1] 5.710153 In my understanding, defining the genotype frequency as n11 or n22 is arbitrary, so I was expecting the same Bayes factor. This is the case for the conjugate Dirichlet prior: DirichNormHWE(nvec=c(88,10,2), c(1,1))/DirichNormSat(nvec=c(88,10,2), c(1,1,1)) [1] 1.542047 DirichNormHWE(nvec=c(2,10,88), c(1,1))/DirichNormSat(nvec=c(2,10,88), c(1,1,1)) [1] 1.542047 Could you explain why HWETriangBF2 is returning completely different values of the Bayes factor?