Re: [R] cubature
Who knows? You mention different machines, yet fail to hint what they are much less provide sessionInfo() output, and x is not defined in your global environment in your snippet. Please read the posting guide. I am not familiar with this library, but I would be surprised if it were capable of handling variable limits. I didn't see any hint that such would be supported in the help page. Are you sure the same code is being used on different machines? --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. JMDS jessica.d...@rmit.edu.au wrote: Hi, I am using adaptIntegrate from Cubature to do numerical integration on a double integral with a 1 x 2 vector x. Say the function is something simple to start like f(x)=x1*x2 and I wish to integrate x1 over (0,365-x2) and x2 over (0,365) f - function(x) {(x[2])*(x[1])} # x is vector int1-adaptIntegrate(f, lowerLimit = c(0, 0), upperLimit = c(365-x[2], 365)) I recieve the following error: Error in adaptIntegrate(f, lowerLimit = c(0, 0), upperLimit = c(365 - : object 'x' not found The problem is that this code works fine on other machines and I wonder If I am missing some referenced package somewhere. Any help would be greatly appreciated. Thanks! -- View this message in context: http://r.789695.n4.nabble.com/cubature-tp4502516p4502516.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to compute within-group mean and sd?
Thanks Rui, Greg, and Michael. Your answers helped a lot! Best, Reeyarn On Sun, Mar 25, 2012 at 1:02 AM, Rui Barradas rui1...@sapo.pt wrote: Hello, You can use sql with package 'sqldf'. library(sqldf) # It needs package 'tcltk' ?sqldf tbl - data.frame(firm_id=rep(c(1,2), 10), value=rnorm(20)) sqldf(SELECT firm_id, count(*), avg(value), stdev(value) FROM tbl GROUP BY firm_id;) I've changed the name of your data.frame because 'table' is an R function. Hope this helps, Rui Barradas [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Multivariate function from univariate functions
I'm relatively new to R and I'm stuck. I'm trying to construct a surface to optimize from a multivariate dataset. The dataset contains the response of a system to various stimuli. I am trying to optimize the mix of stimuli to maximize the response. To do so, I've interpolated the various datasets of type response vs. stimuli and I now have an array of functions interps whose length is the length of the array of the names of the various stimuli. I've also created a vector containing names for the stimuli, vars = x.1, x.2, x.3... Anyway, each of the functions in interps depends only on one variable (obviously). I would like to construct a function, call it, surface which is essentially: surface(vars) = interps[[1]]vars[[1]] + interps[[2]]vars[[2]]+... I've tried constructing surface recursively in a for loop: surface - function(vars){0} for(i in 1:length(vars)){ surface - function(vars){surface(vars) + interps[[i]](vars[[i]])} } This results in an infinite recursion error...which I think I understand. Essentially, I'm trying to do the analog of something like i = i+1, but instead of adding integers, I want to add terms to a function. Any help is appreciated. -- View this message in context: http://r.789695.n4.nabble.com/Multivariate-function-from-univariate-functions-tp4502670p4502670.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Handling 8GB .txt file in R?
Thanks to all the suggestions. To the first individual that replied, I can't do any stuff with unix or perl. All I know is R. @KEN: I'm using Windows 7, 64 bit. @Steve: Here's the readLines output.. As we can see, lines 1-3 are empty and line 5 is empty, and there's also empty elements after line 5!. [1] [2] [3] [4] PERMNO DATETICKERPERMCO PRC VOLNUMTRDvwretdewretd [5] [6]106/01/19867952 . . . -0.000138 0.001926 [7]107/01/1986OMFGA 7952-2.56250 1000 . 0.013809 0.011061 [8]108/01/1986OMFGA 7952-2.5 12800 . -0.020744 -0.005117 [9]109/01/1986OMFGA 7952-2.5 1400 . -0.011219 -0.011588 [10]110/01/1986OMFGA 7952-2.5 8500 . 0.83 0.003651 [11]113/01/1986OMFGA 7952-2.62500 5450 . 0.002749 0.002433 - Isaac Research Assistant Quantitative Finance Faculty, UTS -- View this message in context: http://r.789695.n4.nabble.com/Handling-8GB-txt-file-in-R-tp4500971p4502706.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to install tclRequire(Iwidgets)
hi, i'm trying to make Tabbed Notebook Widget,but cannot install tclRequire(Iwidgets). I installed Activetcl but I don't know what can be the next step to execute tclRequire(Iwidgets). how can i do that? Thanks -- View this message in context: http://r.789695.n4.nabble.com/How-to-install-tclRequire-Iwidgets-tp4502775p4502775.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Weird POSIXct behaviour
Friends I have an xts that I wish to access. Browse[2] DATA.ba[[p]][2012-03-20 00:59:57,bid] bid 2012-03-20 00:59:57 1.4993 So far so good. Now putting the index into a variable: Browse[2] Time [1] 2012-03-20 00:59:57 NZDT Browse[2] DATA.ba[[p]][Time, bid] bid Where has it gone? Looking closer Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) [1] 2012-03-20 00:59:57 NZDT Browse[2] Browse[2] Time [1] 2012-03-20 00:59:57 NZDT Browse[2] Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) == Time [1] FALSE Browse[2] Browse[2] class(Time) [1] POSIXct POSIXt Browse[2] class(index(DATA.ba[[p]][2012-03-20 00:59:57,bid])) [1] POSIXct POSIXt So the variable 'Time' should be good to index DATA.ba[[p]]. I am hopelessly confused. cheers Worik [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to install tclRequire(Iwidgets)
On Mar 25, 2012, at 08:11 , mrzung wrote: hi, i'm trying to make Tabbed Notebook Widget,but cannot install tclRequire(Iwidgets). I installed Activetcl but I don't know what can be the next step to execute tclRequire(Iwidgets). how can i do that? By adding Iwidgets to your Tcl/Tk installation. How to do that is not an R issue. (It depends on your platform, so I really don't know, but you might try looking at http://incrtcl.sourceforge.net ) Thanks -- View this message in context: http://r.789695.n4.nabble.com/How-to-install-tclRequire-Iwidgets-tp4502775p4502775.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weird POSIXct behaviour
I see NZDT going in and out of your time values. Could that have something to do with it? On Mar 25, 2012, at 10:15 , Worik R wrote: Friends I have an xts that I wish to access. Browse[2] DATA.ba[[p]][2012-03-20 00:59:57,bid] bid 2012-03-20 00:59:57 1.4993 So far so good. Now putting the index into a variable: Browse[2] Time [1] 2012-03-20 00:59:57 NZDT Browse[2] DATA.ba[[p]][Time, bid] bid Where has it gone? Looking closer Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) [1] 2012-03-20 00:59:57 NZDT Browse[2] Browse[2] Time [1] 2012-03-20 00:59:57 NZDT Browse[2] Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) == Time [1] FALSE Browse[2] Browse[2] class(Time) [1] POSIXct POSIXt Browse[2] class(index(DATA.ba[[p]][2012-03-20 00:59:57,bid])) [1] POSIXct POSIXt So the variable 'Time' should be good to index DATA.ba[[p]]. I am hopelessly confused. cheers Worik [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Handling 8GB .txt file in R?
What you could try to do is skip the first 5 lines. After that the file seems to be 'normal'. With read.table.ffdf you could try something like # open a connection to the file con - file('yourfile', 'rt') # skip first 5 lines tmp - readLines(con, n=5) # read the remainder using read.table.ffdf ffdf - read.table.ffdf(file=con) # close connection close(con) HTH Jan On 03/25/2012 06:20 AM, iliketurtles wrote: Thanks to all the suggestions. To the first individual that replied, I can't do any stuff with unix or perl. All I know is R. @KEN: I'm using Windows 7, 64 bit. @Steve: Here's the readLines output.. As we can see, lines 1-3 are empty and line 5 is empty, and there's also empty elements after line 5!. [1] [2] [3] [4] PERMNO DATETICKERPERMCO PRC VOLNUMTRDvwretdewretd [5] [6]106/01/19867952 . . . -0.000138 0.001926 [7]107/01/1986OMFGA 7952-2.56250 1000 . 0.013809 0.011061 [8]108/01/1986OMFGA 7952-2.5 12800 . -0.020744 -0.005117 [9]109/01/1986OMFGA 7952-2.5 1400 . -0.011219 -0.011588 [10]110/01/1986OMFGA 7952-2.5 8500 . 0.83 0.003651 [11]113/01/1986OMFGA 7952-2.62500 5450 . 0.002749 0.002433 - Isaac Research Assistant Quantitative Finance Faculty, UTS -- View this message in context: http://r.789695.n4.nabble.com/Handling-8GB-txt-file-in-R-tp4500971p4502706.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to test omitted level from a multiple level factor against overall mean in regression models?
Hi there, I have a linear model with one factor having three levels. I want to check if the different levels significantly differ from the overall mean (using contr.sum). However one level (the last) is omitted in the standard procedure. To illustrate this: x - as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3)) y - c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1) test - data.frame(x,y) reg1 - lm(y~C(x,contr.sum),data=test) summary(reg1) Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 1.60.06577 24.834 8.48e-15 *** C(x, contr.sum)1 -0.483330.10792 -4.479 0.00033 *** C(x, contr.sum)2 -0.483330.08936 -5.409 4.70e-05 *** Is it possible to get the effect for the third level (against the overall mean) in the table too. I figured out: reg2 - lm(y~C(relevel(x,3),contr.sum),data=test) summary(reg2) C(relevel(x, 3), contr.sum)1 0.966670.07951 12.158 8.24e-10 *** C(relevel(x, 3), contr.sum)2 -0.483330.10792 -4.479 0.00033 *** The first row now test the third level against the overall mean, but I find this approach not so convenient. Moreover, I wonder if it is meaningful at all regarding the cumulation of alpha error. Would a Bonferroni correction be sensible? Greetings and thanks in advance Jürgen __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] output by(...)
Dear all, I have a question that is probably pretty stupid, so apologies in advance... I do a simple mydata.tab - by(my.data.frame, my.data.frame$category, colMeans) ...works fine, but I need to output the results to some flat file (kind of table) to work with it in Excel etc. So I am doing now capture.output(data.frame(unlist(mydata.tab)), file=mydata.txt) ...and process the result in Excel. Do you know a more pretty way to do this task? Perhaps something other than 'by()' to make a table of colMeans? Many thanks! Zdenek __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading big files in chunks-ff package
Your question is not completely clear. read.csv.ffdf automatically reads in the data in chunks. You don´t have to do anything for that. You can specify the size of the chunks using the next.rows option. Jan On 03/24/2012 09:29 PM, Mav wrote: Hello! A question about reading large CSV files I need to analyse several files with sizes larger than 3 GB. Those files have more than 10million rows (and up to 25 million) and 9 columns. Since I don´t have a large RAM memory, I think that the ff package can really help me. I am trying to use read.csv.ffdf but I have some questions: How can I read the files in several chunks…with an automatic way of calculating the number of rows to include in each chunk? (my problem is that the files have different number of rows) For instance…. I have used read.csv.ffdf(NULL, “file.csv”, sep=|, dec=.,header = T,row.names = NULL,colClasses = c(rep(integer, 3), rep(integer, 10), rep(integer, 6))) But with this way I am reading the whole fileI would prefer to read it in chunksbut I don´t know how to read it in chunks I have read the ff documentation but I am not good with R! Thanks in advance! -- View this message in context: http://r.789695.n4.nabble.com/Reading-big-files-in-chunks-ff-package-tp4502070p4502070.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] output by(...)
2012/3/25 Skála, Zdeněk (INCOMA GfK) zdenek.sk...@gfk.com: Dear all, I have a question that is probably pretty stupid, so apologies in advance... I do a simple mydata.tab - by(my.data.frame, my.data.frame$category, colMeans) ...works fine, but I need to output the results to some flat file (kind of table) to work with it in Excel etc. So I am doing now capture.output(data.frame(unlist(mydata.tab)), file=mydata.txt) ...and process the result in Excel. Do you know a more pretty way to do this task? Perhaps something other than 'by()' to make a table of colMeans? Try this: do.call(rbind, by(iris[-5], iris[[5]], colMeans)) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] output by(...)
On Mar 25, 2012, at 12:27 , Gabor Grothendieck wrote: 2012/3/25 Skála, Zdeněk (INCOMA GfK) zdenek.sk...@gfk.com: Dear all, I have a question that is probably pretty stupid, so apologies in advance... I do a simple mydata.tab - by(my.data.frame, my.data.frame$category, colMeans) ...works fine, but I need to output the results to some flat file (kind of table) to work with it in Excel etc. So I am doing now capture.output(data.frame(unlist(mydata.tab)), file=mydata.txt) ...and process the result in Excel. Do you know a more pretty way to do this task? Perhaps something other than 'by()' to make a table of colMeans? Try this: do.call(rbind, by(iris[-5], iris[[5]], colMeans)) How about aggregate(iris[-5], iris[5], mean) ? -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to test omitted level from a multiple level factor against overall mean in regression models?
2012/3/25 Biedermann, Jürgen juergen.biederm...@charite.de: Hi there, I have a linear model with one factor having three levels. I want to check if the different levels significantly differ from the overall mean (using contr.sum). However one level (the last) is omitted in the standard procedure. To illustrate this: x - as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3)) y - c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1) test - data.frame(x,y) reg1 - lm(y~C(x,contr.sum),data=test) summary(reg1) Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 1.6 0.06577 24.834 8.48e-15 *** C(x, contr.sum)1 -0.48333 0.10792 -4.479 0.00033 *** C(x, contr.sum)2 -0.48333 0.08936 -5.409 4.70e-05 *** Is it possible to get the effect for the third level (against the overall mean) in the table too. I figured out: reg2 - lm(y~C(relevel(x,3),contr.sum),data=test) summary(reg2) C(relevel(x, 3), contr.sum)1 0.96667 0.07951 12.158 8.24e-10 *** C(relevel(x, 3), contr.sum)2 -0.48333 0.10792 -4.479 0.00033 *** The first row now test the third level against the overall mean, but I find this approach not so convenient. Moreover, I wonder if it is meaningful at all regarding the cumulation of alpha error. Would a Bonferroni correction be sensible? Try this: options(contrasts = c(contr.sum, contr.poly)) reg1 - lm(y~x,data=test) dummy.coef(reg1) Full coefficients are (Intercept): 1.63 x: 1 2 3 -0.483 -0.483 0.967 -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Format wanted...
On 12-03-24 10:47 PM, J Toll wrote: On Sat, Mar 24, 2012 at 7:30 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: Do we have a format that always includes a decimal point and a given number of significant digits, but otherwise drops unnecessary characters? For example, if I wanted 5 digits, I'd want the following: Round to 5 digits: 1.234567 - 1.2346 Drop unnecessary zeros: 1.23 - 1.23 Force inclusion of a decimal point: 1 - 1. Duncan, Maybe sprintf() will work for you. As it's a wrapper for C sprintf, it should have its functionality. Maybe, but with which format string? Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Format wanted...
On Mar 25, 2012, at 7:14 AM, Duncan Murdoch wrote: On 12-03-24 10:47 PM, J Toll wrote: On Sat, Mar 24, 2012 at 7:30 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: Do we have a format that always includes a decimal point and a given number of significant digits, but otherwise drops unnecessary characters? For example, if I wanted 5 digits, I'd want the following: Round to 5 digits: 1.234567 - 1.2346 Drop unnecessary zeros: 1.23 - 1.23 Force inclusion of a decimal point: 1 - 1. Duncan, Maybe sprintf() will work for you. As it's a wrapper for C sprintf, it should have its functionality. Maybe, but with which format string? Duncan Murdoch I don't believe (though could be wrong), that you can do it all with one format string, but can do it conditionally based upon the input. According to the C printf documentation, the use of # forces a decimal point to be present, even if there are no trailing digits. Thus: sprintf(%#.f, 1) [1] 1. The other two values seem to be handled by signif() when applied to each value individually: signif(1.234567, 5) [1] 1.2346 signif(1.23, 5) [1] 1.23 But, not when a vector: signif(c(1.234567, 1.23), 5) [1] 1.2346 1.2300 So, wrapping that inside a function, using ifelse() to test for an integer value: signif.d - function(x, digits) { ifelse(x == round(x), sprintf(%.#f, x), signif(x, digits)) } x - c(1.234567, 1.23, 1) signif.d(x, 5) [1] 1.2346 1.23 1. signif.d(x, 6) [1] 1.23457 1.231. signif.d(x, 7) [1] 1.234567 1.23 1. Not extensively tested of course, but hopefully that might work for your needs Duncan. Regards, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to test omitted level from a multiple level factor against overall mean in regression models?
Hi Gabor, Thanks a lot for the answer. However, I'm not so much focusing on the pure effect value of the omitted factor level, but more on the statistical test if it differs significantly from 0. Do you know a way for this purpose too? Greetings Jürgen Von: Gabor Grothendieck [ggrothendi...@gmail.com] Gesendet: Sonntag, 25. März 2012 14:11 An: Biedermann, Jürgen Cc: r-help@R-project.org Betreff: Re: [R] How to test omitted level from a multiple level factor against overall mean in regression models? 2012/3/25 Biedermann, Jürgen juergen.biederm...@charite.de: Hi there, I have a linear model with one factor having three levels. I want to check if the different levels significantly differ from the overall mean (using contr.sum). However one level (the last) is omitted in the standard procedure. To illustrate this: x - as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3)) y - c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1) test - data.frame(x,y) reg1 - lm(y~C(x,contr.sum),data=test) summary(reg1) Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 1.60.06577 24.834 8.48e-15 *** C(x, contr.sum)1 -0.483330.10792 -4.479 0.00033 *** C(x, contr.sum)2 -0.483330.08936 -5.409 4.70e-05 *** Is it possible to get the effect for the third level (against the overall mean) in the table too. I figured out: reg2 - lm(y~C(relevel(x,3),contr.sum),data=test) summary(reg2) C(relevel(x, 3), contr.sum)1 0.966670.07951 12.158 8.24e-10 *** C(relevel(x, 3), contr.sum)2 -0.483330.10792 -4.479 0.00033 *** The first row now test the third level against the overall mean, but I find this approach not so convenient. Moreover, I wonder if it is meaningful at all regarding the cumulation of alpha error. Would a Bonferroni correction be sensible? Try this: options(contrasts = c(contr.sum, contr.poly)) reg1 - lm(y~x,data=test) dummy.coef(reg1) Full coefficients are (Intercept): 1.63 x: 1 2 3 -0.483 -0.483 0.967 -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] output by(...)
perfect, thanks a lot! :o) -z Odesílatel: Gabor Grothendieck [ggrothendi...@gmail.com] Odesláno: 25. března 2012 12:27 Komu: Skála, Zdeněk (INCOMA GfK) Kopie: r-help@r-project.org Předmět: Re: [R] output by(...) 2012/3/25 Skála, Zdeněk (INCOMA GfK) zdenek.sk...@gfk.com: Dear all, I have a question that is probably pretty stupid, so apologies in advance... I do a simple mydata.tab - by(my.data.frame, my.data.frame$category, colMeans) ...works fine, but I need to output the results to some flat file (kind of table) to work with it in Excel etc. So I am doing now capture.output(data.frame(unlist(mydata.tab)), file=mydata.txt) ...and process the result in Excel. Do you know a more pretty way to do this task? Perhaps something other than 'by()' to make a table of colMeans? Try this: do.call(rbind, by(iris[-5], iris[[5]], colMeans)) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Work -Shift Scheduling - Constraint Linear Programming
Dear Community, I've a Work -Shift Scheduling Problem I'd like to solve via constraint linear programming. Maybe something similar to http://support.sas.com/documentation/cdl/en/orcpug/63349/HTML/default/viewer.htm#orcpug_clp_sect037.htm Can anybody suggest me any package/R examples to solve this? If it's needed more details of my little problemm I can provide. Thanks in advance, u...@host.com as u...@host.com -- View this message in context: http://r.789695.n4.nabble.com/Work-Shift-Scheduling-Constraint-Linear-Programming-tp4503037p4503037.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (no subject)
How do you decompose inequality in R, say by gender? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to calculate deviance on test data
Hi all, If I got a Cox model based on training set, then how should I calculate the Cox log partial likelihood for the test data? Actually I am trying to calculate the deviance on test dataset to evaluate the performance of prediction model, the equation is as follows: D = -2{L(test)[beta_train] - L(test)[0]}. It means using the beta coefficients got from training set to calculate the likelihood of test data. I know I can got log likelihood for training model, but how to get it for test data? Any package or function can do that? -- View this message in context: http://r.789695.n4.nabble.com/How-to-calculate-deviance-on-test-data-tp4503206p4503206.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SPLS does not output probabilities for predicted classes.
Hi, I was using SPLS package to do multi-class classification and would require probabilities to be output for each class. In the vignette, the documentation does say that you can output probabilities by requesting fit.type = response, however I still only get predicted classes, rather than the respective probabilities. library(spls) data(prostate) # SPLSDA with eta=0.8 3 hidden components f - splsda( prostate$x, prostate$y, K=3, eta=0.8, scale.x=FALSE ) # Prediction on the training dataset (pred.f - predict( f, type=fit )) #predict probabilities pred.f_withprobs - predict( f, type=fit, fit.type = response ) -- View this message in context: http://r.789695.n4.nabble.com/SPLS-does-not-output-probabilities-for-predicted-classes-tp4503031p4503031.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multivariate function from univariate functions
Naïve question: would a saturated multivariate model work as an interpolation function? On 3/24/12, physicistintheory physicistinthe...@gmail.com wrote: I'm relatively new to R and I'm stuck. I'm trying to construct a surface to optimize from a multivariate dataset. The dataset contains the response of a system to various stimuli. I am trying to optimize the mix of stimuli to maximize the response. To do so, I've interpolated the various datasets of type response vs. stimuli and I now have an array of functions interps whose length is the length of the array of the names of the various stimuli. I've also created a vector containing names for the stimuli, vars = x.1, x.2, x.3... Anyway, each of the functions in interps depends only on one variable (obviously). I would like to construct a function, call it, surface which is essentially: surface(vars) = interps[[1]]vars[[1]] + interps[[2]]vars[[2]]+... I've tried constructing surface recursively in a for loop: surface - function(vars){0} for(i in 1:length(vars)){ surface - function(vars){surface(vars) + interps[[i]](vars[[i]])} } This results in an infinite recursion error...which I think I understand. Essentially, I'm trying to do the analog of something like i = i+1, but instead of adding integers, I want to add terms to a function. Any help is appreciated. -- View this message in context: http://r.789695.n4.nabble.com/Multivariate-function-from-univariate-functions-tp4502670p4502670.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Sent from my mobile device Ersatzistician and Chutzpahthologist I can answer any question. I don't know is an answer. I don't know yet is a better answer. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nonparametric bivariate distribution estimation and sampling
David, Great idea! It is simple yet very effective. I didn’t know that sample can be used with probability/weight. I feel that I learned a lot. Thanks a lot for the help! Heyi --- On Fri, 3/23/12, David Winsemius dwinsem...@comcast.net wrote: From: David Winsemius dwinsem...@comcast.net Subject: Re: [R] Nonparametric bivariate distribution estimation and sampling To: heyi xiao xiaohey...@yahoo.com Cc: Sarah Goslee sarah.gos...@gmail.com, r-help@r-project.org Date: Friday, March 23, 2012, 5:29 PM On Mar 23, 2012, at 3:55 PM, heyi xiao wrote: David, Thanks a lot for the specific suggestions. That’s very helpful. My question 1 is fully answered now. I guess I am not clear enough for my question 2. I would like to generate a random sample using the estimated probability density (as a result of my question 1) as the reference distribution. Say, I get a matrix of the estimated density (at some grid points) using MASS::kde2d. How can I use that result as a reference distribution to sample data from? I know it is a trivial issue for parametric distributions like bivariate normal, but what about such a nonparametric bivariate reference distribution? Any particular procedures or functions I can use? See if this works: data(geyser, package=MASS) x - cbind(geyser$duration, geyser$waiting) est - bkde2D(x, bandwidth=c(0.7, 7)) # Heh, I realized after I did this that I started with KernSmooth::bkde2D # and checked the results with MASS::kde2d # only difference appears to be name of density matrix # Construct a dataframe with X.Y information and the data from the bivariate density. # The output of bkde2D with n=50 is: #List of 3 #$ x1 : num [1:51] -0.2167 -0.0823 0.052 0.1863 0.3207 ... # $ x2 : num [1:51] 32.5 34.2 35.9 37.7 39.4 ... # $ fhat: num [1:51, 1:51] 3.05e-19 2.17e-19 3.25e-19 2.17e-19 0.00 ... # The index X.Y could be X + 51*Y and there would be a 1:1 mapping from (X,Y) to X.Y # and the fhat values would be properly arranged dfrm - expand.grid(X=1:51, Y=1:51) dfrm$fhat - c(est$fhat) #Sample randomly from X.Y with length=51*51 using the fhat values for prob. # The X.Y index never actually gets computed # but is implicit in the order of the data.frame sampfrm - dfrm[sample(51*51, 300, prob=est$fhat) , ] f2 - with(sampfrm, MASS::kde2d(X, Y, n = 50, lims = c(0, 51, 0, 51)) ) persp(f2) # Looks reasonable to my eye anyway. --David. The reason I don’t want to use sampling (with replacement, I can sample more data than I have without replacement), as this will generate lots of duplicate data points, if I want to generated bigger dataset yet my raw data do not have a big sample size. The scatter plot of the sampled data doesn’t look good this way. Heyi --- On Fri, 3/23/12, David Winsemius dwinsem...@comcast.net wrote: From: David Winsemius dwinsem...@comcast.net Subject: Re: [R] Nonparametric bivariate distribution estimation and sampling To: heyi xiao xiaohey...@yahoo.com Cc: Sarah Goslee sarah.gos...@gmail.com, r-help@r-project.org Date: Friday, March 23, 2012, 2:20 PM On Mar 23, 2012, at 1:53 PM, heyi xiao wrote: Sarah, Thanks for the response. I actually have several years of working experience with R and statistics, although may not be as good as you. that’s why I am here ;) I dug deeper into R documentations and previous R-help posts, and couldn’t found anything particular help. Again, I want to do two things: (1) estimate the probability density of this bivariate distribution using some nonparametric method (kernel, spline etc); ?MASS::kde2d ?KernSmooth::bkde2D ?ade4::s.kde2d help(package=locfit) (2) sample a big dataset from this bivariate distribution for a simulation study. What is wrong with `sample`? # to get sample of size n without replacement set.seed(42) dfrm[ sample(1:NROW(dfrm), n) , ] --David. If my questions are not clear enough show my how I can improve, or which part is not clear enough. If you have any particular suggestions/comments, you are more than welcome. Thanks! Heyi --- On Fri, 3/23/12, Sarah Goslee sarah.gos...@gmail.com wrote: From: Sarah Goslee sarah.gos...@gmail.com Subject: Re: [R] Nonparametric bivariate distribution estimation and sampling To: heyi xiao xiaohey...@yahoo.com Cc: r-help@r-project.org Date: Friday, March 23, 2012, 12:26 PM R can do all of that and more. But you'll need to put some work in reading about how to use R, about the statistical methods involved, and about how to use them to best effect. You might want, for instance, generalized additive models. Or not. If your question isn't more fully-formed than this, your best bet is almost certainly to talk to a local statistician, spend some time working with R, and then come back to the list with specific questions. Sarah On Fri,
Re: [R] Weird POSIXct behaviour
On Sun, Mar 25, 2012 at 3:15 AM, Worik R wor...@gmail.com wrote: Friends I have an xts that I wish to access. Browse[2] DATA.ba[[p]][2012-03-20 00:59:57,bid] bid 2012-03-20 00:59:57 1.4993 So far so good. Now putting the index into a variable: Browse[2] Time [1] 2012-03-20 00:59:57 NZDT Browse[2] DATA.ba[[p]][Time, bid] bid Where has it gone? It's hard to say, especially since you give no indication how you assigned the value to Time. A reproducible example, as requested in the posting guide, would be helpful. Also, what version of R, xts, and zoo are you using? Looking closer Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) [1] 2012-03-20 00:59:57 NZDT Browse[2] Browse[2] Time [1] 2012-03-20 00:59:57 NZDT Browse[2] Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) == Time [1] FALSE Browse[2] Browse[2] class(Time) [1] POSIXct POSIXt Browse[2] class(index(DATA.ba[[p]][2012-03-20 00:59:57,bid])) [1] POSIXct POSIXt So the variable 'Time' should be good to index DATA.ba[[p]]. I am hopelessly confused. The printed representation of the index value and class cannot be used to accurately determine equality. cheers Worik -- Joshua Ulrich | FOSS Trading: www.fosstrading.com R/Finance 2012: Applied Finance with R www.RinFinance.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to get the input of a function right?
Your definition of x1, x2 and x3 requires the c() function. Try leaving some space around operators for increased readability. Remember that the that the result of operations on sets containing NA is often set to NA as well. My guess is that if you print out x it contains NA and that your function does operations on x that are included in the function result. Rob -Original Message- From: stella Sent: Thursday, March 22, 2012 10:43 AM To: r-help@r-project.org Subject: [R] How to get the input of a function right? Hi, I wrote a function with three inputs fun(x,y,z). x is a matrix of three vectors combined with cbind. e.g. x1-(1,2,3,4) x2-(2,3,4,5) x3-(3,4,5,6) x-cbind(x1,x2,x3) y is a vector e.g y-c(7,8,9) z is a real number e.g. z-2.5 If a give the function an input like this, I get 'NA' in return. If I give the function a vector e.g c(1,2,3) instead of 'x' the function works just fine. Does anyone has an idea why the function would not except 'x' as an input? Thank you very much! Stella -- View this message in context: http://r.789695.n4.nabble.com/How-to-get-the-input-of-a-function-right-tp4495879p4495879.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Robert W. Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences 800 W. Jefferson St. Kirksville, MO 63501 660-626-2322 FAX 660-626-2965 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ROC Analysis
Hi everybody, Pascal, your code works, but when I want to do the graph I have an error message. here is my code : x-rev(unlist(pred@cutoffs)) tpf-unlist(performance(pred, tpr)@y.values) fpf-unlist(performance(pred,fpr)@y.values) ll-length(x) p-(tpf[1:(ll-1)]-tpf[2:ll])/(fpf[1:(ll-1)]-fpf[2:ll]) plot(x,p) *Erreur dans xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ* So, when I look the lenghts of x and p, I have this : *x : numeric[1735] p : numeric[1734]* On the other hand, it's normal since I have the slope between two points on the ROC curve and so I have x points and x-1 slope values. How to get the graph?! All the best, Camille - -- Camille Leclerc, Master student Lab ESE, UMR CNRS 8079 Univ Paris-Sud Bat 362 F-91405 Orsay Cedex FRANCE -- View this message in context: http://r.789695.n4.nabble.com/ROC-Analysis-tp4469203p4503354.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] multiple hexbin plots with varying greatest densities
Greetings: I have multiple hexbin plots with varying greatest densities. Right now, the data on each plot varies from 1-256 levels of density. The problem with that is that in Plot A the data are more scattered across a 2 x 2 grid whereas in Plot B the data are more concentrated in fewer cells. Hence, Plot A has a few very dense grid cells and plot B has nothing as dense as plot A. (In Plot A the legend shows that the range of densities per cell is 1-700 whereas for Plot B it is 1-453. Both plots use the full range of available colors with 256 gradients.) I want people to be able to compare the two plots and being able to see the differences in variability of the densities is one of the things I want them to see. AT PRESENT, each plot uses the full range of densities; hence, this makes it seem as if each plot has the maximum highest and minimum lowest densities. How can I do this? Thanks, Wayne Gray -- View this message in context: http://r.789695.n4.nabble.com/multiple-hexbin-plots-with-varying-greatest-densities-tp4503405p4503405.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading big files in chunks-ff package
Thank you Jan My problem is the following: For instance, I have 2 files with different number of rows (15 million and 8 million of rows each). I would like to read the first one in chunks of 5 million each. However between the first and second chunk, I would like to analyze those first 5 million of rows, write the analysis in a new csv and then proceed to read and analyze the second chunk and so on until the third chunk. With the second file, I would like to do the same...read the first chunk, analyze it and continue to read the second and analyze it. Basically my problem is that I manage to read the filesbut with so many rows...I cannot do any analyses (even filtering the rows) because of the RAM restrictions. Sorry if is still not clear. Thank you -- View this message in context: http://r.789695.n4.nabble.com/Reading-big-files-in-chunks-ff-package-tp4502070p4503642.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multivariate function from univariate functions
On Sat, Mar 24, 2012 at 08:04:49PM -0700, physicistintheory wrote: I'm relatively new to R and I'm stuck. I'm trying to construct a surface to optimize from a multivariate dataset. The dataset contains the response of a system to various stimuli. I am trying to optimize the mix of stimuli to maximize the response. To do so, I've interpolated the various datasets of type response vs. stimuli and I now have an array of functions interps whose length is the length of the array of the names of the various stimuli. I've also created a vector containing names for the stimuli, vars = x.1, x.2, x.3... Anyway, each of the functions in interps depends only on one variable (obviously). I would like to construct a function, call it, surface which is essentially: surface(vars) = interps[[1]]vars[[1]] + interps[[2]]vars[[2]]+... I've tried constructing surface recursively in a for loop: surface - function(vars){0} for(i in 1:length(vars)){ surface - function(vars){surface(vars) + interps[[i]](vars[[i]])} } Hi. Is the following close to what you are asking for? f1 - function(x) 3*x^2 + x f2 - function(x) 2*x^2 + 2*x f3 - function(x) x^2 + 3*x lstf - list(f1, f2, f3) args - c(x2, x4, x5) # input as a matrix or data frame surface - function(dat) { y - rep(0, times=nrow(dat)) for (i in seq.int(along=lstf)) { y - y + lstf[[i]](dat[, args[i]]) } y } dat - matrix(1:35, ncol=5) colnames(dat) - paste(x, 1:5, sep=) surface(dat) [1] 2140 2346 2564 2794 3036 3290 3556 # compare with an explicit formula f1(dat[, x2]) + f2(dat[, x4]) + f3(dat[, x5]) [1] 2140 2346 2564 2794 3036 3290 3556 # input as a named vector surface1 - function(x) { y - 0 for (i in seq.int(along=lstf)) { y - y + lstf[[i]](x[args[i]]) } unname(y) } vdat - 1:5 names(vdat) - paste(x, 1:5, sep=) surface1(vdat) [1] 94 # compare again unname(f1(vdat[x2]) + f2(vdat[x4]) + f3(vdat[x5])) [1] 94 Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading big files in chunks-ff package
The 'normal' way of doing that with ff is to first convert your csv file completely to a ffdf object (which stores its data on disk so shouldn't give any memory problems). You can then use the chunk routine (see ?chunk) to divide your data in the required chunks. Untested so may contain errors: ffdf - read.table.ffdf(...) chnks - chunk(from=1, to=nrow(yourffdf), by=5E6, method='seq') for (chnk in chnks) { # read data data - ffdf[chnk, ] # do your thing with the data # clean up rm(data) gc() } If you want to process your csv file directly in chunks, you could also have a look at the LaF package. Especially the process_blocks routine which does exactly that. The manual vignette (http://cran.r-project.org/web/packages/LaF/vignettes/LaF-manual.pdf) contains some examples how to do that. Jan Quoting Mav mastorvar...@gmail.com: Thank you Jan My problem is the following: For instance, I have 2 files with different number of rows (15 million and 8 million of rows each). I would like to read the first one in chunks of 5 million each. However between the first and second chunk, I would like to analyze those first 5 million of rows, write the analysis in a new csv and then proceed to read and analyze the second chunk and so on until the third chunk. With the second file, I would like to do the same...read the first chunk, analyze it and continue to read the second and analyze it. Basically my problem is that I manage to read the filesbut with so many rows...I cannot do any analyses (even filtering the rows) because of the RAM restrictions. Sorry if is still not clear. Thank you -- View this message in context: http://r.789695.n4.nabble.com/Reading-big-files-in-chunks-ff-package-tp4502070p4503642.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Handling 8GB .txt file in R?
As the other poster noted, you can just skip lines. Big matrix should work just fine, except I am not sure how the dates will be handled Here is some sample code from my stuff txtName is the file name of the file you are reading Directory is the path where you want to write the file.backed matrix filenameis the file.backed big matrix dname is a filename for describing the data sep What's your separator, comma or space? below I use tab, because my file is tab delimited replace my column names with yours PERMNO DATETICKERPERMCO PRC VOLNUMTRDvwretdewretd Your dates may be coerced in factors. Not sure how that will work. You can also try ff options(bigmemory.allow.dimnames=TRUE) D- read.big.matrix(txtName, skip = 5, backingpath = Directory, backingfile = filename, descriptorfile = dname, sep = \t, type = double, col.names = c(Id,SeriesNo,Date,Temp,Unc,Obs,Tobs) ) On Sat, Mar 24, 2012 at 9:20 PM, iliketurtles isaacm...@gmail.com wrote: Thanks to all the suggestions. To the first individual that replied, I can't do any stuff with unix or perl. All I know is R. @KEN: I'm using Windows 7, 64 bit. @Steve: Here's the readLines output.. As we can see, lines 1-3 are empty and line 5 is empty, and there's also empty elements after line 5!. [1] [2] [3] [4] PERMNO DATETICKERPERMCO PRC VOLNUMTRDvwretdewretd [5] [6]106/01/19867952 . . . -0.000138 0.001926 [7]107/01/1986OMFGA 7952-2.56250 1000 . 0.013809 0.011061 [8]108/01/1986OMFGA 7952-2.5 12800 . -0.020744 -0.005117 [9]109/01/1986OMFGA 7952-2.5 1400 . -0.011219 -0.011588 [10]110/01/1986OMFGA 7952-2.5 8500 . 0.83 0.003651 [11]113/01/1986OMFGA 7952-2.62500 5450 . 0.002749 0.002433 - Isaac Research Assistant Quantitative Finance Faculty, UTS -- View this message in context: http://r.789695.n4.nabble.com/Handling-8GB-txt-file-in-R-tp4500971p4502706.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Nonparmetric statistics with weighted data set?
I'm doing some work with a weighted data set (the CPS) and I'd like to use nonparametric techniques. Is there an R package that supports this? The np package doesn't appear to have provisions for weighted data. (Or have I missed something?) I need to perform both density estimation and regressions. thanks, bw __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] multiple hexbin plots with varying greatest densities
On Mon, Mar 26, 2012 at 4:59 AM, Wayne Gray wgray@gmail.com wrote: Greetings: I have multiple hexbin plots with varying greatest densities. Right now, the data on each plot varies from 1-256 levels of density. The problem with that is that in Plot A the data are more scattered across a 2 x 2 grid whereas in Plot B the data are more concentrated in fewer cells. Hence, Plot A has a few very dense grid cells and plot B has nothing as dense as plot A. (In Plot A the legend shows that the range of densities per cell is 1-700 whereas for Plot B it is 1-453. Both plots use the full range of available colors with 256 gradients.) I want people to be able to compare the two plots and being able to see the differences in variability of the densities is one of the things I want them to see. AT PRESENT, each plot uses the full range of densities; hence, this makes it seem as if each plot has the maximum highest and minimum lowest densities. How can I do this? With the 'lattice' or 'centroids' styles you can use the maxarea= argument to plot.hexbin or to grid.hexagons. survey::svycoplot() is an example of this: you can either have the same scale across panels or have each panel use the full range of sizes. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nonparmetric statistics with weighted data set?
On Mon, Mar 26, 2012 at 8:45 AM, Barnet Wagman b...@norbl.com wrote: I'm doing some work with a weighted data set (the CPS) and I'd like to use nonparametric techniques. Is there an R package that supports this? The np package doesn't appear to have provisions for weighted data. (Or have I missed something?) I need to perform both density estimation and regressions. You may need to be more precise about what you mean by 'nonparametric techniques', but if you want smoothing you can probably do what you want with the survey package. svysmooth() does densities and both mean and quantile scatterplot smoothing. For regression models with smooth terms you would have to use regression splines in svyglm(), but they are almost indistinguishable from smoothing splines in practice. For other sorts of non-parametric techniques there are two-sample rank tests in the survey package. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weird POSIXct behaviour
My bad. I should be clearer about the source of my confusion. Given two identical string representations of POSIXct objects, can the two objects represent different times? Where has it gone? It's hard to say, especially since you give no indication how you assigned the value to Time. A reproducible example, as requested in the posting guide, would be helpful. Also, what version of R, xts, and zoo are you using? Browse[2] Time [1] 2012-03-20 00:59:57 NZDT Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) [1] 2012-03-20 00:59:57 NZDT A reproducible example would be huge at this point. WHat I need is an answer to that simple question. If the answer is No then it is worth doing the work for a reproducible example. If Yes I need to learn why and better ways of pasing the objects in and out of matrices and vectors. Using R version 2.12.1 The zoo documentation (?zoo) does not include a version. Where can I find it? Worik Looking closer Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) [1] 2012-03-20 00:59:57 NZDT Browse[2] Browse[2] Time [1] 2012-03-20 00:59:57 NZDT Browse[2] Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) == Time [1] FALSE Browse[2] Browse[2] class(Time) [1] POSIXct POSIXt Browse[2] class(index(DATA.ba[[p]][2012-03-20 00:59:57,bid])) [1] POSIXct POSIXt So the variable 'Time' should be good to index DATA.ba[[p]]. I am hopelessly confused. The printed representation of the index value and class cannot be used to accurately determine equality. cheers Worik -- Joshua Ulrich | FOSS Trading: www.fosstrading.com R/Finance 2012: Applied Finance with R www.RinFinance.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weird POSIXct behaviour
On Sun, Mar 25, 2012 at 3:01 PM, Worik R wor...@gmail.com wrote: My bad. I should be clearer about the source of my confusion. Given two identical string representations of POSIXct objects, can the two objects represent different times? Yes. Here's an example (from my Ubuntu machine) of one way: (t1 - Sys.time()); (t2 - Sys.time())+0.001; t1 == t2 [1] 2012-03-25 15:13:48 CDT [1] 2012-03-25 15:13:48 CDT [1] FALSE options(digits.secs=3) (t1 - Sys.time()); (t2 - Sys.time())+0.001; t1 == t2 [1] 2012-03-25 15:17:36.520 CDT [1] 2012-03-25 15:17:36.523 CDT [1] FALSE Where has it gone? It's hard to say, especially since you give no indication how you assigned the value to Time. A reproducible example, as requested in the posting guide, would be helpful. Also, what version of R, xts, and zoo are you using? Browse[2] Time [1] 2012-03-20 00:59:57 NZDT Browse[2] index(DATA.ba[[p]][2012-03-20 00:59:57,bid]) [1] 2012-03-20 00:59:57 NZDT A reproducible example would be huge at this point. WHat I need is an answer to that simple question. If the answer is No then it is worth doing the work for a reproducible example. If Yes I need to learn why and better ways of pasing the objects in and out of matrices and vectors. No need for it to be huge. dput(DATA.ba[[p]][2012-03-20 00:59:57,bid]) would be a sufficient start. Using R version 2.12.1 The zoo documentation (?zoo) does not include a version. Where can I find it? From the output of sessionInfo(), or packageDescription(zoo). The output from sessionInfo() would be more helpful because it provides more information about your installation. Worik -- Joshua Ulrich | FOSS Trading: www.fosstrading.com R/Finance 2012: Applied Finance with R www.RinFinance.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] avoiding for loops
I have data that looks like this: df1 group id 1 red A 2 red B 3 red C 4 blue D 5 blue E 6 blue F I want a list of the groups containing vectors with the ids.I am avoiding subset(), as it is only recommended for interactive use. Here's what I have so far: df1 - data.frame(group=c(red, red, red, blue, blue, blue), id=c(A, B, C, D, E, F)) groups - levels(df1$group) byid - lapply(groups, ==, df1$group) groupIDX - lapply(byid, which) groupIDX [[1]] [1] 4 5 6 [[2]] [1] 1 2 3 This gives me a list of the indices for each group. I want to subset df1 based on this list. If I want just one group I can do this: df1[groupIDX[[1]],]$id [1] D E F What sort of statement should I use if I want a result like: [[1]] [1] D E F Levels: A B C D E F [[2]] [1] A B C Levels: A B C D E F So far, I've used a for loop. Can I express this with apply statements? groupIDs - list(1:length(groupIDX)) groupData- for (i in 1:length(groupIDX)) { groupIDs[[i]] - df1[groupIDX[[i]],]$id } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] avoiding for loops
df1 - data.frame(group=c(red, red, red, blue, blue, blue), id=c(A, B, C, D, E, F)) df1 - data.frame(group=c(red, red, red, blue, blue, + blue), id=c(A, B, C, D, E, F)) df1 group id 1 red A 2 red B 3 red C 4 blue D 5 blue E 6 blue F with(df1, split(id, group)) $blue [1] D E F Levels: A B C D E F $red [1] A B C Levels: A B C D E F On Sun, Mar 25, 2012 at 4:46 PM, Ed Siefker ebs15...@gmail.com wrote: I have data that looks like this: df1 group id 1 red A 2 red B 3 red C 4 blue D 5 blue E 6 blue F I want a list of the groups containing vectors with the ids.I am avoiding subset(), as it is only recommended for interactive use. Here's what I have so far: df1 - data.frame(group=c(red, red, red, blue, blue, blue), id=c(A, B, C, D, E, F)) groups - levels(df1$group) byid - lapply(groups, ==, df1$group) groupIDX - lapply(byid, which) groupIDX [[1]] [1] 4 5 6 [[2]] [1] 1 2 3 This gives me a list of the indices for each group. I want to subset df1 based on this list. If I want just one group I can do this: df1[groupIDX[[1]],]$id [1] D E F What sort of statement should I use if I want a result like: [[1]] [1] D E F Levels: A B C D E F [[2]] [1] A B C Levels: A B C D E F So far, I've used a for loop. Can I express this with apply statements? groupIDs - list(1:length(groupIDX)) groupData- for (i in 1:length(groupIDX)) { groupIDs[[i]] - df1[groupIDX[[i]],]$id } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] limma design matrix
I am trying to construct my design matrix needed in the {limma} function lmfit but am having trouble with the formula I am to specify in the function model.matrix. Namely when to I use ~0 + factors (ex 1) vs ~-1 + factors (ex 2). Any clarification on this would be greatly appreciated. Thanks in advanced. Ex 1: f - factor(targets$Target, levels=c(RNA1,RNA2,RNA3)) design - model.matrix(~0+f) [page 409, of Bioinformatics and computational biology solutions using R and Bioconductor] and Ex 2: library(ALL) pdat - pData(ALL) design - model.matrix(~-1 + factor(pdat$type) [page 237, of Bioinformatics and computational biology solutions using R and Bioconductor] -- View this message in context: http://r.789695.n4.nabble.com/limma-design-matrix-tp4503651p4503651.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 'names' attribute must be the same length as the vector
I have met into this problem when I tried to run panel regression by plm. My code: library(plm) indus - read.csv(file=full.csv,header=TRUE) industry-as.data.frame(indus) reg-lm(LnTSO2 ~ LnPGDP + LnPGDP2 + LnSOES + LnCOES + LnLIMD + LnSHOLD + LnPRIV + LnFIEs + LnEXP + LnIMP + LnLEXRE + LnVALTAX + LnIND1 + LnIND2 + LnIND3 + LnIND4 + LnIND5 + LnIND6 + LnIND7 + LnIND8 + LnIND9, data=industry, model = pooling) *Error in names(y) - namesy : 'names' attribute [1144] must be the same length as the vector [0]* My data is attached showing as a screenshot. http://r.789695.n4.nabble.com/file/n4503946/%E6%9C%AA%E5%91%BD%E5%90%8D.jpg Does anyone know why this is happening and how to fix?\ Thanks very much. -- View this message in context: http://r.789695.n4.nabble.com/names-attribute-must-be-the-same-length-as-the-vector-tp4503946p4503946.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R numerical integration
The quadinf command in library pracma still fails when mu=-2.986731 with sigma=53415.18. While Maple gives me an estimate of 0.5001701024. Maple: (for those who are interested) myf:=(mu,sigma)- evalf(Int(exp(-(x-mu)^2/2/sigma^2)/sigma/sqrt(2*Pi)/(1+exp(-x)), x=-infinity..infinity)); myf(-2.986731, 53415.18 ); 0.5001701024 These 'mu's and 'sigma's are now random starting points I generated for an optimization problem like I have mentioned. I should really investigate the behavior of this function before I ask R doing the integration. As I have mentioned that I had already realized the integral is between 0 and 1. And I have had a look at the contour plots of different mu and sigma. I am going to 'restrict' mu and sigma to certain (small) values, and still get the integral to produce a value between 0 and 1. All of your help is much appreciated. casper - ### PhD candidate in Statistics School of Mathematics, Statistics and Actuarial Science, University of Kent ### -- View this message in context: http://r.789695.n4.nabble.com/R-numerical-integration-tp4500095p4503766.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Error : DATA to MATRIX
Thanks David, your suggestion works fine.btw I have another question..If I set (n,m) little bit large, say (n=20,m=10), R cannot handle the large data frame generated through expand.grid.Is there any way to increase R-memory so that I can tackle large data.frame in R ? regards ritwik On Mar 23, 2012, at 2:53 AM, ritwi...@isical.ac.in wrote: Dear Sir/Madam, I'm getting a problem with a R-code which converts a data frame to a matrix. It first generate a (m^(n-m) * m) matrix A and then regenerate another matrix B having less dimension than A which satisfy some condition. Now I wish to assign each row of B to a vector as individual. My problem is when I set any choice of (n,m) except m=1 it works fine but setting m=1 I got the error : Error in B[i, ] : incorrect number of dimensions. Moreover if (n,m) is large (say, (20,8)) I got the error : Error: cannot allocate vector of size 3.0 Gb. I know this is due to large dimension of matrix A. How to solve this problem. My code is given below: ** n=5 m=3 R=numeric(0) # Generate all possible m-tuple ( variables having range 0 to n ) in a ( m^(n-m) * m ) matrix r = expand.grid(rep(list(0:(n-m)), m)) write.table(r,file=test.txt,row.names=FALSE,col.names=FALSE) a= read.table(file=test.txt,sep=,header=FALSE) A= data.matrix(a) #. # Generate matrix whose rowsum = n-m meet.crit = apply(A, 1, function(.row) any((sum(.row)) == n-m)) # criteron for being rowsum = n cbind(A, meet.crit) # Checking rowsum = n for each row -m B=A[meet.crit,] At this point the default behavior of the [ function is to return a vector rather than a matrix. You need to add drop=FALSE as an additional argument. Read the help page for ?[. # Generate matrix #. for(i in 1:choose(n-1,m-1)){ R=B[i,] } *** Can you please help me how to get rid of these errors. Thanking you in advance. Regards Ritwik Bhattacharya Senior Research Fellow SQC OR UNIT, KOLKATA INDIAN STATISTICAL INSTITUTE -- David Winsemius, MD West Hartford, CT This mail is scanned by Ironport This mail is scanned by Ironport __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] string substitution for argument in function
hello, I want to iterate through a list of names and use each element as an argument in a function. For instance: a = c('one','two','three') data= c() for(elem in a){data=cbind(elem = 2,data)} data elem elem elem [1,]222 instead I want 'elem' to be substituted by the string in the list. Doing it by hand would be: data = cbind('one'=2,data) data = cbind('two'=2,data) data = cbind('three'=2,data) data 'one' 'two' 'three' [1,]222 I guess that the clue would be in sub(),gsub(), paste() or similar but I didnt get it to work. I am comming from python were we woudl do something like: a = ['one','two','three'] data = {} for elem in a: data[elem] = 2 data {'three': 2, 'two': 2, 'one': 2} Thanks, Pedro - Prof. Dr. P. Martinez Arbizu DZMB-Forschungsinstitut Senckenberg Suedstrand 44 D-26382 Wilhelmshaven Germany Tel: +49 (0)4421 9475-100 Fax: +49 (0)4421 9475-111 Email: pmarti...@senckenberg.de Senckenberg Gesellschaft für Naturforschung Rechtsfähiger Verein gemäß § 22 BGB Senckenberganlage 25 60325 Frankfurt Direktorium: Prof. Dr. Dr. h.c. Volker Mosbrugger, Prof. Dr. Michael Türkay, Dr. Johannes Heilmann, Prof. Dr. Pedro Martinez Arbizu, Prof. Dr. Georg Zizka, Prof. Dr. Uwe Fritz Vorsitzender des Präsidiums: Dietmar Schmid Aufsichtsbehörde: Magistrat der Stadt Frankfurt am Main (Ordnungsamt) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Accessing more than two coefficients in a plot
I've successfully plotted (in the plot and abline code below) a simple regression of Lambda1_2 on VV1_2. I then successfully regressed Lambda1_2 on VV1_2, VV1_22 and VV1_212 producing lm2.l. When I go to plot lm2.l using abline I get the warning: 1: In abline(lm2.l, col = brown, lty = dotted, lwd = 2) : only using the first two of 4 regression coefficients Is there another function like abline that will produce a line using the constant and three coefficients from the lm2.l regression? lm.l - lm(Lambda1_2 ~ VV1_2, method = qr, model = TRUE, x = FALSE, y = FALSE, qr = TRUE) # unweighted regression lm2.l - lm(Lambda1_2 ~ VV1_2 + VV1_22 + VV1_212, method = qr, model = TRUE, x = FALSE, y = FALSE, qr = TRUE) # unweighted regression plot(VV1_2, Lambda1_2, ylim=yrange, tck=1, main=V(1) Parameters (V, V^2 V^0.5), xlab=VV1_2, ylab=Lambda Beta1_2,pch=19,col=red) {abline(lm2.l, col=brown, lty=dotted, lwd=2) abline(wlm2.l, col=gold,lty=longdash, lwd=2) points(VV1_2, Beta1_2, pch=19, col=blue) abline(lm2.b, col=black,lty=dotted, lwd=2) abline(wlm2.b, col=blue, lty=longdash, lwd=2) legend(topright, inset=.05, title=Parameters, labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors) } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Format wanted...
On 12-03-25 10:45 AM, Marc Schwartz wrote: On Mar 25, 2012, at 7:14 AM, Duncan Murdoch wrote: On 12-03-24 10:47 PM, J Toll wrote: On Sat, Mar 24, 2012 at 7:30 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: Do we have a format that always includes a decimal point and a given number of significant digits, but otherwise drops unnecessary characters? For example, if I wanted 5 digits, I'd want the following: Round to 5 digits: 1.234567 - 1.2346 Drop unnecessary zeros: 1.23 - 1.23 Force inclusion of a decimal point: 1 - 1. Duncan, Maybe sprintf() will work for you. As it's a wrapper for C sprintf, it should have its functionality. Maybe, but with which format string? Duncan Murdoch I don't believe (though could be wrong), that you can do it all with one format string, but can do it conditionally based upon the input. According to the C printf documentation, the use of # forces a decimal point to be present, even if there are no trailing digits. Thus: sprintf(%#.f, 1) [1] 1. The other two values seem to be handled by signif() when applied to each value individually: signif(1.234567, 5) [1] 1.2346 signif(1.23, 5) [1] 1.23 But, not when a vector: signif(c(1.234567, 1.23), 5) [1] 1.2346 1.2300 So, wrapping that inside a function, using ifelse() to test for an integer value: signif.d- function(x, digits) { ifelse(x == round(x), sprintf(%.#f, x), signif(x, digits)) } x- c(1.234567, 1.23, 1) signif.d(x, 5) [1] 1.2346 1.23 1. signif.d(x, 6) [1] 1.23457 1.231. signif.d(x, 7) [1] 1.234567 1.23 1. Not extensively tested of course, but hopefully that might work for your needs Duncan. Thanks. I had put together a different conditional (just do the conversion, then add a decimal point at the end if none is seen), but I was surprised that there was no standard format for this. In case anyone is interested, I want to output code in a language (GLSL) that sees 1 and 1. as different types. I want a floating point value, so I need the decimal point. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accessing more than two coefficients in a plot
Well, as a line in the plane is determined by 2 coefficients only, I'd guess that trying to find an R function that plots a line defined by 4 coefficients has about the same chance of success as finding a unicorn with 3 horns. You do understand that your linear model defines a hyperplane in your three covariates, do you not? Or do I misunderstand what you have requested? Cheers, Bert On Sun, Mar 25, 2012 at 2:32 PM, FJ M chicagobrownb...@hotmail.com wrote: I've successfully plotted (in the plot and abline code below) a simple regression of Lambda1_2 on VV1_2. I then successfully regressed Lambda1_2 on VV1_2, VV1_22 and VV1_212 producing lm2.l. When I go to plot lm2.l using abline I get the warning: 1: In abline(lm2.l, col = brown, lty = dotted, lwd = 2) : only using the first two of 4 regression coefficients Is there another function like abline that will produce a line using the constant and three coefficients from the lm2.l regression? lm.l - lm(Lambda1_2 ~ VV1_2, method = qr, model = TRUE, x = FALSE, y = FALSE, qr = TRUE) # unweighted regression lm2.l - lm(Lambda1_2 ~ VV1_2 + VV1_22 + VV1_212, method = qr, model = TRUE, x = FALSE, y = FALSE, qr = TRUE) # unweighted regression plot(VV1_2, Lambda1_2, ylim=yrange, tck=1, main=V(1) Parameters (V, V^2 V^0.5), xlab=VV1_2, ylab=Lambda Beta1_2,pch=19,col=red) {abline(lm2.l, col=brown, lty=dotted, lwd=2) abline(wlm2.l, col=gold,lty=longdash, lwd=2) points(VV1_2, Beta1_2, pch=19, col=blue) abline(lm2.b, col=black,lty=dotted, lwd=2) abline(wlm2.b, col=blue, lty=longdash, lwd=2) legend(topright, inset=.05, title=Parameters, labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors) } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Format wanted...
Duncan, On 25 March 2012 15:28, Duncan Murdoch murdoch.dun...@gmail.com wrote: In case anyone is interested, I want to output code in a language (GLSL) that sees 1 and 1. as different types. I want a floating point value, so I need the decimal point. GLSL, assuming it's the one that I'm looking at[1], supports implicit conversion from integer to float by appending .0 to the end. -- Sent from my mobile device Envoyait de mon portable 1. http://www.opengl.org/wiki/GLSL_Types#Implicit_conversion __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] svycoxph and test statistics
On 03/24/2012 06:00 AM, r-help-requ...@r-project.org wrote: I have been using the function 'svycoxph' in the Dr. Lumley's survey package (version 3.26) to compute coefficient estimates for Cox regression. I have noticed the p-values output are based on normal distribution (like in coxph); however in svyglm (and in other software, such as Stata or SAS) the p-values are computed via the t distribution with degrees of freedom equal to the number of PSUs minus number of strata. I am wondering why there is a difference here? I'm not aware of any theory papers that back up the use of a t-distribution. This is a Cox model, and do what my Gaussian package does is not usually the best approach. I'm far from an expert in survey work though, so I'll yeild to Thomas L for a definitive answer. In the case of mixed effects models I see the exact same leaning towards (approximate) REML vs ML; this is an area that I do know deeply and and the REML better than ML arguments from linear mixed effects models to NOT transfer over. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] string substitution for argument in function
Hi, This may help a = c('one','two','three') data.frame(eval(substitute(rbind(var,2),list(var=a ?substitute ?eval Weidong Gu On Sun, Mar 25, 2012 at 5:22 PM, Pedro Martinez pedro.marti...@senckenberg.de wrote: hello, I want to iterate through a list of names and use each element as an argument in a function. For instance: a = c('one','two','three') data= c() for(elem in a){data=cbind(elem = 2,data)} data elem elem elem [1,] 2 2 2 instead I want 'elem' to be substituted by the string in the list. Doing it by hand would be: data = cbind('one'=2,data) data = cbind('two'=2,data) data = cbind('three'=2,data) data 'one' 'two' 'three' [1,] 2 2 2 I guess that the clue would be in sub(),gsub(), paste() or similar but I didnt get it to work. I am comming from python were we woudl do something like: a = ['one','two','three'] data = {} for elem in a: data[elem] = 2 data {'three': 2, 'two': 2, 'one': 2} Thanks, Pedro - Prof. Dr. P. Martinez Arbizu DZMB-Forschungsinstitut Senckenberg Suedstrand 44 D-26382 Wilhelmshaven Germany Tel: +49 (0)4421 9475-100 Fax: +49 (0)4421 9475-111 Email: pmarti...@senckenberg.de Senckenberg Gesellschaft für Naturforschung Rechtsfähiger Verein gemäß § 22 BGB Senckenberganlage 25 60325 Frankfurt Direktorium: Prof. Dr. Dr. h.c. Volker Mosbrugger, Prof. Dr. Michael Türkay, Dr. Johannes Heilmann, Prof. Dr. Pedro Martinez Arbizu, Prof. Dr. Georg Zizka, Prof. Dr. Uwe Fritz Vorsitzender des Präsidiums: Dietmar Schmid Aufsichtsbehörde: Magistrat der Stadt Frankfurt am Main (Ordnungsamt) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] string substitution for argument in function
try this: `names-`(rep(2, 3), a) On Sun, Mar 25, 2012 at 6:22 PM, Pedro Martinez pedro.marti...@senckenberg.de wrote: hello, I want to iterate through a list of names and use each element as an argument in a function. For instance: a = c('one','two','three') data= c() for(elem in a){data=cbind(elem = 2,data)} data elem elem elem [1,] 2 2 2 instead I want 'elem' to be substituted by the string in the list. Doing it by hand would be: data = cbind('one'=2,data) data = cbind('two'=2,data) data = cbind('three'=2,data) data 'one' 'two' 'three' [1,] 2 2 2 I guess that the clue would be in sub(),gsub(), paste() or similar but I didnt get it to work. I am comming from python were we woudl do something like: a = ['one','two','three'] data = {} for elem in a: data[elem] = 2 data {'three': 2, 'two': 2, 'one': 2} Thanks, Pedro - Prof. Dr. P. Martinez Arbizu DZMB-Forschungsinstitut Senckenberg Suedstrand 44 D-26382 Wilhelmshaven Germany Tel: +49 (0)4421 9475-100 Fax: +49 (0)4421 9475-111 Email: pmarti...@senckenberg.de Senckenberg Gesellschaft für Naturforschung Rechtsfähiger Verein gemäß § 22 BGB Senckenberganlage 25 60325 Frankfurt Direktorium: Prof. Dr. Dr. h.c. Volker Mosbrugger, Prof. Dr. Michael Türkay, Dr. Johannes Heilmann, Prof. Dr. Pedro Martinez Arbizu, Prof. Dr. Georg Zizka, Prof. Dr. Uwe Fritz Vorsitzender des Präsidiums: Dietmar Schmid Aufsichtsbehörde: Magistrat der Stadt Frankfurt am Main (Ordnungsamt) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] (no subject)
On 26/03/12 00:18, Anjana Thampi wrote: How do you decompose inequality in R, say by gender? This has to be one of the most meaningless and ill-expressed questions I've ever seen on this list. And that's a high hurdle to clear. cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Seeming failure of options(width=60)
The following example gives output with a line length of 103 on my system. It is causing a nuisance in creating a vignette. Is there something other than e.g., options(width=60) I need to set? The Sweave FAQ suggests this should work. options(width=60) pastured - data.frame( time=c(9, 14, 21, 28, 42, 57, 63, 70, 79), yield= c(8.93, 10.8, 18.59, 22.33, 39.35, 56.11, 61.73, 64.62, 67.08)) regmod-yield ~ t1 - t2*exp(-exp(t3+t4*log(time))) huetstart-c(t1=70, t2=60, t3=0, t4=1) anlsx-try(nls(regmod, start=huetstart, trace=TRUE, data=pastured)) print(anlsx) John Nash __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Struggling with zoo and aggregate
Gabor, Thank you for your help -- it did help me a lot. However, with my data: lead_time cycler_squared fcst_date 1 6 0 5.405095e-02 07/31/2010 2 12 0 5.521620e-06 07/31/2010 3 18 0 1.565910e-04 07/31/2010 4 24 0 8.646822e-02 07/31/2010 5 30 0 1.719604e-02 07/31/2010 6 36 0 5.768113e-04 07/31/2010 7 42 0 2.501269e-06 07/31/2010 8 48 0 6.451727e-02 07/31/2010 9 612 2.857931e-01 07/31/2010 101212 1.138635e-01 07/31/2010 111812 2.225503e-02 07/31/2010 122412 1.182031e-03 07/31/2010 133012 8.841142e-04 07/31/2010 143612 1.082490e-01 07/31/2010 154212 1.502887e-05 07/31/2010 17 6 0 8.689588e-02 08/01/2010 1812 0 5.884336e-04 08/01/2010 1918 0 2.219316e-07 08/01/2010 2024 0 3.960752e-02 08/01/2010 2130 0 1.087413e-04 08/01/2010 2342 0 3.583030e-07 08/01/2010 2448 0 2.907109e-05 08/01/2010 25 612 8.693451e-02 08/01/2010 261212 3.208215e-02 08/01/2010 271812 0.00e+00 08/01/2010 28 6 0 2.962669e-02 08/02/2010 29 612 2.363506e-05 08/02/2010 301212 9.050178e-03 08/02/2010 from: z - read.zoo(q,index = 4, FUN = as.yearmon, format = %m/%d/%Y,aggregate = mean) I get: z lead_timecycle r_squared Jul 2010 25.6 5.60 0.05034771 Aug 2010 18.46154 4.615385 0.02191903 what I need is to NOT have the lead_time and cycle averaged, but only have the r_squared values averaged by lead_time and cycle. I can not seem to figure out the correct syntax to do this. I assume I use something like: q_agg-aggregate(q,by=list(q$lead_time,q$cycle),index = 4, FUN = as.yearmon, format = %m/%d/%Y) but I get errors or nonsense when I follow with... z - read.zoo(q_agg,index = 4, FUN = as.yearmon, format = %m/%d/%Y,aggregate = mean) or some variation of this. Regards, Tom On Sat, Mar 24, 2012 at 10:58 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Sat, Mar 24, 2012 at 10:44 PM, Thomas Adams thomas.ad...@noaa.gov wrote: All: I have a SQlite database where I have stored some verification data by date time (cycle Z/UTC), lead_time as well as type, duration, etc. I would like to analyze plot the data as monthly averages. I have looked at a bunch of examples which use some combination of zoo and aggregate, but I have not been able to successfully apply bits and pieces from the examples I have found. Any help is appreciated. BTW, I calculate mae (mean absolute error), mse (mean squared error), me (mean error), and other measures obtained by using the R verification package. The example below is limited to 20 records and shows lead_time, r_squared, (forecast) cycle, fcst_date (forecast date) -- the full data set is just over 2 years of daily data with 3 forecast cycles (00Z, 12Z, and 18Z) daily. From my query, below) how do I construct an appropriate data structure to analyze plot the data as monthly averages? Regards, Tom q-dbGetQuery(con,select lead_time,r_squared,cycle,fcst_date from verify_table where duration=6 limit 20) q lead_timer_squared cycle fcst_date 1 6 5.405095e-0200 07/31/2010 2 12 5.521620e-0600 07/31/2010 3 18 1.565910e-0400 07/31/2010 4 24 8.646822e-0200 07/31/2010 5 30 1.719604e-0200 07/31/2010 6 36 5.768113e-0400 07/31/2010 7 42 2.501269e-0600 07/31/2010 8 48 6.451727e-0200 07/31/2010 9 6 2.857931e-0112 07/31/2010 1012 1.138635e-0112 07/31/2010 1118 2.225503e-0212 07/31/2010 1224 1.182031e-0312 07/31/2010 1330 8.841142e-0412 07/31/2010 1436 1.082490e-0112 07/31/2010 1542 1.502887e-0512 07/31/2010 1648 NA12 07/31/2010 17 6 8.689588e-0200 08/01/2010 1812 5.884336e-0400 08/01/2010 1918 2.219316e-0700 08/01/2010 2024 3.960752e-0200 08/01/2010 Try this: Lines - lead_timer_squared cycle fcst_date 1 6 5.405095e-0200 07/31/2010 2 12 5.521620e-0600 07/31/2010 3 18 1.565910e-0400 07/31/2010 4 24 8.646822e-0200 07/31/2010 5 30 1.719604e-0200 07/31/2010 6 36 5.768113e-0400 07/31/2010 7 42 2.501269e-0600 07/31/2010 8 48 6.451727e-0200 07/31/2010 9 6 2.857931e-0112 07/31/2010 1012 1.138635e-0112 07/31/2010 1118 2.225503e-0212 07/31/2010 1224 1.182031e-0312 07/31/2010 1330 8.841142e-0412 07/31/2010 1436 1.082490e-0112 07/31/2010 library(zoo) q - read.table(text = Lines) z - read.zoo(q, index = 4, FUN =
Re: [R] Seeming failure of options(width=60)
I got an error message: options(width=60) pastured - data.frame( + time=c(9, 14, 21, 28, 42, 57, 63, 70, 79), + yield= c(8.93, 10.8, 18.59, 22.33, 39.35, + 56.11, 61.73, 64.62, 67.08)) regmod-yield ~ t1 - t2*exp(-exp(t3+t4*log(time))) huetstart-c(t1=70, t2=60, t3=0, t4=1) anlsx-try(nls(regmod, start=huetstart, trace=TRUE, data=pastured)) 13386.91 : 70 60 0 1 Error in nls(regmod, start = huetstart, trace = TRUE, data = pastured) : singular gradient print(anlsx) [1] Error in nls(regmod, start = huetstart, trace = TRUE, data = pastured) : \n singular gradient\n attr(,class) [1] try-error attr(,condition) simpleError in nls(regmod, start = huetstart, trace = TRUE, data = pastured): singular gradient and I see the number of characters of the message string is 94: nchar(anlsx) [1] 94 options(width) is not supposed to _wrap_ your character scalars; it cannot break a single string into several lines (use strwrap() instead in that case). See ?options for the exact meaning of the width option. # this may help you understand 'width' options(width = 50) rep(1, 100) options(width = 90) rep(1, 100) Regards, Yihui -- Yihui Xie xieyi...@gmail.com Phone: 515-294-2465 Web: http://yihui.name Department of Statistics, Iowa State University 2215 Snedecor Hall, Ames, IA On Sun, Mar 25, 2012 at 9:17 PM, John C Nash nas...@uottawa.ca wrote: The following example gives output with a line length of 103 on my system. It is causing a nuisance in creating a vignette. Is there something other than e.g., options(width=60) I need to set? The Sweave FAQ suggests this should work. options(width=60) pastured - data.frame( time=c(9, 14, 21, 28, 42, 57, 63, 70, 79), yield= c(8.93, 10.8, 18.59, 22.33, 39.35, 56.11, 61.73, 64.62, 67.08)) regmod-yield ~ t1 - t2*exp(-exp(t3+t4*log(time))) huetstart-c(t1=70, t2=60, t3=0, t4=1) anlsx-try(nls(regmod, start=huetstart, trace=TRUE, data=pastured)) print(anlsx) John Nash __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] trellis plot
Hi everyone, I am just trying to figure out how to do a xyplot where in addition to dots and lines I can change dots' colors according to an individual variable (e.g., marital disruption across time, a dummy 0/1). When I use groups specification (see below), I get two different lines for each individual based on groups, and what I want is to get one line connecting dots, and different dots' colors according to marital disruption. Any ideas about how to do that? person - rep(1:2, each=4) income - c(100, 120, 150, 200, 90, 100,120, 150) disruption - rep(c(0,1), 4) time - rep(c(1:4),2) dat - as.data.frame(cbind(person,time, income, disruption)) library(lattice) xyplot(income~time|as.factor(person),data=dat, type=c(p,g,o), col.line=black, xlab=Time, ylab=Familiar Income) # I just want to change dots' colors according to disruption, not to get two different lines: xyplot(income~time|as.factor(person),data=dat, type=c(p,g,o), col.line=black, groups=disruption, xlab=Time, ylab=Familiar Income) Thank you in advance! -- Sebastián Daza __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.