Re: [R] a very simple question
On Mar 18, 2012, at 4:43 PM, Dajiang Liu wrote:
> Dear All, I have a seemingly very simple question, but I just cannot figure out the answer. I attempted to run the following:
> a = 0.1*(1:9); which(a == 0.3)
> It returns integer(0). But obviously, the third element of a is equal to 0.3. I must have missed something. Can someone kindly explain why? Thanks a lot.

It has already been explained on this list ... frequently, and in the FAQ. Locate the FAQ and search for the question about why R doesn't think two numbers are equal. The FAQ should be part of a standard install, reachable from the main help page.

> Regards, Dajiang

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
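A minimal sketch of what the FAQ entry (R FAQ 7.31) explains: 0.1 has no exact binary representation, so 0.1 * 3 and the literal 0.3 end up as slightly different doubles. Comparing with a tolerance is the usual workaround; the tolerance below is an assumption, chosen to match the default used by all.equal().

```r
# 0.1 cannot be stored exactly in binary floating point,
# so 0.1 * 3 is not bit-for-bit the same double as the literal 0.3
a <- 0.1 * (1:9)
print(a[3] == 0.3)   # [1] FALSE
print(a[3] - 0.3)    # a tiny, nonzero difference

# Compare with a tolerance instead of exact equality
tol <- sqrt(.Machine$double.eps)   # same default tolerance as all.equal()
idx <- which(abs(a - 0.3) < tol)
print(idx)           # [1] 3

# For a single pair of numbers, wrap all.equal() in isTRUE()
print(isTRUE(all.equal(a[3], 0.3)))  # [1] TRUE
```

Rounding to a fixed number of decimals, as suggested elsewhere in this thread, works too when the values are known to have that many decimals; the tolerance approach does not need that knowledge.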
Re: [R] Problem reading mixed CSV file
This is quite a CPU-consuming process; my system hung on the big file I have. Within the for loop that you suggested, can't I have a case statement on the value of nFields that specifies the format in which each variable should be read? Something like:

case
  # input format for 6 fields
  when nFields == 6
    read.csv as string, string, string, numeric, numeric, numeric into dataframe1
  # input format for 7 fields
  when nFields == 7
    read.csv as string, string, string, string, numeric, numeric, numeric into dataframe2
end case

Then output the two data frames, with some way of tracking the original line numbers of the input file (similar to _N_ in SAS). Dataframe1 should be output as it is, while in dataframe2 the 3rd and 4th strings should be concatenated. Could you please help with the syntax for the above?

On Sat, Mar 17, 2012 at 4:54 AM, jim holtman jholt...@gmail.com wrote:
> Here is a solution that looks for the lines with 7 elements and inserts the quotes:
>
> fileName <- '/temp/text.txt'
> input <- readLines(fileName)
> # count the fields to find 7
> nFields <- count.fields(fileName, sep = ',')
> # now fix the data
> for (i in which(nFields == 7)){
>     # split on comma
>     z <- strsplit(input[i], ',')[[1]]
>     input[i] <- paste(z[1], z[2]
>         , paste('"', z[3], ',', z[4], '"', sep = '')  # put on quotes
>         , z[5], z[6], z[7], sep = ','
>         )
> }
> # now read in the data
> result <- read.table(textConnection(input), sep = ',')
> result
>    V1 V2 V3 V4 V5 V6
> 1 1968 21 0
> 2 Boston 1968 13 0
> 3 Boston 1968 18 0
> 4 Chicago 1967 44 0
> 5 Providence 1968 17 0
> 6 Providence 1969 48 0
> 7 Binky 1968 24 0
> 8 Chicago 1968 23 0
> 9 Dally 1968 7 0
> 10 Raleigh, North Carol 1968 25 0
> 11 Addy ABC-Dogs Stars-W8.1 Providence 1968 38 0
> 12 DEF_REQPRF/ Dartmouth 1967 31 1
> 13 PL 1967 38 1
> 14 XY PopatLal 1967 5 1
> 15 XY PopatLal 1967 6 8
> 16 XY PopatLal 1967 7 7
> 17 XY PopatLal 1967 9 1
> 18 XY PopatLal 1967 10 1
> 19 XY PopatLal 1967 13 1
> 20 XY PopatLal Boston 1967 6 1
> 21 XY PopatLal Boston 1967 7 11
> 22 XY PopatLal Boston 1967 9 2
> 23 XY PopatLal Boston 1967 10 3
> 24 XY PopatLal Boston 1967 7 2
>
> On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal ashish.agarw...@gmail.com wrote:
>> I have a file that is 5000 records, and editing that file is not easy. Is there any way to read line 10 differently to account for the changes in the third field?
>>
>> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers ehl...@ucalgary.ca wrote:
>>> On 2012-03-16 10:48, Ashish Agarwal wrote:
>>>> Line 10 has City and State, and those too are separated by a comma. How can I read line 10 differently from the other lines?
>>>
>>> Edit the file and put quotes around the city-state combination: "Raleigh, North Carol"
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
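A sketch of the "case" idea in R, not taken from the thread: it assumes every line has either 6 or 7 comma-separated fields, that the column types follow the pseudocode above, and that the file has no blank lines. The function name split_by_field_count and the column name lineNo are hypothetical. The lineNo column plays the role of SAS's _N_.

```r
# Hypothetical helper: split a comma-separated file into two data frames according to
# how many fields each line has (6 or 7), keeping the original line numbers.
split_by_field_count <- function(fileName) {
  lines <- readLines(fileName)
  # quote = "" and comment.char = "" so that every comma counts as a separator;
  # assumes the file has no blank lines (count.fields skips them by default)
  nFields <- count.fields(fileName, sep = ",", quote = "", comment.char = "")
  lineNo <- seq_along(lines)

  # Read the selected lines with explicit column types and prepend their line numbers
  parse_group <- function(keep, colClasses) {
    df <- read.table(text = lines[keep], sep = ",", quote = "", comment.char = "",
                     colClasses = colClasses, strip.white = TRUE)
    cbind(lineNo = lineNo[keep], df)
  }

  df6 <- parse_group(nFields == 6, c(rep("character", 3), rep("numeric", 3)))
  df7 <- parse_group(nFields == 7, c(rep("character", 4), rep("numeric", 3)))

  # Concatenate the 3rd and 4th strings of the 7-field lines, then drop the 4th
  df7$V3 <- paste(df7$V3, df7$V4, sep = ", ")
  df7$V4 <- NULL
  names(df7) <- names(df6)   # both now have lineNo, V1, ..., V6

  list(six = df6, seven = df7)
}
```

The two pieces can be recombined in file order with combined <- rbind(res$six, res$seven) followed by combined[order(combined$lineNo), ]. If one of the two groups can be empty, read.table() stops with an error on it, so a guard would be needed in that case.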
[R] Sankey Diagrams in R
Dear R-list,

I am trying to visualize where the dropout happens in our patient flow. We are currently using traditional flowcharts, and it bothers me that I can't show both the percentages and the flow in one diagram. The other day I came across some interesting diagrams doing exactly what I wanted: they had both flow and percentages visualized in one diagram. Here are some nice examples, apparently made with ‘sankeypython’: http://www.sankey-diagrams.com/tag/software/

It didn't take long to find a blog where an R user (thanks!) had posted an R script that actually produces a Sankey diagram in R: http://biologicalposteriors.blogspot.com/2010/07/sankey-diagrams-in-r.html

See below for a working example. My questions are: is this the most up-to-date Sankey diagram script we have in the R community? Is there a better way to visualize flow and percentages in one diagram in R?

Thanks,
Eric

## the working example ##
# https://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/
sourc.https <- function(url, ...) {
  # load package
  require(RCurl)  # install.packages(c("RCurl"), dependencies = TRUE)
  # parse and evaluate each .R script
  sapply(c(url, ...), function(u) {
    eval(parse(text = getURL(u, followlocation = TRUE,
                             cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))),
         envir = .GlobalEnv)
  })
}

# Example from https://gist.github.com/1423501
sourc.https("https://raw.github.com/gist/1423501/55b3c6f11e4918cb6264492528b1ad01c429e581/Sankey.R")

# My example (there is another example inside Sankey.R):
inputs = c(6, 144)
losses = c(6, 47, 14, 7, 7, 35, 34)
unit = "n ="
labels = c("Transfers", "Referrals\n", "Unable to Engage", "Consultation only",
           "Did not complete the intake", "Did not engage in Treatment",
           "Discontinued Mid-Treatment", "Completed Treatment", "Active in \nTreatment")
SankeyR(inputs, losses, unit, labels)

# Clean up my mess
rm(inputs, labels, losses, SankeyR, sourc.https, unit)
Re: [R] randomly subsample rows from subsets
Hello,

Try

text = "
fish fam length
1 a 71.46
2 a 71.06
3 a 62.94
4 b 79.46
5 b 52.38
6 b 56.78
7 b 92.08
8 c 96.86
9 d 98.09
10 d 17.23
11 d 98.35
12 d 82.43
13 e 83.85
14 e 33.92
15 e 23.16
16 e 31.39
17 e 57.08
18 e 27.05
19 f 62.38
20 f 83.21
21 f 18.72
22 f 84.32
23 g 15.99
24 h 40.33
25 h 92.73
26 h 59.08
27 i 29.05
"
fish <- read.table(textConnection(text), header=TRUE)
head(fish)

set.seed(1)
select <- lapply(split(fish, fish$fam), function(x) if(NROW(x) > 1) x[sample(NROW(x), 2), ])
select <- select[!sapply(select, is.null)]
# result as a list
select
# result as a data.frame
do.call(rbind, select)

Hope this helps,
Rui Barradas
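The same idea can be wrapped in a small reusable function. This is a sketch, not part of the original answer: sample_per_group is a hypothetical name, and n (rows drawn per group) is a parameter added here. Like the code above, it drops groups with fewer than n rows; the small data frame reuses a few rows of the fish data.

```r
# Hypothetical helper generalising the code above: draw n rows per group;
# groups with fewer than n rows are dropped (as in the original answer for n = 2)
sample_per_group <- function(df, group, n = 2) {
  pieces <- split(df, df[[group]])
  pieces <- Filter(function(x) nrow(x) >= n, pieces)
  # sample.int() draws n distinct row positions within each group
  picked <- lapply(pieces, function(x) x[sample.int(nrow(x), n), , drop = FALSE])
  do.call(rbind, picked)
}

fish_small <- data.frame(fam = c("a", "a", "a", "c", "d", "d"),
                         length = c(71.46, 71.06, 62.94, 96.86, 98.09, 17.23),
                         stringsAsFactors = FALSE)
set.seed(1)
out <- sample_per_group(fish_small, "fam", n = 2)
out
```

Filter() replaces the lapply/is.null pair from the original, so no NULL entries need to be removed before rbind().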
[R] How to use R script in VB?
Hello R friends,

I want to use my R script in VB to make a macro in Excel. I tried RExcel, but it seems to me that this package is just a GUI/API and I still have to run (connect to) R to use the script. Google tells me there are some ways to turn an R script into an independent library/module/header so that I can call it from VB, C or Java. Where can I find a detailed tutorial or manual for that?

Thanks in advance,
Dong-Joon
[R] Output formatting in Latex and R
I am working with LaTeX and R and using the following code:

<<echo=FALSE>>=
infile <- read.table("test.txt", sep="\t")
Col3 <- unique(infile[,3])
LCol3 <- length(Col3)
for (i in 1:LCol3) {
  print(paste("Column", Col3[i]))
  print(infile[infile[,3]==Col3[i],-3])
}
@

I am getting the following output:

[1] "Column C"
  V1 V2 V4
1  A  B  D
2  X  T  K
[1] "Column Z"
  V1 V2 V4
3  Z  U  M
4  E  V  R
5  Z  U  M
[1] "Column P"
  V1 V2 V4
6  E  V  R

I want to get rid of the row numbers and the column names. I want my output as follows:

Column C
A B D
X T K
Column Z
Z U M
E V R
Z U M
Column P
E V R

How can I implement it?
Re: [R] How to Group Categorical data in R?
It is working fine. Thanks
Re: [R] Output formatting in Latex and R
Col3 <- unique(Msg17$V3)
LCol3 <- length(Col3)
for (i in 1:LCol3) {
  print(paste("Column", Col3[i]))
  write.table(Msg17[Msg17$V3==Col3[i],-3], row.names=F, col.names=F, quote=F)
  # If your R implementation does not accept 'F', use 'FALSE'
}
Re: [R] Output formatting in Latex and R
Great, it works! But how can I put a space or tab between two records?
Re: [R] multiple density plot
Hi,

some sample data would be *very* helpful...

Kind regards,
Kimmo

On 16.03.2012 15:44, statquant2 wrote:
> Hello, I am looking for a special plot. Let's suppose I have:
> * 100 days, and
> * each day, a (1D) distribution of the same variable.
> I would like to plot:
> * dates on the x axis, and
> * one distribution per date on the y axis.
> Do you know a way of doing it?
> Cheers
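Since the thread contains no sample data, the sketch below simulates some (100 days, 50 values per day; all numbers are invented). One box-and-whisker summary per date, drawn with base graphics, is one simple way to get dates on the x axis and one distribution per date on the y axis; violin plots or per-day density curves would be alternatives not shown here. The plot is written to a temporary PDF so the snippet also runs non-interactively; drop the pdf() and dev.off() lines to draw on screen.

```r
# Simulated data (assumption): 100 consecutive days, 50 observations per day
set.seed(42)
days <- seq(as.Date("2012-01-01"), by = "day", length.out = 100)
obs <- data.frame(day = rep(days, each = 50),
                  value = rnorm(100 * 50, mean = rep(sin(seq_along(days) / 10), each = 50)))

# One box per date: dates on the x axis, that day's distribution on the y axis
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 10, height = 5)
boxplot(value ~ day, data = obs,
        names = format(days, "%d %b"),   # readable date labels
        las = 2, cex.axis = 0.6,         # vertical, small axis labels
        outline = FALSE,                 # hide individual outlier points
        xlab = "", ylab = "value")
dev.off()
```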
Re: [R] a very simple question
As to the reasons, David has given you the necessary hints. To get around the issue, here is what I do:

a <- round( 0.1 * ( 1:9 ), 1 )
a
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
which( a == 0.3 )
[1] 3

Rgds,
Rainer

-------- Original Message --------
Date: Sun, 18 Mar 2012 21:43:54 +
From: Dajiang Liu ldjst...@hotmail.com
To: r-help@r-project.org
Subject: [R] a very simple question

> Dear All, I have a seemingly very simple question, but I just cannot figure out the answer. I attempted to run the following:
> a = 0.1*(1:9); which(a == 0.3)
> It returns integer(0). But obviously, the third element of a is equal to 0.3. I must have missed something. Can someone kindly explain why? Thanks a lot.
> Regards, Dajiang
Re: [R] Singleton pattern
Thanks for all your answers and advice! They shed some light for me. I'll have a look at the memoise package and at the tracemem function, to check whether they can help me somehow, or at least to understand and trace in detail how memory gets consumed.

Thank you all!
David

2012/3/16 Jan T. Kim jtt...@googlemail.com
> Using the singleton pattern in R has never occurred to me so far, as I think it applies to languages that support multiple references to one instance. R doesn't do that, at least not in the ways that would be required for applying the singleton pattern as described in the GoF book. One would have to use closures and/or environments to approximate references, I suppose.
>
> When passed around as parameters, R objects don't get copied unless the called function starts modifying them, so if the primary concern is to prevent unnecessary/costly copying of bulky objects, creating the thing once and then passing it around as necessary, taking care that called functions don't change it, is perhaps good enough.
>
> Best regards, Jan
>
> On Fri, Mar 16, 2012 at 12:15:27PM -0400, Bryan Hanson wrote:
>> Since no one else has bitten, I'll take a stab. I'm an experienced R person, but I've recently been teaching myself Objective-C and I've been using singletons quite a bit (and misusing them quite a bit!). Not a computer scientist at all. You've been warned.
>>
>> I don't think there is a comparable concept in R. You do have a choice of S3 or S4 classes for your object orientation in R. S3 is very loose in that you can add to S3 objects readily and abuse them a lot. There really is no checking of them unless you implement it manually. S4 objects are much tighter: they are less readily modified and are self-checking (I know some will complain about this characterization, but it's approximately correct). So perhaps you want an S4 object so it's less likely to get mangled, but I doubt there is a way to prevent users from copying it, which would be more along the lines of a singleton. You can google the archives for some great discussions of S3 vs S4 if that sounds interesting.
>>
>> Bryan
>> Bryan Hanson
>> Professor of Chemistry & Biochemistry
>> DePauw University
>>
>> On Mar 16, 2012, at 7:47 AM, David Cassany wrote:
>>> Hi all, I know it may not make much sense to think about a singleton pattern in an R application that doesn't use any OOP facilities; however, I'm curious to know whether anybody has faced the same issue. I've been googling, but using "singleton pattern" as a keyword leads to typical OOP languages like Java or C++, among others. My problem is that I'd like to ensure some very big objects aren't copied again and again into other variables. In the worst case I'll check all the code myself to ensure it, but then the application won't force programmers to take it into consideration, which is what I am really looking for. Any advice will be highly appreciated :P Thanks!
>>>
>>> --
>>> David Cassany Viladomat
>>> Software Developer
>>> Transmural Biotech S.L
>
> --
> Jan T. Kim
> email: jtt...@gmail.com
> WWW: http://www.jtkim.dreamhosters.com/
> hierarchical systems are for files, not for humans

--
David Cassany Viladomat
Software Developer
Transmural Biotech S.L
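Following Jan's remark about closures and environments, here is a minimal sketch of a lazily created, shared instance. The names make_singleton and get_store are hypothetical. It only approximates the GoF pattern: nothing stops a caller from copying a value out of the store (y <- s1$data gives an ordinary copy-on-modify vector), but every call to get_store() returns the same environment, and the builder runs only once.

```r
# Hypothetical helper: returns a function that builds the object on first use and
# hands back the very same instance on every later call
make_singleton <- function(build) {
  cache <- new.env(parent = emptyenv())   # private storage captured by the closure
  function() {
    if (is.null(cache$instance)) cache$instance <- build()
    cache$instance
  }
}

build_count <- 0
get_store <- make_singleton(function() {
  build_count <<- build_count + 1   # records how often the builder actually runs
  store <- new.env()                # environments are passed by reference
  store$data <- numeric(1e6)        # stand-in for a bulky object
  store
})

s1 <- get_store()
s2 <- get_store()
s1$data[1] <- 42   # changes the single shared store
s2$data[1]         # [1] 42: s1 and s2 are the same environment
```

Using an environment as the instance is the design choice that gives reference semantics; if the cached value were a plain list or matrix, callers would still get R's usual copy-on-modify behaviour that Jan describes.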
Re: [R] a very simple question
On Sun, Mar 18, 2012 at 09:43:54PM +, Dajiang Liu wrote:
> Dear All, I have a seemingly very simple question, but I just cannot figure out the answer. I attempted to run the following:
> a = 0.1*(1:9); which(a == 0.3)
> It returns integer(0). But obviously, the third element of a is equal to 0.3. I must have missed something. Can someone kindly explain why? Thanks a lot.

Hi.

A simple way to detect rounding problems is to subtract the numbers:

a = 0.1*(1:4)
a - 0.3
[1] -2.00e-01 -1.00e-01 5.551115e-17 1.00e-01

Use rounding to avoid it, as suggested by others.

Petr Savicky.
Re: [R] How to use R script in VB?
Actually, RExcel and the StatConn DCOM connector are what you want, and this is not the right place to discuss it. Go to http://www.statconn.com/, and read the license carefully.

---
Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Dong-Joon Lim tgno3@gmail.com wrote:
> Hello R friends, I want to use my R script in VB to make a macro in Excel. I tried RExcel, but it seems to me that this package is just a GUI/API and I still have to run (connect to) R to use the script. Google tells me there are some ways to turn an R script into an independent library/module/header so that I can call it from VB, C or Java. Where can I find a detailed tutorial or manual for that? Thanks in advance, Dong-Joon
Re: [R] install R package on Unix cluster
On 18/03/12 14:40, Uwe Ligges wrote:
> On 18.03.2012 05:47, Lorenzo Cattarino wrote:
>> Hi R users,
>>
>> Working from a PC, I am trying to install the spatstat package on a Unix cluster. I created the following PBS file to send a job array:
>>
>> #!/bin/bash -ue
>> #PBS -m ae
>> #PBS -M my email
>> #PBS -J 1-45
>> #PBS -A my username
>> #PBS -N job name
>> #PBS -l resources
>> #PBS -l walltime
>> cd $PBS_O_WORKDIR
>> module load R/2.14.1
>> R CMD INSTALL -l /path/to/library spatstat
>
> This command installs *a* source package from the current subdirectory spatstat. If there is no such directory containing the sources, it won't work. Either provide the gzipped tarball and give its name, or use install.packages("spatstat") within an R script. It makes sense to install it outside the parallel processing, into a common directory, and just use it in parallel.

I can second that. My approach is to install the package in my home directory, which is then accessible from all nodes. If you are the admin of the cluster, you can install the package in the normal location and share this location so that it is accessible to all nodes.

Cheers,
Rainer

> (It is not entirely clear to me if you are really running the installation on all nodes.)
>
> Uwe Ligges
>
>> R CMD BATCH /path/to/folder/Script_$PBS_ARRAY_INDEX.R
>>
>> Obviously I failed to understand page 19 of the R admin manual, because I keep getting the following error message:
>>
>> Warning: invalid package ‘spatstat’
>> Error: ERROR: no packages specified
>>
>> I'd appreciate it if you could point me in the right direction.
>> Thanks,
>> Lorenzo

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
Tel: +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax: +33 - (0)9 58 10 27 44
Fax (D): +49 - (0)3 21 21 25 22 44
email: rai...@krugs.de
Skype: RMkrug
Re: [R] Output formatting in Latex and R
Hi, I am using the following code and getting the output below.

<<echo=FALSE>>=
infile <- read.table("/home/manish/Desktop/test.txt", sep="\t", header=TRUE)
Col3 <- unique(infile[,3])
LCol3 <- length(Col3)
for (i in 1:LCol3) {
  print(paste("Disease Risk:", Col3[i]), row.names=FALSE, col.names=FALSE, quote=FALSE)
  print(infile[infile[,3]==Col3[i],-3], row.names=FALSE, col.names=FALSE, quote=FALSE,
        width=10, justify = c("right", "right", "centre"))
}
@

Screenshot of the output: http://r.789695.n4.nabble.com/file/n4484027/Screenshot.png

The [1] is still printed. How can I avoid it? I also need to add a tab and a new line between records. How can I implement that?

Thanks in advance.
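A sketch of one way to get the layout asked for in this thread, not taken from the replies: print() on a character vector always adds the [1] index, whereas cat() writes text as-is, and write.table() with sep = "\t" puts a tab between fields. The data frame is rebuilt from the output shown in the first message of the thread (column 3 is the grouping key), and print_groups is a hypothetical name.

```r
# Sample data reconstructed from the output shown earlier in the thread
infile <- data.frame(V1 = c("A", "X", "Z", "E", "Z", "E"),
                     V2 = c("B", "T", "U", "V", "U", "V"),
                     V3 = c("C", "C", "Z", "Z", "Z", "P"),
                     V4 = c("D", "K", "M", "R", "M", "R"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: one header line per group, tab-separated rows,
# no "[1]", no row names, no column names, blank line between groups
print_groups <- function(df, key_col = 3) {
  for (key in unique(df[, key_col])) {
    cat("Column ", key, "\n", sep = "")          # cat() adds no "[1]" index
    write.table(df[df[, key_col] == key, -key_col],
                sep = "\t", quote = FALSE,       # tab between fields
                row.names = FALSE, col.names = FALSE)
    cat("\n")                                    # blank line between groups
  }
}

print_groups(infile)
```

Inside the Sweave chunk, the call to print_groups() (or the same loop body) would take the place of the two print() calls.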
Re: [R] Problem reading mixed CSV file
How big is the file? In the example I sent, I was using 'textConnection' to reread the input. If the file is large, this can be slow. You will have better luck writing the converted data out to a temporary file and reading it right back in.

I am not sure exactly what you are asking. You can create output file names based on the input file name. What is it you want to do with the 'case' statement?

Sent from my iPad

On Mar 19, 2012, at 2:46, Ashish Agarwal ashish.agarw...@gmail.com wrote:
> This is quite a CPU-consuming process; my system hung on the big file I have. Within the for loop that you suggested, can't I have a case statement on the value of nFields that specifies the format in which each variable should be read? Something like:
>
> case
>   # input format for 6 fields
>   when nFields == 6
>     read.csv as string, string, string, numeric, numeric, numeric into dataframe1
>   # input format for 7 fields
>   when nFields == 7
>     read.csv as string, string, string, string, numeric, numeric, numeric into dataframe2
> end case
>
> Then output the two data frames, with some way of tracking the original line numbers of the input file (similar to _N_ in SAS). Dataframe1 should be output as it is, while in dataframe2 the 3rd and 4th strings should be concatenated. Could you please help with the syntax for the above?
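Jim's temporary-file suggestion could look roughly like this. It is a sketch under the thread's assumptions (only the 3rd and 4th fields of 7-field lines belong together; no blank lines in the file), and read_fixed_csv is a hypothetical name.

```r
# Hypothetical helper: repair the 7-field lines, write everything to a temporary file
# and read that file back, instead of re-reading through textConnection()
read_fixed_csv <- function(fileName) {
  input <- readLines(fileName)
  # assumes no blank lines, so nFields[i] refers to input[i]
  nFields <- count.fields(fileName, sep = ",", quote = "", comment.char = "")
  for (i in which(nFields == 7)) {
    z <- strsplit(input[i], ",")[[1]]
    merged <- paste0('"', z[3], ",", z[4], '"')   # quote the city/state pair
    input[i] <- paste(c(z[1:2], merged, z[5:7]), collapse = ",")
  }
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))                             # delete the temporary file on exit
  writeLines(input, tmp)
  read.csv(tmp, header = FALSE, stringsAsFactors = FALSE, strip.white = TRUE)
}
```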
Re: [R] Help with dlply, loop and column names
I'm not sure I follow exactly what group of regression models you want to create, but a good first step might be to use reshape so that each party's vote share goes on a different row and the vote shares are all in the same column. Then you can use plyr grouping on tipo and party to make your models...

library(reshape2)
library(plyr)
ast <- melt(asturias.gen2011, id=c("municipio", "total", "tipo"),
            variable.name="party", value.name="vote")
dlply(ast, .(party, tipo), lm, formula=vote~total)

or something along those lines. This way you don't have to mess around with pasting together expressions to eval and so on...

Peter

On Sun, Mar 18, 2012 at 12:59 PM, Igor Sosa Mayor joseleopoldo1...@gmail.com wrote:
> Hi, I have a data frame basically like this:
>
> head(asturias.gen2011[,c(1,4,9:14)])
>                municipio total upyd  psoe    pp    iu   fac   tipo
> 440              Allande  2031 1.44 31.10 39.75  4.01 21.62 1000-1
> 443                Aller 12582 1.37 33.30 37.09 15.53 10.35    1-5
> 567               Amieva   805 1.48 32.69 37.36  6.15 20.16   1000
> 849               Avilés 84202 4.15 30.26 35.49 14.37 11.80      5
> 1087 Belmonte de Miranda  1751 1.66 38.42 35.74  7.22 14.81 1000-1
> 1260             Bimenes  1894 0.98 34.28 26.87 23.30 10.98 1000-1
>
> I want to do the following:
> 1. For every party (psoe, pp, etc.) I want to create a variable like this: upyd.lm.tipos, psoe.lm.tipos, etc.
> 2. I want to store in this variable a regression (psoe~total), but split up by tipo.
>
> I have the main idea of using dlply from the plyr vignette. But when I try to put all this in a loop I run into trouble, and at the moment I'm really confused about how to solve this problem. I have the following function:
>
> elecregtipos <- function(y){
>   z <- dlply(asturias.gen2011, .(tipo), function(x) lm(x[,y]~x$edad.media))
>   # rsq <- function(x) summary(x)$r.squared
>   # bcoefs <- ldply(z, function(x) c(coef(x), rsquare=rsq(x)))
>   # return (bcoefs)
>   return(z)
> }
>
> And I try to call it with:
>
> for (y in c("upyd", "psoe", "pp", "fac", "iu")) {
>   eval(parse(text=paste(y, '.lm.tipos', '<- elecregtipos(', y, ')', sep='')))
> }
>
> At the moment I'm getting the error:
>
> Error in `[.data.frame`(x, , y) : object 'upyd' not found
>
> If I simply call elecregtipos("upyd"), it works perfectly. The problem is the loop, the column names, etc., but I'm really confused about what else I could try, because I have already tried every possibility I can think of. Any hint?
>
> Thanks in advance.
>
> --
> Igor Sosa Mayor, joseleopoldo1...@gmail.com
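For comparison with Peter's reshape approach, here is a base-R sketch that also avoids eval(parse()): loop over the party names as character strings and keep the fits in a named list. In the loop above, the pasted text contains elecregtipos(upyd) without quotes, so R looks for an object called upyd, which matches the error message. The data frame below is a made-up stand-in for asturias.gen2011 (column names from the thread, numbers invented).

```r
# Invented stand-in for asturias.gen2011 (illustration only)
set.seed(1)
elec <- data.frame(total = round(runif(40, 500, 90000)),
                   tipo  = rep(c("small", "large"), each = 20),
                   upyd  = runif(40, 0, 5),
                   psoe  = runif(40, 25, 40),
                   pp    = runif(40, 25, 40),
                   stringsAsFactors = FALSE)

parties <- c("upyd", "psoe", "pp")

# Outer list indexed by party name, inner list indexed by tipo
models <- lapply(setNames(parties, parties), function(party) {
  lapply(split(elec, elec$tipo), function(d) {
    lm(reformulate("total", response = party), data = d)   # e.g. upyd ~ total
  })
})

coef(models$psoe$large)   # intercept and slope for psoe in the 'large' group
```

If separately named objects such as psoe.lm.tipos are really wanted, for (p in parties) assign(paste0(p, ".lm.tipos"), models[[p]]) creates them from the list, but keeping everything in one list is usually easier to work with.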
Re: [R] Problem reading mixed CSV file
Hi This is quite a CPu consuming process. My system got hung up for the big file I have. Within the for loop that you have suggested, can't I have a case statement for different value of nfields to be read and specify what format does the variable needs to be read? something like case # input format for 6 fields when nFields == 6 read.csv as string, string, string, numeric, numeric, numeric into dataframe1 #input format for 7 fields when nFields == 7 read.csv as string, string, string, string, numeric, numeric, numeric into dataframe2 end case # Output the two dataframes via some way of tracking the original line numbers of the input file - similar to _N_ in SAS . Dataframe1 to be outputed as it is while in dataframe2, concatenating the 3rd and the 4th strings. Could you please help with the format for the above? I would follow Jims suggestion, nFields - count.fields(fileName, sep = ',') count fields and read chunks to different files by using scan with modifying skip and nlines parameters. However if there is only few lines which differ it would be better to correct those few lines manually in some suitable editor. Elaborating omnipotent function for reading any kind of corrupted/nonstandard files seems to me suited only if you expect to read such files many times. 
Regards Petr On Sat, Mar 17, 2012 at 4:54 AM, jim holtman jholt...@gmail.com wrote: Here is a solution that looks for the line with 7 elements and inserts the quotes: fileName - '/temp/text.txt' input - readLines(fileName) # count the fields to find 7 nFields - count.fields(fileName, sep = ',') # now fix the data for (i in which(nFields == 7)){ + # split on comma + z - strsplit(input[i], ',')[[1]] + input[i] - paste(z[1], z[2] + , paste('', z[3], ',', z[4], '', sep = '') # put on quotes + , z[5], z[6], z[7], sep = ',' + ) + } # now read in the data result - read.table(textConnection(input), sep = ',') result V1 V2 V3 V4 V5 V6 1 1968 21 0 2 Boston 1968 13 0 3 Boston 1968 18 0 4 Chicago 1967 44 0 5 Providence 1968 17 0 6 Providence 1969 48 0 7 Binky 1968 24 0 8 Chicago 1968 23 0 9 Dally 1968 7 0 10 Raleigh, North Carol 1968 25 0 11 Addy ABC-Dogs Stars-W8.1Providence 1968 38 0 12 DEF_REQPRF/ Dartmouth 1967 31 1 13 PL 1967 38 1 14 XY PopatLal 1967 5 1 15 XY PopatLal 1967 6 8 16 XY PopatLal 1967 7 7 17 XY PopatLal 1967 9 1 18 XY PopatLal 1967 10 1 19 XY PopatLal 1967 13 1 20 XY PopatLal Boston 1967 6 1 21 XY PopatLal Boston 1967 7 11 22 XY PopatLal Boston 1967 9 2 23 XY PopatLal Boston 1967 10 3 24 XY PopatLal Boston 1967 7 2 On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal ashish.agarw...@gmail.com wrote: I have a file that is 5000 records and to edit that file is not easy. Is there any way to line 10 differently to account for changes in the third field? On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers ehl...@ucalgary.ca wrote: On 2012-03-16 10:48, Ashish Agarwal wrote: Line 10 has City and State that too separated by comma. For line 10 how can I read differently as compared to the other lines? 
Edit the file and put quotes around the city-state combination: "Raleigh, North Carol"

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
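A minimal sketch of the "case statement" Ashish describes, along the lines of Petr's count-the-fields suggestion, in base R. It assumes every record has either 6 or 7 comma-separated fields and that the file has no blank lines (count.fields() skips blank lines by default, which would shift the line numbers). The sample lines and the added line column are hypothetical. Each group is read in one pass with its own colClasses instead of being fixed line by line, which should also be lighter on a large file.

```r
# Hypothetical sample lines standing in for the real 5000-record file:
# most records have 6 fields; one has an unquoted "City, State" (7 fields)
lines <- c("XY,PopatLal,Boston,1967,6,1",
           "AB,Dally,Raleigh, North Carol,1968,25,0",
           "PL,Foo,Chicago,1967,38,1")
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

input   <- readLines(tmp)
nFields <- count.fields(tmp, sep = ",")  # one count per (non-blank) line

# The "case" on nFields; the line numbers play the role of SAS's _N_
rows6 <- which(nFields == 6)
rows7 <- which(nFields == 7)

df1 <- read.csv(text = input[rows6], header = FALSE,
                colClasses = c(rep("character", 3), rep("numeric", 3)))
df2 <- read.csv(text = input[rows7], header = FALSE,
                colClasses = c(rep("character", 4), rep("numeric", 3)))

# In the 7-field rows, glue the 3rd and 4th strings back together
df2$V3 <- paste(df2$V3, df2$V4, sep = ",")
df2$V4 <- NULL
names(df2) <- names(df1)

df1$line <- rows6
df2$line <- rows7

# Recombine and restore the original file order
combined <- rbind(df1, df2)
combined <- combined[order(combined$line), ]
combined
```

If the two groups should stay separate (dataframe1 and dataframe2 in Ashish's pseudocode), simply stop before the rbind(); the line column still records where each row came from.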
Re: [R] Help with dlply, loop and column names
Peter: many thanks for your help. This is basically what I wanted to do, and in a much more elegant way.

On Mon, Mar 19, 2012 at 03:13:40AM -0700, Peter Meilstrup wrote:
I'm not sure I follow exactly what group of regression models you want to create, but a good first step might be to use reshape so that each party's vote share goes on a different row and the vote shares are all in the same column. Then you can use plyr grouping on tipo and party to make your models...

library(reshape2)
library(plyr)
ast <- melt(asturias.gen2011, id=c("municipio", "total", "tipo"),
            variable.name="party", value.name="vote")
dlply(ast, .(party, tipo), lm, formula=vote~total)

or something along those lines. This way you don't have to mess around with pasting together expressions to eval and so on...

Peter

On Sun, Mar 18, 2012 at 12:59 PM, Igor Sosa Mayor joseleopoldo1...@gmail.com wrote:
Hi,

I have a dataframe basically like this:

head(asturias.gen2011[,c(1,4,9:14)])
     municipio            total upyd psoe  pp    iu    fac   tipo
440  Allande              2031  1.44 31.10 39.75  4.01 21.62 1000-1
443  Aller               12582  1.37 33.30 37.09 15.53 10.35 1-5
567  Amieva                805  1.48 32.69 37.36  6.15 20.16 1000
849  Avilés              84202  4.15 30.26 35.49 14.37 11.80 5
1087 Belmonte de Miranda  1751  1.66 38.42 35.74  7.22 14.81 1000-1
1260 Bimenes              1894  0.98 34.28 26.87 23.30 10.98 1000-1

I want to do the following:
1. For every party (psoe, pp, etc.) I want to create a variable like this: upyd.lm.tipos, psoe.lm.tipos, etc.
2. I want to store in this variable a regression (psoe~total), but split up by tipo.

I got the main idea of using dlply from the plyr vignette. But when I try to put all this in a loop I run into trouble, and at the moment I'm really confused about how to solve this problem. I have the following function:

elecregtipos <- function(y){
  z <- dlply(asturias.gen2011, .(tipo), function(x) lm(x[,y]~x$edad.media))
  # rsq <- function(x) summary(x)$r.squared
  # bcoefs <- ldply(z, function(x) c(coef(x), rsquare=rsq(x)))
  # return (bcoefs)
  return(z)
}

And I try to call it with:

for (y in c("upyd", "psoe", "pp", "fac", "iu")) {
  eval(parse(text=paste(y,'.lm.tipos', '<- elecregtipos(',y,')',sep='')))
}

At the moment I'm getting the error:

Error in `[.data.frame`(x, , y) : object 'upyd' not found

If I simply call:

elecregtipos("upyd")

it works perfectly. The problem is the loop, the column names, etc., but I'm really confused about what else I could try, because I have already tried every possibility I can think of. Any hint?

Thanks in advance.

--
:: Igor Sosa Mayor :: joseleopoldo1...@gmail.com ::
:: GnuPG: 0x1C1E2890 :: http://www.gnupg.org/ ::
:: jabberid: rogorido ::
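The error in Igor's loop comes from the pasted text itself: it builds upyd.lm.tipos <- elecregtipos(upyd) without quotes, so upyd is looked up as an object rather than passed as a column name (quoting it inside the pasted string would avoid that). A minimal sketch of a loop-free alternative in base R, assuming a data frame with the same column layout; the toy data and the choice of total as regressor (following Peter's reply) are hypothetical. reformulate() builds psoe ~ total from the column-name string, so no paste()/parse()/eval() is needed.

```r
# Hypothetical toy data with the same layout as asturias.gen2011
set.seed(1)
df <- data.frame(tipo  = rep(c("1000", "1000-1"), each = 10),
                 total = runif(20, 500, 5000),
                 upyd  = runif(20, 0, 5),
                 psoe  = runif(20, 25, 40))

parties <- c("upyd", "psoe")

# One list element per party; inside it, one lm() per tipo
models <- lapply(setNames(parties, parties), function(p)
  lapply(split(df, df$tipo), function(d)
    lm(reformulate("total", response = p), data = d)))

models$psoe[["1000"]]   # the psoe ~ total fit for tipo "1000"

# If separate objects such as psoe.lm.tipos are really wanted:
for (p in parties) assign(paste0(p, ".lm.tipos"), models[[p]])
```

Keeping the fits in one named list is usually easier to work with afterwards (for example with lapply() over models to extract coefficients) than a set of separately named variables.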
[R] plot method for rasters and layout
Hi list,

I thought I was used to layouts, but today I am facing a problem I cannot overcome. On my R installation (Windows 7 Pro, SP1, R version 2.13.0, daily update of packages), I am not able to put raster plots in user-defined layouts:

layout.matrix <- matrix(c(1,2,3,4,5,5),2,3)
layout(mat=layout.matrix)
layout.show(5)

works fine; I get the correct frames in the correct places. But using 5 graphs (that all plot OK if plotted alone):

plot(raster1)
plot(raster2)
plot(raster3)
plot(raster4)
plot(any.other.graph.meant.to.be.in.frame.5)

gives the same layout as:

par(mfrow=c(2,3))
plot(raster1)
plot(raster2)
plot(raster3)
plot(raster4)
plot(any.other.graph.supposed.to.fall.in.frame.5)

i.e. 3 raster plots on the first row, followed by the fourth raster and the fifth graph, all of the same size, with the [2,3] frame left empty. I suppose this is due to a conflict between layout and the bigplot/smallplot approach used by the image.plot() function, from which the plot method for rasters is said to be inspired. But I am not sure, and I cannot work it out. Am I missing something, and can anybody help?

All the best to all of you, and thanks as always for all the work done here!
Olivier

--
Olivier ETERRADOSSI
Maître-Assistant, HDR
Ecole des Mines d'Alès (CMGD, site de Pau)
Pôle Matériaux Polymères Avancés (MPA)
Hélioparc, 2 av. P. Angot, F-64053 PAU CEDEX 9
Tel : 05 59 30 90 35 (direct) - 05 59 30 54 25 (std)
Fax : 05 59 30 63 68
[R] Call for chapters: Data Mining Applications with R
Book title: Data Mining Applications with R
URL: http://www.rdatamining.com/books/book2
Publisher: Elsevier
Chapter proposal due date: 30 April 2012

Introduction
------------
R is one of the most widely used data mining tools in scientific and business applications, among dozens of commercial and open-source data mining software packages. It is free and extensible, with over 3,600 packages. However, it is not easy for beginners to find appropriate packages or functions for their data mining tasks. It is more difficult, even for experienced users, to work out the optimal combination of multiple packages or functions to solve their business problems, and the best way to use them in the data mining process of their applications. This book aims to facilitate the use of R in data mining applications by presenting real-world applications in various areas.

Objective
---------
This book will present around 20 applications of data mining with R. Each application is to be presented as one chapter, covering its background, business problems, data extraction and exploration, data preprocessing, modeling, model evaluation, findings and model deployment. In this way, it will help readers learn to solve real-world problems with a set of data mining techniques and then apply the techniques and methodologies in their own data mining projects. Code examples and sample data will be provided, so that readers can easily learn the techniques by running the code themselves.

Target audience
---------------
The audience includes data miners, analysts and R users from industry, as well as university students and researchers who are interested in data mining with R.

Topics
------
Data mining applications with R in, but not limited to, the following areas:
* Finance
* Retail
* Insurance
* Telecommunications
* Government
* Crime and Homeland Security
* Stock Market
* Social Welfare
* Social Media
* Sports
* Medicine and Health
* Education
* Patent
* Transport
* Real Estate
* Meteorology
* Bioinformatics
* Sentiment Analysis
* Spatial Data Analysis
* Scientific Computing

Submission procedure
--------------------
Data miners and analysts are invited to submit, by April 30, 2012, a 1-2 page manuscript proposal clearly explaining the mission and concerns of the proposed chapter. Authors of accepted proposals will be notified by May 15, 2012 about the status of their proposals. Full chapters are due by July 31, 2012. All submitted chapters will be reviewed by 2 or 3 reviewers. Please submit your chapter proposals and full chapters at https://www.easychair.org/account/signin.cgi?conf=dmar2013. Details about the book are available at http://www.rdatamining.com/books/book2.

Book editors and contacts
-------------------------
Dr. Yanchang Zhao
RDataMining.com, Australia
yanchangzhao at gmail dot com

Mr. Yonghua Cen
Univ. of Technology, Sydney, Australia
justin.cen at gmail dot com
Re: [R] plot method for rasters and layout
Is this with SDI in Windows? I'd update to a recent version of R, and please provide reproducible code next time. It could be the same as this issue, now long ago fixed:

https://stat.ethz.ch/pipermail/r-devel/2011-February/059906.html

Cheers, Mike.

On Mon, Mar 19, 2012 at 9:57 PM, Olivier Eterradossi olivier.eterrado...@mines-ales.fr wrote:
[original message snipped]

--
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com
Re: [R] plot method for rasters and layout
Mike,

It is with SDI in Windows. Here is reproducible code (by the way, I just added the loading of a raster, which I then plot four times):

library(raster)
b <- brick(system.file("external/rlogo.grd", package="raster"))
layout.matrix <- matrix(c(1,2,3,4,5,5),2,3,byrow=TRUE)
layout(mat=layout.matrix)
layout.show(5)
plot(b[[1]])
plot(b[[1]])
plot(b[[1]])
plot(b[[1]])

Running this, I get the four R logos in quadrants [,1:3] and [2,1], not in [1:2,1:2].

I've read the thread you suggest; I'm afraid I don't fully understand it. I last updated R from CRAN in December 2011, so it seems that was after the fix. I'll update to 2.14 asap.

Thank you.
Olivier

-----Original Message-----
From: Michael Sumner [mailto:mdsum...@gmail.com]
Sent: Monday, 19 March 2012 13:23
To: Olivier Eterradossi
Cc: r-help@r-project.org
Subject: Re: [R] plot method for rasters and layout
[quoted message snipped]
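To check whether layout() itself behaves on this device, independently of the raster plot method, here is a base-graphics sketch that uses only image() and the built-in volcano matrix with the same byrow = TRUE layout as Olivier's example. If these panels land in the expected frames but the raster version does not, the problem more likely lies in how the raster plot method sets up its plot and legend regions than in layout(); that is a guess for narrowing things down, not a diagnosis.

```r
# Same layout as in the reproducible example: frames 1-4 are single
# cells, frame 5 spans the last two cells of the second row
layout.matrix <- matrix(c(1, 2, 3, 4, 5, 5), nrow = 2, ncol = 3, byrow = TRUE)
n <- layout(layout.matrix)   # layout() returns the number of figure regions

# Four base-graphics images (volcano is a built-in matrix) ...
for (i in 1:4) image(volcano, main = paste("frame", i))
# ... and an ordinary plot that should fill the wide frame 5
plot(1:10, main = "frame 5")
```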
Re: [R] assign a value to an element
I am not sure that I understand, but does something like this do what you want?

vec <- 1:10
vec[vec==4] <- 100

vec <- 1:10
vec[vec==4 | vec==8] <- 100

vec <- 1:10
aa <- 50
vec[vec==4] <- aa

John Kane
Kingston ON Canada

-----Original Message-----
From: marc_...@yahoo.fr
Sent: Sun, 18 Mar 2012 18:24:34 + (GMT)
To: r-help@r-project.org
Subject: [R] assign a value to an element

assign can be used to set a value to a variable whose name is the value of another variable. Example:

name <- "essai"
assign(name, "plouf")
essai
[1] "plouf"

OK. But how do I do the same when only an element of a vector, data frame and so on must be changed?

vec <- 1:10
vec
 [1]  1  2  3  4  5  6  7  8  9 10
vec[4]
[1] 4
name <- "vec[4]"
assign(name, 100)
vec
 [1]  1  2  3  4  5  6  7  8  9 10

The reason is probably here (from the help for assign): "assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc."

I have found this solution:

eval(parse(text=paste(name, "<-100", sep="")))
vec
 [1]   1   2   3 100   5   6   7   8   9  10

Is it the only way? It is not very elegant!

Thanks a lot
Marc

__
Marc Girondot, Pr
Laboratoire Ecologie, Systématique et Evolution
Equipe de Conservation des Populations et des Communautés
CNRS, AgroParisTech et Université Paris-Sud 11, UMR 8079
Bâtiment 362
91405 Orsay Cedex, France
Tel: 33 1 (0)1.69.15.72.30 Fax: 33 1 (0)1.69.15.73.53
e-mail: marc.giron...@u-psud.fr
Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html
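In Marc's attempt, name holds "vec[4]", which assign() treats as a literal variable name. A minimal sketch that avoids parse() by keeping only the object name in the string and applying the index separately, using the base functions get(), replace() and assign(); the second variant keeps related objects in a list and indexes it by name.

```r
vec  <- 1:10
name <- "vec"   # the *object* name, not "vec[4]"

# Fetch the object by name, modify a copy, and assign it back
assign(name, replace(get(name), 4, 100))
vec

# Alternative: keep related objects in a list and index by name
objs <- list(vec = 1:10)
objs[[name]][4] <- 100
objs$vec
```

Note that replacing an element of an integer vector with 100 (a double) turns the whole vector into a double vector; the same happens with vec[4] <- 100.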
[R] Reshape data frame with dcast and melt
Hello,

I implemented two functions, reshape_long and reshape_wide (see the full working example below), to reshape data frames. I created several small examples, and the two functions seemed to work properly. However, using the reshape_wide function on my real data sets (about 200,000 to 300,000 rows) failed: all values for X, Y and Z were set to 1. The structure of my real data looks exactly the same as the small example below. After working on it for 2 days, I think the problem is that the primary key (test_name, group_name and id) is only unique in the wide form. After applying the reshape_long function, the primary key is no longer unique. I was wondering if anyone can tell me whether the step d1 -> reshape_wide -> d2 can work at all, given that the key is not unique in d1.

library(reshape2)
library(taRifx)

reshape_long <- function(data, ids) {
  # Bring data into long form
  data_long <- melt(data, id.vars = ids, variable.name="Data_Points", value.name="value")
  data_long$value <- as.numeric(data_long$value)
  # Remove rows where analyte value is NA
  data_long <- data_long[!is.na(data_long$value), ]
  # Resort data
  formula_sort <- as.formula(paste("~", paste(ids, collapse="+")))
  data_long <- sort(data_long, f = formula_sort)
  return(data_long)
}

reshape_wide <- function(data, ids) {
  # Bring data into wide form
  formula_wide <- as.formula(paste(paste(ids, collapse="+"), "~ Data_Points"))
  data_wide <- dcast(data, formula_wide)
  # Resort data
  formula_sort <- as.formula(paste("~", paste(ids, collapse="+")))
  data_wide <- sort(data_wide, f = formula_sort)
  return(data_wide)
}

d <- data.frame(
  test_name = c(rep("Test_A", 6), rep("Test_B", 6)),
  group_name = c(rep("Group_C", 3), rep("Group_D", 3), rep("Group_C", 3), rep("Group_D", 3)),
  id = c("I1", "I2", "I3", "I4", "I5", "I6", "I1", "I2", "I3", "I7", "I8", "I9"),
  X = c(NA,NA,1,2,3,4,5,6,NA,7,8,9),
  Y = as.numeric(10:21),
  Z = c(NA,22,23,NA,24,NA,25,26,NA,27,28,29)
)
d
d1 <- reshape_long(d, ids=c("test_name", "group_name", "id"))
d1
d2 <- reshape_wide(d1, ids=c("test_name", "group_name", "id"))
d2
identical(d,d2)
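In long form the key has to include Data_Points as well: the round trip d1 -> d2 can only be lossless if the combination of the id columns plus Data_Points is unique. When it is not, dcast() has more than one value per cell and needs an aggregation function; without fun.aggregate it falls back to length (with a message), which would replace values by counts. A minimal base-R sketch for finding the offending rows before widening; the small d1 below is hypothetical.

```r
# Hypothetical long-format data shaped like d1, with one duplicated key
d1 <- data.frame(test_name   = "Test_A",
                 group_name  = "Group_C",
                 id          = c("I1", "I1", "I2"),
                 Data_Points = c("X", "X", "Y"),
                 value       = c(1, 2, 3))
ids <- c("test_name", "group_name", "id")

# In long form the key is the ids *plus* the variable column
key <- d1[c(ids, "Data_Points")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
d1[dup, ]   # every row whose key occurs more than once
any(dup)    # TRUE here: the wide cell (Test_A, Group_C, I1, X) is ambiguous
```

If any(dup) is TRUE on the real data, either the key needs another column or the duplicates have to be resolved (or aggregated deliberately) before calling reshape_wide.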
[R] Issue with asin()
Hello everyone,

I have been working for a few days on a basic algorithm, very common in applied agronomy, that aims to determine the degree-days necessary for a given individual to reach a given growth stage. The algorithm (and context) is explained here: http://www.oardc.ohio-state.edu/gdd/glossary.htm, and I implemented my function in R as follows:

DD <- function(Tmin, Tmax, Tseuil, meanT, method = "DDsin")
### function that calculates the degree-days based on
### minimum and maximum recorded temperatures and the
### minimal threshold temperature (lower growth temperature)
{
  ### method arcsin
  if(method == "DDsin"){
    cond1 <- (Tmax <= Tseuil)
    cond2 <- (Tmin >= Tseuil)
    amp <- ((Tmax - Tmin) / 2)
    print((Tseuil-meanT)/amp)
    alpha <- asin((Tseuil - meanT) / amp)
    DD_ifelse3 <- ((1 / pi) * ((meanT - Tseuil) * ((pi/2) - alpha)) + amp*cos(alpha))
    DD <- ifelse(cond1, 0, ifelse(cond2, (meanT - Tseuil), DD_ifelse3))
  }
  ### method (Tmin + Tmax) / 2
  else if(method == "DDt2"){
    cond1 <- (meanT > Tseuil)
    DD <- ifelse(cond1,(meanT - Tseuil),0)
  }
  else{
    stop("\nMethod name is invalid.\nMethods available = DDsin (sinus) or DDt2 (mean)\n")
  }
  return(DD)
}

BUT! When I try to process random data:

library(reshape2)
library(plyr)
station <- rep(c("station1","station2","station3"), 20)
values_min <- sample(-5:20, size = 60, replace = T)
values_max <- sample(20:40, size = 60, replace = T)
meanT <- ((values_min+values_max)/2)
d <- data.frame(station,values_min,values_max,meanT)
names(d) <- c("station", "values_min","values_max","meanT")
x <- ddply(d, .(station), transform, t1 = cumsum(DD(values_min,values_max,0,meanT)))

I get a warning on my alpha calculation (NaN produced); indeed, the values I give as argument to asin() are outside the range [-1, 1], as the print() reveals. I can't figure out how to solve this issue, because the same algorithm works in Excel (Visual Basic). It is very annoying, especially because no report of such an error with this algorithm seems to exist on the Internet.

Any help is welcome :)
Thanks for your time
P.
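The NaN warnings come from evaluating asin() for every day, including days where the threshold lies below Tmin or above Tmax; there (Tseuil - meanT)/amp falls outside [-1, 1]. ifelse() then discards those positions, so the result is not affected, only the warning appears. A minimal sketch, assuming a single scalar threshold, that calls asin() only where Tmin < Tseuil < Tmax, where the argument is always strictly inside (-1, 1). It keeps amp * cos(alpha) inside the 1/pi factor, as in the usual single-sine formula; in the posted DD_ifelse3 that term sits outside the 1/pi factor, which is worth double-checking against the OARDC page. dd_sine is a hypothetical name.

```r
# Single-sine degree-days, vectorised over days.
# Assumes Tseuil is a single number (the lower threshold).
dd_sine <- function(Tmin, Tmax, Tseuil) {
  meanT <- (Tmin + Tmax) / 2
  amp   <- (Tmax - Tmin) / 2
  dd    <- numeric(length(Tmin))          # Tmax <= Tseuil: no growth, stays 0

  above <- Tmin >= Tseuil                 # whole day above the threshold
  dd[above] <- meanT[above] - Tseuil

  mid <- !above & Tmax > Tseuil           # threshold crossed during the day
  alpha <- asin((Tseuil - meanT[mid]) / amp[mid])   # argument is in (-1, 1) here
  dd[mid] <- ((meanT[mid] - Tseuil) * (pi / 2 - alpha) +
                amp[mid] * cos(alpha)) / pi
  dd
}

# Reproducible random data, as Duncan suggests in the thread
set.seed(1)
values_min <- sample(-5:20, size = 60, replace = TRUE)
values_max <- sample(20:40, size = 60, replace = TRUE)
dd <- dd_sine(values_min, values_max, 0)   # no NaN warnings
head(cumsum(dd))
```

Inside ddply() it can be used the same way as the original, e.g. t1 = cumsum(dd_sine(values_min, values_max, 0)).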
[R] Problem with package tensor
Hi,

I'm using R to create multidimensional data, i.e. tensors. For my work, R is very good for importing the data, and I have seen that there are packages to manage tensors and to factorize them. I would like some help with the packages tensor and tensorA. Unfortunately, I have found that there is very little documentation for them, and it did not help me much.

Briefly, my situation: I have some data arrays of different sizes; they are large matrices. From these I would like to create a tensor. Can someone tell me how to do it? Could you give me an example that helps me understand how to build it?

Thank you.
giuseppe.
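A minimal base-R sketch, not using tensor or tensorA themselves: equal-sized matrices can be stacked into a 3-way array with array(), which is the kind of object those packages operate on. The matrices m1 to m3 are hypothetical; matrices of different sizes cannot be stacked directly and would first need to be padded or trimmed to a common size. The last line shows the mode-1 unfolding (matricization) used by many tensor factorizations; it relies on R storing arrays in column-major order.

```r
m1 <- matrix(1:6,   nrow = 2)
m2 <- matrix(7:12,  nrow = 2)
m3 <- matrix(13:18, nrow = 2)
mats <- list(m1, m2, m3)

# Stack the matrices along a third mode: X[i, j, k] = mats[[k]][i, j]
X <- array(unlist(mats), dim = c(nrow(m1), ncol(m1), length(mats)))
dim(X)        # 2 3 3
X[, , 2]      # identical to m2

# Mode-1 unfolding: a 2 x 9 matrix, i.e. the slices side by side
X1 <- matrix(X, nrow = dim(X)[1])
```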
Re: [R] plotting border over map
Hi Ray,

Thanks for the reply.

/R/PlotGridded2DMap.R
/R/image.plot.fix.R
/R/image.plot.plt.fix.r

are the functions that I wrote for plotting, and they work with other data; I only have an issue with the code I provided before. And what do you mean by "I am redefining the map() function in there"?
[R] what is p,d q in arima() function of time series
I am new to time series. I found this in the help for arima:

arima(x = data, order = c(p, d, q))

What exactly are p, d and q? If I don't change them, what effect will that have?
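In stats::arima, order = c(p, d, q) gives the number of autoregressive terms (p), the number of times the series is differenced (d), and the number of moving-average terms (q). The default is c(0, 0, 0), which fits nothing but a mean plus white noise, so leaving it unchanged ignores any autocorrelation in the series. A small sketch on a simulated series; the ARMA coefficients passed to arima.sim() are arbitrary choices, and comparing AIC is one common way to choose among candidate orders.

```r
set.seed(42)
# Simulated ARMA(1,1) series; the coefficients are arbitrary choices
y <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 200)

fit0  <- arima(y)                       # default order c(0, 0, 0): mean only
fit11 <- arima(y, order = c(1, 0, 1))   # p = 1 AR term, d = 0, q = 1 MA term
coef(fit0)     # intercept only
coef(fit11)    # ar1, ma1, intercept

# d = 1: difference once, e.g. for a random-walk-like series
fit_d <- arima(cumsum(y), order = c(1, 1, 1))
coef(fit_d)    # ar1, ma1 (no mean is fitted when d > 0)

AIC(fit0, fit11)
```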
[R] How to cluster/classify following time series?
How can I cluster/classify the attached time series? (Each column/time series should be treated as a single unit while clustering/classifying.) If my concept is wrong, please tell me how to extract the time series with the highest information content. The file is here:

http://r.789695.n4.nabble.com/file/n4484173/rasta.txt
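One common approach, sketched here under assumptions: treat each column as one object, measure dissimilarity between series as 1 minus their correlation, and run hierarchical clustering on that. The simulated columns stand in for rasta.txt, whose layout is not known here; with the real file something like as.matrix(read.table("rasta.txt", header = TRUE)) might produce X, but that depends on the file format. Correlation distance ignores level and scale; Euclidean distance on scale(X) columns is an alternative.

```r
set.seed(1)
tt <- 1:100
# Hypothetical series standing in for the columns of rasta.txt:
# two noisy sines and two noisy, slower cosines
X <- cbind(s1 = sin(tt / 5)  + rnorm(100, sd = 0.1),
           s2 = sin(tt / 5)  + rnorm(100, sd = 0.1),
           c1 = cos(tt / 10) + rnorm(100, sd = 0.1),
           c2 = cos(tt / 10) + rnorm(100, sd = 0.1))

# Each column is one object: distance = 1 - correlation between series
d  <- as.dist(1 - cor(X))
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)   # named vector: cluster label per series
groups
plot(hc)                      # dendrogram of the series
```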
[R] Save File after order
Hello,

I'm trying to write the sorted data of a data.frame to a file. My problem is that when I write the file, an extra column with the row names is added, which apparently holds each row's original position. I want to write the file without this column.

x <- data.frame(name="x1",Time=20)
x <- rbind(x,data.frame(name="x2",Time=25))
x <- rbind(x,data.frame(name="x3",Time=23))
x <- rbind(x,data.frame(name="x2",Time=45))
x <- rbind(x,data.frame(name="x1",Time=25))
x <- rbind(x,data.frame(name="x1",Time=55))
x <- x[order(x$name),]
View(x)
write.csv(data.frame(x$name,x$Time), file = "~/Desktop/DatasetOrder.csv")

At the moment this is saved:

  name Time
1   x1   20
5   x1   25
6   x1   55
2   x2   25
4   x2   45
3   x3   23

The idea is to save:

name Time
x1 20
x1 25
x1 55
x2 25
x2 45
x3 23

Thanks
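write.csv() writes the row names as the first column unless told otherwise; passing row.names = FALSE drops them. A minimal sketch with the same data, built in one data.frame() call instead of repeated rbind(), and a temporary file standing in for ~/Desktop/DatasetOrder.csv.

```r
x <- data.frame(name = c("x1", "x2", "x3", "x2", "x1", "x1"),
                Time = c(20, 25, 23, 45, 25, 55))
x <- x[order(x$name), ]

out <- tempfile(fileext = ".csv")   # stands in for "~/Desktop/DatasetOrder.csv"
write.csv(x, file = out, row.names = FALSE)   # drop the row-name column
readLines(out)
```

Writing x directly (rather than data.frame(x$name, x$Time)) also keeps the original column names name and Time in the header.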
[R] fitted values with locfit
Dear members,

I'm trying to estimate the following multivariate local regression model using the locfit package:

BMI = m1(RCC) + m2(WCC)

where m1 and m2 are unknown smooth functions. My problem is that once the regression is done, I cannot get the fitted values of each of these smooth functions m1 and m2. This is what I write:

library(locfit)
data(ais)
fit2 <- locfit.raw(x=lp(ais$RCC,h=0.5,deg=1)+lp(ais$WCC,deg=1,h=0.75), y=ais$BMI, ev=dat(), kt="prod", kern="gauss")
g21 <- predict(fit2, type="terms")

If I run this, the result g21 is a vector, when I should get a matrix with 2 columns (one for each fitted smooth function). Does anybody know how I can get the estimated fitted values of both smooth functions m1 and m2 using local linear regression with kernel weights, as in this example?

Thanks a lot in advance; I'm very desperate.
Alexandra
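I am not sure how predict(..., type = "terms") treats this locfit fit, so the following is a swapped-in technique rather than a locfit answer: an additive model such as BMI = m1(RCC) + m2(WCC) is classically fitted by backfitting, which yields m1 and m2 directly as two columns. A minimal sketch using base R's loess() with degree = 1 (local linear) as the smoother. backfit is a hypothetical helper; the spans are fractions of the data rather than locfit's bandwidth h, so the values here are placeholders; the data are simulated; and a fixed number of iterations stands in for a proper convergence check.

```r
# Backfitting for y = alpha + m1(x1) + m2(x2) + error,
# with loess (local linear, degree = 1) as the smoother
backfit <- function(y, x1, x2, span1 = 0.5, span2 = 0.75, iter = 20) {
  alpha <- mean(y)
  m1 <- m2 <- numeric(length(y))
  for (i in seq_len(iter)) {
    r1 <- y - alpha - m2                        # partial residuals for m1
    m1 <- fitted(loess(r1 ~ x1, span = span1, degree = 1))
    m1 <- m1 - mean(m1)                         # centre for identifiability
    r2 <- y - alpha - m1                        # partial residuals for m2
    m2 <- fitted(loess(r2 ~ x2, span = span2, degree = 1))
    m2 <- m2 - mean(m2)
  }
  cbind(m1 = m1, m2 = m2)   # one column per fitted smooth function
}

# Simulated example with known components
set.seed(7)
n  <- 200
x1 <- runif(n, 0, 2 * pi)
x2 <- runif(n, -1, 1)
y  <- 3 + sin(x1) + x2^2 + rnorm(n, sd = 0.2)
fit <- backfit(y, x1, x2)
head(fit)
```

With the ais data this would be called as backfit(ais$BMI, ais$RCC, ais$WCC), with spans chosen for that data.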
Re: [R] a very simple question
Thanks a lot for the clarification. I just find it very bizarre that if you run

a=0.1*(1:9);which(a==0.4)

it returns the right answer. Anyway, I will pay attention next time. Thanks a lot.

Date: Mon, 19 Mar 2012 08:59:59 +0100
From: rainer.schuerm...@gmx.net
Subject: Re: [R] a very simple question
To: ldjst...@hotmail.com; r-help@r-project.org

As to the reasons, David has given you the necessary hints. In order to get around the issue, here is what I do:

a <- round( 0.1 * ( 1:9 ), 1 )
a
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
which( a == 0.3 )
[1] 3

Rgds,
Rainer

-------- Original Message --------
Date: Sun, 18 Mar 2012 21:43:54 +
From: Dajiang Liu ldjst...@hotmail.com
To: r-help@r-project.org
Subject: [R] a very simple question
[quoted message snipped]
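This is R FAQ 7.31 ("Why doesn't R think these numbers are equal?"). A short sketch of why 0.3 fails while 0.4 succeeds, and of comparing with a tolerance instead of ==. Multiplying 0.1 by 4 only shifts the binary exponent, so the product is exactly the double nearest 0.4; multiplying by 3 introduces a rounding error in the last bit. The tolerance 1e-8 is an arbitrary choice.

```r
a <- 0.1 * (1:9)

which(a == 0.3)                 # integer(0): 0.1 * 3 is not the double nearest 0.3
sprintf("%.17g", c(a[3], 0.3))  # "0.30000000000000004" "0.29999999999999999"
which(a == 0.4)                 # 4: 0.1 * 4 happens to equal 0.4 exactly

# Compare with a tolerance instead of ==
which(abs(a - 0.3) < 1e-8)      # 3
isTRUE(all.equal(a[3], 0.3))    # TRUE
```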
[R] regression with proportion data
Hello,

I want to determine the regression relationship between a proportion (y) and a continuous variable (x). Having read a number of sources (e.g. The R Book, Quick-R, the help pages), I believe I should be able to specify the model as:

model <- glm(formula=proportion~x, family=binomial(link="logit"))

This runs, but gives me a warning message:

Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

If I transform the proportion variable with log, it doesn't like that either (values not 0 <= y <= 1).

I understand that the binomial family concerns successes vs. failures and can use those raw data, but The R Book and other sources seem to suggest that proportion data are usable as well. Is that not so?

Thank you,
Georgiana May
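R's documentation for the binomial family (?family) allows the response to be a proportion between 0 and 1, with the number of trials supplied through weights; the warning appears when proportion times weight (1 by default) is not a whole number. A minimal sketch on simulated data (n_trials, successes and the coefficients 0.5 and 1.2 are hypothetical) showing that the weighted-proportion fit matches the cbind(successes, failures) fit, plus the quasibinomial family for when trial counts are unknown, which accepts non-integer proportions without that check. Beta regression, suggested later in the thread, is another option not shown here.

```r
set.seed(123)
n_trials   <- rep(20, 50)                        # trials per observation
x          <- runif(50, -2, 2)
successes  <- rbinom(50, size = n_trials, prob = plogis(0.5 + 1.2 * x))
proportion <- successes / n_trials

# Proportion response + number of trials as prior weights
fit_w <- glm(proportion ~ x, family = binomial(link = "logit"),
             weights = n_trials)

# Equivalent two-column (successes, failures) response
fit_c <- glm(cbind(successes, n_trials - successes) ~ x, family = binomial)

# Trial counts unknown: quasibinomial takes raw proportions
fit_q <- glm(proportion ~ x, family = quasibinomial(link = "logit"))

coef(fit_w)
coef(fit_q)
```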
[R] Dotplot: how to change size in the y lab ?
Hi everyone,

I'm trying to reduce the font size of the y-axis labels in this plot:
http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=150

Does anyone know how to do it? I have tried the arguments lab.cex and cex, but neither of them works! If you want to check, here is the code:

### read the data
d <- read.csv( file( "http://addictedtor.free.fr/graphiques/data/150/data.txt" ) )
### workaround so that lattice does not order bank names alphabetically
d$bank <- ordered( d$bank, levels = d$bank )
### load lattice and grid
require( lattice )
require( grid )
### setup the key
k <- simpleKey( c( "Q2 2007", "January 20th 2009" ) )
k$points$fill <- c("lightblue", "lightgreen")
k$points$pch <- 21
k$points$col <- "black"
k$points$cex <- 1
### create the plot
dotplot( bank ~ MV2007 + MV2009 , data = d, horiz = T,
  par.settings = list(
    superpose.symbol = list( pch = 21, fill = c( "lightblue", "lightgreen"),
                             cex = 4, col = "black" )
  ),
  xlab = "Market value ($Bn)", key = k,
  panel = function(x, y, ...){
    panel.dotplot( x, y, ... )
    grid.text( unit( x, "native") , unit( y, "native") ,
               label = x, gp = gpar( cex = .7 ) )
  } )

Thank you in advance!
José
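In lattice, tick-label size is set per axis through the scales argument rather than a top-level cex. A minimal sketch, assuming it is the bank names (the y tick labels) that should shrink; the data frame is a hypothetical stand-in for the file read from the URL above, and auto.key replaces the hand-built key for brevity. If the axis title itself should shrink, ylab also accepts a list such as list("Bank", cex = 0.8).

```r
library(lattice)

# Hypothetical stand-in for the bank data read from the URL in the post
d <- data.frame(bank   = paste("Bank", LETTERS[1:8]),
                MV2007 = seq(100, 30, length.out = 8),
                MV2009 = seq(20, 60, length.out = 8))
d$bank <- factor(d$bank, levels = d$bank)   # keep the given order

p <- dotplot(bank ~ MV2007 + MV2009, data = d,
             auto.key = TRUE,
             xlab = "Market value ($Bn)",
             # y-axis tick labels (the bank names) at 60% of normal size
             scales = list(y = list(cex = 0.6)))
print(p)
```

In the original call, the same scales = list(y = list(cex = 0.6)) argument can simply be added to the existing dotplot() arguments.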
Re: [R] Issue with asin()
On 12-03-19 7:42 AM, Letnichev wrote:
> Hello everyone, [...]
> cond1 <- (Tmax <= Tseuil)
> cond2 <- (Tmin >= Tseuil)

These look like useful diagnostics of out-of-range values, but you don't use them before the arcsin transformation.

> [...]
> BUT! When I try to process random data:

It's a good idea to use set.seed when trying to debug problems like this. Then you can construct a reproducible example. I'd also suggest getting rid of ddply, at least for debugging; it makes it harder to see what's going on.

Duncan Murdoch
Re: [R] regression with proportion data
The logit link requires a binary response variable, not a proportion. A better bet is a beta regression. You can also do some things with linear regression if you apply transformations, but linear regression assumes the outcome can be any number on the real line between -Inf and Inf.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Georgiana May
Sent: Monday, March 19, 2012 10:06 AM
To: r-help@r-project.org
Subject: [R] regression with proportion data

Hello,
I want to determine the regression relationship between a proportion (y) and a continuous variable (x). Reading a number of sources (e.g. The R Book, Quick R, help), I believe I should be able to specify the model as:

model <- glm(formula = proportion ~ x, family = binomial(link = logit))

This runs but gives me this warning:

Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

If I transform the proportion variable with log, it doesn't like that either (values not: 0 <= y <= 1).

I understand that the binomial function concerns successes vs. failures and can use those raw data, but the R Book and other sources seem to suggest that proportion data are usable as well. Not so?

Thank you,
Georgiana May
Re: [R] Issue with asin()
On 19-03-2012, at 12:42, Letnichev wrote:
> Hello everyone,
> I have been working for a few days on a basic algorithm, very common in applied agronomy, that determines the degree-days a given individual needs to reach a given growth stage. [...]
> I get a warning on my alpha calculation (NaN produced); indeed, the values I give as argument to asin() are out of the range [-1, 1], as the print() reveals. I can't figure out how to solve this issue, because the same algorithm works in Excel (Visual Basic).

That doesn't mean that Excel and/or Visual Basic gives correct answers. With the same input? Then what does Excel say that asin(-7.4) evaluates to? I tried asin(-1.2) and asin(-7.4) in LibreOffice Calc (3.5.0) and got #VALUE! (Error: wrong data type) twice.

You'll have to present correct input to asin() if you want to avoid the NaNs.

Berend

> It is very annoying, especially because it seems that no occurrence of such an error using that algorithm can be found on the Internet. Any help is welcome :) Thanks for your time
> P.
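Berend's point can be checked directly in R: asin() is only defined on [-1, 1] and returns NaN (with a "NaNs produced" warning) outside it. A small sketch, assuming the goal is to detect the offending inputs rather than to clamp them silently:

```r
x <- c(-7.4, -1.2, -1, 0, 0.5, 1)

# Outside [-1, 1] asin() yields NaN; the warning is suppressed here
res <- suppressWarnings(asin(x))
print(res)

# Flag the inputs that are outside the domain before calling asin()
bad <- abs(x) > 1
x[bad]
```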
Re: [R] a very simple question
On 19-03-2012, at 13:47, Dajiang Liu wrote:
> Thanks a lot for the clarification. I just find it very bizarre that if you run a=0.1*(1:9);which(a==0.4) it returns the right answer. Anyway, I will pay attention next time. Thanks a lot.

Look at

a = 0.1*(1:4)
a - 0.4
[1] -0.3 -0.2 -0.1  0.0

Berend

> Date: Mon, 19 Mar 2012 08:59:59 +0100
> From: rainer.schuerm...@gmx.net
> Subject: Re: [R] a very simple question
> To: ldjst...@hotmail.com; r-help@r-project.org
>
> As to the reasons, David has given you the necessary hints. To get around the issue, here is what I do:
>
> a <- round( 0.1 * ( 1:9 ), 1 )
> a
> [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
> which( a == 0.3 )
> [1] 3
>
> Rgds,
> Rainer
>
> -------- Original message --------
> Date: Sun, 18 Mar 2012 21:43:54 +0000
> From: Dajiang Liu ldjst...@hotmail.com
> To: r-help@r-project.org
> Subject: [R] a very simple question
>
> Dear All, I have a seemingly very simple question, but I just cannot figure out the answer. I attempted to run the following: a=0.1*(1:9);which(a==0.3); it returns integer(0). But obviously, the third element of a is equal to 0.3. I must have missed something. Can someone kindly explain why? Thanks a lot.
> Regards, Dajiang
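Besides rounding, a common alternative is to compare with a tolerance instead of ==. A sketch; the helper name which_near and its default tolerance are illustrative choices, not something proposed in the thread:

```r
a <- 0.1 * (1:9)

a[3] == 0.3                    # FALSE: 0.1*3 is not the double nearest to 0.3
isTRUE(all.equal(a[3], 0.3))   # TRUE: equal up to a small relative tolerance

# Hypothetical helper: indices of x that match target within tol
which_near <- function(x, target, tol = sqrt(.Machine$double.eps)) {
  which(abs(x - target) < tol)
}

which_near(a, 0.3)   # 3
which_near(a, 0.4)   # 4
```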
Re: [R] regression with proportion data
Your response variable is not binomial; it's a proportion. Try the betareg function in the betareg package, which more correctly assumes that your response variable is Beta distributed (but beware that 1 and 0 are not allowed). The syntax is the same as in a glm.

HTH
Ruben

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Georgiana May
Sent: Monday, March 19, 2012 15:06
To: r-help@r-project.org
Subject: [R] regression with proportion data

> Hello, I want to determine the regression relationship between a proportion (y) and a continuous variable (x). [...]
Re: [R] regression with proportion data
Hi Georgiana,

Take a look at the betareg package at http://cran.r-project.org/web/packages/betareg/index.html

HTH,
Jorge.-

On Mon, Mar 19, 2012 at 10:05 AM, Georgiana May wrote:
> Hello, I want to determine the regression relationship between a proportion (y) and a continuous variable (x). [...]
Re: [R] Issue with asin()
Hi,

You're not following the algorithm as given. The asin step shouldn't be done for all values, but only for the ones that don't meet the previous conditions. You're trying to calculate that step for ALL values, then only use certain ones. You must instead subset the values, THEN calculate that step.

I would guess that your working Excel version does follow the correct algorithm, but it's hard to know for certain.

Here's a version that more closely follows the given reference:

MaxDailyTemp <- values_max
MinDailyTemp <- values_min
k <- 0
GDD <- rep(0, length(MaxDailyTemp))
AvgDailyTemp <- (MaxDailyTemp + MinDailyTemp)/2

# if MaxDailyTemp < k
#   GDD = GDD + 0
# - add 0

# if MaxDailyTemp > k & MinDailyTemp > k
#   GDD = GDD + AvgDailyTemp - k
GDD[MaxDailyTemp > k & MinDailyTemp > k] <- AvgDailyTemp[MaxDailyTemp > k & MinDailyTemp > k] - k

# if MaxDailyTemp > k & MinDailyTemp < k
#   GDD = GDD + (1/pi) * [ (AvgDailyTemp - k) * ( ( pi/2 ) - arcsine( theta ) ) + ( a * cos( arcsine( theta ) ) ) ]
a <- (MaxDailyTemp - MinDailyTemp)/2
theta <- ((k - AvgDailyTemp)/a)
GDD[MaxDailyTemp > k & MinDailyTemp < k] <- (1/pi) * ( (AvgDailyTemp[MaxDailyTemp > k & MinDailyTemp < k] - k) * ( ( pi/2 ) - asin( theta[MaxDailyTemp > k & MinDailyTemp < k] ) ) + ( a[MaxDailyTemp > k & MinDailyTemp < k] * cos( asin( theta[MaxDailyTemp > k & MinDailyTemp < k] ) ) ) )

sum(GDD)

Sarah

On Mon, Mar 19, 2012 at 7:42 AM, Letnichev chatelain.p...@gmail.com wrote:
> Hello everyone, I have been working for a few days on a basic algorithm, very common in applied agronomy, that determines the degree-days a given individual needs to reach a given growth stage. [...]

--
Sarah Goslee
http://www.functionaldiversity.org
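Sarah's subsetting approach can be wrapped in a vectorized function. This is a sketch: the name gdd_sine and its arguments are hypothetical, and days where Tmin or Tmax equals the threshold exactly are assigned to the neighbouring branch by assumption. As in the reference formula Sarah transcribes, the 1/pi factor multiplies both terms, whereas in the original DD_ifelse3 the amp*cos(alpha) term sits outside it.

```r
# Single-sine degree-days, one day per element (vectorized).
# Hypothetical helper; k is the lower growth threshold.
gdd_sine <- function(tmin, tmax, k) {
  avg <- (tmin + tmax) / 2
  a   <- (tmax - tmin) / 2
  gdd <- numeric(length(tmin))          # days with tmax <= k contribute 0

  above <- tmin >= k                    # whole day above the threshold
  gdd[above] <- avg[above] - k

  cross <- tmax > k & tmin < k          # threshold lies inside the daily range
  theta <- (k - avg[cross]) / a[cross]  # lies in (-1, 1) for these days only
  gdd[cross] <- (1 / pi) * ((avg[cross] - k) * (pi / 2 - asin(theta)) +
                              a[cross] * cos(asin(theta)))
  gdd
}

g <- gdd_sine(tmin = c(-5, 10, 2), tmax = c(-1, 30, 18), k = 5)
g
```

Because asin() only ever sees the days where the threshold is crossed, no NaN can arise.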
Re: [R] Save File after order
It doesn't have anything to do with your use of order(). Those are the row names of your data frame. You can disable writing them with the row.names=FALSE argument to write.table().

Sarah

On Mon, Mar 19, 2012 at 8:16 AM, MSousa ricardosousa2...@clix.pt wrote:
> Hello,
> I'm trying to write the sorted data of a data.frame to a file. My problem is that when I write the file, a new row-name column is added, which apparently holds the original position of each row. I want to write the file without this column.
>
> x <- data.frame(name = "x1", Time = 20)
> x <- rbind(x, data.frame(name = "x2", Time = 25))
> x <- rbind(x, data.frame(name = "x3", Time = 23))
> x <- rbind(x, data.frame(name = "x2", Time = 45))
> x <- rbind(x, data.frame(name = "x1", Time = 25))
> x <- rbind(x, data.frame(name = "x1", Time = 55))
> x <- x[order(x$name), ]
> View(x)
> write.csv(data.frame(x$name, x$Time), file = "~/Desktop/DatasetOrder.csv")
>
> At the moment it saves this:
>   name Time
> 1   x1   20
> 5   x1   25
> 6   x1   55
> 2   x2   25
> 4   x2   45
> 3   x3   23
>
> The idea is to save:
> name Time
> x1   20
> x1   25
> x1   55
> x2   25
> x2   45
> x3   23
>
> Thanks

--
Sarah Goslee
http://www.functionaldiversity.org
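A minimal sketch of Sarah's fix, assuming the goal is a CSV with only the name and Time columns; tempfile() stands in for the Desktop path so the example runs anywhere. Passing the data frame itself, rather than data.frame(x$name, x$Time), also keeps the original column names (the latter would produce x.name and x.Time).

```r
# Rebuild the example data frame from the question
x <- data.frame(name = c("x1", "x2", "x3", "x2", "x1", "x1"),
                Time = c(20, 25, 23, 45, 25, 55))
x <- x[order(x$name), ]   # row names keep the original positions 1, 5, 6, ...

out <- tempfile(fileext = ".csv")   # stand-in for "~/Desktop/DatasetOrder.csv"
write.csv(x, file = out, row.names = FALSE)

readLines(out)
```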
Re: [R] regression with proportion data
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Georgiana May
> Sent: 19 March 2012 14:06
> To: r-help@r-project.org
> Subject: [R] regression with proportion data
>
> I understand that the binomial function concerns successes vs. failures and can use those raw data, but the R Book and other sources seem to suggest that proportion data are usable as well. Not so?

You _can_ use a two-column matrix with counts of successes and failures in the two columns.

And if you know what the number n of observations was (which you would need anyway for using proportions in a logistic regression), you can calculate that matrix from the proportions and n, as long as you're reasonably careful about rounding.

S Ellison
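S Ellison's suggestion can be sketched on simulated data; the proportions p, the group sizes n and the predictor x are all made up for illustration. The two-column cbind(successes, failures) response and the equivalent proportion-plus-weights form should give the same coefficients without the non-integer warning.

```r
set.seed(1)
n <- sample(20:50, 30, replace = TRUE)   # trials per observation (assumed known)
x <- runif(30, 0, 10)
successes <- rbinom(30, size = n, prob = plogis(-1 + 0.3 * x))
p <- successes / n                       # the observed proportions

# Recover integer counts from proportions and n; round() guards
# against tiny floating-point errors in p * n
succ <- round(p * n)
fail <- n - succ

fit_counts  <- glm(cbind(succ, fail) ~ x, family = binomial(link = "logit"))
fit_weights <- glm(p ~ x, family = binomial(link = "logit"), weights = n)

cbind(counts = coef(fit_counts), weights = coef(fit_weights))
```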
Re: [R] what is p,d q in arima() function of time series
https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average

You may also be interested in forecast::auto.arima

Michael

On Mon, Mar 19, 2012 at 6:54 AM, sagarnikam123 sagarnikam...@gmail.com wrote:
> I am new to time series. I found this in the help for arima:
> arima(x = data, order = c(p, d, q))
> What exactly are p, d and q? If I don't change them, what happens?
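As a rough illustration on simulated data (the AR coefficient 0.6 and the seed are arbitrary): p is the number of autoregressive terms, d the number of times the series is differenced before fitting, and q the number of moving-average terms. Leaving order at its default c(0, 0, 0) fits only a mean, i.e. white noise.

```r
set.seed(123)
y <- arima.sim(model = list(ar = 0.6), n = 200)   # simulated AR(1) series

fit_ar1  <- arima(y, order = c(1, 0, 0))  # p = 1, d = 0, q = 0
fit_null <- arima(y)                      # default order = c(0, 0, 0): mean only

coef(fit_ar1)    # "ar1" and "intercept"
coef(fit_null)   # "intercept" only
AIC(fit_ar1); AIC(fit_null)

# One difference (d = 1) plus one MA term (q = 1) would be
# arima(y, order = c(1, 1, 1)); a differenced model has no intercept.
```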
Re: [R] hypergeometric function in ‘mvtnorm’
To view the source of (most) functions, simply type the function name without parentheses. Here, typing dmvt gives:

function (x, delta, sigma, df = 1, log = TRUE, type = "shifted")
{
    if (df == 0)
        return(dmvnorm(x, mean = delta, sigma = sigma, log = log))
    if (is.vector(x)) {
        x <- matrix(x, ncol = length(x))
    }
    if (missing(delta)) {
        delta <- rep(0, length = ncol(x))
    }
    if (missing(sigma)) {
        sigma <- diag(ncol(x))
    }
    if (NCOL(x) != NCOL(sigma)) {
        stop("x and sigma have non-conforming size")
    }
    if (!isSymmetric(sigma, tol = sqrt(.Machine$double.eps),
        check.attributes = FALSE)) {
        stop("sigma must be a symmetric matrix")
    }
    if (length(delta) != NROW(sigma)) {
        stop("mean and sigma have non-conforming size")
    }
    m <- NCOL(sigma)
    distval <- mahalanobis(x, center = delta, cov = sigma)
    logdet <- sum(log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values))
    logretval <- lgamma((m + df)/2) - (lgamma(df/2) + 0.5 * (logdet +
        m * logb(pi * df))) - 0.5 * (df + m) * logb(1 + distval/df)
    if (log)
        return(logretval)
    return(exp(logretval))
}

You can see the code of most of the functions called in here the same way; the only ones you won't be able to see are eigen, lgamma, log and exp, but these are pretty well documented and you shouldn't need their code. If you do, you'll need to read the underlying C.

Michael

On Sun, Mar 18, 2012 at 11:12 PM, statfan irene_vr...@hotmail.com wrote:
> Is there any way to know how the dmvt function computes the hypergeometric function needed in the calculation of the density of the multivariate t distribution?
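Judging from the source above, dmvt() evaluates the closed-form log density using lgamma(), a log-determinant and the Mahalanobis distance; no hypergeometric function appears in it. The sketch below re-implements that formula in base R (the name dmvt_log is hypothetical) and checks it against dt() in the one-dimensional, unit-scale case, where the multivariate t reduces to Student's t.

```r
# Log density of the multivariate t with location delta, scale matrix sigma
# and df degrees of freedom, following the formula in the dmvt() source.
dmvt_log <- function(x, delta, sigma, df) {
  x <- matrix(x, ncol = length(delta))    # one observation per row
  m <- ncol(sigma)
  distval <- mahalanobis(x, center = delta, cov = sigma)
  logdet  <- sum(log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values))
  lgamma((m + df) / 2) - (lgamma(df / 2) + 0.5 * (logdet + m * log(pi * df))) -
    0.5 * (df + m) * log(1 + distval / df)
}

# In one dimension with sigma = 1 this should match Student's t
x <- c(-2, 0, 1.5)
cbind(manual = dmvt_log(x, delta = 0, sigma = matrix(1), df = 4),
      dt     = dt(x, df = 4, log = TRUE))
```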
[R] Google Summer of Code
Once again, R has been accepted as an organization for the Google Summer of Code (2012). We invite students interested in this program to learn more about it. A good starting point is http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2012. The Google GSOC home page is http://www.google-melange.com/gsoc/homepage/google/gsoc2012

Workers who could mentor projects are also needed. We aim to have at least two mentors per student project, based on experiences reported in an article (starting on page 64) in the recent issue of the R Journal: http://journal.r-project.org/archive/2011-2/RJournal_2011-2.pdf

Those interested in either student or mentor participation should join our Google list gso...@googlegroups.com, as this is how we are communicating. Please provide a one-sentence introduction of yourself, as we have had attempts by spammers to join the group.

Note that GSOC is about CODING. It is not intended to fund research, but many activities with R require code to advance our work, so the program can be very helpful in improving R.

For information, the admins this year are Toby Dylan Hocking and John Nash, with backups Brian Peterson and Virgilio Gomez.

Happy coding,
John Nash
Re: [R] a very simple question
On 19-Mar-2012 Dajiang Liu wrote:
> Thanks a lot for the clarification. I just find it very bizarre that if you run a=0.1*(1:9);which(a==0.4) it returns the right answer. Anyway, I will pay attention next time. Thanks a lot.

The basic explanation is that, for an integer r (0 < r < 10), what is stored in binary representation by R for 0.1*r or for 0.r or for r/10 is always an approximation to the exact value (with the possible exception of r=5). The exact detail of the binary representation may depend on how it was obtained, by any of several different methods of calculation which, mathematically, are exactly equivalent but, in the binary representations stored in the computer, may be slightly different. Examples:

0.1*(1:9) - (1:9)/10
# [1] 0.000000e+00 0.000000e+00 5.551115e-17 0.000000e+00
# [5] 0.000000e+00 1.110223e-16 1.110223e-16 0.000000e+00
# [9] 0.000000e+00

0.1*(1:9) - c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9)
# [1] 0.000000e+00 0.000000e+00 5.551115e-17 0.000000e+00
# [5] 0.000000e+00 1.110223e-16 1.110223e-16 0.000000e+00
# [9] 0.000000e+00

(1:9)/10 - c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9)
# [1] 0 0 0 0 0 0 0 0 0

cumsum(rep(0.3,9))/3 - (1:9)/10
# [1] -1.387779e-17 -2.775558e-17  0.000000e+00 -5.551115e-17
# [5]  0.000000e+00  0.000000e+00  1.110223e-16 -1.110223e-16
# [9] -1.110223e-16

and so on ...

The third example suggests that when R is given a decimal fraction 0.r it recognises that this is equivalent to r/10 and calculates it accordingly, hence the agreement between (1:9)/10 and c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9). (I would need to check the source code to verify that statement, however.)

The short answer (as has been pointed out) is that you cannot count on exact agreement, within R (or most other numerical software), between a value calculated by one numerical method and the value calculated by another numerical method which is mathematically equivalent.
Some numerical software will work by storing the expression given to it not as a number but as a sequence of operations performed on given digits, only evaluating this at the last moment along with other similar expressions, working within the scale (e.g. decimal scale for numbers given like 123.456) thus obtaining maximum accuracy within the allocated storage. An example is the arbitrary-precision calculator 'bc'.

Many (most?) hand-held digital calculators work to an internal decimal representation such as BCD (binary-coded decimal) where each byte is split into two half-bytes of 4 binary digits, each capable of storing a number from 0 to 9; then they can perform exact decimal arithmetic (to within the precision of storage) for decimal numbers, avoiding the imprecision resulting from conversion to binary (but may exhibit similar problems to the above for binary input).

Ted.

-------------------------------------------------
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 19-Mar-2012
Time: 15:02:03
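Ted's observation about mathematically equivalent but numerically different calculations can be checked directly. Division by 10 is correctly rounded, and so is parsing a short decimal literal, so both land on the double nearest to r/10; multiplying the already-rounded 0.1 need not.

```r
r <- 1:9
literals <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)

all(r / 10 == literals)    # TRUE: both give the double nearest to r/10
which(0.1 * r != r / 10)   # 3 6 7: the products that round differently

# Printing 17 significant digits exposes the stored values
sprintf("%.17g", c(0.1 * 3, 0.3))   # "0.30000000000000004" "0.29999999999999999"
```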
Re: [R] Identifying a change in events between bins
Your description was too general for me to know exactly what you want, but perhaps this will help you solve your own problem:

set.seed(123)
evtlist = sample(c('fwd','rev'), 100, replace=TRUE)
evtlist
 [1] "fwd" "rev" "fwd" "rev" "rev" "fwd" "rev" "rev" "rev" "fwd" "rev" "fwd" "rev"
[14] "rev" "fwd" "rev" "fwd" "fwd" "fwd" "rev" "rev" "rev" "rev" "rev" "rev" "rev"
[27] "rev" "rev" "fwd" "fwd" "rev" "rev" "rev" "rev" "fwd" "fwd" "rev" "fwd" "fwd"
[40] "fwd" "fwd" "fwd" "fwd" "fwd" "fwd" "fwd" "fwd" "fwd" "fwd" "rev" "fwd" "fwd"
[53] "rev" "fwd" "rev" "fwd" "fwd" "rev" "rev" "fwd" "rev" "fwd" "fwd" "fwd" "rev"
[66] "fwd" "rev" "rev" "rev" "fwd" "rev" "rev" "rev" "fwd" "fwd" "fwd" "fwd" "rev"
[79] "fwd" "fwd" "fwd" "rev" "fwd" "rev" "fwd" "fwd" "rev" "rev" "rev" "fwd" "fwd"
[92] "rev" "fwd" "rev" "fwd" "fwd" "rev" "fwd" "fwd" "rev"
rle(evtlist)
Run Length Encoding
  lengths: int [1:50] 1 1 1 2 1 3 1 1 1 2 ...
  values : chr [1:50] "fwd" "rev" "fwd" "rev" "fwd" "rev" "fwd" "rev" "fwd" ...
rle(evtlist)$lengths
 [1]  1  1  1  2  1  3  1  1  1  2  1  1  3  9  2  4  2  1 12  1  2  1  1  1  2  2  1  1
[29]  3  1  1  3  1  3  4  1  3  1  1  1  2  3  2  1  1  1  2  1  2  1

See ?rle

Rob

-----Original Message-----
From: Mark Hills
Sent: Friday, March 16, 2012 3:55 PM
To: r-help@r-project.org
Subject: [R] Identifying a change in events between bins

Hi there,

First off, despite this being my first post here, I have scanned the R help forums a lot in the past few months to help with some questions, so a big thank you to the community as a whole for being so helpful! I'm somewhat of an R newbie, and have run up against a problem that I can't seem to solve. If anyone is able to help I would really appreciate it!

I'm looking at a number of events across a chromosome, and have written a program that collects them into different bins, based on a specified bin size. The events are directional, either forward or reverse, and a chromosome can either be fwd/fwd (all the events fall into the fwd bins), rev/rev (all the events fall into the rev bins) or fwd/rev (events are evenly split). In some cases, chromosomes switch from one state to another (e.g. fwd/fwd to fwd/rev). There are a number of rules that dictate my data.
First, while there is stochastic variation, the sum of fwd and rev in each bin should have approximately the same value. If I were to take the total number of events and divide it by the number of bins to get an average count per bin, I would expect approximately that value in each bin; in the case of fwd/fwd it would be about the average number in the fwd column and close to zero in the rev column, in rev/rev about the average number in the rev column and close to zero in the fwd column, and in fwd/rev about half the average number in both.

Hopefully my png attachment worked and you can see an example. The top plot shows fwd reads, the second shows rev reads and the third shows fwd minus rev reads.

What I would like to be able to do is to automatically assign regions in which the chromosome switches from one state to another. From the graphs (and from the read.table output below) you can see that this particular chromosome is fwd/fwd from bin 1 to 59, fwd/rev from bin 61 to 73, and rev/rev for the remainder of the chromosome. These are generated from a read.table that looks like this:

bin fwd rev
 50 484   2
 51 366   4
 52 527   6
 53 635   2
 54 573   6
 55 506   4
 56 600   6
 57 560   2
 58 504   2
 59 545   0
 60 501  68
 61 419 223
 62 252 109
 63 259 138
 64 355 189
 65 218 125
 66 140  57
 67  45  31
 68 276 144
 69 263 152
 70 330 193
 71 439 204
 72 347 207
 73  10 611
746 619 752 578 767 372 776 436 784 373 798 417 802 276

My question is this:

1. Is there an obvious way to automatically identify these regions? I am not sure how I can go about scanning previous lines within a read.table to find a point at which the values change. In the above example, I would like the program to identify that the fwd graph shifts from ~1x the average to ~0.5x the average between bin 61 and 62, and from ~0.5x the average to ~0x the average between bin 72 and 73.
Conversely, I'd like to identify the rev graph shifting from ~0x average to ~0.5x average between bins 59 and 60, and from 0.5x average to 1x average from bin 72 to 73. Finally, I'd like to cross-reference the output from fwd and rev to only pull out reciprocal switches (i.e. those that occur within 3 bins of each other in both the fwd and rev data sets).

What I've been trying to get to work is to generate values based on 0, 0.5 and 1x the average events, and to pull out the range of bins that fall into each of those categories (possibly 1 SD higher or lower to account for the stochastic variation), but I'm not really sure how to go about that.

2. If I can find a way to identify a shift between bins, is there any way to then look in smaller bin sizes across those regions? The bins shown above are for 200,000 bases of DNA. If my program automatically found an event between bin 72 and 73 (14,400,000 bases to 14,600,000), is it possible to feed that
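One way to start on question 1, sketched on bins 50-73 from the table above: classify each bin by its forward fraction, then use rle() as Rob suggests to turn the sequence of states into bin ranges. The 0.2 and 0.8 cut-offs are illustrative assumptions; a real analysis would need to account for the stochastic variation Mark describes.

```r
# Bins 50-73 from the table above
bins <- data.frame(
  bin = 50:73,
  fwd = c(484, 366, 527, 635, 573, 506, 600, 560, 504, 545, 501, 419,
          252, 259, 355, 218, 140, 45, 276, 263, 330, 439, 347, 10),
  rev = c(2, 4, 6, 2, 6, 4, 6, 2, 2, 0, 68, 223,
          109, 138, 189, 125, 57, 31, 144, 152, 193, 204, 207, 611)
)

# Classify each bin by its forward fraction (assumed cut-offs 0.2 / 0.8)
frac  <- bins$fwd / (bins$fwd + bins$rev)
state <- cut(frac, breaks = c(-Inf, 0.2, 0.8, Inf),
             labels = c("rev/rev", "fwd/rev", "fwd/fwd"))

# Run-length encode the states and convert each run to a bin range
r <- rle(as.character(state))
ends   <- cumsum(r$lengths)
starts <- ends - r$lengths + 1
data.frame(state = r$values, from = bins$bin[starts], to = bins$bin[ends])
```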
Re: [R] Dotplot: how to change size in the y lab ?
On Mon, Mar 19, 2012 at 7:56 AM, Jose Bustos Melo jbustosm...@yahoo.es wrote:
> Hi everyone, I'm trying to reduce the font size of the y-axis labels in this plot:

dotplot( bank ~ MV2007 + MV2009 , data = d, horiz = T,
         par.settings = list( superpose.symbol = list( pch = 21,
             fill = c( "lightblue", "lightgreen"), cex = 4, col = "black" ) ) ,
         xlab = "Market value ($Bn)", key = k,
         panel = function(x, y, ...){
             panel.dotplot( x, y, ... )
             grid.text( unit( x, "native") , unit( y, "native") , label = x, gp = gpar( cex = .7 ) )
         }
         ### add this
         , scales = list(y = list(cex = .5))
)

Cheers

> Thank you in advance!
> José
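A minimal, self-contained sketch of the scales fix; the bank names and values are placeholders standing in for the poster's d, and the key and custom panel are left out because only the scales argument matters here.

```r
library(lattice)

# Placeholder data standing in for the poster's d
d <- data.frame(bank   = c("Bank A", "Bank B", "Bank C"),
                MV2007 = c(120, 85, 60),
                MV2009 = c(95, 70, 75))

p <- dotplot(bank ~ MV2007 + MV2009, data = d,
             xlab = "Market value ($Bn)",
             scales = list(y = list(cex = 0.5)))  # shrink the y-axis tick labels

# print(p) draws the plot on the current device
```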
Re: [R] a very simple question
On Mon, Mar 19, 2012 at 12:47:12PM +0000, Dajiang Liu wrote:
> Thanks a lot for the clarification. I just find it very bizarre that if you run a=0.1*(1:9);which(a==0.4) it returns the right answer. Anyway, I will pay attention next time. Thanks a lot.

Hi.

Yes, these things are bizarre sometimes. Compare

print(0.1, digits=20)    # [1] 0.10000000000000000555
print(4*0.1, digits=20)  # [1] 0.40000000000000002220
print(0.4, digits=20)    # [1] 0.40000000000000002220

Equality of the last two is the reason for

which(0.1*(1:9) == 0.4)
[1] 4

while for 0.3, we get

print(3*0.1, digits=20)  # [1] 0.30000000000000004441
print(0.3, digits=20)    # [1] 0.29999999999999998890

See http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy for further hints.

Petr Savicky.
[R] plm function
Dear Ieva,

plm(..., model = "within") (which is the default for plm()) estimates a within model on time-demeaned data, which is equivalent to using the LSDV estimator. Therefore any time-constant dummy variable you add by hand will be discarded because of perfect collinearity.

What kind of dummies are you trying to include? If they are time-constant they will be incompatible with the within (FE) estimator, but not with other uses of plm() like random effects (model = "random") or pooling (model = "pooling").

A reproducible example, as requested by the posting guide, would have clarified things.

Best wishes,
Giovanni

Giovanni Millo, PhD
Research Dept., Assicurazioni Generali SpA
Via Machiavelli 4, 34132 Trieste (Italy)
tel. +39 040 671184
fax +39 040 671160

--- original message ---
Date: Wed, 14 Mar 2012 13:46:03 +0200
From: Ieva Sriubaitė ieva.sriuba...@gmail.com
To: r-help@R-project.org
Subject: [R] plm function

Dear Sir/Madam,
I am working with panel data for my bachelor's degree. I would really appreciate it if you could help me with R functions. I am trying to estimate a linear panel data model with the plm function. When I include 3 dummy variables in the regression they don't appear in the summary of the model, but when I estimate a simple lm model they do. Why is that? What should I do to estimate the statistics for those dummy variables?
Thank you.
Ieva
--- end original message ---
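Giovanni's collinearity point can be reproduced with plain lm() in the LSDV form, which he notes is equivalent to the within estimator; the panel below is simulated and all variable names are illustrative. A dummy that is constant within each individual is a linear combination of the individual intercepts, so lm() reports its coefficient as NA (aliased).

```r
set.seed(7)
panel <- expand.grid(id = 1:4, year = 1:5)
panel$x     <- rnorm(nrow(panel))
panel$dummy <- as.numeric(panel$id %in% c(1, 2))   # time-constant per individual
panel$y     <- 1 + 0.5 * panel$x + rnorm(nrow(panel))

# LSDV fit: individual intercepts via factor(id), then the dummy
fit <- lm(y ~ x + factor(id) + dummy, data = panel)
coef(fit)["dummy"]   # NA: not estimable alongside the individual effects

# Without individual effects (pooled OLS) the dummy is estimable
coef(lm(y ~ x + dummy, data = panel))["dummy"]
```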
Re: [R] glm: getting the confidence interval for an Odds Ratio, when using predict()
On Mar 19, 2012, at 03:32 , Dominic Comtois wrote:

Say I fit a logistic model and want to calculate an odds ratio between 2 sets of predictors. It is easy to obtain the difference in the predicted log odds using the predict() function, and thus get a point estimate of the OR. But I can't see how to obtain the confidence interval for such an OR. For example:

  model <- glm(chd ~ age.cat + male + lowed, family = binomial(logit))
  pred1 <- predict(model, newdata = data.frame(age.cat = 1, male = 1, lowed = 1))
  pred2 <- predict(model, newdata = data.frame(age.cat = 2, male = 0, lowed = 0))
  OR <- exp(pred2 - pred1)

Thanks

There's no trivial way, since you need the covariance of pred2 and pred1 to calculate the variance of the difference. I think you can proceed somewhat as follows (I can't be bothered to test it without a reproducible example to start from. You may need to throw in a few explicit t() and as.vector() here and there.)

  newd <- data.frame(age.cat = c(1, 2), male = c(1, 0), lowed = c(1, 0))
  M <- model.matrix(model, data = newd)
  V <- vcov(model)
  contr <- c(-1, 1) %*% M
  se <- sqrt(drop(contr %*% V %*% t(contr)))
  OR.ci <- exp(pred2 - pred1 + qnorm(c(.025, .50, .975)) * se)

(Sanity check: drop(contr %*% coef(model)) should be the same as pred2 - pred1.)

I'm not sure how general the model.matrix trick is. It works in cases like

  mm <- glm(ff, data = trees)
  model.matrix(mm, data = trees[1, ])
    (Intercept) log(Height) log(Girth)
  1           1    4.248495   2.116256
  attr(,"assign")
  [1] 0 1 2

but I see that there are cases where a data argument may be ignored. If that is the case, then you may have to construct the contr vector by hand.
-- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
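Here is a self-contained sketch of the approach above, run on simulated data. The data frame `dat` and its coefficients are invented for illustration, since the original data were not posted. Instead of `model.matrix(model, data = newd)`, it builds the design matrix from `delete.response(terms(model))`, so the new data do not need a response column. The standard error is the square root of the quadratic form contr' V contr.

```r
set.seed(42)
n <- 500
dat <- data.frame(age.cat = sample(1:3, n, replace = TRUE),
                  male    = rbinom(n, 1, 0.5),
                  lowed   = rbinom(n, 1, 0.4))
eta <- -1 + 0.4 * dat$age.cat + 0.5 * dat$male + 0.3 * dat$lowed
dat$chd <- rbinom(n, 1, plogis(eta))
model <- glm(chd ~ age.cat + male + lowed, family = binomial(link = "logit"), data = dat)

newd <- data.frame(age.cat = c(1, 2), male = c(1, 0), lowed = c(1, 0))
# Design matrix for the two new rows, built without a response column
M <- model.matrix(delete.response(terms(model)), data = newd)
contr <- drop(c(-1, 1) %*% M)                        # row 2 minus row 1
est   <- sum(contr * coef(model))                    # log odds ratio
se    <- sqrt(drop(t(contr) %*% vcov(model) %*% contr))
OR.ci <- exp(est + qnorm(c(0.025, 0.5, 0.975)) * se) # lower, point estimate, upper
OR.ci
```

The middle element of `OR.ci` equals `exp(pred2 - pred1)` obtained with predict(), which is the sanity check mentioned above.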
[R] fitting a histogram to a Gaussian curve
Hello,

I am trying to fit my histogram to a smooth Gaussian curve (the data closely resemble one except for a few bars). This is my code:

  #!/usr/bin/Rscript
  out_file = "irc_20M_opencl_test.png"
  png(out_file)
  scan("my.csv") -> myvals
  hist(myvals, breaks = 50, main = "My Distribution", xlab = "My Values")
  pdens <- density(myvals, na.rm = TRUE)
  plot(pdens, col = "black", lwd = 3, xlab = "My values", main = "Default KDE")
  dev.off()
  print(paste("Plot was saved in:", getwd()))

The problem is that I get a jagged distribution; you can see the result here: http://s15.postimage.org/9ucmkx3bf/foobar.png

This is the original histogram: http://s12.postimage.org/e0lfp7d5p/foobar2.png

Any ideas on how I can smooth it into a Gaussian curve?

Thanks,
- vihan
Re: [R] Save File after order
Thanks
[R] 'Unexpected numeric constant'
Dear R-help,

I am trying to rename the variables in a data frame, called 'T1A' here. The renaming seems to succeed, but when I refer to one of the variables I get an error message, and I would like to know why.

The data frame contains 365 rows and 49 columns. I would like to name the first column 'DATE' and the others T0.5, T1, T1.5, ..., T24 (the data were collected every half hour for a whole year). The original data are saved as a csv file, and columns 2-49 are named in the format '00:30, 01:00, 01:30, ..., 23:30, 00:00'. When I read them into R with read.csv, the column names are changed automatically to 'X0.30.00, X1.00.00, ..., X23.30.00, X0.00.00', which don't look great. I would prefer the 'hh:mm' format rather than dots between the numbers that indicate the time, but I have not found a solution. So I decided to use the simplified names above, e.g. T0.5, T1, T1.5, ..., T24, and my code is:

  TIME <- paste(rep("T", 48), as.character(seq(0.5, 24, by = 0.5)))
  names(T1A) <- c("DATE", TIME)
  class(T1A$T0.5)    ## without a space between 'T' and '0.5'
  [1] "NULL"
  class(T1A$T 0.5)   ## with a space between 'T' and '0.5'
  Error: unexpected numeric constant in "class(T1A$T 0.5"

I also tried the code below, but got the same error message:

  TIME <- paste(rep("T", 48), seq(0.5, 24, by = 0.5))
  names(T1A) <- c("DATE", TIME)

However, if I do not change the column names, everything works fine, e.g. I can refer to the variables with no problem:

  class(T1A$X00.30.00)
  [1] "numeric"

Any thoughts? Many thanks!
HJ
Re: [R] Issue with asin()
Hello, you're totally right, I tried first to control the flow with if (MaxDailyTemp k MinDailyTemp k){statement} but it was a bit messy. Then ifelse() was supposed to help me out, but it didn't. Thank you for your time, your code works exactly as I want :) P.
Re: [R] Issue with asin()
Yes, with the same input I had two different outputs with Excel and R. When printing a debug report of Excel, it showed no anomalies and I am certain it didn't calculate odd values (such as NaNs). The way I coded was wrong, as Sarah said, I didn't follow completely the algorithm. The solution she suggested works perfectly, so I am out of trouble (for now :p ). Thanks for your time, P.
Re: [R] Output formatting in Latex and R
Use the eol = "\n\n" option. Each record will then be followed by a blank line, so the records are double-spaced.
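The minimal sketch below assumes the output is produced with write.table(). The thread does not say which function is used, but eol is an argument of write.table(). Each record, including the header line, ends with two newlines, which leaves a blank line between records.

```r
df <- data.frame(a = 1:2, b = c("x", "y"))

# Write to the console (file = "") and capture the lines for inspection
out <- capture.output(
  write.table(df, eol = "\n\n", quote = FALSE, row.names = FALSE)
)
out
```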
Re: [R] hypergeometric function in ‘mvtnorm’
Thanks for your advice. I actually meant to ask about pmvt, the distribution function. Looking at the source code, pmvt uses the function mvt, which uses the function probval, which sources the Fortran code:

  .Fortran("mvtdst", N = as.integer(n), NU = as.integer(df),
           LOWER = as.double(lower), UPPER = as.double(upper),
           INFIN = as.integer(infin), CORREL = as.double(corrF),
           DELTA = as.double(delta), MAXPTS = as.integer(x$maxpts),
           ABSEPS = as.double(x$abseps), RELEPS = as.double(x$releps),
           error = as.double(error), value = as.double(value),
           inform = as.integer(inform), PACKAGE = "mvtnorm")

I would like to see how mvtdst calculates the hypergeometric function (2F1). Is there any way I can see that?

Thanks
[R] Coverage Probability
Hello. I've already got this far. I have a function that calculates the lower (l) and upper (u) limits of a confidence interval for the odds ratio. For example, for 5 simulated 2x2 tables the upper and lower limits are:

  u
  [1] 2.496141 7.436524 8.209161 4.313587 3.318612
  l
  [1] -0.9718608  1.1000713  1.5715373  0.1135158 -0.2700517

with (l[1], u[1]) being the confidence interval for the odds ratio for the first simulated table, and so on. Now I want to compute the coverage probability. For that I've created a function that returns 1 if the odds ratio is in the interval and 0 if it isn't:

  cover <- function(theta, u, l){
    if(theta >= l && theta <= u){z=1}
    if(theta < l || theta > u){z=0}; return(z)
  }

This works, but unfortunately not when I want to sum the function's results and divide by the sample size to get the coverage probability. I tried it this way:

  for(for(x in 1:5) {a = (sum(cover(theta, u[x], l[x]))/5; return(a)}

Maybe someone can help me. Thank you
Re: [R] Coverage Probability
Hi hubinho,

You are almost there. Try this slight modification of your function:

  # u and l are vectors of the same length; theta is a single value or a vector of that length
  foo <- function(theta, u, l) mean(theta >= l & theta <= u, na.rm = TRUE)
  foo(theta, u, l)

HTH,
Jorge.-

On Mon, Mar 19, 2012 at 12:55 PM, hubinho wrote: Hello. I'm allready this far. I have a function which is calculating the lower (l) and upper (u) limit for a confidence interval for the odds ratio. [...] Now I want to compute the coverage probability. [...] Maybe someone can help me. Thank you
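The sketch below uses the interval limits from the question and a hypothetical true value theta = 1, since the thread does not give theta. It checks that the vectorised version gives the same answer as averaging a compact rewrite of the original cover() over the five tables.

```r
u <- c(2.496141, 7.436524, 8.209161, 4.313587, 3.318612)
l <- c(-0.9718608, 1.1000713, 1.5715373, 0.1135158, -0.2700517)
theta <- 1   # hypothetical true odds ratio, for illustration only

# Vectorised coverage: proportion of intervals [l, u] that contain theta
foo <- function(theta, u, l) mean(theta >= l & theta <= u, na.rm = TRUE)

# Compact rewrite of the original scalar indicator, applied interval by interval
cover <- function(theta, u, l) if (theta >= l && theta <= u) 1 else 0
loop_version <- mean(sapply(seq_along(u), function(i) cover(theta, u[i], l[i])))

foo(theta, u, l)   # 0.6: intervals 1, 4 and 5 contain 1
loop_version       # 0.6
```

The vectorised form avoids the loop entirely. `&` (not `&&`) is needed so the comparison is done element by element.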
[R] car/MANOVA question
Dear colleagues,

I have a question about the car package. How do I evaluate whether a simpler multivariate regression model is adequate? For instance, I do the following:

  ami <- read.table(file = "http://www.public.iastate.edu/~maitra/stat501/datasets/amitriptyline.dat",
                    col.names = c("TCAD", "drug", "gender", "antidepressant", "PR", "dBP", "QRS"))
  ami$gender <- as.factor(ami$gender)
  ami$TCAD <- ami$TCAD/1000
  ami$drug <- ami$drug/1000
  library(car)
  fit.lm <- lm(cbind(TCAD, drug) ~ gender + antidepressant + PR + dBP + QRS, data = ami)
  fit.manova <- Manova(fit.lm)
  fit1.lm <- update(fit.lm, . ~ . - PR - dBP - QRS)
  fit1.manova <- Manova(fit1.lm)

Is there an easy way to find out whether the reduced model is adequate? I am thinking of something similar to the anova() function, I guess?

Many thanks and best wishes,
Ranjan
Re: [R] 'Unexpected numeric constant'
On 19-03-2012, at 17:39, HJ YAN wrote:

  TIME <- paste(rep("T", 48), as.character(seq(0.5, 24, by = 0.5)))
  names(T1A) <- c("DATE", TIME)
  class(T1A$T 0.5)   ## with a space between 'T' and '0.5'
  Error: unexpected numeric constant in "class(T1A$T 0.5"

Any thoughts??

Have you read ?paste
The default separator character is a single space. Use paste(..., sep = "").

Berend
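Here is a short sketch of the fix. T1A is a hypothetical two-row stand-in for the poster's 365 x 49 data frame.

```r
bad  <- paste("T", seq(0.5, 24, by = 0.5))            # default sep = " ": "T 0.5", "T 1", ...
good <- paste("T", seq(0.5, 24, by = 0.5), sep = "")  # "T0.5", "T1", ...; paste0() does the same

# Hypothetical stand-in for the poster's data: DATE plus 48 half-hourly columns
T1A <- data.frame(DATE = as.Date(c("2011-01-01", "2011-01-02")),
                  matrix(0, nrow = 2, ncol = 48))
names(T1A) <- c("DATE", good)
class(T1A$T0.5)        # "numeric"

# Names containing a space still work, but must be backquoted
names(T1A) <- c("DATE", bad)
class(T1A$`T 0.5`)     # "numeric"
```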
Re: [R] hypergeometric function in ‘mvtnorm’
On 19-03-2012, at 16:54, statfan wrote:

Thanks for your advice. I actually meant to ask about pmvt for the distribution function. Looking at the source code, pmvt uses the function mvt, which uses the function probval, which sources the Fortran code:

No, it doesn't source anything. It calls a compiled Fortran subroutine.

  .Fortran("mvtdst", N = as.integer(n), NU = as.integer(df), ..., PACKAGE = "mvtnorm")

I would like to see how mvtdst calculates the hypergeometric function (2F1). Is there any way I can see that?

Yes. Download the source code of the package from CRAN, unpack it and browse.

Berend
Re: [R] Rolling regressions with sample extended one period at a time
Hey, thanks a lot. I got exactly what I wanted.
[R] Linear regression
Hello there,

I am new to using regression in R. I want to solve a simple regression problem with 2 equations and 2 unknowns. So let's say:

  y1 = alpha1*A + beta1*B
  y2 = alpha2*A + beta2*B

  y1 <- runif(10, 0, 1)
  y2 <- runif(10, 0, 1)
  alpha1 <- 0.6
  alpha2 <- 0.75
  beta1 <- 1 - alpha1
  beta2 <- 1 - alpha2

I now want to use these equations to estimate A and B. Both A and B are constrained to lie between 0 and 1. I would like to use lm with these constraints, but I am having some trouble defining the equations correctly. Any help would be most appreciated.

Thank you,
Diviya
Re: [R] 'Unexpected numeric constant'
On 2012-03-19 09:39, HJ YAN wrote: [...] When I read them into R by using read.csv, the column names are changed automatically as 'X0.30.00, X1.00.00,...,X23.30.00,X0.00.00', which dont look great (i mean I would prefer it in a format as 'hh:mm' [...]) [...] Any thoughts?? Many thanks!!! HJ

Berend has shown you the problem with your use of paste().

If you want the original names (which are not syntactically valid R names), you can set the argument check.names = FALSE in your read.csv() call. You will then have to remember to always quote these names in your code, either with backquotes (e.g. T1A$`00:30`) or as strings (e.g. T1A[["00:30"]]). But since it's generally better to use T1A[[name]] rather than T1A$name anyway, the need for quotes should not be a problem.

Still, I wouldn't use illegal names.

Peter Ehlers
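The small sketch below uses inline CSV text to show what check.names = FALSE changes. The header mimics the poster's 'hh:mm' columns, and the values are made up.

```r
csv_text <- "DATE,00:30,01:00\n2011-01-01,1.5,2.5\n2011-01-02,3.5,4.5"

d1 <- read.csv(text = csv_text)                        # names passed through make.names()
names(d1)   # "DATE" "X00.30" "X01.00"

d2 <- read.csv(text = csv_text, check.names = FALSE)   # names kept as in the file
names(d2)   # "DATE" "00:30" "01:00"

d2[["00:30"]]   # quoted string with [[
d2$`00:30`      # also works, but needs backquotes
```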
[R] Automatically adjust axis scales
Dear all,

I have written a function that, given a number of list elements, plots them in the same window. The first element is plotted with plot(), and all the rest are added to the same window with lines(). Below is a small, simple reproducible example:

  x1 <- c(1:10)
  plot(x1)
  x2 <- c(11:20)
  lines(x2)
  x3 <- c(31:40)
  lines(x3)

As you might notice, the last two lines are not visible, because the axis limits were set by the first plot. Would it be possible, after the last lines() call, to change the axes to the minimum and maximum of all the data sets so that everything is visible? Any idea how I can do that?

Thank you for your help.

B.R.
Alex
Re: [R] car/MANOVA question
Dear Ranjan,

As you no doubt noticed, the Manova() function in the car package, or the Anova() function for which Manova() is an alias, produces type II or III tests for a multivariate linear model. To compare two nested multivariate linear models, as you wish to do, you can use the standard R anova() function (see ?anova.mlm).

I hope this helps,
John

John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Mon, 19 Mar 2012 12:31:48 -0500 Ranjan Maitra mai...@iastate.edu wrote: Dear colleagues, I had a question wrt the car package. How do I evaluate whether a simpler multivariate regression model is adequate? [...] Is there an easy way to find out whether the reduced model is adequate? I am thinking of something similar to the anova() function, I guess? Many thanks and best wishes, Ranjan
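Here is a sketch of the anova() comparison on simulated data, since the amitriptyline file may no longer be online. The data frame `ami_sim` and its coefficients are invented. For "mlm" fits, anova() dispatches to anova.mlm, which uses the Pillai trace by default; other statistics can be requested with, e.g., test = "Wilks".

```r
set.seed(7)
n <- 40
ami_sim <- data.frame(gender         = factor(sample(0:1, n, replace = TRUE)),
                      antidepressant = runif(n, 0, 3),
                      PR  = rnorm(n, 160, 20),
                      dBP = rnorm(n, 70, 10),
                      QRS = rnorm(n, 95, 10))
ami_sim$TCAD <- 1.0 + 0.8 * ami_sim$antidepressant + rnorm(n)
ami_sim$drug <- 0.5 + 0.6 * ami_sim$antidepressant + rnorm(n)

fit.lm  <- lm(cbind(TCAD, drug) ~ gender + antidepressant + PR + dBP + QRS, data = ami_sim)
fit1.lm <- update(fit.lm, . ~ . - PR - dBP - QRS)

# Reduced model first, full model second; row 2 tests the dropped terms jointly
cmp <- anova(fit1.lm, fit.lm)
cmp
```

A large p-value in the second row suggests the reduced model is adequate.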
[R] hgu133plus2hsentrezgprobe library
Hello R community,

I am processing raw Affymetrix CEL files with the Michigan custom CDF library hgu133plus2hsentrezgprobe. I have been looking for documentation on the functions it contains; I am specifically interested in converting probe names to gene symbols. Does anybody know where I can find it?

Thanks a lot!
Eleni
Re: [R] fitting a histogram to a Gaussian curve
If I understand you correctly: a univariate Gaussian distribution is uniquely determined by its first two moments, so you can just fit those directly (using the sample mean for the population mean and the sample variance with Bessel's correction for the population variance) and get essentially the best Gaussian in the ML sense. Strictly, the ML variance estimate divides by n rather than n - 1, which makes little difference for large samples. E.g.,

  x <- rnorm(500, 3, 2)
  hist(x, freq = FALSE)
  y <- seq(min(x), max(x), length.out = 300)
  lines(y, dnorm(y, mean(x), sd(x)), col = 2)

Hope this helps,
Michael

On Mon, Mar 19, 2012 at 12:47 PM, Vihan Pandey vihanpan...@gmail.com wrote: Hello, I am trying to fit my histogram to a smooth Gaussian curve(the data closely resembles one except a few bars). [...] any ideas on how I can smoothen it to a Gaussian curve? Thanks, - vihan
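Here is a variant of the snippet above that also computes the exact ML standard deviation (denominator n) for comparison. The data are simulated stand-ins for the poster's my.csv. Note that freq = FALSE puts the histogram on the density scale, so it is directly comparable with dnorm().

```r
set.seed(1)
x <- rnorm(500, mean = 3, sd = 2)    # simulated stand-in for scan("my.csv")

mu       <- mean(x)
sigma    <- sd(x)                    # denominator n - 1 (Bessel's correction)
sigma_ml <- sqrt(mean((x - mu)^2))   # exact ML estimate, denominator n

hist(x, breaks = 50, freq = FALSE, main = "My Distribution", xlab = "My Values")
grid_x <- seq(min(x), max(x), length.out = 300)
lines(grid_x, dnorm(grid_x, mean = mu, sd = sigma), col = 2, lwd = 2)

c(sd = sigma, sd_ml = sigma_ml)      # nearly identical for n = 500
```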
Re: [R] Automatically adjust axis scales
I don't believe this is possible in base graphics: you need to plan your graphics ahead, with something like plot(x1, ylim = range(x1, x2, x3)). Base graphics follow a pen-and-paper model: once something is drawn on the device, it stays there permanently (unless you draw over it). Perhaps an interactive graphics package would allow it, but I'll happily be corrected (and informed) by others.

As a style point, your use of c() is unnecessary and confusing:

  identical(1:10, c(1:10))
  [1] TRUE

Michael

On Mon, Mar 19, 2012 at 2:40 PM, Alaios ala...@yahoo.com wrote: Dear all, I have made a function that given a number of list elements plot them to the same window. [...] Would it be possible after the last lines to change the axis to the minimum and the maximum of all data sets to be visible? [...] B.R Alex
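The sketch below plans the limits ahead, as suggested above. It computes the range of all series first and then draws. The function name `plot_all` is hypothetical; it returns the limits it used so they can be checked or reused.

```r
# Hypothetical helper: plot every element of a list on common axes
plot_all <- function(series, ...) {
  ylim <- range(unlist(series), na.rm = TRUE)   # span of all data sets
  xlim <- c(1, max(lengths(series)))            # longest series sets the x range
  plot(series[[1]], xlim = xlim, ylim = ylim, ...)
  for (s in series[-1]) lines(s)
  invisible(list(xlim = xlim, ylim = ylim))
}

lims <- plot_all(list(x1 = 1:10, x2 = 11:20, x3 = 31:40),
                 xlab = "Index", ylab = "Value")
```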
Re: [R] fitting a histogram to a Gaussian curve
I see, that could be an option; however, isn't there a fitting function that would do that on given data?

On 19 March 2012 19:49, R. Michael Weylandt michael.weyla...@gmail.com wrote: If I understand you correctly, a univariate Gaussian distribution is uniquely determined by its first two moments so you can just fit those directly [...] Hope this helps, Michael
Re: [R] fitting a histogram to a Gaussian curve
Take a look at fitdistr in the MASS package:

  library(MASS)
  fitdistr(x, "normal")

You don't need to supply start values for the normal, since fitdistr() computes its ML estimates in closed form. You may need to for harder distributions.

Michael

On Mon, Mar 19, 2012 at 2:54 PM, Vihan Pandey vihanpan...@gmail.com wrote: I see, that could be an option, however isn't there a fitting function which would do that on given data?
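Here is a short sketch of fitdistr() on simulated data (a stand-in for the poster's values). It checks that, for the normal, the estimates are the closed-form ML values: the sample mean and the standard deviation with denominator n.

```r
library(MASS)   # fitdistr(); MASS ships with R as a recommended package
set.seed(1)
x <- rnorm(500, mean = 3, sd = 2)   # simulated stand-in for the poster's data

fit <- fitdistr(x, "normal")
fit             # estimates with standard errors in parentheses
fit$estimate    # named vector: mean, sd

# Closed-form ML estimates for comparison
n <- length(x)
c(mean(x), sqrt((n - 1) / n) * sd(x))
```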
Re: [R] Linear regression
1. Homework assignment? We don't do homework here.

2. If not, a mixture model of some sort? I suggest you state the context of the problem more fully. R has several packages for mixture modeling, if that's what you're trying to do.

3. In any case, this cannot be done with lm() (at least not without tricks).

4. In your notation below, the separate regressions can be stacked into a single constrained regression model.

5. You might do better to find local statistical help, as you may have bitten off more than you can chew.

-- Bert

On Mon, Mar 19, 2012 at 11:29 AM, Diviya Smith diviya.sm...@gmail.com wrote: Hello there, I am new to using regression in R. I wanted to solve a simple regression problem where I have 2 equations and 2 unknowns. So lets say - y1 = alpha1*A + beta1*B y2 = alpha2*A + beta2*B [...] Both A and B are constrained to be between (0,1). [...] Diviya

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
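Here is one possible reading of point 4, offered as a sketch and not as Bert's code. Stack y1 and y2 into one response, build the matching design matrix, and minimise the residual sum of squares subject to 0 <= A, B <= 1 with optim(method = "L-BFGS-B"). The true values A_true and B_true and the noise level are invented only to simulate data. nls(..., algorithm = "port") with lower/upper bounds would be another route.

```r
set.seed(3)
alpha1 <- 0.6;  beta1 <- 1 - alpha1
alpha2 <- 0.75; beta2 <- 1 - alpha2
A_true <- 0.3;  B_true <- 0.6        # hypothetical values used only to simulate data
y1 <- alpha1 * A_true + beta1 * B_true + rnorm(10, sd = 0.05)
y2 <- alpha2 * A_true + beta2 * B_true + rnorm(10, sd = 0.05)

# Stack both equations into one model: y = X %*% c(A, B) + error
y <- c(y1, y2)
X <- rbind(matrix(c(alpha1, beta1), nrow = 10, ncol = 2, byrow = TRUE),
           matrix(c(alpha2, beta2), nrow = 10, ncol = 2, byrow = TRUE))

# Least squares with box constraints 0 <= A, B <= 1
rss <- function(p) sum((y - X %*% p)^2)
fit <- optim(c(0.5, 0.5), rss, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(1, 1))
fit$par          # constrained estimates of A and B

# Unconstrained least squares, for comparison
unc <- qr.solve(X, y)
unc
```

When the unconstrained solution already lies inside the box, the two agree; otherwise the constrained fit sits on the boundary.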
Re: [R] fitting a histogram to a Gaussian curve
I'll check it out, thanks a million Michael!

On 19 March 2012 19:59, R. Michael Weylandt michael.weyla...@gmail.com wrote:
Take a look at fitdistr in the MASS package.
library(MASS)
fitdistr(x, "normal")
I don't think you need to supply start values for the normal since its log-likelihood function is nicely behaved. You may need to for harder distributions.
Michael

On Mon, Mar 19, 2012 at 2:54 PM, Vihan Pandey vihanpan...@gmail.com wrote:
I see, that could be an option; however, isn't there a fitting function which would do that on given data?

On 19 March 2012 19:49, R. Michael Weylandt michael.weyla...@gmail.com wrote:
If I understand you correctly, a univariate Gaussian distribution is uniquely determined by its first two moments, so you can just fit those directly (using the sample mean for the population mean and the sample variance with Bessel's correction for the population variance) and get the best Gaussian (in an ML sense; strictly, the ML variance estimate divides by n rather than n - 1). E.g.,
x <- rnorm(500, 3, 2)
hist(x, freq = FALSE)
xx <- seq(min(x), max(x), length.out = 300)
lines(xx, dnorm(xx, mean(x), sd(x)), col = 2)
Hope this helps,
Michael

On Mon, Mar 19, 2012 at 12:47 PM, Vihan Pandey vihanpan...@gmail.com wrote:
Hello, I am trying to fit my histogram to a smooth Gaussian curve (the data closely resembles one except for a few bars). This is my code:
#!/usr/bin/Rscript
out_file = "irc_20M_opencl_test.png"
png(out_file)
scan("my.csv") -> myvals
hist(myvals, breaks = 50, main = "My Distribution", xlab = "My Values")
pdens <- density(myvals, na.rm = TRUE)
plot(pdens, col = "black", lwd = 3, xlab = "My values", main = "Default KDE")
dev.off()
print(paste("Plot was saved in:", getwd()))
The problem here is that I get a jagged distribution; you can see the result: http://s15.postimage.org/9ucmkx3bf/foobar.png
This is the original histogram: http://s12.postimage.org/e0lfp7d5p/foobar2.png
Any ideas on how I can smooth it into a Gaussian curve?
Thanks,
- vihan
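To make the two suggestions above concrete, here is a small sketch on simulated data (the seed, sample size and number of breaks are arbitrary choices). It computes the closed-form normal ML estimates, which should match what fitdistr(x, "normal") in MASS reports, and overlays the fitted density on a density-scale histogram (freq = FALSE) so both share one y-axis. The plot is written to a temporary PDF so the snippet also runs non-interactively.

```r
set.seed(1)
x <- rnorm(500, mean = 3, sd = 2)
n <- length(x)

# Closed-form ML estimates for a normal sample
mu_hat <- mean(x)
sd_ml  <- sqrt(sum((x - mu_hat)^2) / n)   # divides by n (ML)
sd_unb <- sd(x)                           # divides by n - 1 (Bessel)

# Density-scale histogram with the fitted normal curve on top
pdf(tempfile(fileext = ".pdf"))
hist(x, breaks = 50, freq = FALSE, main = "Histogram with fitted normal")
grid_x <- seq(min(x), max(x), length.out = 300)
lines(grid_x, dnorm(grid_x, mean = mu_hat, sd = sd_ml), col = 2, lwd = 2)
invisible(dev.off())

c(mean = mu_hat, sd_ml = sd_ml, sd_unbiased = sd_unb)
```

With 500 points the two standard deviations differ only by a factor of sqrt(499/500), so either gives essentially the same curve.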
Re: [R] Automaticall adjust axis scales
Perhaps matplot()?
matplot(cbind(x1, x2, x3), type = 'l')
See ?matplot for more information.
HTH,
Jorge.-

On Mon, Mar 19, 2012 at 2:40 PM, Alaios wrote:
Dear all, I have made a function that, given a number of list elements, plots them in the same window. The first element is plotted using plot() and all the rest are added to the same window using lines(). Below is a small, simple reproducible example.
x1 <- c(1:10)
plot(x1)
x2 <- c(11:20)
lines(x2)
x3 <- c(31:40)
lines(x3)
As you might notice, the two subsequent lines() calls do not show up, because the axis limits were set by the first plot. Would it be possible, after the last lines() call, to change the axes to the minimum and maximum of all data sets so that everything is visible? Any idea how I can do that?
I would like to thank you for your help.
B.R.
Alex
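One caveat with the matplot() suggestion: if the series differ in length, cbind() recycles the shorter ones (with a warning) instead of padding them. A sketch that pads each series with NA first, using hypothetical series of lengths 10, 7 and 3; matplot() leaves gaps at NA, so the padded columns simply stop early.

```r
x1 <- 1:10
x2 <- 11:17
x3 <- 21:23
series <- list(x1, x2, x3)

# Pad every series with trailing NA up to the longest length
max_len <- max(lengths(series))
padded <- sapply(series, function(s) c(s, rep(NA, max_len - length(s))))

# matplot() sets the axis limits from all columns at once
pdf(tempfile(fileext = ".pdf"))
matplot(padded, type = "l", lty = 1, col = 1:3, xlab = "Index", ylab = "Value")
invisible(dev.off())
```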
Re: [R] Automaticall adjust axis scales
Thanks for the immediate answer. Is there any alternative to matplot? There might be a few limitations with matplot in my case. I will post again if needed when I am at the office tomorrow.
Regards,
Alex
Re: [R] regression with proportion data
On 2012-03-19 07:35, S Ellison wrote:
Georgiana May wrote (19 March 2012 14:06, subject "[R] regression with proportion data"):
I understand that the binomial family concerns successes vs. failures and can use those raw data, but the R Book and other sources seem to suggest that proportion data are usable as well. Not so?
You _can_ use a two-column matrix with counts of successes and failures in the two columns. And if you know what the number n of observations was (which you would need anyway for using proportions in a logistic regression), you can calculate that matrix from the proportions and n, as long as you're reasonably careful about rounding.
S Ellison

Yes, and you can also use the proportions directly; just specify the corresponding vector of numbers of trials as the 'weights' argument in the glm() call. See the Details section of ?glm.
Peter Ehlers
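The two equivalent ways of fitting the same logistic regression described above can be checked on simulated data (the covariate, number of trials and true coefficients below are made up for illustration): a two-column matrix of successes and failures, and observed proportions with the numbers of trials passed as weights. The coefficient estimates should agree.

```r
set.seed(123)
n_trials  <- rep(20, 30)                      # trials per observation
x         <- seq(-2, 2, length.out = 30)      # hypothetical covariate
successes <- rbinom(30, size = n_trials, prob = plogis(0.5 + 1.2 * x))
prop      <- successes / n_trials

# 1) Two-column response: successes and failures
fit_counts <- glm(cbind(successes, n_trials - successes) ~ x, family = binomial)

# 2) Proportions as the response, numbers of trials as prior weights
fit_props <- glm(prop ~ x, family = binomial, weights = n_trials)

rbind(counts = coef(fit_counts), proportions = coef(fit_props))
```

If the stored proportions had been rounded, prop * n_trials would no longer be a whole number and glm() warns about non-integer successes, which is the rounding issue mentioned above.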
[R] where this Error comes from?
Dear all, while I am executing my code I receive the error below:
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic
The weird thing is that I am not calling the sort function anywhere, nor do I rely on any sorting. How can I discover where this comes from (inside which function)?
I would like to thank you in advance for your help.
B.R.
Alex
Re: [R] where this Error comes from?
I call upon the great and mighty Google (hallowed be its name) to discover:
traceback()
and its more powerful cousin
options(error = recover)
Michael
Re: [R] where this Error comes from?
Call traceback() after seeing the error message. E.g.,
> factor(list(1, 2:3, 4:6))
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
> traceback()
3: stop("'x' must be atomic for 'sort.list'\nHave you called 'sort' on a list?")
2: sort.list(y)
1: factor(list(1, 2:3, 4:6))
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Re: [R] where this Error comes from?
?traceback
?options ## consider changing the error option to recover
?debug
Search on "debugging in R" to find more possibilities. R is a programming language. You need to learn how to debug code if you wish to program in R.
-- Bert
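traceback() and options(error = recover) are interactive tools. For a script run non-interactively, a rough equivalent is to record sys.calls() from inside an error handler. The sketch below uses two hypothetical functions, outer_fun() and inner_fun(), to reproduce the same kind of error Alex saw (sort() applied to a list ends up in sort.int(), which rejects non-atomic input).

```r
inner_fun <- function(v) sort(v)        # hypothetical helper that sorts
outer_fun <- function(v) inner_fun(v)   # hypothetical caller

trace_env <- new.env()
res <- try(
  withCallingHandlers(
    outer_fun(list(3, 1, 2)),           # a list is not atomic, so sorting fails
    error = function(e) trace_env$calls <- sys.calls()
  ),
  silent = TRUE
)

# One line per frame, outermost first (traceback() lists innermost first)
stack <- vapply(trace_env$calls, function(cl) deparse(cl)[1], character(1))
print(stack)
```

Reading the printed stack from the top shows which of your own functions eventually called sort().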
Re: [R] Automaticall adjust axis scales
Or look at the xlim and ylim arguments to plot. E.g.,
x1 <- 1:10 ; x2 <- 11:17 ; x3 <- 21:23
plot(NA, NA, xlim = range(1, length(x1), length(x2), length(x3)), ylim = range(x1, x2, x3), type = "n", xlab = "", ylab = "")
points(x1, type = "b")
lines(x2)
points(x3)
title(xlab = "The X Values", ylab = "The Y Values")
Bill Dunlap
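Bill's xlim/ylim approach can be wrapped in a small function for the list-of-series case from the original question. plot_series() is a hypothetical name: it computes the limits from every element before the first plot() call, adds the remaining series with lines(), and returns the limits it used.

```r
plot_series <- function(series, cols = seq_along(series)) {
  # Axis limits that cover every series, computed up front
  xlim <- c(1, max(lengths(series)))
  ylim <- range(unlist(series), na.rm = TRUE)

  plot(series[[1]], type = "l", col = cols[1], xlim = xlim, ylim = ylim,
       xlab = "Index", ylab = "Value")
  for (k in seq_along(series)[-1]) {
    lines(series[[k]], col = cols[k])
  }
  invisible(list(xlim = xlim, ylim = ylim))
}

pdf(tempfile(fileext = ".pdf"))
lims <- plot_series(list(1:10, 11:20, 31:40))   # the data from the question
invisible(dev.off())
print(lims)
```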
Re: [R] Linear regression
Hello Bert, this is definitely not a homework problem. I am trying to estimate frequencies of mutations in different groups. The mutation frequencies can be modeled as a linear relation in cases of mixtures. So I have a lot of populations that follow the relationship
y = alpha*A + beta*B
and I want to estimate A and B, given y, alpha and beta. A and B are both vectors of the same size as y. Can you suggest where I can find some information about your suggestion #4? That is exactly what I was hoping to do.
Thanks,
Diviya
Re: [R] Linear regression
Hi, I think you can use the solve() function to solve the equations.
Niloofar Javanrouh
MSc Student of Biostatistics
Mashad University of Medical Sciences
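A sketch of the solve() suggestion, treating each position as its own 2 x 2 linear system. The A and B values below are hypothetical and only serve to generate y1 and y2; solve() accepts a matrix right-hand side, so all positions are handled in one call. Note that this returns the exact solution and does not enforce the (0, 1) constraints, so with noisy data such as runif() draws some estimates may fall outside that range.

```r
alpha1 <- 0.6;  alpha2 <- 0.75
beta1  <- 1 - alpha1; beta2 <- 1 - alpha2

# Coefficient matrix: row k holds (alpha_k, beta_k)
M <- matrix(c(alpha1, beta1,
              alpha2, beta2), nrow = 2, byrow = TRUE)

# Hypothetical true values, used only to build example data
A_true <- c(0.1, 0.5, 0.9)
B_true <- c(0.8, 0.2, 0.4)
y1 <- alpha1 * A_true + beta1 * B_true
y2 <- alpha2 * A_true + beta2 * B_true

# Solve M %*% c(A[i], B[i]) = c(y1[i], y2[i]) for every i at once
est <- solve(M, rbind(y1, y2))
A_hat <- est[1, ]
B_hat <- est[2, ]
```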
Re: [R] Coverage Probability
Thank you very much. This was what I needed. Unfortunately I have one further problem with this code. I don't need the coverage probability for just one odds ratio but for a range of different odds ratios (for example [1; 30]). I tried it with a loop but I get an error. Again, I think I'm almost there but have a small mistake somewhere. The complete code is:
#setting values
n1 <- 10
n2 <- 10
y <- 100
alpha <- 1
z <- 1.96
# creating 2x2 table
for (i in 1:30) {
  theta <- i
  x1 <- exp(alpha + theta) / (1 + exp(alpha + theta))
  x2 <- exp(alpha) / (1 + exp(alpha))
  n11 <- rbinom(y, 10, x1)
  n12 <- n1 - n11
  n21 <- rbinom(y, 10, x2)
  n22 <- n2 - n21
  # upper and lower limit gart interval
  gartu <- function(z, d, e, f, g) {log(((d+.5)*(g+.5))/((e+.5)*(f+.5))) + z*sqrt(1/(d+.5)+1/(e+.5)+1/(f+.5)+1/(g+.5))}
  gartl <- function(z, d, e, f, g) {log(((d+.5)*(g+.5))/((e+.5)*(f+.5))) - z*sqrt(1/(d+.5)+1/(e+.5)+1/(f+.5)+1/(g+.5))}
  u <- gartu(z, n11[i], n22[i], n12[i], n21[i])
  l <- gartl(z, n11[i], n22[i], n12[i], n21[i])
  foo <- function(theta, u, l) mean(theta >= l & theta <= u, na.rm = TRUE)
  foo(theta, u, l)
}
-- View this message in context: http://r.789695.n4.nabble.com/Coverage-Probability-tp4485511p4485865.html Sent from the R help mailing list archive at Nabble.com.
[R] [R-pkgs] acs package: analyze data from the U.S. American Community Survey
We are pleased to announce version 0.8 of the acs package for R, now available on CRAN (http://cran.r-project.org/web/packages/acs/index.html). The package provides a general toolkit for managing, analyzing, and presenting data from the U.S. Census American Community Survey (ACS). Confidence intervals provided with the data are converted to standard errors and bundled with estimates in complex acs-class objects. The package provides new methods to conduct standard operations, plots, and tests on acs objects in statistically appropriate ways.
In addition to improved documentation and bug fixes, highlights include:
* An improved read.acs function for importing data downloaded from the Census American FactFinder site.
* rbind and cbind functions to help create larger acs objects from smaller ones.
* A sum method to aggregate rows or columns of ACS data, dealing correctly with both estimates and standard errors.
* A new apply method to allow users to apply virtually any function to each row or column of an acs data object.
* A snazzy new plot method capable of plotting both density plots (for estimates of a single geography and variable) and multiple estimates with error bars (for estimates of the same variable over multiple geographies, or vice versa).
* New functions to deal with adjusting the nominal values of currency from different years for the purpose of comparing one survey with another.
* A new prompt method to serve as a helper function when changing geographic rownames or variable column names.
For more info, examples, and demo plots, see the package documentation and/or http://eglenn.scripts.mit.edu/citystate/2012/03/acs-package-updated-version-0-8-now-on-cran/.
-- Ezra Haber Glenn, AICP Lecturer in Community Development Department of Urban Studies and Planning Massachusetts Institute of Technology 77 Massachusetts Ave., Room 7-337 Cambridge, MA 02139 egl...@mit.edu http://dusp.mit.edu/faculty/eglenn | http://eglenn.scripts.mit.edu/citystate/ 617.253.2024 (w) 617.721.7131 (c) ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] Linear regression
Note that your equations can be written:
y = alpha*A + (1-alpha)*B,
which is equivalent to
y = (A-B)*alpha + B,
i.e. of the form
y = C*alpha + B,
a simple linear equation in alpha. You have two different values of alpha at which y was measured, so just stack up all your results into a single regression setup with these two different alphas.
Except for your constraints. But if I understand you correctly, the problem is not that C and B must be between 0 and 1, it is that the response, y, must be (it is a frequency). If so, this suggests that you need to set this up as a glm, probably with a binomial family. Trivial to do, but I suspect you don't know about GLMs, which is why I said that you may be out of your depth and should seek local help. If I'm wrong, my apologies for misunderstanding. If I'm not, I'm sorry, but I don't wish to teach you about basic statistics on this list. Read up on generalized linear models, for which there are undoubtedly a host of good web tutorials available.
Cheers,
Bert
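Bert's rewriting y = (A - B)*alpha + B turns each locus into an ordinary straight-line fit of y on alpha, with intercept B and slope C = A - B, so A = intercept + slope. A sketch for one locus, assuming hypothetical mixing proportions from five populations and made-up true values A = 0.3, B = 0.7 with a little noise. As Bert notes, this ignores the (0, 1) range of y, for which a binomial glm would be the more principled model.

```r
alpha <- c(0.2, 0.45, 0.6, 0.75, 0.9)   # hypothetical mixing proportions
A <- 0.3; B <- 0.7                      # hypothetical true frequencies

set.seed(42)
y <- (A - B) * alpha + B + rnorm(length(alpha), sd = 0.01)

# Stacked single regression: intercept = B, slope = C = A - B
fit <- lm(y ~ alpha)
B_hat <- unname(coef(fit)[1])
A_hat <- unname(coef(fit)[1] + coef(fit)[2])
c(A_hat = A_hat, B_hat = B_hat)
```

With only the two alphas from the original question (0.6 and 0.75), the line passes exactly through both points, so this reduces to solving the two equations directly.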
Re: [R] Coverage Probability
Hi hubinho,
You need to initialize the result vectors before the for() loop and then store the u and l values properly:
# parameters
n1 <- 10
n2 <- 10
y <- 100
alpha <- 1
z <- 1.96
# creating B 2x2 tables
B <- 50
u <- l <- vector('numeric', B)
for (i in 1:B) {
  theta <- i
  x1 <- exp(alpha + theta) / (1 + exp(alpha + theta))
  x2 <- exp(alpha) / (1 + exp(alpha))
  n11 <- rbinom(y, 10, x1)
  n12 <- n1 - n11
  n21 <- rbinom(y, 10, x2)
  n22 <- n2 - n21
  # upper and lower limit gart interval
  gartu <- function(z, d, e, f, g) {log(((d+.5)*(g+.5))/((e+.5)*(f+.5))) + z*sqrt(1/(d+.5)+1/(e+.5)+1/(f+.5)+1/(g+.5))}
  gartl <- function(z, d, e, f, g) {log(((d+.5)*(g+.5))/((e+.5)*(f+.5))) - z*sqrt(1/(d+.5)+1/(e+.5)+1/(f+.5)+1/(g+.5))}
  # store results
  u[i] <- gartu(z, n11[i], n22[i], n12[i], n21[i])
  l[i] <- gartl(z, n11[i], n22[i], n12[i], n21[i])
}
# coverage
theta <- 1:B
foo <- function(theta, u, l) mean(theta >= l & theta <= u, na.rm = TRUE)
foo(theta, u, l)
# [1] 0.14
HTH,
Jorge.-
[R] Lag based on Date objects with non-consecutive values
Hello all, I need to figure out a way to lag a variable by a number of days without using the zoo package. I need to use a remote R connection that doesn't have the zoo package installed, and whose maintainers are unwilling to install it. So I want a function where I can specify the number of days by which to lag a variable against a Date-formatted column. That is relatively easy to do. The problem arises when I don't have consecutive dates: I can't figure out a way to insert an NA when there is a non-consecutive date. So for example:
## A data frame with non-consecutive dates
set.seed(32)
df1 <- data.frame(
  Date = seq(as.Date("1967-06-05", "%Y-%m-%d"), by = "day", length.out = 5),
  Dis1 = rnorm(5, 1, 10)
)
df2 <- data.frame(
  Date = seq(as.Date("1967-06-15", "%Y-%m-%d"), by = "day", length.out = 5),
  Dis1 = rnorm(5, 1, 10)
)
df <- rbind(df1, df2); df
## A function to lag the variable by a specified number of days
lag.day <- function(lag.by, data) {
  c(rep(NA, lag.by), head(data$Dis1, -lag.by))
}
## Using the function
df$lag1 <- lag.day(lag.by = 1, data = df); df
## returns this data frame
         Date      Dis1      lag1
1  1967-06-05  1.146405        NA
2  1967-06-06  9.732887  1.146405
3  1967-06-07 -9.279462  9.732887
4  1967-06-08  7.856646 -9.279462
5  1967-06-09  5.494370  7.856646
6  1967-06-15  5.070176  5.494370
7  1967-06-16  3.847314  5.070176
8  1967-06-17 -5.243094  3.847314
9  1967-06-18  9.396560 -5.243094
10 1967-06-19  4.112792  9.396560
## When really what I would like is something like this:
         Date      Dis1      lag1
1  1967-06-05  1.146405        NA
2  1967-06-06  9.732887  1.146405
3  1967-06-07 -9.279462  9.732887
4  1967-06-08  7.856646 -9.279462
5  1967-06-09  5.494370  7.856646
6  1967-06-15  5.070176        NA
7  1967-06-16  3.847314  5.070176
8  1967-06-17 -5.243094  3.847314
9  1967-06-18  9.396560 -5.243094
10 1967-06-19  4.112792  9.396560
So can anyone recommend a way (either using my function or any other approach) to consistently lag values based on a lag.by value and consecutive dates? Thanks so much in advance!
Sam
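A base-R sketch (no zoo) that looks up the value exactly lag.by days earlier by matching dates instead of shifting positions, so a gap in the dates yields NA automatically. lag_by_date() is a hypothetical helper name, and it assumes each date occurs at most once; with duplicated dates, match() returns the first occurrence.

```r
lag_by_date <- function(dates, values, lag.by = 1) {
  # Row holding the date exactly lag.by days earlier, or NA if that date is absent
  idx <- match(dates - lag.by, dates)
  values[idx]
}

# Dates with a gap between June 9 and June 15, as in the example above
dates <- as.Date(c("1967-06-05", "1967-06-06", "1967-06-07", "1967-06-08",
                   "1967-06-09", "1967-06-15", "1967-06-16"))
vals <- c(1.1, 9.7, -9.3, 7.9, 5.5, 5.1, 3.8)

lag1 <- lag_by_date(dates, vals, lag.by = 1)
# lag1 is c(NA, 1.1, 9.7, -9.3, 7.9, NA, 5.1)
```

On the data frame from the question this would be df$lag1 <- lag_by_date(df$Date, df$Dis1, lag.by = 1).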
Re: [R] Coverage Probability
Thank you very much again. But in this case I get the coverage probability as an average over all values of the odds ratio. I need a coverage probability for every value of the odds ratio: first for odds ratio = 1, then for odds ratio = 2, and so on. Sorry to bother you again, but I have some problems with loops.
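One way to get a separate coverage probability for each value is to simulate many tables per theta inside a function and then sapply() over the theta values. This sketch keeps the setup from the thread (alpha = 1, 10 subjects per group, theta on the log-odds-ratio scale, Gart's interval with 0.5 added to each cell) but vectorises over replicates; gart_ci(), coverage_for_theta() and reps = 1000 are illustrative choices. It computes the interval for log(n11*n22/(n12*n21)); the gartu()/gartl() calls above pass (n11, n22, n12, n21), which puts n11 and n21 together in the numerator, so that argument order may be worth double-checking.

```r
# Gart's interval for the log odds ratio, 0.5 added to every cell
gart_ci <- function(n11, n12, n21, n22, z = qnorm(0.975)) {
  est <- log(((n11 + 0.5) * (n22 + 0.5)) / ((n12 + 0.5) * (n21 + 0.5)))
  se  <- sqrt(1 / (n11 + 0.5) + 1 / (n12 + 0.5) + 1 / (n21 + 0.5) + 1 / (n22 + 0.5))
  cbind(lower = est - z * se, upper = est + z * se)
}

# Share of simulated intervals that contain the true log odds ratio theta
coverage_for_theta <- function(theta, alpha = 1, n1 = 10, n2 = 10, reps = 1000) {
  n11 <- rbinom(reps, n1, plogis(alpha + theta)); n12 <- n1 - n11
  n21 <- rbinom(reps, n2, plogis(alpha));         n22 <- n2 - n21
  ci <- gart_ci(n11, n12, n21, n22)
  mean(ci[, "lower"] <= theta & theta <= ci[, "upper"])
}

set.seed(2012)
thetas <- 1:5
coverage <- sapply(thetas, coverage_for_theta)
data.frame(theta = thetas, coverage = coverage)
```

Replacing thetas <- 1:5 with 1:30 gives one coverage value per theta over the range from the question.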
[R] Unexpected input in function
Hi, although the following statements work individually in R, they produce an error if placed inside a function as below:
fsubt <- function(a) {
  b <- 1:length(a)
  b-a
}
The error message is:
Error: unexpected input in:
b <- 1:length(a)
b-
Any insight would be greatly appreciated.
Thanks,
Jack
Re: [R] Unexpected input in function
I think you'll need to provide a reproducible example, because your code works for me:

> fsubt <- function(a) {
+   b <- 1:length(a)
+   b-a
+ }
> fsubt(1:5)
[1] 0 0 0 0 0
> fsubt(sample(1:10))
 [1] -8 -6  1  1 -1  5  3  1  4  0
> fsubt(2)
[1] -1

On Mon, Mar 19, 2012 at 4:01 PM, Schryver, Jack C. schryve...@ornl.gov wrote:
> Hi,
>
> Although the following statements work individually in R, they produce an error if placed inside a function as below:
>
> fsubt <- function(a) {
>   b <- 1:length(a)
>   b-a
> }
>
> The error message is:
>
> Error: unexpected input in:
> "b <- 1:length(a)
> b-"
>
> Any insight would be greatly appreciated.
>
> Thanks,
> Jack

-- 
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] hgu133plus2hsentrezgprobe library
Hi Eleni,

Questions like this are better served on the Bioconductor mailing list. Nonetheless, try this:

ALL <- topTable(fit2, coef=1, number=Inf)
ALL$SYMBOL <- unlist(mget(ALL$ID, hgu133plus2hsentrezgSYMBOL, ifnotfound=NA))

Here ALL is the output from limma for differential expression (ALL$ID is the probe ID on the Entrez-centric CDF from Brainarray).

Best,
Iain

----- Original Message -----
From: Eleni Christodoulou elenic...@gmail.com
To: r-help@r-project.org
Cc:
Sent: Monday, 19 March 2012, 18:47
Subject: [R] hgu133plus2hsentrezgprobe library

Hello R community,

I am processing raw Affymetrix CEL files and I am using the Michigan custom CDF library hgu133plus2hsentrezgprobe. I have been looking for documentation on the functions it contains. I am specifically interested in converting probe names to gene symbols. Does anybody know where I can find it?

Thanks a lot!
Eleni
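Iain's one-liner relies on mget() with ifnotfound = NA, so that probes missing from the annotation map come back as NA instead of stopping with an error, and unlist() then gives one symbol per row of ALL. The sketch below shows that behaviour with base R only: a plain environment stands in for the hgu133plus2hsentrezgSYMBOL map, and the probe IDs and gene symbols are hypothetical placeholders.

```r
# Hypothetical stand-in for an annotation map: an environment keyed by probe ID
symbol_map <- new.env()
assign("PROBE_1_at", "GENE_A", envir = symbol_map)
assign("PROBE_3_at", "GENE_C", envir = symbol_map)

# PROBE_2_at deliberately has no entry in the map
ids <- c("PROBE_1_at", "PROBE_2_at", "PROBE_3_at")

# ifnotfound = NA returns NA for unmapped IDs instead of raising an error,
# so unlist() yields one value per probe, in the same order as `ids`
sym <- unlist(mget(ids, envir = symbol_map, ifnotfound = NA))
sym
```

This assumes one symbol per probe; if a probe mapped to several symbols, unlist() would return more values than there are rows and the result could no longer be assigned directly as a column.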
Re: [R] Unexpected input in function
The OP's error suggests (to me) that there's a line-break error somewhere, so it may be a funny quirk of encoding/OS incompatibility if it's from a source()'d script.

Incidentally, the OP could also write the body of his function as a one-liner with:

seq_along(a) - a

Michael

On Mon, Mar 19, 2012 at 4:33 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
> I think you'll need to provide a reproducible example, because your code works for me:
>
> > fsubt <- function(a) {
> +   b <- 1:length(a)
> +   b-a
> + }
> > fsubt(1:5)
> [1] 0 0 0 0 0
> > fsubt(sample(1:10))
>  [1] -8 -6  1  1 -1  5  3  1  4  0
> > fsubt(2)
> [1] -1
>
> On Mon, Mar 19, 2012 at 4:01 PM, Schryver, Jack C. schryve...@ornl.gov wrote:
>> Hi,
>>
>> Although the following statements work individually in R, they produce an error if placed inside a function as below:
>>
>> fsubt <- function(a) {
>>   b <- 1:length(a)
>>   b-a
>> }
>>
>> The error message is:
>>
>> Error: unexpected input in:
>> "b <- 1:length(a)
>> b-"
>>
>> Any insight would be greatly appreciated.
>>
>> Thanks,
>> Jack
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
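A quick check that Michael's one-liner matches the original two-step body, plus one reason seq_along() is usually preferred: for zero-length input, 1:length(a) counts down to c(1, 0) instead of giving an empty index. The name fsubt2 is hypothetical, used only to compare the two versions side by side.

```r
# Original two-step body, as posted
fsubt <- function(a) {
  b <- 1:length(a)
  b - a
}

# One-liner suggested in the thread (hypothetical name for comparison)
fsubt2 <- function(a) seq_along(a) - a

x <- c(3, 1, 2)
fsubt(x)    # -2 1 1
fsubt2(x)   # -2 1 1

# The index vectors differ when the input is empty:
1:length(numeric(0))    # c(1, 0)
seq_along(numeric(0))   # integer(0)
```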
Re: [R] Coverage Probability
Hi hubinho,

This is starting to look like homework to me, so this will be my last try at helping you.

The general strategy would be along the lines of (1) write a function that does what you want for a single value of theta and (2) sapply() that function to the vector of theta values you would like to evaluate:

# function
# -- B is the number of tables
foo2 <- function(theta, n1, n2, B = 1000, alpha = 1, z = 1.96){
  # 2x2 tables
  x1 <- exp(alpha + theta) / (1 + exp(alpha + theta))
  x2 <- exp(alpha) / (1 + exp(alpha))
  n11 <- rbinom(B, n1, x1)
  n12 <- n1 - n11
  n21 <- rbinom(B, n2, x2)
  n22 <- n2 - n21

  # upper and lower limit gart interval
  gartu <- function(z, d, e, f, g) log(((d+.5)*(g+.5))/((e+.5)*(f+.5))) +
    z*sqrt(1/(d+.5)+1/(e+.5)+1/(f+.5)+1/(g+.5))
  gartl <- function(z, d, e, f, g) log(((d+.5)*(g+.5))/((e+.5)*(f+.5))) -
    z*sqrt(1/(d+.5)+1/(e+.5)+1/(f+.5)+1/(g+.5))

  # calculations and results
  # (cells passed so that d*g/(e*f) is the odds ratio n11*n22/(n12*n21))
  u <- gartu(z, n11, n12, n21, n22)
  l <- gartl(z, n11, n12, n21, n22)
  theta >= l & theta <= u   # TRUE if theta is in (l, u)
}

# example
# -- B is the number of tables
res <- foo2(theta = 1, n1 = 10, n2 = 10, B = 1000)
res

# coverage
mean(res)

# different values of theta
Theta <- 1:30
colMeans(sapply(Theta, foo2, n1 = 10, n2 = 10, B = 1000))

HTH,
Jorge.-

On Mon, Mar 19, 2012 at 4:24 PM, hubinho wrote:
> Thank you very much again. But in this case I get the coverage probability as an average over all values of the odds ratio. I need a coverage probability for every value of the odds ratio: the coverage probability for odds ratio = 1, then for odds ratio = 2, and so on. Sorry to bother you again, but I have some problems with loops.
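The question asks for coverage at odds ratios 1, 2, ..., while theta in foo2() enters the success probability as exp(alpha + theta) / (1 + exp(alpha + theta)), i.e. on the log-odds-ratio scale. If the grid is meant as odds ratios, one option is to evaluate it at log(OR). Below is a self-contained sketch of the same Gart (0.5-corrected) interval simulation under that assumption; gart_covers is a hypothetical name, plogis() is base R's inverse logit, and n1 = n2 = 10, B = 1000, alpha = 1 are the illustrative values from the thread.

```r
# Simulated coverage of the 0.5-corrected Wald (Gart) interval for the log
# odds ratio, at a single true log odds ratio theta.
gart_covers <- function(theta, n1, n2, B = 1000, alpha = 1, z = 1.96) {
  n11 <- rbinom(B, n1, plogis(alpha + theta))  # successes, group 1
  n12 <- n1 - n11                              # failures, group 1
  n21 <- rbinom(B, n2, plogis(alpha))          # successes, group 2
  n22 <- n2 - n21                              # failures, group 2
  # 0.5-corrected log odds ratio and its standard error
  est <- log(((n11 + 0.5) * (n22 + 0.5)) / ((n12 + 0.5) * (n21 + 0.5)))
  se  <- sqrt(1 / (n11 + 0.5) + 1 / (n12 + 0.5) + 1 / (n21 + 0.5) + 1 / (n22 + 0.5))
  # TRUE for each simulated table whose interval contains the true theta
  (est - z * se <= theta) & (theta <= est + z * se)
}

set.seed(1)
OR <- 1:30  # odds ratios of interest (assumed grid)
coverage <- sapply(log(OR), function(th) mean(gart_covers(th, n1 = 10, n2 = 10)))
names(coverage) <- OR
round(coverage, 3)  # one coverage estimate per odds ratio
```

Wrapping the per-theta mean inside sapply() returns one number per odds ratio directly, instead of building the B x 30 logical matrix and calling colMeans() on it; both give the same kind of result.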
Re: [R] Unexpected input in function
I think the most likely explanation is that something in the input string has had the effect of inserting an invisible character between the - and the a in b-a, and a possible suspect is pollution by UTF-8: see the discussion at
http://r.789695.n4.nabble.com/unexpected-input-in-rpart-td3168363.html

Or a character copy-pasted from an editor that uses a non-ASCII encoding for its characters. See e.g.:
http://support.rstudio.org/help/discussions/problems/386-error-unexpected-input-in
and:
http://www.mail-archive.com/r-help@r-project.org/msg71798.html

On 19-Mar-2012 Sarah Goslee wrote:
> I think you'll need to provide a reproducible example, because your code works for me:
>
> > fsubt <- function(a) {
> +   b <- 1:length(a)
> +   b-a
> + }
> > fsubt(1:5)
> [1] 0 0 0 0 0
> > fsubt(sample(1:10))
>  [1] -8 -6  1  1 -1  5  3  1  4  0
> > fsubt(2)
> [1] -1
>
> On Mon, Mar 19, 2012 at 4:01 PM, Schryver, Jack C. schryve...@ornl.gov wrote:
>> Hi,
>>
>> Although the following statements work individually in R, they produce an error if placed inside a function as below:
>>
>> fsubt <- function(a) {
>>   b <- 1:length(a)
>>   b-a
>> }
>>
>> The error message is:
>>
>> Error: unexpected input in:
>> "b <- 1:length(a)
>> b-"
>>
>> Any insight would be greatly appreciated.
>>
>> Thanks,
>> Jack
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 19-Mar-2012
Time: 20:56:04
This message was sent by XFMail
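To test the invisible-character hypothesis, one can scan the script's lines for anything outside 7-bit ASCII. The sketch below assumes, purely for illustration, that the stray character is U+2212 MINUS SIGN (a common word-processor substitution for "-"); the real culprit in Jack's file is unknown. For a whole file, the tools package also provides showNonASCIIfile().

```r
# Hypothetical pasted lines: the third uses U+2212 MINUS SIGN, not ASCII "-"
lines <- c("fsubt <- function(a) {",
           "  b <- 1:length(a)",
           "  b\u{2212}a",
           "}")

# Flag any line containing a character outside 7-bit ASCII
bad <- grepl("[^\\x01-\\x7F]", lines, perl = TRUE)
which(bad)  # line 3

# Swap the typographic minus for an ASCII hyphen-minus, then parse and evaluate
fixed <- gsub("\u{2212}", "-", lines, fixed = TRUE)
f <- eval(parse(text = fixed))
f(1:5)  # 0 0 0 0 0
```

Running parse(text = lines) on the unfixed version would be the natural way to confirm that the substituted character is what triggers the "unexpected input" error in a given locale.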