Re: [R] Where is gdata?
Hi Liviu,

> Not if you library(gdata) first. Then ?read.xls should work.

Yes, I did. I found something strange here which I can't explain.

Win 7 64bit
R 32/64 bit

Just rebooted Win 7 and R

> library(gdata)
gdata: Unable to locate valid perl interpreter
gdata:
gdata: read.xls() will be unable to read Excel XLS and XLSX files
gdata: unless the 'perl=' argument is used to specify the location of a
gdata: valid perl intrpreter.
gdata:
gdata: (To avoid display of this message in the future, please ensure
gdata: perl is installed and available on the executable search path.)
gdata: Unable to load perl libaries needed by read.xls()
gdata: to support 'XLX' (Excel 97-2004) files.

gdata: Unable to load perl libaries needed by read.xls()
gdata: to support 'XLSX' (Excel 2007+) files.

gdata: Run the function 'installXLSXsupport()'
gdata: to automatically download and install the perl
gdata: libaries needed to support Excel XLS and XLSX formats.

Attaching package: 'gdata'

The following object(s) are masked from 'package:utils':

    object.size

It complains.

> ?read.xls
starting httpd help server ... done

Read Excel files

Both 32 and 64 bit R worked.

If there is NO complaint on running library(gdata), then ?read.xls can't work.

Perl seems to have been installed, but I can't recall when and how:

C:\>dir C:\Users\satimiswin764\Documents\R\win-library\2.12\gdata\
.
11/22/2010  10:44 AM    <DIR>          perl
11/22/2010  10:44 AM    <DIR>          R
11/22/2010  10:44 AM    <DIR>          unitTests
11/22/2010  10:44 AM    <DIR>          xls

dir C:\Users\satimiswin764\My Documents\R\win-library\2.12\gdata\perl
11/22/2010  10:44 AM    <DIR>          .
11/22/2010  10:44 AM    <DIR>          ..
11/22/2010  10:44 AM    <DIR>          Archive
11/22/2010  10:44 AM               418 install_modules.pl
11/22/2010  10:44 AM    <DIR>          IO
11/22/2010  10:44 AM             2,710 module_tools.pl
11/22/2010  10:44 AM    <DIR>          OLE
11/22/2010  10:44 AM             2,019 sheetCount.pl
11/22/2010  10:44 AM             2,019 sheetNames.pl
11/22/2010  10:44 AM    <DIR>          Spreadsheet
11/22/2010  10:44 AM               550 supportedFormats.pl
11/22/2010  10:44 AM               114 VERSIONS
11/22/2010  10:44 AM             5,512 xls2csv.pl
11/22/2010  10:44 AM             5,512 xls2tab.pl
11/22/2010  10:44 AM             5,512 xls2tsv.pl
               9 File(s)         24,366 bytes
               6 Dir(s)  16,776,032,256 bytes free

B.R.
Stephen L

From: Liviu Andronic landronim...@gmail.com
Cc: Gabor Grothendieck ggrothendi...@gmail.com; r-help r-help@r-project.org
Sent: Mon, November 29, 2010 2:40:16 PM
Subject: Re: [R] Where is gdata?

> ?read.xls
>
> I must run ??read.xls

Not if you library(gdata) first. Then ?read.xls should work.

Regards
Liviu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Help Please!!!!!!!!!
Hi,

I have been working with R for my stats class and I keep running into the same error. I have read many sites about reading data from a text file into R, and I'm using the data to do a correspondence analysis. Nothing I have read explains why the error message keeps coming up. I have used the exact examples I have seen in articles, and the same error keeps popping up:

Error in sum(N) : invalid 'type' (character) of argument

I have spent a long time trying to figure this out without success. I am sure it has to do with the fact that my rows have names in them. I have attached the text file I have been using. If you have any ideas as to how I can get R to plot the data using correspondence analysis with the column and row names, that would be really helpful! Or if you could pass this email to someone who may know how to help me, that would be much appreciated.

Thank you,
Melissa Waldman
my email: melissawald...@gmail.com

     None Light Medium Heavy
SM      4     2      3     2
JM      4     3      7     4
SE     25    10     12     4
JE     18    24     33    13
S      10     6      7     2
[R] Evaluation of survival analysis
Dear all,

Are there any functions in R to evaluate the fit of coxph and survreg models in survival analysis? For example, the results from Cox regression and parametric survival analysis are shown below. Which method is preferred, and how can I compare the two?

1. coxph(formula = y ~ pspline(x1, df = 2))

                           coef    se(coef) se2     Chisq DF   p
pspline(x1, df = 2), line  0.0522  0.00867  0.00866 36.23 1.00 1.8e-09
pspline(x1, df = 2), nonl                            3.27 1.04 7.5e-02

Iterations: 4 outer, 13 Newton-Raphson
     Theta= 0.91
Degrees of freedom for terms= 2
Likelihood ratio test=34.6  on 2.04 df, p=3.24e-08

2. survreg(formula = y ~ pspline(x1, df = 2))

                           coef    se(coef) se2     Chisq  DF  p
(Intercept)                 2.8199 0.15980  0.09933 311.37 1.0 0.0e+00
pspline(x1, df = 2), line  -0.0193 0.00248  0.00248  60.35 1.0 8.0e-15
pspline(x1, df = 2), nonl                             1.43 1.1 2.6e-01

Scale= 0.304
Iterations: 6 outer, 20 Newton-Raphson
     Theta= 0.991
Degrees of freedom for terms= 0.4 2.1 1.0
Likelihood ratio test=48.2  on 1.5 df, p=1.18e-11

I really appreciate your help. Thank you very much in advance.

Best wishes,
He
Re: [R] Where is gdata?
Hi, Stephen:

The directory "C:\Users\satimiswin764\My Documents\R\win-library\2.12\gdata\perl" is NOT the perl interpreter; it holds only perl code in the gdata package for R, invoked by certain R commands. You need to install something like Strawberry perl, as I've previously stated.

Spencer

On 11/29/2010 12:44 AM, Stephen Liu wrote:
> gdata: Unable to locate valid perl interpreter
>
> dir C:\Users\satimiswin764\My Documents\R\win-library\2.12\gdata\perl

-- 
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
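The distinction Spencer draws (perl scripts shipped with gdata versus a perl interpreter on the search path) can be checked from R itself. The following is only a sketch: Sys.which() is base R, while the Strawberry perl path and the file name in the comments are hypothetical and would need to match the actual installation.

```r
# Ask the operating system where (if anywhere) a perl executable is found
# on the executable search path. An empty string means "not found".
perl_path <- Sys.which("perl")

if (nzchar(perl_path)) {
  message("perl interpreter found at: ", perl_path)
} else {
  message("no perl interpreter on the search path; ",
          "read.xls() would need its 'perl=' argument")
}

# With gdata installed and perl NOT on the path, the interpreter can be
# passed explicitly. Both the file name and the path are assumptions:
# library(gdata)
# dat <- read.xls("myfile.xls", perl = "C:/strawberry/perl/bin/perl.exe")
```

If perl_path is empty after installing perl, restarting R (so that it picks up the updated search path) is usually the first thing to try.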
Re: [R] Help Please!!!!!!!!!
On Sun, 28 Nov 2010 21:29:08 -0800 Melissa Waldman melissawald...@gmail.com wrote:
> the same error keeps popping up:
>
> Error in sum(N) : invalid 'type' (character) of argument
>
> I am sure it has to do with the fact that my rows have names in them.

Hello Melissa,

First of all, you need a descriptive subject, such as "Cannot read tabular data in R". R-help is a high-volume list (100 to 200 messages per day), and each person who can help you is a specialist in one area or another.

Secondly, please include in your mail an excerpt of the relevant code you used to read the data in and that produced the error.

From looking at your text file, I would delete the white space before None, save the file, and use the following function to read your data into a data.frame:

read.delim("smokedata.txt")

This assumes you used a tab character between each field.

HTH,
Edwin
-- 
Dr. Edwin Groot, postdoctoral associate
AG Laux
Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032945
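As a sketch of the reading step, the table from the original post can be typed in directly so the example runs without the attached file. Note the assumptions: this uses read.table() with whitespace-separated fields rather than the tab-separated read.delim() call suggested above, and the object names (smoke_text, smoke) are made up for illustration.

```r
# The data from the original post, inlined so the example is self-contained.
# The header line has one field fewer than the data lines, so read.table()
# uses the first column (SM, JM, ...) as row names automatically.
smoke_text <- "None Light Medium Heavy
SM 4 2 3 2
JM 4 3 7 4
SE 25 10 12 4
JE 18 24 33 13
S 10 6 7 2"

smoke <- read.table(text = smoke_text, header = TRUE)

# All four columns should now be numeric, with the labels kept as row names.
str(smoke)
rownames(smoke)

# sum() works on the numeric table, which is what failed in the original post.
total <- sum(as.matrix(smoke))
total

# With the 'ca' package installed, the correspondence analysis could then be
# run and plotted along these lines (not run here):
# library(ca)
# plot(ca(smoke))
```

The key point is that the row labels end up as row names instead of as a character column, so functions that sum the table no longer see character data.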
[R] cross tabulate variables by subject id
Dear list,

I have data like this:

dat1 <- data.frame(subject=rep(1:10,2), cond1=rep(c("A","B"),each=5), cond2=rep(c("C","D"),each=10), choice=sample(0:1,10,replace=TRUE))

I would like to compare subjects' choice for (cond1=="A" & cond2=="C") vs (cond1=="A" & cond2=="D"), using mcnemar.test

The ?mcnemar.test example has the data in a matrix:

Performance
            2nd Survey
1st Survey   Approve Disapprove
  Approve        794        150
  Disapprove      86        570

So for my case, I need something like:

Choice
       AC
AD      0    1
  0   ...
  1

where ... would be the sum of subjects who answered 0 or 1 to AC and/or AD respectively.

I can get the first step by making an extra variable:

dat1$condnew <- paste(dat1$cond1,dat1$cond2,sep="")

although I am sure there are more elegant ways, and especially, I am stumped how to fill in the cells of the table.

Thanks,
Marianne

-- 
Marianne Promberger PhD, King's College London
http://promberger.info
R version 2.12.0 (2010-10-15)
Ubuntu 9.04
Re: [R] Help Please!!!!!!!!!
On 29/11/10 05:29, Melissa Waldman wrote:
> the same error keeps popping up:
>
> Error in sum(N) : invalid 'type' (character) of argument
>
> I am sure it has to do with the fact that my rows have names in them.

Hi Melissa,

Welcome to the world of R. You didn't tell us which commands you were running that gave an error, but the error "invalid 'type'" suggests to me that you were trying to sum a variable that R thought was a character, and not a number.

I would recommend you (re)read the introduction to R (http://cran.r-project.org/doc/manuals/R-intro.pdf), especially chapter 2, which deals with this.

As a quick example, if you've read your file into a data frame called foo, with columns none, light etc., then doing

class(foo$none)

will tell you what R thinks this field is. If it is character then you can do

foo$none <- as.numeric(foo$none)

to tell R to treat it as numbers.

Regards,
Paul.
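The class check and conversion described above can be sketched as follows. The data frame foo and its values are made up for illustration. One caveat worth keeping in mind: if the column was read in as a factor rather than as character (older versions of R did this by default when reading text columns), as.numeric() returns the internal level codes, so the factor has to be converted to character first.

```r
# A toy data frame whose numeric column was (wrongly) read as character.
foo <- data.frame(none = c("4", "4", "25", "18", "10"),
                  stringsAsFactors = FALSE)

class(foo$none)                 # tells us how R currently sees the column

# Convert character to numeric; sum() then works as expected.
foo$none <- as.numeric(foo$none)
none_total <- sum(foo$none)
none_total

# The factor pitfall: as.numeric() on a factor gives level codes,
# not the original numbers, so go through as.character() first.
f <- factor(c("4", "4", "25", "18", "10"))
codes  <- as.numeric(f)                 # internal codes, NOT the data
values <- as.numeric(as.character(f))   # the intended numbers
```

Comparing codes with values at the prompt makes the difference obvious.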
[R] Bayes factor for a Welch or Yuen t-test
Although I have located a number of solutions for the Student t-test (equal variances), I have been unable to find any code for calculating a Bayes Factor for a Welch (unequal variances) or Yuen (trimmed mean) t-test. I wonder if anyone could help me with this?

Many thanks,
Andrew Wilson
[R] RODBC read all columns as character
I'm using sqlQuery() to import Excel data (.xlsx). Almost everything works perfectly, but a column which contains both numeric and character values is read as numeric, and the character values are unfortunately converted to NA. How can I read all columns as character? I've already tried 'as.is=TRUE'.

Thanks,
juerg
Re: [R] Array help
Instead of:

7:11&23:34

I think you mean:

c(7:11, 23:34)

Using '&' for concatenation is not an unreasonable idea, but it is decidedly not what R does. It would be instructive to do:

7:11 & 23:34

at the R prompt to see what you get.

On 29/11/2010 03:56, bfhancock wrote:
> Josh, the data set is called StatTemps and is in the PASWR package. I want to make an array that involves only the 8 a.m. readings and a separate array that involves only the 9 a.m. readings, so I can get info on the temperatures in those groups. So I still want it in the format of StatTemps, but in two arrays that are based on 8 a.m. or 9 a.m. I have been messing with this for a while. And no, it's not homework, I am just trying to learn R so I am more appealing out in the field eventually. The book I am using is confusing! Hopefully what I am trying to do isn't confusing. I want to make an array that holds the info from 1:6 & 12:22 for 8am and then 7:11 & 23:34 for 9am. I was easily able to make two arrays based on sex by just putting in StatTemps[1:11,,] and StatTemps[12:34,,], and assumed I could just go StatTemps[1:6&12:22,,] and StatTemps[7:11&23:34,,], but that didn't work. Any ideas? Thanks so much! -B

-- 
Patrick Burns
pbu...@pburns.seanet.com
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
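To make the difference concrete, here is a small sketch using a made-up 34-row data frame in place of StatTemps (the name temps and its columns are hypothetical; the row ranges are the ones from the question).

```r
# Stand-in for StatTemps: 34 rows with an id and a temperature column.
temps <- data.frame(id = 1:34,
                    temperature = seq(60, by = 0.5, length.out = 34))

# c() concatenates the two ranges into one index vector ...
eight_am <- temps[c(1:6, 12:22), ]
nine_am  <- temps[c(7:11, 23:34), ]

# ... whereas '&' is the element-wise logical AND. Applied to two numeric
# ranges it returns a logical vector (with a recycling warning here, since
# the lengths 5 and 12 do not match), which is not a set of row numbers.
and_result <- suppressWarnings(7:11 & 23:34)
class(and_result)

nrow(eight_am)
nrow(nine_am)
```

Together the two subsets cover every row exactly once, which is a quick sanity check that the index vectors were built as intended.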
[R] HELPPPPPP
Please, I have a big problem. I have to do an econometrics / quantitative methods assignment about the Canadian lynx. The problem is that I really don't know how to use R or how to apply all the steps. I began the time plot, ACF and PACF, but I'm not able to decide which model (ARIMA, Holt-Winters, etc.) is correct to forecast the next 20 years of the Canadian lynx cycle. If someone can help me I would really appreciate it.

Thanks.
[R] weighted Spearman correlation coefficient
Hello,

I would be grateful if anybody could help me find an R function to compute a weighted Spearman correlation coefficient.

Kind regards,
Daniel
[R] Plot data inside matrix
Hi,

I have this problem. I have this matrix:

                     result property procProperty
2010-10-01 07:32:00      40        A      sensor1
2010-10-01 17:32:00      15        A      sensor3
2010-10-02 07:32:00      32        A      sensor2
2010-10-03 04:33:21      20        B      sensor1
2010-10-03 04:33:21      33        B      sensor2
2010-10-03 14:33:21      12        A      sensor3
2010-10-05 07:32:00      31        B      sensor1
2010-10-05 07:32:00      15        B      sensor2
2010-10-06 17:32:00       4        A      sensor3

I would like to plot this matrix in this way: create, in this case, 2 plots (one for each property: A and B), and in each plot there will be 3 lines (one for each procProperty: sensor1, sensor2, sensor3) made up of the result values.

How can I do this with few commands?

Thanks
Alberto
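One possible approach, sketched with base graphics only, is to loop over the properties and draw one line per sensor. This is an illustration under assumptions, not an answer from the thread: the data are retyped as a data frame with separate date and time columns, and the names dat and when are made up.

```r
# The posted values, retyped as a data frame (date and time as two columns).
dat <- read.table(text = "
date time result property procProperty
2010-10-01 07:32:00 40 A sensor1
2010-10-01 17:32:00 15 A sensor3
2010-10-02 07:32:00 32 A sensor2
2010-10-03 04:33:21 20 B sensor1
2010-10-03 04:33:21 33 B sensor2
2010-10-03 14:33:21 12 A sensor3
2010-10-05 07:32:00 31 B sensor1
2010-10-05 07:32:00 15 B sensor2
2010-10-06 17:32:00 4 A sensor3
", header = TRUE, stringsAsFactors = FALSE)

# Combine date and time into a proper date-time for the x axis.
dat$when <- as.POSIXct(paste(dat$date, dat$time), tz = "UTC")

op <- par(mfrow = c(1, 2))                 # one panel per property
for (p in unique(dat$property)) {
  panel <- dat[dat$property == p, ]
  # Empty plot first, so all sensors share the same axes.
  plot(panel$when, panel$result, type = "n",
       main = paste("Property", p), xlab = "time", ylab = "result")
  sensors <- unique(panel$procProperty)
  for (i in seq_along(sensors)) {
    s <- panel[panel$procProperty == sensors[i], ]
    lines(s$when, s$result, type = "b", col = i)   # one line per sensor
  }
  legend("topright", legend = sensors, col = seq_along(sensors), lty = 1)
}
par(op)
```

In the posted sample, property B has readings from only two sensors, so its panel shows two lines rather than three.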
[R] in regards of plotting using functions.
Hello,

I am using a basic plotting technique to get a graph, but I want to color the points plotted onto the graph depending upon a few mathematical conditions.

values x should be colored blue.
values y should be colored green.

How can I go forward with the programming part in drawing these plots from a single file?

Please do let me know as soon as possible.

Regards,
-- 
Pravin Nilawe
Bioinformatics,
+91 9869739671
Re: [R] Where is gdata?
Hi Spencer,

Downloaded and installed Strawberry perl from http://strawberryperl.com. Installation went through without problem.

Start R

> library(gdata)
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.

gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.

Attaching package: 'gdata'

The following object(s) are masked from 'package:utils':

    object.size

> ?read.xls
starts "Read Excel files".

Thanks

B.R.
Stephen L

From: Spencer Graves spencer.gra...@structuremonitoring.com
Cc: Liviu Andronic landronim...@gmail.com; r-help r-help@r-project.org
Sent: Mon, November 29, 2010 4:57:53 PM
Subject: Re: [R] Where is gdata?

> You need to install something like Strawberry perl, as I've previously stated.
[R] surpressing tickmarks / labels x-as for two sets of boxplot (plotted as stacked boxplots)
Hello,

I am trying to plot two sets of boxplots together. These are estimates from two experiments and seven factors. I want to plot the results of the two experiments as boxplots next to each other. Therefore I first plot the results of the first experiment, and then add the second set of boxplots with the add option. The boxplots are plotted at at = 1:7 - 0.15 for the first experiment and at = 1:7 + 0.15 for the second.

I suppress plotting the tickmarks and labels successfully for the first boxplot with xaxt="n". But for the second this does not work! I want to plot the tickmarks and labels at position at=1:7, as below, using the axis function. But with this code tickmarks and labels are also plotted at position at=1:7+0.15.

boxplot(coefs ~ factor, data = temp, boxwex = 0.25, at = 1:7 - 0.15, subset = experiment == "first", col = "red", xlab = "factor", xaxt="n", ylab = "individual estimates")
boxplot(coefs ~ factor, data = temp, naxt="n", add = TRUE, boxwex = 0.25, at = 1:7 + 0.15, subset = experiment == "second", col = "green")
axis(at=1:7, side=1, c("fac1","fac2","fac3","fac4","fac5","fac6","fac7"))
legend(6, -0.5, c("experiment1", "experiment2"), fill = c("red", "green"))

Does anyone know how I can suppress these labels for the second boxplot?

Thanks in advance,
Karin
Re: [R] cross tabulate variables by subject id
Hi Marianne,

How about this...

ac.ad <- unstack(dat1, choice ~ cond1:cond2)[, c("A.C", "A.D")]
acad.xtab <- with(ac.ad, table(A.C, A.D))

Michael

On 29 November 2010 20:18, Marianne Promberger marianne.promber...@kcl.ac.uk wrote:
> I would like to compare subjects' choice for (cond1=="A" & cond2=="C") vs (cond1=="A" & cond2=="D"), using mcnemar.test
>
> I am stumped how to fill in the cells of the table.
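An alternative sketch of the same idea, going from long to wide format and then to the 2 x 2 table that mcnemar.test() expects. It reuses the structure of dat1 from the question with two changes that are assumptions made for illustration: a fixed seed, and 20 sampled choices instead of 10 so that a subject's answers under C and D can differ. The names cond_a, wide and tab are made up.

```r
set.seed(1)
dat1 <- data.frame(subject = rep(1:10, 2),
                   cond1   = rep(c("A", "B"), each = 5),
                   cond2   = rep(c("C", "D"), each = 10),
                   choice  = sample(0:1, 20, replace = TRUE))

# Keep only condition A, then put each subject's C and D choices side by side.
cond_a <- subset(dat1, cond1 == "A")
wide <- reshape(cond_a[, c("subject", "cond2", "choice")],
                idvar = "subject", timevar = "cond2", direction = "wide")

# Cross-tabulate the paired answers. Fixing the factor levels guarantees a
# 2 x 2 table even if one answer never occurs in this small sample.
tab <- table(AC = factor(wide$choice.C, levels = 0:1),
             AD = factor(wide$choice.D, levels = 0:1))
tab

# McNemar's test on the paired table. With only five subjects the result is
# not meaningful; this just shows that the table has the required shape.
mcnemar.test(tab)
```

Each cell of tab counts subjects, so the cell totals add up to the number of subjects who saw condition A.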
[R] Odp: HELPPPPPP
Hi

What does your teacher say about the procedures you should use? You should go through the help pages for ?spectrum, ?acf, ?ar and maybe some others.

Regards
Petr

r-help-boun...@r-project.org wrote on 29.11.2010 11:33:58:
> I began the time plot, ACF and PACF, but I'm not able to decide which model (ARIMA, Holt-Winters, etc.) is correct to forecast the next 20 years of the Canadian lynx cycle.
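As a starting point for exploring those help pages, the functions can be tried on the lynx series that ships with R. This is a sketch only, not a recommendation of a final model: the log10 transform and the use of ar() with AIC-based order selection are assumptions made for illustration, and the object names are made up.

```r
# 'lynx' is a built-in annual time series (datasets package).
y <- log10(lynx)     # assumption: work on the log scale

# Autocorrelation and partial autocorrelation plots.
acf(y)
pacf(y)

# Fit an autoregressive model; ar() picks the order by AIC by default.
fit <- ar(y)
fit$order

# Forecast 20 years ahead, then back-transform to the original scale.
fc <- predict(fit, newdata = y, n.ahead = 20)
forecast <- 10^fc$pred
forecast
```

Comparing the chosen order with what the PACF plot suggests, and then trying ?arima or ?HoltWinters on the same series, would be natural next steps.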
Re: [R] Performance tuning tips when working with wide datasets
Richard Vlasimsky wrote:
> Does anyone have any performance tuning tips when working with datasets that are extremely wide (e.g. 20,000 columns)?
>
> In particular, I am trying to perform a merge like below:
>
> merged_data <- merge(data1, data2, by.x="date", by.y="date", all=TRUE, sort=TRUE);
>
> This statement takes about 8 hours to execute on a pretty fast machine. The dataset data1 contains daily data going back to 1950 (20,000 rows) and has 25 columns. The dataset data2 contains annual data (only 60 observations), however there are lots of columns (20,000 of them).
>
> I have to do a lot of these kinds of merges so need to figure out a way to speed it up.
>
> I have tried a number of different things to speed things up to no avail. I've noticed that rbinds execute much faster using matrices than dataframes. However the performance improvement when using matrices (vs. data frames) on merges was negligible (8 hours down to 7). I tried casting my merge field (date) into various different data types (character, factor, date). This didn't seem to have any effect. I tried the hash package, however, merge couldn't coerce the class into a data.frame. I've tried various ways to parallelize computation in the past, and found that to be problematic for a variety of reasons (runaway forked processes, doesn't run in a GUI environment, doesn't run on Macs, etc.).
>
> I'm starting to run out of ideas, anyone? Merging a 60 row dataset shouldn't take that long.
>
> Thanks,
> Richard

Hi Richard,

I had similar problems (even with much less data) and found out that most of the running time was caused by memory swapping rather than CPU usage. If you do not need all of the merged data at once, block-wise processing can help: you generate only as much merged data at a time as fits into main memory.

I ended up using package RSQLite (an embedded database) in the following way:

- create a database connection (explained in the package docs)
- copy data to database tables via dbWriteTable()
- create indices on the columns which are used for merging, something like:
  dbGetQuery(con, 'create index index_year on table2(year)')
  This speeds up joining significantly.
- construct an SQL query to do the join / merge operation and send it to SQLite via dbSendQuery()
- retrieve the result in blocks of reasonable size with fetch()

Unless there is an operation in the query which requires SQLite to process the whole result (e.g. sorting), the result rows will be created on the fly for every call of fetch() instead of a huge table being allocated in addition to the original data.

I am not sure if this works with other database engines (there are a couple of database interfaces on CRAN); when I tried to use RPostgreSQL, it created the whole result set at once, leading to the same memory problem. Maybe that behavior can be changed by some config variable.

Best regards,
Andreas

-- 
Andreas Borg
Medizinische Informatik
UNIVERSITÄTSMEDIZIN der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de
Telefon +49 (0) 6131 175062
E-Mail: b...@imbei.uni-mainz.de
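The steps above can be sketched on a toy example. This assumes the DBI and RSQLite packages are installed and uses current DBI function names (dbExecute() for the index and dbFetch() in place of fetch()), which may differ from what was available when the message was written. The table names table1 and table2 and the index name follow the message; the column names and data are hypothetical.

```r
library(DBI)

# Toy stand-ins: a "daily" table and an "annual" table joined on year.
daily <- data.frame(
  date = as.character(seq(as.Date("2009-12-30"), by = "day", length.out = 6)),
  x    = 1:6,
  stringsAsFactors = FALSE
)
daily$year <- as.integer(substr(daily$date, 1, 4))
annual <- data.frame(year = 2009:2010, y = c(10.5, 20.5))

# 1. Create a connection (in-memory database here; a file name also works).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# 2. Copy the data frames into database tables.
dbWriteTable(con, "table1", daily)
dbWriteTable(con, "table2", annual)

# 3. Index the join column of the table being looked up.
dbExecute(con, "CREATE INDEX index_year ON table2(year)")

# 4. Send the join query without fetching anything yet.
res <- dbSendQuery(con, "
  SELECT t1.date, t1.x, t2.y
  FROM table1 t1 LEFT JOIN table2 t2 ON t1.year = t2.year")

# 5. Retrieve and process the result block by block.
n_rows <- 0
while (!dbHasCompleted(res)) {
  block <- dbFetch(res, n = 4)      # at most 4 rows per block
  n_rows <- n_rows + nrow(block)    # replace with real per-block processing
}

dbClearResult(res)
dbDisconnect(con)
n_rows
```

In real use the block size would be chosen so that one block of merged rows fits comfortably in memory, and the per-block step would write results out or aggregate them rather than just counting rows.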
[R] Odp: in regards of plotting using functions.
Hi,

r-help-boun...@r-project.org wrote on 29.11.2010 11:48:07:

> Hello, I am using a basic plotting technique to get a graph, but I want to color the points plotted on the graph depending on a few mathematical conditions. values x should be colored blue. values y should be colored green.

Untested:

    plot(a, b, pch=19, col=c("green", "blue")[ifelse(valuesx, 2, ifelse(valuesy, 1, NA))])

Regards
Petr

> How can I go forward with the programming part of drawing these plots from a single file? Please do let me know as soon as possible.
>
> Regards,
> Pravin Nilawe
> Bioinformatics
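A runnable version of Petr's idea might look like the sketch below. The comparison operators in the original post did not survive archiving, so the two thresholds (upper, lower) and the direction of each test are assumptions made purely for illustration, as is the toy data.

```r
# Hypothetical data; in practice a and b would come from the user's file.
a <- 1:10
b <- c(3, 8, 1, 9, 5, 2, 7, 4, 10, 6)

upper <- 7  # assumed rule: values above this are drawn in blue
lower <- 3  # assumed rule: values below this are drawn in green

# Build an index into the colour vector: 2 = blue, 1 = green, NA = neither.
idx <- ifelse(b > upper, 2, ifelse(b < lower, 1, NA))
cols <- c("green", "blue")[idx]

# Points whose colour is NA are not drawn at all.
plot(a, b, pch = 19, col = cols)
```

If the remaining points should stay visible, replacing NA with a third index (and adding a third colour such as "grey") is one simple option.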
Re: [R] surpressing tickmarks / labels x-as for two sets of boxplot (plotted as stacked boxplots)
On 2010-11-29 03:17, Karin wrote:

> Hello,
>
> I am trying to plot two sets of boxplots together. These are estimates from two experiments and seven factors. I want to plot the results of the two experiments as boxplots next to each other. Therefore I first plot the results of the first experiment, and then, with the add option, the second set of boxplots. The boxplots are plotted at at = 1:7 - 0.15 for the first experiment and at = 1:7 + 0.15 for the second.
>
> I suppress plotting of the tick marks and labels successfully for the first boxplot with xaxt="n". But for the second this does not work! I want to plot the tick marks and labels at position at=1:7, as below, using the axis function. But with this code tick marks and labels are also plotted at position at=1:7+0.15.
>
>     boxplot(coefs ~ factor, data = temp, boxwex = 0.25, at = 1:7 - 0.15, subset = experiment == "first", col = "red", xlab = "factor", xaxt = "n", ylab = "individual estimates")
>     boxplot(coefs ~ factor, data = temp, naxt = "n", add = TRUE, boxwex = 0.25, at = 1:7 + 0.15, subset = experiment == "second", col = "green")
>     axis(at = 1:7, side = 1, c("fac1", "fac2", "fac3", "fac4", "fac5", "fac6", "fac7"))
>     legend(6, -0.5, c("experiment1", "experiment2"), fill = c("red", "green"))
>
> Does anyone know how I can suppress these labels for the second boxplot?

Perhaps all you need is a bit more care in typing: naxt???

Peter Ehlers

> Thanks in advance,
> Karin
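Peter's hint is that the second call spells the argument naxt instead of xaxt. A self-contained sketch with the argument spelled correctly in both calls is shown below; the data frame temp is simulated here and only mimics the column names used in the question.

```r
set.seed(1)
# Simulated stand-in for Karin's data: 7 factors, 2 experiments, 10 values each.
temp <- data.frame(
  coefs = rnorm(140),
  factor = factor(rep(paste0("fac", 1:7), times = 20)),
  experiment = rep(c("first", "second"), each = 70)
)

# First set of boxes, shifted left; the x axis is suppressed with xaxt = "n".
b1 <- boxplot(coefs ~ factor, data = temp, subset = experiment == "first",
              boxwex = 0.25, at = 1:7 - 0.15, col = "red", xaxt = "n",
              xlim = c(0.5, 7.5), ylim = range(temp$coefs),
              xlab = "factor", ylab = "individual estimates")

# Second set, shifted right and added to the same plot; xaxt = "n" again.
b2 <- boxplot(coefs ~ factor, data = temp, subset = experiment == "second",
              boxwex = 0.25, at = 1:7 + 0.15, col = "green", xaxt = "n",
              add = TRUE)

# One shared axis at the integer positions.
axis(side = 1, at = 1:7, labels = levels(temp$factor))
legend("topright", c("experiment1", "experiment2"), fill = c("red", "green"))
```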
[R] Moran I for very large data set
Hi,

Are there any more efficient ways of calculating the neighbourhood object for large datasets?

I am trying to compute Moran's I statistics for a very large data set (over 14,000 points). I have been using moran.test from the spdep package, and everything works fine for a small data set (200 points). However, applying the same script to the whole dataset is taking days to compute (so far it has been running for 5 days with still no results). This is no surprise given the number of computations required. I have found that calculating planar distances is much quicker, but Great Circle distances are required.

Thanks
Gary Watmough
Re: [R] in regards of plotting using functions.
Thanks for your help and guidance. I am taking these values from a file as co-ordinates. I tried using the code for this, but it gave an error saying "X object error":

    plot((x=[,1],y=[,2]),

So how should I embed the expression within the plot?

Regards,
Pravin

On Mon, Nov 29, 2010 at 5:20 PM, Petr PIKAL petr.pi...@precheza.cz wrote:

> Untested:
>
>     plot(a, b, pch=19, col=c("green", "blue")[ifelse(valuesx, 2, ifelse(valuesy, 1, NA))])
>
> Regards
> Petr

--
Pravin Nilawe
Bioinformatics
[R] Troubles in plotting to a postscript file (not to png)
Dear R users,

I am trying to produce some plots in a postscript file, but I am experiencing some issues. I open the device with

    setPS()
    postscript(file='gs_mcmc_dust.ps', width=5*3, height=5*3, horizontal = FALSE, paper = "special", family = "ComputerModern", encoding = "TeXtext.enc") #, onefile = FALSE

(it's a 9x9 multiplot) and I close it with

    dev.off()

Here are my problems:

- xlab=expression(paste(lambda, "(", ~mu, "m", ")")) : it correctly prints the Greek letter mu, but it fails for lambda, leaving a blank space.
- text(min(mchain[2,]), max(tdensity$y), substitute( T[disk,med] == tmed %+-% tstd (K), list(tmed=tmed, tstd=tstd)), pos=4, cex=1.5 ) : it prints everything, but the +- symbol and (K) overlap with the substituted values of tmed and tstd respectively. How can I force a blank space between numbers and symbols? Also, how can I set the number of decimal digits for tmed and tstd? ( option(digits=4) does not work )
- Once I close the R session without saving it (I answer n when quitting), the content of the ps file is erased. Do you know why?

I can work around these problems by plotting to a PNG device, but a postscript file is what I need. Can you help me, please?

Thank you very much in advance.

Cheers
Gaetano
Re: [R] Where is gdata?
On Mon, Nov 29, 2010 at 3:44 AM, Stephen Liu sati...@yahoo.com wrote:

> Hi Liviu,
>
>> Not if you library(gdata) first. Then ?read.xls should work.
>
> Yes, I did. I found something strange here which I can't explain.
>
> Win 7 64bit, R 32/64 bit. Just rebooted Win 7 and R.
>
>     library(gdata)
>     gdata: Unable to locate valid perl interpreter
>     gdata:
>     gdata: read.xls() will be unable to read Excel XLS and XLSX files
>     gdata: unless the 'perl=' argument is used to specify the location of a
>     gdata: valid perl intrpreter.
>     gdata:
>     gdata: (To avoid display of this message in the future, please ensure
>     gdata: perl is installed and available on the executable search path.)
>     gdata: Unable to load perl libaries needed by read.xls()
>     gdata: to support 'XLX' (Excel 97-2004) files.
>     gdata: Unable to load perl libaries needed by read.xls()
>     gdata: to support 'XLSX' (Excel 2007+) files.
>     gdata: Run the function 'installXLSXsupport()'
>     gdata: to automatically download and install the perl
>     gdata: libaries needed to support Excel XLS and XLSX formats.
>
>     Attaching package: 'gdata'
>     The following object(s) are masked from 'package:utils':
>         object.size

This is just a message that it can't find perl. If you don't need to use read.xls, then you don't need perl, so you can ignore the message. If you do need to use read.xls, then install perl, and once you have done that, run installXLSXsupport().

> It complains.
>
>     ?read.xls
>     starting httpd help server ... done
>     Read Excel files
>
> Both 32 and 64 bit R worked. If there is NO complaint on running library(gdata), then ?read.xls can't work.

Can you clarify when ?read.xls works for you and when it does not?

> Perl seems to have been installed. But I can't recall when and how:
>
>     C:\> dir C:\Users\satimiswin764\Documents\R\win-library\2.12\gdata\
>     .
>     11/22/2010 10:44 AM DIR perl
>     11/22/2010 10:44 AM DIR R
>     11/22/2010 10:44 AM DIR unitTests
>     11/22/2010 10:44 AM DIR xls

The gdata\perl folder contains perl libraries that come with gdata. Perl itself is not distributed with gdata, and you don't need perl at all to use gdata, except for read.xls and related functions.

My understanding is that this question has nothing to do with perl or with read.xls, and that the problem is that you can run this:

    library(gdata)
    ?read.xls

and sometimes it works and at other times it does not. Is that right? Does it occur with any other package? How about removing gdata and reinstalling it?

    remove.packages("gdata")
    ... exit R and check if gdata has been removed ...
    ... restart R ...
    install.packages("gdata")

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
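To check from within R whether a perl executable is on the search path, which is the condition the gdata startup message refers to, a small base-R sketch like the following may help. It does not call gdata at all; the 'perl=' argument it mentions is the one named in the startup message above.

```r
# Sys.which() returns the full path of an executable, or "" if it is not found.
perl_path <- Sys.which("perl")

if (nzchar(perl_path)) {
  message("perl found at: ", perl_path)
} else {
  message("perl not found on the search path; ",
          "read.xls() would need its 'perl=' argument")
}
```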
Re: [R] Plot data inside matrix
On Nov 29, 2010, at 6:22 AM, alcesgabbo wrote:

> Hi, I have this problem: I have this matrix:

Doubtful that it is a matrix. In R, matrices hold objects that are all of the same type. This looks more like a zoo object, since it has a time index. How was it created, and what does str() show?

>                     result property procProperty
> 2010-10-01 07:32:00     40        A      sensor1
> 2010-10-01 17:32:00     15        A      sensor3
> 2010-10-02 07:32:00     32        A      sensor2
> 2010-10-03 04:33:21     20        B      sensor1
> 2010-10-03 04:33:21     33        B      sensor2
> 2010-10-03 14:33:21     12        A      sensor3
> 2010-10-05 07:32:00     31        B      sensor1
> 2010-10-05 07:32:00     15        B      sensor2
> 2010-10-06 17:32:00      4        A      sensor3
>
> I would like to plot this matrix in this way: create in this case 2 plots (one for each property: A and B); for each plot there will be 3 lines (one for each procProperty: sensor1, sensor2, sensor3) composed of the result values.

So, what do you want: a dotplot, a barchart, a time series, or what?

> How can I do this with few commands?

Possibly, depending on the correctness of my assumptions: xyplot.zoo

--
David Winsemius, MD
West Hartford, CT
Re: [R] Array help
If you can load the PASWR package and pull up StatTemps, you will see what I am talking about. Otherwise I fear that my question will just be confusing.
Re: [R] Help Please!!!!!!!!!
Your data seems to read in just fine, so what is the problem you are trying to solve?

    > x <- read.table('clipboard', sep='\t', header=TRUE)
    > str(x)
    'data.frame':   5 obs. of  5 variables:
     $ X     : Factor w/ 5 levels "JE","JM","S",..: 5 2 4 1 3
     $ None  : int  4 4 25 18 10
     $ Light : int  2 3 10 24 6
     $ Medium: int  3 7 12 33 7
     $ Heavy : int  2 4 4 13 2
    > summary(x)
      X         None          Light        Medium         Heavy
     JE:1   Min.   : 4.0   Min.   : 2   Min.   : 3.0   Min.   : 2
     JM:1   1st Qu.: 4.0   1st Qu.: 3   1st Qu.: 7.0   1st Qu.: 2
     S :1   Median :10.0   Median : 6   Median : 7.0   Median : 4
     SE:1   Mean   :12.2   Mean   : 9   Mean   :12.4   Mean   : 5
     SM:1   3rd Qu.:18.0   3rd Qu.:10   3rd Qu.:12.0   3rd Qu.: 4
            Max.   :25.0   Max.   :24   Max.   :33.0   Max.   :13

On Mon, Nov 29, 2010 at 12:29 AM, Melissa Waldman melissawald...@gmail.com wrote:

> Hi,
>
> I have been working with R for my stats class and I keep running into the same error. I have read many sites about reading data from a text file into R, and I'm using the data to do a correspondence analysis. I feel like I have read everything, and nothing explains why the error message keeps coming up. I have used the exact examples I have seen in articles, and the same error keeps popping up:
>
>     Error in sum(N) : invalid 'type' (character) of argument
>
> I have spent a long time trying to figure this out without success. I am sure it has to do with the fact that my rows have names in them. I have attached the text file I have been using. If you have any ideas as to how I can get R to plot the data using correspondence analysis with the column and row names, that would be really helpful! Or if you could pass this email to someone who may know how to help me, that would be much appreciated.
>
> Thank you,
> Melissa Waldman
> my email: melissawald...@gmail.com

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
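One plausible cause of "invalid 'type' (character) of argument", consistent with Melissa's own suspicion about the row names, is that the label column ends up inside the matrix passed to the analysis, which turns the whole matrix into character. The sketch below rebuilds the table from the str() output above and reads the first column as row names instead; the inline text is a stand-in for her file, and no correspondence-analysis function is called, so this only shows that sum() then works.

```r
# Tab-separated text reconstructed from the str() output in the reply.
txt <- paste(
  "X\tNone\tLight\tMedium\tHeavy",
  "SM\t4\t2\t3\t2",
  "JM\t4\t3\t7\t4",
  "SE\t25\t10\t12\t4",
  "JE\t18\t24\t33\t13",
  "S\t10\t6\t7\t2",
  sep = "\n"
)

# row.names = 1 moves the label column out of the data and into the row names.
x <- read.table(text = txt, sep = "\t", header = TRUE, row.names = 1)
N <- as.matrix(x)

is.numeric(N)  # the matrix is now purely numeric
sum(N)         # so sum() no longer fails
```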
Re: [R] in regards of plotting using functions.
Hi,

PRAVIN 2pravinnil...@gmail.com wrote on 29.11.2010 13:18:45:

> Thanks for your help and guidance. I am taking these values from a file as co-ordinates. I tried using the code for this, but it gave an error saying "X object error".
>
>     plot((x=[,1],y=[,2]),

What is [,1]? You need to have some data object from which to extract the first and second columns. How about looking at chapter 2 of the R-intro manual?

Regards
Petr

> So how should I embed the expression within the plot?
>
> Regards,
> Pravin
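Petr's point is that [,1] needs an object in front of it. A minimal sketch: read the file into a data frame first, then index that data frame. The file here is a temporary one written on the spot, and its name and contents are invented for illustration.

```r
# Write a small two-column file so the example is self-contained.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 5", "2 3", "3 8"), tmp)

# Read it into a data frame; columns are then addressed as dat[, 1], dat[, 2].
dat <- read.table(tmp)

plot(x = dat[, 1], y = dat[, 2], pch = 19)
```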
[R] attached file
I forgot to attach it...

Lorenzo Melchor, PhD
Mammary Stem Cell Team
The Breakthrough Breast Cancer Research Centre (ICR)
237 Fulham Road SW3 6JB
lorenzo.melc...@icr.ac.uk
[R] Problems in running affylmGUI
Hi,

I am trying to run affylmGUI on my Mac. I have already installed the Tcl package as well as BWidget through the ActiveTcl installer. However, when running affylmGUI() in R, I keep getting the message in the attached file. I have copied the tcl folders from the root library to the user library, and get the same issue.

I would really appreciate it if you could help me set this up. Could you please send me an easy guide to getting affylmGUI working? Otherwise, we could use TeamViewer if that would be easier.

Kind regards,
Lorenzo

Lorenzo Melchor, PhD
Mammary Stem Cell Team
The Breakthrough Breast Cancer Research Centre (ICR)
237 Fulham Road SW3 6JB
lorenzo.melc...@icr.ac.uk
[R] List of influential points?
Hello all,

I fit a linear model to some data and used plot() to create diagnostic plots for the fit; I am having trouble reading the points that R is flagging as influential. Is there a way to get the list of influential points from the fit or its summary, etc.? Most likely, there are a few points appearing in almost the same place, making it difficult to read from the plots.

Bill
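One way to get at such points without reading them off the plots is to ask the stats package for its influence diagnostics directly. The sketch below uses the built-in cars data as a stand-in for Bill's model; which observations get flagged depends on the diagnostic and its cutoff, so this is a starting point rather than the exact set that plot() labels.

```r
# Stand-in model on a built-in data set.
fit <- lm(dist ~ speed, data = cars)

# influence.measures() computes DFBETAS, DFFITS, covariance ratios,
# Cook's distances and hat values, and marks which cases exceed the usual cutoffs.
infl <- influence.measures(fit)
flagged <- which(apply(infl$is.inf, 1, any))
flagged

# Cook's distance on its own, largest first.
cd <- cooks.distance(fit)
top3 <- head(sort(cd, decreasing = TRUE), 3)
top3
```

summary(infl) prints only the flagged cases, which is often the most readable form.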
[R] selecting only corresponding categories from a confusion matrix
Dear R colleagues,

as a result of my calculations regarding the inter-observer variability in bronchoscopy, I get a confusion matrix like the following:

            0  1 1001 1010 11
    0     609 11   54   36  6
    1       1  2    6    0  2
    10     14  0    0    8  4
    100     4  0    0    0  0
    1000   23  7   12   10  5
    1001    0  0    4    0  0
    1010    4  0    0    3  0
    1011    1  0    1    0  2
    11      0  0    3    3  1
    110     1  0    0    0  0
    1100    2  0    0    0  0
    1110    1  0    0    0  0

The first column represents the categories found among observers; the top row represents the categories found by the reference (gold standard).

I am looking for a way (a general algorithm) to extract a data.frame with only the corresponding categories among observers and reference from the above confusion matrix. "Corresponding" means in this case that a category has been chosen by both observers and reference. In this example the corresponding categories would simply be all categories that have been chosen by the reference (0, 1, 1001, 1010, 11), but in general there might also be categories which are found by the reference only (and not among observers, i.e. not in the first column).

So the solution data frame for the above example would look like:

            0  1 1001 1010 11
    0     609 11   54   36  6
    1       1  2    6    0  2
    1001    0  0    4    0  0
    1010    4  0    0    3  0
    11      0  0    3    3  1

All the categories found among observers only were omitted.

If the solution algorithm included a method to list the omitted categories and to count their number, as well as the number of omitted cases, it would be just perfect for me.

I'd be happy to hear from you soon! Thanks in advance for any kind of help with this.

Greetings from snowy Munich,
Felix
Re: [R] aftreg vs survreg loglogistic aft model (different intercept term)
Survreg maximizes the log-likelihood to a relative tolerance of 1e-9 (?survreg.control). The printout shows -379503.5; to see the rest of the digits you need something like:

    fit <- survreg(...)
    print(fit$loglik, digits=9)

Aftreg printed even fewer digits; you would have to do the same with it to see which routine got closer to maximizing the actual log-likelihood. That is, if survreg showed -37903.5392 and aftreg -37903.6123, then survreg wins. Likely all this means is that the default iteration tolerance is smaller for one routine than for the other. When you consider that significant changes in a log-likelihood are on the order of 3.84/2 = 2 units, I do not get very excited by a .08 difference in convergence.

Terry Therneau

-- begin included message --

I add an example; all the variables are mutually exclusive dummy variables. Notice the different intercept: 5.627 vs 5.545.

survreg:

                 Value Std. Error     z        p
    (Intercept)  5.627    0.00887 634.3 0.00e+00
    Var1.recR2  -0.108    0.01026 -10.5 1.00e-25
    Var1.recR3  -0.490    0.01099 -44.5 0.00e+00
    Var1.recR4  -0.542    0.01303 -41.6 0.00e+00
    Var1.recR5  -0.891    0.01095 -81.3 0.00e+00
    Log(scale)  -0.324    0.00350 -92.7 0.00e+00

    Scale= 0.723
    Log logistic distribution
    Loglik(model)= -379503.5   Loglik(intercept only)= -383388.9
    Chisq= 7770.76 on 4 degrees of freedom, p= 0

aftreg:

    Covariate    W.mean   Coef Exp(Coef) se(Coef) Wald p
    Var1.recR
            1     0.253      0         1 (reference)
            2     0.330  0.108     1.114    0.010  0.000
            3     0.191  0.490     1.632    0.011  0.000
            4     0.106  0.542     1.720    0.013  0.000
            5     0.120  0.891     2.437    0.011  0.000
    log(scale)           5.545   256.029    0.008  0.000
    log(shape)           0.324     1.383    0.003  0.000

    Max. log. likelihood  -379504

-- end inclusion --
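Terry's suggestion can be tried on any survreg fit. The sketch below uses the lung data shipped with the survival package as a stand-in, since the original model call was not posted; the covariate choice is arbitrary.

```r
library(survival)

# Stand-in log-logistic AFT model on a built-in data set.
fit <- survreg(Surv(time, status) ~ ph.ecog, data = lung, dist = "loglogistic")

# fit$loglik holds two values: intercept-only model, then the full model.
# Printing with more digits shows what the default summary rounds away.
print(fit$loglik, digits = 9)

# The convergence tolerance mentioned in the reply.
survreg.control()$rel.tolerance
```

Comparing two routines this way only makes sense if both log-likelihoods are computed on the same scale, which is worth confirming in each package's documentation.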
Re: [R] Custom ticks on x axis when dates are involved - many thanks!
Hi,

I am sorry to be sending this again, but my email was hijacked over the weekend by a spam generator and I was not able to send any email out. Now things are back to normal.

Even though I wrote to those who answered my question, I would like to let the list know that I actually got 2 solutions, both correct, but the following one is amazingly clear, short and to the point. I still struggle with the concept of dates in R.

Thanks a lot,
Monica

Date: Thu, 25 Nov 2010 18:58:08 +1100
From: j...@bitwrit.com.au
To: pisican...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] Custom ticks on x axis when dates are involved

On 11/25/2010 06:27 AM, Monica Pisica wrote:

> ... Now the graph looks very close to what I want, but I know that my ticks are actually not exactly at 01/01/ as I would like, although I suppose my error is not that large in this instance. However, I would really appreciate it if I could get the ticks on my x axis where I want them in a much more elegant way, if possible (and if not, at least in the correct way).

Hi Monica,

How about this?

    mpdates <- as.POSIXct(paste("1/1", 1984:2009, sep="/"), format="%d/%m/%Y")
    axis(1, at=mpdates, 1984:2009, las=2)

Jim
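For context, Jim's two lines can be embedded in a complete plot as sketched below. The series being plotted is simulated, and the explicit tz = "UTC" and as.numeric() are small additions made here so the example behaves the same everywhere; they are not part of Jim's answer.

```r
set.seed(42)

# One tick per 1 January, built exactly as in the reply.
mpdates <- as.POSIXct(paste("1/1", 1984:2009, sep = "/"),
                      format = "%d/%m/%Y", tz = "UTC")

# Simulated series spanning the same period.
x <- seq(mpdates[1], mpdates[length(mpdates)], length.out = 100)
y <- cumsum(rnorm(100))

# Suppress the default x axis, then draw one with ticks at the chosen dates.
plot(x, y, type = "l", xaxt = "n", xlab = "")
axis(1, at = as.numeric(mpdates), labels = 1984:2009, las = 2)
```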
Re: [R] Problems in running affylmGUI
Hi Lorenzo,

Your question pertains to a Bioconductor package, so you are better off posing the question on the BioC-help list (CC'ed).

Best,
Jim

On 11/29/2010 7:47 AM, Lorenzo Melchor wrote:

> Hi, I am trying to run affylmGUI on my mac computer. I have already installed the Tlc package as well as Bwidgets through ActiveTcl conversion installing files. However, when running affylmGUI() on R, I keep getting the message in the attached file. I have copied the tcl folders from the root library to the user library, and have obtained the same issue. I would really appreciate if you could help me setting this up. Could you please send me an easy guideline to get affylmGUI working? Otherwise, we could use TeamViewer if that would be easier.
>
> Kind regards,
> Lorenzo

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
Re: [R] weighted Spearman correlation coefficient
2010/11/29 Daniel Rabczenko dan...@medstat.waw.pl

> I would be grateful if anybody could help me find an R function to compute a weighted Spearman correlation coefficient.

There is someone, he lives here: http://finzi.psych.upenn.edu/search.html

But you can write R code for this coefficient using:

    ?rank
    ?var
    ?cov.wt
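Following the hint to combine rank() and cov.wt(), one possible construction is a weighted Pearson correlation computed on ordinary ranks, sketched below. The function name weighted_spearman is made up for this example, and this is only one of several definitions in use (others also weight the ranks themselves), so it should be checked against whatever definition the analysis calls for.

```r
# Weighted Pearson correlation of the (unweighted) ranks of x and y.
weighted_spearman <- function(x, y, w) {
  stopifnot(length(x) == length(y), length(x) == length(w), all(w >= 0))
  ranks <- cbind(rank(x), rank(y))
  # cov.wt() expects weights that sum to one.
  cov.wt(ranks, wt = w / sum(w), cor = TRUE)$cor[1, 2]
}

set.seed(7)
x <- rnorm(30)
y <- x + rnorm(30)

# With equal weights this reduces to the usual Spearman coefficient.
rho_equal <- weighted_spearman(x, y, rep(1, 30))
rho_plain <- cor(x, y, method = "spearman")

# With unequal weights the later observations count more.
rho_weighted <- weighted_spearman(x, y, seq_len(30))
```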
Re: [R] selecting only corresponding categories from a confusion matrix
On Nov 29, 2010, at 8:32 AM, drflxms wrote:

> Dear R colleagues, as a result of my calculations regarding the inter-observer variability in bronchoscopy, I get a confusion matrix like the following:
>
>             0  1 1001 1010 11
>     0     609 11   54   36  6
>     1       1  2    6    0  2
>     10     14  0    0    8  4
>     100     4  0    0    0  0
>     1000   23  7   12   10  5
>     1001    0  0    4    0  0
>     1010    4  0    0    3  0
>     1011    1  0    1    0  2
>     11      0  0    3    3  1
>     110     1  0    0    0  0
>     1100    2  0    0    0  0
>     1110    1  0    0    0  0
>
> The first column represents the categories found among observers; the top row represents the categories found by the reference (gold standard). I am looking for a way (a general algorithm) to extract a data.frame with only the corresponding categories among observers and reference from the above confusion matrix. [...] So the solution data frame for the above example would look like:
>
>             0  1 1001 1010 11
>     0     609 11   54   36  6
>     1       1  2    6    0  2
>     1001    0  0    4    0  0
>     1010    4  0    0    3  0
>     11      0  0    3    3  1

I wasn't able to follow the confusing, er, confusion matrix explanation, but it appears from a comparison of the input and output that you just want the rows whose indices are the column names:

    > mtx[colnames(mtx), ]
            0  1 1001 1010 11
    0     609 11   54   36  6
    1       1  2    6    0  2
    1001    0  0    4    0  0
    1010    4  0    0    3  0
    11      0  0    3    3  1

    # and the omitted
    > mtx[!rownames(mtx) %in% colnames(mtx), ]
            0  1 1001 1010 11
    10     14  0    0    8  4
    100     4  0    0    0  0
    1000   23  7   12   10  5
    1011    1  0    1    0  2
    110     1  0    0    0  0
    1100    2  0    0    0  0
    1110    1  0    0    0  0

    # and their number:
    > NROW(mtx[!rownames(mtx) %in% colnames(mtx), ])
    [1] 7

> All the categories found among observers only were omitted. If the solution algorithm included a method to list the omitted categories and to count their number, as well as the number of omitted cases, it would be just perfect for me.

David Winsemius, MD
West Hartford, CT
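A slightly more defensive variant of the indexing above uses intersect(), so that a category chosen only by the reference (a column name with no matching row) does not cause a subscript error. The matrix values below are a reconstruction of the posted table and should be treated as illustrative.

```r
obs <- c("0", "1", "10", "100", "1000", "1001", "1010", "1011",
         "11", "110", "1100", "1110")
ref <- c("0", "1", "1001", "1010", "11")

# Reconstructed confusion matrix: rows = observers, columns = reference.
mtx <- matrix(c(
  609, 11, 54, 36, 6,
    1,  2,  6,  0, 2,
   14,  0,  0,  8, 4,
    4,  0,  0,  0, 0,
   23,  7, 12, 10, 5,
    0,  0,  4,  0, 0,
    4,  0,  0,  3, 0,
    1,  0,  1,  0, 2,
    0,  0,  3,  3, 1,
    1,  0,  0,  0, 0,
    2,  0,  0,  0, 0,
    1,  0,  0,  0, 0
), nrow = 12, byrow = TRUE, dimnames = list(obs, ref))

# Categories used by both observers and reference.
common <- intersect(rownames(mtx), colnames(mtx))
kept <- mtx[common, , drop = FALSE]

# Categories used by observers only, their count, and the cases they cover.
omitted <- mtx[setdiff(rownames(mtx), common), , drop = FALSE]
rownames(omitted)
nrow(omitted)
sum(omitted)
```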
Re: [R] Plot data inside matrix
Yes, this is a zoo object. First of all I have:

    procProperty: sensor3 sensor3 sensor3 sensor3 sensor3 sensor3 sensor3 sensor3
    property:     A B B A B B A A A
    data:         40 20 31 32 15 33 15 12 4

I create a matrix from these objects:

    data <- cbind(data, property)
    data <- cbind(data, procProperty)

Now data is:

         data property procProperty
    [1,] 40   A        sensor3
    [2,] 20   B        sensor3
    [3,] 31   B        sensor3
    [4,] 32   A        sensor3
    [5,] 15   B        sensor3
    [6,] 33   B        sensor3
    [7,] 15   A        sensor3
    [8,] 12   A        sensor3
    [9,] 4    A        sensor3

index contains the dates:

    2010-10-1 7:32:00
    2010-10-3 4:33:21
    2010-10-5 7:32:00
    2010-10-2 7:32:00
    2010-10-5 7:32:00
    2010-10-3 4:33:21
    2010-10-1 17:32:00
    2010-10-3 14:33:21
    2010-10-6 17:32:00

I modified it with this function:

    index <- as.POSIXlt(index)

Then I do:

    sensor <- zoo(data, index)

sensor is:

                        data property procProperty
    2010-10-01 07:32:00 40   A        sensor3
    2010-10-01 17:32:00 15   A        sensor3
    2010-10-02 07:32:00 32   A        sensor3
    2010-10-03 04:33:21 20   B        sensor3
    2010-10-03 04:33:21 33   B        sensor3
    2010-10-03 14:33:21 12   A        sensor3
    2010-10-05 07:32:00 31   B        sensor3
    2010-10-05 07:32:00 15   B        sensor3
    2010-10-06 17:32:00 4    A        sensor3

str(sensor) is:

    > str(sensor)
    ‘zoo’ series from 2010-10-01 07:32:00 to 2010-10-06 17:32:00
      Data: chr [1:9, 1:3] "40" "15" "32" "20" "33" "12" "31" "15" "4" "A" "A" "A" "B" "B" "A" ...
      - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:3] "data" "property" "procProperty"
      Index: POSIXlt[1:9], format: "2010-10-01 07:32:00" "2010-10-01 17:32:00" "2010-10-02 07:32:00" "2010-10-03 04:33:21" ...

The type of plot doesn't matter; the problem is how I can arrange all the data in order to draw the plot. How can I tell R that I want one plot for each property (A and B) and one line for each procProperty? Maybe I should use the function tapply (in order to have an object like this)?

    table for A:
           sensor1 sensor2 sensor3
    2010:       40      32      20
    2011:       30      30      15

    table for B:
           sensor1 sensor2 sensor3
    2010:       14       3      12
    2011:       10      30      15
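One base-R way to get "one plot per property, one line per procProperty" is to keep the values in a data frame, so that numbers stay numeric instead of becoming character as they do after cbind() with text columns, and then split by property. The sketch below uses the values from the first post in this thread (with sensor1 to sensor3) and is only one of several reasonable layouts.

```r
# Data from the original question, held in a data frame rather than a matrix.
df <- data.frame(
  time = as.POSIXct(c("2010-10-01 07:32:00", "2010-10-01 17:32:00",
                      "2010-10-02 07:32:00", "2010-10-03 04:33:21",
                      "2010-10-03 04:33:21", "2010-10-03 14:33:21",
                      "2010-10-05 07:32:00", "2010-10-05 07:32:00",
                      "2010-10-06 17:32:00"), tz = "UTC"),
  result = c(40, 15, 32, 20, 33, 12, 31, 15, 4),
  property = c("A", "A", "A", "B", "B", "A", "B", "B", "A"),
  procProperty = c("sensor1", "sensor3", "sensor2", "sensor1", "sensor2",
                   "sensor3", "sensor1", "sensor2", "sensor3")
)

# One data frame per property.
by_prop <- split(df, df$property)

# One panel per property, one line per sensor within each panel.
op <- par(mfrow = c(1, length(by_prop)))
for (p in names(by_prop)) {
  d <- by_prop[[p]]
  plot(d$time, d$result, type = "n", main = paste("property", p),
       xlab = "time", ylab = "result")
  sensors <- unique(d$procProperty)
  for (i in seq_along(sensors)) {
    s <- d[d$procProperty == sensors[i], ]
    lines(s$time, s$result, type = "b", col = i)
  }
  legend("topright", legend = sensors, col = seq_along(sensors), lty = 1)
}
par(op)
```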
Re: [R] the first. from SAS in R
My apologies for coming to the party so late. I'm sure this question has been answered a couple of times. The attached function is one I pulled from the help archives, but I can't seem to duplicate the search that led me to it. In any case, I've attached the function I found, and an .Rd file I use as part of a local package. I've also attached a pair of accompanying records to retrieve the last record and the nth record. These have the advantage of not requiring data frames to be sorted prior to extraction--the function will sort them for you. Benjamin -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Katz Sent: Wednesday, November 24, 2010 10:17 AM To: r-help@r-project.org Subject: Re: [R] the first. from SAS in R Often the purpose of first/last in sas is to facilitate grouping of observations in a sequential algorithm. This purpose is better served in R by using vectorized methods like those in package plyr. Also, note that first/last has different meanings in the context of by x; versus by x notsorted;. R duplicated does not address the latter, which splits noncontiguous records with equal x. Regards, David -- View this message in context: http://r.789695.n4.nabble.com/the-first-from-SAS-in-R-tp3055417p3057476. html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. === P Please consider the environment before printing this e-mail Cleveland Clinic is ranked one of the top hospitals in America by U.S.News World Report (2009). Visit us online at http://www.clevelandclinic.org for a complete listing of our services, staff and locations. 
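The distinction David draws between "by x;" and "by x notsorted;" can be sketched in base R. This is only an illustration on made-up data, not the attached function (which is not reproduced in the archive); the data frame and variable names are assumptions. Note that SAS would require sorted input for a plain "by x;", so the duplicated() flags below mark the first and last occurrence of each value overall.

```r
# Hypothetical data: 'id' plays the role of the SAS BY variable.
df <- data.frame(id = c("a", "a", "b", "a"), v = 1:4, stringsAsFactors = FALSE)

# "by id;" style flags: first/last occurrence of each id value overall.
first_by <- !duplicated(df$id)
last_by  <- !duplicated(df$id, fromLast = TRUE)

# "by id notsorted;" style flags: a new group starts whenever the
# value differs from the previous row, so noncontiguous runs are split.
n <- nrow(df)
first_ns <- c(TRUE, df$id[-1] != df$id[-n])
last_ns  <- c(df$id[-n] != df$id[-1], TRUE)

print(cbind(df, first_by, last_by, first_ns, last_ns))
```

The fourth row ("a" again after "b") is flagged as a first record only in the notsorted variant, which is the case duplicated() alone does not cover.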
Re: [R] Troubles in plotting to a postscript file (not to png)
Hi guys,

to make it easier, here is a simple case with the same issues. I use the short function below to make the attached PS file.

Things to fix:
-) the greek letter lambda is not printed, while mu is printed (see the plot command)
-) the annotation inside the plot area: the +- symbol and (K) overlap with the substitutes for tmed and tstd respectively (see the text command).

Also, how can I set the number of decimal digits for tmed and tstd? (options(digits=4) does not work.)

Moreover, I'd like to make the characters thicker. Is there any way?

Finally, once I close the R session without saving it (I answer n when quitting), the content of the ps file is erased. Am I missing something in writing the function?

Thanks
Gaetano

plot_example=function() {
  setPS()
  postscript(file='plot_example.ps', width=5, height=5, horizontal = FALSE,
             paper = "special", family = "ComputerModern", encoding = "TeXtext.enc")
  tmed <- 1.23456789
  tstd <- 1.23456789
  plot(c(0,1), c(0,1), xlab=expression(paste(lambda,mu,T)), main="", sub="(a)")  # lambda not printed
  text(.0, .8, substitute( T[disk,med] == tmed %+-% tstd (K), list(tmed=tmed,tstd=tstd)), pos=4, cex=1.5)  # overlapping symbols and numbers
  dev.off()
}
Re: [R] FW: R encoding question
But Sys.setlocale tries to change the option of the whole OS. I only want R to use a specified encoding; how can I do this?

Xiaobo.Gu

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Monday, November 29, 2010 8:57 PM
To: Xiaobo Gu
Subject: Re: FW: R encoding question

I have never played with encodings myself. I suggest you read the PostgreSQL documentation and try different arguments to Sys.setlocale in R. You probably have to do that before you initiate the database connection, since it might not have any effect afterwards. I am not sure this is the problem, but it's worth a try. Here are some examples.

  Sys.setlocale(locale="C")
  Sys.setlocale(locale="en_NZ.iso88591")
  Sys.setlocale("LC_ALL", "en_US")
  Sys.setlocale("LC_TIME", "English")
  Sys.setlocale('LC_ALL','fr_FR')
  Sys.putenv(LANGUAGE="EN"); Sys.setlocale("LC_ALL","EN")
  Sys.putenv(LANGUAGE="FR"); Sys.setlocale("LC_ALL","FR")

2010/11/29 Xiaobo Gu guxiaobo1...@gmail.com:
> Hi,
> Can you help with this?
>
> Regards,
> Xiaobo Gu
>
> -----Original Message-----
> From: Xiaobo Gu [mailto:guxiaobo1...@gmail.com]
> Sent: Wednesday, November 24, 2010 10:19 PM
> To: r-help@r-project.org
> Subject: R encoding question
>
> Hi,
> I am using RpgSQL to retrieve data from a PostgreSQL database which uses encoding UTF8, and I have some Chinese characters in one of the columns. Unfortunately R can't show them correctly.
>
>   df <- dbGetQuery(con, "select * from test")
>   df
>     a b
>   1 1 椤惧皬娉\xa2
>   2 2 瑕冩 EURO\xa1
>
> I see the following option. Do I need to change the encoding option to show the corresponding text? In my case, how should I set it?
>
>   $encoding
>   [1] "native.enc"
>
> Thanks,
> Xiaobo Gu

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
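As far as I know, Sys.setlocale() changes the locale of the running R process only, not the operating system settings. Independently of the locale, base R can declare or convert the encoding of individual strings. The sketch below assumes (this is not confirmed anywhere in the thread) that the driver hands back raw UTF-8 bytes that R then interprets in the native encoding; the byte values are made up for illustration.

```r
# Bytes E9 A1 BE are the UTF-8 encoding of a single Chinese character (U+987E).
x <- rawToChar(as.raw(c(0xE9, 0xA1, 0xBE)))
print(Encoding(x))       # the string carries no declared encoding yet

# Declare what the bytes really are; the bytes themselves are unchanged.
Encoding(x) <- "UTF-8"
code_point <- utf8ToInt(x)

# Alternatively, convert from a known source encoding with iconv().
y <- iconv("\xe9", from = "latin1", to = "UTF-8")   # latin1 e-acute
y_code <- utf8ToInt(y)
```

If the assumption holds for the data frame in the question, applying Encoding(df$b) <- "UTF-8" to the affected character column would be the analogous step.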
Re: [R] Problems in running affylmGUI
Lorenzo Melchor <Lorenzo.Melchor at icr.ac.uk> writes:

> I am trying to run affylmGUI on my Mac. I have already installed the Tcl package as well as BWidget through the ActiveTcl installation files. However, when running affylmGUI() in R, I keep getting the message in the attached file. I have copied the tcl folders from the root library to the user library, and I get the same issue.
>
> I would really appreciate it if you could help me set this up. Could you please send me an easy guideline to get affylmGUI working? Otherwise, we could use TeamViewer if that would be easier.

I strongly suggest that you send this e-mail to the Bioconductor e-mail list instead: most people here have no idea about affylmGUI or TeamViewer. Also, most attachments are stripped from postings to the list, so you may want to post it on the web in some public place instead.
[R] data.frame and formula classes of aggregate
Hi - I apologize for the 2nd post, but I think my question from a few weeks ago may have been overlooked on a Friday afternoon.

I might be missing something very obvious, but is it widely known that the aggregate function handles missing values differently depending on whether a data frame or a formula is the first argument?

For example,

  (d <- data.frame(sex=rep(0:1,each=3), wt=c(100,110,120,200,210,NA), ht=c(10,20,NA,30,40,50)))
  x1 <- aggregate(d, by = list(d$sex), FUN = mean)
  names(x1)[3:4] <- c('mean.dfcl.wt','mean.dfcl.ht')
  x2 <- aggregate(cbind(wt,ht)~sex, FUN=mean, data=d)
  names(x2)[2:3] <- c('mean.formcl.wt','mean.formcl.ht')
  cbind(x1,x2)[,c(2,3,6,4,7)]

The output from the data.frame method has an NA for a variable if that variable has missing values in the group. The formula method, however, seems to delete the entire row (missing and non-missing values) if it contains any NA. Wouldn't one expect the 2 forms (data frame vs formula) of aggregate to give the same result?

thanks very much
david freedman, atlanta

--
View this message in context: http://r.789695.n4.nabble.com/data-frame-and-formula-classes-of-aggregate-tp3063668p3063668.html
Sent from the R help mailing list archive at Nabble.com.
[R] Filling in missing time samples with na.approx
Hi Everyone,

I have some data from a sports GPS device like the following:

        time latitude longitude altitude  distance heartrate
1 1277648884 0.304048 -0.793819      260      0.00        94
2 1277648885 0.304056 -0.793772      262  4.307615        95
3     127764 0.304060 -0.793696      263 11.262347        97
4 1277648894 0.304075 -0.793544      263 25.237911       103
5 1277648898 0.304085 -0.793455      263 33.322525       108
6 1277648902 0.304064 -0.793387      256 40.042988       115

As you can see, the samples have irregular holes in the time column. How can I fill in the missing samples using na.approx? I've tried creating a blank series with no gaps and combining the two, but merge just adds columns and rbind complains about duplicate indexes.

P.S. My GPS still has holes in the data when I turn off smart recording :(

Thanks,
Jason
Re: [R] Troubles in plotting to a postscript file (not to png)
On Nov 29, 2010, at 9:00 AM, pilchat wrote:

> Hi guys, to make it easier, here is a simple case with the same issues. I use the short function below to make the attached PS file.
>
> Things to fix:
> -) the greek letter lambda is not printed, while mu is printed (see the plot command)
> -) the annotation inside the plot area: the +- symbol and (K) overlap with the substitutes for tmed and tstd respectively (see the text command).
>
> Also, how can I set the number of decimal digits for tmed and tstd? (options(digits=4) does not work.)

I would have thought one would do any formatting (of digits) outside the text( ...substitute(...),...) setting.

> Moreover, I'd like to make the characters thicker. Is there any way?

Which characters? There is a bold() option within plotmath.

> Finally, once I close the R session without saving it (I answer n when quitting), the content of the ps file is erased.

Now _that_ is weird. A file should have been created in your default directory and closing R should not have made it go away.

> Do I miss something in writing the function?

Perhaps. (But you certainly missed something in writing the question.) When I remove the family="ComputerModern" from the postscript call, I start seeing lambda. And the other spacing weirdness also resolves. I am on a Mac and ComputerModern is not one of the pdfFonts() on my machine. The list of available fonts varies widely across various OSes and devices, about which you have given us no clues.

--
David.

David Winsemius, MD
West Hartford, CT
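David's suggestion to format the digits outside substitute() might look like the sketch below. It is a guess at one workable form of the label, not a confirmed fix for the original poster's setup: the quoted subscript "disk,med", the ~ spacing before the unit, the temporary file name, and the use of the default font family are all assumptions.

```r
tmed <- 1.23456789
tstd <- 1.23456789

# Format first: formatC() returns character strings with a fixed number of decimals.
m <- formatC(tmed, digits = 2, format = "f")
s <- formatC(tstd, digits = 2, format = "f")

# Substitute the already formatted strings into the plotmath expression.
# '~' inserts a space so that "(K)" does not collide with the number.
lab <- substitute(T["disk,med"] == m %+-% s ~ "(K)", list(m = m, s = s))

# Write to a temporary PostScript file using the default font family.
f <- tempfile(fileext = ".ps")
postscript(file = f, width = 5, height = 5, horizontal = FALSE, paper = "special")
plot(c(0, 1), c(0, 1), xlab = expression(lambda ~ mu ~ T), main = "", sub = "(a)")
text(0, 0.8, lab, pos = 4, cex = 1.5)
dev.off()
```

Because m and s are plain strings by the time they reach text(), options(digits=) no longer plays any role in how they are drawn.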
Re: [R] data.frame and formula classes of aggregate
On Nov 29, 2010, at 9:35 AM, David Freedman wrote:

> Hi - I apologize for the 2nd post, but I think my question from a few weeks ago may have been overlooked on a Friday afternoon.
>
> I might be missing something very obvious, but is it widely known that the aggregate function handles missing values differently depending on whether a data frame or a formula is the first argument?

I'm not sure if it is widely known, but it is certainly suggested by the documentation for aggregate, since aggregate.data.frame has different defaults than aggregate.formula. See the Usage section at the very top of ?aggregate.

--
David Winsemius, MD
West Hartford, CT
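The differing defaults can be seen directly with the data from the question. The formula method has na.action = na.omit by default, so incomplete rows are dropped before FUN is called; passing na.action = na.pass keeps them. The object names below are made up for this sketch.

```r
d <- data.frame(sex = rep(0:1, each = 3),
                wt = c(100, 110, 120, 200, 210, NA),
                ht = c(10, 20, NA, 30, 40, 50))

# Formula method, default na.action = na.omit:
# rows 3 and 6 are removed entirely before the means are computed.
x_omit <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean)

# Formula method with na.pass: rows are kept, so a group mean is NA
# for any variable that has a missing value in that group.
x_pass <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean, na.action = na.pass)

# Data frame method, dropping NAs per variable inside FUN instead.
x_narm <- aggregate(d[c("wt", "ht")], by = list(sex = d$sex), FUN = mean, na.rm = TRUE)

print(x_omit); print(x_pass); print(x_narm)
```

Which of the three is appropriate depends on whether an incomplete row should still contribute its non-missing values.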
Re: [R] periodic time series
sample.xlsx: http://r.789695.n4.nabble.com/file/n3063697/sample.xlsx

So, here are some sample data.
The 1st column is a label for 12-hour-long days.
The 2nd one is a time stamp.
The rest are the actual measured values for 4 different groups.

What would be the best model for such periodic data?

Thanks,
Andy

--
View this message in context: http://r.789695.n4.nabble.com/periodic-time-series-tp3062866p3063697.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Filling in missing time samples with na.approx
On Mon, Nov 29, 2010 at 9:45 AM, Jason Edgecombe ja...@rampaginggeek.com wrote:
> Hi Everyone,
>
> I have some data from a sports GPS device like the following:
>
>         time latitude longitude altitude  distance heartrate
> 1 1277648884 0.304048 -0.793819      260      0.00        94
> 2 1277648885 0.304056 -0.793772      262  4.307615        95
> 3     127764 0.304060 -0.793696      263 11.262347        97
> 4 1277648894 0.304075 -0.793544      263 25.237911       103
> 5 1277648898 0.304085 -0.793455      263 33.322525       108
> 6 1277648902 0.304064 -0.793387      256 40.042988       115
>
> As you can see, the samples have irregular holes in the time column. How can I fill in the missing samples using na.approx? I've tried creating a blank series with no gaps and combining the two, but merge just adds columns and rbind complains about duplicate indexes.
>
> P.S. My GPS still has holes in the data when I turn off smart recording :(

Try this:

  Lines <- "time latitude longitude altitude distance heartrate
  1277648884 0.304048 -0.793819 260 0.00 94
  1277648885 0.304056 -0.793772 262 4.307615 95
  127764 0.304060 -0.793696 263 11.262347 97
  1277648894 0.304075 -0.793544 263 25.237911 103
  1277648898 0.304085 -0.793455 263 33.322525 108
  1277648902 0.304064 -0.793387 256 40.042988 115"

  # read in data
  library(zoo)
  z <- read.zoo(textConnection(Lines), header = TRUE)

  # interpolate onto a regular grid spanning the observed times
  na.approx(z, xout = seq(min(time(z)), max(time(z))))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
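For readers without zoo installed, the same idea (linear interpolation onto a gap-free time grid) can be sketched with base R's approx(). The numbers below are made up and much smaller than the GPS data; this is an illustration of the technique, not a replacement for the na.approx() call above.

```r
# Made-up irregular samples: time in seconds and heart rate.
time <- c(0, 1, 4, 10)
hr   <- c(94, 95, 98, 104)

# Regular one-second grid spanning the observed range.
grid <- seq(min(time), max(time), by = 1)

# Linear interpolation of hr onto the grid.
filled <- approx(x = time, y = hr, xout = grid)
result <- data.frame(time = filled$x, hr = filled$y)
print(result)
```

Each additional column (latitude, altitude, and so on) would need its own approx() call, which is the bookkeeping that na.approx() on a zoo object does for you.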
Re: [R] RDA Triplot
Since I am doing an RDA with constraints, I get this error message when trying to use biplot.rda:

  'biplot.rda' not suitable for models with constraints

Daniel

2010/11/26 Jari Oksanen [via R]:
> Danielwc <daniel.carstensen at gmail.com> writes:
>
>> I'm using the vegan package to do an RDA ordination. In my plot I get my environmental scores as arrows/vectors, but my species scores as points. I would like to get the species scores as arrows as well. Is there not a way I can tell R to plot both environmental and species scores as arrows/vectors?
>
> See ?biplot.rda in vegan.
>
> Cheers, Jari Oksanen

--
View this message in context: http://r.789695.n4.nabble.com/RDA-Triplot-tp3055474p3063712.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] non-linear fourth-order differential equations
The OP is asking about a system of fourth-order differential equations, whereas you are telling her how to solve a single, algebraic nonlinear equation.

Take a look at package deSolve, and the function `lsode' in that package, for solving a system of nonlinear ODEs (given initial values).

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine, Johns Hopkins University
Ph. (410) 502-2619
email: rvarad...@jhmi.edu

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Wu Gong
Sent: Sunday, November 28, 2010 6:31 PM
To: r-help@r-project.org
Subject: Re: [R] non-linear fourth-order differential equations

Hi Yanika,

Please try ?uniroot and ?polyroot

  f <- function(x) x^4-16
  uniroot(f, lower= -3, upper=0)
  polyroot(c(-16,0,0,0,1))

-
A R learner.
--
View this message in context: http://r.789695.n4.nabble.com/non-linear-fourth-order-differential-equations-tp3062805p3062894.html
Sent from the R help mailing list archive at Nabble.com.
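To make Ravi's pointer concrete: a fourth-order ODE is usually rewritten as four first-order equations before it is handed to a solver. The sketch below uses a deliberately simple linear test equation, y'''' = y, chosen only because its exact solution is known; it is not the OP's system, which does not appear in the thread. It assumes deSolve is installed and skips the solve otherwise.

```r
# State vector s = (y, y', y'', y'''); its derivative is (y', y'', y''', y'''').
# For the test equation y'''' = y the last component is simply s[1].
rhs <- function(t, s, parms) {
  list(c(s[2], s[3], s[4], s[1]))
}

if (requireNamespace("deSolve", quietly = TRUE)) {
  # Initial values chosen so that the exact solution is y(t) = exp(t).
  s0  <- c(y = 1, dy = 1, d2y = 1, d3y = 1)
  out <- deSolve::ode(y = s0, times = seq(0, 1, by = 0.1), func = rhs,
                      parms = NULL, method = "lsode")
  y_end <- unname(out[nrow(out), "y"])
  print(c(numeric = y_end, exact = exp(1)))
}
```

A nonlinear system is handled the same way: only the last entries of the vector returned by rhs() change.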
Re: [R] unexpected behavior using round to 2 digits on randomly generated numbers
On Sun, Nov 28, 2010 at 01:53:05PM -0800, Jeff Newmiller wrote:
> FAQ 7.31
> http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Additional information concerning rounding errors of double precision, and suggestions for R code which avoids them in some situations, may be found in the first section of

  http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy

and in

  http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy:decimal_numbers

Petr Savicky.
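The effect described in FAQ 7.31 is easy to reproduce. The example below is the standard textbook illustration rather than the code from the original post in this thread, which is not shown here.

```r
a <- 0.1 + 0.2

exact_equal <- a == 0.3                      # exact comparison of doubles
close_equal <- isTRUE(all.equal(a, 0.3))     # comparison up to a small tolerance

print(c(exact = exact_equal, tolerant = close_equal))
print(format(a, digits = 17))                # shows the stored value in full
```

Comparing with all.equal() (wrapped in isTRUE()) is the approach the FAQ recommends when two computed numbers are expected to agree.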
Re: [R] Where is gdata?
Hi Gabor,

- snip -

>> The following object(s) are masked from 'package:utils':
>>
>>     object.size
>
> This is just a message that it can't find perl. If you don't need to use read.xls then you don't need perl, so you can ignore the message. If you do need to use read.xls then install perl, and once you have done that run installXLSXsupport().

After installing Strawberry Perl the warning disappears.

  library(gdata)
  ?read.xls

works, starting "Read Excel files".

I hadn't run installXLSXsupport() afterwards. I just tried it, without success:

  installXLSXsupport()
  Error: could not find function "installXLSXsupport"

I couldn't proceed further.

I can't resolve the following:

1)
  library(AER)
  data()
starts with the datasets of AER.

2)
  library(gdata)
  data()
Is gdata added to the master list of dataset packages?

B.R.
Stephen L

From: Gabor Grothendieck ggrothendi...@gmail.com
Cc: Liviu Andronic landronim...@gmail.com; r-help r-help@r-project.org
Sent: Mon, November 29, 2010 8:39:55 PM
Subject: Re: [R] Where is gdata?

>> Hi Liviu,
>>
>>> Not if you library(gdata) first. Then ?read.xls should work.
>>
>> Yes, I did. I found something strange here which I can't explain.
>>
>> Win 7 64bit
>> R 32/64 bit
>>
>> Just rebooted Win 7 and R
>>
>>   library(gdata)
>>   gdata: Unable to locate valid perl interpreter
>>   gdata:
>>   gdata: read.xls() will be unable to read Excel XLS and XLSX files
>>   gdata: unless the 'perl=' argument is used to specify the location of a
>>   gdata: valid perl intrpreter.
>>   gdata:
>>   gdata: (To avoid display of this message in the future, please ensure
>>   gdata: perl is installed and available on the executable search path.)
>>   gdata: Unable to load perl libaries needed by read.xls()
>>   gdata: to support 'XLX' (Excel 97-2004) files.
>>   gdata: Unable to load perl libaries needed by read.xls()
>>   gdata: to support 'XLSX' (Excel 2007+) files.
>>   gdata: Run the function 'installXLSXsupport()'
>>   gdata: to automatically download and install the perl
>>   gdata: libaries needed to support Excel XLS and XLSX formats.
>>
>>   Attaching package: 'gdata'
>>
>>   The following object(s) are masked from 'package:utils':
>>
>>       object.size
>
> This is just a message that it can't find perl. If you don't need to use read.xls then you don't need perl, so you can ignore the message. If you do need to use read.xls then install perl, and once you have done that run installXLSXsupport().
>
>> It complains.
>>
>>   ?read.xls
>>   starting httpd help server ... done
>>   Read Excel files
>>
>> Both 32 and 64 bit R worked.
>>
>> If there is NO complaint on running library(gdata), then ?read.xls can't work.
>
> Can you clarify when ?read.xls works for you and when it does not?
>
>> Perl seems to have been installed, but I can't recall when and how:
>>
>>   C:\>dir C:\Users\satimiswin764\Documents\R\win-library\2.12\gdata\
>>   .
>>   11/22/2010 10:44 AM <DIR> perl
>>   11/22/2010 10:44 AM <DIR> R
>>   11/22/2010 10:44 AM <DIR> unitTests
>>   11/22/2010 10:44 AM <DIR> xls
>
> The gdata\perl folder contains perl libraries that come with gdata. Perl itself is not distributed with gdata, and you don't need perl at all to use gdata except for read.xls and related functions.
>
> My understanding is that this question has nothing to do with perl nor with read.xls, and that the problem is that you seem to be able to run this:
>
>   library(gdata)
>   ?read.xls
>
> and sometimes it works and at other times it does not work. Is that right? Does it occur with any other package?
>
> How about removing gdata and reinstalling it?
>
>   remove.packages("gdata")
>   ... exit R and check if gdata has been removed ...
>   ... restart R ...
>   install.packages("gdata")
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
[R] extracting P values from lm model
Hello,

I am trying to extract the F statistic from an lm model. However, after I run the model and call names(Model), fstatistic does not appear, only these:

  names(Model)
   [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values"
   [6] "assign"        "qr"            "df.residual"   "xlevels"       "call"
  [11] "terms"         "model"

How could I extract the P values? I have run the model on a cbind of 1800 response variables, so it is not easy to do by hand.

Thanks in advance.
Rosario
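The F statistic is not stored in the fitted lm object; it is computed by summary(). For a model with a cbind() response, summary() returns one summary per response, so the overall p-values can be collected in a loop. The sketch below uses the built-in mtcars data with two responses as a stand-in for the 1800 in the question; the variable names are illustrative only.

```r
# Two responses fitted at once, standing in for the cbind of many responses.
fit <- lm(cbind(mpg, disp) ~ wt, data = mtcars)

# summary() on a multi-response fit gives a list with one summary.lm per response.
sms <- summary(fit)

# fstatistic holds (value, numdf, dendf); the p-value is the upper tail of F.
pvals <- sapply(sms, function(s) {
  f <- s$fstatistic
  unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
})
print(pvals)
```

Per-coefficient p-values, if those are what is wanted, are in the coefficients matrix of each element of sms (column "Pr(>|t|)").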
Re: [R] non-linear fourth-order differential equations
Hi Ravi,

Thank you for your correction. I hope I didn't mess up anything :)

Cheers.
Wu

-
A R learner.
--
View this message in context: http://r.789695.n4.nabble.com/non-linear-fourth-order-differential-equations-tp3062805p3063761.html
Sent from the R help mailing list archive at Nabble.com.
[R] Significance of the difference between two correlation coefficients
Hi,

based on the sample size, I want to calculate whether two correlation coefficients are significantly different or not. I know that as a first step both coefficients have to be converted to z values using Fisher's z transformation. I have done this already, but I don't know how to proceed from there.

Unlike for correlation coefficients, I know that the difference for z values is mathematically defined, but I do not know how to incorporate the sample size.

I found a couple of websites that provide that service, but since I have huge data sets I need to automate this procedure (http://faculty.vassar.edu/lowry/rdiff.html).

Can anyone help?

Cheers,
syrvn

--
View this message in context: http://r.789695.n4.nabble.com/Significance-of-the-difference-between-two-correlation-coefficients-tp3063765p3063765.html
Sent from the R help mailing list archive at Nabble.com.
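For two correlations from independent samples, the usual textbook test divides the difference of the Fisher z values by its standard error, sqrt(1/(n1 - 3) + 1/(n2 - 3)), and refers the result to the standard normal distribution. The function below is a sketch of that test; its name and the example numbers are made up, and it does not apply to correlations measured on the same subjects.

```r
# Test H0: rho1 == rho2 for correlations from two independent samples.
# atanh() is Fisher's z transformation. All arguments may be vectors.
cor_diff_test <- function(r1, n1, r2, n2) {
  z <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  p <- 2 * pnorm(-abs(z))          # two-sided p-value
  data.frame(z = z, p = p)
}

# Vectorised use: one row of output per pair of correlations.
res <- cor_diff_test(r1 = c(0.5, 0.8), n1 = c(50, 100),
                     r2 = c(0.5, 0.3), n2 = c(60, 100))
print(res)
```

Because the function is vectorised, whole columns of coefficients and sample sizes can be passed in one call, which covers the automation requirement.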
Re: [R] Where is gdata?
On Mon, Nov 29, 2010 at 10:18 AM, Stephen Liu sati...@yahoo.com wrote:
> Hi Gabor,
>
> - snip -
>
>>> The following object(s) are masked from 'package:utils':
>>>
>>>     object.size
>>
>> This is just a message that it can't find perl. If you don't need to use read.xls then you don't need perl, so you can ignore the message. If you do need to use read.xls then install perl, and once you have done that run installXLSXsupport().
>
> After installing Strawberry Perl the warning disappears.
>
>   library(gdata)
>   ?read.xls
>
> works, starting "Read Excel files".
>
> I hadn't run installXLSXsupport() afterwards. I just tried it, without success:
>
>   installXLSXsupport()
>   Error: could not find function "installXLSXsupport"
>
> I couldn't proceed further.

Please start from a fresh session of R. Copy and paste your session from the R console rather than relating what happened.

Also, what version of gdata are you using? Older versions did not have installXLSXsupport. Show:

  packageDescription("gdata")$Version
  win.version()
  R.version.string

> I can't resolve the following:
>
> 1)
>   library(AER)
>   data()
> starts with the datasets of AER.
>
> 2)
>   library(gdata)
>   data()
> Is gdata added to the master list of dataset packages?

This is not clear. Please provide exact and complete output.

> B.R.
> Stephen L

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
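The checks Gabor asks for can be scripted so that the output is easy to paste into a reply. This is only a sketch: it assumes nothing about the poster's machine, and the gdata part runs only when that package is installed.

```r
# Is a perl executable on the search path? Sys.which() returns "" if not.
perl_path <- unname(Sys.which("perl"))
have_perl <- nzchar(perl_path)
print(c(perl = perl_path))

# Version information worth including in a report.
print(R.version.string)

# Only query gdata when it is actually installed.
have_gdata <- requireNamespace("gdata", quietly = TRUE)
if (have_gdata) {
  print(packageDescription("gdata")$Version)
  # TRUE if this gdata version provides the function at all.
  print(exists("installXLSXsupport", where = asNamespace("gdata")))
}
```

If the last line prints FALSE, the installed gdata predates installXLSXsupport(), which would explain the "could not find function" error.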
[R] Setting default path to library
Hello,

I recently upgraded from 2.11.1 to 2.12.0 on a Windows machine. When I launch R via Tinn-R (2.3.7.0), most things appear correct. The exception is the path to the library. I store all of the packages in C:\Program_Files\R\R-2.12.0\library

Last week when I upgraded I received an error:

  Error in loadNamespace(i[[1L]], c(lib.loc, .libPaths())) :
    there is no package called 'cluster'

The cluster package was there, but I installed a new version from a local *.zip file to be sure, and things worked fine.

This morning, when I launched R again, I received the same error message, also indicating that the package Hmisc could not be loaded. It too is in the library folder.

I think the system is searching for some packages in C:\Program_Files\R\R-2.12.0\bin\i386

Why would it do that, and what are the appropriate commands to tell R where to look for installed packages? I've searched the archive and have not found a clear, understandable answer.

As always, thanks for the assistance.
Steve

Steve Friedman Ph. D.
Ecologist / Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034
steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147
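No reply to this question appears in this part of the archive, but the base function for inspecting and extending the library search path is .libPaths(). The sketch below uses a throwaway directory under tempdir() as a stand-in for the real library folder, so it can be run anywhere; the directory name is made up.

```r
# Where does this R session currently look for packages?
print(.libPaths())

# A throwaway directory standing in for an additional library location.
extra_lib <- file.path(tempdir(), "extra-lib")
dir.create(extra_lib, showWarnings = FALSE)

# Put it at the front of the search path for the current session only.
.libPaths(c(extra_lib, .libPaths()))
print(.libPaths())
```

To make such a setting permanent, the R_LIBS or R_LIBS_USER environment variable can be set (for example in an .Renviron file) instead of calling .libPaths() in every session.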
Re: [R] How to remove a package.
On Sun, 2010-11-28 at 07:58 -0800, Stephen Liu wrote:
> Hi David,
>
> Thanks for your advice. I got it. But I can't resolve:

*sigh* The packages are listed in alphabetical sort order, hence AER comes before car comes before datasets comes before Ecdat... You just need to page down through the list of package data sets to see later ones.

Seriously, and I don't mean to be rude, but instead of scatter-gun replies to the list, try engaging your brain and actually look *properly* at what is displayed...

G

>   library(AER)
>   Loading required package: car
>   Loading required package: MASS
>   Loading required package: nnet
>   Loading required package: survival
>   Loading required package: splines
>   Loading required package: Formula
>   Loading required package: lmtest
>   Loading required package: zoo
>   Loading required package: sandwich
>   Loading required package: strucchange
>
>   data()
> displays
>   Data sets in package ‘AER’:
>
> But:
>
>   library(Ecdat)
>   data()
> displays
>   Data sets in package ‘datasets’:
>
> Is this one large list of datasets that includes those in package Ecdat, rather than only Ecdat separately?
>
> B.R.
> Stephen L
>
> From: David Winsemius dwinsem...@comcast.net
> Cc: Stefan Grosse singularit...@gmx.net; r-help@r-project.org
> Sent: Sun, November 28, 2010 11:16:34 PM
> Subject: Re: [R] How to remove a package.
>
> On Nov 28, 2010, at 7:16 AM, Stephen Liu wrote:
>
>> Hi Stefan,
>>
>> Thanks for your advice.
>
> snipped
>
>> Installation went through without problem.
>>
>>   library(Ecdat)
>>   data()
>>   Data sets in package ‘datasets’
>>
>> NOT 'Ecdat'
>>
>>   ??Ecdat
>>   ...
>>   Ecdat::Caschool   The California Test Score Data Set
>>   Ecdat::Griliches  Wage Datas
>>   Ecdat::MCAS       The Massashusets Test Score Data Set
>>   Ecdat::MunExp     Municipal Expenditure Data
>>   Ecdat::Orange     The Orange Juice Data Set
>>   Ecdat::Solow      Solow's Technological Change Data
>>   Ecdat::TranspEq   Statewide Data on Transportation Equipment Manufacturing
>>
>> Those files are in the Ecdat package.
>>
>>   Caschool
>>   Error: object 'Caschool' not found
>>   MCAS
>>   Error: object 'MCAS' not found
>
> Because loading the package does not necessarily register the datasets:
>
>   require(Ecdat)
>   Loading required package: Ecdat
>   data()   #-- produces a large list including
>   Car        Stated Preferences for Car Choice
>   Caschool   The California Test Score Data Set
>   Catsup     Choice of Brand for Catsup
>   Cigar      Cigarette Consumption
>
>   str(Caschool)
>   Error in str(Caschool) : object 'Caschool' not found
>   data(Car)
>   str(Car)
>   'data.frame': 4654 obs. of 70 variables:
>    $ choice : Factor w/ 6 levels "choice1","choice2",..: 1 2 5 5 5 5 2 5 5 2 ...
>    $ college: num 0 1 0 0 0 0 1 1 0 1 ...
>    $ hsg2   : num 0 1 1 0 1 0 1 0 0 0 ...
>
> Note that this dataset was NEVER spelled "car".

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
[R] Updates for xlsReadWrite (1.5.3) and xlsReadWritePro (1.6.1/3)
The xlsReadWrite[Pro] packages allow to natively read/write Excel files (.xls) on the Win 32-bit platform. About a week ago new package versions have been released: * xlsReadWrite 1.5.3 is available at CRAN (for R2.11/2.12) and from www.swissr.org/download (binary builds for R2.9 - R2.12) * xlsReadWritePro 1.6.3 is available from www.swissr.org/download (binary builds for R2.9 - R2.12) * (the full download listing is here: http://dl.dropbox.com/u/2602516/swissrpkg/index.html) ## Changes in xlsReadWrite 1.5.3 (0b78c1) ## - *important*: fix AV when reading large data (issue #110: in a subroutine a pointer (pSExp) had been 'riUnprotect'ed (Rf_unprotect) too early, the total protect/unprotect count was correct (of course) but when 'anyDuplicated' got called in the subroutine, the control flow switched to R and R then had the possibility to free 'my' pointer. Not good). Thanks to the (anonymous) user which submitted the nice bug report! - NaN values will be written as 'NaN' and behaviour (e.g. coercion) better accounts R (see read.xls.Rd, write.xls.Rd and unitTests/runitNaNaN.R). - 'dateTimeAs' argument in read.xls has been renamed to 'dateTime'. When using the default (which should be fine in most cases) this change won't affect you. - fix startup message scrambling in R2.12.0 (LF instead of CRLF - reported to Rdevel) - simplified file (unitTests/execManually.R) to run RUnit tests - misc. small/cosmetic changes (see github commits) - (internal) update makefile: support R2.12, much simplify targets, set/modify Windows System Path from within the makefile ## Changes in xlsReadWritePro 1.6.3 (93a6d7) ## - fix for startup message scrambling (use LF instead of CRLF - reported to Rdevel) - some more RUnit tests, cosmetic changes (typos, formatting, etc.) 
- (internal) update makefile: also support R2.12/2.9, much simplify targets, set/modify Windows System Path from within the makefile ## Changes in xlsReadWritePro version 1.6.1 ## This is a significant update and may require some small adjustments in your code. * Precompiled binary packages for R2.10 and R2.11 (see downloads) * While the package runs on R2.9 and R2.12, we have some issues with our automated 'binary-package-building-and-releasing' makefile. This will be fixed later. Please send us an email if you need the pkg for these versions now and we try to help. * Note: works with existing keys (even when already 4 years old;) Changes: --- 'meta' --- o test/ensure functionality with RUnit tests (180) o improve consistency and add examples o issue tracking is public now and a forum has been added --- important --- o 'KEEP' argument/functionality DROPPED - reason: redundant, differences between the so-called 'keep-obj' and 'xls-obj' are confusing, complicates lowlevel code and hinders future enhancements - resolution: use xls.open, xls.new, xls.save, xls.cancel and xls.close instead o area-related arguments (from, rows, cells, ...) in read.xls/write.xls - CELLS argument SPLITTED into 'CELLS' and 'RANGE': - cells: pick single cell values and give them back as a vector or as a data.frame (the latter is new, type will be determined for each cell individually) - range: read ranges either by name or by a numeric 4-elem-vector (R1,C1,R2,C2) (A1 style, i.e. 'A1:C3', 'Sheet2!B42' could eventually be added here). o xls.sheet, NAMEORINDEX argument renamed - 'nameOrIndex' becomes 'sheet' and the default is the first/active sheet (depends if file is a physical file or an xls-obj) - 'copyAndInsert' action copies from the active sheet o xls.image, SHEET argument is needed! In light of this obligatory change the arguments have been reworked. 
Here are the current and older declaration: - curr.: xls.image(file, action, sheet = NA|NULL, img = NA, range = NA, target = NA) - beta: xls.image(file, action, img = NA, range = NA, name = NA) - old: xls.image(file = NA, action, nameOrIdx = NA, miscData = NA, keep = NA) o xls.range, NAMEORINDEX argument renamed - 'nameorindex' becomes 'range' o template location moved - new: R_HOME/library/xlsReadWrite/template/TemplateNew.xls - (old/erronous: R_HOME/library/xlsReadWrite/libs/template, reported by B. Ripley for free version) - (the template location in APPDATA remains unaffected) --- normal --- o read.xls - colClasses: recognizes boolean strings as logical, recognizes isodatetime formatted strings, isodatetime/isotime/isodate work for double and character string values. ??? todo: re-read formula values when one or more formulas have been modified: gives back 0 (instead of #NULL!). - rownames for data.frames are integers (when not read from Excel) - NEW ARGUMENT 'checkNames' to optionally treat colnames with 'make.names' - NEW ARGUMENT 'strictArea' - background: when the library opens an Excel file it determines the area which is
Re: [R] extracting P values from lm model
Rosario,

The summary function will compute the F-statistic, from which you can compute the attained p-value. Here's a snippet that shows the F-statistic.

summary(lm(Y ~ X))$fstatistic
   value    numdf    dendf
34.23125      1.0      8.0

Dave

From: Rosario Garcia Gil m.rosario.gar...@genfys.slu.se
To: r-help r-help@r-project.org
Date: 11/29/2010 09:30 AM
Subject: [R] extracting P values from lm model
Sent by: r-help-boun...@r-project.org

> Hello,
>
> I am trying to get the F-statistic out of an lm model. However, after I run the model and write names(Model), the fstatistic does not appear, only these:
>
> names(Model)
>  [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values"
>  [6] "assign"        "qr"            "df.residual"   "xlevels"       "call"
> [11] "terms"         "model"
>
> How could I extract the p-values? I have run a cbind of 1800 response variables, so it is not easy to do by hand.
>
> Thanks in advance.
> Rosario
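As a sketch of the step Dave describes (not code from the thread): the three numbers in `fstatistic` can be passed to `pf()` to get the overall p-value. The data below are simulated, and the names `f_pvalue`, `fit`, `Y1` and `Y2` are made up for illustration. For a multi-response fit built with `cbind()`, `summary()` returns one summary per response, so the same helper can be applied to each with `sapply()`.

```r
# Simulated data; all names here are illustrative assumptions, not from the thread.
set.seed(1)
X  <- rnorm(10)
Y1 <- 2 * X + rnorm(10)
Y2 <- rnorm(10)

# Overall F-test p-value from a summary.lm object.
f_pvalue <- function(s) {
  f <- s$fstatistic                      # named vector: value, numdf, dendf
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# Single response
fit <- lm(Y1 ~ X)
p_single <- f_pvalue(summary(fit))

# Several responses at once: summary() gives a list of per-response summaries
multi <- lm(cbind(Y1, Y2) ~ X)
p_all <- sapply(summary(multi), f_pvalue)
print(p_all)
```

With 1800 responses the `sapply()` call should return a vector of 1800 p-values, one per column of the `cbind()`.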
[R] Setting Values of Elements in a Dataframe
Dear All,

I am experiencing some problems in resetting the values of some selected elements in a data frame. Consider

d <- seq(-1, 1, length=16)
dim(d) <- c(4,4)
d <- as.data.frame(d)
sel_pos <- which(d > 0, arr.ind=TRUE)
d[sel_pos] <- -9

which returns the error

Error in `[<-.data.frame`(`*tmp*`, sel_pos, value = -9) :
  only logical matrix subscripts are allowed in replacement

which is obscure to me. I am correctly selecting the positive elements in a data.frame and I'd like to reset them to another numerical value. What am I misunderstanding?

Many thanks
Lorenzo
Re: [R] Where is gdata?
Hi Gabor, Please start at a fresh version of R. Copy and paste your session from the R console rather than relating what happened. Also, what version of gdata are you using? Older versions did not have installXLSXsupport. Show: packageDescription(gdata)$Version win.version() R.version.string packageDescription(gdata)$Version [1] 2.8.1 win.version() [1] Windows 7 x64 (build 7600) R.version.string [1] R version 2.12.0 (2010-10-15) Both 32 and 64 bits 2) library(gdata) data() gdata is added to the list of master dataset package This is not clear. Please provide exact and complete output. File gdata_output.txt is attached to this email. Following lines are added to the bottom of the file:- Data sets in package ‘gdata’: MedUnitsTable of conversions between Intertional Standard (SI) and US - end - B.R. Stephen L From: Gabor Grothendieck ggrothendi...@gmail.com To: Stephen Liu sati...@yahoo.com Cc: r-help r-help@r-project.org Sent: Mon, November 29, 2010 11:31:18 PM Subject: Re: [R] Where is gdata? On Mon, Nov 29, 2010 at 10:18 AM, Stephen Liu sati...@yahoo.com wrote: Hi Gabor, - snip - .. The following object(s) are masked from 'package:utils': : object.size This is just a message that it can't find perl. If you don't need to use read.xls then you don't need perl so you can ignore the message. If you do need to use read.xls then install perl and once you have done that then run installXLSXsupport(). After having installed Strawberry perl the warning disappears. library(gdata) ?read.xls works starting Read Excel files I haven't run installXLSXsupport() afterwards. Just did it without success. installXLSXsupport() Error: could not find function installXLSXsupport Couldn't proceed further. Please start at a fresh version of R. Copy and paste your session from the R console rather than relating what happened. Also, what version of gdata are you using? Older versions did not have installXLSXsupport. 
Show: packageDescription(gdata)$Version win.version() R.version.string I can't resolve follows; 1) library(AER) data() starts the datasets of AER 2) library(gdata) data() gdata is added to the list of master dataset package This is not clear. Please provide exact and complete output. B.R. Stephen L From: Gabor Grothendieck ggrothendi...@gmail.com To: Stephen Liu sati...@yahoo.com Cc: Liviu Andronic landronim...@gmail.com; r-help r-help@r-project.org Sent: Mon, November 29, 2010 8:39:55 PM Subject: Re: [R] Where is gdata? On Mon, Nov 29, 2010 at 3:44 AM, Stephen Liu sati...@yahoo.com wrote: Hi Liviu, Not if you library(gdata) first. Then ?read.xls should work. Yes, I did. I found something strange here which I can't explain. Win 7 64bit R 32/64 bit Just rebooted Win 7 and R library(gdata) gdata: Unable to locate valid perl interpreter gdata: gdata: read.xls() will be unable to read Excel XLS and XLSX files gdata: unless the 'perl=' argument is used to specify the location of a gdata: valid perl intrpreter. gdata: gdata: (To avoid display of this message in the future, please ensure gdata: perl is installed and available on the executable search path.) gdata: Unable to load perl libaries needed by read.xls() gdata: to support 'XLX' (Excel 97-2004) files. gdata: Unable to load perl libaries needed by read.xls() gdata: to support 'XLSX' (Excel 2007+) files. gdata: Run the function 'installXLSXsupport()' gdata: to automatically download and install the perl gdata: libaries needed to support Excel XLS and XLSX formats. Attaching package: 'gdata' The following object(s) are masked from 'package:utils': object.size This is just a message that it can't find perl. If you don't need to use read.xls then you don't need perl so you can ignore the message. If you do need to use read.xls then install perl and once you have done that then run installXLSXsupport(). It complains. ?read.xls starting httpd help server ... done Read Excel files Both 32 and 64 bit R worked. 
If there is NO complaint on running; library(gdata) Then ?read.xls can't work. Can you clarify when ?read.xls works for you and when it does not? Perl seems has been installed. But I can't recall, when and how; C:\dir C:\Users\satimiswin764\Documents\R\win-library\2.12\gdata\ . 11/22/2010 10:44 AMDIR perl 11/22/2010 10:44 AMDIR R 11/22/2010 10:44 AMDIR unitTests 11/22/2010 10:44 AMDIR xls The gdata\perl folder contains perl libraries that come with gdata. Perl itself is not distributed with gdata and you don't need perl at all to use gdata except for read.xls and related functions. My understanding is that this question has nothing to do with perl nor with read.xls and that the problem is that you seem
Re: [R] Setting Values of Elements in a Dataframe
Try this:

d[d > 0] <- -9

On Mon, Nov 29, 2010 at 1:56 PM, Lorenzo Isella lorenzo.ise...@gmail.com wrote:

> Dear All,
>
> I am experiencing some problems in resetting the values of some selected elements in a dataframe. Consider
>
> d <- seq(-1, 1, length=16)
> dim(d) <- c(4,4)
> d <- as.data.frame(d)
> sel_pos <- which(d > 0, arr.ind=TRUE)
> d[sel_pos] <- -9
>
> which returns the error
>
> Error in `[<-.data.frame`(`*tmp*`, sel_pos, value = -9) :
>   only logical matrix subscripts are allowed in replacement
>
> which is obscure to me. I am correctly selecting the positive elements in a data.frame and I'd like to reset them to another numerical value. What I am misunderstanding?
>
> Many thanks
> Lorenzo

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Setting Values of Elements in a Dataframe
Hi,

Not sure why it doesn't work (I would say it's because of the structure of sel_pos, but I don't know how to deal with it). But just do:

d[d > 0] <- -9

It does work.

HTH,
Ivan

Le 11/29/2010 16:56, Lorenzo Isella a écrit :

> Dear All,
>
> I am experiencing some problems in resetting the values of some selected elements in a dataframe. Consider
>
> d <- seq(-1, 1, length=16)
> dim(d) <- c(4,4)
> d <- as.data.frame(d)
> sel_pos <- which(d > 0, arr.ind=TRUE)
> d[sel_pos] <- -9
>
> which returns the error
>
> Error in `[<-.data.frame`(`*tmp*`, sel_pos, value = -9) :
>   only logical matrix subscripts are allowed in replacement
>
> which is obscure to me. I am correctly selecting the positive elements in a data.frame and I'd like to reset them to another numerical value. What I am misunderstanding?
>
> Many thanks
> Lorenzo

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
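A sketch of why the two attempts in this thread behave differently, assuming the R 2.12-era behaviour shown in the error message: replacement on a data frame accepted a logical matrix such as `d > 0`, but not the two-column row/column index matrix that `which(..., arr.ind = TRUE)` returns. Index matrices of that kind are a matrix feature, so doing the replacement on a plain matrix and converting afterwards is another option. The objects `m` and `d2` are introduced here only for illustration.

```r
# Same data as in the thread
d <- seq(-1, 1, length.out = 16)
dim(d) <- c(4, 4)
d <- as.data.frame(d)

# Logical-matrix replacement, as suggested in the replies
d[d > 0] <- -9

# Alternative: use the index matrix on a plain matrix, then convert
m <- matrix(seq(-1, 1, length.out = 16), 4, 4)
sel_pos <- which(m > 0, arr.ind = TRUE)   # two columns: row, col
m[sel_pos] <- -9
d2 <- as.data.frame(m)

print(d)
```

Both routes should leave the eight originally positive entries set to -9 and the negative entries untouched.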
[R] Two dimensional Array defined on intevals
Hi,

I am new to R and am trying to set up a two-dimensional array/matrix with the elements defined by a function similar to the one below. I have been trying to use outer with apply but can't seem to get the indexing quite right. Is there a simple way of accomplishing this task?

G(x,y) =
    1    x 0.5    y 0.5
    2    x 0.5    y 0.5
    3    x 0.5    y 0.5
    4    x 0.5    y 0.5

--
Thanks in advance
Leon Adams
Re: [R] Setting default path to library
Take a look at .libPaths()

On Mon, Nov 29, 2010 at 1:32 PM, steve_fried...@nps.gov wrote:

> Hello,
>
> I recently upgraded from 2.11.1 to 2.12.0 on a Windows machine. When I launch R via Tinn-R (2.3.7.0), most things appear correct. The exception is the path to the library. I store all of the packages in C:\Program_Files\R\R-2.12.0\library
>
> Last week, when I upgraded, I received an error:
>
> Error in loadNamespace(i[[1L]], c(lib.loc, .libPaths( : there is no package called cluster
>
> The cluster package was there, but I installed a new version from a local *.zip file to be sure, and things worked fine. This morning, when I launched R again, I received the same error message, also indicating that the package Hmisc could not be loaded. It too is in the library folder.
>
> I think the system is searching for some packages in C:\\Program_Files\R\R-2.12.0\bin\i386
>
> Why would it do that, and what are the appropriate commands to tell R where to look for installed packages? I've searched the archive and have not found a clear, understandable answer.
>
> As always, thanks for the assistance.
> Steve
>
> Steve Friedman Ph. D.
> Ecologist / Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
> Homestead, Florida 33034
> steve_fried...@nps.gov
> Office (305) 224 - 4282
> Fax (305) 224 - 4147

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
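A minimal sketch of what looking at `.libPaths()` can involve, under stated assumptions: the directory name `my-R-library` is hypothetical and lives in a temporary folder so the example runs anywhere; on the poster's machine it would be the folder that actually holds the installed packages.

```r
# Show the library trees R currently searches, in order
print(.libPaths())

# Hypothetical extra library directory (an assumption for illustration)
extra_lib <- file.path(tempdir(), "my-R-library")
dir.create(extra_lib, showWarnings = FALSE, recursive = TRUE)

# Prepend it for this session; directories that do not exist are dropped
.libPaths(c(extra_lib, .libPaths()))
print(.libPaths())
```

This only affects the running session. According to the `.libPaths` help page, the environment variables `R_LIBS` and `R_LIBS_USER` are the usual way to make such a setting persistent; whether Tinn-R starts R with a different environment than Rgui is something the poster would need to check.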
Re: [R] Where is gdata?
On Mon, Nov 29, 2010 at 11:00 AM, Stephen Liu sati...@yahoo.com wrote:

> Hi Gabor,
>
>> Please start at a fresh version of R. Copy and paste your session from the R console rather than relating what happened. Also, what version of gdata are you using? Older versions did not have installXLSXsupport. Show:
>>
>> packageDescription("gdata")$Version
>> win.version()
>> R.version.string
>
> packageDescription("gdata")$Version
> [1] "2.8.1"
> win.version()
> [1] "Windows 7 x64 (build 7600)"
> R.version.string
> [1] "R version 2.12.0 (2010-10-15)"
>
> Both 32 and 64 bits

That is the correct version of gdata. I am using Windows 32 so there could be some differences due to that.

At any rate, can you start a fresh R session and show the console output of the problems you have seen? Use

Rgui --vanilla

to start R, to be sure you don't have anything else that might interfere, and show everything, including the R startup message, in the R console output.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Two dimensional Array defined on intevals
Does this work for you?

g <- function(x,y) ifelse(x .5, 0, 2) + ifelse(y .5, 1, 2)

--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480

"Is the room still a room when its empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it."
- Jubal Early, Firefly

r-help-boun...@r-project.org wrote on 11/29/2010 11:37:41 AM:

> [R] Two dimensional Array defined on intevals
> Leon Adams to: r-help 11/29/2010 11:39 AM
> Sent by: r-help-boun...@r-project.org
>
> Hi,
>
> I am new to R and am trying to set up a two-dimensional array/matrix with the elements defined by a function similar to the one below. I have been trying to use outer with apply but can't seem to get the indexing quite right. Is there a simple way of accomplishing this task?
>
> G(x,y) =
>     1    x 0.5    y 0.5
>     2    x 0.5    y 0.5
>     3    x 0.5    y 0.5
>     4    x 0.5    y 0.5
>
> --
> Thanks in advance
> Leon Adams
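A sketch of how a function like `g` can be combined with `outer()` to build the whole matrix in one call. The comparison operators did not survive in the archived messages, so the use of `<` and the split at 0.5 below are assumptions, as are the grid values in `x` and `y`.

```r
# ASSUMPTION: each case is decided by "< 0.5"; the thread's operators were lost.
g <- function(x, y) ifelse(x < 0.5, 0, 2) + ifelse(y < 0.5, 1, 2)

# Illustrative grid points, one on each side of 0.5
x <- c(0.25, 0.75)
y <- c(0.25, 0.75)

# outer() calls g once with all (x, y) pairs, so g must be vectorised;
# ifelse() makes it so. G[i, j] is g(x[i], y[j]).
G <- outer(x, y, g)
print(G)
```

Because `ifelse()` works elementwise, no `apply()` call is needed; `outer()` only requires that the function return one value per input pair.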
Re: [R] Significance of the difference between two correlation coefficients
Thanks for providing the example, but it would be useful to know who I am communicating with or from which institute. Never mind ...

I don't know much about this subject, but a quick Google search gives me the following site: http://davidmlane.com/hyperstat/A50760.html

Using the info from that website, I can code up the following to give the two-tailed p-value of the difference in correlations:

diff.corr <- function( r1, n1, r2, n2 ){

  Z1 <- 0.5 * log( (1+r1)/(1-r1) )
  Z2 <- 0.5 * log( (1+r2)/(1-r2) )

  diff   <- Z1 - Z2
  SEdiff <- sqrt( 1/(n1 - 3) + 1/(n2 - 3) )
  diff.Z <- diff/SEdiff

  p <- 2*pnorm( abs(diff.Z), lower=F)
  cat( "Two-tailed p-value", p , "\n" )
}

diff.corr( r1=0.5, n1=100, r2=0.40, n2=80 )
## Two-tailed p-value 0.4103526

diff.corr( r1=0.1, n1=100, r2=-0.1, n2=80 )
## Two-tailed p-value 0.1885966

The p-value here is slightly different from the Vassar website because the website rounds its diff.Z values to 2 digits.

Regards, Adai

On 29/11/2010 15:30, syrvn wrote:

> Hi,
>
> based on the sample size I want to calculate whether two correlation coefficients are significantly different or not. I know that as a first step both coefficients have to be converted to z values using Fisher's z transformation. I have done this already, but I don't know how to proceed from there.
>
> Unlike for correlation coefficients, I know that the difference for z values is mathematically defined, but I do not know how to incorporate the sample size.
>
> I found a couple of websites that provide that service, but since I have huge data sets I need to automate this procedure. (http://faculty.vassar.edu/lowry/rdiff.html)
>
> Can anyone help?
>
> Cheers,
> syrvn
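Since the original poster wants to automate this over large data sets, here is a hedged variant of the function above that returns the p-value instead of printing it and accepts vectors. The name `diff_corr_p` is made up for this sketch; `atanh(r)` is used because it is mathematically the same Fisher transformation as `0.5 * log((1 + r)/(1 - r))`.

```r
# Two-tailed p-value for the difference between two independent correlations.
# Vectorised: r1, n1, r2, n2 may be vectors (recycled in the usual R way).
diff_corr_p <- function(r1, n1, r2, n2) {
  z_diff <- atanh(r1) - atanh(r2)            # difference of Fisher z values
  se     <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of that difference
  2 * pnorm(abs(z_diff / se), lower.tail = FALSE)
}

# The two examples from the message, evaluated in one call
p <- diff_corr_p(r1 = c(0.5, 0.1), n1 = 100, r2 = c(0.40, -0.1), n2 = 80)
print(p)
```

Returning a numeric vector makes it straightforward to store the results in a data frame column or to apply a multiple-testing correction such as `p.adjust()` afterwards.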
Re: [R] Array help
Hi Brian, I believe there was some miscommunication earlier due to R's array class for objects and the colloquial usage of array (the idea that 'array' is used colloquially is a bit odd, but I digress). In any case, here are some steps I take (certainly not the only ones) when exploring a new dataset that I am not familiar with: ## load the package library(PASWR) ## look at the str()ucture of the object of interest str(StatTemps) ## Hmm, it is a 'data.frame' with 3 variables ## one variable is 'num' and the other two are 'Factor' ## let's see if we can find out more about those data classes ## (pull up the documentation on each, it can be hard to know at first ## that 'num' stands for numeric and 'Factor' needs to be lowercase) ?data.frame ?numeric ?factor ## in this case, it is easy to print the whole data set so StatTemps # print to screen ## but you can also get a nice little summary summary(StatTemps) ## For the documentation on extraction/indexing ?Extract ## and some examples StatTemps$temperature StatTemps$gender StatTemps$class ## now using a different operator than '$' ## You can call by name by quoting StatTemps[ , temperature] ## or since we know it is column 1 StatTemps[ , 1] ## conversely, we can get row 1 StatTemps[1, ] ## or some combination of rows StatTemps[c(1:7, 22:34), ] ## or rows and columns StatTemps[c(1:7, 22:34), c(1, 3)] ## But since you have a factor, there may be an easier way subset(StatTemps, gender == Male) subset(StatTemps, gender == Female) subset(StatTemps, class == 8 a.m.) subset(StatTemps, class == 9 a.m.) ## on more than one variable subset(StatTemps, class == 8 a.m. 
gender == Male) ## with a continuous variable subset(StatTemps, temperature 94) ## and we can do calculations by() groups by(data = StatTemps$temperature, INDICES = StatTemps$gender, FUN = mean) ## but typing the name is annoying with(StatTemps, by(data = temperature, INDICES = gender, FUN = mean)) ## even more detailed (but leaving off the explicit argument names) with(StatTemps, by(temperature, list(gender, class), mean)) ## A couple visual summaries boxplot(temperature ~ gender, data = StatTemps) boxplot(temperature ~ class, data = StatTemps) ## or hop on over to lattice for something a little more advanced bwplot(temperature ~ gender | class, data = StatTemps) ## and you can select certain parts without subset() ## first let's see what happens with StatTemps$gender == Female ## now if you pass a logical vector to the extraction operator, '[' StatTemps[StatTemps$gender == Female, ] ## same thing but just the first column StatTemps[StatTemps$gender == Female, 1] ## That came out as a vector, but StatTemps[StatTemps$gender == Female, 1, drop = FALSE] HTH, Josh On Mon, Nov 29, 2010 at 5:01 AM, bfhancock brianfhanc...@gmail.com wrote: if you can load the PASWR package and pull up StatTemps you will see what I am talking about. Otherwise I fear that my question will just be confusing. -- View this message in context: http://r.789695.n4.nabble.com/Array-help-tp3062992p3063535.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Significance of the difference between two correlation coefficients
Hi,

Thanks a lot. That's what I was trying to figure out! It works great and is exactly what I need.

Best,
syrvn

--
View this message in context: http://r.789695.n4.nabble.com/Significance-of-the-difference-between-two-correlation-coefficients-tp3063765p3063997.html
Sent from the R help mailing list archive at Nabble.com.
[R] how to use by() ?
Hello, All!

How might one accomplish this using the by() function? m1 is a data frame.

# populate column m1$major_allele
for ( i in 1:length(m1$major_allele)) {
  if ( m1$Freq1[i] == m1$MAF[i]){
    m1$major_allele[i] = m1$Al1[i]
  }
  else{
    m1$major_allele[i] = m1$Al2[i]
  }
}

Jim
Re: [R] Issues with nnet.default for regression/classification
Hi Georg, The documentation (?nnet) says that y should be a matrix or data frame, but in your case it is a vector. This is most likely the problem, if you do not have other data issues going on. Convert y to a matrix (or data frame) using 'as.matrix' and see if this solves your problem. Library 'nnet' can do both classification and regression. I was able to replicate your problem, using an example from Modern Applied Statistics with S, Venables and Ripley, pages 246 and 247), by turning y into a vector and verifying that all the predicted values are the same when y is a vector. This is not the case when y is part of a data frame. You can see this by running the code below. I tried about 4 neural network packages in the past, including AMORE, but found 'nnet' to be the best for my needs. Hope this helps. Jude # Neural Network model in Modern Applied Statistics with S, Venables and Ripley, pages 246 and 247 library(nnet) attach(rock) dim(rock) area1 - area/1; peri1 - peri/1 rock1 - data.frame(perm, area = area1, peri = peri1, shape) dim(rock1) head(rock1,15) # skip = T rock.nn - nnet(log(perm) ~ area + peri + shape, rock1, size=3, decay=1e-3, linout=T, skip=T, maxit=1000, Hess=T) rock1$actual - log(perm) rock1$predicted - predict(rock.nn) head(rock1,15) summary(rock.nn) sum((log(perm) - predict(rock.nn))^2) y - as.vector(log(rock1$perm)) head(rock1[,c(2:4)]) test.nn - nnet(x=rock1[,c(2:4)], y=y, size=3, linout=T, maxit=1000) head(predict(test.nn)) Georg wrote: Hi, I'm currently trying desperately to get the nnet function for training a neural network (with one hidden layer) to perform a regression task. So I run it like the following: trainednet - nnet(x=traindata, y=trainresponse, size = 30, linout = TRUE, maxit=1000) (where x is a matrix and y a numerical vector consisting of the target values for one variable) To see whether the network learnt anything at all, I checked the network weights and those have definitely changed. 
However, when examining the trainednet$fitted.values, those are all the same so it rather looks as if the network is doing a classification. I can even set linout=FALSE and then it outputs 1 (the class?) for each training example. The trainednet$residuals are correct (difference between predicted/fitted example and actual response), but rather useless. The same happens if I run nnet with the formula/data.frame interface, btw. As per the suggestion in the ?nnet page: If the response is not a factor, it is passed on unchanged to 'nnet.default', I assume that the network is doing regression since my trainresponse variable is a numerical vector and _not_ a factor. I'm currently lost and I can't see that the AMORE/neuralnet packages are any better (moreover, they don't implement the formula/dataframe/predict things). I've read the manpages of nnet and predict.nnet a gazillion times, but I can't really find an answer there. I don't want to do classification, but regression. Thanks for any help. Georg. -- Research Assistant Otto-von-Guericke-Universit?t Magdeburg resea...@georgruss.de http://research.georgruss.de Jude Ryan MarketShare Partners 1270 Avenue of the Americas, Suite # 2702 New York, NY 10020 http://www.marketsharepartners.com Work: (646)-745-9916 ext: 222 Cell: (973)-943-2029 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data.frame and formula classes of aggregate
On 2010-11-29 06:35, David Freedman wrote:

> Hi - I apologize for the 2nd post, but I think my question from a few weeks ago may have been overlooked on a Friday afternoon. I might be missing something very obvious, but is it widely known that the aggregate function handles missing values differently depending on whether a data frame or a formula is the first argument? For example,
>
> (d <- data.frame(sex=rep(0:1,each=3), wt=c(100,110,120,200,210,NA), ht=c(10,20,NA,30,40,50)))
> x1 <- aggregate(d, by = list(d$sex), FUN = mean)
> names(x1)[3:4] <- c('mean.dfcl.wt','mean.dfcl.ht')
> x2 <- aggregate(cbind(wt,ht)~sex, FUN=mean, data=d)
> names(x2)[2:3] <- c('mean.formcl.wt','mean.formcl.ht')
> cbind(x1,x2)[,c(2,3,6,4,7)]
>
> The output from the data.frame class has an NA if there are missing values in the group for the variable with missing values. But the formula class output seems to delete the entire row (missing and non-missing values) if there are any NAs. Wouldn't one expect that the 2 forms (data frame vs formula) of aggregate would give the same result?

Wasn't there some discussion of this not long ago? Maybe I'm getting senile. Anyway, as David W. points out, the defaults differ. Here's how you can get the same result from both methods:

1. use na.action = na.pass in aggregate.formula; this will duplicate your x1 result.

2. use d <- d[complete.cases(d), ] in your x1 calculation; this will duplicate your x2 result.

Peter Ehlers

> thanks very much
> david freedman, atlanta
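The two remedies Peter lists can be checked side by side with the data from the question. This is only a sketch: the object names `x1` to `x4` extend the poster's naming, and the grouping column is named `sex` in every call so that the results are directly comparable.

```r
d <- data.frame(sex = rep(0:1, each = 3),
                wt  = c(100, 110, 120, 200, 210, NA),
                ht  = c(10, 20, NA, 30, 40, 50))

# Data-frame method: NAs are passed to mean(), so affected cells come back NA
x1 <- aggregate(d[c("wt", "ht")], by = list(sex = d$sex), FUN = mean)

# Formula method: the default na.action drops every row containing an NA first
x2 <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean)

# Remedy 1: na.pass makes the formula method behave like x1
x3 <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean, na.action = na.pass)

# Remedy 2: complete cases only makes the data-frame method behave like x2
dc <- d[complete.cases(d), ]
x4 <- aggregate(dc[c("wt", "ht")], by = list(sex = dc$sex), FUN = mean)

print(x1); print(x2)
```

Here `x2` reports a mean weight of 105 for the first group because the row with `wt = 120` is discarded along with its missing `ht`, whereas `x1` keeps that row and reports 110.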
Re: [R] how to use by() ?
Jim Moon moonja at ohsu.edu writes:

How might one accomplish this using the by() function? m1 is a data frame.

    # populate column m1$major_allele
    for ( i in 1:length(m1$major_allele)) {
      if ( m1$Freq1[i] == m1$MAF[i]){
        m1$major_allele[i] = m1$Al1[i]
      } else{
        m1$major_allele[i] = m1$Al2[i]
      }
    }

You could use:

    m1$major_allele <- ifelse( m1$Freq1 == m1$MAF, m1$Al1, m1$Al2 )

Greg
Re: [R] Filling in missing time samples with na.approx
On 11/29/2010 10:00 AM, Gabor Grothendieck wrote:

On Mon, Nov 29, 2010 at 9:45 AM, Jason Edgecombe ja...@rampaginggeek.com wrote:

Hi Everyone,

I have some data from a sports GPS device like the following:

            time latitude longitude altitude  distance heartrate
    1 1277648884 0.304048 -0.793819      260  0.000000        94
    2 1277648885 0.304056 -0.793772      262  4.307615        95
    3 1277648888 0.304060 -0.793696      263 11.262347        97
    4 1277648894 0.304075 -0.793544      263 25.237911       103
    5 1277648898 0.304085 -0.793455      263 33.322525       108
    6 1277648902 0.304064 -0.793387      256 40.042988       115

As you can see, the samples have irregular holes in the time column. How can I fill in the missing samples using na.approx? I've tried creating a blank series with no gaps and combining the two, but merge just adds columns and rbind complains about duplicate indexes.

P.S. My GPS still has holes in the data when I turn off smart recording :(

Try this:

    Lines <- "time latitude longitude altitude distance heartrate
    1277648884 0.304048 -0.793819 260 0.000000 94
    1277648885 0.304056 -0.793772 262 4.307615 95
    1277648888 0.304060 -0.793696 263 11.262347 97
    1277648894 0.304075 -0.793544 263 25.237911 103
    1277648898 0.304085 -0.793455 263 33.322525 108
    1277648902 0.304064 -0.793387 256 40.042988 115"
    # read in data
    library(zoo)
    z <- read.zoo(textConnection(Lines), header = TRUE)
    na.approx(z, xout = seq(min(time(z)), max(time(z))))

No change:

    > na.approx(z, xout = seq(min(time(z)), max(time(z))))
               latitude longitude altitude  distance heartrate
    1277648884 0.304048 -0.793819      260  0.000000        94
    1277648885 0.304056 -0.793772      262  4.307615        95
    1277648888 0.304060 -0.793696      263 11.262347        97
    1277648894 0.304075 -0.793544      263 25.237911       103
    1277648898 0.304085 -0.793455      263 33.322525       108
    1277648902 0.304064 -0.793387      256 40.042988       115

There should be 19 samples after the na.approx. I'm guessing that na.approx is what I need, but I'm open to suggestions.

Thanks,
Jason
[R] FW: how to use by() ?
Thank you for the suggestion, Bill. The result is not quite what I would like. Here's sample code for you or anyone else who may be interested:

    Al1 = c('A','C','C','C')
    Al2 = c('G','G','G','T')
    Freq1 = c(0.0078,0.0567,0.9434,0.9908)
    MAF = c(0.0078,0.0567,0.0566,0.0092)
    m1 = data.frame(Al1=Al1, Al2=Al2, Freq1=Freq1, MAF=MAF, major_allele='')
    m1
      Al1 Al2  Freq1    MAF major_allele
    1   A   G 0.0078 0.0078
    2   C   G 0.0567 0.0567
    3   C   G 0.9434 0.0566
    4   C   T 0.9908 0.0092

Using the suggestion involving with() (I swapped Al1 and Al2 from before, but this does not affect the nature of the output):

    m1$major_allele <- with(m1, ifelse(Freq1==MAF, Al2, Al1)); m1
      Al1 Al2  Freq1    MAF major_allele
    1   A   G 0.0078 0.0078            1
    2   C   G 0.0567 0.0567            1
    3   C   G 0.9434 0.0566            2
    4   C   T 0.9908 0.0092            2

The output I desire is:

      Al1 Al2  Freq1    MAF major_allele
    1   A   G 0.0078 0.0078            G
    2   C   G 0.0567 0.0567            G
    3   C   G 0.9434 0.0566            C
    4   C   T 0.9908 0.0092            C

Jim

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Monday, November 29, 2010 10:02 AM
To: Jim Moon
Subject: RE: [R] how to use by() ?

    m1$major_allele <- with(m1, ifelse(Freq1==MAF, Al1, Al2))

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Re: [R] Issues with nnet.default for regression/classification
On 29/11/10 11:57:31, Jude Ryan wrote:

Hi Georg, The documentation (?nnet) says that y should be a matrix or data frame, but in your case it is a vector. This is most likely the problem, if you do not have other data issues going on. Convert y to a matrix (or data frame) using 'as.matrix' and see if this solves your problem. Library 'nnet' can do both classification and regression. I was able to replicate your problem, using an example from Modern Applied Statistics with S (Venables and Ripley, pages 246 and 247), by turning y into a vector and verifying that all the predicted values are the same when y is a vector. This is not the case when y is part of a data frame. You can see this by running the code below. I tried about 4 neural network packages in the past, including AMORE, but found 'nnet' to be the best for my needs.

Hi Jude,

thanks for the hint. I recently experimented with both the nnet(x, y, ...) and the nnet(formula, dataframe, ...) interfaces to nnet, and both yielded the same results. So changing the format of y from a vector to a matrix or a data frame didn't change anything at all.

However, what _did_ change the outcome was introducing the decay parameter (which I didn't set at all before). By default it is set to 0, which doesn't seem appropriate in my case. Setting it to decay=1e-3 magically turned my output into an acceptable regression response instead of spitting out fixed values. I really love the predict interface for regression in each of the models I'm using. Clear code :-)

So, for the record, the call to nnet for the regression problem is as follows:

    net.fitted <- nnet(formula, data = sp...@data[-testset,], decay=1e-3, size = 20, linout = TRUE)

(where sp...@data is the data part of a SpatialPointsDataFrame. And yes, in selecting the [-testset,] data points I'm taking into account the existing spatial autocorrelation.)

# Neural Network model in Modern Applied Statistics with S, Venables and Ripley, pages 246 and 247

Thanks for your help and the reference, I'm likely to order the book now :-) Leaving out the decay parameter changes the fitted.values in the rock example you mentioned as well, although not that much. Convergence speed does change as expected, so the parameter is working.

I guess my problem is solved now; the rest is due to the peculiarities of my data sets.

Georg.
--
Research Assistant
Otto-von-Guericke-Universität Magdeburg
resea...@georgruss.de
http://research.georgruss.de
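For readers who want to try the effect of the decay setting themselves, here is a small sketch on the built-in rock data. It only loosely follows the rock example referenced above: the rescaling of area and peri and the values of size, decay and maxit are illustrative assumptions, not the settings used on the spatial data in this thread.

```r
library(nnet)  # recommended package that ships with standard R installations

# Rescale the predictors so that all inputs are of similar magnitude
# (an assumption made for this sketch).
rock1 <- with(rock, data.frame(perm,
                               area = area / 10000,
                               peri = peri / 10000,
                               shape))

set.seed(1)  # nnet starts from random weights; the seed value is arbitrary
fit <- nnet(log(perm) ~ area + peri + shape, data = rock1,
            size = 3, decay = 1e-3, linout = TRUE,
            maxit = 1000, trace = FALSE)

# For a regression fit, the fitted values should vary across observations.
summary(as.vector(fit$fitted.values))
```

Rerunning the call with decay = 0 and comparing the two summaries is a quick way to see whether weight decay matters for a given data set.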
Re: [R] how to use by() ?
... or slightly less verbose:

    m1 <- within(m1, major_allele <- ifelse( Freq1 == MAF, Al1, Al2 ))

See ?within.

Cheers,
Bert

On Mon, Nov 29, 2010 at 10:25 AM, Greg Johnson g...@nosnhoj.org wrote:

You could use:

    m1$major_allele <- ifelse( m1$Freq1 == m1$MAF, m1$Al1, m1$Al2 )

Greg

--
Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] FW: how to use by() ?
ifelse(cond, ifTrue, ifFalse) doesn't do what you want when ifTrue or ifFalse is a factor. You can use as.character on the factors

    > with(m1, ifelse(Freq1==MAF, as.character(Al2), as.character(Al1)))
    [1] "G" "G" "C" "C"

or use the stringsAsFactors=FALSE argument to data.frame (or read.table) when you make the data.frame.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Moon
Sent: Monday, November 29, 2010 10:37 AM
To: r-help@r-project.org
Subject: [R] FW: how to use by() ?

Thank you for the suggestion, Bill. The result is not quite what I would like. [...]
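The factor behaviour described in this reply can be reproduced with the sample data from the thread. The sketch below sets stringsAsFactors = TRUE explicitly, because newer R versions no longer convert strings to factors by default; the object names `codes` and `fixed` are made up for this example.

```r
m1 <- data.frame(Al1   = c('A', 'C', 'C', 'C'),
                 Al2   = c('G', 'G', 'G', 'T'),
                 Freq1 = c(0.0078, 0.0567, 0.9434, 0.9908),
                 MAF   = c(0.0078, 0.0567, 0.0566, 0.0092),
                 stringsAsFactors = TRUE)  # explicit, to reproduce the problem

# With factor columns, ifelse() hands back the internal integer codes
# of the factor levels, not the labels.
codes <- with(m1, ifelse(Freq1 == MAF, Al2, Al1))

# Converting to character first gives the allele letters.
fixed <- with(m1, ifelse(Freq1 == MAF, as.character(Al2), as.character(Al1)))

codes
fixed
```

Building the data frame with stringsAsFactors = FALSE, as suggested above, avoids the conversion step altogether.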
Re: [R] how to use by() ?
Jim Moon moonja at ohsu.edu writes:

How might one accomplish this using the by() function? m1 is a data frame.

You could use:

    m1$major_allele <- ifelse( m1$Freq1 == m1$MAF, m1$Al1, m1$Al2 )

Greg

---

Thank you for the suggestion, Greg. The result is not quite what I would like. Here's sample code for you or anyone else who may be interested:

    Al1 = c('A','C','C','C')
    Al2 = c('G','G','G','T')
    Freq1 = c(0.0078,0.0567,0.9434,0.9908)
    MAF = c(0.0078,0.0567,0.0566,0.0092)
    m1 = data.frame(Al1=Al1, Al2=Al2, Freq1=Freq1, MAF=MAF, major_allele='')
    m1
      Al1 Al2  Freq1    MAF major_allele
    1   A   G 0.0078 0.0078
    2   C   G 0.0567 0.0567
    3   C   G 0.9434 0.0566
    4   C   T 0.9908 0.0092

Using the suggestion involving ifelse (I swapped Al1 and Al2 from before, but this does not affect the nature of the output):

    m1$major_allele <- ifelse( m1$Freq1 == m1$MAF, m1$Al2, m1$Al1 ); m1
      Al1 Al2  Freq1    MAF major_allele
    1   A   G 0.0078 0.0078            1
    2   C   G 0.0567 0.0567            1
    3   C   G 0.9434 0.0566            2
    4   C   T 0.9908 0.0092            2

The output I desire is:

      Al1 Al2  Freq1    MAF major_allele
    1   A   G 0.0078 0.0078            G
    2   C   G 0.0567 0.0567            G
    3   C   G 0.9434 0.0566            C
    4   C   T 0.9908 0.0092            C

Jim
Re: [R] Filling in missing time samples with na.approx
On Mon, Nov 29, 2010 at 1:33 PM, Jason Edgecombe ja...@rampaginggeek.com wrote:

No change:

    > na.approx(z, xout = seq(min(time(z)), max(time(z))))

It works for me.

    > Lines <- "time latitude longitude altitude distance heartrate
    + 1277648884 0.304048 -0.793819 260 0.000000 94
    + 1277648885 0.304056 -0.793772 262 4.307615 95
    + 1277648888 0.304060 -0.793696 263 11.262347 97
    + 1277648894 0.304075 -0.793544 263 25.237911 103
    + 1277648898 0.304085 -0.793455 263 33.322525 108
    + 1277648902 0.304064 -0.793387 256 40.042988 115"
    > # read in data
    > library(zoo)
    > z <- read.zoo(textConnection(Lines), header = TRUE)
    > na.approx(z, xout = seq(min(time(z)), max(time(z))))
                latitude  longitude altitude  distance heartrate
    1277648884 0.3040480 -0.7938190 260.0000  0.000000  94.00000
    1277648885 0.3040560 -0.7937720 262.0000  4.307615  95.00000
    1277648886 0.3040573 -0.7937467 262.3333  6.625859  95.66667
    1277648887 0.3040587 -0.7937213 262.6667  8.944103  96.33333
    1277648888 0.3040600 -0.7936960 263.0000 11.262347  97.00000
    1277648889 0.3040625 -0.7936707 263.0000 13.591608  98.00000
    1277648890 0.3040650 -0.7936453 263.0000 15.920868  99.00000
    1277648891 0.3040675 -0.7936200 263.0000 18.250129 100.00000
    1277648892 0.3040700 -0.7935947 263.0000 20.579390 101.00000
    1277648893 0.3040725 -0.7935693 263.0000 22.908650 102.00000
    1277648894 0.3040750 -0.7935440 263.0000 25.237911 103.00000
    1277648895 0.3040775 -0.7935218 263.0000 27.259065 104.25000
    1277648896 0.3040800 -0.7934995 263.0000 29.280218 105.50000
    1277648897 0.3040825 -0.7934773 263.0000 31.301371 106.75000
    1277648898 0.3040850 -0.7934550 263.0000 33.322525 108.00000
    1277648899 0.3040797 -0.7934380 261.2500 35.002641 109.75000
    1277648900 0.3040745 -0.7934210 259.5000 36.682756 111.50000
    1277648901 0.3040693 -0.7934040 257.7500 38.362872 113.25000
    1277648902 0.3040640 -0.7933870 256.0000 40.042988 115.00000

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
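As a cross-check that does not depend on the installed zoo version, the same linear interpolation onto a one-second grid can be sketched with base R's approx(). This is an alternative technique, not what na.approx does internally; the names `gps`, `grid` and `filled` are made up for this example.

```r
gps <- read.table(header = TRUE, text = "
time latitude longitude altitude distance heartrate
1277648884 0.304048 -0.793819 260 0.000000 94
1277648885 0.304056 -0.793772 262 4.307615 95
1277648888 0.304060 -0.793696 263 11.262347 97
1277648894 0.304075 -0.793544 263 25.237911 103
1277648898 0.304085 -0.793455 263 33.322525 108
1277648902 0.304064 -0.793387 256 40.042988 115
")

# Regular one-second grid from the first to the last recorded sample
grid <- seq(min(gps$time), max(gps$time))

# Linearly interpolate every measurement column onto the grid
filled <- data.frame(
  time = grid,
  lapply(gps[-1], function(col) approx(gps$time, col, xout = grid)$y)
)

nrow(filled)  # one row per second between the first and last time stamp
```

If this gives 19 rows while the zoo call still returns six, an outdated zoo installation is a plausible suspect, though that is only a guess.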
Re: [R] Issues with nnet.default for regression/classification
Good to know that you solved your problem. I did not realize that the default decay parameter = 0 was the cause of the problem. Since I have the MASS book, I was always setting this parameter, in my own work, as indicated in the book, and had no reason to change it. This is probably the first time I have left this parameter out! I am not sure that the effect of leaving out the decay parameter is documented anywhere. I will have to dig out the book and check, but the book is rather terse and to the point and it would not surprise me if there is no mention of when to override the default of decay = 0.

Jude Ryan
MarketShare Partners
1270 Avenue of the Americas, Suite # 2702
New York, NY 10020
http://www.marketsharepartners.com
Work: (646)-745-9916 ext: 222
Cell: (973)-943-2029

-----Original Message-----
From: Georg Ruß [mailto:resea...@georgruss.de]
Sent: Monday, November 29, 2010 10:37 AM
To: Jude Ryan; R-help@r-project.org
Subject: Re: [R] Issues with nnet.default for regression/classification

[...]
Re: [R] FW: how to use by() ?
On Nov 29, 2010, at 1:36 PM, Jim Moon wrote:

Using the suggestion involving with() (I swapped Al1 and Al2 from before, but this does not affect the nature of the output):

    m1$major_allele <- with(m1, ifelse(Freq1==MAF, Al2, Al1)); m1

I suspect that you have just been bitten by the data.frame stringsAsFactors=TRUE crocodile. Since you are comparing floating point numbers, you are also wading in rivers where floating-point crocodiles are hungry and searching out their next victim.

--
David.

David Winsemius, MD
West Hartford, CT
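To make the floating-point warning concrete: exact comparison with == is reliable only when both numbers come from identical literals or identical computations. A hedged sketch of a tolerance-based comparison follows; `near_equal` is a hypothetical helper and the tolerance of 1e-8 is an arbitrary assumption that should be adapted to the precision of the data.

```r
# Hypothetical helper: vectorised "equal up to a small tolerance"
near_equal <- function(x, y, tol = 1e-8) abs(x - y) < tol

# Classic illustration: the sum is not exactly 0.3 in binary floating point
exact    <- (0.1 + 0.2) == 0.3        # FALSE
tolerant <- near_equal(0.1 + 0.2, 0.3) # TRUE

# Applied to the frequencies from this thread
Freq1 <- c(0.0078, 0.0567, 0.9434, 0.9908)
MAF   <- c(0.0078, 0.0567, 0.0566, 0.0092)
is_minor <- near_equal(Freq1, MAF)
is_minor
```

For single values, isTRUE(all.equal(x, y)) serves the same purpose; the helper above is used here only because ifelse() needs a vectorised test.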
Re: [R] FW: how to use by() ?
Thank you, Bill. That fixed it.

Jim
Re: [R] FW: how to use by() ?
Well-phrased, David. :-)

Jim
Re: [R] data.frame and formula classes of aggregate
Thanks for the information. There was a discussion of different results obtained with the formula and data.frame methods for a paired t-test -- there are many threads, but one is at http://r.789695.n4.nabble.com/Paired-t-tests-td2325956.html#a2326291

david freedman
[R] drop levels problem
Hi all:

I am having trouble dropping levels; I found a few hints online, without success. Please consider the dataset below. I was under the impression that subset(..., drop=TRUE) would work, but it doesn't.

    library(ggplot2)
    library(Hmisc)
    x <- structure(list(first = c(38.2086, 43.1768, 43.146, 41.8044, 42.4232, 46.3646, 38.0813, 40.0745, 40.4889, 38.6246, 40.2826, 41.6056, 34.5353, 40.0768),
        second = c(43.3295, 42.4326, 38.8994, 37.0894, 42.3218, 46.1726, 39.1206, 41.2072, 42.4874, 40.2657, 38.7766, 40.8822, 42.0165, 49.2055),
        third = c(42.24, 42.992, 37.7419, 42.3448, 41.9131, 44.385, 42.7811, 44.1963, 40.8088, 43.9634, 38.7079, 38.0791, 44.3136, 39.5333)),
        .Names = c("first", "second", "third"), class = "data.frame", row.names = c(NA, -14L))
    head(x); str(x)
    xmelt <- melt(x)
    names(xmelt) <- c("year", "fatPerc")
    # Year variable is a factor with three levels
    # Subset to plot only 'first' year
    firstyear <- subset(xmelt, year == 'first'); str(firstyear)
    # Plot showing three levels still after I made the subset
    ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()
    # Try to drop the levels but dropUnusedLevels() doesn't seem to work here
    dropUnusedLevels()
    ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()
    # code below also should drop levels but it doesn't
    #data.frame(lapply(firstyear, function(x) if (is.factor(x)){ factor(x)} else{x}))
    str(firstyear)

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
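One possible way out, sketched here in base R only so that it runs without ggplot2 or Hmisc: the drop argument of subset() controls whether dimensions are dropped, not factor levels, whereas droplevels() (available from R 2.12.0 on) removes unused levels. The small data frame and the use of stack() in place of melt() are assumptions made to keep the example self-contained.

```r
# Small stand-in for the fat percentage data (first two rows only)
x <- data.frame(first  = c(38.2086, 43.1768),
                second = c(43.3295, 42.4326),
                third  = c(42.2400, 42.9920))

# Long format; stack() returns the columns 'values' and 'ind' (a factor)
long <- stack(x)
names(long) <- c("fatPerc", "year")

firstyear <- subset(long, year == "first")
n_before  <- nlevels(firstyear$year)   # unused levels are still attached

firstyear <- droplevels(firstyear)     # the result has to be assigned back
n_after   <- nlevels(firstyear$year)

c(before = n_before, after = n_after)
```

The commented-out lapply() line in the post would probably have worked too if its result had been assigned back to firstyear; calling it without assignment leaves the original data frame unchanged.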
Re: [R] Filling in missing time samples with na.approx
Strange, I don't see any change either; could it be that we have an older version of zoo?

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA

----- Original Message ----
From: Gabor Grothendieck ggrothendi...@gmail.com
To: Jason Edgecombe ja...@rampaginggeek.com
Cc: r-h...@stat.math.ethz.ch
Sent: Mon, November 29, 2010 10:51:07 AM
Subject: Re: [R] Filling in missing time samples with na.approx

On Mon, Nov 29, 2010 at 1:33 PM, Jason Edgecombe ja...@rampaginggeek.com wrote:
> On 11/29/2010 10:00 AM, Gabor Grothendieck wrote:
>> On Mon, Nov 29, 2010 at 9:45 AM, Jason Edgecombe ja...@rampaginggeek.com wrote:
>>> Hi Everyone,
>>>
>>> I have some data from a sports GPS device like the following:
>>>
>>>          time  latitude  longitude  altitude   distance  heartrate
>>> 1  1277648884  0.304048  -0.793819       260       0.00         94
>>> 2  1277648885  0.304056  -0.793772       262   4.307615         95
>>> 3      127764  0.304060  -0.793696       263  11.262347         97
>>> 4  1277648894  0.304075  -0.793544       263  25.237911        103
>>> 5  1277648898  0.304085  -0.793455       263  33.322525        108
>>> 6  1277648902  0.304064  -0.793387       256  40.042988        115
>>>
>>> As you can see, the samples have irregular holes in the time column. How can I fill in the missing samples using na.approx? I've tried creating a blank series with no gaps and combining the two, but merge just adds columns and rbind complains about duplicate indexes.
>>>
>>> P.S. My GPS still has holes in the data when I turn off smart recording :(
>>
>> Try this:
>>
>> Lines <- "time latitude longitude altitude distance heartrate
>> 1277648884 0.304048 -0.793819 260 0.00 94
>> 1277648885 0.304056 -0.793772 262 4.307615 95
>> 127764 0.304060 -0.793696 263 11.262347 97
>> 1277648894 0.304075 -0.793544 263 25.237911 103
>> 1277648898 0.304085 -0.793455 263 33.322525 108
>> 1277648902 0.304064 -0.793387 256 40.042988 115"
>>
>> # read in data
>> library(zoo)
>> z <- read.zoo(textConnection(Lines), header = TRUE)
>> na.approx(z, xout = seq(min(time(z)), max(time(z))))
>
> No change:
>
> > na.approx(z, xout = seq(min(time(z)), max(time(z))))
>             latitude  longitude  altitude   distance  heartrate
> 1277648884  0.304048  -0.793819       260       0.00         94
> 1277648885  0.304056  -0.793772       262   4.307615         95
> 127764      0.304060  -0.793696       263  11.262347         97
> 1277648894  0.304075  -0.793544       263  25.237911        103
> 1277648898  0.304085  -0.793455       263  33.322525        108
> 1277648902  0.304064  -0.793387       256  40.042988        115

It works for me.

> Lines <- "time latitude longitude altitude distance heartrate
+ 1277648884 0.304048 -0.793819 260 0.00 94
+ 1277648885 0.304056 -0.793772 262 4.307615 95
+ 127764 0.304060 -0.793696 263 11.262347 97
+ 1277648894 0.304075 -0.793544 263 25.237911 103
+ 1277648898 0.304085 -0.793455 263 33.322525 108
+ 1277648902 0.304064 -0.793387 256 40.042988 115"
> # read in data
> library(zoo)
> z <- read.zoo(textConnection(Lines), header = TRUE)
> na.approx(z, xout = seq(min(time(z)), max(time(z))))
             latitude   longitude  altitude   distance  heartrate
1277648884  0.3040480  -0.7938190  260.       0.00       94.0
1277648885  0.3040560  -0.7937720  262.       4.307615   95.0
1277648886  0.3040573  -0.7937467  262.       6.625859   95.7
1277648887  0.3040587  -0.7937213  262.6667   8.944103   96.3
127764      0.3040600  -0.7936960  263.      11.262347   97.0
1277648889  0.3040625  -0.7936707  263.      13.591608   98.0
1277648890  0.3040650  -0.7936453  263.      15.920868   99.0
1277648891  0.3040675  -0.7936200  263.      18.250129  100.0
1277648892  0.3040700  -0.7935947  263.      20.579390  101.0
1277648893  0.3040725  -0.7935693  263.      22.908650  102.0
1277648894  0.3040750  -0.7935440  263.      25.237911  103.0
1277648895  0.3040775  -0.7935218  263.      27.259065  104.25000
1277648896  0.3040800  -0.7934995  263.      29.280218  105.5
1277648897  0.3040825  -0.7934773  263.      31.301371  106.75000
1277648898  0.3040850  -0.7934550  263.      33.322525  108.0
1277648899  0.3040797  -0.7934380  261.2500  35.002641  109.75000
1277648900  0.3040745  -0.7934210  259.5000  36.682756  111.5
1277648901  0.3040693  -0.7934040  257.7500  38.362872  113.25000
1277648902  0.3040640  -0.7933870  256.      40.042988  115.0

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Filling in missing time samples with na.approx
On Mon, Nov 29, 2010 at 2:08 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote:
> Strange, I don't see any change either; could it be that we have an older version of zoo?

I am using the most recent one on CRAN, which is zoo 1.6-4:

> packageDescription("zoo")$Version
[1] "1.6-4"

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
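
For readers whose installed zoo does not appear to honour the xout argument (one possible reading of the "no change" reports in this thread, not confirmed in it), the same linear interpolation can be sketched with base R's approx(). The data frame name gps and the choice of two columns are illustrative only, and the third timestamp, which appears truncated in the archived messages, is assumed here to be 1277648888 because the interpolated output runs in one-second steps.

```r
# Irregularly sampled GPS readings (subset of the columns from the thread).
# ASSUMPTION: the third timestamp is 1277648888; it is truncated in the archive.
gps <- data.frame(
  time      = c(1277648884, 1277648885, 1277648888,
                1277648894, 1277648898, 1277648902),
  altitude  = c(260, 262, 263, 263, 263, 256),
  heartrate = c(94, 95, 97, 103, 108, 115)
)

# Regular one-second grid spanning the observed range
grid <- seq(min(gps$time), max(gps$time), by = 1)

# Linearly interpolate every measurement column onto the grid
filled <- data.frame(
  time = grid,
  lapply(gps[-1], function(col) approx(gps$time, col, xout = grid)$y)
)

head(filled)
```

The result has one row per second; rows that were present in the input keep their original values, and the gaps are filled by straight-line interpolation between neighbouring samples.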
Re: [R] drop levels problem
Hi Felipe,

On Mon, Nov 29, 2010 at 11:01 AM, Felipe Carrillo mazatlanmex...@yahoo.com wrote:
> Hi all:
> I am having trouble dropping levels; I found a few hints online, without success. Please consider the dataset below. I was under the impression that subset(..., drop = TRUE) would work, but it doesn't.

Here drop refers to dropping dimensions, as in:

data.frame(1:10)[, 1]
data.frame(1:10)[, 1, drop = FALSE]

not to the levels of a factor.

> library(ggplot2)
> library(Hmisc)
> x <- structure(list(
>   first = c(38.2086, 43.1768, 43.146, 41.8044, 42.4232, 46.3646, 38.0813, 40.0745, 40.4889, 38.6246, 40.2826, 41.6056, 34.5353, 40.0768),
>   second = c(43.3295, 42.4326, 38.8994, 37.0894, 42.3218, 46.1726, 39.1206, 41.2072, 42.4874, 40.2657, 38.7766, 40.8822, 42.0165, 49.2055),
>   third = c(42.24, 42.992, 37.7419, 42.3448, 41.9131, 44.385, 42.7811, 44.1963, 40.8088, 43.9634, 38.7079, 38.0791, 44.3136, 39.5333)),
>   .Names = c("first", "second", "third"), class = "data.frame", row.names = c(NA, -14L))

Thanks for the nice example!

> head(x); str(x)
> xmelt <- melt(x)
> names(xmelt) <- c("year", "fatPerc")
> # Year variable is a factor with three levels
> # Subset to plot only 'first' year
> firstyear <- subset(xmelt, year == 'first'); str(firstyear)
> # Plot showing three levels still after I made the subset
> ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()

Right, because a factor can have levels with no observations, and sometimes these are the most interesting ones (e.g., if you subset by smoking and found that there were no instances of lung cancer in non-smokers; not that extreme, but you get the point).

> # Try to drop the levels but dropUnusedLevels() doesn't seem to work here
> dropUnusedLevels()

Sorry, I have had some difficulty installing Hmisc on my Linux system and never got around to working it out.

> ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()
> # code below also should drop levels but it doesn't
> #data.frame(lapply(firstyear, function(x) if (is.factor(x)){ factor(x)} else{x}))

It would if you assigned the result back to firstyear. You run it, but the result is only printed to the screen, and the changed data goes off to oblivion.

firstyear <- data.frame(lapply(firstyear, function(x) if (is.factor(x)) {factor(x)} else {x}))
str(firstyear) # should now have just one level

Cheers,

Josh

> str(firstyear)

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
Re: [R] drop levels problem
Take a look at the droplevels function (R >= 2.12).

On Mon, Nov 29, 2010 at 5:01 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote:

[snip]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
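
A minimal sketch of the droplevels() suggestion, assuming R 2.12.0 or later. The small xmelt data frame below is a made-up stand-in for the melted data in the original post, not the poster's actual values.

```r
# Toy stand-in for the melted data frame from the thread (illustrative values)
xmelt <- data.frame(
  year    = factor(rep(c("first", "second", "third"), each = 3)),
  fatPerc = c(38.2, 43.2, 43.1, 43.3, 42.4, 38.9, 42.2, 43.0, 37.7)
)

# subset() keeps only the matching rows, but the factor remembers all levels
firstyear <- subset(xmelt, year == "first")
nlevels(firstyear$year)    # 3

# droplevels() removes unused levels from every factor in the data frame;
# the result has to be assigned back to take effect
firstyear <- droplevels(firstyear)
nlevels(firstyear$year)    # 1
```

A plot of firstyear made after this step should then show only the remaining level on the factor axis.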
Re: [R] drop levels problem
Thanks Joshua, I get it now, levels sometimes drive me loco

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA

----- Original Message ----
From: Joshua Wiley jwiley.ps...@gmail.com
To: Felipe Carrillo mazatlanmex...@yahoo.com
Cc: r-h...@stat.math.ethz.ch
Sent: Mon, November 29, 2010 11:18:45 AM
Subject: Re: [R] drop levels problem

[snip]
Re: [R] Subset by using multiple values
Hi,

I would like to extend this item to the following. I have the following table:

      X1   X2     X3   value
1   BVEq  AGR  11412  954.75
2 CA_Tot  AGR  11412  970.59
...

> str(DC2_m)
'data.frame': 104160 obs. of 4 variables:
 $ X1   : Factor w/ 62 levels "BVEq","CA_Tot",..: 1 2 3 4 5 6 45 46 47 48 ...
  ..- attr(*, "names")= chr "Figure.1" "Figure.995" "Figure.17873" "Figure.17874" ...
 $ X2   : Factor w/ 48 levels "AGR","AKZ","ALB",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr "1" "1" "1" "1" ...
 $ X3   : int 11412 11412 11412 11412 11412 11412 11412 11412 11412 11412 ...
 $ value: num 955 971 NA NA NA ...

And I have a second (manual) table with entries of combinations of X2 and X3 which I want to exclude:

> str(Exclude_Data)
'data.frame': 8 obs. of 2 variables:
 $ Code : Factor w/ 5 levels "ALB","ALQ","BAY",..: 3 3 2 4 5 3 1 2
 $ Dates: int 12052 12233 12508 11960 13056 12142 12691 12783

subset(DC2_m, cbind(X2,X3) %in% Exclude_Data[])

Now the trick is to exclude precisely the chosen combinations, and not all combinations of Exclude_Data[1] and Exclude_Data[2], which is what happens when using two conditions, X2 in ED[1] AND X3 in ED[2].

Any takers? Thanks in advance,

Christian

--
View this message in context: http://r.789695.n4.nabble.com/R-Subset-by-using-multiple-values-tp815278p3064226.html
[R] map() and pdf clipping
Hello,

Below is a function (test.map) that draws the same map using three different devices. The pdf device doesn't clip polygons to the plot region, whereas both the native device (in my case Quartz) and the png device do.

test.map("pdf")  # produces test-map.pdf with no clipping
test.map("png")  # produces test-map.png with clipping
test.map(NA)     # draws on the window device with clipping

It doesn't appear to matter what the value of the fill argument is: the pdf output shows that the polygons are not being clipped to the plot region. I have viewed the pdf output using Mac OS X's Preview, PDFPen, Adobe Reader and Safari, and they all render it the same way. So my hunch is that it is not a viewer issue (although I suppose they might be using the same rendering engine under the hood).

Any help would be greatly appreciated.

Thanks and cheers,
Ben

## BEGIN
library(maps)
test.map <- function(to.file = c("pdf", "png", NA)[1], fill = TRUE){
  if (!is.na(to.file)){
    ofile = paste("test-map", to.file, sep = ".")
    do.call(to.file, list(file = ofile))
  }
  xr <- c(-185, -155)
  yr <- c(45, 70)
  map(xlim = xr, ylim = yr)
  map.axes()
  m <- matrix(seq(0, 1, length = 40*40), nrow = 40)
  mr <- as.raster(m)
  rasterImage(m, -180, 50, -160, 65)
  map(xlim = xr, ylim = yr, fill = fill, add = TRUE)
  if (!is.na(to.file)){
    cat("wrote:", ofile, "\n")
    dev.state <- dev.off()
  }
}
## END

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mapproj_1.1-8.2    akima_0.5-4        RColorBrewer_1.0-2
[4] mapdata_2.1-3      maps_2.1-5

loaded via a namespace (and not attached):
[1] tools_2.12.0

Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd.
P.O. Box 475
West Boothbay Harbor, Maine 04575-0475
http://www.bigelow.org/
Re: [R] drop levels problem
Just to follow up on my own post a bit:

xmelt$year[xmelt$year == "first", drop = TRUE]

will do what you want. I think that because the subset has multiple columns, not all of which are factors, the '[' method being used is not the factor one, which would drop unused levels. I did not make that clear at all the first time around (and have probably still butchered it; some knowledgeable soul may correct me). Also, I did get Hmisc installed, but I think dropUnusedLevels() does not work in this case for a similar reason. Henrique's solution is, as usual, the shortest :)

Josh

[snip]
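
The distinction drawn in this follow-up can be sketched on toy data: for a factor, the drop argument of '[' discards unused levels, while for a data frame it only controls whether a single column is returned as a vector. The object names below are illustrative and not from the thread.

```r
year <- factor(rep(c("first", "second", "third"), each = 2))

# '[' on a factor: drop = TRUE discards levels that no longer occur
kept    <- year[year == "first"]
dropped <- year[year == "first", drop = TRUE]
nlevels(kept)       # 3
nlevels(dropped)    # 1

# '[' on a data frame: drop controls dimensions, not factor levels
df <- data.frame(year = year, value = 1:6)
as_vector <- df[df$year == "first", "year"]                 # factor vector
as_frame  <- df[df$year == "first", "year", drop = FALSE]   # one-column data frame
nlevels(as_vector)  # 3, unused levels are still attached
```

This is consistent with the observation in the thread that subset(..., drop = TRUE) on a data frame leaves the levels of the year column untouched.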
[R] accuracy of GLM dispersion parameters
I'm confused as to the trustworthiness of the dispersion parameters reported by glm. Any help or advice would be greatly appreciated.

Context: I'm interested in using a fitted GLM to make some predictions. Along with the predicted values, I'd also like to have estimates of variance for each of those predictions. For a Gamma-family model, I believe this can be done as Var[y] = dispersion parameter * predicted value ^ 2. Thus, I'm interested in knowing the dispersion parameter for this fitted model.

Specifics: The summary function says that my fitted GLM has a dispersion parameter of 15.8. On the other hand, the gamma.dispersion function (MASS) says that the GLM has a dispersion parameter of 1.86. I could understand some modest difference, as the help for gamma.shape() says that the MASS functions return a more accurate dispersion value than summary(). However, these two numbers differ by a factor of 8, which is quite a lot. Is this normal? Would you folks expect such a large difference? Which value should I trust?

R terminal excerpt:

> summary(tempglm_g2)

Call:
glm(formula = precip_sbi ~ precip_oxx + precip_oxx_sq, family = Gamma(link = "identity"),
    data = w.combo, start = c(0.1, 0.4, 0.02))

Deviance Residuals:
     Min        1Q    Median        3Q       Max
    -2.9  -1.63183  -1.00720   0.04878   8.93461

Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.09236     0.04834    1.911    0.0583 .
precip_oxx      0.26848     0.35891    0.748    0.4558
precip_oxx_sq   0.05138     0.13418    0.383    0.7024
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Gamma family taken to be 15.78978)

    Null deviance: 528.73 on 130 degrees of freedom
Residual deviance: 305.81 on 128 degrees of freedom
AIC: -100.33

Number of Fisher Scoring iterations: 5

> library(MASS)
> gamma.shape(tempglm_g2)

Alpha: 0.53807358
SE:    0.05526108

> gamma.dispersion(tempglm_g2)
[1] 1.858482

Thanks,
Tim Handley
Research Assistant
Channel Islands National Park
(Will be working from both CHIS and SAMO)
CHIS Phone: 805-658-5759
SAMO Phone: 805-370-2300 x2412
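
As background to the question, the two figures come from different estimators: summary() on a glm reports the Pearson chi-square statistic divided by the residual degrees of freedom, while gamma.dispersion() in MASS returns the reciprocal of the maximum-likelihood shape estimate from gamma.shape() (in the excerpt above, 1/0.53807358 is about 1.8585). The sketch below reproduces both calculations on simulated data; the data, the variable names and the log link are assumptions made for illustration, and it does not settle which value to trust for the poster's model.

```r
library(MASS)  # provides gamma.shape() and gamma.dispersion()

# Simulated Gamma responses (ASSUMPTION: illustrative data, log link)
set.seed(42)
n     <- 500
x     <- runif(n)
mu    <- exp(0.5 + x)
shape <- 2                                  # true dispersion is 1 / shape
y     <- rgamma(n, shape = shape, rate = shape / mu)

fit <- glm(y ~ x, family = Gamma(link = "log"))

# Estimator reported by summary(): Pearson chi-square / residual df
pearson_disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
summary_disp <- summary(fit)$dispersion

# Estimator used by MASS: reciprocal of the ML shape estimate
ml_disp   <- 1 / gamma.shape(fit)$alpha
mass_disp <- gamma.dispersion(fit)

# Per-prediction variance under the Gamma model: dispersion * mu^2
mu_hat  <- predict(fit, type = "response")
var_hat <- ml_disp * mu_hat^2

c(pearson = pearson_disp, summary = summary_disp, ml = ml_disp, mass = mass_disp)
```

On well-behaved simulated data the two estimates are usually close to each other; a large gap between them, as in the post, may be worth treating as a prompt to check how well the Gamma model fits.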
Re: [R] Subset by using multiple values
One possibility would be to paste together the values before subsetting:

subset(DC2_m, !paste(as.character(X2), X3, sep = '\\0') %in% paste(as.character(Exclude_Data$Code), Exclude_Data$Dates, sep = '\\0'))

(untested due to lack of a reproducible example).

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu

On Mon, 29 Nov 2010, clangkamp wrote:

[snip]
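
Since the suggestion above is marked untested, here is a small runnable sketch of the same paste-a-key idea. The two data frames are made-up miniatures that reuse the column names from the question, and the "|" separator is an assumption that this character never occurs in the codes.

```r
# Toy versions of the two tables (illustrative values only)
DC2_m <- data.frame(
  X2    = factor(c("ALB", "ALB", "BAY", "BAY", "ALQ")),
  X3    = c(12691L, 12052L, 12052L, 12691L, 12508L),
  value = c(1, 2, 3, 4, 5)
)
Exclude_Data <- data.frame(
  Code  = factor(c("BAY", "ALB")),
  Dates = c(12052L, 12691L)
)

# Build one key per row so that only exact (code, date) pairs match
key      <- paste(DC2_m$X2, DC2_m$X3, sep = "|")
drop_key <- paste(Exclude_Data$Code, Exclude_Data$Dates, sep = "|")
kept     <- DC2_m[!key %in% drop_key, ]

# For comparison: two independent %in% tests remove every cross-combination
naive <- DC2_m[!(DC2_m$X2 %in% Exclude_Data$Code &
                 DC2_m$X3 %in% Exclude_Data$Dates), ]

kept$value    # 2 4 5
naive$value   # 5
```

The keyed version removes only the rows (ALB, 12691) and (BAY, 12052), whereas the naive version also removes (ALB, 12052) and (BAY, 12691), which is the over-exclusion described in the question.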