[R] merging pre-sorted data frames

2015-01-13 Thread Mike Miller
I have many pairs of data frames each with about 15 million records each and about 10 million records in common. They are sorted by two of their fields and will be merged by those same fields. The fact that the data are sorted could be used to greatly speed up a merge, but I have the

Re: [R] merging pre-sorted data frames

2015-01-13 Thread Jeff Newmiller
On Tue, 13 Jan 2015, Mike Miller wrote: I have many pairs of data frames each with about 15 million records each and about 10 million records in common. They are sorted by two of their fields and will be merged by those same fields. The fact that the data are sorted could be used to greatly

Re: [R] Help on Principal Component Analysis in R

2015-01-13 Thread Karim Mezhoud
Hi, the error message indicates that you have non-numeric values in your table/matrix. Replace missing values with NA and add na.rm = TRUE to your prcomp command. Karim On 14 Jan 2015 00:27, R Help! emanek...@gmail.com wrote: Hello! I am a beginner to R. I have read several guides, but still am

Re: [R] any r package can handle factor levels not in the test set

2015-01-13 Thread HelponR
Thanks for your reply. But I cannot control the data. I am dealing with real-world stream data. It is very normal for the test data (when you apply the model to do prediction) to have new values that were not seen in the training data. If I code myself, I would give a random guess or just an intercept for

Re: [R] any r package can handle factor levels not in the training set

2015-01-13 Thread HelponR
sorry I notice the email subject is not accurate. to be specific, when I do predict, there are error messages like factor x has new levels 1, 2 Here x is an attribute(independent var), not outcome. I wonder if the incremental packages (if any) solve this problem? Maybe it is time to write my

Re: [R] any r package can handle factor levels not in the test set

2015-01-13 Thread William Dunlap
I think it would be nice if predict methods returned NA in appropriate spots instead of aborting when a categorical predictor contains levels not found in the training set. It should not be that hard to implement, as the 'xlevels' component of the model is already being used to put factor levels
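A minimal sketch of the behaviour described above, assuming a plain lm() fit: the new data are re-coded against the factor levels stored in the model's 'xlevels' component, so that unseen levels become NA and predict() returns NA for those rows instead of stopping. The data frames and the level "d" are made up for illustration; other model classes may handle NA predictors differently.

```r
# Toy training data with a three-level categorical predictor
train <- data.frame(y = c(1, 2, 3, 4, 5, 6),
                    x = factor(c("a", "a", "b", "b", "c", "c")))
fit <- lm(y ~ x, data = train)

# New data containing a level ("d") never seen during training
new <- data.frame(x = c("a", "d", "c"), stringsAsFactors = FALSE)

# Re-code against the training levels kept in the model; "d" becomes NA
new$x <- factor(new$x, levels = fit$xlevels$x)

# predict.lm passes NA rows through, so the unseen level yields NA
pred <- predict(fit, newdata = new)
print(pred)
```

Whether an NA prediction is the right answer for an unseen category is a modelling decision, not something this sketch settles.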

Re: [R] any r package can handle factor levels not in the test set

2015-01-13 Thread Bert Gunter
Folks: I believe this discussion would be better moved to a statistical discussion forum, like stats.stackexchange.com, as it appears to be all about statistical issues, not R. I do not understand how you can possibly expect to predict behavior in new categories for which you have no prior

Re: [R] R vs. RStudio?

2015-01-13 Thread Vokey, John
Although it doesn’t prevent me from using RStudio, there is a bug in the pdf rendering from the graphics window that mangles the display of graph legends using base graphics (i.e., when save as pdf is chosen). The same does not happen when the same graphing code is used with the standard R

Re: [R] two-sample KS test: data becomes significantly different after normalization

2015-01-13 Thread Monnand
I know this must be a wrong method, but I cannot help but ask: Can I only use the p-value from the KS test, saying that if the p-value is greater than \beta, then the two samples are from the same distribution? If the definition of p-value is the probability that the null hypothesis is true, then why there's little

[R] Yodlee CRAN package

2015-01-13 Thread Hasan Diwan
Does anyone know of a CRAN package to access Yodlee.com's Aggregation API[1]? Many thanks -- H -- OpenPGP: https://hasan.d8u.us/gpg.key 1. http://developer.yodlee.com/Aggregation_API

[R] Mac Yosemite R.app console typing / autorepeat performance

2015-01-13 Thread David R Forrest
I am having issues with performance in the R.app console on Mac Yosemite / OS X 10.10.1. 1) While editing commands on the command line, left or right arrow gradually slows within a session from usable rates to rates like 2/second or slower. 2) Autorepeat works differently for different

[R] Problem with Anova() in package car

2015-01-13 Thread Gang Chen
I'm having some trouble with Anova() in package car. When the model formula is explicitly expressed: library('nlme') library('car') fm <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1) Anova() works fine: Anova(fm) However, if the model formula is scanned from an external source:

[R] seek(), Windows and Cygwin (was a UNIX vs. Windows package question, please)

2015-01-13 Thread Mike Miller
On Fri, 9 Jan 2015, Duncan Murdoch wrote: On 09/01/2015 5:32 PM, Erin Hodgess wrote: Hello again. Here is another question that I am puzzled about: I had the (incorrect) impression that if I had Rtools on a Windows machine that I could use any tar.gz package. However, that is not true.

Re: [R] two-sample KS test: data becomes significantly different after normalization

2015-01-13 Thread Andrews, Chris
This sounds more like quality control than hypothesis testing. Rather than statistical significance, you want to determine what is an acceptable difference (an 'equivalence margin', if you will). And that is a question about the application, not a statistical one.

Re: [R] Mac Yosemite R.app console typing / autorepeat performance

2015-01-13 Thread Bert Gunter
This clearly should go to the r-sig-mac list, not r-help. Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. Clifford Stoll On Tue, Jan 13, 2015 at 10:44 AM, David R

[R] Psych package: why am I receiving NA for many of the factor scores?

2015-01-13 Thread Elizabeth Barrett-Cheetham
Hello R Psych package users, Why am I receiving NA for many of the factor scores for individual observations? I'm assuming it is because there is quite a bit of missing data (denoted by NA). Are there any tricks in the psych package for getting a complete set of factor scores? My input is:

Re: [R] R vs. RStudio?

2015-01-13 Thread Sigbert Klinke
On 12.01.2015 09:01, peter dalgaard wrote: On 11 Jan 2015, at 11:30 , Duncan Murdoch murdoch.dun...@gmail.com wrote: - I don't like the tiled display. I find it doesn't give me enough space. This is a mixed blessing. For teaching purposes, it helps avoid shuffling windows to uncover

[R] global environment

2015-01-13 Thread Methekar, Pushpa (GE Transportation, Non-GE)
Sorry it was my mistake. I tried to do like this rm.outliers = function(model,xsys) { rst = rstudent(model) outliers<-vector("numeric",731) xsys<-xsys for(i in 1:length(rst)) { if(rst[i]<=3 && rst[i]>=-3) #condition for identifying outlier { print("this is not outlier")

Re: [R] Complex merging problems

2015-01-13 Thread PIKAL Petr
Hi I do not understand what you want to achieve with this. df2$v3 <- ifelse(df2$v1 %in% df1$v1 & df2$v2==df2$v1, 1, 0). You compare v1 and v2 from data frame df2 to column v1 in data frame df1? It is true only in case where df2$v1 equals df2$v2. In case you mean that you want check equality of

Re: [R] regular expression question

2015-01-13 Thread Loris Bennett
Hi Mark, Mark Leeds marklee...@gmail.com writes: Hi All: I have a regular expression problem. If a character string ends with rhofixed or norhofixed, I want that part of the string to be removed. If it doesn't end with either of those two endings, then the result should be the same as the
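One way to express the requirement stated in the question, as a sketch rather than the answer given in the thread: anchor the pattern at the end of the string and make the "no" prefix optional. The example strings are invented.

```r
# Hypothetical inputs: two with the endings to strip, two without
x <- c("modelrhofixed", "modelnorhofixed", "model", "rhofixedmodel")

# "(no)?" makes the prefix optional; "$" restricts the match to the end
stripped <- sub("(no)?rhofixed$", "", x)
print(stripped)
```

Strings that do not end with either suffix, including one that merely starts with "rhofixed", come back unchanged.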

[R] Two stats courses in Portugal

2015-01-13 Thread Highland Statistics Ltd
Apologies for cross-posting There are 5 remaining seats available on each of the following two courses: Data exploration, regression, GLM & GAM with introduction to R. 2 - 6 February 2015. Coimbra, Portugal Introduction to Linear mixed effects models, GLMM and MCMC with R 9-13 February 2015.

Re: [R] R console colours (R profile)

2015-01-13 Thread Ingrid Charvet
Thanks for your reply Sarah. I am using R on Windows 7 professional, 64-bit-OS (on my local machine). setOutputColors doesn’t work in Windows however I came across this post mentioning the package colorout, which seems however not to be available from CRAN:

Re: [R] R console colours (R profile)

2015-01-13 Thread Duncan Murdoch
On 13/01/2015 2:52 AM, Ingrid Charvet wrote: Thanks for your reply Sarah. I am using R on Windows 7 professional, 64-bit-OS (on my local machine). setOutputColors doesn’t work in Windows however I came across this post mentioning the package colorout, which seems however not to be

[R] Diff time returns 2 hours when there is 1

2015-01-13 Thread Jue Lin-Ye
Greetings! I am analysing my data and checking for gaps over 1 hour and come across the problem that R tells me that there are many 2 hour gaps. When I go into details, I don't see any of these lapses. Is it something with the diff() function? You can see what I am saying, below:

Re: [R] Diff time returns 2 hours when there is 1

2015-01-13 Thread Duncan Murdoch
On 13/01/2015 3:37 AM, Jue Lin-Ye wrote: Greetings! I am analysing my data and checking for gaps over 1 hour and come across the problem that R tells me that there are many 2 hour gaps. When I go into details, I don't see any of these lapses. Is it something with the diff() function? You

Re: [R] R console colours (R profile)

2015-01-13 Thread Ingrid Charvet
Thank you very much it works - I just needed to save my preferences to the R console when using the GUI and it automatically loads the right colors each time I open R! -Original Message- From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] Sent: 13 January 2015 11:23 To: Ingrid

Re: [R] Diff time returns 2 hours when there is 1

2015-01-13 Thread PIKAL Petr
Hi You are bitten by DST - daylight saving time. You need to use either timezones which do not have DST or adapt your code to the fact that twice a year one hour takes 2 hours and one hour is missing. see CEST and CET difference time [1] 1999-10-31 01:00:00 CEST 1999-10-31 02:00:00 CEST [3]
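The effect can be reproduced with a small sketch. The time zone "Europe/Madrid" is an assumption (the thread only shows CEST/CET), and the three timestamps are illustrative wall-clock values from the night the clocks went back.

```r
# Three wall-clock timestamps that look evenly spaced, one hour apart
stamps <- c("1999-10-31 01:00:00", "1999-10-31 02:00:00", "1999-10-31 03:00:00")

# In a zone with DST this night contains an extra hour, so the elapsed
# time from 01:00 to 03:00 is three hours and one gap shows up as 2 hours
local <- as.POSIXct(stamps, tz = "Europe/Madrid")
gaps_local <- as.numeric(diff(local), units = "hours")

# Parsed in a zone without DST, every gap is exactly one hour
utc <- as.POSIXct(stamps, tz = "UTC")
gaps_utc <- as.numeric(diff(utc), units = "hours")

print(gaps_local)
print(gaps_utc)
```

Which of the two local gaps comes out as 2 hours depends on how the platform resolves the ambiguous 02:00 timestamp, so only the total is relied on here.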

Re: [R] overlapping coefficient bidimensional distribution

2015-01-13 Thread Meli Massimiliano
Hi, also intersection could be a good measure, my set when plotted look like in the picture in attach : https://www.dropbox.com/sh/j68oa80ihn95s04/AADTOt_GYF1JHrmZU2129y__a?dl=0 thanks a lot max On 12/01/15 19:44, Bert Gunter wrote: ?intersect or, more generally, ?match Cheers, Bert Bert

Re: [R] Complex merging problems

2015-01-13 Thread David L Carlson
I think the OP does not want to list duplicate records. Perhaps merge(unique(df1), df2, all.y=TRUE) v1 v2 ind 1 1 83 1 2 1 84 1 3 2 83 NA 4 2 84 NA 5 3 83 NA 6 3 84 NA 7 4 83 NA 8 4 84 NA - David L Carlson Department of Anthropology Texas

[R] package methods in options(defaultPackages) was not found

2015-01-13 Thread D. Alain
Dear R-List, I receive a strange error message when starting R. It says: Warning message: package ‘methods’ in options("defaultPackages") was not found Error in file(filename, "r") : cannot open the connection In addition: Warning message: In file(filename, "r") : cannot open file

Re: [R] Problem with Anova() in package car

2015-01-13 Thread Gang Chen
Dear John, Thanks a lot for the quick response and fix! I'm looking forward to trying out the development version. I assume that the fix will be released in the official version at some point. Thanks again, Gang On Tue, Jan 13, 2015 at 5:06 PM, John Fox j...@mcmaster.ca wrote: Dear Gang, The

Re: [R] seek(), Windows and Cygwin (was a UNIX vs. Windows package question, please)

2015-01-13 Thread Jeff Newmiller
I don't know why the R developers made that comment, and R-devel is probably a better place to follow up, but the usual problem is that Windows treats text files differently than binary files, so seeking in text files is a headache. Binary files ought to be okay, but that is a theoretical

Re: [R] seek(), Windows and Cygwin (was a UNIX vs. Windows package question, please)

2015-01-13 Thread Henrik Bengtsson
I/we've been utilizing both read and write seek():s on *binary* connections across platforms and file systems, including Windows (at least NTFS, but probably also FAT/FAT32 back in the days) in the Aroma Framework (e.g. affxparser, R.huge) for ~8 years and counting. There should be thousands and

Re: [R] seek(), Windows and Cygwin (was a UNIX vs. Windows package question, please)

2015-01-13 Thread Mike Miller
Thanks, everyone. This is very good news from Henrik because I am interested only in binary connections. It sounds like a function that uses seek() is very likely to work well in Windows, so I won't bother to warn people. I should do a little testing just to see that it's working, though.
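A quick check of the kind mentioned above might look like the following sketch: write some integers to a binary file, seek to a byte offset, and read one value back. The file is a temporary one and the 4-byte integer size is stated explicitly; this illustrates seek() on a binary connection and is not the function under discussion.

```r
path <- tempfile(fileext = ".bin")

# Write the integers 1..10 as 4-byte values through a binary connection
con <- file(path, open = "wb")
writeBin(1:10, con, size = 4L)
close(con)

# Reopen for binary reading and jump past the first five integers
con <- file(path, open = "rb")
seek(con, where = 5 * 4, origin = "start")
value <- readBin(con, what = "integer", n = 1L, size = 4L)
close(con)

print(value)
unlink(path)
```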

Re: [R] Problem with Anova() in package car

2015-01-13 Thread John Fox
Dear Gang, The problem was in the model.matrix.lme() method provided by the car package, and is now fixed in the development version of the car package on R-Forge. You should be able to install it from there via install.packages("car", repos="http://R-Forge.R-project.org") after the package is next

Re: [R] seek(), Windows and Cygwin (was a UNIX vs. Windows package question, please)

2015-01-13 Thread Henrik Bengtsson
On Tue, Jan 13, 2015 at 2:05 PM, Mike Miller mbmille...@gmail.com wrote: Thanks, everyone. This is very good news from Henrik because I am interested only in binary connections. It sounds like a function that uses seek() is very likely to work well in Windows, so I won't bother to warn

[R] Help on Principal Component Analysis in R

2015-01-13 Thread R Help!
Hello! I am a beginner to R. I have read several guides, but still am stuck on this: I have data in an excel csv file, on which I want to run PCA. I'm not sure how the prcomp formula works. The help page states: prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE, tol = NULL, ...) what is x

Re: [R] Help on Principal Component Analysis in R

2015-01-13 Thread Jeff Newmiller
If you keep reading the help file, the answer to your question is right there. As for step by step... only you know what your data looks like. There are various pitfalls one can encounter in getting data from a file into an object in memory, but the basic idea is to use the read.csv function,
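The basic idea can be sketched as follows, with assumptions stated up front: the built-in iris data set is written to a temporary CSV to stand in for the poster's file, non-numeric columns are dropped, and rows with missing values are removed before calling prcomp(). Real data may need different cleaning.

```r
# Stand-in for the CSV exported from Excel (assumption: iris as sample data)
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

dat <- read.csv(csv_path)

# prcomp() needs numeric input, so keep only the numeric columns
num <- dat[sapply(dat, is.numeric)]

# Drop incomplete rows; prcomp() does not accept missing values in a matrix
num <- na.omit(num)

# x is the numeric data; scale. = TRUE standardises each variable first
pca <- prcomp(num, center = TRUE, scale. = TRUE)
print(summary(pca))
```

The 'x' argument in the help page is this numeric matrix or data frame, one row per observation and one column per variable.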