Re: [R] Tools For Preparing Data For Analysis

2007-06-22 Thread Kevin E. Thorpe
I am posting to this thread that has been quiet for some time because I remembered the following question. Christophe Pallier wrote: Hi, Can you provide examples of data formats that are problematic to read and clean with R ? Today I had a data manipulation problem that I don't know how to

Re: [R] Tools For Preparing Data For Analysis

2007-06-22 Thread Christophe Pallier
If I understand correctly (from your Perl script) 1. you count the number of occurrences of each (echo, muga) pair in the first file. 2. you remove from the second file the lines that correspond to these occurrences. If this is indeed your aim, here's a solution in R: cumcount <- function(x) {
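The two steps described above amount to a multiset difference: each (echo, muga) pair in the second file is dropped once for every time it appears in the first. The archive truncates the original cumcount definition, so what follows is only a minimal sketch under stated assumptions. It assumes both files are already read into data frames with columns named echo and muga. The cumcount body here uses ave(), and remove_matched is a hypothetical helper name.

```r
# Running occurrence number of each value: 1 at its first appearance, 2 at the second, ...
cumcount <- function(x) {
  ave(rep(1L, length(x)), x, FUN = cumsum)
}

# Hypothetical helper: drop from `second` one row per matching (echo, muga) row in `first`
remove_matched <- function(first, second) {
  key1 <- paste(first$echo, first$muga, sep = "\t")
  key2 <- paste(second$echo, second$muga, sep = "\t")
  # Tag every row with its pair plus occurrence number, so duplicates match one-for-one
  id1 <- paste(key1, cumcount(key1), sep = "#")
  id2 <- paste(key2, cumcount(key2), sep = "#")
  second[!(id2 %in% id1), , drop = FALSE]
}

first  <- data.frame(echo = c("a", "a", "b"), muga = c(1, 1, 2),
                     stringsAsFactors = FALSE)
second <- data.frame(echo = c("a", "a", "a", "b", "c"), muga = c(1, 1, 1, 2, 3),
                     stringsAsFactors = FALSE)
res <- remove_matched(first, second)
res  # keeps the third ("a", 1) row and the ("c", 3) row
```

Appending the occurrence number is what makes %in% respect counts. A plain %in% on the pair alone would remove all three ("a", 1) rows.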

Re: [R] Tools For Preparing Data For Analysis

2007-06-14 Thread Ted Harding
As a tangent to this thread, there is a very relevant article in the latest issue of the RSS magazine Significance, which I have just received: Dr Fisher's Casebook: "The trouble with data", Significance, Vol 4 (2007), Issue 2. Full current contents at

Re: [R] Tools For Preparing Data For Analysis

2007-06-14 Thread John Kane
--- [EMAIL PROTECTED] wrote: As a tangent to this thread, there is a very relevant article in the latest issue of the RSS magazine Significance, which I have just received: Dr Fisher's Casebook: "The trouble with data", Significance, Vol 4 (2007), Issue 2. Full current contents at

Re: [R] Tools For Preparing Data For Analysis

2007-06-14 Thread Robert Wilkins
[Arrggh, not "reply" but "reply to all"; cross my fingers again, sorry Peter!] Hmm, I don't think you need a retain statement. if first.patientID; or if last.patientID; ought to do it. It's actually better than the Vilno version, I must admit, a bit more concise: if ( not
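For readers comparing with R: the SAS subsetting IF on first.patientID or last.patientID keeps the first or last record of each BY group in data sorted by patientID. A minimal R sketch of the same idea uses duplicated() with and without fromLast = TRUE. The data frame visits and the helpers first_by and last_by are hypothetical. The data must be sorted by the key first, just as SAS needs a sorted BY variable.

```r
# Hypothetical helpers: first / last row of each group, assuming rows are sorted by `key`
first_by <- function(df, key) df[!duplicated(df[[key]]), , drop = FALSE]
last_by  <- function(df, key) df[!duplicated(df[[key]], fromLast = TRUE), , drop = FALSE]

visits <- data.frame(patientID = c(103, 101, 102, 101, 103, 103),
                     visit     = c(1, 1, 1, 2, 2, 3))
# Sort first, as PROC SORT would (BY patientID visit)
visits <- visits[order(visits$patientID, visits$visit), ]

first_by(visits, "patientID")  # visit 1 for each patient
last_by(visits, "patientID")   # highest visit for each patient
```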

Re: [R] Tools For Preparing Data For Analysis

2007-06-11 Thread Chris Evans
(Ted Harding) sent the following at 10/06/2007 09:28: ... much snipped ... (As is implicit in many comments in Robert's blog, and indeed also from many postings to this list over time and undoubtedly well known to many of us in practice, a lot of the problems with data files arise at the

Re: [R] Tools For Preparing Data For Analysis

2007-06-11 Thread Barry Rowlingson
Chris Evans wrote: Thanks Ted, great thread and I'm impressed with EpiData that I've discovered through this. I'd still like something that is even more integrated with R but maybe some day, if EpiData go fully open source as I think they are doing (A full conversion plan to secure this and

Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Ted Harding
On 10-Jun-07 02:16:46, Gabor Grothendieck wrote: That can be elegantly handled in R through R's object oriented programming by defining a class for the fancy input. See this post: https://stat.ethz.ch/pipermail/r-help/2007-April/130912.html for a simple example of that style. On 6/9/07,

Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Peter Dalgaard
Douglas Bates wrote: Frank Harrell indicated that it is possible to do a lot of difficult data transformation within R itself if you try hard enough but that sometimes means working against the S language and its whole object view to accomplish what you want and it can require knowledge of

Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Sarah Goslee
On 6/10/07, Ted Harding [EMAIL PROTECTED] wrote: ... a lot of the problems with data files arise at the data gathering and entry stages, where people can behave as if stuffing unpaired socks and unattributed underwear randomly into a drawer, and then banging it shut. Not specifically

Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Stephen Tucker
Since R is supposed to be a complete programming language, I wonder why these tools couldn't be implemented in R (unless speed is the issue). Of course, it's a naive desire to have a single language that does everything, but it seems that R currently has most of the functions necessary to do the

Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Ted Harding
On 10-Jun-07 14:04:44, Sarah Goslee wrote: On 6/10/07, Ted Harding [EMAIL PROTECTED] wrote: ... a lot of the problems with data files arise at the data gathering and entry stages, where people can behave as if stuffing unpaired socks and unattributed underwear randomly into a drawer, and

Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Ted Harding
On 10-Jun-07 19:27:50, Stephen Tucker wrote: Since R is supposed to be a complete programming language, I wonder why these tools couldn't be implemented in R (unless speed is the issue). Of course, it's a naive desire to have a single language that does everything, but it seems that R

Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread roger koenker
An important potential benefit of R solutions shared by awk, sed, ... is that they provide a reproducible way to document exactly how one got from one version of the data to the next. This seems to be the main problem with handicraft methods like editing Excel files, it is too easy to
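To illustrate the reproducibility point, the awk/sed-style cleaning steps can live in an R script, so every change to the raw data is recorded and can be rerun. This is a minimal sketch with made-up input. The "#" comment marker and the missing-value code -99 are assumptions, not details from the thread.

```r
# Made-up raw input; in practice this might come from readLines("raw.csv")
raw <- c("# exported 2007-06-10", "id,value", "a,1", "b,-99", "c,3")

# Like grep -v '^#': drop comment lines
body <- raw[!grepl("^#", raw)]

# Parse the remaining lines as CSV with explicit column types
df <- read.csv(text = body, colClasses = c("character", "integer"))

# Recode the (assumed) missing-value code -99 to NA
df$value[df$value == -99] <- NA
df
```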

Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Stephen Tucker
Embarrassingly, I don't know awk or sed but R's code seems to be shorter for most tasks than Python, which is my basis for comparison. It's true that R's more powerful data structures usually aren't necessary for the data cleaning, but sometimes in the filtering process I will pick out lines that

Re: [R] Tools For Preparing Data For Analysis

2007-06-09 Thread Robert Wilkins
Here are some examples of the type of data crunching you might have to do, in response to the requests by Christophe Pallier and Martin Stevens. Before I started developing Vilno, some six years ago, I had been working in the pharmaceutical industry for eight years (it's not easy to show you actual

Re: [R] Tools For Preparing Data For Analysis

2007-06-09 Thread Gabor Grothendieck
That can be elegantly handled in R through R's object oriented programming by defining a class for the fancy input. See this post: https://stat.ethz.ch/pipermail/r-help/2007-April/130912.html for a simple example of that style. On 6/9/07, Robert Wilkins [EMAIL PROTECTED] wrote: Here are
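One common way to use this style (not necessarily the one in the linked post) is to register a small formal class whose only job is to tell read.table how to convert a column. read.table's colClasses accepts the name of a class for which an as() method from "character" exists. The class name dmyDate and the day/month/year input format are assumptions for illustration.

```r
library(methods)

# Virtual class used only as a tag in colClasses
setClass("dmyDate")
# Conversion from the raw character column to Date, assuming dd/mm/yyyy input
setAs("character", "dmyDate", function(from) as.Date(from, format = "%d/%m/%Y"))

txt <- "id,when
1,25/12/2006
2,03/01/2007"
df <- read.csv(text = txt, colClasses = c("integer", "dmyDate"))
str(df)
```

The parsing rule sits in one named place. Any later read.table or read.csv call can reuse it by naming the class in colClasses.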

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Christophe Pallier
Hi, Can you provide examples of data formats that are problematic to read and clean with R ? The only problematic cases I have encountered were cases with multiline and/or varying length records (optional information). Then, it is sometimes a good idea to preprocess the data to present in a
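Multiline, varying-length records can also be preprocessed in R itself. This is a minimal sketch under an assumed record layout: each record begins with a line "ID <id>", followed by zero or more optional "field=value" lines. The layout and the function name parse_records are hypothetical. Fields missing from a record become NA.

```r
# Hypothetical layout: a record starts with "ID <id>", followed by optional "field=value" lines
parse_records <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]          # drop blank lines
  rec <- cumsum(grepl("^ID ", lines))            # record number for every line
  recs <- split(lines, rec)                      # assumes the first line is an "ID" line
  rows <- lapply(recs, function(r) {
    kv <- r[-1]
    vals <- sub("^[^=]*=", "", kv)               # text after the first "="
    names(vals) <- sub("=.*$", "", kv)           # field name before the first "="
    c(id = sub("^ID ", "", r[1]), vals)
  })
  fields <- unique(unlist(lapply(rows, names)))
  # Index each record by the full field list; absent fields come back as NA
  mat <- do.call(rbind, lapply(rows, function(x) unname(x[fields])))
  colnames(mat) <- fields
  out <- as.data.frame(mat, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

raw_lines <- c("ID 1", "age=34", "sex=F", "ID 2", "sex=M", "ID 3", "age=51")
out <- parse_records(raw_lines)  # in practice: parse_records(readLines("records.txt"))
out
```

All columns come back as character. Numeric fields can then be converted with type.convert() or as.numeric() once the layout is flat.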

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Ted Harding
On 08-Jun-07 08:27:21, Christophe Pallier wrote: Hi, Can you provide examples of data formats that are problematic to read and clean with R ? The only problematic cases I have encountered were cases with multiline and/or varying length records (optional information). Then, it is

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Douglas Bates
On 6/7/07, Robert Wilkins [EMAIL PROTECTED] wrote: As noted on the R-project web site itself ( www.r-project.org - Manuals - R Data Import/Export ), it can be cumbersome to prepare messy and dirty data for analysis with the R tool itself. I've also seen at least one S programming book (one of

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Wensui Liu
I had mentioned exactly the same thing to others and the feedback I got is - 'when you have a hammer, everything will look like a nail' ^_^. On 6/7/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote: Robert Wilkins wrote: As noted on the R-project web site itself ( www.r-project.org - Manuals -

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Martin Henry H. Stevens
Is there an example available of this sort of problematic data that requires this kind of data screening and filtering? For many of us, this issue would be nice to learn about, and deal with within R. If a package could be created, that would be optimal for some of us. I would like to

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Chris Evans
Martin Henry H. Stevens sent the following at 08/06/2007 15:11: Is there an example available of this sort of problematic data that requires this kind of data screening and filtering? For many of us, this issue would be nice to learn about, and deal with within R. If a package could be

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Dale Steele
For Windows users, EpiData Entry http://www.epidata.dk/ is an excellent (free) tool for data entry and documentation. --Dale On 6/8/07, Chris Evans [EMAIL PROTECTED] wrote: Martin Henry H. Stevens sent the following at 08/06/2007 15:11: Is there an example available of this sort of

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Frank E Harrell Jr
Dale Steele wrote: For Windows users, EpiData Entry http://www.epidata.dk/ is an excellent (free) tool for data entry and documentation. --Dale Note that EpiData seems to work well under Linux using Wine. Frank

Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Christophe Pallier
On 6/8/07, Douglas Bates [EMAIL PROTECTED] wrote: Other responses in this thread have mentioned 'little language' filters like awk, which is fine for those who were raised in the Bell Labs tradition of programming (why type three characters when two character names should suffice for

[R] Tools For Preparing Data For Analysis

2007-06-07 Thread Robert Wilkins
As noted on the R-project web site itself ( www.r-project.org - Manuals - R Data Import/Export ), it can be cumbersome to prepare messy and dirty data for analysis with the R tool itself. I've also seen at least one S programming book (one of the yellow Springer ones) that says, more briefly, the

Re: [R] Tools For Preparing Data For Analysis

2007-06-07 Thread Robert Duval
An additional option for Windows users is Micro Osiris http://www.microsiris.com/ best robert On 6/7/07, Robert Wilkins [EMAIL PROTECTED] wrote: As noted on the R-project web site itself ( www.r-project.org - Manuals - R Data Import/Export ), it can be cumbersome to prepare messy and dirty

Re: [R] Tools For Preparing Data For Analysis

2007-06-07 Thread Frank E Harrell Jr
Robert Wilkins wrote: As noted on the R-project web site itself ( www.r-project.org - Manuals - R Data Import/Export ), it can be cumbersome to prepare messy and dirty data for analysis with the R tool itself. I've also seen at least one S programming book (one of the yellow Springer ones)