Re: [R] read.table() and scientific notation
January Weiner [EMAIL PROTECTED] writes:

> Dear all, I am having trouble importing values written in scientific
> notation using read.table(). I'm sure this is a frequent problem, as many
> people in my lab have it as well, so I'm sure that I just have trouble
> googling for the right solution. The problem is that, given a file like this:
>
>     a 1 2e-4
>     b 2 3e-8
>     ...

Note: this is advocacy for education in clear quantitative language and is a borderline off-topic rant...

The other day I read a paper from a student who used notation like 2e-4 in the text - blech! I sent it back for revisions. Lately I have noticed, here and in other places, a tendency to use floating point notation (also referred to as exponential notation) where scientific notation is appropriate, and vice versa. The notation 2e-4 is a convenient way to express floating point numbers with a simple text string, but it is certainly not scientific notation. No wonder you had trouble googling it!

Mike

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.table() and scientific notation
> Note: this is advocacy for education in clear quantitative language and is
> a border-line off topic rant... The other day I read a paper from a student
> who used notation like 2e-4 in the text - blech! I sent it back for revisions.

You sent it back for revisions just because the student used a version of scientific notation that can routinely be found in the literature? Hm. I am _teaching_ my students to understand scientific notation in the form 1e-20 etc., for example because many programs in the field (including R) represent real numbers using this version of scientific notation. I wouldn't penalize a student for using it in a scientific text. That's what proofreading is for (if the editors are picky).

> Lately I have noticed here and in other places this tendency to use floating
> point notation (also referred to as exponential notation) where scientific
> notation is appropriate, and vice versa. The notation 2e-4 is a convenient
> way to express floating point numbers with a simple text string, but it is
> certainly not scientific notation.

Depends how formal and picky you wish to be. 2e-4 is the same as $2\times10^{-4}$ to me, as it is for most people, I guess (e.g. look at the Wikipedia entry).

> No wonder you had trouble googling it!

Nope. The problem with googling is that most of the pages you get when googling for R do not refer to R as the statistical language.

Cheers,
January

--
January Weiner 3
Division of Bioinformatics, University of Muenster | Schloßplatz 4
(+49)(251)8321634 | D48149 Münster
http://www.uni-muenster.de/Evolution/ebb/ | Germany
[R] read.table() and scientific notation
Dear all,

I am having trouble importing values written in scientific notation using read.table(). I'm sure this is a frequent problem, as many people in my lab have it as well, so I'm sure that I just have trouble googling for the right solution.

The problem is that, given a file like this:

    a 1 2e-4
    b 2 3e-8
    ...

the third column gets imported as a factor, or as a string if I set the as.is parameter of read.table to TRUE for this column. However, I just want a simple numeric vector :-)

I'm sure there is a simple trick for this. If you can point me to the right function or manual, I think I should be able to find out the details myself.

Thanks in advance,
January

--
January Weiner 3
Division of Bioinformatics, University of Muenster | Schloßplatz 4
(+49)(251)8321634 | D48149 Münster
http://www.uni-muenster.de/Biologie.Botanik/ebb/ | Germany
Re: [R] read.table() and scientific notation
Your example does not exhibit that behavior when I try it (below). Can you provide a reproducible example following the style shown here:

    > Lines <- "a 1 2e-4
    + b 2 3e-8"
    > DF <- read.table(textConnection(Lines))
    > str(DF)
    'data.frame':   2 obs. of  3 variables:
     $ V1: Factor w/ 2 levels "a","b": 1 2
     $ V2: int  1 2
     $ V3: num  2e-04 3e-08
    > R.version.string # Windows XP
    [1] "R version 2.4.0 (2006-10-03)"

On 10/10/06, January Weiner [EMAIL PROTECTED] wrote:

> [...] The problem is that, given a file like this:
>
>     a 1 2e-4
>     b 2 3e-8
>     ...
>
> the third column gets imported as a factor, or as a string if I set the
> as.is parameter of read.table to TRUE for this column. However, I just
> want a simple numeric vector :-) [...]
Re: [R] read.table() and scientific notation
On FC5 Linux:

    gannet% cat foo.dat
    a 1 2e-4
    b 2 3e-8
    gannet% R
    ...
    > read.table("foo.dat")
      V1 V2    V3
    1  a  1 2e-04
    2  b  2 3e-08
    > sapply(read.table("foo.dat"), class)
           V1        V2        V3
     "factor" "integer" "numeric"

so please tell us your environment and give a reproducible example. (This is using the OS function strtod, so it might be a deficiency in your OS's implementation of ISO C.)

On Tue, 10 Oct 2006, January Weiner wrote:

> [...] The problem is that, given a file like this:
>
>     a 1 2e-4
>     b 2 3e-8
>     ...
>
> the third column gets imported as a factor, or as a string if I set the
> as.is parameter of read.table to TRUE for this column. However, I just
> want a simple numeric vector :-) [...]

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] read.table() and scientific notation
Oh, thanks, that was hint enough :-) I see it now. It turns out that R does not understand "e-10", which stands for 1e-10 and is produced by some of the bioinformatics applications that I use (notably BLAST). However, instead of complaining about it, R just assumes that the whole column is a string.

Is there a way to enforce a specific conversion in R (for example, to be able to see where the errors are)?

January
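One way to see where the errors are is to read everything as text and then ask which entries refuse to convert. The following is only a sketch under assumptions: the three sample lines and the names lines, df, converted and bad_rows are invented for illustration, and the text= argument of read.table() is used in place of a file name.

```r
# Hypothetical sample mirroring the thread: the last value uses the
# BLAST-style "e-10" with no leading digit.
lines <- c("a 1 2e-4", "b 2 3e-8", "c 3 e-10")

# Read every column as character so nothing is silently turned into a factor.
df <- read.table(text = lines, colClasses = "character")

# as.numeric() returns NA (with a warning) for strings it cannot parse.
converted <- suppressWarnings(as.numeric(df$V3))

# Rows where the raw value was present but the conversion failed.
bad_rows <- which(is.na(converted) & !is.na(df$V3))

bad_rows          # row numbers of the offending entries
df$V3[bad_rows]   # the raw strings that failed to convert
```

Unlike a hard failure at read time, this lists every problematic row at once, which helps when a large file contains more than one malformed value.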
Re: [R] read.table() and scientific notation
I think the colClasses argument to read.table() is what you need. Either that, or explicitly cast columns in the data.frame that's returned by read.table(). That's how you get data types that aren't directly supported by read.table(), like various date formats.

- Martin

January Weiner wrote:

> [...] Is there a way to enforce a specific conversion in R (for example,
> to be able to see where the errors are)?
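The "explicitly cast columns" route could look roughly like this. It is a sketch, not a definitive recipe: the sample lines and the names lines and df are made up, and the regular expression assumes the only defect is a missing mantissa before the exponent, as in "e-10".

```r
# Hypothetical sample: two well-formed values and one with no mantissa.
lines <- c("a 1 3e-8", "b 2 1e+10", "c 3 e-10")

# Keep the problematic third column as text while reading.
df <- read.table(text = lines,
                 colClasses = c("character", "integer", "character"))

# Insert the implied "1" where a value starts with an optional sign
# followed directly by "e" or "E", then cast the column to numeric.
df$V3 <- as.numeric(sub("^([+-]?)[eE]", "\\11e", df$V3))

str(df)
```

Values that already carry a mantissa do not match the pattern and pass through unchanged, so the cast is safe to apply to the whole column.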
Re: [R] read.table() and scientific notation
On Tue, 10 Oct 2006, January Weiner wrote:

> Oh, thanks, that was hint enough :-) I see it now. It turns out that R
> does not understand e-10 ... which stands for 1e-10 and is produced by
> some of the bioinformatics applications that I use (notably BLAST).

And that is not standard C notation.

> However, R instead of being verbose on that just assumes that the whole
> column is a string. Is there a way to enforce a specific conversion in R
> (for example, to be able to see where the errors are)?

Please study ?read.table, especially 'colClasses'.

--
Brian D. Ripley, [EMAIL PROTECTED]
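To illustrate what 'colClasses' buys here: declaring the column numeric makes read.table() stop with an error on the first value it cannot parse, rather than quietly falling back to a factor or string. A minimal sketch, with invented sample lines and the hypothetical names lines and result; tryCatch() is used only so that the message can be inspected instead of aborting the session.

```r
# Hypothetical sample: the second row has the malformed "e-10".
lines <- c("a 1 2e-4", "b 2 e-10")

# Force the third column to be numeric and capture the resulting error text.
result <- tryCatch(
  read.table(text = lines,
             colClasses = c("character", "integer", "numeric")),
  error = function(e) conditionMessage(e)
)

# 'result' is a data frame on success, or the error message on failure.
result
```

The error names the offending token but only the first one encountered, so it confirms that a problem exists rather than listing every bad row.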
Re: [R] read.table() and scientific notation
A cheeky solution by subverting the coerce mechanism and read.table:

    # install a coerce function which can fix the e+10 syntax for an imaginary class myDouble:
    > setAs("character", "myDouble", function(from) as.double(sub('^(-?)e', '\\11e', from)))
    Warning message:
    in the method signature for function 'coerce' no definition for class: "myDouble" in: matchSignature(signature, fdef, where)

    # load some data:
    > Lines <- scan(sep="\n", what="")
    a 1 3e-8
    b 2 1e+10
    c 3 e-10
    d 4 e+3
    e 5 e+1

    # process it without using the imaginary class - use a real double instead to see what happens:
    # Note I've used textConnection(Lines) here, where your filename would go
    > T <- read.table(textConnection(Lines), colClasses=list("character", "integer", "double"))
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
            scan() expected 'a real', got 'e-10'

    # process it, specifying the imaginary class myDouble.
    > T <- read.table(textConnection(Lines), colClasses=list("character", "integer", "myDouble"))
    > T
      V1 V2    V3
    1  a  1 3e-08
    2  b  2 1e+10
    3  c  3 1e-10
    4  d  4 1e+03
    5  e  5 1e+01
    > lapply(T, class)
    $V1
    [1] "character"

    $V2
    [1] "integer"

    $V3
    [1] "numeric"

Someone's bound to shoot me down for hackery here :-)

-Alex

On 10 Oct 2006, at 11:43, January Weiner wrote:

> [...] The problem is that, given a file like this:
>
>     a 1 2e-4
>     b 2 3e-8
>     ...
>
> the third column gets imported as a factor, or as a string if I set the
> as.is parameter of read.table to TRUE for this column. However, I just
> want a simple numeric vector :-) [...]
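A possible variation on the same trick declares the placeholder class first, which should avoid the "no definition for class" warning. This is a sketch under assumptions, not part of the original post: the setClass() call, the slightly wider pattern (it also accepts "+" and "E"), the sample lines and the names lines and df are all additions for illustration. It relies on read.table() handing unknown colClasses names to as() from package methods, as described in ?read.table.

```r
library(methods)

# Declare a virtual class whose only purpose is to name the conversion.
setClass("myDouble")

# Teach as() how to turn text into a number, adding the implied mantissa "1".
setAs("character", "myDouble",
      function(from) as.double(sub("^([+-]?)[eE]", "\\11e", from)))

# Hypothetical sample with two values that lack a mantissa.
lines <- c("a 1 3e-8", "b 2 1e+10", "c 3 e-10", "d 4 e+3")

# read.table() converts the third column through the coerce method above.
df <- read.table(text = lines,
                 colClasses = c("character", "integer", "myDouble"))

sapply(df, class)
```

The conversion then happens at read time, so the same colClasses vector can be reused across many files without a separate clean-up step.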