January == January Weiner [EMAIL PROTECTED] writes:
Dear all, I am having trouble importing values written in
scientific notation using read.table(). I'm sure this is a
frequent problem, as many people in my lab have this
problem as well, so I'm sure that I just have
Note: this is advocacy for education in clear quantitative
language and is a borderline off-topic rant...
The other day I read a paper from a student who used notation
like 2e-4 in the text - blech! I sent it back for revisions.
You have sent it back for revisions just because the student
Dear all,
I am having trouble importing values written in scientific notation
using read.table(). I'm sure this is a frequent problem, as many
people in my lab have this problem as well, so I'm sure that I am just
having trouble googling for the right solution.
The problem is that, given a file
Your example does not exhibit that behavior when I try it (below).
Can you provide a reproducible example following the style
shown here:
> Lines <- "a 1 2e-4
+ b 2 3e-8"
> DF <- read.table(textConnection(Lines))
> str(DF)
'data.frame':   2 obs. of  3 variables:
 $ V1: Factor w/ 2 levels "a","b": 1 2
$
On FC5 Linux:
gannet% cat foo.dat
a 1 2e-4
b 2 3e-8
gannet% R
...
> read.table("foo.dat")
  V1 V2    V3
1  a  1 2e-04
2  b  2 3e-08
> sapply(read.table("foo.dat"), class)
       V1        V2        V3
 "factor" "integer" "numeric"
so please tell us your environment and give a reproducible example.
Oh, thanks, that was hint enough :-) I see it now. It turns out that R
does not understand
e-10
...which stands for 1e-10 and is produced by some of the bioinformatic
applications that I use (notably BLAST). However, instead of reporting
the unparseable value, R silently treats the whole column as strings.
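
The behaviour described above can be reproduced with a small sketch like the following. The two input rows are invented for illustration, and the text= argument of read.table() assumes a reasonably recent R; with an older R, textConnection() as used earlier in the thread should serve the same purpose.

```r
# Hypothetical input: the second row uses the bare "e-10" form.
lines <- c("a 1 2e-4",
           "b 2 e-10")

df <- read.table(text = lines, stringsAsFactors = FALSE)

# V3 is not numeric, because "e-10" cannot be parsed as a number.
sapply(df, class)

# A direct conversion turns the unparseable entry into NA (with a warning,
# suppressed here).
suppressWarnings(as.numeric(df$V3))
```

Checking sapply(df, class) right after an import is a cheap way to notice this kind of silent fallback.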
I think the colClasses argument to read.table() is what you need.
Either that, or explicitly cast columns in the data.frame that's
returned by read.table(). That's how you get data types that aren't
directly supported by read.table(), like various date formats.
- Martin
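
A rough sketch of the colClasses-plus-explicit-cast route, under some assumptions: the sample rows are invented, the helper name fix_bare_exponent is made up for this example, and the third row (a negative bare exponent) is purely hypothetical. The text= argument assumes a reasonably recent R.

```r
# Hypothetical input with bare exponents in the third column.
lines <- c("a 1 2e-4",
           "b 2 e-10",
           "c 3 -e-3")

# Read the third column as character so that nothing is guessed.
df <- read.table(text = lines,
                 colClasses = c("character", "integer", "character"))

# Hypothetical helper: insert the implied mantissa "1" in front of a bare
# exponent (optionally signed), then convert to numeric.
fix_bare_exponent <- function(x) {
  as.numeric(sub("^([+-]?)[eE]", "\\11e", x))
}

df$V3 <- fix_bare_exponent(df$V3)
str(df)
```

Values that already have a mantissa, such as 2e-4, do not match the pattern and pass through unchanged.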
January Weiner wrote:
On Tue, 10 Oct 2006, January Weiner wrote:
Oh, thanks, that was hint enough :-) I see it now. It turns out that R
does not understand
e-10
...which stands for 1e-10 and is produced by some of the bioinformatic
applications that I use (notably BLAST).
And that is not standard C notation.
A cheeky solution by subverting the coerce mechanism and read.table:
# install a coerce function which can fix the "e+10" syntax for an
# imaginary class myDouble:
setAs("character", "myDouble",
      function(from) as.double(sub('^(-?)e', '\\11e', from)))
Warning message:
in the method signature for