Re: [R] read.table() and scientific notation

2006-10-11 Thread mmiller3
>>>>> "January" == January Weiner [EMAIL PROTECTED] writes:

> Dear all, I am having troubles importing values written as
> scientific notation using read.table(). I'm sure this is a
> frequent problem, as many people in my lab have this
> problem as well, so I'm sure that I just have troubles
> googling for the right solution.
>
> The problem is, that, given a file like that:
>
> a 1 2e-4
> b 2 3e-8
> ...

Note: this is advocacy for education in clear quantitative
language and is a border-line off topic rant...

The other day I read a paper from a student who used notation
like 2e-4 in the text - blech!  I sent it back for revisions.
Lately I have noticed here and in other places this tendency to
use floating point notation (also referred to as exponential
notation) where scientific notation is appropriate, and vice
versa.  The notation 2e-4 is a convenient way to express floating
point numbers with a simple text string, but it is certainly not
scientific notation.  No wonder you had trouble googling it!

Mike

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.table() and scientific notation

2006-10-11 Thread January Weiner
> Note: this is advocacy for education in clear quantitative
> language and is a border-line off topic rant...
>
> The other day I read a paper from a student who used notation
> like 2e-4 in the text - blech!  I sent it back for revisions.

You sent it back for revisions just because the student used a
version of scientific notation that is routinely found in the
literature? Hm. I am _teaching_ my students to understand
scientific notation in the form 1e-20 etc., for example because
many programs in the field (including R) represent real numbers
using this version of scientific notation. I wouldn't penalize a
student for using it in a scientific text. That's what
proofreading is for (if the editors are picky).

> Lately I have noticed here and in other places this tendency to
> use floating point notation (also referred to as exponential
> notation) where scientific notation is appropriate, and vice
> versa.  The notation 2e-4 is a convenient way to express floating
> point numbers with a simple text string, but it is certainly not
> scientific notation.

Depends how formal and picky you wish to be. 2e-4 is the same as
$2\times10^{-4}$ to me as it is for most people, I guess (e.g. look at
the Wikipedia entry).
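
To make the equivalence concrete, here is a minimal sketch that turns a number into the typeset form. The helper name to_latex_sci and its digits argument are hypothetical, invented for this illustration; it only handles a single number at a time.

```r
# Hypothetical helper: render one number as typeset scientific notation.
to_latex_sci <- function(x, digits = 3) {
  # format() gives the "e" form, e.g. "2e-04"
  s <- format(x, scientific = TRUE, digits = digits)
  # split into mantissa and exponent at the literal "e"
  parts <- strsplit(s, "e", fixed = TRUE)[[1]]
  mantissa <- parts[1]
  exponent <- as.integer(parts[2])  # "-04" becomes -4
  sprintf("$%s\\times10^{%d}$", mantissa, exponent)
}

cat(to_latex_sci(2e-4), "\n")
```

Both spellings carry the same information; the conversion is purely a matter of presentation.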

> No wonder you had trouble googling it!

Nope. The problem with googling is that most of the pages you get when
googling for R do not refer to R as the statistical language.

Cheers,

January


-- 
 January Weiner 3  -+---
Division of Bioinformatics, University of Muenster  |  Schloßplatz 4
(+49)(251)8321634   |  D48149 Münster
http://www.uni-muenster.de/Evolution/ebb/   |  Germany



[R] read.table() and scientific notation

2006-10-10 Thread January Weiner
Dear all,

I am having trouble importing values written in scientific notation
using read.table(). I'm sure this is a frequent problem, as many
people in my lab have it as well, so I assume I am just failing to
google the right solution.

The problem is that, given a file like this:

a 1 2e-4
b 2 3e-8
...

the third column gets imported as a factor, or a string if I set the
as.is parameter of read.table to TRUE for this column. However, I just
want a simple numeric vector :-) I'm sure there is a simple trick for
this. If you can point me to the right function, or manual, I think I
should be able to find out the details myself.

Thanks in advance,
January

-- 
 January Weiner 3  -+---
Division of Bioinformatics, University of Muenster  |  Schloßplatz 4
(+49)(251)8321634   |  D48149 Münster
http://www.uni-muenster.de/Biologie.Botanik/ebb/|  Germany



Re: [R] read.table() and scientific notation

2006-10-10 Thread Gabor Grothendieck
Your example does not exhibit that behavior when I try it (below).
Can you provide a reproducible example following the style
shown here:

> Lines <- "a 1 2e-4
+ b 2 3e-8"
>
> DF <- read.table(textConnection(Lines))
> str(DF)
'data.frame':   2 obs. of  3 variables:
 $ V1: Factor w/ 2 levels "a","b": 1 2
 $ V2: int  1 2
 $ V3: num  2e-04 3e-08
> R.version.string # Windows XP
[1] "R version 2.4.0 (2006-10-03)"




Re: [R] read.table() and scientific notation

2006-10-10 Thread Prof Brian Ripley
On FC5 Linux:

gannet% cat  foo.dat
a 1 2e-4
b 2 3e-8
gannet% R
...
> read.table("foo.dat")
  V1 V2    V3
1  a  1 2e-04
2  b  2 3e-08
> sapply(read.table("foo.dat"), class)
       V1        V2        V3
 "factor" "integer" "numeric"

so please tell us your environment and give a reproducible example.  (This 
is using the OS function strtod, so it might be a deficiency in your OS's 
implementation of ISO C.)




-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] read.table() and scientific notation

2006-10-10 Thread January Weiner
Oh, thanks, that was hint enough :-) I see it now. It turns out that R
does not understand

e-10

...which stands for 1e-10 and is produced by some of the bioinformatics
applications that I use (notably BLAST). However, instead of
complaining about that, R silently treats the whole column as strings.

Is there a way to enforce a specific conversion in R (for example, to
be able to see where the errors are)?

January

-- 
 January Weiner 3  -+---
Division of Bioinformatics, University of Muenster  |  Schloßplatz 4
(+49)(251)8321634   |  D48149 Münster
http://www.uni-muenster.de/Biologie.Botanik/ebb/|  Germany

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.table() and scientific notation

2006-10-10 Thread Martin C. Martin
I think the colClasses argument to read.table() is what you need. 
Either that, or explicitly cast columns in the data.frame that's 
returned by read.table().  That's how you get data types that aren't 
directly supported by read.table(), like various date formats.

- Martin
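
A minimal sketch of the explicit-conversion route suggested above, assuming a recent R version (the text= argument of read.table() did not exist in 2006; with a real file you would pass the file name instead). The sample lines are made up to mirror the thread, and the regular expression that repairs a missing mantissa is the same idea as the one used elsewhere in this thread, not something read.table() provides.

```r
# Hypothetical sample data: BLAST-style "e-10" is meant as 1e-10.
lines <- c("a 1 3e-8", "b 2 1e+10", "c 3 e-10", "d 4 e+3")

# Read every column as text so nothing is silently turned into a factor.
df <- read.table(text = lines, colClasses = "character")

# Values that as.numeric() cannot parse become NA; report which rows they are in.
parsed <- suppressWarnings(as.numeric(df$V3))
bad <- which(is.na(parsed) & !is.na(df$V3))
print(df[bad, ])

# Repair a missing mantissa ("e-10" -> "1e-10"), then convert explicitly.
df$V3 <- as.numeric(sub("^(-?)e", "\\11e", df$V3))
df$V2 <- as.integer(df$V2)
str(df)
```

Reading as character first keeps the offending strings visible, so the rows that need fixing can be inspected before any conversion is applied.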



Re: [R] read.table() and scientific notation

2006-10-10 Thread Prof Brian Ripley
On Tue, 10 Oct 2006, January Weiner wrote:

> Oh, thanks, that was hint enough :-) I see it now. It turns out that R
> does not understand
>
> e-10
>
> ...which stands for 1e-10 and is produced by some of the bioinformatic
> applications that I use (notably BLAST).

And that is not standard C notation.

> However, R instead of being
> verbose on that just assumes that the whole column is a string.
>
> Is there a way to enforce a specific conversion in R (for example, to
> be able to see where the errors are?).

Please study ?read.table, especially 'colClasses'.
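
As a rough illustration of what 'colClasses' buys you here: declaring the third column numeric makes the read fail loudly on the first malformed value instead of quietly falling back to strings. This is only a sketch, assuming a recent R version (for the text= argument) and made-up sample lines; tryCatch() is used so the error message can be captured and inspected.

```r
# Hypothetical sample data with one malformed value in the third column.
lines <- c("a 1 2e-4", "b 2 e-10")

# Force the column types; scan() stops at the first value it cannot parse.
result <- tryCatch(
  read.table(text = lines,
             colClasses = c("character", "integer", "numeric")),
  error = function(e) conditionMessage(e)
)

# On failure, result holds the error message naming the offending value.
print(result)
```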



Re: [R] read.table() and scientific notation

2006-10-10 Thread Alex Brown
A cheeky solution by subverting the coerce mechanism and read.table:

# Install a coerce function which can fix the e+10 syntax for an
# imaginary class myDouble:

> setAs("character", "myDouble",
+       function(from) as.double(sub('^(-?)e', '\\11e', from)))
Warning message:
in the method signature for function 'coerce' no definition for class: “myDouble” in: matchSignature(signature, fdef, where)

# Load some data:

> Lines <- scan(sep="\n", what="")
a 1 3e-8
b 2 1e+10
c 3 e-10
d 4 e+3
e 5 e+1

# Process it without using the imaginary class - use a real double
# instead to see what happens.
# Note I've used textConnection(Lines) here, where your filename
# would go.

> T <- read.table(textConnection(Lines),
+                 colClasses=list("character", "integer", "double"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
        scan() expected 'a real', got 'e-10'

# Process it, specifying the imaginary class myDouble:

> T <- read.table(textConnection(Lines),
+                 colClasses=list("character", "integer", "myDouble"))
> T
  V1 V2    V3
1  a  1 3e-08
2  b  2 1e+10
3  c  3 1e-10
4  d  4 1e+03
5  e  5 1e+01

> lapply(T, class)
$V1
[1] "character"

$V2
[1] "integer"

$V3
[1] "numeric"


Someone's bound to shoot me down for hackery here :-)

-Alex
