Re: [R] splitting very long character string

2006-11-02 Thread Arne.Muller
Hello,

thanks a lot for your help on splitting the string to get a numeric vector. I'm 
now writing the string to a tempfile and reading it in via scan - this is fast 
enough for me:

library(XML)

...
# find the 'tofDataSample' elements below root and take the text of the first one
tmp = xmlElementsByTagName(root, 'tofDataSample', recursive=TRUE)
tmp = xmlValue(tmp[[1]])
cat(paste('splitting', nchar(tmp), 'string ...\n'))
# write the string to a temporary file, then read the numbers back with scan
tmp.file = tempfile()
sink(tmp.file)
cat(tmp)
sink()
tmp = scan(tmp.file)
unlink(tmp.file)
cat(paste('splitting done,', length(tmp), 'elements\n'))

thanks again
and kind regards,

Arne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting very long character string

2006-11-02 Thread Gabor Grothendieck
You could use the file= argument on cat to avoid the two calls to sink:

cat(tmp, file = tmp.file)
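
For reference, the whole round trip through a temporary file might look like the sketch below. The string `txt` is made up to stand in for the value returned by xmlValue(), and the helper name `string_to_numeric` is hypothetical; neither comes from the original code.

```r
# Hypothetical stand-in for the long string returned by xmlValue()
txt <- paste(c(12345, 564376, 5674, 6356656, 5666), collapse = "\n")

# Hypothetical helper: write the string to a temporary file and let
# scan() parse the numbers
string_to_numeric <- function(s) {
  tmp.file <- tempfile()
  on.exit(unlink(tmp.file))        # remove the file even if scan() fails
  cat(s, "\n", file = tmp.file)    # file= replaces the sink()/cat()/sink() sequence
  scan(tmp.file, quiet = TRUE)     # whitespace, including "\n", separates the values
}

x <- string_to_numeric(txt)
```

With about a hundred strings, something like lapply(list.of.strings, string_to_numeric) would apply the same steps to each one (list.of.strings is again only an assumed name).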



[R] splitting very long character string

2006-11-01 Thread Arne.Muller
Hello,

I've a very long character array (500k characters) that I need to split by '\n', 
resulting in an array of about 60k numbers. The help on strsplit says to use 
perl=TRUE to get better performance, but still it takes several minutes to split 
this string.

The massive string is the return value of a call to xmlElementsByTagName from 
the XML library and looks like this:

...
12345
564376
5674
6356656
5666
...

I've to read about a hundred of these files and was wondering whether there's a 
more efficient way to turn this string into an array of numerics. Any ideas?

thanks a lot for your help
and kind regards,

Arne






Re: [R] splitting very long character string

2006-11-01 Thread john seers (IFR)

Hi Arne

If you are reading in from files and they are just one number per line
it would be more efficient to use scan directly.  ?scan

For example:

> filen <- "C:/temp/tt.txt"
> i <- scan(filen)
Read 5 items
> i
[1]   12345  564376    5674 6356656    5666
 


 




Re: [R] splitting very long character string

2006-11-01 Thread Marc Schwartz

> Vec <- sample(c(0:9, "\n"), 500000, replace = TRUE)

> str(Vec)
 chr [1:500000] "7" "0" "9" "6" "5" "3" "1" "9" ...

> table(Vec)
Vec
   \n     0     1     2     3     4     5     6     7     8     9
45432 45723 45641 45526 45460 45284 45378 45392 45374 45314 45476


> sink("Vec.txt")
> cat(Vec)
> sink()

First 10 lines of Vec.txt:

7 0 9 6 5 3 1 9 8 1 8 3 4 2 
 1 2 2 
 3 7 7 6 8 3 4 7 4 
 9 2 1 9 8 7 2 0 9 4 3 
 9 3 5 2 2 5 8 0 5 4 5 6 1 5 8 7 4 1 2 8 3 2 6 4 9 4 1 6 8 5 0 8 8 8 5 3 0 5 3 
5 4 8 5 4 3 
 9 
 5 3 6 5 8 9 7 6 9 
 5 8 
 2 4 6 
 5 

> system.time(Vec.Split <- scan("Vec.txt", sep = "\n"))
Read 41276 items
[1] 0.180 0.004 0.186 0.000 0.000

> str(Vec.Split)
 num [1:41276] 7.10e+13 1.22e+02 3.78e+08 9.22e+10 9.35e+44 ...

> sprintf("%.0f", Vec.Split[1:10])
 [1] "70965319818342"
 [2] "122"
 [3] "377683474"
 [4] "92198720943"
 [5] "935225805456158720742405574866620654670577664"
 [6] "9"
 [7] "536589769"
 [8] "58"
 [9] "246"
[10] "5"
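
A comparable check for data shaped like the original question (one number per line, about 60k values) could be sketched as follows. The seed, the value range and the variable names are arbitrary choices for illustration, and the timings will differ by machine and R version, so none are shown.

```r
set.seed(1)                                    # arbitrary seed, for reproducibility
nums <- sample.int(1e6, 60000, replace = TRUE) # made-up data, about 60k numbers
txt  <- paste(nums, collapse = "\n")           # one long string, one number per line

# Approach 1: split on the literal newline and convert each piece
t.split <- system.time(
  x.split <- as.numeric(strsplit(txt, "\n", fixed = TRUE)[[1]])
)

# Approach 2: write the string to a file and read it back with scan()
tmp.file <- tempfile()
cat(txt, "\n", file = tmp.file)
t.scan <- system.time(x.scan <- scan(tmp.file, quiet = TRUE))
unlink(tmp.file)

print(t.split)
print(t.scan)
```

Both approaches should give the same numeric vector; only the elapsed times are expected to differ.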


Does that help?

Marc Schwartz



Re: [R] splitting very long character string

2006-11-01 Thread Prof Brian Ripley
On Wed, 1 Nov 2006, [EMAIL PROTECTED] wrote:

> Hello,
>
> I've a very long character array (500k characters) that I need to split
> by '\n', resulting in an array of about 60k numbers. The help on strsplit
> says to use perl=TRUE to get better performance, but still it takes several
> minutes to split this string.

Can't you use fixed=TRUE since you do not have a regular expression?
Nevertheless, if you are going to be creating about 60k character strings, 
the overhead in creating the strings will be very considerable.

If you just want the numbers, using an anonymous file() connection to 
write out the string and then using scan() might well be a lot more 
efficient.
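
Both suggestions can be sketched on a small made-up string as follows; `txt` is only a stand-in for the real value, and the variable names are chosen for illustration.

```r
# Hypothetical stand-in for the long newline-separated string
txt <- paste(c(12345, 564376, 5674, 6356656, 5666), collapse = "\n")

# Option 1: strsplit() with fixed = TRUE, since "\n" is a literal
# separator and not a regular expression
x1 <- as.numeric(strsplit(txt, "\n", fixed = TRUE)[[1]])

# Option 2: anonymous file() connection, written once and read with scan()
con <- file()                    # file() without a description is an anonymous file
cat(txt, "\n", file = con)       # write the string into the connection
x2 <- scan(con, quiet = TRUE)    # parse the numbers straight from the connection
close(con)                       # closing the connection discards the anonymous file
```

Option 2 avoids creating the 60k intermediate character strings, which is the overhead mentioned above.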

> The massive string is the return value of a call to xmlElementsByTagName
> from the XML library and looks like this:
               ^^^^^^^
'package' or your own C code accessing libxml?



-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
