Re: [R] splitting very long character string
Hello,

thanks a lot for your help on splitting the string to get a numeric vector. I'm now writing the string to a temporary file and reading it back in via scan; this is fast enough for me:

  library(XML)
  ...
  tmp = xmlElementsByTagName(root, 'tofDataSample', recursive=TRUE)
  tmp = xmlValue(tmp[[1]])
  cat(paste('splitting', nchar(tmp), 'string ...\n'))
  tmp.file = tempfile()
  sink(tmp.file)
  cat(tmp)
  sink()
  tmp = scan(tmp.file)
  unlink(tmp.file)
  cat(paste('splitting done,', length(tmp), 'elements\n'))

thanks again and kind regards,

Arne

-----Original Message-----
From: john seers (IFR) [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 01, 2006 17:01
To: Muller, Arne PH/FR; r-help@stat.math.ethz.ch
Subject: RE: [R] splitting very long character string

Hi Arne

If you are reading in from files and they contain just one number per line, it would be more efficient to use scan directly. See ?scan. For example:

  > filen <- "C:/temp/tt.txt"
  > i <- scan(filen)
  Read 5 items
  > i
  [1]   12345  564376    5674 6356656    5666

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 01 November 2006 15:47
To: r-help@stat.math.ethz.ch
Subject: [R] splitting very long character string

Hello,

I have a very long character string (500k characters) that needs to be split by '\n', resulting in an array of about 60k numbers. The help on strsplit says to use perl=TRUE to get better performance, but it still takes several minutes to split this string.

The massive string is the return value of a call to xmlElementsByTagName from the XML library and looks like this:

  12345
  564376
  5674
  6356656
  5666

I have to read about a hundred of these files and was wondering whether there's a more efficient way to turn this string into an array of numerics. Any ideas?

thanks a lot for your help and kind regards,

Arne

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] splitting very long character string
You could use the file= argument of cat to avoid the two calls to sink:

  cat(tmp, file = tmp.file)

On 11/2/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> I'm now writing the string to a temporary file and reading it back in
> via scan; this is fast enough for me:
>
>   tmp.file = tempfile(); sink(tmp.file); cat(tmp); sink();
>   tmp = scan(tmp.file); unlink(tmp.file);
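The cat(file=) tip and the temp-file approach it refers to could be wrapped into a small helper along the following lines. This is only a sketch: the function name string_to_numeric and the sample string are made up for illustration and do not come from the thread.

```r
# Hypothetical helper (name not from the thread): write a newline-separated
# string of numbers to a temporary file and read it back with scan().
string_to_numeric <- function(txt) {
  tmp.file <- tempfile()
  on.exit(unlink(tmp.file))                  # clean up even if scan() fails
  cat(txt, "\n", file = tmp.file, sep = "")  # file= replaces sink(); cat(); sink()
  scan(tmp.file, quiet = TRUE)               # default what = double(): numeric vector
}

# Made-up sample in the same shape as the data shown in the original post
x <- string_to_numeric("12345\n564376\n5674\n6356656\n5666")
```

Using on.exit() for the unlink() keeps temporary files from piling up when the same call is repeated for many input strings.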
[R] splitting very long character string
Hello,

I have a very long character string (500k characters) that needs to be split by '\n', resulting in an array of about 60k numbers. The help on strsplit says to use perl=TRUE to get better performance, but it still takes several minutes to split this string.

The massive string is the return value of a call to xmlElementsByTagName from the XML library and looks like this:

  ...
  12345
  564376
  5674
  6356656
  5666
  ...

I have to read about a hundred of these files and was wondering whether there's a more efficient way to turn this string into an array of numerics. Any ideas?

thanks a lot for your help and kind regards,

Arne
Re: [R] splitting very long character string
Hi Arne

If you are reading in from files and they contain just one number per line, it would be more efficient to use scan directly. See ?scan. For example:

  > filen <- "C:/temp/tt.txt"
  > i <- scan(filen)
  Read 5 items
  > i
  [1]   12345  564376    5674 6356656    5666
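If the data really were in plain text files with one number per line, as assumed in the reply above, reading a whole batch of them could look roughly like this. The directory, file names and contents are placeholders created for the demo; values embedded in XML, as in the original question, would still have to be extracted first.

```r
# Demo setup: create a few small files with one number per line
# (placeholder data; real file names and contents will differ).
demo.dir <- tempfile("scan-demo")
dir.create(demo.dir)
for (k in 1:3) {
  writeLines(as.character(k * c(1, 10, 100)),
             file.path(demo.dir, sprintf("data%02d.txt", k)))
}

# Read every file with scan(); the result is a list of numeric vectors,
# one element per file, named after the file it came from.
files <- list.files(demo.dir, pattern = "\\.txt$", full.names = TRUE)
vals <- lapply(files, scan, quiet = TRUE)
names(vals) <- basename(files)

unlink(demo.dir, recursive = TRUE)  # remove the demo files again
```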
Re: [R] splitting very long character string
On Wed, 2006-11-01 at 16:47 +0100, [EMAIL PROTECTED] wrote:
> I have a very long character string (500k characters) that needs to be
> split by '\n', resulting in an array of about 60k numbers.
>
> I have to read about a hundred of these files and was wondering whether
> there's a more efficient way to turn this string into an array of
> numerics. Any ideas?

  > Vec <- sample(c(0:9, "\n"), 500000, replace = TRUE)

  > str(Vec)
   chr [1:500000] "7" "0" "9" "6" "5" "3" "1" "9" ...

  > table(Vec)
  Vec
     \n     0     1     2     3     4     5     6     7     8     9
  45432 45723 45641 45526 45460 45284 45378 45392 45374 45314 45476

  > sink("Vec.txt")
  > cat(Vec)
  > sink()

First 10 lines of Vec.txt:

  7 0 9 6 5 3 1 9 8 1 8 3 4 2
  1 2 2
  3 7 7 6 8 3 4 7 4
  9 2 1 9 8 7 2 0 9 4 3
  9 3 5 2 2 5 8 0 5 4 5 6 1 5 8 7 4 1 2 8 3 2 6 4 9 4 1 6 8 5 0 8 8 8 5 3 0 5 3 5 4 8 5 4 3
  9
  5 3 6 5 8 9 7 6 9
  5 8
  2 4 6
  5

  > system.time(Vec.Split <- scan("Vec.txt", sep = "\n"))
  Read 41276 items
  [1] 0.180 0.004 0.186 0.000 0.000

  > str(Vec.Split)
   num [1:41276] 7.10e+13 1.22e+02 3.78e+08 9.22e+10 9.35e+44 ...

  > sprintf("%.0f", Vec.Split[1:10])
   [1] "70965319818342"
   [2] "122"
   [3] "377683474"
   [4] "92198720943"
   [5] "935225805456158720742405574866620654670577664"
   [6] "9"
   [7] "536589769"
   [8] "58"
   [9] "246"
  [10] "5"

Does that help?

Marc Schwartz
Re: [R] splitting very long character string
On Wed, 1 Nov 2006, [EMAIL PROTECTED] wrote:

> Hello,
>
> I've a very long character array (500k characters) that need to split
> by '\n' resulting in an array of about 60k numbers. The help on strsplit
> says to use perl=TRUE to get better performance, but still it takes
> several minutes to split this string.

Can't you use fixed=TRUE, since you do not have a regular expression?

Nevertheless, if you are going to be creating about 60k character strings, the overhead in creating the strings will be very considerable. If you just want the numbers, using an anonymous file() connection to write out the string and then using scan() might well be a lot more efficient.

> The massive string is the return value of a call to
> xmlElementsByTagName from the XML library and looks like this:
                                    ^^^^^^^
'package' or your own C code accessing libxml?

> ...
> 12345
> 564376
> 5674
> 6356656
> 5666
> ...

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
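The two suggestions in this reply, strsplit with fixed=TRUE and an anonymous file() connection read by scan(), might look as follows. This is a sketch on made-up data of roughly the size described in the question; the variable names are illustrative, and the third variant relies on the text argument of scan(), which current versions of R provide but which is not mentioned anywhere in the thread.

```r
set.seed(1)
# Made-up test string: 60000 integers joined by newlines
big <- paste(sample(1e6, 60000, replace = TRUE), collapse = "\n")

# 1. strsplit() with fixed = TRUE: the separator is matched literally,
#    so no regular-expression engine is involved.
x.split <- as.numeric(strsplit(big, "\n", fixed = TRUE)[[1]])

# 2. Anonymous file() connection: write the string out, then scan() it.
con <- file()                          # empty description = anonymous file
cat(big, "\n", file = con, sep = "")
x.scan <- scan(con, quiet = TRUE)
close(con)

# 3. Assumption: an R version whose scan() has a text argument.
x.text <- scan(text = big, quiet = TRUE)

# All three routes should give the same numeric vector.
stopifnot(identical(x.split, x.scan), identical(x.split, x.text))
```

Wrapping each variant in system.time() shows how they compare on a given machine; no timings are claimed here.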