Re: [R] Regex matching that gives byte offset?
On Monday 02 November 2009 13:41:45 Prof Brian Ripley wrote:
> On Mon, 2 Nov 2009, Johannes Graumann wrote:
> > Hmmm ... that should do it, thanks. But how would one use this on a file
> > without reading it into memory completely?
>
> ?file, ?readLines, ?readBin
>
> will tell you about connections.

... all of which only let me read line by line, and a regexpr() on a single line will not give me the absolute offset in the file.

"grep -buo" on the Unix command line is really fast for this. If I can't find a native R equivalent, I'm of a mind to do this via a system call: ugly and not portable, but SOOO fast ... is it possible in R?

Joh

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
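As a rough illustration of the system-call route mentioned above, here is a minimal sketch that shells out to grep from R and parses the byte offsets. It assumes a Unix-like system with a grep on the PATH that supports -b and -o (GNU grep does); the function name `grep_byte_offsets` is hypothetical and not part of R. The obsolescent -u flag is left out here.

```r
# Hypothetical helper: 0-based byte offsets of every fixed-string match,
# obtained by calling the external grep program (assumed to be on the PATH).
grep_byte_offsets <- function(path, pattern) {
  # -b: print the byte offset, -o: print only the matching part,
  # -F: treat the pattern as a fixed string, -e: protect odd patterns.
  args <- c("-b", "-o", "-F", "-e", shQuote(pattern), shQuote(path))
  out <- suppressWarnings(
    system2("grep", args, stdout = TRUE, stderr = FALSE)
  )
  # Each output line has the form "<offset>:<match>"; keep the offset only.
  as.numeric(sub(":.*$", "", out))
}

# Usage on a small temporary file (content made up for this example).
tmp <- tempfile()
writeBin(charToRaw("aa<x>bb<x>cc<x>"), tmp)
offsets <- grep_byte_offsets(tmp, "<x>")
print(offsets)
```

The offsets are 0-based, as grep reports them, so they can be passed directly to seek() on a connection opened in binary mode. The approach is not portable to systems without a suitable grep.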
Re: [R] Regex matching that gives byte offset?
On Mon, 2 Nov 2009, Johannes Graumann wrote:

> Hmmm ... that should do it, thanks. But how would one use this on a file
> without reading it into memory completely?

?file, ?readLines, ?readBin

will tell you about connections.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
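To make the connection-based suggestion concrete, here is a minimal sketch that reads a file in fixed-size chunks with readBin() and reports absolute byte offsets of a fixed string. The function name `find_byte_offsets`, the `chunk_size` parameter and its default are assumptions made for this example, not anything from the thread. It also assumes the file contains no embedded NUL bytes, since rawToChar() rejects them.

```r
# Hypothetical helper: 0-based byte offsets of a fixed pattern in a file,
# reading chunk_size bytes at a time instead of loading the whole file.
find_byte_offsets <- function(path, pattern, chunk_size = 1024 * 1024) {
  pat_len <- nchar(pattern, type = "bytes")
  con <- file(path, open = "rb")
  on.exit(close(con))
  offsets <- numeric(0)
  carry <- raw(0)   # tail of the previous buffer, to catch boundary matches
  consumed <- 0     # number of file bytes that precede `carry`
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break
    buf <- c(carry, chunk)
    # Byte-wise fixed matching; positions are 1-based within `buf`.
    hits <- gregexpr(pattern, rawToChar(buf), fixed = TRUE, useBytes = TRUE)[[1]]
    if (hits[1] != -1) {
      offsets <- c(offsets, consumed + as.integer(hits) - 1)
    }
    # Keep pat_len - 1 bytes: enough to complete a match that straddles the
    # boundary, too short to contain a whole match (so no duplicates).
    keep <- min(pat_len - 1, length(buf))
    carry <- utils::tail(buf, keep)
    consumed <- consumed + length(buf) - keep
  }
  offsets
}

# Usage with a deliberately tiny chunk size to exercise the boundary logic.
tmp <- tempfile()
writeBin(charToRaw("aa<x>bb<x>cc<x>"), tmp)
print(find_byte_offsets(tmp, "<x>", chunk_size = 4))
```

Offsets are accumulated as doubles so that files larger than 2 GB do not overflow R's integer type. A regular-expression pattern rather than a fixed string would need a larger overlap, because the match length is not known in advance.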
Re: [R] Regex matching that gives byte offset?
Hmmm ... that should do it, thanks. But how would one use this on a file without reading it into memory completely?

Joh

On Wednesday 28 October 2009 16:29:00 Prof Brian Ripley wrote:
> Do you mean like regexpr() (on the same help page)?
>
> Depending on your locale, you might actually prefer the character
> offset: if you want to match in a MBCS and have byte offsets you will
> need to work a bit harder if useBytes=TRUE is not sufficient for you.
Re: [R] Regex matching that gives byte offset?
Do you mean like regexpr() (on the same help page)?

Depending on your locale, you might actually prefer the character offset: if you want to match in a multibyte character set (MBCS) and have byte offsets, you will need to work a bit harder if useBytes=TRUE is not sufficient for you.

On Wed, 28 Oct 2009, Johannes Graumann wrote:

> Hi,
>
> Is there any way of doing 'grep' or something like it on the content of
> a text file and extract the byte positioning of the match in the file?
> I'm facing the need to access rather largish (>600MB) XML files and would
> like to be able to index them ...
>
> Thanks for any help or flogging,
>
> Joh

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
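For illustration, a minimal sketch of the character-versus-byte distinction made above. The sample string and pattern are invented for this example; the only functions used are regexpr(), substr() and nchar() from base R.

```r
# A UTF-8 string with one multibyte character: "é" occupies two bytes.
x <- "caf\u00e9 <entry id='1'/>"

# Default matching reports the position in characters (1-based).
char_pos <- as.integer(regexpr("<entry", x, fixed = TRUE))

# useBytes = TRUE matches byte-wise and reports the position in bytes.
byte_pos <- as.integer(regexpr("<entry", x, fixed = TRUE, useBytes = TRUE))

# The byte position can also be derived from the character position by
# counting the bytes in the text that precedes the match.
byte_from_chars <- nchar(substr(x, 1, char_pos - 1), type = "bytes") + 1

print(c(char = char_pos, byte = byte_pos, derived = byte_from_chars))
```

Both positions are 1-based and relative to the string passed in, not to a file. Turning them into absolute file offsets means adding the number of bytes that precede that string in the file.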