Re: [R] Regex matching that gives byte offset?

2009-11-02 Thread Johannes Graumann
On Monday 02 November 2009 13:41:45 Prof Brian Ripley wrote:
> On Mon, 2 Nov 2009, Johannes Graumann wrote:
> > Hmmm ... that should do it, thanks. But how would one use this on a file
> > without reading it into memory completely?
> 
> ?file, ?readLines, ?readBin
> 
> will tell you about connections.
... all of which I can only read line by line, and a regexpr() on a single
line will not give me the absolute offset in the file.
"grep -buo" on the Unix command line is really fast for this. If I can't find
a native R equivalent, I'm of a mind to do this via a system call: ugly and
not portable, but SOOO fast. Is something comparable possible in R?
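A minimal sketch of how the absolute offset can still be recovered when reading line by line. It assumes LF ("\n") line endings and keeps a running byte count of the lines already read: nchar(type = "bytes") plus one byte for each newline that readLines() strips. That count is added to the per-line positions from gregexpr(useBytes = TRUE). The function name line_match_offsets() is hypothetical, not part of base R. Offsets are 0-based to match what `grep -b -o` prints, and, as with grep, a match that spans two lines is not found.

```r
# Hypothetical helper (not part of base R): 0-based byte offsets of every
# match of `pattern` in the file `path`, like `grep -b -o`, reading `n`
# lines at a time so the whole file is never held in memory.
# Assumes LF ("\n") line endings: readLines() also strips "\r\n", which
# would make each line one byte longer than counted here.
line_match_offsets <- function(path, pattern, n = 10000L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  offsets <- numeric(0)  # doubles, so offsets beyond 2^31 do not overflow
  line_start <- 0        # byte offset of the first line in the current batch
  repeat {
    lines <- readLines(con, n = n, warn = FALSE)
    if (length(lines) == 0L) break
    len <- nchar(lines, type = "bytes") + 1   # +1 for the stripped "\n"
    starts <- line_start + cumsum(len) - len  # absolute start of each line
    hits <- gregexpr(pattern, lines, useBytes = TRUE)
    for (i in seq_along(lines)) {
      h <- as.numeric(hits[[i]])              # 1-based byte positions, or -1
      if (h[1] != -1) offsets <- c(offsets, starts[i] + h - 1)
    }
    line_start <- line_start + sum(len)
  }
  offsets
}

tmp <- tempfile()
writeBin(charToRaw("ab<x>\n<x>cd\n"), tmp)
line_match_offsets(tmp, "<x>")  # 2 6
```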

Joh

> 
> > Joh
> >
> > On Wednesday 28 October 2009 16:29:00 Prof Brian Ripley wrote:
> >> Do you mean like regexpr() (on the same help page)?
> >>
> >> Depending on your locale, you might actually prefer the character
> >> offset: if you want to match in an MBCS and have byte offsets you will
> >> need to work a bit harder if useBytes=TRUE is not sufficient for you.
> >>
> >> On Wed, 28 Oct 2009, Johannes Graumann wrote:
> >>> Hi,
> >>>
> >>> Is there any way of doing 'grep' or something like it on the content
> >>> of a text file and extracting the byte position of the match in the
> >>> file? I'm facing the need to access rather largish (>600MB) XML files
> >>> and would like to be able to index them ...
> >>>
> >>> Thanks for any help or flogging,
> >>>
> >>> Joh
> >>>
> >>> __
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html and provide commented,
> >>> minimal, self-contained, reproducible code.
>


Re: [R] Regex matching that gives byte offset?

2009-11-02 Thread Prof Brian Ripley

On Mon, 2 Nov 2009, Johannes Graumann wrote:
> Hmmm ... that should do it, thanks. But how would one use this on a file
> without reading it into memory completely?

?file, ?readLines, ?readBin

will tell you about connections.
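A minimal sketch of the readBin() route, for a fixed search string only. It reads the file in raw chunks, searches each chunk with gregexpr(fixed = TRUE, useBytes = TRUE), and carries the last (pattern length - 1) bytes into the next chunk so that a match straddling a chunk boundary is found exactly once. The function fixed_match_offsets() and its chunk_size default are hypothetical. It assumes the file contains no NUL bytes, because rawToChar() stops with an error on embedded NULs. A general regex has no fixed maximum match length, so this overlap trick does not carry over to it directly.

```r
# Hypothetical helper: 0-based byte offsets of a *fixed* string `pattern`
# in `path`, scanning raw chunks read with readBin(). The last
# (pattern length - 1) bytes of each buffer are carried into the next one,
# so a match crossing a chunk boundary is found exactly once.
# Assumes the file has no NUL bytes (rawToChar() errors on them).
fixed_match_offsets <- function(path, pattern, chunk_size = 1048576L) {
  plen <- nchar(pattern, type = "bytes")
  con <- file(path, open = "rb")
  on.exit(close(con))
  offsets <- numeric(0)
  carry <- raw(0)  # tail of the previous buffer
  base <- 0        # file offset of the first byte of the current buffer
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    buf <- c(carry, chunk)
    hits <- gregexpr(pattern, rawToChar(buf), fixed = TRUE, useBytes = TRUE)[[1]]
    if (hits[1] != -1L) offsets <- c(offsets, base + as.numeric(hits) - 1)
    keep <- min(plen - 1L, length(buf))        # too short to hold a full match
    carry <- buf[length(buf) - keep + seq_len(keep)]
    base <- base + length(buf) - keep
  }
  offsets
}

tmp <- tempfile()
writeBin(charToRaw("ab<x>\n<x>cd\n"), tmp)
fixed_match_offsets(tmp, "<x>", chunk_size = 4L)  # 2 6, despite tiny chunks
```

In practice chunk_size would be large (the default reads 1 MiB at a time); the tiny value above only exercises the boundary handling.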


> Joh
>
> On Wednesday 28 October 2009 16:29:00 Prof Brian Ripley wrote:
>> Do you mean like regexpr() (on the same help page)?
>>
>> Depending on your locale, you might actually prefer the character
>> offset: if you want to match in an MBCS and have byte offsets you will
>> need to work a bit harder if useBytes=TRUE is not sufficient for you.
>>
>> On Wed, 28 Oct 2009, Johannes Graumann wrote:
>>> Hi,
>>>
>>> Is there any way of doing 'grep' or something like it on the content of
>>> a text file and extracting the byte position of the match in the file?
>>> I'm facing the need to access rather largish (>600MB) XML files and would
>>> like to be able to index them ...
>>>
>>> Thanks for any help or flogging,
>>>
>>> Joh

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] Regex matching that gives byte offset?

2009-11-02 Thread Johannes Graumann
Hmmm ... that should do it, thanks. But how would one use this on a file 
without reading it into memory completely?

Joh


On Wednesday 28 October 2009 16:29:00 Prof Brian Ripley wrote:
> Do you mean like regexpr() (on the same help page)?
> 
> Depending on your locale, you might actually prefer the character
> offset: if you want to match in an MBCS and have byte offsets you will
> need to work a bit harder if useBytes=TRUE is not sufficient for you.
> 
> On Wed, 28 Oct 2009, Johannes Graumann wrote:
> > Hi,
> >
> > Is there any way of doing 'grep' or something like it on the content of
> > a text file and extracting the byte position of the match in the file?
> > I'm facing the need to access rather largish (>600MB) XML files and would
> > like to be able to index them ...
> >
> > Thanks for any help or flogging,
> >
> > Joh
> >
>


Re: [R] Regex matching that gives byte offset?

2009-10-28 Thread Prof Brian Ripley

Do you mean like regexpr() (on the same help page)?

Depending on your locale, you might actually prefer the character 
offset: if you want to match in an MBCS and have byte offsets you will 
need to work a bit harder if useBytes=TRUE is not sufficient for you.
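To illustrate the character/byte distinction, here is a small sketch. It assumes the string is UTF-8 encoded, which the "\u00e9" escape guarantees. The accented "e" is one character but two bytes, so regexpr() reports different positions with and without useBytes = TRUE. The helper char_to_byte() is hypothetical; it shows one way of "working a bit harder" when only the character offset is at hand: count the bytes of everything before the match.

```r
# Assumes a UTF-8 string: "\u00e9" (e-acute) is 1 character but 2 bytes.
x <- "caf\u00e9 <spectrum>"

char_pos <- regexpr("<spectrum", x, fixed = TRUE)                   # character offset
byte_pos <- regexpr("<spectrum", x, fixed = TRUE, useBytes = TRUE)  # byte offset

# Hypothetical helper: turn a 1-based character offset into a 1-based byte
# offset by counting the bytes of everything that precedes it.
char_to_byte <- function(x, pos) nchar(substr(x, 1L, pos - 1L), type = "bytes") + 1L

as.integer(char_pos)                   # 6
as.integer(byte_pos)                   # 7
char_to_byte(x, as.integer(char_pos))  # 7
```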


On Wed, 28 Oct 2009, Johannes Graumann wrote:
> Hi,
>
> Is there any way of doing 'grep' or something like it on the content of a
> text file and extracting the byte position of the match in the file? I'm
> facing the need to access rather largish (>600MB) XML files and would like
> to be able to index them ...
>
> Thanks for any help or flogging,
>
> Joh

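Once byte offsets are known (for example the 0-based offsets printed by `grep -b`), indexing a large file means jumping straight to a record instead of re-reading everything before it. Below is a minimal sketch using seek() on a binary-mode file connection; read_at() is a hypothetical name. ?seek itself warns that file positioning on Windows has been unreliable, so treat this as a Unix-oriented sketch.

```r
# Hypothetical helper: return `n_bytes` bytes of `path` starting at the
# 0-based byte `offset`, without reading anything before it.
read_at <- function(path, offset, n_bytes) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = offset, origin = "start")  # jump straight to the record
  rawToChar(readBin(con, what = "raw", n = n_bytes))
}

tmp <- tempfile()
writeBin(charToRaw("ab<x>\n<x>cd\n"), tmp)
read_at(tmp, 6, 5)  # "<x>cd"
```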