Re: Grammars and biological data formats

2014-08-16 Thread Martin D Kealey
Hmmm, what about just implementing mmap-as-string? Then, assuming the parsing process is somewhat stream-like, the OS will take care of swapping in chunks as you need them. You don't even need anything special to support backtracking -- it's just a memory address, after all. -Martin On Thu, 14
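To illustrate the mmap-as-string idea, here is a minimal sketch in Python rather than Perl 6, since the thread itself contains no code. It assumes a simplified FASTA layout (a `>` header line followed by sequence lines). The names `FASTA_RECORD` and `iter_fasta_mmap` are hypothetical and do not come from the thread, and a regular expression stands in for a real grammar. The point is only that the matcher sees the file as one large byte string while the OS pages data in on demand.

```python
import mmap
import re

# Hypothetical, simplified FASTA record pattern (an assumption for this sketch):
# a '>' header line, then everything up to the next '>' as sequence data.
FASTA_RECORD = re.compile(rb">([^\n]*)\n([^>]*)")

def iter_fasta_mmap(path):
    """Yield (header, sequence) pairs from a memory-mapped FASTA file."""
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; pages are loaded lazily by the OS.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The regex engine scans the mapping like an ordinary bytes object,
            # so backtracking is just a move to an earlier offset.
            for m in FASTA_RECORD.finditer(mm):
                header = m.group(1).decode("ascii")
                seq = m.group(2).replace(b"\n", b"").decode("ascii")
                yield header, seq
```

Only the records actually yielded are copied into Python objects; the rest of the file stays in the mapping.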

Re: Grammars and biological data formats

2014-08-16 Thread Fields, Christopher J
Yes, that looks like an even better option. I see that this is implemented in p5 as File::Map, which is a nice portable option. Chris On Aug 16, 2014, at 7:51 AM, Martin D Kealey mar...@kurahaupo.gen.nz wrote: Hmmm, what about just implementing mmap-as-string? Then, assuming the

Re: Grammars and biological data formats

2014-08-14 Thread Carl Mäsak
I was going to pipe in and say that I wouldn't wait around for Cat, I'd write something that reads chunks and then parses that. It'll be a bit more code, but it'll work today. But I see you reached that conclusion already. :) Lately I've found myself writing more and more grammars that parse just
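The "read chunks, then parse that" approach could look roughly like the sketch below, again in Python as a stand-in for Perl 6. It assumes FASTA-style input that starts with `>` and treats a newline followed by `>` as the record boundary. `iter_records` and `parse_record` are hypothetical names, and `parse_record` stands in for a grammar that only has to handle one record at a time.

```python
def iter_records(stream, chunk_size=1 << 16, marker=b"\n>"):
    """Yield complete records (as bytes) from a binary stream, chunk by chunk."""
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Each marker found means the record before it is complete.
        # A marker split across two chunks is found once both halves are in buf.
        while True:
            cut = buf.find(marker)
            if cut == -1:
                break
            yield buf[:cut + 1]   # record, including its trailing newline
            buf = buf[cut + 1:]   # remainder starts at the next '>'
    if buf.strip():
        yield buf                 # last record has no following marker

def parse_record(record):
    """Parse one record into (header, sequence); stand-in for a per-record grammar."""
    header, _, body = record.partition(b"\n")
    return header[1:].decode("ascii"), body.replace(b"\n", b"").decode("ascii")
```

Memory use is bounded by the largest single record plus one chunk, not by the file size. The repeated concatenation and rescanning of `buf` is kept for clarity and would need tightening for very long records.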

Re: Grammars and biological data formats

2014-08-14 Thread Fields, Christopher J
Yeah, I'm thinking of a Cat-like class that would chunkify the data and check for matches. The main reason I would like to stick with a consistent grammar-based approach is I have seen many instances in BioPerl where a parser is essentially rewritten based on its purpose (full parsing, lazy

Grammars and biological data formats

2014-08-13 Thread Fields, Christopher J
I have a fairly simple question regarding the feasibility of using grammars with commonly used biological data formats. My main question: if I wanted to parse() or subparse() very large files (not unheard of to have FASTA/FASTQ or other similar data files exceed hundreds of GB) would a grammar

Re: Grammars and biological data formats

2014-08-13 Thread Solomon Foster
On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J cjfie...@illinois.edu wrote: I have a fairly simple question regarding the feasibility of using grammars with commonly used biological data formats. My main question: if I wanted to parse() or subparse() very large files (not unheard of

Re: Grammars and biological data formats

2014-08-13 Thread Fields, Christopher J
On Aug 13, 2014, at 4:50 AM, Solomon Foster colo...@gmail.com wrote: On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J cjfie...@illinois.edu wrote: I have a fairly simple question regarding the feasibility of using grammars with commonly used biological data formats. My main question:

Re: Grammars and biological data formats

2014-08-13 Thread Fields, Christopher J
On Aug 13, 2014, at 8:11 AM, Christopher Fields cjfie...@illinois.edu wrote: On Aug 13, 2014, at 4:50 AM, Solomon Foster colo...@gmail.com wrote: On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J cjfie...@illinois.edu wrote: I have a fairly simple question regarding the feasibility of