I have a fairly simple question regarding the feasibility of using grammars
with commonly used biological data formats.
My main question: if I wanted to parse() or subparse() very large files (it is
not unheard of for FASTA/FASTQ or similar data files to exceed hundreds of GB),
would a grammar be the best solution? Based on what I am reading, the
semantics appear to be greedy; for instance:
Grammar.parsefile($file)
appears to be a convenient shorthand for:
Grammar.parse($file.slurp)
since Grammar.parse() works on a Str, not an IO::Handle or Buf. Or am I
misunderstanding how this could be accomplished?
(Just to point out: I know I can subparse() as well, but that also appears to
act on a string…)
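To make the concern concrete, here is a rough sketch of the kind of workaround I imagine would be needed if grammars only accept strings: read the file lazily, split it into records by hand, and run the grammar on one record at a time. The names FastaRecord and fasta-records are hypothetical and made up for this illustration; this is not the bioperl6 grammar linked below, and the tokens are deliberately minimal (plain letters, '*' and '-' in the sequence).

```raku
# Hypothetical, minimal grammar for ONE FASTA record (illustration only).
grammar FastaRecord {
    token TOP         { '>' <id> \h* <description>? \n <sequence> }
    token id          { \S+ }
    token description { \N+ }
    token sequence    { [ <[A..Z a..z * \-]>+ \n? ]+ }
}

# Lazily yield the text of one record at a time from an open handle,
# so only the current record is held in memory rather than the whole file.
sub fasta-records(IO::Handle $fh) {
    gather {
        my @buf;
        for $fh.lines -> $line {
            # A new header line closes the record collected so far.
            if $line.starts-with('>') && @buf {
                take @buf.join("\n") ~ "\n";
                @buf = ();
            }
            @buf.push($line) if $line.chars;   # skip blank lines
        }
        take @buf.join("\n") ~ "\n" if @buf;   # final record
    }
}

# Parse each record separately and return (id, sequence length) pairs.
sub summarize(Str $path) {
    my $fh = $path.IO.open;
    my @out;
    for fasta-records($fh) -> $record {
        my $match = FastaRecord.parse($record)
            or die "Could not parse record starting: {$record.substr(0, 40)}";
        @out.push: ~$match<id> => $match<sequence>.Str.subst("\n", '', :g).chars;
    }
    $fh.close;
    return @out;
}
```

Something like summarize('reads.fa') (file name assumed) would then walk the file record by record. This keeps memory bounded by the largest single record, but it pushes the record splitting outside the grammar, which is exactly the part I was hoping the grammar could handle itself.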
As an example, I have a simple grammar for parsing FASTA, which is a
(deceptively) simple format for storing sequence data:
http://en.wikipedia.org/wiki/FASTA_format
The grammar is here:
https://github.com/cjfields/bioperl6/blob/master/lib/Bio/Grammar/Fasta.pm6
and tests here:
https://github.com/cjfields/bioperl6/blob/master/t/Grammar/fasta.t
Tests pass with the latest Rakudo just fine.
chris