Re: Grammars and biological data formats

Fields, Christopher J Thu, 14 Aug 2014 06:50:45 -0700

Yeah, I'm thinking of a Cat-like class that would chunkify the data and check 
for matches.


The main reason I would like to stick with a consistent grammar-based approach
is that I have seen many instances in BioPerl where a parser is essentially
rewritten depending on its purpose (full parsing, lazy parsing, indexing of flat
files, adding to a persistent data store, etc.).  Having a way to both parse a
full grammar and subparse for a specific token/rule is very handy, and will be
even more so once Cat comes around.
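A minimal sketch of the line-at-a-time approach discussed in this thread, written in current Raku (Perl 6) syntax rather than the 2014 dialect. It assumes simple FASTA input (a '>' header line followed by sequence lines), and the grammar, action class, sub, and callback names (FastaLine, FastaActions, parse-fasta, on-record) are hypothetical. Each line is parsed on its own, while a single reused action object carries the record in progress between parses, so memory use is bounded by one record rather than the whole file.

```raku
# Hypothetical one-line grammar: each FASTA line is either a header
# ('>' id [description]) or a run of sequence characters.
grammar FastaLine {
    token TOP      { <header> | <sequence> }
    token header   { '>' $<id>=[\S+] [\s+ $<desc>=[\N*]]? }
    token sequence { <[A..Z a..z \- \*]>+ }
}

# One action object is reused for every line, so it can carry the record
# in progress from one .parse call to the next.
class FastaActions {
    has &.on-record is required;   # called as on-record($id, $seq)
    has $!id;                      # id of the record in progress
    has Str $!seq = '';            # sequence accumulated so far

    method header($/) {
        self.flush-record;         # a new header ends the previous record
        $!id  = ~$<id>;
        $!seq = '';
    }

    method sequence($/) { $!seq ~= ~$/ }

    # Hand the record in progress (if any) to the callback.
    method flush-record() {
        &!on-record.($!id, $!seq) if $!id.defined;
        $!id = Nil;
    }
}

# .lines reads the file lazily, so only the current line and the current
# record are held in memory, never the whole file.
sub parse-fasta(IO::Path $file, &on-record) {
    my $actions = FastaActions.new(:&on-record);
    for $file.lines -> $line {
        next unless $line.chars;   # skip blank lines
        FastaLine.parse($line, :$actions)
            or die "Cannot parse line: $line";
    }
    $actions.flush-record;         # the last record has no following header
}

# Usage: write a small FASTA file and collect "id:sequence" strings.
my $tmp = $*TMPDIR.add("fasta-sketch-$*PID.fa");
$tmp.spurt(">seq1 first record\nACGT\nTTGA\n>seq2\nGGCC\n");
my @got;
parse-fasta($tmp, -> $id, $seq { @got.push: $id ~ ':' ~ $seq });
$tmp.unlink;
say @got;
```

Because completed records go to a callback instead of being accumulated, the same grammar could feed indexing, filtering, or a persistent store. FASTQ would need more record state (quality lines may themselves begin with '@' or '+'), so this is an illustration of the pattern rather than a drop-in parser.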

Chris

> On Aug 14, 2014, at 6:40 AM, "Carl Mäsak" <cma...@gmail.com> wrote:
> 
> I was going to pipe up and say that I wouldn't wait around for Cat;
> I'd write something that reads chunks and then parses them. It'll be a
> bit more code, but it'll work today. But I see you reached that
> conclusion already. :)
> 
> Lately I've found myself writing more and more grammars that parse
> just one line of some input. Provided that the same action object gets
> attached to the parse each time, that's an excellent place to store
> information that you want to persist between lines. Actually, action
> objects started to make a whole lot more sense to me after I found
> that use case, because the action object takes on the role of a
> session/lifetime object for the parse process itself.
> 
> // Carl
> 
> On Wed, Aug 13, 2014 at 3:19 PM, Fields, Christopher J
> <cjfie...@illinois.edu> wrote:
>> On Aug 13, 2014, at 8:11 AM, Christopher Fields <cjfie...@illinois.edu> 
>> wrote:
>> 
>>>> On Aug 13, 2014, at 4:50 AM, Solomon Foster <colo...@gmail.com> wrote:
>>>> 
>>>> On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J
>>>> <cjfie...@illinois.edu> wrote:
>>>>> I have a fairly simple question regarding the feasibility of using 
>>>>> grammars with commonly used biological data formats.
>>>>> 
>>>>> My main question: if I wanted to parse() or subparse() very large files
>>>>> (it is not unheard of for FASTA/FASTQ or other similar data files to
>>>>> exceed hundreds of GB), would a grammar be the best solution?  Based on
>>>>> what I am reading, the semantics appear to be greedy; for example:
>>>>> 
>>>>>  Grammar.parsefile($file)
>>>>> 
>>>>> appears to be a convenient shorthand for:
>>>>> 
>>>>>  Grammar.parse($file.slurp)
>>>>> 
>>>>> since Grammar.parse() works on a Str, not an IO::Handle or Buf.  Or am I
>>>>> misunderstanding how this could be accomplished?
>>>> 
>>>> My understanding is it is intended that parsing can work on Cats
>>>> (hypothetical lazy strings) but this hasn't been implemented yet
>>>> anywhere.
>>>> 
>>>> --
>>>> Solomon Foster: colo...@gmail.com
>>>> HarmonyWare, Inc: http://www.harmonyware.com
>>> 
>>> Yeah, that’s what I recall as well.  I see very little in the specs about
>>> Cat, unfortunately.
>>> 
>>> chris
>> 
>> Ah, never mind.  I searched the IRC channel logs and found it’s considered
>> to be a ‘6.1’ feature:
>> 
>>    http://irclog.perlgeek.de/perl6/2014-07-06#i_8978974
>> 
>> It is mentioned a few times in the specs, presumably wherever it is
>> thought to fit best.  For the moment, the proposal is to run grammar
>> parsing on fixed-size chunks of the input data, which might be how Cat
>> would be implemented anyway.
>> 
>> chris
>>
