Re: [opencog-dev] Performance improvement suggestion for the LALR parser generator used by GHOST

Xabush Semrie Sat, 29 Jun 2019 13:55:12 -0700


> why the heck would you need to "parse" atomese?


> What are you actually trying to do?


I am converting it to JSON for graph visualization with Cytoscape.js for an 
annotation service. For example,
(EvaluationLink
 (PredicateNode "expresses")
 (ListLink 
    (GeneNode "MAP2K4")
    (MoleculeNode "Uniprot:Q5U0B8")))

The above will be "parsed" into the following JSON:
{
  "data": {"source": "MAP2K4", "target": "Uniprot:Q5U0B8", "name": "expresses", "group": "edges"}
}
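
For illustration only, here is a minimal Python sketch of that mapping. It is 
not the Scheme/nyacc code linked in this thread: the names read_sexpr and 
evaluation_to_edge are made up for this example, and it assumes the input is 
always a binary EvaluationLink of the shape shown above.

```python
import json
import re

# Tokens: an open paren, a close paren, a double-quoted string, or a bare symbol.
TOKEN_RE = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')

EXAMPLE = '''
(EvaluationLink
 (PredicateNode "expresses")
 (ListLink
    (GeneNode "MAP2K4")
    (MoleculeNode "Uniprot:Q5U0B8")))
'''


def read_sexpr(text):
    """Read a single S-expression into nested Python lists of strings."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume the ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        if tok.startswith('"'):
            return tok[1:-1]  # strip the surrounding quotes
        return tok

    return read()


def evaluation_to_edge(atom):
    """Map (EvaluationLink (PredicateNode p) (ListLink a b)) to an edge dict."""
    head, predicate, args = atom
    if head != "EvaluationLink" or predicate[0] != "PredicateNode":
        raise ValueError("expected an EvaluationLink with a PredicateNode")
    if args[0] != "ListLink" or len(args) != 3:
        raise ValueError("expected a ListLink with exactly two nodes")
    source, target = args[1], args[2]
    return {
        "data": {
            "source": source[1],   # node name, e.g. "MAP2K4"
            "target": target[1],
            "name": predicate[1],  # predicate name becomes the edge label
            "group": "edges",
        }
    }


if __name__ == "__main__":
    print(json.dumps(evaluation_to_edge(read_sexpr(EXAMPLE)), indent=2))
```

A real converter would also need to emit the node elements and handle other 
link types; this sketch only covers the single edge above.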


> Especially since it already comes with a built-in parser?


Maybe I am confusing something here, but I didn't know any parser existed 
for my use case.

 
On Saturday, June 29, 2019 at 11:43:55 PM UTC+3, linas wrote:
>
> Dumb question: why the heck would you need to "parse" atomese? Especially 
> since it already comes with a built-in parser?  What are you actually 
> trying to do? --linas
>
> On Sat, Jun 29, 2019 at 10:04 AM Xabush Semrie <[email protected]> wrote:
>
>> Hi,
>>
>> I have been working recently on an LALR parser to parse atomese to JSON (code 
>> can be found here 
>> <https://github.com/Habush/annotation-scheme/blob/de66cd29c375321e5c7a14741a91c40ac40fb0b9/helpers/atomese-parser.scm#L98>).
>>  
>> I initially used the same LALR parser generator that GHOST uses, found in 
>> the *(system base lalr)* module, with a similar lexer generator (in my 
>> case I precompiled the regex patterns for a performance gain). However, I 
>> was getting very bad performance, and it took far too long to parse 
>> moderately sized atomese files. It didn't help that the module doesn't 
>> provide its own lexer generator, and in the GHOST code the regex patterns 
>> are not precompiled, which would degrade performance further. As a result, 
>> I started looking at alternatives and found the nyacc project.
>>
>> After rewriting the code using nyacc, I found that the nyacc parser 
>> generator is on average 5-6x faster than the previous parser generator 
>> (the one used by GHOST) for the same file. In addition to the performance 
>> improvement, it removes the need to provide a manually written lexer 
>> generator, supports mid-rule context actions for complicated production 
>> rules, has better debugging and "logging" capabilities, and (although 
>> minor) doesn't require listing all the terminal symbols. The project is 
>> also being actively developed.
>>
>> Hence, I figured the GHOST parser could benefit from the same performance 
>> improvements and thought I would share this here. I am happy to work on 
>> porting the LALR parser from the current one to nyacc if this gets traction.
>>
>>
>
>
> -- 
> cassette tapes - analog TV - film cameras - you
>

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/c1ab0355-de77-428b-b8ce-baba885dd157%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
