Re: [opencog-dev] Performance improvement suggestion for the LALR parser generator used by GHOST

Xabush Semrie Sat, 29 Jun 2019 14:29:26 -0700

I see your point, and for a simple use case your method works. But in my 
case, I have the following requirements:


1. I have to return both the atomese and the parsed JSON to the user.

2. I am running several different pattern-matching functions, aggregating 
their outputs, and parsing the result as a whole. That's why I am using the 
parser.

3. I have to create some links based on discovered patterns instead of 
directly returning a JSON string. For example, we have this kind of function:

(define outputInteraction
   (lambda(gene)
       (cog-execute! (BindLink
         (VariableList
           (TypedVariable (VariableNode "$a") (Type 'GeneNode))
           (TypedVariable (VariableNode "$b") (Type 'GeneNode)))

          (And  
           (EvaluationLink
              (PredicateNode "interacts_with")
              (ListLink
              gene
              (VariableNode "$a")
             ))

            (EvaluationLink
              (PredicateNode "interacts_with")
              (ListLink
              (VariableNode "$a")
              (VariableNode "$b")
             ))

            (EvaluationLink
              (PredicateNode "interacts_with")
              (ListLink
               gene
              (VariableNode "$b")
             ))
         )
         (ExecutionOutputLink
           (GroundedSchemaNode "scm: generate-result")
             (ListLink
               (VariableNode "$a")
               (VariableNode "$b")
             ))
   ))  
))

And generate-result is something like:

(define (generate-result gene-a gene-b)
    (ListLink 
        (EvaluationLink 
            (PredicateNode "interacts_with") 
            (ListLink gene-a gene-b))
       (node-info gene-b)
       (node-info gene-a)
       )
)


So based on the above points, I decided to write a custom parser.
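
To illustrate requirement 1, here is a rough sketch of returning an expression together with its JSON form. It is only an illustration, not the code in my repository: the names edge->json and evaluation->result are made up for this example, and it assumes the atom is given as a quoted S-expression rather than a live atom, so it runs in plain Guile without the AtomSpace loaded.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper: format one Cytoscape.js edge from three plain strings.
;; Assumes the strings need no JSON escaping.
(define (edge->json source target name)
  (format #f
          "{\"data\": {\"source\": \"~a\", \"target\": \"~a\", \"name\": \"~a\", \"group\": \"edges\"}}"
          source target name))

;; Hypothetical helper: match a quoted EvaluationLink expression and return
;; a pair (original-expression . json-string), or #f if the shape differs.
(define (evaluation->result expr)
  (match expr
    (('EvaluationLink ('PredicateNode name)
                      ('ListLink (_ source) (_ target)))
     (cons expr (edge->json source target name)))
    (_ #f)))

;; Usage: the car is the untouched expression, the cdr is its JSON string.
(define example
  (evaluation->result
   '(EvaluationLink
     (PredicateNode "expresses")
     (ListLink
      (GeneNode "MAP2K4")
      (MoleculeNode "Uniprot:Q5U0B8")))))
```

With several pattern-matching functions, the pairs could be collected into one list and the JSON parts joined at the end, which is roughly the aggregation I described in point 2.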

On Sunday, June 30, 2019 at 12:16:24 AM UTC+3, linas wrote:
>
> I  mean, one very low-brow, trivial way to do it would be to write:
>
> (BindLink 
>    (VariableList 
>        (TypedVariable (Variable "SRC") (Type 'GeneNode))
>        (TypedVariable (Variable "TGT") (Type 'MoleculeNode))
>        (TypedVariable (Variable "XPS") (Type 'PredicateNode)))
>    ; what you are looking for
>    (Evaluation (Variable "XPS")(List (Variable "SRC") (Variable "TGT")))
>    ; what to do when you find it
>    (ExecutionOutput
>         (GroundedSchema "scm:print-stuff")
>         (List (Variable "SRC") (Variable "TGT") (Variable "XPS"))))
>
> ; and then define the printer:
>
> (define (print-stuff src tgt xps)
>    (format #t "{ \"data\": {\"source\": \"~A\", \"target\": \"~A\", \"name\": \"~A\", \"group\": \"edges\"}}"
>        (cog-name src) (cog-name tgt) (cog-name xps))
>    ; a return value
>    xps)
>
> I mean -- this is low-brow, simple, bordering on trite, but does what you 
> want to do, for your example.  There are other ways of doing this that are 
> even simpler, but the above is a good demo.  Maybe you need more 
> sophisticated features, but the above is lots easier than trying to figure 
> out LALR.   I mean, knowing what LALR is and having experience with it is a 
> "good thing", but it's overkill for this particular problem.
>
> --linas
>
> On Sat, Jun 29, 2019 at 3:58 PM Linas Vepstas <[email protected]> wrote:
>
>>
>>
>> On Sat, Jun 29, 2019 at 3:54 PM Xabush Semrie <[email protected]> wrote:
>>
>>>
>>> why the heck would you need to "parse" atomese?  
>>>
>>>  What are you actually trying to do? 
>>>
>>>
>>> I am converting it to JSON for graph visualization with Cytoscape.js for 
>>> an annotation service. For example,
>>> (EvaluationLink
>>>  (PredicateNode "expresses")
>>>  (ListLink 
>>>     (GeneNode "MAP2K4")
>>>     (MoleculeNode "Uniprot:Q5U0B8")))
>>>
>>> The above will be "parsed" into the following JSON
>>> {
>>>   "data": {"source": "MAP2K4", "target": "Uniprot:Q5U0B8", "name": 
>>> "expresses", "group": "edges"}
>>> }
>>>
>>>
>> Why not just dump directly from the atomspace?
>>
>>
>>>  Especially since it already comes with a built-in parser? 
>>>
>>>
>>> Maybe I am confusing something here, but I didn't know any parser 
>>> existed for my use case.
>>>
>>
>> ? Of course there is. It's called "the atomspace". 
>>
>> --linas
>>
>>>
>>>  
>>> On Saturday, June 29, 2019 at 11:43:55 PM UTC+3, linas wrote:
>>>>
>>>> Dumb question: why the heck would you need to "parse" atomese? 
>>>> Especially since it already comes with a built-in parser?  What are you 
>>>> actually trying to do? --linas
>>>>
>>>> On Sat, Jun 29, 2019 at 10:04 AM Xabush Semrie <[email protected]> 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have recently been working on an LALR parser that converts atomese to 
>>>>> JSON (code can be found here 
>>>>> <https://github.com/Habush/annotation-scheme/blob/de66cd29c375321e5c7a14741a91c40ac40fb0b9/helpers/atomese-parser.scm#L98>).
>>>>>  
>>>>> I initially used the same LALR parser generator that GHOST uses, found in 
>>>>> the *(system base lalr)* module, with a similar lexer generator (in my 
>>>>> case I precompiled the regex patterns for a performance gain). However, 
>>>>> performance was very poor: it took far too long to parse moderately sized 
>>>>> atomese files. It didn't help that the module doesn't provide its own 
>>>>> lexer generator, and in the GHOST code the regex patterns are not 
>>>>> precompiled, which would further degrade performance. As a result, I 
>>>>> started looking at alternatives and found the nyacc project.
>>>>>
>>>>> After rewriting the code using nyacc, I found that the nyacc parser 
>>>>> generator is on average 5-6X faster than the previous parser generator 
>>>>> (the one used by GHOST) for the same file. In addition to the performance 
>>>>> improvement, it removes the need to provide a manually written lexer 
>>>>> generator, supports mid-rule context actions for complicated production 
>>>>> rules, has better debugging and "logging" capabilities, and (although 
>>>>> minor) doesn't require listing all the terminal symbols. The project is 
>>>>> also being actively developed.
>>>>>
>>>>> Hence, I concluded that the GHOST parser could also benefit from the same 
>>>>> performance improvements and thought I would share this here. I am happy 
>>>>> to work on porting the LALR parser from the current generator to nyacc if 
>>>>> this gets traction.
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "opencog" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/opencog.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/opencog/f3d23857-71b2-40a8-b99d-86249f9bd71a%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/opencog/f3d23857-71b2-40a8-b99d-86249f9bd71a%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>> -- 
>>>> cassette tapes - analog TV - film cameras - you
>>>>
>>>
>>
>>
>>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/d7400a53-cbc2-454d-9a8f-2b570b291718%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
