Re: [opencog-dev] Performance improvement suggestion for the LALR parser generator used by GHOST

Leung Man Hin Mon, 01 Jul 2019 21:06:30 -0700

Hi Xabush,

Performance-wise, I think it would be really nice to replace the LALR
parser in GHOST with the nyacc parser for the 5-6X performance gain, plus
better debugging and logging capabilities, if it's not too much work :)


On Sun, Jun 30, 2019 at 6:21 AM Linas Vepstas <[email protected]>
wrote:

> Well, you are free to do whatever you want to do, but one of the points of
> having Atomese in the first place, is to avoid having to go through such
> contortions. Why do you need the output-Interaction function?  ... the only
> reason I can see for "(GroundedSchemaNode "scm: generate-result")" is
> because you are trying to wrap "node-info" -- what does node-info do? Can
> you do it directly in the atomspace? why write code in scheme?
>
> I mean, yes, I write bucket-loads of scheme all the time, to get things
> done, but there's always the meta-question -- why, and how can it be made
> simpler?  The long-run goal is to eventually replace all scheme and python
> code with declarative Atomese that "does the same thing" - this is
> impossible in the short-run, but, in the back of your mind, always think
> "how can this be coded in a declarative manner?" instead of thinking of
> "how can I code this in a functional manner" or "a procedural manner" or
> "an OO style"?
>
> --linas
>
> On Sat, Jun 29, 2019 at 4:29 PM Xabush Semrie <[email protected]> wrote:
>
>> I see your point. And for a simple use-case your method works. But in my
>> case, I have the following requirements
>>
>> 1. I have to return both the atomese and the parsed JSON to the user.
>>
>> 2. I am running different pattern matching functions to aggregate their
>> outputs and parse the result as a whole. That's why I am using the parser.
>>
>> 3. I have to create some links based on discovered patterns instead of
>> directly returning a JSON string. For example, we have this kind of function:
>>
>> (define outputInteraction
>>    (lambda(gene)
>>        (cog-execute! (BindLink
>>          (VariableList
>>            (TypedVariable (VariableNode "$a") (Type 'GeneNode))
>>            (TypedVariable (VariableNode "$b") (Type 'GeneNode)))
>>
>>           (And
>>            (EvaluationLink
>>               (PredicateNode "interacts_with")
>>               (ListLink
>>               gene
>>               (VariableNode "$a")
>>              ))
>>
>>             (EvaluationLink
>>               (PredicateNode "interacts_with")
>>               (ListLink
>>               (VariableNode "$a")
>>               (VariableNode "$b")
>>              ))
>>
>>             (EvaluationLink
>>               (PredicateNode "interacts_with")
>>               (ListLink
>>                gene
>>               (VariableNode "$b")
>>              ))
>>          )
>>          (ExecutionOutputLink
>>            (GroundedSchemaNode "scm: generate-result")
>>              (ListLink
>>                (VariableNode "$a")
>>                (VariableNode "$b")
>>              ))
>>    ))
>> ))
>>
>> And generate-result is something like
>>
>> (define (generate-result gene-a gene-b)
>>     (ListLink
>>         (EvaluationLink
>>             (PredicateNode "interacts_with")
>>             (ListLink gene-a gene-b))
>>        (node-info gene-b)
>>        (node-info gene-a)
>>        )
>> )
>>
>>
>> So based on the above points, I decided to write a custom parser.
>>
>> On Sunday, June 30, 2019 at 12:16:24 AM UTC+3, linas wrote:
>>>
>>> I mean, one very low-brow, trivial way to do it would be to write:
>>>
>>> (BindLink
>>>    (VariableList
>>>        (TypedVariable (Variable "SRC") (Type 'GeneNode))
>>>        (TypedVariable (Variable "TGT") (Type 'MoleculeNode))
>>>        (TypedVariable (Variable "XPS") (Type 'PredicateNode)))
>>>    ; what you are looking for
>>>    (Evaluation (Variable "XPS")(List (Variable "SRC") (Variable "TGT")))
>>>    ; what to do when you find it
>>>    (ExecutionOutput
>>>         (GroundedSchema "scm:print-stuff")
>>>         (List (Variable "SRC") (Variable "TGT") (Variable "XPS"))))
>>>
>>> ; and then define the printer:
>>>
>>> (define (print-stuff src tgt xps)
>>>    (format #t "{ \"data\": {\"source\": \"~A\", \"target\": \"~A\", \"name\": \"~A\", \"group\": \"edges\"}}"
>>>        (cog-name src) (cog-name tgt) (cog-name xps))
>>>    ; a return value
>>>    xps)
>>>
>>> I mean -- this is low-brow, simple, bordering on trite, but does what
>>> you want to do, for your example.  There are other ways of doing this that
>>> are even simpler, but the above is a good demo.  Maybe you need more
>>> sophisticated features, but the above is lots easier than trying to figure
>>> out LALR.   I mean, knowing what LALR is and having experience with it is a
>>> "good thing", but it's overkill for this particular problem.
>>>
>>> --linas
>>>
>>> On Sat, Jun 29, 2019 at 3:58 PM Linas Vepstas <[email protected]>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Sat, Jun 29, 2019 at 3:54 PM Xabush Semrie <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> why the heck would you need to "parse" atomese?
>>>>>
>>>>>  What are you actually trying to do?
>>>>>
>>>>>
>>>>> I am converting it to JSON for graph visualization with Cytoscape.js
>>>>> for an annotation service. For example,
>>>>> (EvaluationLink
>>>>>  (PredicateNode "expresses")
>>>>>  (ListLink
>>>>>     (GeneNode "MAP2K4")
>>>>>     (MoleculeNode "Uniprot:Q5U0B8")))
>>>>>
>>>>> The above will be "parsed" into the following JSON
>>>>> {
>>>>>   "data": {"source": "MAP2K4", "target": "Uniprot:Q5U0B8", "name":
>>>>> "expresses", "group": "edges"}
>>>>> }
>>>>>
>>>>>
>>>> Why not just dump directly from the atomspace?
>>>>
>>>>
>>>>>  Especially since it already comes with a built-in parser?
>>>>>
>>>>>
>>>>> Maybe I am confusing something here, but I didn't know any parser
>>>>> existed for my use case.
>>>>>
>>>>
>>>> ? Of course there is. It's called "the atomspace".
>>>>
>>>> --linas
>>>>
>>>>>
>>>>>
>>>>> On Saturday, June 29, 2019 at 11:43:55 PM UTC+3, linas wrote:
>>>>>>
>>>>>> Dumb question: why the heck would you need to "parse" atomese?
>>>>>> Especially since it already comes with a built-in parser?  What are you
>>>>>> actually trying to do? --linas
>>>>>>
>>>>>> On Sat, Jun 29, 2019 at 10:04 AM Xabush Semrie <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have been working recently on an LALR parser to parse atomese to
>>>>>>> JSON (code can be found here
>>>>>>> <https://github.com/Habush/annotation-scheme/blob/de66cd29c375321e5c7a14741a91c40ac40fb0b9/helpers/atomese-parser.scm#L98>).
>>>>>>> I initially used the same LALR parser generator used by GHOST, found in
>>>>>>> the *(system base lalr)* module, with a similar lexer generator (in my
>>>>>>> case I precompiled the regex patterns for a performance gain). However,
>>>>>>> I was getting very bad performance, and it took way too long to parse
>>>>>>> moderately sized atomese files. It didn't help that the module didn't
>>>>>>> provide its own lexer generator, and in the case of the GHOST code, the
>>>>>>> regex patterns were not precompiled, which would further degrade the
>>>>>>> performance. As a result, I started looking at alternatives and found
>>>>>>> the nyacc project.
>>>>>>>
>>>>>>> After rewriting the code using nyacc, I found that the nyacc parser
>>>>>>> generator is on average 5-6X faster than the previous parser generator
>>>>>>> (which is used by GHOST) for the same file. In addition to the
>>>>>>> performance improvement, it removes the need to provide a manually
>>>>>>> written lexer generator, has support for mid-rule context actions for
>>>>>>> complicated production rules, has better debugging and "logging"
>>>>>>> capabilities, and (although minor) doesn't require listing all the
>>>>>>> terminal symbols. The project is also being actively developed.
>>>>>>>
>>>>>>> Hence, I deduced the GHOST parser could also benefit from the same
>>>>>>> performance improvements and thought of sharing this here. I am happy to
>>>>>>> work on porting the LALR parser from the current one to nyacc if this
>>>>>>> gets traction.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "opencog" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at https://groups.google.com/group/opencog.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/opencog/f3d23857-71b2-40a8-b99d-86249f9bd71a%40googlegroups.com
>>>>>>> <https://groups.google.com/d/msgid/opencog/f3d23857-71b2-40a8-b99d-86249f9bd71a%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> cassette tapes - analog TV - film cameras - you
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> cassette tapes - analog TV - film cameras - you
>>>>
>>>
>>>
>>> --
>>> cassette tapes - analog TV - film cameras - you
>>>
>>
>
>
> --
> cassette tapes - analog TV - film cameras - you
>
>
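
For reference, the edge conversion discussed in the quoted thread (an EvaluationLink turned into a Cytoscape.js-style JSON object) can be sketched in plain Guile without any parser generator. This is only an illustration under stated assumptions, not code from the thread or from the linked repository: it pattern-matches on a quoted s-expression with (ice-9 match) instead of using LALR/nyacc or the AtomSpace, the helper name edge->json is hypothetical, and node names are assumed to need no JSON escaping.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper (not from the thread): convert a quoted
;; EvaluationLink s-expression into a Cytoscape.js-style edge string.
;; Assumes node names contain no characters that need JSON escaping.
(define (edge->json atom)
  (match atom
    ;; (EvaluationLink (PredicateNode name) (ListLink (T1 src) (T2 tgt)))
    (('EvaluationLink ('PredicateNode name)
                      ('ListLink (_ source) (_ target)))
     (format #f
             "{\"data\": {\"source\": \"~a\", \"target\": \"~a\", \"name\": \"~a\", \"group\": \"edges\"}}"
             source target name))
    ;; Anything else is not an edge in this sketch.
    (_ #f)))

;; Usage: the "expresses" example from the thread, as quoted data.
(display
 (edge->json
  '(EvaluationLink
    (PredicateNode "expresses")
    (ListLink
     (GeneNode "MAP2K4")
     (MoleculeNode "Uniprot:Q5U0B8")))))
(newline)
```

Working on quoted data keeps the sketch runnable without OpenCog installed; with live atoms one would read names via cog-name, as in the print-stuff example quoted above.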

