Re: [opencog-dev] Re: Pattern mining from PLN inference histories

'Nil Geisweiller' via opencog Mon, 19 Jun 2017 01:06:07 -0700

Hi,

sorry, I was trapped in a variadic-template-hole. I managed to escape but not without bringing back a vicious gift. Now everything looks like a variadic template to me.


On 06/14/2017 04:32 PM, Shujing Ke wrote:

3) GroundedSchemaNode and TypeNode are not considered for becoming VariableNodes. It doesn't seem to make much sense to turn them into variables. Also, if they become variables, there are errors because the atom core checks the types and throws the opencog::SyntaxException below:
    Unexpected contents in TypedVariableLink Expected type specifier (e.g. TypeNode, TypeChoice, etc.), got PatternVariableNode
and
    ExecutionOutputLink must have schema! Got (PatternVariableNode ...


These errors should go away if you quote them.

4) At the current stage, both VariableNode and PatternVariableNode are used, to distinguish the variables generated by the Pattern Miner from the original variables. We can unify them later if needed.

As soon as these variables are scoped they can be unified. I don't mind retaining PatternVariableNode if it has a specific semantics, otherwise it should go.

*2. About pattern gram*
Actually in this application, only 1-gram patterns are wanted. In previous applications, like mining from DBPedia and ConceptNet, an n-gram pattern contains n Links, but each Link is relatively small, like one EvaluationLink or InheritanceLink. For example:
  EvaluationLink
          PredicateNode "Country"
          ListLink
                "var_1"
                 "USA"
  EvaluationLink
          PredicateNode "Language"
          ListLink
                "var_1"
                 "English"
Above is a 2-gram pattern, because each fact in DBPedia is just one EvaluationLink. These two Links are connected via the variable node "var_1". But in the PLN corpus case, each ExecutionOutputLink is big and contains a lot of Links. It seems we don't really want patterns that contain multiple ExecutionOutputLinks in one single pattern; rather, we want to find the common abstract patterns of each single ExecutionOutputLink. So it is actually just 1-gram.
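
One possible reading of the gram notion described above, as a minimal sketch. It assumes the gram is the number of top-level clauses in a pattern and that clauses are connected when they share a variable. Atoms are modelled as plain nested tuples, the helper names are hypothetical, and the bare "var_1" / "USA" items of the example are assumed to be a VariableNode and a ConceptNode; none of this is the Pattern Miner's actual API.

```python
# Atoms as nested tuples: (type, name) for nodes, (type, child, ...) for links.
def variables(atom):
    """Return the set of variable names occurring in an atom."""
    if atom[0] == "VariableNode":
        return {atom[1]}
    found = set()
    for child in atom[1:]:
        if isinstance(child, tuple):
            found |= variables(child)
    return found

def gram(pattern):
    """Gram of a pattern, taken here as its number of top-level clauses."""
    return len(pattern)

def is_connected(pattern):
    """True if every clause can be reached from the first via shared variables."""
    if not pattern:
        return True
    var_sets = [variables(clause) for clause in pattern]
    reached, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j, other in enumerate(var_sets):
            if j not in reached and var_sets[i] & other:
                reached.add(j)
                frontier.append(j)
    return len(reached) == len(pattern)

def evaluation(predicate, var, value):
    """Build an EvaluationLink clause shaped like the DBPedia example."""
    return ("EvaluationLink", ("PredicateNode", predicate),
            ("ListLink", ("VariableNode", var), ("ConceptNode", value)))

two_gram = [evaluation("Country", "var_1", "USA"),
            evaluation("Language", "var_1", "English")]
```

Under this reading, `gram(two_gram)` is 2 and `is_connected(two_gram)` holds because both clauses mention "var_1".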

I'm confused, is it a 2-gram because the pattern is the conjunction of 2 EvaluationLinks, or because each pattern has 2 links in it (EvaluationLink and ListLink)?

BTW, I think you could use AndLink root links to denote conjunctions of patterns, like in the pattern matcher, so for instance the pattern above would be


SatisfyingSetScopeLink
  Variable "var_1"
  AndLink
    EvaluationLink
      PredicateNode "Country"
      ListLink
        "var_1"
        "USA"
    EvaluationLink
      PredicateNode "Language"
      ListLink
        "var_1"
        "English"

In case you want to represent a pattern of AndLink you could local quote it, like


SatisfyingSetScopeLink
  Variable "var_1"
  LocalQuoteLink
    AndLink
      EvaluationLink
        PredicateNode "Country"
        ListLink
          "var_1"
          "USA"
      EvaluationLink
        PredicateNode "Language"
        ListLink
          "var_1"
          "English"

like in the pattern matcher. What do you think?

*3. The interestingness evaluation is different from previous applications*
Our interestingness evaluation is based on surprisingness measures, which include Surprisingness_I and Surprisingness_II:
Surprisingness_I: how difficult it is to infer the actual frequency of an n-gram pattern from the frequencies of all its (n-1)-gram to 1-gram subpatterns.
Surprisingness_II: how difficult it is to infer the actual frequency of an n-gram pattern from the frequencies of all its (n+1)-gram superpatterns.
But in the PLN corpus we only mine 1-grams, and I guess the interesting patterns you want to identify here are the patterns of "the max degree of abstraction", for example:
pattern1: (x and y are friends) (x is musician) (y is musician) (z is musician) (z and y are friends) -> (x and z are friends)
pattern2: (x and y are friends) (x is var_job) (y is var_job) (z is var_job) (z and y are friends) -> (x and z are friends)
If pattern 1 occurs 10 times and pattern 2 also occurs 10 times, it means that pattern 2 only holds when var_job = musician, so abstracting to pattern 2 makes no sense. Pattern 1 is therefore already the max degree of abstraction in this case. If my understanding is right, then I will need to write a new interestingness evaluation for this, because it is different from the surprisingness measure.
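
The "max degree of abstraction" criterion above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: patterns are flattened into tuples of tokens, tokens starting with "$" are variables, variable names are already canonical, and the function names are hypothetical rather than part of the Pattern Miner.

```python
def generalizes(abstract, specific):
    """True if `specific` results from `abstract` by consistently
    replacing variables (tokens starting with '$') with tokens."""
    if len(abstract) != len(specific):
        return False
    binding = {}
    for a, s in zip(abstract, specific):
        if a.startswith("$"):
            # A variable must map to the same token everywhere it occurs.
            if binding.setdefault(a, s) != s:
                return False
        elif a != s:
            return False
    return True

def drop_vacuous_abstractions(counts):
    """Drop each pattern that generalizes another pattern with the same
    count: the extra variable did not capture any additional instance."""
    kept = []
    for pattern, count in counts.items():
        vacuous = any(other != pattern
                      and generalizes(pattern, other)
                      and counts[other] == count
                      for other in counts)
        if not vacuous:
            kept.append(pattern)
    return kept

# Flattened, shortened stand-ins for pattern1 and pattern2 above.
pattern1 = ("friends", "$x", "$y", "is", "$x", "musician", "is", "$y", "musician")
pattern2 = ("friends", "$x", "$y", "is", "$x", "$job", "is", "$y", "$job")
```

With both counts at 10, only pattern1 is kept; if pattern2 had a higher count than pattern1, the extra variable would cover additional instances and both would be kept.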

That sounds right but I think I need to understand better how the pattern miner algorithm operates. I'll look into that soon.

*4. Unify the link order in unordered Links*
At the current stage, I haven't unified the order of Links inside unordered Links in the corpus, like AndLink. For example, in the AndLink below from the corpus, the 3 EvaluationLinks may appear in a different order in different instances, which affects the structure of the AndLink, although they should actually be order independent. So if this situation does exist in the PLN corpus or in other applications in future, I may need to put the input Links into a unique order (just like the pattern isomorphism problem I solved before in the pattern miner), but I am not sure I can find a way to give all of them a unique order; in the worst case, we have to generate all the possible combinations for each unordered Link in its patterns.
                   (AndLink
                     (EvaluationLink
                       (PredicateNode "are-friends")
                       (ListLink
                         (ConceptNode "John")
                         (VariableNode "$Y-37aad5ea")))
                     (EvaluationLink
                       (PredicateNode "is-musician")
                       (VariableNode "$Y-37aad5ea"))
                     (EvaluationLink
                       (PredicateNode "is-musician")
                       (VariableNode "John")))

Yes, this is gonna be needed, although for now it can wait I think. Regarding how to solve it, I'm afraid you're gonna have to consider all permutations, like the unifier and pattern matcher do.
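
A rough sketch of what "consider all permutations" could look like, assuming atoms are modelled as nested tuples and that the set of unordered link types is the small hypothetical one listed below. It is not how the unifier or pattern matcher is actually implemented, and the number of variants grows factorially with the arity of each unordered link.

```python
from itertools import permutations, product

# Assumed subset of unordered link types, for illustration only.
UNORDERED = {"AndLink", "OrLink", "SetLink"}

def orderings(atom):
    """Return the set of all variants of `atom` obtained by permuting
    the children of every unordered link it contains."""
    # Nodes are (type, name); links have tuples as children.
    if len(atom) < 2 or not isinstance(atom[1], tuple):
        return {atom}
    head, children = atom[0], atom[1:]
    variants = set()
    # Combine every variant of every child, then permute if unordered.
    for combo in product(*(orderings(child) for child in children)):
        if head in UNORDERED:
            variants.update((head,) + perm for perm in permutations(combo))
        else:
            variants.add((head,) + combo)
    return variants

friends = ("EvaluationLink", ("PredicateNode", "are-friends"),
           ("ListLink", ("ConceptNode", "John"), ("VariableNode", "$Y")))
musician_y = ("EvaluationLink", ("PredicateNode", "is-musician"),
              ("VariableNode", "$Y"))
musician_john = ("EvaluationLink", ("PredicateNode", "is-musician"),
                 ("ConceptNode", "John"))
variants = orderings(("AndLink", friends, musician_y, musician_john))
```

For an AndLink with three distinct clauses this yields 3! = 6 variants, while the ordered ListLink inside is left untouched.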

Nil



Thanks,
Shujing

On Mon, Jun 5, 2017 at 9:09 AM, Nil Geisweiller <ngeis...@googlemail.com> wrote:


    Hi Shujing,

    that is where CHandle could be useful. If they are equal it means
    they are bound to the same scope, and thus should be considered the
    same. In practice you won't find many patterns with persistent
    original variables because these variables will have different
    scopes most of the time, thus they will likely be replaced by
    pattern variables.

    So for example if you get the 2 groundings

    > (ExecutionOutputLink
    >     (GroundedSchemaNode "scm: bc-deduction-formula") ;
    >     (ListLink
    >       (InheritanceLink
     >         (VariableNode "$X") ; <- bound to scope-1
    >         (ConceptNode "D") ;
    >       ) ;
    >       (InheritanceLink
     >         (VariableNode "$X") ; <- bound to scope-1
     >         (VariableNode "$B-6266d6f2") <- bound to scope-1
     >       )
     >       (InheritanceLink
     >         (VariableNode "$B-6266d6f2") <- bound to scope-1
    >         (ConceptNode "D")
    >       )
    >     )
    > )



    > (ExecutionOutputLink
    >     (GroundedSchemaNode "scm: bc-deduction-formula") ;
    >     (ListLink
    >       (InheritanceLink
     >         (VariableNode "$X") ; <- bound to scope-2
    >         (ConceptNode "D") ;
    >       ) ;
    >       (InheritanceLink
     >         (VariableNode "$X") ; <- bound to scope-2
     >         (ConceptNode "A")
     >       )
     >       (InheritanceLink
     >         (ConceptNode "A")
    >         (ConceptNode "D")
    >       )
    >     )
    > )

    You will produce the pattern


    > (ExecutionOutputLink
    >     (GroundedSchemaNode "scm: bc-deduction-formula") ;
    >     (ListLink
    >       (InheritanceLink
     >         (VariableNode "$pattern-var1") ;
    >         (ConceptNode "D") ;
    >       ) ;
    >       (InheritanceLink
     >         (VariableNode "$pattern-var1") ;
     >         (VariableNode "$pattern-var2")
     >       )
     >       (InheritanceLink
     >         (VariableNode "$pattern-var2")
    >         (ConceptNode "D")
    >       )
    >     )
    > )

    The type of $pattern-var1 would be VariableNode (as you suggest
    below), and the type of $pattern-var2 would be Node, cause it's the
    least abstract union type of VariableNode (for the original variable
    (VariableNode "$B-6266d6f2")) and ConceptNode (for (ConceptNode "A")).

    But perhaps you don't need to worry about typing pattern variables for
    now, unless your code already takes care of it.
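
If typing pattern variables is wanted at some point, the "least abstract union type" mentioned above could be computed along these lines. The sketch assumes a small, hypothetical single-inheritance fragment of the type hierarchy; the real AtomSpace hierarchy is larger and is not reproduced here.

```python
# Hypothetical fragment of the type hierarchy: child type -> parent type.
PARENT = {
    "ConceptNode": "Node",
    "VariableNode": "Node",
    "PredicateNode": "Node",
    "InheritanceLink": "Link",
    "Node": "Atom",
    "Link": "Atom",
}

def ancestors(type_name):
    """Return the chain from a type up to the root, most specific first."""
    chain = [type_name]
    while type_name in PARENT:
        type_name = PARENT[type_name]
        chain.append(type_name)
    return chain

def least_common_type(type_names):
    """Most specific type that is an ancestor of (or equal to) every input."""
    type_names = list(type_names)
    common = ancestors(type_names[0])
    for name in type_names[1:]:
        others = set(ancestors(name))
        common = [t for t in common if t in others]
    return common[0]
```

For $pattern-var2 above, `least_common_type(["VariableNode", "ConceptNode"])` gives "Node", and for $pattern-var1 the result stays "VariableNode".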

    See more below.

    On 06/05/2017 12:06 AM, Shujing Ke wrote:

        *Clause 1:*
        (ExecutionOutputLink
             (GroundedSchemaNode "scm: bc-deduction-formula") ;
             (ListLink
               (InheritanceLink
                 (VariableNode "$X") ;
                 (ConceptNode "D") ;
               ) ;
               (InheritanceLink
                 (VariableNode "$X") ;
                 (VariableNode "$B-6266d6f2")
               )
               (InheritanceLink
                 (VariableNode "$B-6266d6f2")
                 (ConceptNode "D")
               )
             )
        )


        *Clause 2:*
        (ExecutionOutputLink
             (GroundedSchemaNode "scm: xxxxx-formula") ;
             (ListLink
               (AAALink
                 (VariableNode "$X") ;
                 (ConceptNode "R") ;
               ) ;
               (BBLink
                 (VariableNode "$X") ;
                 (VariableNode "$Y")
               )
               (CCCLink
                 (VariableNode "$Y")
                 (ConceptNode "R")
               )
             )
        )
        (VariableNode "$X") exists in both clause 1 and 2, but they do not
        really have to mean the same thing, so they should not connect.


    Again, if they are bound to the same scope then they should mean the
    same thing. That would be the case if both clauses belong to the
    same large scoped tree, and you're trying to find patterns inside
    this tree. Not probable but possible.
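
The scope rule above can be illustrated with a small sketch, assuming each variable occurrence is identified by its name together with a hypothetical scope identifier (standing in for whatever handle the AtomSpace provides for the binding scope).

```python
from typing import NamedTuple

class ScopedVariable(NamedTuple):
    name: str
    scope: str  # hypothetical identifier of the binding scope

def connected(vars_a, vars_b):
    """Two clauses are connected only if they share a variable
    with the same name AND the same scope."""
    return bool(set(vars_a) & set(vars_b))

clause1_vars = [ScopedVariable("$X", "scope-1"),
                ScopedVariable("$B-6266d6f2", "scope-1")]
clause2_vars = [ScopedVariable("$X", "scope-2"),
                ScopedVariable("$Y", "scope-2")]
same_tree_vars = [ScopedVariable("$X", "scope-1")]
```

Here `connected(clause1_vars, clause2_vars)` is false although both mention "$X", while `connected(clause1_vars, same_tree_vars)` is true because the scope matches as well.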


        clause 1 and 3 have the common patterns 1, 2 and 3; clause 1 and 4
        have the common pattern 3:

        *Pattern 1:*
        (ExecutionOutputLink
             (GroundedSchemaNode "scm: bc-deduction-formula")
             (ListLink
               (InheritanceLink
                 (PatternVariableNode "$var1")
                 (ConceptNode "D")
               )
               (InheritanceLink
                 (PatternVariableNode "$var1")
                 (VariableNode "$B-6266d6f2")
               )
               (InheritanceLink
                 (VariableNode "$B-6266d6f2")
                 (ConceptNode "D")
               )
             )
          )


    My suggestion is: only bind pattern variables to the pattern scope
    (like SatisfyingSetScopeLink) and leave the original variables unbound.
    That way you don't need to introduce a new PatternVariableNode type
    to make the distinction between pattern variables and original
    variables treated as constant. However prefixing the pattern
    variable names by "pattern", as you did further below, is a good
    idea for human readability.
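
A minimal sketch of that suggestion, assuming atoms are nested tuples and that the pattern scope is represented as a SatisfyingSetScopeLink whose first argument declares the pattern variables. The helper names and the use of a VariableList for the declaration are assumptions made for illustration.

```python
def variable_names(atom):
    """Return the names of all VariableNodes occurring in an atom."""
    if atom[0] == "VariableNode":
        return {atom[1]}
    names = set()
    for child in atom[1:]:
        if isinstance(child, tuple):
            names |= variable_names(child)
    return names

def scope_pattern(body, pattern_vars):
    """Wrap a pattern body in a scope that binds only the pattern variables."""
    declaration = ("VariableList",) + tuple(
        ("VariableNode", name) for name in sorted(pattern_vars))
    return ("SatisfyingSetScopeLink", declaration, body)

def free_variables(scoped):
    """Variables of the body that the scope does not bind."""
    _, declaration, body = scoped
    return variable_names(body) - variable_names(declaration)

body = ("InheritanceLink",
        ("VariableNode", "$pattern-var1"),
        ("VariableNode", "$B-6266d6f2"))
scoped = scope_pattern(body, {"$pattern-var1"})
```

In `scoped`, "$pattern-var1" is bound by the scope while the original variable "$B-6266d6f2" remains free, so both can share the VariableNode type without ambiguity.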


        *Pattern 2:*
        (ExecutionOutputLink
             (GroundedSchemaNode "scm: bc-deduction-formula")
             (ListLink
               (InheritanceLink
                 (PatternVariableNode "$var1")
                 (ConceptNode "D")
               )
               (InheritanceLink
                 (PatternVariableNode "$var1")
                 (PatternVariableNode "$var2")
               )
               (InheritanceLink
                 (PatternVariableNode "$var2")
                 (ConceptNode "D")
               )
             )
          )

        *Pattern 3:*
        (ExecutionOutputLink
             (GroundedSchemaNode "scm: bc-deduction-formula")
             (ListLink
               (InheritanceLink
                 (PatternVariableNode "$var1")
                 (PatternVariableNode "$var3")
               )
               (InheritanceLink
                 (PatternVariableNode "$var1")
                 (PatternVariableNode "$var2")
               )
               (InheritanceLink
                 (PatternVariableNode "$var2")
                 (PatternVariableNode "$var3")
               )
             )
          )


    Pattern 3 is more abstract than pattern 2. Would you not want to
    return the least abstract possible patterns with the greatest
    support (or greatest given fitness)? But it's another issue anyway...


        Actually, I guess the expected pattern here is Pattern 3, but the
        exact expected format of it is Pattern 4:
        *Pattern 4:*
        (ExecutionOutputLink
             (GroundedSchemaNode "scm: bc-deduction-formula")
             (ListLink
               (InheritanceLink
                 (VariableNode "$var1")
                 (PatternVariableNode "$pattern_var1")
               )
               (InheritanceLink
                 (VariableNode "$var1")
                 (VariableNode "$var2")
               )
               (InheritanceLink
                 (VariableNode "$var2")
                 (PatternVariableNode "$pattern_var1")
               )
             )
          )


    Again, just have $pattern_var1 scoped to the SatisfyingSetScopeLink
    of the pattern, and leave $var1 and $var2 free. Again these patterns,
    with original variables, are gonna be unlikely (in my use cases
    anyway), but they might be meaningful in some situations.


        Which means in the pattern miner process, I probably should do this:
        *1*. Do not consider any VariableNodes to be connected with each other
        outside of a clause, even if they have the same name.


    Again, only assume that variables with different scopes are not
    connected to each other.

        *5*. (optional) If necessary, TypedVariableLinks can be added to
        specify the original VariableNodes:
        (TypedVariableLink
            (VariableNode "$var1")
            (TypeNode "VariableNode")
        )


    Yes, only if var1 is a pattern variable, not if it is an original
    variable, and this type declaration would be inserted in the
    variable declaration of the pattern scope (like
    SatisfyingSetScopeLink). If it is an original variable leave it as is,
    it's unlikely anyway, so even if it turns out to be problematic we
    can worry about that later.


        Is this process OK?

        One more question:
        Should GroundedSchemaNode also become a VariableNode?


    Possibly, as Ben said.

    Nil


        Thanks,
        Shujing

        On Thu, Jun 1, 2017 at 11:00 PM, Shujing Ke <shujin...@gmail.com> wrote:

             Ok : )

             On Thu, Jun 1, 2017 at 5:17 PM, 'Nil Geisweiller' via opencog
             <opencog@googlegroups.com> wrote:

                 On 06/01/2017 04:59 PM, Shujing Ke wrote:

                     Oh, another question: is the goal to mine patterns that
                     contain at least one ExecutionOutputLink, or to mine
                     patterns that only contain ExecutionOutputLinks and the
                     Links inside ExecutionOutputLinks?


                 I'd say all of them, at any depth. The corpus I gave you is
                 not gonna contain any useful pattern anyway, it's just an
                 exercise at this point.

                 Nil


                     On Thu, Jun 1, 2017 at 3:52 PM, Shujing Ke
                     <shujin...@gmail.com> wrote:

                         OK, I will try to mine EOLs first. Thanks : )

                         Shujing

                         On Thu, Jun 1, 2017 at 7:25 AM, Nil Geisweiller
                         <ngeis...@googlemail.com> wrote:

                             Hi,

                             On 06/01/2017 01:32 AM, Shujing Ke wrote:

                                 Hi, Nil and Ben,

                                 I studied the corpus. Is each BindLink one
                                 instance of inference? So


                             Yes.

                                 that each BindLink should be considered as
                                 primitive / atomic - one pattern should be
                                 one BindLink; any Links inside a BindLink
                                 should not be mined separately, right? For
                                 example,


                             No they can and should be mined separately as
                             well. Specifically what we are interested in are
                             the structures of ExecutionOutputLink (EOL). The
                             third argument of an inference BindLink is
                             systematically gonna be an EOL wrapping other
                             EOLs, and we are mostly interested in mining
                             these EOLs. But ultimately mining the whole
                             BindLink might be useful too. We may want to do
                             both, but for starters only mine patterns with
                             an EOL as root link.
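
Restricting mining to patterns rooted at an EOL amounts to first collecting every EOL subtree of a BindLink, at any depth. A minimal sketch, assuming atoms are nested tuples; the function name, the toy BindLink and the "scm: some-other-formula" schema name are made up for illustration.

```python
def subtrees_with_root(atom, root_type="ExecutionOutputLink"):
    """Collect every subtree whose root has the given type, outermost first."""
    found = []
    if atom[0] == root_type:
        found.append(atom)
    for child in atom[1:]:
        if isinstance(child, tuple):
            found.extend(subtrees_with_root(child, root_type))
    return found

# Toy inference BindLink whose third argument is an EOL wrapping another EOL.
inner_eol = ("ExecutionOutputLink",
             ("GroundedSchemaNode", "scm: some-other-formula"),
             ("ListLink", ("ConceptNode", "A"), ("ConceptNode", "B")))
outer_eol = ("ExecutionOutputLink",
             ("GroundedSchemaNode", "scm: bc-deduction-formula"),
             ("ListLink", inner_eol, ("ConceptNode", "D")))
bind_link = ("BindLink",
             ("VariableList", ("VariableNode", "$X")),
             ("InheritanceLink", ("VariableNode", "$X"), ("ConceptNode", "D")),
             outer_eol)
candidates = subtrees_with_root(bind_link)
```

`candidates` then holds the outer EOL followed by the nested one, and each can be handed to the miner as a separate root.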



                                         (InheritanceLink
                                           (VariableNode "$X")
                                           (PatternVariableNode "var1")
                                         )
                                         (InheritanceLink
                                           (VariableNode "$X")
                                           (VariableNode "$B-6266d6f2")
                                         )
                                         (InheritanceLink
                                           (VariableNode "$B-6266d6f2")
                                           (PatternVariableNode "var1")
                                         )

                                 This is a pattern that may be mined by the
                                 pattern miner from the PLN corpus for a
                                 general purpose. But it is not the kind of
                                 expected pattern described in

http://wiki.opencog.org/w/Pattern_Miner_Prospective_Examples#patterns_in_PLN_inference_histories