Re: [opencog-dev] Re: Pattern mining from PLN inference histories

Shujing Ke Tue, 20 Jun 2017 17:30:14 -0700

Hi, Ben and Nil,

Thanks for all your responses. I may be a bit slow this week: it is too
warm here and my baby is sick; he has barely eaten or drunk anything since
yesterday morning.

*1. About the output format and TV of patterns*
The pattern miner will output the raw patterns found in the input data,
without further processing, because different OpenCog modules and
applications may require different output formats; there should not be only
one. For now we can base our discussion on the raw pattern format. Once we
are sure the contents of the patterns are right, we can discuss the output
formats for different modules. If I have time then, I can implement them;
if not, it should be easy for each module's developer to turn the raw
patterns into the format they want. This conversion is better placed in a
separate layer outside the pattern miner, which makes it more convenient
for each module to change its pattern format in the future. Otherwise, any
module that wants to change a format would have to modify the pattern miner
core.

*2. About the pattern gram*
Actually the gram doesn't exactly indicate the size of a pattern; it just
means the number of root links in a pattern.

A ==> B, B==>C  |- A==>C
A==>C, C ==> D |- A ==>D
HebbianLink (D,B)
useful(A==>D)

Yes, it could be a 4-gram, but it can also be a 1-gram, depending on the
input data.
If you have a big Link like:

ImplicationLink
     AndLink
          ImplicationLink A B
          ImplicationLink B C
          ImplicationLink C D
    ImplicationLink A D

Then this pattern will be 1-gram.
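
To make the counting concrete, here is a minimal Python sketch. It is not pattern miner code: the nested-tuple representation and the `gram` helper are assumptions made only for illustration. It shows that the same implications give a 4-gram when stored as four separate root links and a 1-gram when wrapped in one big link.

```python
# A pattern is modelled as a list of top-level (root) links.
# Each link is a nested tuple: (link_type, child, child, ...).
# This representation is hypothetical, chosen only for illustration.

def gram(pattern):
    """Return the gram of a pattern: the number of its root links."""
    return len(pattern)

# Premises and conclusion stored as four separate root links -> 4-gram.
four_gram = [
    ("ImplicationLink", "A", "B"),
    ("ImplicationLink", "B", "C"),
    ("ImplicationLink", "C", "D"),
    ("ImplicationLink", "A", "D"),
]

# The same content wrapped in a single big ImplicationLink -> 1-gram.
one_gram = [
    ("ImplicationLink",
        ("AndLink",
            ("ImplicationLink", "A", "B"),
            ("ImplicationLink", "B", "C"),
            ("ImplicationLink", "C", "D")),
        ("ImplicationLink", "A", "D")),
]

print(gram(four_gram), gram(one_gram))
```

The nesting depth of a link plays no role in the count; only the number of top-level links does.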

As a further example, take the cockroach pattern.
Suppose you have handles 666 and 777:

ImplicationLink [handle=666]
     EvaluationLink
         PredicateNode "eat"
         ListLink
               ConceptNode "Ben"
               ConceptNode "cockroach"
     InheritanceLink
          ConceptNode "Ben"
          ConceptNode "weird"

ImplicationLink [handle=777]
     EvaluationLink
         PredicateNode "eat"
         ListLink
               ConceptNode "Nil"
               ConceptNode "cockroach"
     InheritanceLink
          ConceptNode "Nil"
          ConceptNode "weird"

If only ImplicationLinks are allowed to be root links, then pattern 1 below
is a 1-gram pattern:
*Pattern 1:*
ImplicationLink
     EvaluationLink
         PredicateNode "eat"
         ListLink
               ConceptNode "var1"
               ConceptNode "cockroach"
     InheritanceLink
          ConceptNode "var1"
          ConceptNode "weird"

If EvaluationLinks and InheritanceLinks are also allowed to be root links,
then patterns 2, 3 and 4 are all 2-gram patterns, because they contain two
root links. Of course, in this case, patterns 3 and 4 do not make much
sense, but in the DBpedia data these types of patterns are what we want. So
we need to specify which link types may be root links for each application,
to avoid mining a lot of useless patterns. This can be set in the config
file or through the scm interface, via the white and black lists of link
types.
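
As a rough illustration of the effect of such a white list, here is a small Python sketch. The tuple representation, the function `candidate_roots` and the parameter `allowed_root_types` are hypothetical names, not the real config keys or the miner's API. It collects every link in handle 666 whose type is allowed to be a root link.

```python
# Hypothetical sketch: which links inside [666] may serve as root links,
# given a white list of link types. Not the pattern miner's actual code.

def candidate_roots(link, allowed_root_types):
    """Collect every link (at any depth) whose type is in the white list."""
    found = []
    if isinstance(link, tuple):
        if link[0] in allowed_root_types:
            found.append(link)
        for child in link[1:]:
            found.extend(candidate_roots(child, allowed_root_types))
    return found

link_666 = (
    "ImplicationLink",
    ("EvaluationLink",
        ("PredicateNode", "eat"),
        ("ListLink", ("ConceptNode", "Ben"), ("ConceptNode", "cockroach"))),
    ("InheritanceLink", ("ConceptNode", "Ben"), ("ConceptNode", "weird")),
)

# Only ImplicationLinks allowed: one candidate root, so only 1-gram
# patterns such as pattern 1 can be built from this link.
only_implication = candidate_roots(link_666, {"ImplicationLink"})

# Evaluation and Inheritance also allowed: three candidate roots, which
# is what makes 2-gram combinations like patterns 2 and 3 possible.
all_three = candidate_roots(
    link_666, {"ImplicationLink", "EvaluationLink", "InheritanceLink"})

print(len(only_implication), [l[0] for l in all_three])
```

The wider the white list, the more candidate roots there are, and so the more combinations the miner has to consider.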

*Pattern 2:*
EvaluationLink
     PredicateNode "eat"
     ListLink
          ConceptNode "var1"
          ConceptNode "cockroach"

InheritanceLink
     ConceptNode "var1"
     ConceptNode "weird"

*Pattern 3:*
 EvaluationLink
         PredicateNode "eat"
         ListLink
               ConceptNode "var1"
               ConceptNode "cockroach"

ImplicationLink
     EvaluationLink
         PredicateNode "eat"
         ListLink
               ConceptNode "var1"
               ConceptNode "cockroach"
     InheritanceLink
          ConceptNode "var1"
          ConceptNode "weird"

*Pattern 4:*
ImplicationLink
     EvaluationLink
         PredicateNode "eat"
         ListLink
               ConceptNode "Ben"
               ConceptNode "var1"
     InheritanceLink
          ConceptNode "Ben"
          ConceptNode "var2"

 InheritanceLink
     ConceptNode "Nil"
     ConceptNode "var2"

*3. About unifying link order in unordered links in the input data*
It probably won't take too much time to code, because it should be quite
similar to the logic of the pattern isomorphism identification algorithm
that I already have in the pattern miner, since that is quite an important
part of it. I should be able to reuse the logic.
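
Just to illustrate the idea of unifying the order, here is a minimal Python sketch. It is not the existing isomorphism code: it sorts the children of unordered links recursively by their printed form, and the set `UNORDERED_TYPES` and the tuple representation are assumptions for illustration. It only handles ground data; patterns with variables need the fuller isomorphism logic.

```python
# Hypothetical sketch: give every unordered link a canonical child order,
# so that two links differing only in child order become identical.

UNORDERED_TYPES = {"AndLink", "OrLink", "SetLink"}  # assumed example list

def canonical(atom):
    """Return a copy of atom with children of unordered links sorted."""
    if not isinstance(atom, tuple):
        return atom  # a plain name, nothing to reorder
    head = atom[0]
    children = [canonical(child) for child in atom[1:]]
    if head in UNORDERED_TYPES:
        # repr() gives a deterministic total order over nested tuples.
        children.sort(key=repr)
    return (head, *children)

a = ("AndLink", ("ImplicationLink", "B", "C"), ("ImplicationLink", "A", "B"))
b = ("AndLink", ("ImplicationLink", "A", "B"), ("ImplicationLink", "B", "C"))

print(canonical(a) == canonical(b))
```

Ordered links such as ImplicationLink are left untouched, since swapping their children would change the meaning.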

*4. About the interestingness evaluation*

I didn't quite get the meaning of the rich(x) and z(y) and married(x,y)
example.
I think it is also related to the pattern gram. Take the 2 patterns below,
where x, y, z are variables:
pattern A:  rich(x) and z(y) and married(x,y)
pattern B:  rich(x) and cute(y) and married(x,y)

If they are represented as 3-gram patterns, then it may be possible to
evaluate their interestingness just by surprisingness:
pattern A:
InheritanceLink  x  rich
InheritanceLink  y  z
EvaluationLink married x y

pattern B:
InheritanceLink  x  rich
InheritanceLink  y  cute
EvaluationLink married x y

If they are represented as 1-gram patterns, then I can implement an
interestingness evaluation based on the variables inside one root link.
pattern A:
ImplicationLink
    AndLink
        InheritanceLink  x  rich
        InheritanceLink  y  z
    EvaluationLink married x y

pattern B:
ImplicationLink
    AndLink
        InheritanceLink  x  rich
        InheritanceLink  y  cute
    EvaluationLink married x y

*5. A suggestion to make up a very simple, tiny test data file*
I suggest that Nil make up a simple test data file just to check whether
the output patterns are what you want and whether the frequency counts are
correct. For example, I made up a simple data set before, the
ugly-man-drink-soda file, which contains 10 men and 10 women, among them 5
women and 5 men who are ugly and 5 women and 5 men who drink soda; the miner
is expected to find the pattern "ugly man drinks soda". With such a tiny
file we can actually check every output pattern and its count to see if
there is any bug. If it passes, then we can apply the miner to a big corpus.
Otherwise, a big corpus produces too many outputs, and the results are hard
to examine.
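
As a sketch of what such a tiny test could look like, the Python below builds a data set with the numbers above and counts every conjunction of properties by brute force. The exact overlap is my assumption for illustration (the 5 ugly men are the 5 soda drinkers, and no ugly woman drinks soda), and the names are hypothetical; it is not the original file or the miner itself.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: 10 men and 10 women, 5 of each ugly, 5 of each
# drinking soda. Assumed overlap: ugly men drink soda, ugly women do not.
people = {}
for i in range(10):
    man = {"man"}
    woman = {"woman"}
    if i < 5:
        man |= {"ugly", "drinks_soda"}
        woman |= {"ugly"}
    else:
        woman |= {"drinks_soda"}
    people[f"man_{i}"] = man
    people[f"woman_{i}"] = woman

# Count every conjunction of 1 to 3 properties that holds for a person.
counts = Counter()
for facts in people.values():
    for size in (1, 2, 3):
        for combo in combinations(sorted(facts), size):
            counts[combo] += 1

# The data set is small enough to print and check every count by hand.
for combo, n in sorted(counts.items()):
    print(n, " and ".join(combo))
```

With only a handful of distinct conjunctions, each expected count can be worked out on paper and compared with what the miner reports for the same data.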

Thanks,
Shujing

On Tue, Jun 20, 2017 at 4:48 AM, Ben Goertzel <[email protected]> wrote:

> On Tue, Jun 20, 2017 at 2:29 AM, Nil Geisweiller
> <[email protected]> wrote:
> > What do you mean exactly by "useful(A==>D)"?
>
>
> What I was thinking was:  If the implication [666], e.g.
>
> ImplicationLink [handle=666]
>      EvaluationLink
>          PredicateNode "eat"
>          ListLink
>                ConceptNode "Ben"
>                ConceptNode "cockroach"
>
>      InheritanceLink
>           ConceptNode "Ben"
>           ConceptNode "weird"
>
>
> was used or created by the BC, and was found to be useful for whatever
> inference the BC was doing when it used or created [666], then the
> utility of this link should be annotated via
>
> EvaluationLink
>      PredicateNode "useful"
>      ListLink
>              [666]
>              [111]
>
>
> where [111] is the handle of the target of the BC inference the BC was
> doing when it created [666].
>
> So maybe my example should look more like
>
> A ==> B, B==>C  |- A==>C
> A==>C, C ==> D |- A ==>D
> HebbianLink (D,B)
> useful(A==>D, T)
>
>
> where T is a variable that matches the target of prior BC inferences...
>
> ben
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
> boundary, I am the peak." -- Alexander Scriabin
>

