Correcting a typo in point 4 of my previous email. Pattern B should be:

ImplicationLink
  AndLink
    InheritanceLink x rich
    InheritanceLink y cute
  EvaluationLink married x y

On Wed, Jun 21, 2017 at 2:29 AM, Shujing Ke <shujin...@gmail.com> wrote:

> Hi, Ben and Nil,
>
> Thanks for all your responses. I may be a bit slow this week: it is too
> warm here and my baby is sick; he has barely eaten or drunk anything
> since yesterday morning.
>
> *1. About the output format and TV of patterns*
> The pattern miner outputs the raw patterns found in the input data,
> without further processing, because different OpenCog modules and
> applications may require different output formats; there should not be
> only one output format. For now we can base our discussion on the raw
> pattern format. Once we are sure the contents of the patterns are right,
> we can discuss the output formats for the different modules. If I have
> time then, I can implement them; if I don't, it should also be easy for
> each module's developer to turn the raw patterns into the format they
> want. This conversion is better placed in a separate layer outside the
> pattern miner, which makes it more convenient for each module to change
> the pattern format it needs in the future. Otherwise, any module that
> wants to change a format would have to modify the pattern miner core.
>
> *2. About the pattern gram*
> Actually, the gram does not exactly indicate the size of a pattern; it
> just means the number of root links in a pattern.
>
> A ==> B, B==>C |- A==>C
> A==>C, C ==> D |- A ==>D
> HebbianLink (D,B)
> useful(A==>D)
>
> Yes, it could be a 4-gram, but it could also be a 1-gram, depending on
> the input data.
> If you have a big link like:
>
> ImplicationLink
>   AndLink
>     ImplicationLink A B
>     ImplicationLink B C
>     ImplicationLink C D
>   ImplicationLink A D
>
> then this pattern will be a 1-gram.
>
> Take the cockroach pattern as another example.
> Suppose you have handles 666 and 777:
>
> ImplicationLink [handle=666]
>   EvaluationLink
>     PredicateNode "eat"
>     ListLink
>       ConceptNode "Ben"
>       ConceptNode "cockroach"
>   InheritanceLink
>     ConceptNode "Ben"
>     ConceptNode "weird"
>
>
> ImplicationLink [handle=777]
>   EvaluationLink
>     PredicateNode "eat"
>     ListLink
>       ConceptNode "Nil"
>       ConceptNode "cockroach"
>   InheritanceLink
>     ConceptNode "Nil"
>     ConceptNode "weird"
>
> If only ImplicationLinks are allowed to be root links, then Pattern 1
> below is a 1-gram pattern:
> *Pattern 1:*
> ImplicationLink
>   EvaluationLink
>     PredicateNode "eat"
>     ListLink
>       ConceptNode "var1"
>       ConceptNode "cockroach"
>   InheritanceLink
>     ConceptNode "var1"
>     ConceptNode "weird"
>
> If EvaluationLinks and InheritanceLinks are also allowed to be root
> links, then Patterns 2, 3 and 4 are all 2-gram patterns, because each of
> them contains two root links. Of course, in this case Patterns 3 and 4
> do not make much sense, but in the DBpedia data these types of patterns
> are what we want. So we need to specify which link types may be root
> links for each application, to avoid mining a lot of useless patterns.
> This can be set in the config file or through the scm interface, via the
> white and black link type lists.
>
> *Pattern 2:*
> EvaluationLink
>   PredicateNode "eat"
>   ListLink
>     ConceptNode "var1"
>     ConceptNode "cockroach"
>
> InheritanceLink
>   ConceptNode "var1"
>   ConceptNode "weird"
>
> *Pattern 3:*
> EvaluationLink
>   PredicateNode "eat"
>   ListLink
>     ConceptNode "var1"
>     ConceptNode "cockroach"
>
> ImplicationLink
>   EvaluationLink
>     PredicateNode "eat"
>     ListLink
>       ConceptNode "var1"
>       ConceptNode "cockroach"
>   InheritanceLink
>     ConceptNode "var1"
>     ConceptNode "weird"
>
> *Pattern 4:*
> ImplicationLink
>   EvaluationLink
>     PredicateNode "eat"
>     ListLink
>       ConceptNode "Ben"
>       ConceptNode "var1"
>   InheritanceLink
>     ConceptNode "Ben"
>     ConceptNode "var2"
>
> InheritanceLink
>   ConceptNode "Nil"
>   ConceptNode "var2"
>
> *3. About unifying the link order in unordered links in the input data*
> It probably won't take too much time to code, because it should be quite
> similar to the logic of the pattern isomorphism identification
> algorithm, which I already have in the pattern miner because it is quite
> an important part of it. I should be able to reuse the logic.
>
> *4. About the interestingness evaluation*
>
> I didn't quite get the meaning of the rich(x) and z(y) and married(x,y)
> example.
> I think it is also related to the pattern gram. Take the two patterns
> below, where x, y, z are variables:
> pattern A: rich(x) and z(y) and married(x,y)
> pattern B: rich(x) and cute(y) and married(x,y)
>
> If they are represented as 3-gram patterns, then it may be possible to
> evaluate their interestingness simply by surprisingness:
> pattern A:
> InheritanceLink x rich
> InheritanceLink y z
> EvaluationLink married x y
>
> pattern B:
> InheritanceLink x rich
> InheritanceLink y cute
> EvaluationLink married x y
>
> If they are represented as 1-gram patterns, then I can implement an
> interestingness evaluation based on the variables inside one root link:
> pattern A:
> ImplicationLink
>   AndLink
>     InheritanceLink x rich
>     InheritanceLink y z
>   EvaluationLink married x y
>
> pattern B:
> ImplicationLink
>   AndLink
>     InheritanceLink x rich
>     InheritanceLink y z
>   EvaluationLink married x y
>
> *5. A suggestion to make up a very simple, tiny test data file*
> I suggest that Nil make up a simple test data file, just to test whether
> the output patterns are what you want and whether the frequency counts
> are correct. For example, I made up a simple data set before, the
> ugly-man-drink-soda file, which contains 10 men and 10 women; among them
> 5 women and 5 men are ugly, and also 5 women and 5 men drink soda. The
> miner is expected to find the pattern "ugly man drinks soda". For such a
> tiny file, we can actually check every output pattern and its count to
> see if there is any bug. If it passes, then we can apply the miner to a
> big corpus. Otherwise, a big corpus produces too many outputs and it is
> hard to examine the result.
>
> Thanks,
> Shujing
>
> On Tue, Jun 20, 2017 at 4:48 AM, Ben Goertzel <b...@goertzel.org> wrote:
>
>> On Tue, Jun 20, 2017 at 2:29 AM, Nil Geisweiller
>> <ngeis...@googlemail.com> wrote:
>> > What do you mean exactly by "useful(A==>D)"?
>>
>>
>> What I was thinking was: If the implication [666], e.g.
>>
>> ImplicationLink [handle=666]
>>   EvaluationLink
>>     PredicateNode "eat"
>>     ListLink
>>       ConceptNode "Ben"
>>       ConceptNode "cockroach"
>>
>>   InheritanceLink
>>     ConceptNode "Ben"
>>     ConceptNode "weird"
>>
>>
>> was used or created by the BC, and was found to be useful for whatever
>> inference the BC was doing when it used or created [666], then the
>> utility of this link should be annotated via
>>
>> EvaluationLink
>>   PredicateNode "useful"
>>   ListLink
>>     [666]
>>     [111]
>>
>>
>> where [111] is the handle of the target of the BC inference the BC was
>> doing when it created [666].
>>
>> So maybe my example should look more like
>>
>> A ==> B, B==>C |- A==>C
>> A==>C, C ==> D |- A ==>D
>> HebbianLink (D,B)
>> useful(A==>D, T)
>>
>>
>> where T is a variable that matches the target of prior BC inferences...
>>
>> ben
>>
>> --
>> Ben Goertzel, PhD
>> http://goertzel.org
>>
>> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
>> boundary, I am the peak." -- Alexander Scriabin
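
The tiny-corpus check suggested in point 5 of the quoted email could be sketched roughly as follows. This is only an illustration: the record layout, the assumption that exactly the five ugly men are the soda drinkers while the women's traits overlap only partly, and the brute-force `count_patterns` helper are all hypothetical stand-ins, not the contents of the ugly-man-drink-soda file or the pattern miner's interface.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: 10 men and 10 women; 5 of each are ugly and
# 5 of each drink soda. ASSUMPTION: the ugly men are exactly the
# soda-drinking men, while for women the two traits only partly overlap.
people = []
for i in range(10):
    ugly = i < 5
    people.append({"man", "ugly", "soda"} if ugly else {"man"})
for i in range(10):
    traits = {"woman"}
    if i < 5:
        traits.add("ugly")
    if 3 <= i < 8:
        traits.add("soda")
    people.append(traits)

def count_patterns(records):
    """Brute-force count of every non-empty trait combination."""
    counts = Counter()
    for traits in records:
        for n in range(1, len(traits) + 1):
            for combo in combinations(sorted(traits), n):
                counts[combo] += 1
    return counts

counts = count_patterns(people)

# With so few records every count can be checked by hand.
assert counts[("man",)] == 10
assert counts[("man", "ugly")] == 5
assert counts[("man", "soda", "ugly")] == 5
assert counts[("soda", "ugly", "woman")] == 2
```

Counts produced this way can serve as hand-verifiable expected values to compare against whatever the miner reports for the same tiny file, before moving on to a big corpus.
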