Very cool, Jacek! Of course there is no strong reason to do this sort of textual MI calculation in the Atomspace, when it can be done just as easily with standalone scripts.
However, once we get to later phases of the algorithm, where we are doing subtree MI calculations on parse trees (rather than on word sequences), it would be really nice to use the Atomspace, rather than making some separate, standalone tree data structure.... Though of course, in the end, "practicably doable" trumps "really nice"...

Senna and I previously discussed the idea of making a kind of "frozen Atomspace", to be used when one wants to repeatedly query or analyze a certain Atomspace during a time interval in which one does not need to change it.... So it would be an Atomspace that was not changeable, but was very rapidly traversable.... I wonder if this is a good solution to this sort of problem. I.e., once one has done the MST parsing, one gets a bunch of parse trees; one then puts them in an Atomspace, freezes that Atomspace, and does a bunch of iterated calculations on it to compute subtree MI values...

Just speculating a bit... ;)

On Sat, Jan 7, 2017 at 12:51 AM, Jacek Świergocki <[email protected]> wrote:

> Hi Linas,
>
> Yes, in September I tried to run the language-learning experiment according to
> your instructions and Rohit Shinde's Q&A on the newsgroup. I ran a pipeline
> from the opencog repo:
>
> split-sentences.pl -> link-grammar -> relex -> scheme -> atomspace -> postgres
>
> Basically it worked, but very, very slowly. I estimated it would have taken
> months to get a reasonably sized disjunct dataset for clustering in the next
> step. So I wrote some simple but efficient c++ programs that do the same
> thing, bypassing the atomspace, and ran the pipeline:
>
> split-sentences.pl -> link-grammar -> text-files -> c++ programs -> text-files
>
> It took just a few days on a 3-core machine to get mutual information for
> word pairs and disjunct sets for sentences after MST parsing
> (dataset size: ~24M sentences, ~750K words, ~26M word-pairs; language: English).
> I then suspended this experiment for lack of time.
> As far as I understand, the next steps of this experiment are the ones you
> have described here:
> https://docs.google.com/viewer?a=v&pid=forums&srcid=MDQ3MzU0NzU5MTM4MjQ0MDEwOTgBMDgxMTg0NDQyODM5MjI4MDIwOTUBa3R4d2pORmdlRTBKATAuMQEBdjI
>
> I think I can make some programming contribution to this project; I will
> probably have some time after January 20 or later. If you see something
> specific to do, please let me know, as I am not aware of the current status
> of this project. Of course, I can also help verify experiments with the
> Polish language as a native speaker.

--
Ben Goertzel, PhD
http://goertzel.org

"I tell my students, when you go to these meetings, see what direction everyone is headed, so you can go in the opposite direction. Don't polish the brass on the bandwagon." – V. S. Ramachandran
