On Mon, Jun 19, 2017 at 3:31 AM, Ben Goertzel <b...@goertzel.org> wrote:
> Regarding "hidden multivariate logistic regression", as you hint at
> the end of your document ... it seems you are gradually inching toward
> my suggestion of using neural nets here...

Maybe. I want to understand the data first, before I start applying random algorithms to it. BTW, the previous report, showing graphs and distributions of various sorts: it's been expanded and cleaned up, with lots of new stuff. Nothing terribly exciting. I can send the current version, if you care.

> However, we haven't gotten to experimenting with that yet, because we
> are still getting stuck with weird Guile problems in trying to get the
> MST parsing done ... we (Curtis) can get through MST-parsing maybe
> 800-1500 sentences before it crashes (and it doesn't crash when
> examined with GDB, which is frustrating...)....

Arghhh. OK, I just now merged one more tweak to the text-ingestion code that might let you make progress.

Some back-story: a while back, when Curtis was complaining about the large amount of CPU time spent in garbage collection, that was because the script *manually* triggered a GC after each sentence. I presume that Curtis was not aware of this. Now he is.

The reason for doing this was that, without it, memory usage would blow up: link-grammar was returning strings that were 10 or 20 MBytes long, and the GC was perfectly happy to let these clog up RAM. That works out to a gigabyte every 50 or 100 sentences, so I was forcing GC to run pretty much constantly: maybe a few times a second.

This appears to have exposed an obscure guile bug. Each of those giant strings contains scheme code, which guile interprets/compiles and then runs. It appears that high-frequency GC pulls the rug out from under the compiler/interpreter, leading to a weird hang. I think I know how to turn this into a simple test case, but haven't yet. Avoiding the high-frequency GC avoids the weird hang, and that's what the last few github merges do.
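In outline, the tweak replaces the unconditional per-sentence GC with a threshold check. A rough sketch of the two strategies, written in Python as an analogue of the Guile logic (the helper names here are mine, not the actual script's; `ru_maxrss` is peak rather than current RSS, so the real script presumably measures memory differently):

```python
import gc
import resource  # Unix-only

GC_THRESHOLD_MB = 750  # collect only above this resident size

def rss_mbytes():
    # Peak resident set size; ru_maxrss is in KBytes on Linux
    # (bytes on macOS). A stand-in for the real measurement.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def maybe_gc(rss=rss_mbytes):
    """New behavior: force a collection only when memory use is high.

    Returns True if a collection was forced, False otherwise."""
    if rss() > GC_THRESHOLD_MB:
        gc.collect()
        return True
    return False

def process_sentence_old(sentence, parse):
    """Old behavior: unconditional GC after *every* sentence -- a few
    times a second, which is what appeared to trigger the guile hang."""
    parse(sentence)
    gc.collect()

def process_sentence_new(sentence, parse):
    """New behavior: parse, then collect only past the threshold."""
    parse(sentence)
    maybe_gc()
```

The point of the threshold is to keep the 10-20 MByte result strings from accumulating into gigabytes, without running the collector so often that it destabilizes the interpreter.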
Basically, after every sentence, it checks whether RAM usage is above 750 MBytes, and forces a GC only if it is. This is enough to keep RAM usage low, while still avoiding the other ills and diseases. For me, it's been running for over a week without any problems.

It runs at maybe a few sentences per second; I'm not sure, it's not something I measure. So pretty slow, but I kind-of don't care, because after a week that's 20 or 40 million observations of words, which is plenty for me. Too much, actually: the datasets get too big, and I need to trim them.

This has no effect at all on the new, unmerged Curtis code; it won't fix his crash. It's only for the existing pipeline. So set it running on some other machine, and while Curtis debugs, you'll at least get some data piling up. Run it stock, straight out of the box, don't tune it or tweak it, and it should work fine.

--linas