> Of course, you too could have turned down, or turned off GC if you wanted to, then you would not have gotten the awful numbers that you so badly did not like.

I could have and did. It made little difference. It was one of the first things I tried.

The 2.2 seconds per sentence times I got with Scheme / Guile already had:

1) the (gc) call in observe-text removed. This made no difference in timing but did cause memory to leak at a slightly higher rate, which led me to believe that (gc) was already being called at least once per sentence due to the sheer volume of work.

2) no (store-atom h) or (fetch-atom h) calls in count-one-atom. So these times included absolutely no SQL reads or writes whatsoever.

On Fri, Jun 9, 2017 at 5:09 AM, Linas Vepstas <[email protected]> wrote:

> Just to make you happy, I tuned down the GC from doing it every sentence, to doing it every 20 sentences. Perhaps this will make you happier. I also turned off the collection of the kind of data that you are not interested in; this will cut down on the number of database writes by more than half. I also made the clique pair-counting code easier to adjust.
>
> All of this is in the latest pull req. Of course, you too could have turned down, or turned off GC if you wanted to, then you would not have gotten the awful numbers that you so badly did not like.
>
> But when you collect more information than you need or want, and you perform more GC than you need or want, then, yes, you will get poor performance.
>
> --linas
>
> On Thu, Jun 8, 2017 at 9:09 AM, Curtis Faith <[email protected]> wrote:
>
>> I got the sentence word extractor and plumbing working to replace the Scheme observe-text function with C++ equivalents, and it now creates atoms. I set things up so it works as a CogServer command so we can have multi-threaded input and many threads adding atoms at the same time in the future. Though, the first tests I have done are not multi-threaded and initial tests were on a single-core MacBook Air.
>>
>> There are several things I am not doing because I didn't think you needed all the atoms that observe-text is creating.
>> No connection to the Relex server, no Link Grammar parses, etc.
>>
>> I am creating all the possible ordered pairs, and counting them in a manner analogous to what update-clique-pair-counts was doing in link-pipeline.scm. But I can easily adjust the algorithm so it doesn't output pairs that are further apart than some distance N, if you desire.
>>
>> The atoms it generates are of this form, as per link-pipeline.scm.
>>
>> ; EvaluationLink
>> ;     PredicateNode "*-Sentence Word Pair-*"
>> ;     ListLink
>> ;         WordNode "lefty"  -- or whatever words these are.
>> ;         WordNode "righty"
>> ;
>> ; ExecutionLink
>> ;     SchemaNode "*-Pair Distance-*"
>> ;     ListLink
>> ;         WordNode "lefty"
>> ;         WordNode "righty"
>> ;     NumberNode 3
>>
>> I have attached a sample output for one sentence: "Ben likes his ice cream, and Jerry does too.". Please let me know if you need any other types of atoms or counts.
>>
>> I was thinking it might be good to output adjacent pairs separately and adjacent triplets in addition to pairs using different PredicateNodes, as adjacency seems like more information than simple pairing in a sentence. It also might be good to have separate sentence predicates that contain all the words in each sentence since there is no link-grammar parse and you can't reconstruct the sentences from the information in this step.
>>
>>
>> *Speed*
>>
>> The interesting part is how much faster it is at processing the full Pride and Prejudice test file.
>>
>> On my 1.7 GHz Intel Core i7 MacBook Air, the test which took 3 or 4 hours on the 6-core Dell with observe-text in Scheme generates 2,690,934 atoms in only 81 seconds with a single core using C++. At this speed, the Wikipedia file that Ruiting was using should only take a few hours to process.
>>
>> For Pride and Prejudice, sentences were averaging 2.2 seconds at 1100% CPU on the Dell. On my Mac it hits about 92% of one CPU and averages 0.0137 seconds per sentence.
>>
>> The Dell running Guile used 4.2G of RAM. The C++ code uses 1.8G of RAM.
>>
>> As mentioned above, there is also a lot of work I am not doing, no relex server parses, no link-grammar, etc. But this was only taking 0.15 seconds per sentence of the 2.2 seconds last time I measured.
>>
>> - Curtis
>>
>> --
>> You received this message because you are subscribed to the Google Groups "opencog" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/opencog.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CAJzHpFos1vJaPdmK%2B9uvZ8M4rKMLXrvpnERCzjUBqbP0ukBHpA%40mail.gmail.com.
>> For more options, visit https://groups.google.com/d/optout.
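
For readers following the thread, the ordered-pair counting described in the quoted message can be sketched roughly as below. This is an illustration in Python, not the C++ code discussed in the thread (which is not shown here). The function name, the Counter-based storage, the max_distance parameter, and the whitespace tokenization of a sentence fragment are all assumptions made for the example. The two counters stand in for the counts attached to the "*-Sentence Word Pair-*" and "*-Pair Distance-*" atoms.

```python
from collections import Counter
from itertools import combinations


def count_sentence_pairs(words, pair_counts, distance_counts, max_distance=None):
    """Count every ordered (left, right) word pair in one sentence.

    Hypothetical helper: 'left' always precedes 'right' in the sentence,
    and the distance is the difference of their positions.
    """
    for (i, left), (j, right) in combinations(enumerate(words), 2):
        distance = j - i
        # Optional cutoff, mirroring "pairs further apart than some distance N".
        if max_distance is not None and distance > max_distance:
            continue
        pair_counts[(left, right)] += 1                 # cf. "*-Sentence Word Pair-*"
        distance_counts[(left, right, distance)] += 1   # cf. "*-Pair Distance-*"


# Assumed tokenization: a plain whitespace split of a sentence fragment.
words = "Ben likes his ice cream".split()
pair_counts = Counter()
distance_counts = Counter()
count_sentence_pairs(words, pair_counts, distance_counts)

for (left, right), n in pair_counts.items():
    print(left, right, n)
```

A sentence of n words yields n(n-1)/2 ordered pairs when no cutoff is applied, which is why the atom count grows quickly with sentence length and why a distance cutoff reduces it.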

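
As a rough consistency check on the timings quoted in the thread, the arithmetic can be worked through as follows. The input figures are taken from the messages; the variable names are made up for the example. The two measurements come from different machines (a 6-core Dell at about 1100% CPU versus one core of a MacBook Air), so the ratio is a wall-clock comparison only, not a like-for-like benchmark.

```python
# Figures quoted in the thread.
scheme_seconds_per_sentence = 2.2    # Guile observe-text on the Dell
cpp_seconds_per_sentence = 0.0137    # C++ on the MacBook Air
cpp_total_seconds = 81               # full Pride and Prejudice run in C++

# Wall-clock ratio per sentence (different hardware, so indicative only).
speedup = scheme_seconds_per_sentence / cpp_seconds_per_sentence

# Sentence count implied by the C++ total and per-sentence time.
implied_sentences = cpp_total_seconds / cpp_seconds_per_sentence

# Total time the Scheme version would need for that many sentences.
implied_scheme_hours = implied_sentences * scheme_seconds_per_sentence / 3600

print(f"speedup per sentence: about {speedup:.0f}x")
print(f"implied sentences: about {implied_sentences:.0f}")
print(f"implied Scheme run time: about {implied_scheme_hours:.1f} hours")
```

The implied Scheme run time comes out at roughly 3.6 hours, which agrees with the "3 or 4 hours" reported for the Dell, so the per-sentence and total figures in the thread are at least mutually consistent.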