Re: [opencog-dev] Preliminary results of C++ observe-text

Curtis Faith Thu, 08 Jun 2017 15:56:17 -0700

>
> Of course, you too could have turned down, or turned off GC if you wanted
> to, then you would not have gotten the awful numbers that you so badly did
> not like.



I could have and did. It made little difference. It was one of the first
things I tried.

The 2.2 seconds per sentence times I got with scheme / Guile already had:

1) the (gc) call in observe-text removed. This made no difference in
timing, but memory did leak at a slightly higher rate, which led me to
believe that (gc) was already being called at least once per sentence due
to the sheer volume of work.

2) no (store-atom h) or (fetch-atom h) calls in count-one-atom. So these
times included absolutely no SQL reads or writes whatsoever.



On Fri, Jun 9, 2017 at 5:09 AM, Linas Vepstas <[email protected]>
wrote:

> Just to make you happy, I tuned down the GC from doing it every sentence,
> to doing it every 20 sentences.  Perhaps this will make you happier.  I
> also turned off the collection of the kind of data that you are not
> interested in; this will cut down on the number of database writes by more
> than half.   I also made the clique pair-counting code easier to adjust.
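
A minimal sketch of the "collect every N sentences" idea described above, written in Python rather than the Guile code under discussion. The names observe_all, process and gc_interval are hypothetical, and Python's gc.collect() only stands in for the (gc) call in observe-text; this is not the code from the pull request.

```python
import gc


def observe_all(sentences, process, gc_interval=20):
    """Process each sentence, forcing a collection every gc_interval sentences.

    gc_interval=1 mimics "GC after every sentence"; gc_interval=0 disables
    the explicit collection entirely. Returns how many explicit collections
    were triggered, so the effect of the setting is easy to inspect.
    """
    explicit_collections = 0
    for n, sentence in enumerate(sentences, start=1):
        process(sentence)
        if gc_interval and n % gc_interval == 0:
            gc.collect()  # stand-in for the (gc) call in observe-text
            explicit_collections += 1
    return explicit_collections


if __name__ == "__main__":
    seen = []
    count = observe_all(["a b c"] * 45, seen.append, gc_interval=20)
    print(count, "explicit collections for", len(seen), "sentences")
```

Making the interval a parameter keeps "every sentence", "every 20 sentences" and "never" as one code path, which makes timing comparisons between the settings easier.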
>
> All of this is in the latest pull req.  Of course, you too could have
> turned down, or turned off GC if you wanted to, then you would not have
> gotten the awful numbers that you so badly did not like.
>
> But when you collect more information than you need or want, and you
> perform more GC than you need or want, then, yes, you will get poor
> performance.
>
> --linas
>
> On Thu, Jun 8, 2017 at 9:09 AM, Curtis Faith <[email protected]>
> wrote:
>
>> I got the sentence word extractor and plumbing working to replace the
>> Scheme observe-text function with C++ equivalents, and it now creates
>> atoms. I set things up so it works as a CogServer command so we can have
>> multi-threaded input and many threads adding atoms at the same time in the
>> future. The first tests I have done are not multi-threaded, though, and
>> the initial tests ran on a single core of a MacBook Air.
>>
>> There are several things I am not doing because I didn't think you needed
>> all the atoms that observe-text is creating. No connection to the Relex
>> server, no Link Grammar parses, etc.
>>
>> I am creating all the possible ordered pairs, and counting them in a
>> manner analogous to what update-clique-pair-counts was doing in
>> link-pipeline.scm. But I can easily adjust the algorithm so it doesn't
>> output pairs that are further apart than some distance N, if you desire.
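
A minimal sketch of the pair counting described above, in Python rather than the C++ from the thread. It assumes "ordered pair" means (left word, right word) with the left word occurring earlier in the sentence, and that distance is the difference in word positions. The function name count_ordered_pairs, the max_distance parameter and the whitespace tokenization are hypothetical, not taken from the actual code.

```python
from collections import Counter
from itertools import combinations


def count_ordered_pairs(words, max_distance=None):
    """Count every (left, right) word pair in one sentence.

    Returns two counters: one keyed by (left, right), analogous to the
    "*-Sentence Word Pair-*" counts, and one keyed by
    (left, right, distance), analogous to the "*-Pair Distance-*" counts.
    Pairs further apart than max_distance are skipped when it is set.
    """
    pair_counts = Counter()
    distance_counts = Counter()
    # combinations() over enumerate() yields each i < j exactly once,
    # so the left word always precedes the right word.
    for (i, left), (j, right) in combinations(enumerate(words), 2):
        distance = j - i
        if max_distance is not None and distance > max_distance:
            continue
        pair_counts[(left, right)] += 1
        distance_counts[(left, right, distance)] += 1
    return pair_counts, distance_counts


if __name__ == "__main__":
    # Punctuation is dropped here purely for illustration.
    words = "Ben likes his ice cream and Jerry does too".split()
    pairs, distances = count_ordered_pairs(words, max_distance=3)
    for (left, right, distance), n in sorted(distances.items()):
        print(left, right, distance, n)
```

Without a limit, a sentence of n words yields n(n-1)/2 pairs, so the distance cutoff is the main lever on how many atoms each sentence generates.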
>>
>> The atoms it generates are of this form, as per link-pipeline.scm.
>>
>> ;     EvaluationLink
>> ;         PredicateNode "*-Sentence Word Pair-*"
>> ;         ListLink
>> ;             WordNode "lefty"  -- or whatever words these are.
>> ;             WordNode "righty"
>> ;
>> ;     ExecutionLink
>> ;         SchemaNode "*-Pair Distance-*"
>> ;         ListLink
>> ;             WordNode "lefty"
>> ;             WordNode "righty"
>> ;         NumberNode 3
>>
>> I have attached a sample output for one sentence: "Ben likes his ice
>> cream, and Jerry does too.". Please let me know if you need any other types
>> of atoms or counts.
>>
>> I was thinking it might be good to output adjacent pairs separately and
>> adjacent triplets in addition to pairs using different PredicateNodes, as
>> adjacency seems like more information than simple pairing in a sentence. It
>> also might be good to have separate sentence predicates that contain all
>> the words in each sentence since there is no link-grammar parse and you
>> can't reconstruct the sentences from the information in this step.
>>
>>
>> *Speed*
>>
>> The interesting part is how much faster it is at processing the full
>> Pride and Prejudice test file.
>>
>> On my 1.7 GHz Intel Core i7 MacBook Air, the test which took 3 or 4 hours
>> on the 6-core Dell with observe-text in scheme, generates 2,690,934
>> atoms in only 81 seconds with a single core using C++.  At this speed, the
>> Wikipedia file that Ruiting was using should only take a few hours to
>> process.
>>
>> For Pride and Prejudice, sentences were averaging 2.2 seconds at 1100%
>> CPU on the Dell. On my Mac it hits about 92% of one CPU and averages 0.0137
>> seconds per sentence.
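
The figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in this message (2.2 and 0.0137 seconds per sentence, 81 seconds total, 2,690,934 atoms); the sentence count is inferred from them, not measured, and the variable names are illustrative.

```python
# Numbers quoted in the message.
SCHEME_SEC_PER_SENTENCE = 2.2     # Guile observe-text on the Dell
CPP_SEC_PER_SENTENCE = 0.0137     # C++ version on the MacBook Air
CPP_TOTAL_SECONDS = 81
ATOMS_GENERATED = 2_690_934

# Inferred: how many sentences the C++ run must have processed.
sentences = CPP_TOTAL_SECONDS / CPP_SEC_PER_SENTENCE

# Implied wall-clock time for the Scheme run over the same sentences.
scheme_hours = sentences * SCHEME_SEC_PER_SENTENCE / 3600

# Per-sentence speedup and raw atom throughput of the C++ run.
speedup = SCHEME_SEC_PER_SENTENCE / CPP_SEC_PER_SENTENCE
atoms_per_second = ATOMS_GENERATED / CPP_TOTAL_SECONDS

if __name__ == "__main__":
    print(f"sentences (inferred): {sentences:.0f}")
    print(f"implied Scheme run:   {scheme_hours:.1f} hours")
    print(f"per-sentence speedup: {speedup:.0f}x")
    print(f"atoms per second:     {atoms_per_second:.0f}")
```

The implied Scheme run time comes out at roughly 3.6 hours, which is consistent with the "3 or 4 hours" quoted for the Dell.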
>>
>> The Dell running Guile used 4.2G of RAM. The C++ code uses 1.8G of RAM.
>>
>> As mentioned above, there is also a lot of work I am not doing: no relex
>> server parses, no link-grammar, etc. But that work was only taking 0.15
>> of the 2.2 seconds per sentence the last time I measured.
>>
>> - Curtis
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "opencog" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/opencog.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/opencog/CAJzHpFos1vJaPdmK%2B9uvZ8M4rKMLXrvpnERCzjUBqbP0ukBHpA%40mail.gmail.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>

