Just to make you happy, I turned the GC down from running after every
sentence to running after every 20 sentences.  Perhaps this will make you
happier.  I also turned off the collection of the kind of data that you are
not interested in; this will cut the number of database writes by more
than half.  I also made the clique pair-counting code easier to adjust.

All of this is in the latest pull request.  Of course, you too could have
turned down, or turned off, the GC if you wanted to; then you would not
have gotten the awful numbers that you so badly did not like.

But when you collect more information than you need or want, and you
perform more GC than you need or want, then, yes, you will get poor
performance.
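
A minimal sketch of the "GC every 20 sentences" idea, in Python rather than
the Scheme pipeline itself. The function name process_sentences and its
parameters are hypothetical, and Python's gc.collect only stands in for the
Guile collector; the point is just that collection runs once per batch
instead of once per sentence.

```python
import gc
from typing import Callable, Iterable


def process_sentences(
    sentences: Iterable[str],
    handle: Callable[[str], None],
    gc_interval: int = 20,
    collect: Callable[[], object] = gc.collect,
) -> int:
    """Handle each sentence, collecting garbage every gc_interval sentences.

    Returns the number of collections performed. All names here are
    illustrative assumptions, not part of the OpenCog code base.
    """
    if gc_interval < 1:
        raise ValueError("gc_interval must be at least 1")
    collections = 0
    for count, sentence in enumerate(sentences, start=1):
        handle(sentence)
        # Collect only when a full batch of gc_interval sentences is done.
        if count % gc_interval == 0:
            collect()
            collections += 1
    return collections
```

With gc_interval=1 this reproduces the collect-after-every-sentence
behaviour; larger values trade peak memory for fewer collections.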

--linas

On Thu, Jun 8, 2017 at 9:09 AM, Curtis Faith <[email protected]>
wrote:

> I got the sentence word extractor and plumbing working to replace the
> Scheme observe-text function with C++ equivalents, and it now creates
> atoms. I set things up so it works as a CogServer command so we can have
> multi-threaded input and many threads adding atoms at the same time in the
> future. The first tests I have done are not multi-threaded, though, and
> the initial tests were run on a single core of a MacBook Air.
>
> There are several things I am not doing because I didn't think you needed
> all the atoms that observe-text is creating. No connection to the Relex
> server, no Link Grammar parses, etc.
>
> I am creating all the possible ordered pairs, and counting them in a
> manner analogous to what update-clique-pair-counts was doing in
> link-pipeline.scm. But I can easily adjust the algorithm so it doesn't
> output pairs that are farther apart than some distance N, if you desire.
>
> The atoms it generates are of this form, as per link-pipeline.scm.
>
> ;     EvaluationLink
> ;         PredicateNode "*-Sentence Word Pair-*"
> ;         ListLink
> ;             WordNode "lefty"  -- or whatever words these are.
> ;             WordNode "righty"
> ;
> ;     ExecutionLink
> ;         SchemaNode "*-Pair Distance-*"
> ;         ListLink
> ;             WordNode "lefty"
> ;             WordNode "righty"
> ;         NumberNode 3
>
> I have attached a sample output for one sentence: "Ben likes his ice
> cream, and Jerry does too.". Please let me know if you need any other types
> of atoms or counts.
>
> I was thinking it might be good to output adjacent pairs separately and
> adjacent triplets in addition to pairs using different PredicateNodes, as
> adjacency seems like more information than simple pairing in a sentence. It
> also might be good to have separate sentence predicates that contain all
> the words in each sentence since there is no link-grammar parse and you
> can't reconstruct the sentences from the information in this step.
>
>
> *Speed*
>
> The interesting part is how much faster it is at processing the full Pride
> and Prejudice test file.
>
> On my 1.7 GHz Intel Core i7 MacBook Air, the test that took 3 or 4 hours
> on the 6-core Dell with observe-text in Scheme generates 2,690,934 atoms
> in only 81 seconds on a single core using C++.  At this speed, the
> Wikipedia file that Ruiting was using should take only a few hours to
> process.
>
> For Pride and Prejudice, sentences were averaging 2.2 seconds at 1100% CPU
> on the Dell. On my Mac it hits about 92% of one CPU and averages 0.0137
> seconds per sentence.
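
As a rough cross-check of the figures quoted above, the arithmetic can be
worked through in a few lines of Python. The inputs are the numbers from
the message; the derived values assume that the 81-second run processed
every sentence at the stated average rate, and that "1100% CPU" means about
11 busy hardware threads. Variable names are illustrative.

```python
# Figures quoted in the message.
cpp_total_seconds = 81.0        # full Pride and Prejudice run, C++
cpp_seconds_per_sentence = 0.0137
cpp_cpu_fraction = 0.92         # about 92% of one CPU
scheme_seconds_per_sentence = 2.2
scheme_cpu_fraction = 11.0      # 1100% CPU on the Dell

# Derived values (estimates, not measurements).
sentence_count = cpp_total_seconds / cpp_seconds_per_sentence
scheme_total_hours = sentence_count * scheme_seconds_per_sentence / 3600.0
wall_clock_speedup = scheme_seconds_per_sentence / cpp_seconds_per_sentence
cpu_time_ratio = (scheme_seconds_per_sentence * scheme_cpu_fraction) / (
    cpp_seconds_per_sentence * cpp_cpu_fraction
)

print(f"implied sentences:      {sentence_count:.0f}")
print(f"implied Scheme runtime: {scheme_total_hours:.1f} hours")
print(f"wall-clock speedup:     {wall_clock_speedup:.0f}x")
print(f"CPU-time ratio:         {cpu_time_ratio:.0f}x")
```

The implied Scheme runtime of a little over three and a half hours is
consistent with the "3 or 4 hours" quoted for the Dell.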
>
> The Dell running Guile used 4.2G of RAM. The C++ code uses 1.8G of RAM.
>
> As mentioned above, there is also a lot of work I am not doing: no Relex
> server parses, no link-grammar, etc. But that work accounted for only 0.15
> of the 2.2 seconds per sentence the last time I measured.
>
> - Curtis
>
> --
> You received this message because you are subscribed to the Google Groups
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/opencog.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/opencog/CAJzHpFos1vJaPdmK%2B9uvZ8M4rKMLXrvpnERCzjUBqbP0ukBHpA%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>
