Which version of guile are you using? You should be using guile-2.2.
Sooner or later, you will also need to build bdwgc with ./configure
--enable-large-config, or you will get the dreaded `Too many heap sections:
Increase MAXHINCR or MAX_HEAP_SECTS` error.
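
For reference, the rebuild looks something like this (the source-tree
path is illustrative; --enable-large-config is the flag that raises the
collector's heap-section limits):

```shell
# Rebuild bdwgc so it can handle very large heaps.  Without
# --enable-large-config, big AtomSpace runs eventually die with the
# "Too many heap sections" error quoted above.
cd bdwgc                              # illustrative source-tree location
./configure --enable-large-config
make
sudo make install
```
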
Re the guile infinite loop: yes, there is a bug in guile; I plan to open a
bug report "real soon now". I think you are describing this bug below.
What I see is that:
* the guile prompt stops responding,
* the CPU spins at exactly 100% usage, and
* GDB shows a very small, very tight spin loop when I single-step.
Single-stepping, I get this:
scm_i_vm_mark_stack (vp=0xbdca3120, mark_stack_ptr=0x7f361fe85ef0,
mark_stack_limit=0x7f361fe95eb0) at ../../libguile/vm.c:975
975 fp = SCM_FRAME_DYNAMIC_LINK (fp))
(gdb) print fp
$1 = (union scm_vm_stack_element *) 0x7f36183d5e38
(gdb) step
973 for (fp = vp->fp, sp = vp->sp;
(gdb)
979 for (slot = nlocals - 1; sp < fp; sp++, slot--)
(gdb)
1011 slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp),
&cache);
(gdb)
1006 sp = SCM_FRAME_PREVIOUS_SP (fp);
(gdb) print fp
$2 = (union scm_vm_stack_element *) 0x7f36183d5e38
(gdb) print sp
$3 = (union scm_vm_stack_element *) 0x7f36183d5e48
(gdb) step
1011 slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp),
&cache);
(gdb)
find_slot_map (cache=0x7f361fe85ab0, ip=0x1) at ../../libguile/vm.c:932
932 size_t slot = (((scm_t_uintptr) ip) >> 2) %
SLOT_MAP_CACHE_SIZE;
(gdb)
935 if (cache->entries[slot].ip == ip)
(gdb)
scm_i_vm_mark_stack (vp=0xbdca3120, mark_stack_ptr=0x7f361fe85ef0,
mark_stack_limit=0x7f361fe95eb0) at ../../libguile/vm.c:1011
1011 slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp),
&cache);
(gdb)
find_slot_map (cache=0x7f361fe85ab0, ip=0x1) at ../../libguile/vm.c:936
936 map = cache->entries[slot].map;
(gdb)
scm_i_vm_mark_stack (vp=0xbdca3120, mark_stack_ptr=0x7f361fe85ef0,
mark_stack_limit=0x7f361fe95eb0) at ../../libguile/vm.c:975
975 fp = SCM_FRAME_DYNAMIC_LINK (fp))
(gdb) print fp
$4 = (union scm_vm_stack_element *) 0x7f36183d5e38
This also happens in older versions, e.g. guile-2.1.8. It's a real bug; I
have not had time to tell the guile people about it.
The weird thing is, everything was fine until about 2-3 weeks ago, when I
started seeing this bug. I was churning through huge quantities of text
before this, and it was not an issue. At all. And now it is. I can't figure
out why. As I say, it still hangs with older versions, and I can't figure
out what triggered this.
--linas
On Thu, Jun 1, 2017 at 11:09 PM, Curtis Faith <[email protected]>
wrote:
> So my intuition about the Guile GC was right after all.
>
> I came into the office this morning to check on last night's test of the
> 6,000-sentence file, the one that took two days last week and started me
> down this path before I began working on AWS.
>
> For the new test last night, I reintroduced the (gc) call in observe-text
> but did not put back the fetch-atom and store-atom calls.
>
> I figured it would either finish quickly last night (if the SQL issue was
> the only major problem) or run into the same problem I encountered last
> time I ran it last week (if something else like the GC was causing the
> slowdown). Either way I would be able to separate the GC performance issues
> I suspected from the SQL performance issues I already noted.
>
> When I arrived, the cogserver was using 100% CPU but was still running,
> processing sentence #590. I suspected the Guile GC, since one process is
> pegged at 100% on a 6-CPU machine that usually shows 1150% for the
> cogserver when running. This matches the profile of one thread blocking
> all the others, which pointed to the GC. I fired up gdb, attached to the
> cogserver, and got a backtrace:
>
> (gdb) bt
> #0 0x00007f562938cfb2 in do_sigsuspend (set=0x7f56244ea2c0)
> at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
> #1 __GI___sigsuspend (set=0x7f56244ea2c0)
> at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
> #2 0x00007f56242ce832 in GC_suspend_handler_inner ()
> from /usr/lib/x86_64-linux-gnu/libgc.so.1
> #3 0x00007f56242ce8a5 in GC_suspend_handler ()
> from /usr/lib/x86_64-linux-gnu/libgc.so.1
> #4 <signal handler called>
> #5 0x00007f562941a93d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> #6 0x00007f562944bbc4 in usleep (useconds=<optimized out>)
> at ../sysdeps/unix/sysv/linux/usleep.c:32
> #7 0x00007f562a2d7fb7 in opencog::CogServer::serverLoop (this=0x19727f0)
> at /home/inflector/opencog/opencog/opencog/cogserver/server/CogServer.cc:211
> #8 0x0000000000406b05 in main (argc=3, argv=0x7ffec4633ae8)
> at /home/inflector/opencog/opencog/opencog/cogserver/server/CogServerMain.cc:232
> (gdb)
>
> This is the code that the garbage collector invokes to suspend threads
> while it runs. I've been digging into the collector's code and have found
> several potential interactions that would produce what I've been seeing.
> Essentially, as the number of objects in the cogserver heap grows, the
> conservative GC has to search a larger and larger space, until eventually
> this search consumes a single core's entire processing capability. As the
> search space grows larger and larger, this blocking begins to dominate all
> execution.
>
> You can see this in my notes in the attached observe-text.log file, where
> I compare GC time with total time: 320 seconds out of 463 for my 5-chapter
> test file, i.e. 69% of the processing time for only 303 sentences. And
> this percentage grows with successive processing of the same file. These
> successive runs create no new atoms, but they show a leak of 0.130G per
> run with the (gc) call in observe-text, and 0.400G per run of the 5
> chapters without the (gc) call inside observe-text.
>
> Currently the job has been sitting for 12 hours on one single sentence
> while the garbage collector runs repeatedly, attempting to finish its job.
> Since I encountered this last week, just let it run over the weekend, and
> found it finished when I came back on Monday, I suspect the GC is still
> churning away, being repeatedly invoked, and at some point it will finish
> enough to process the sentence. Completion will then trigger another
> internal allocation, which the collector always makes at 200% of the
> previous size; this will cause the GC to be invoked much less often for
> some period of time, which will make the rest of the sentences process
> much faster. But at some heap size the GC stops working, as the processing
> time required dominates a single CPU core.
>
> All the above means Guile's current GC won't work for the AtomSpace, not
> without some significant work. That's the bad news.
>
> The good news is that I believe we can implement a fix for our special
> case of Guile running in a separate process in the CogServer in a few
> days. My idea is a bit of a hack, but it will result in extremely fast
> allocations, zero leaks, and speed up the code by at least 2X over the
> small-file case where we're getting sub-second processing of lines. I'd
> not be surprised if we can hit 0.5 seconds in the no-SQL-saving case.
>
> I have attached a screenshot of top, where you can see 100% CPU for the
> cogserver, and Terminal windows where you can see that there were no
> errors starting the cogserver and that it is on sentence 590, which took
> 1049 seconds to reach, with an average per-sentence time at that point of
> 1.78 seconds. You can also see that there are no atoms in the atomspace
> (except for the default one that gets added to every atomspace).
>
> So my plan is: as a first step, I'm going to implement a quick-and-dirty
> replacement for the GC that just uses malloc, to see if that gets through
> the file. If this approach works, it will confirm the above, though it
> won't be suitable for the long term.
>
> Assuming the malloc-based allocations complete in reasonable time, I've
> got some ideas for altering the GC code to use a separate per-thread heap
> for new allocations, which we can then dispose of en masse when the thread
> terminates. Assuming we can get a decent bound on the usage of
> guile-internal objects during statement execution, this will be much,
> much faster and won't leak. Everyone wins.
>
> Let me know if you have any questions or suggestions.
>
> - Curtis
>
> On Thu, Jun 1, 2017 at 1:42 AM, Ben Goertzel <[email protected]> wrote:
>
>> Thanks a lot Curtis!
>>
>> I totally grok the desire to complete an analysis before announcing
>> results... OTOH if we were all sitting in the same office we'd
>> probably be communicating lots of interim results. So generally I'm
>> biased toward fostering communication! ...
>>
>> See you Monday...
>>
>> ben
>>
>> On Wed, May 31, 2017 at 7:58 AM, Curtis Faith <[email protected]>
>> wrote:
>> > I should have been more explicit and said that removing the store and
>> fetch
>> > was just a quick and dirty way of measuring the performance impact of
>> this
>> > issue, not a suggested solution to it.
>> >
>> On Wed, May 31, 2017 at 10:13 PM, Curtis Faith <[email protected]>
>> wrote:
>> >>
>> >> I am sending this a bit earlier than I normally would, as I am still
>> >> trying to figure out the Guile garbage collector's contribution to
>> >> performance and am not quite done. I have also not yet measured a test
>> >> over significant data for a before-and-after comparison, for example.
>> >>
>> >> But since Ben sent Linas an email about this problem, I'll send my work
>> >> in progress, so Linas has a bit more information to answer Ben's
>> >> question.
>> >>
>> >> So, the following represents a partial analysis of the SQL issues I
>> >> found. I removed the code for fetch-atom and store-atom, and while this
>> >> is considerably faster, there still seem to be many-second pauses in
>> >> processing that do not correlate with the number of words in the
>> >> sentence in any way I can see. The Guile garbage collector? This might
>> >> be due to a memory leak, which would force the collector to search an
>> >> ever-increasing memory space for object-ish references using its
>> >> conservative scheme.
>> >>
>> >> - Curtis
>> >>
>> >> -----------------------
>> >>
>> >> My 6,000 sentence test file finished over the weekend. It took about 48
>> >> hours.
>> >>
>> >> For some context, the input corpus, Pride and Prejudice, has
>> >> approximately:
>> >>
>> >> 120 K words
>> >> 700 K bytes
>> >> 6 K sentences
>> >> 2 K paragraphs
>> >> 25 average words per sentence
>> >>
>> >> So it is a fair-sized novel, but only a tiny, tiny fraction of what
>> >> we'd like to feed the language-learning system.
>> >>
>> >> In looking at the Postgres statistics for updates on a freshly
>> >> initialized database, I found:
>> >>
>> >> SELECT relname, idx_scan, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del
>> >> FROM pg_stat_user_tables;
>> >>
>> >>
>> >> relname     | idx_scan  | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
>> >> ------------+-----------+---------------+-----------+-----------+-----------
>> >> atoms       |  95789119 |      90072278 |   3288742 |         0 |         0
>> >> values      |         4 |             0 |         0 |         0 |         0
>> >> typecodes   |         0 |             0 |       228 |         0 |         0
>> >> valuations  | 131581719 |      84520666 |  44519326 |         0 |  41926158
>> >> spaces      |   3288742 |       3288742 |         2 |         0 |         0
>> >> (5 rows)
>> >>
>> >> So roughly 90M fetches for Atoms and 85M for Valuations while parsing a
>> >> single file of 6K sentences.
>> >>
>> >> We have 3.3M atoms inserted and 90.0M index tuple fetches for the atom
>> >> table.
>> >>
>> >> In digging into the code, I found we have an explicit delete and insert
>> >> for every single increment of every atom count maintained by the
>> >> system. The count-increment code is also recursive over the trees
>> >> defined by each pair, which in practice means many different atoms:
>> >> even though we only increment the count for one atom in the tree, the
>> >> store-atom and fetch-atom calls update the truth values for every atom
>> >> in the tree that has a truth value. The pair-counting code has this
>> >> comment:
>> >>
>> >> ; ---------------------------------------------------------------------
>> >> ; update-pair-counts -- count occurrences of word-pairs in a parse.
>> >> ;
>> >> ; Note that this might throw an exception...
>> >> ;
>> >> ; The structures that get created and incremented are of the form
>> >> ;
>> >> ; EvaluationLink
>> >> ; PredicateNode "*-Sentence Word Pair-*"
>> >> ; ListLink
>> >> ; WordNode "lefty" -- or whatever words these are.
>> >> ; WordNode "righty"
>> >> ;
>> >> ; ExecutionLink
>> >> ; SchemaNode "*-Pair Distance-*"
>> >> ; ListLink
>> >> ; WordNode "lefty"
>> >> ; WordNode "righty"
>> >> ; NumberNode 3
>> >>
>> >>
>> >> So every pair has 11 atoms.
>> >>
>> >> The truth value gets set only at the top level, but the fetch and
>> >> store end up retrieving and storing the valuations for the entire
>> >> tree, wherever there are truth values.
>> >>
>> >> Further, the counting code stores and fetches the truth values for
>> >> each unique count value each time it gets incremented. So we'll have
>> >> (1: an insert for 1), then (2: a delete of the 1 and an insert for 2),
>> >> then (3: a delete of the 2 and an insert for 3), then (4: a delete of
>> >> the 3 and an insert for 4), etc. And this is done for the
>> >> EvaluationLink, the ExecutionLink, and the four WordNodes in the above
>> >> example.
>> >>
>> >> I wanted to verify this assumption, so I created a fresh database and
>> >> turned on statement logging in Postgres. It resulted in the attached
>> >> log file for a test of a four-word sentence:
>> >>
>> >> (observe-text "This is a test.")
>> >>
>> >> We got 5419 lines in the log file (not counting the 333 lines of
>> >> startup creations of types, etc.), each of which represents a single
>> >> SQL statement. There are:
>> >>
>> >> 813 BEGIN,
>> >> 813 COMMIT,
>> >> 2127 SELECT,
>> >> 720 DELETE, and
>> >> 946 INSERT
>> >>
>> >> statements for a four-word sentence. (See the attached text log for
>> >> details.)
>> >>
>> >> There are 133 new atoms created and 93 valuations after the call to
>> >> observe-text is finished.
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Ben Goertzel, PhD
>> http://goertzel.org
>>
>> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
>> boundary, I am the peak." -- Alexander Scriabin
>>
>
>
--
You received this message because you are subscribed to the Google Groups
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/CAHrUA37TzQjmnYCBmY-zhyrAW%2BPhOyDyB1N8b_KM8a6GEjbTMw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.