Linas wrote:

> 1) I fixed the hang. :-) Or figured out how to avoid it. It turns out
> that, due to a stoopid bug/oversight, quotation marks in the input text
> were not being escaped. Thus, guile would see a begin-quote, end-quote,
> followed by garbage. A backtrace would be generated, and passed back to the
> witless user, who is just netcat, can couldn't give a damn, and so was
> silently ignored.   Escape the quotation marks correctly, the problem goes
> away.


Cool.

Puzzling how this bug might result in the fp in the garbage collector
getting screwed up. Did you find the specific bug in the GC, or in Guile's use
of it, that causes the fp to get screwed up, so we can help the developers
get that fixed? A problem in Scheme source syntax, or a misspelled variable,
shouldn't result in an infinite loop in the GC in any
case. Seems like they must be missing some sort of exception handler
somewhere.

> 2) how are you measuring GC time? I also get 70% but i also get 500% cpu
> time for the other 181 active threads, so 70/500 seems acceptable to me.
> Again, GC halts only guile, it does not halt the atomspace.


I was simply using the overall duration of the test as measured by the perl
script and the result of (gc-run-time) as measured via a telnet session
running in another bash terminal. My measurements may be in error if my
assumption about the GC is incorrect, as I've hinted a couple of times in
prior emails. If anything, however, I'm overcounting the total time and
undercounting the percentage: I'm not getting CPU measurements for the
duration on the CogServer, only the time until the test is finished
as determined by the perl script.

My base assumption is that 'gc-run-time' is time when the other Guile
threads are blocked. I make this assumption because the 'gc_time_taken'
statistic in Guile is generated via the GC's before_gc and after_gc hooks,
which get called before and after each stop-the-world collection. The
stop-the-world code suspends all the threads that can be garbage collected.
All the CogServer threads created in response to an
"scm hush (observe-text 'Some sentence'))" allocate objects in the GC via
Guile, so they are among the threads that get stopped. Therefore the GC time
reported is non-overlapping with the other processing time.

If a test takes 100 seconds, and 50 seconds are spent in the GC, then during
those 50 seconds no work is going on in any non-GC threads, because
they are suspended for the duration of the stop-the-world collection. I have
looked at the code to verify these assumptions but it is certainly possible
that I am missing something.
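
To make that bookkeeping concrete, here is a minimal Python sketch of how I'd
turn a (gc-run-time) reading into a fraction of the elapsed test time. It
assumes the reading is in Guile internal time units and that you supply the
value of internal-time-units-per-second yourself. The function name and the
sample numbers are hypothetical, and the result only means "time with the
other Guile threads stopped" if the stop-the-world assumption above holds.

```python
def gc_fraction(gc_run_time_units, units_per_second, elapsed_seconds):
    """Return GC time as a fraction of the elapsed wall-clock test time.

    gc_run_time_units: value read from (gc-run-time) at the end of the test
    units_per_second:  value of internal-time-units-per-second (assumed input)
    elapsed_seconds:   test duration as measured by the driving script
    """
    if units_per_second <= 0 or elapsed_seconds <= 0:
        raise ValueError("units_per_second and elapsed_seconds must be positive")
    gc_seconds = gc_run_time_units / units_per_second
    return gc_seconds / elapsed_seconds


# Hypothetical reading: 50 seconds of GC in a 100 second test,
# with 1000 internal time units per second.
fraction = gc_fraction(50_000, 1000, 100.0)
print(f"GC fraction of elapsed time: {fraction:.0%}")
```

If the driving script overcounts the elapsed time, as mine probably does, the
fraction computed this way is a lower bound on the real one.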

How many CPUs are on your test machine? Does it have hyper-threaded Intel
chips? If so, you don't tend to get accurate measures of real-time
CPU performance on those chips.

You have to reduce the multi-threaded process CPU time by a factor of 0.6
to 0.65 to more accurately reflect the 120% to 130% of a CPU core that is
available from a hyper-threaded core pair; it's not 200%, even though it
reports that way. So if you've got 70 units of time in a single thread
blocking all the others and a reported 500 units total, the non-blocking
multi-threaded time is 500-70, or 430 units. Multiply by 0.6 to 0.65 to
account for over-reporting on hyper-threaded chips and you get 258 to 280.
Now divide 258 to 280 by the number of CPUs as reported by the OS
(all hyper-threading units) and you should be close to 30% of the elapsed
time. NOTE: even the 70% reported for the GC time may be overstated by some
factor, though since there are plenty of empty cores for OS and other tasks
while the GC is running, it is likely that the GC gets full use of a core
for most of its duration on a machine that isn't already taxed with other
processes (like Postgres asynchronous dumps into the AtomSpace). My tests
include no such additional work, as I've turned off the store-atom
and fetch-atom tests.
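
The arithmetic above can be sketched in a few lines of Python. This is only
my rule of thumb written out, not a measurement tool: the function names are
made up for illustration, and the logical CPU count is an input you have to
supply from your own machine.

```python
def adjusted_parallel_units(total_units, blocking_units, ht_factor):
    """Scale the non-blocking (multi-threaded) CPU time by a hyper-threading factor.

    total_units:    reported CPU time across all threads (e.g. 500)
    blocking_units: time spent in the single blocking thread (e.g. 70)
    ht_factor:      correction for over-reporting, 0.6 to 0.65 in my estimate
    """
    return (total_units - blocking_units) * ht_factor


def per_logical_cpu(units, n_logical_cpus):
    """Spread the adjusted time over the CPUs the OS reports (hyper-threads included)."""
    if n_logical_cpus <= 0:
        raise ValueError("n_logical_cpus must be positive")
    return units / n_logical_cpus


low = adjusted_parallel_units(500, 70, 0.60)
high = adjusted_parallel_units(500, 70, 0.65)
print(f"adjusted multi-threaded time: {low:.1f} to {high:.1f} units")

# n_logical_cpus is whatever your OS reports; 8 here is purely a placeholder.
n_logical_cpus = 8
print(f"per logical CPU: {per_logical_cpu(low, n_logical_cpus):.1f} "
      f"to {per_logical_cpu(high, n_logical_cpus):.1f} units")
```

Plug in your own thread totals and CPU count to compare the result against
the 70 units reported for the GC.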

See:
http://perfdynamics.blogspot.hk/2014/01/monitoring-cpu-utilization-under-hyper.html
for a bit more on why the reported CPU times are wrong on Intel chips and Linux.
