Linas wrote:
> 1) I fixed the hang. :-) Or figured out how to avoid it. It turns out
> that, due to a stoopid bug/oversight, quotation marks in the input text
> were not being escaped. Thus, guile would see a begin-quote, end-quote,
> followed by garbage. A backtrace would be generated, and passed back to the
> witless user, who is just netcat, and couldn't give a damn, and so was
> silently ignored. Escape the quotation marks correctly, the problem goes
> away.

Cool. Puzzling how this bug might result in the fp in the garbage collector getting screwed up. Did you find the specific bug in the GC, or in Guile's use of it, that causes the fp to get screwed up, so we can help the developers get that fixed? A problem in scheme source syntax, or variables not being spelled correctly, shouldn't result in an infinite loop in the GC in any case. Seems like they must be missing some sort of exception handler somewhere.

> 2) how are you measuring GC time? I also get 70% but i also get 500% cpu
> time for the other 181 active threads, so 70/500 seems acceptable to me.
> Again, GC halts only guile, it does not halt the atomspace.

I was simply using the overall duration of the test as measured by the perl script and the result of (gc-run-time) as measured via a telnet session running in another bash terminal. My measurements may be in error if my assumption about the GC is incorrect, as I've hinted a couple of times in prior emails. If anything, however, I'm overcounting the total time and undercounting the percentage, since I'm not getting CPU measurements for the duration on the CogServer; I'm counting the time until the test is finished as determined by the perl script.

My base assumption is that 'gc-run-time' is time when the other Guile threads are blocked. I make this assumption because the 'gc_time_taken' statistic in Guile is generated via the GC's before_gc and after_gc hooks, which get called before and after each stop-the-world collection.
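
The measurement described above boils down to dividing the (gc-run-time) reading by the wall-clock duration reported by the perl script. Here is a minimal sketch of that arithmetic, written in Python purely for illustration. The function name and the sample readings are hypothetical, not values from the test. It also assumes that (gc-run-time) is reported in Guile's internal time units, so the reading is divided by internal-time-units-per-second; that is worth checking against the Guile version in use.

```python
def gc_fraction(gc_run_time_units, units_per_second, wall_clock_seconds):
    """Fraction of elapsed wall-clock time attributed to stop-the-world GC.

    gc_run_time_units:  reading from (gc-run-time), assumed to be in
                        Guile internal time units
    units_per_second:   value of internal-time-units-per-second (assumed)
    wall_clock_seconds: test duration as measured by the driving script
    """
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock duration must be positive")
    gc_seconds = gc_run_time_units / units_per_second
    return gc_seconds / wall_clock_seconds


# Hypothetical readings: 70 s of GC time during a 100 s test run.
fraction = gc_fraction(70_000_000_000, 1_000_000_000, 100.0)
print(f"GC share of elapsed time: {fraction:.0%}")
```

If the collections really are stop-the-world for every Guile thread, the remainder (1 minus this fraction) is the only part of the elapsed time in which those threads can do useful work.
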
So my assumption is that the stop-the-world code suspends all the threads that can be garbage collected, and that all the CogServer threads created in response to an "scm hush (observe-text 'Some sentence'))" are among the threads that get stopped, since they allocate objects in the GC via Guile. Therefore the GC time reported is non-overlapping with the other processing time. If a test takes 100 seconds and 50 seconds are spent in the GC, then during those 50 seconds there is no work going on in any non-GC threads, because they are suspended for the duration of the stop-the-world collection. I have looked at the code to verify these assumptions, but it is certainly possible that I am missing something.

How many CPUs are on your test machine? Does it have hyper-threaded Intel chips? If so, you don't tend to get accurate measures of real-time CPU performance on those chips. You have to reduce the multi-threaded process CPU time by a factor of 0.6 to 0.65 to more accurately reflect the 120% to 130% of a CPU core that is available from a hyper-threaded core pair; it's not 200%, even though it reports that way.

So if you've got 70 units of time in a single thread blocking all the others and a reported 500 units total, the non-blocking multi-threaded time is 500-70, or 430 units. Multiply by 0.6 to 0.65 to account for over-reporting on hyper-threaded chips and you get 258 to 280. Now divide 258 to 280 by the number of CPUs as reported by the OS (all hyper-threading units) and you should be close to 30% of the elapsed time.

NOTE: even the 70% reported for the GC time may be overstated by some factor, though since there are plenty of empty cores for the OS and other tasks while the GC is running, it is likely that the GC gets full use of a core for most of its duration on a machine that isn't already taxed with other processes (like Postgres asynchronous dumps into the AtomSpace).
My tests include no such additional work as I've turned off the store-atom and fetch-atom tests.

See: http://perfdynamics.blogspot.hk/2014/01/monitoring-cpu-utilization-under-hyper.html for a bit more on why the CPU times are wrong on Intel chips and linux.
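
The hyper-threading correction described above can be worked through as a short calculation. This is a sketch only, again in Python for illustration. The function names are hypothetical, and the logical CPU count of 8 is a placeholder, since the actual count on the test machine is not known here. The 0.6 to 0.65 factor is the rule of thumb from this email, not a measured value.

```python
def adjusted_non_gc_cpu(total_cpu_pct, gc_cpu_pct, ht_factor):
    """Scale the reported non-GC CPU time down by a hyper-threading factor.

    total_cpu_pct: CPU time reported for the whole process (e.g. 500)
    gc_cpu_pct:    single-threaded, blocking GC time (e.g. 70)
    ht_factor:     rule-of-thumb correction, 0.6 to 0.65 in this email
    """
    return (total_cpu_pct - gc_cpu_pct) * ht_factor


def share_per_logical_cpu(adjusted_cpu_pct, n_logical_cpus):
    """Spread the adjusted CPU time over the CPUs the OS reports."""
    if n_logical_cpus <= 0:
        raise ValueError("number of logical CPUs must be positive")
    return adjusted_cpu_pct / n_logical_cpus


# Figures from the email: 500 units reported in total, 70 of them in the GC.
low = adjusted_non_gc_cpu(500, 70, 0.60)
high = adjusted_non_gc_cpu(500, 70, 0.65)
print(f"adjusted non-GC CPU time: {low:.1f} to {high:.1f} units")

# Placeholder CPU count; substitute the number the OS actually reports.
N_LOGICAL_CPUS = 8
print(f"per logical CPU: {share_per_logical_cpu(low, N_LOGICAL_CPUS):.1f} "
      f"to {share_per_logical_cpu(high, N_LOGICAL_CPUS):.1f} units")
```

Whether the per-CPU figure lands near the 30% of elapsed time left over after a 70% GC share depends entirely on the real logical CPU count, which is why that question matters.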
