On Tue, Jun 6, 2017 at 2:02 AM, Curtis Faith wrote:
> Linas wrote:
>
>> 1) I fixed the hang. :-) Or figured out how to avoid it. It turns out
>> that, due to a stoopid bug/oversight, quotation marks in the input text
>> were not being escaped. Thus, guile would see a begin-quote, end-quote,
>> followed by garbage. A backtrace would be generated, and passed back to the
>> witless user, who is just netcat, and couldn't give a damn, and so was
>> silently ignored. Escape the quotation marks correctly, the problem goes
>> away.
>
> Cool.

I was wrong. It still happens, and it happens just as often as before. So
this wasn't it.

> Puzzling how this bug might result in the fp in the garbage collector
> getting screwed up. Did you find the specific bug in the GC or Guile's use
> of it that causes the fp to get screwed up, so we can help the developers
> get that fixed?

I have been fed up with this conversation for a long time. There's no bug in
the GC. I don't understand why you keep saying that, or what evidence you
have for that. Clearly, there are bugs in guile. I guess. Maybe. Probably.
Not sure. All I know is that memory gets corrupted. Don't know where, why,
who is doing the corruption. Could be anyone, anything. Could be the stupid
boost C++ library that handles socket I/O. Who knows.

> A problem in scheme source syntax or variables not being spelled correctly
> shouldn't result in an infinite loop in the GC in any case. Seems like they
> must be missing some sort of exception handler somewhere.
>
>> 2) how are you measuring GC time? I also get 70% but I also get 500% cpu
>> time for the other 181 active threads, so 70/500 seems acceptable to me.
>> Again, GC halts only guile, it does not halt the atomspace.
>
> I was simply using the overall duration of the test as measured by the
> perl script and the result of (gc-run-time) as measured via a telnet
> session running in another bash terminal.
> My measurements may be in error
> if my assumption about the GC is incorrect as I've hinted a couple of times
> in prior emails. If anything, however, I'm overcounting the total time and
> undercounting the percentage since I'm not getting CPU measurements for
> duration on the CogServer, I'm counting the time until the test is finished
> as determined by the perl script.

Sounds like you're counting wall-clock time, not cpu-time. So that is
misleading you.

> My base assumption is that 'gc-run-time' is time when the other Guile
> threads are blocked. I make this assumption because of the way that the
> 'gc_time_taken' statistic in Guile is generated via hooks into the GC's
> before_gc and after_gc hooks that get called before and after each
> stop-the-world collection. So my assumption is that since the
> stop-the-world code suspends all the threads that can be garbage collected,
> and that since all the CogServer generated threads created in response to
> an "scm hush (observe-text 'Some sentence'))" are threads that are stopped
> since they allocate objects in the GC via Guile, that therefore the GC time
> reported is non-overlapping with the other processing time.
>
> If a test takes 100 seconds, and 50 seconds are spent in the GC, during
> those 50 seconds, there is no work going on in any non-GC threads because
> they are suspended during the duration of the stop-the-world collection. I
> have looked at the code to verify these assumptions but it is certainly
> possible that I am missing something.

You are looking at wall-clock time, not cpu time.

> How many CPUs are on your test machine?

24

> Does it have hyper-threaded Intel chips?

No.

> If so, you don't tend to get accurate measures of real-time
> CPU performance on those chips.

What? Is this some bug with, what, Xeons? I don't believe that, that is a
very serious bug. I don't believe Intel would ship a chip that would do that.
I mean, you'd have trouble booting an OS on it, since well, at least Linux
turns off the clock very early in the boot process, and expects the CPU to do
the right thing. This got done to save power (battery on laptops, but also
wasted cpu cycles on virtualized mainframes).

Below it sounds like you describe a serious bug. I expect Intel chips to
report correct time, even if they are hyper-threaded. Is this a Windows bug,
maybe? Hyper-threading is a cheap stunt to try to keep the cpu busy while
waiting for data to arrive from RAM. So part of what you say below is
correct, but I would expect the chip not to lie, and claim it spent 100% of
clock cycles hyperthreading, when it only spent 10% of its time there.

--linas

> You have to reduce the multi-threading process CPU time by a factor of 0.6
> to 0.65 to more accurately reflect the 120% to 130% of CPU core that is
> available for a hyper-threaded core pair; it's not 200% even though it
> reports that way. So if you've got 70 units time in a single-thread
> blocking all the others and a reported 500 units total, the non-blocking
> multi-threading time is 500-70 or 430 units. Multiply by 0.6 to 0.65 to
> account for over-reporting on hyper threaded chips and you get 258 to 280.
> Now divide 258 to 280 by the number of CPUs as reported by the OS
> (all hyper threading units) and you should be close to 30% of the elapsed
> time. NOTE: even the 70% reported for the gc time may be overstated by some
> factor, though since there are plenty of empty cores for OS and other tasks
> while the GC is running, it is likely that the GC gets full use of a core
> for most of its duration on a machine that isn't already taxed with other
> processes (like Postgres asynchronous dumps into the AtomSpace). My tests
> include no such additional work as I've turned off the store-atom
> and fetch-atom tests.
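
For anyone following the numbers in the quoted paragraph, the first two steps (subtract the GC units, then scale by the 0.6 to 0.65 factor) can be written out as a small Python sketch. The function name and its parameters are made up for illustration, and the scaling factor is the figure claimed in the quoted text, not a measured constant. The final per-CPU division is left out because it depends on the CPU count of the machine in question.

```python
def scale_reported_cpu(total_units, gc_units, factor):
    """Scale the non-GC share of reported CPU time by a claimed
    hyper-threading correction factor (hypothetical helper)."""
    non_gc_units = total_units - gc_units
    return non_gc_units * factor


# Figures taken from the quoted paragraph: 500 units reported, 70 in the GC.
low = scale_reported_cpu(500, 70, 0.60)
high = scale_reported_cpu(500, 70, 0.65)

print(f"non-GC units: {500 - 70}")
print(f"scaled range: {low:.1f} to {high:.1f}")
```

This gives 430 non-GC units and a scaled range of 258.0 to 279.5, which matches the "258 to 280" in the quoted text.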
>
> See: http://perfdynamics.blogspot.hk/2014/01/monitoring-cpu-utilization-under-hyper.html
> for a bit more on why the CPU times are wrong on Intel chips and Linux.
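
The wall-clock versus cpu-time distinction raised in the replies above can be seen in a minimal Python sketch. It is only an illustration, not how the CogServer test or (gc-run-time) was actually measured. The `measure` helper is hypothetical; `time.perf_counter` and `time.process_time` are the standard library's wall-clock and process CPU-time clocks.

```python
import time


def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


# Sleeping consumes wall-clock time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall-clock: {wall:.3f} s, cpu: {cpu:.3f} s")
```

A process that is blocked or sleeping accumulates wall-clock time without accumulating CPU time, so a ratio that mixes the two kinds of measurement can mislead.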
