Linas wrote:
> OK. So let's think this through. The only place where GC is being used is
> in guile; GC is not being used in the atomspace itself. So you could
> accomplish exactly the same thing by periodically shutting down guile,
> completely. This would release all that memory, and then you are done.
>
<snip>
> But I don't see any way of implementing a pool, without fully shutting
> down guile; and if one fully shuts down guile, then one doesn't need a pool.
Okay. So here's what I think is happening now in the CogServer, please
correct me if this is wrong:
CogServer receives three requests, then CogServer creates three new threads:
Thread one: 'scm (observe-text "Test sentence one.")'
Thread two: 'scm (observe-text "Test sentence two.")'
Thread three: 'scm (observe-text "Test sentence three.")'
If Guile's GC, the bdwgc, were altered to check a thread-local flag and, if
it was set, to allocate new objects out of a thread-local pool, then when
these types of threads completed there would be no need to garbage collect
any allocations made from that pool. One could just reset the pool.
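To make the idea concrete, here's a rough sketch of what such a thread-local pool might look like. This is hypothetical code for illustration only; the names ThreadPool, use_pool, and gc_alloc are made up, and this is not how bdwgc is actually structured internally:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sketch, not actual bdwgc internals.  Each worker thread sets
// use_pool; allocations are then bump-allocated from a per-thread arena, and
// the whole arena is released in one shot when the request finishes, so no
// per-object collection is ever needed.
struct ThreadPool {
    static constexpr std::size_t BLOCK = 1 << 20;   // 1 MiB chunks
    std::vector<char*> blocks;
    std::size_t offset = 0;

    void* alloc(std::size_t n) {                    // assumes n <= BLOCK
        n = (n + 15) & ~std::size_t(15);            // 16-byte alignment
        if (blocks.empty() || offset + n > BLOCK) {
            blocks.push_back(static_cast<char*>(std::malloc(BLOCK)));
            offset = 0;
        }
        void* p = blocks.back() + offset;
        offset += n;
        return p;
    }

    void reset() {                                  // "restart the pool"
        for (char* b : blocks) std::free(b);
        blocks.clear();
        offset = 0;
    }
};

thread_local bool use_pool = false;
thread_local ThreadPool pool;

// What a GC_malloc-style allocation hook might check:
void* gc_alloc(std::size_t n) {
    if (use_pool) return pool.alloc(n);
    return std::malloc(n);                          // stand-in for the GC path
}
```

The trade-off is that nothing allocated through the pool may outlive the request, which is exactly the batch-processing assumption being made here.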
But this is way more work than just restarting Guile every once in a while,
I agree, and there may be other problems with this approach. Still, it is a
very general solution to what is likely to become a common issue, so it may
be worth revisiting later.
So on to your objection to restarting Guile.
> The problem with this proposal is that pretty much everything runs through
> guile. All the atoms go in through it, and come out through it. So
> shutting down guile and restarting it is tantamount to shutting down the
> system, and restarting it. Which is OK, if you saved all the atoms you
> care about to the database.
>
I don't see why shutting down Guile is tantamount to shutting down the
system and restarting it. Right now, the atom space is created first, and
then the SchemeEval object is created in CogServer::CogServer. So I don't
see why we couldn't have a function that shuts down Guile, releasing all its
memory back to the OS, and then starts a new Guile interpreter without
destroying the AtomSpace; upon restart you'd pass in the same atom space
that existed before the shutdown.
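Schematically, what I have in mind is something like the following. The AtomSpace and SchemeEval here are stand-in stub types, not the real opencog classes, so the ownership and lifecycle are visible:

```cpp
#include <memory>

// Stand-in stub types; the real classes are opencog's AtomSpace and
// SchemeEval (which already takes an AtomSpace* when constructed).
struct AtomSpace { int atom_count = 0; };

struct SchemeEval {
    AtomSpace* as;
    explicit SchemeEval(AtomSpace* a) : as(a) {}  // guile init would go here
    ~SchemeEval() {}                              // full guile teardown here
};

struct CogServerSketch {
    AtomSpace atomspace;                  // owned by the server, not by guile
    std::unique_ptr<SchemeEval> eval;

    CogServerSketch() : eval(new SchemeEval(&atomspace)) {}

    // Tear down the interpreter, releasing its memory, then bring up a
    // fresh one against the *same* atomspace.
    void restart_guile() {
        eval.reset();                                // shut guile down
        eval.reset(new SchemeEval(&atomspace));      // restart, same atoms
    }
};
```

The point is just that the atomspace outlives the interpreter, so nothing forces a full system restart.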
Now, due to some quirks of the GC, and perhaps Guile's interaction with it,
shutting down and restarting is a bit tricky. You've got the issue with the
infinitely sleeping initialization thread noted in
https://github.com/opencog/atomspace/blob/master/opencog/guile/SchemeEval.cc#L241-L285
You've also got a few SCM static globals scattered about that would have to
be cleared and reloaded. It's a bit of work, sure, but doable.
The cleanest way would be to call the appropriate destructors to free up
the memory allocated by Guile and the GC. Then unload both libguile.so and
libgc.so and reload them, and then redo the initialization.
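The unload/reload step itself is just the POSIX dlopen/dlclose mechanism. Here's a minimal sketch, illustrated with libm.so.6 for portability rather than libguile.so/libgc.so, and glossing over the real wrinkle: dlclose() only actually unmaps a library once every reference to it is gone:

```cpp
#include <dlfcn.h>

// Sketch of the unload/reload step, shown with libm.so.6 so it runs anywhere
// glibc does; the real sequence would target libguile.so and libgc.so.  Note
// that dlclose() only unmaps the library once the last handle is dropped and
// nothing (registered finalizers, leaked thread-local state) keeps it
// pinned, which is much of why the restart is tricky.
bool unload_and_reload(const char* soname) {
    void* h = dlopen(soname, RTLD_NOW);
    if (!h) return false;
    dlclose(h);                      // drop our reference; may unmap

    h = dlopen(soname, RTLD_NOW);    // "reload"; static constructors run
    if (!h) return false;            // again only if it was really unmapped
    dlclose(h);
    return true;
}
```

So the redo-the-initialization step would follow the second dlopen, once the library's global state has genuinely been discarded.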
Still, I'm not sure any of this is what we should be doing now since the
issue at hand can be most easily handled through other means.
On Tue, Jun 6, 2017 at 3:10 AM, Linas Vepstas <[email protected]>
wrote:
> Long message...
>
> On Sat, Jun 3, 2017 at 10:07 PM, Curtis Faith <[email protected]>
> wrote:
>
>> Linas wrote:
>>
>>
>>> The idea that you are going to use a special pool for guile which you
>>> then clear out every so often is just... a proposal to take a sophisticated
>>> GC algorithm and replace it with a truly sophomoric... ahem, freshman
>>> concept of GC. It's a waste of time.
>>>
>>
>> GC is needed when you have long-running processes or threads that can't
>> just leak with impunity. It is wholly unnecessary for short-duration tasks
>> with moderate memory requirements. In the special case of processing batch
>> requests, with web servers like nginx or Apache or the CogServer running
>> observe-text, there are not many objects that can't immediately be
>> destroyed when the request (or sets of those requests) finish. That makes
>> the overhead of GC unnecessary in these batch-processing cycles. It also
>> makes the overhead of finalizing anything unnecessary if there is enough
>> RAM to service the requests without any cleanup during a single request's
>> processing. You need to release system resources, and that's about it.
>>
>> A pool-based, cleanup-once-at-the-end approach will make things much
>> faster whether the problem I am seeing ends up being from a bug or not.
>>
>
> OK. So let's think this through. The only place where GC is being used is
> in guile; GC is not being used in the atomspace itself. So you could
> accomplish exactly the same thing by periodically shutting down guile,
> completely. This would release all that memory, and then you are done.
>
> The problem with this proposal is that pretty much everything runs through
> guile. All the atoms go in through it, and come out through it. So
> shutting down guile and restarting it is tantamount to shutting down the
> system, and restarting it. Which is OK, if you saved all the atoms you
> care about to the database.
>
> But I don't see any way of implementing a pool, without fully shutting
> down guile; and if one fully shuts down guile, then one doesn't need a pool.
>
> The only alternatives are to use python, but python is single-threaded, so
> this is a non-starter. The 3rd alternative is haskell, but it's also
> garbage-collected. Can't use C++ because it has no interpreted
> command-line. (Using C++ is tantamount to shutting everything down,
> recompiling, and restarting everything, which is clearly the worst-possible
> scenario). A 5th alternative is to invent a custom vocabulary of words to
> control C++ objects from an interactive command line, but this is clearly a
> design disaster. It was 1988 when the folly of this was realized, which
> caused tcl to be integrated into C apps. The inadequacy of TCL led to the
> invention of guile ... and so here we are, full-circle. We could switch to
> swig+perl, or to javascript... but javascript is garbage-collected, and I
> think perl is too, not sure. I just don't see any way of implementing what
> you are talking about.
>
>>
>> Performance is clearly an issue for unsupervised learning
>>
>
> Really? Sorry, but in what way? What's the problem?
>
>
>> and AI in general. It is also an issue for OpenCog right now, since
>> getting the data into the AtomSpace in the right form is taking far too
>> long.
>>
>
> Really? In what way? What is the problem?
>
>
>> No matter what we do performance will always be an issue because of the
>> sheer size of the datasets researchers want to work with.
>>
>
> Well, if those folks at Intel and AMD weren't so lazy, we'd have great
> performance by now.
>
>
>> That is why Ben first asked me to look into using AWS to spawn parallel
>> processes to cut down on the calendar time required to input large corpora.
>>
>
> Well, we know Ben is crazy. This is not where the problem lies. It's easy
> to get large corpora pumped through. I can give you a dozen dumps of
> datasets so large they won't fit in the RAM of your computer. Do you want
> large datasets? Cause I got them.
>
> The problem is that I don't have tools to analyze those datasets. That's
> where 95% of my personal bottleneck lies. Simply crunching a lot of data
> is just so totally not at all the hard part.
>
> > I'm seeing 50% to 70% of the time spent in the GC.
>
> Are you using the tool I sent you? Because I am seeing less than 20%.
>
>
> Sorry for some of the sarcasm. Sure, more performance would always be
> nice, but GC time is a complete red herring. Also, technically, I think GC
> is not a solvable problem. The alternative is reference counting, and that
> is also a total CPU hog.
>
> I do have some proposals, but first: 1) I have large datasets; 2) creating
> large datasets is totally not an issue; 3) creating tools to analyze them
> is almost 100% of the issue.
>
> But if you wanted to get atoms into the atomspace faster: like 10x faster
> or 20x faster: you could run the link-grammar parser in the same address
> space as the atomspace. Just take what it spits out, convert them into
> atoms, shove the atoms into the atomspace. This would completely by-pass
> guile, and bypass all GC. So GC would totally not be an issue in this
> case.
>
> To be clear: currently, LG parses text, then bloody java code turns it
> into strings, which are sent over a socket to guile, which evaluates the
> strings, and creates atoms. About 80% or more of this process is the cost
> of having guile evaluate strings that specify atoms, in string format.
> Eliminate this, and you get an instant 3x, 4x speedup.
>
> Once you did this, you'd discover two other bottlenecks: shoving atoms
> into the atomspace is slowwwww. And pushing atoms out to the database is
> slowwww. These are much harder, but more important bottlenecks to overcome.
>
> Re: running LG in the same address space as the atomspace: this has already
> been done; the surreal code does this. In a day or 2 or 3 you could write
> the needed wrapper code to have LG live directly inside of opencog,
> generating the correct atoms, thus totally bypassing guile and garbage
> collection. And this would be a very easy way to get a 3x speedup, if
> that's really your end-goal. It's a lot easier than all the other crazy
> schemes discussed.
>
> In the very-long term, I plan to do this anyway, because I want to apply
> the LG algorithms to generic atomspace data, not just to natural language.
> However, currently LG is totally focused only on language, and it's too
> much work to re-implement it as a generic data parser. Baby steps, for now.
>
> --linas
>
>
>
--
You received this message because you are subscribed to the Google Groups
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/CAJzHpFoHwN446RzkRpNGgKmG%3DOpFAbNJj9dtOSt%3DF0j%2B9UNgPA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.