OK, first we need to settle some organizational issues.

Writing private emails to me and Nil is just counter-productive. The
conversation needs to be carried out either on the mailing list, or on
GitHub, or both.  The reasons for using a mailing list are these:

-- Other people can become aware of the conversation, and can participate
in it.
-- Mailing list archives provide a permanent record of what was said, and
how the decisions got made.  This is often very useful, as searching
mailing list archives with Google is a lot easier and more effective than
searching my email box.  Mailing list archives are a prosthetic extension
of my brain -- they record my long-term memories.  I have tools for
retrieving those long-term memories that work well with the mailing lists,
and do not work at all with private email archives.

GitHub issues provide an equally good, if not better, way of recording
discussions, since they group related discussion into one location. It's a
far superior technology to Slack channels and other chat-based nonsense.

I just opened issue https://github.com/opencog/atomspace/issues/1645 to
track this work. This is where future discussion should happen.

I am also concerned about the priority of the work.
https://github.com/opencog/atomspace/labels/pattern-matcher lists twenty
issues that exist in the current pattern matcher. It's not at all obvious
that parallelizing the pattern matcher is all that important or urgent.  I
am guessing that Ben put you up to this.  Some advance planning and
organization would maybe have been a bit better.

On Wed, Apr 25, 2018 at 12:07 PM, Vitaly Bogdanov <[email protected]> wrote:

>
> Two main objects participate in matching: PatternMatchEngine instance and
> PatternMatchCallback instance.
>
> PatternMatchEngine actively modifies its state during matching so it
> cannot be used from more than one thread. But we can make one copy for each
> thread and use copies safely, as PME doesn't reuse its state between
> different calls of explore_neighborhood() method (1).
>

Making one copy per thread seems reasonable to me. It would be the "normal"
way of doing multi-threading.

>
> PatternMatchCallback is different thing.
>

Why not make one copy per thread, also? So:


> - (2) grounding() method modifies the state (saves matching results)
>

OK, so this is maybe the hardest part.  The class would need to be split
into two parts: one that operates as one-instance-per-thread, and another
that operates as one-instance-per-search.

However, one work item is to alter the pattern matcher to report results as
they are found, rather than waiting for the search to complete before
reporting anything.  See https://github.com/opencog/atomspace/issues/1507
If results are reported as they are found, then the need for a class that
operates at a one-instance-per-search level seems to go away, or, at
least, should be handled differently.


>
> - (3) evaluate_sentence() method in DefaultPatternMatchCB uses temporary
> atomspace to evaluate things and cleans up atomspace after evaluation so it
> should be synchronized
>

The temporary atomspaces currently do not hold information between each
exploration. Thus, we can alloc one temp atomspace per thread.


> - (4) scope_match()/link_match()/post_link_match()/post_link_mismatch()
> methods in DefaultPatternMatchCB share state between calls via pair of
> pointers: _pat_bound_vars and _gnd_bound_vars - in original code
> - push()/pop() methods are intended to modify the state but there are no
> nontrivial implementations
>

Yes but no. That state is not shared outside of each exploration.  You
really need to visualize each exploration as a stack -- there is a huge
amount of info cached on the C stack, literally -- this is a very recursive
search; and what is not on the C stack is explicitly controlled with the
push/pop stack. It would be a major mistake to try to parallelize the
stack; you'll never get it to work right.  The parallelization MUST happen
in the initiate-search loops, and not somewhere else.

>
>
> (1) - to not make unnecessary copies of PatternMatchEngine on each
> iteration -
>

What's wrong with making copies? This seems cheap, easy, direct and simple
to me.  Simpler and safer than trying to put mutexes everywhere.


>
>
> (3) - I used thread local here to keep implementation as is. At the moment
> temporary atomspace is created once and reused each time when it is
> required to evaluate the term. grab_transient_atomspace() method gets it
> from cache using mutex to be thread safe, so it is not clear why it is kept
> in DefaultPatternMatchCB state. I agree that it can be removed from state
> and received each time when it is needed from cache. Performance have to be
> measured to understand if there is significant loss.
>

I would rather design this correctly, than try various hacks and measure
their performance. The correct design would seem to be a temp atomspace per
thread.

FWIW, if you need a resource pool, we already have one:
https://github.com/opencog/cogutil/blob/master/opencog/util/pool.h


>
> (4) - this is most complex case because there is no simple way using
> stack: link_match() sets _pat_bound_vars and _gnd_bound_vars and
> scope_match()/post_link_match()/post_link_mismatch() methods use it.
> Straightforward solution I see is to move _pat_bound_vars and
> _gnd_bound_vars initialization code out of DefaultPatternMatchCB into
> PatternMatchEngine and pass them to scope_match but it requires callback
> signature changing.
>

The straightforward solution is to not try to parallelize the stack at all.
Instead, do all parallelization in the initiate-search loops.

Again; I doubt that you'll ever be able to fully implement and debug the
parallel stack machine.  I mean, sure, maybe you can, but it seems like a
waste of time and intellectual effort, and there is a high probability that
it will be a performance loser.  Just don't even try to parallelize there.

(I hope it's clear what I am trying to talk about here.)

----------
OK, next: procedurally, it will be a whole lot easier if you undertake
this project with a whole bunch of smaller pull requests, each of which
reorganizes the code in such a way that parallelizing the initiate-search
loops becomes easy.

I am very deeply concerned that you will create some huge, complex block of
code changes, submit it in one giant, massive pull request, insist that
it's just great and wonderful, and then I will get into a giant and
unpleasant argument with Ben as to whether it should be merged or not.  I
really, really want to avoid this kind of ugly situation before we get even
close to it.  The way to avoid it is by making and testing small,
incremental stages that get you to where you want to go, and then creating
pull requests for each small, incremental change.  Right now, this, for me,
is an extremely important way to operate.

-- Linas



>
> Thanks,
> Vitaly
>
> 2018-04-25 18:42 GMT+03:00 Linas Vepštas <[email protected]>:
>
>> OK, now some comments re thread-local.
>>
>> So, one standard idiom for storage that gets automatically cleaned up is
>> to put that storage on the stack. When the stack unwinds, the storage is
>> cleaned up automatically. I don't see why this is not sufficient in this
>> case.
>>
>> Separately, I kind of would have liked to talk to you about parallelizing
>> the pattern matcher before you started writing code, instead of after. The
>> "natural", simplest, easiest place to parallelize would be to make the
>> initiate-search loops parallel, and then allow each search to run in a
>> distinct thread. This, it would seem to me, to be a natural place to set up
>> the per-thread stack: its located "at the root" of the search, and so is
>> guaranteed to be alive, until that thread finishes.
>>
>> Doing this would seem to obviate the need for some per-thread storage
>> that shows up "from thin air", in the middle of some random code doing some
>> random thing. I'm rather nervous about the debuggability and correctness of
>> what you are proposing to do.
>>
>> Anyway, this pull req is the wrong place to have this discussion.
>>
>
>


-- 
cassette tapes - analog TV - film cameras - you
