[opencog-dev] Re: Pattern Matching performance

Linas Vepstas Tue, 13 Mar 2018 21:55:42 -0700

On Tue, Mar 13, 2018 at 3:14 AM, Alexey Potapov <[email protected]> wrote:


> Efficient data structures and algos to deal with numeric data are
> necessary,
>

What kind of numeric data?  Again, please note that the atomspace has two
classes of data: peristent, immutable, globally-unique  "Atoms" which are
slow in part because they are indexed: they are used to represent
"knowledge". The NumberNode is an atom.

The other type are "Values" (sich as truth values) which are temporary,
fleeting, non-unique, not indexed/indexable. And much faster. FloatValue is
a value. Its actually a vector of floats.

Both NumberNodes and FloatValues store numbers, but they have very
different performance characteristics, and are meant to solve different
classes of problems.


> although the queries to episodic memory may be as complex as to pattern
> matcher.
>

Sure. If by "episodic" you mean time and 3D locations, then yes. Like I
said, the space-server is very naive. It would be nice if the space server
were actually episodic memory, but it isn't.


> But even with this additional module, the conceptual problem with PM
> remains.
>

What conceptual problem is that?


> A fixed algo for solving an NP-complete problem cannot be a basis for
> intelligence. Period.
>

Eh? Just because a problem is NP complete does not mean that it doesn't
contain sub-problems that are solvable much much faster.  Pattern matcher
queries for single variables will run in log(N) time, or worst-case O(N)
time. But you can also create queries that are pathologically perverse.

Take, for example, SAT solvers.  They are commonly used to model 64-bit
CPU's. Modelling a three-register operation on a CPU requires exploring
2^(64+64+64) = 2^192 possibilities, which is more than there are atoms in
our local galaxy cluster.  It is literally an astronomic number. Yet SAT
solvers solve this in fractions of a second.  There's a reason for that.
Just because a problem is NP complete, doesn't mean that it has to be slow.

Of course, if your 64-bit CPU has instructions that implement cryptographic
hashes in hardware, and hand it to the same SAT solver as before, don't be
surprised if that millisecond computation suddenly takes years to
complete.   SAT cannot magically factor large primes.

Life, the universe, and everything is NP complete. That's just how it is.


> I'll answer to Linas in more detail later...
>

-- I  don't always respond to email quickly. If its important, please nag
me.

--linas

>
> 2018-03-13 10:52 GMT+03:00 Ben Goertzel <[email protected]>:
>
>> Yes, if the problem at hand is
>>
>> ***
>> It just searches for pairs of bounding boxes one of which is higher
>> than another within the same frame.
>> ***
>>
>> then it should be possible to fairly simply craft some optimized way
>> to solve this using the SpaceServer ...
>>
>> Basically the SpaceServer gives an octree (or quadtree) index for a
>> set of Atoms that correspond to coordinate values in some common 3D or
>> 2D space..
>>
>> The situation is similar to with spatial DBs versus standard
>> relational DBs ....  For some queries a spatial DB is massively more
>> efficient due to the types of indexes it contains...
>>
>> For some PM queries, using the octree/quadtree index in the
>> SpaceServer will be much more efficient than just using the
>> Atomspace...
>>
>> Right now however the PM is not connected to the SpaceServer, i.e. if
>> a predicate or schema that would be rapidly evaluated/executed using
>> the SpaceServer is encountered, it is not currently evaluated/executed
>> that way...
>>
>> So linking the PM to the SpaceServer would be clearly beneficial for
>> many applications and would enable space and/or time related PM
>> queries to get resolved much more rapidly...
>>
>> -- ben
>>
>>
>> On Tue, Mar 13, 2018 at 3:30 PM, Nil Geisweiller
>> <[email protected]> wrote:
>> > So if I understand well, the appropriate fix would be to use AtTimeLink
>> and
>> > AtLocationLink and have the pattern matcher specially handles this via
>> > perhaps relaying the spatio-temporal component of the query to the space
>> > server.
>> >
>> > I know almost nothing about the current state of space server so take my
>> > comment with a grain of salt...
>> >
>> > Alternatively, there might be more abstract heuristics that can hacked
>> into
>> > the pattern matcher only, maybe that would be simpler and more general.
>> No
>> > idea...
>> >
>> > Nil
>> >
>> > On 03/13/2018 07:07 AM, Linas Vepstas wrote:
>> >>
>> >> I hope you can read the below. It took me a while to type it in.
>> >> I think maybe I'm the victim of some kind of google A/B testing
>> >> today, because for me, my own emails are totally unreadable.
>> >> Let me know if you suspect you didn't receive everything I sent ...
>> >>
>> >> --linas
>> >>
>> >> On Tue, Mar 13, 2018 at 12:05 AM, Linas Vepstas <
>> [email protected]>
>> >> wrote:
>> >>>
>> >>> OK, trying again. Maybe the third time it will work right.
>> >>>
>> >>> -- Linas
>> >>>
>> >>> On Tue, Mar 13, 2018 at 12:04 AM, Linas Vepstas <
>> [email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Resending. My email agent is formatting and indenting bizarrely.
>> >>>> I don't understand why its 2018 and something as simple as email
>> >>>> has ugly and unreliable formating.  Is this what the heat death of
>> >>>> the universe looks like?
>> >>>>
>> >>>> --linas
>> >>>>
>> >>>> On Mon, Mar 12, 2018 at 11:55 PM, Linas Vepstas <
>> [email protected]>
>> >>>> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> To repeat my earlier remarks, and add a few more:
>> >>>>>
>> >>>>> On Mon, Mar 12, 2018 at 3:42 PM, Alexey Potapov <[email protected]
>> >
>> >>>>> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> Yes. But Pattern Matcher compares all pairs of bounding boxes
>> making
>> >>>>>> it
>> >>>>>> quadratic.
>> >>>>
>> >>>>
>> >>>>> Yes, there is supposed to be a space-server that is supposed to be
>> >>>>> optimized
>> >>>>> for these kinds of searches. There is one, but its super-minimal,
>> and
>> >>>>> not at all
>> >>>>> integrated with the pattern matcher, or anything else for that
>> matter.
>> >>>>> Smart
>> >>>>> searches on bounding boxes is completely green-field development,
>> for
>> >>>>> the
>> >>>>> atomspace. We've kind of got very nearly nothing for this.
>> >>>>
>> >>>>
>> >>>>
>> >>>>>>> Yes, but keep in mind that the PM is also Turing complete because
>> you
>> >>>>>>> can
>> >>>>>>> call any function within a query (including the PM itself).
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> This is quite problematic. Basic processes should not execute
>> anything
>> >>>>>> dangerous that can take too much time or loop forever and cannot be
>> >>>>>> interrupted. Thus, we should either not treat PM as a basic
>> process,
>> >>>>>> or
>> >>>>>> should restrict its capabilities (and shift responsibility for
>> >>>>>> evaluating
>> >>>>>> arbitrary code to other processes).
>> >>>>
>> >>>>
>> >>>>
>> >>>>> This is misleading or a mis-understanding. Its not the correct way
>> to
>> >>>>> think
>> >>>>> about it.  The current pattern matcher is 2 or three or 4 things
>> >>>>> matched into
>> >>>>> one:
>> >>>>>
>> >>>>> A) A generic subgraph isomorphism solver. Since this is an NP
>> complete
>> >>>>> problem, it's certainly possible to create pathologically slow
>> queries.
>> >>>>>
>> >>>>> B) A way of combining subgraphs using a crisp-logic boolean algebra
>> >>>>> (actually a Heyting algebra) which we have very vague intentions of
>> >>>>> promoting into something probabilistic.  Which, of course, if this
>> was
>> >>>>> done,
>> >>>>> would layer on an additional combinatoric explosion.  It would be
>> >>>>> fruitful to
>> >>>>> discuss the wisdom or stupidity of this particular task. Or
>> alternative
>> >>>>> designs
>> >>>>> for it.
>> >>>>>
>> >>>>> C) The ability to perform subgraph isomorphism with so-called "axiom
>> >>>>> schemata".
>> >>>>> An "axiom schemata" is roughly an infinite collection of relations,
>> for
>> >>>>> example
>> >>>>> "less than" over the integers or rationals or reals.   This means
>> that
>> >>>>> the pattern
>> >>>>> matcher is kind-of-like-ish a "satisfiability modulo theories" (SMT)
>> >>>>> solver.  The
>> >>>>> API for specifying a theory at this point is rather very simplistic.
>> >>>>> The "virtual link"
>> >>>>> is  that API. It says, basically "implement your model theory here,
>> as
>> >>>>> C++ code,
>> >>>>> and we will automatically do the satisfiability modulo your theory
>> for
>> >>>>> you"
>> >>>>>
>> >>>>> Currently supported theories are the equational theory (EqualLink)
>> and
>> >>>>> numeric
>> >>>>> inequaltiy (GreaterThanLink)  An example of a nice-to-have theory
>> would
>> >>>>> be
>> >>>>> linear algebra - done right, this could  solve your space-time
>> bounding
>> >>>>> box problem
>> >>>>> for starters, and linear programming type problems if anyone cared
>> >>>>> about that.
>> >>>>> Maybe matroids. Whatever. Dunno.  Another nice-to-have would be
>> naive
>> >>>>> set theory
>> >>>>> which could help lay a cornerstone of probability done right (TM)
>> but
>> >>>>> this would be
>> >>>>> a long and difficult but veryinteresting discussion.
>> >>>>>
>> >>>>> D) Once a matching subgraph is found, you can launch arbitrary
>> >>>>> C++/scheme/python
>> >>>>> code to do something with that subgraph. So that's unbounded.
>> >>>>
>> >>>>
>> >>>>
>> >>>>>>> So, the first thing you should do is build a good benchmark tool,
>> >>>>>>> then we,
>> >>>>>>> you and the rest of opencog community, can supply it with a
>> >>>>>>> collection of
>> >>>>>>> critical tests.
>> >>>>>>
>> >>>>>>
>> >>>>>> How do you see such tool?
>> >>>>
>> >>>>
>> >>>>> Unclear. I am interested in a tool that tells me if performance got
>> >>>>> worse after
>> >>>>> a particular code change or bug fix.  Some of our fixes accidentally
>> >>>>> slow things
>> >>>>> down (by a lot) and no one noticies for months or half a year.
>> >>>>
>> >>>>
>> >>>>>> We have some datasets of varying sizes and
>> >>>>>> queries, and simply run PM on these queries and measure the time.
>> What
>> >>>>>> can
>> >>>>>> be unified/automated?
>> >>>>
>> >>>>
>> >>>>> Yes!?
>> >>>>
>> >>>>
>> >>>>>> Running all tests and writing log files with
>> >>>>>> computation times? Anything else?
>> >>>>
>> >>>>
>> >>>>> I don't care about log files.
>> >>>>
>> >>>>
>> >>>>>> I agree that we need several (many?) tests to be sure that some
>> >>>>>> changes
>> >>>>>> didn't affect any types of queries, but isn't it just a script? Or
>> do
>> >>>>>> you
>> >>>>>> have something much more complicated in mind?
>> >>>>
>> >>>>
>> >>>>> Take a look at 3D graphics performance: there are two types of
>> >>>>> benchmark:
>> >>>>> triangles per second, lines per second, texture maps per second.
>> >>>>
>> >>>>
>> >>>>> The other type is "for game XYZ, frames per second".
>> >>>>
>> >>>>
>> >>>>> We need both types of benchmarks.  Probably the first more than the
>> >>>>> second,
>> >>>>> because it helps developers more.  The second kind just tells you
>> how
>> >>>>> screwed
>> >>>>> up the system is today, without telling you why, or where to look.
>> >>>>
>> >>>>
>> >>>>
>> >>>>>> In our case, the query processing time becomes unrealistically
>> large
>> >>>>>> for a
>> >>>>>> one-minute video.  If we consider the problem of search in the
>> entire
>> >>>>>> episodic memory, it should be even not linear, but logarithmic. Low
>> >>>>>> degree
>> >>>>>> polynomial complexity is ok for the task dimensions ~1000, not
>> >>>>>> millions or
>> >>>>>> billions...
>> >>>>
>> >>>>
>> >>>>> No clue what you are searching here.  Logarithmic searches mean
>> binary
>> >>>>> tree,
>> >>>>> quad-tree, octree.  The space-time sever is an octree, but its not
>> >>>>> integrated with
>> >>>>> the pattern matcher, and has a super-naive API.
>> >>>>
>> >>>>
>> >>>>
>> >>>>> We have binary-trees and hash tables for atoms-by-name, by-type, but
>> >>>>> zero
>> >>>>> sophistication for numeric values. See above comments about
>> >>>>> "satisfiability
>> >>>>> modulo theories".
>> >>>>
>> >>>>
>> >>>>
>> >>>>> Linas
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> cassette tapes - analog TV - film cameras - you
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> cassette tapes - analog TV - film cameras - you
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> cassette tapes - analog TV - film cameras - you
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Ben Goertzel, PhD
>> http://goertzel.org
>>
>> "In the province of the mind, what one believes to be true is true or
>> becomes true, within certain limits to be found experientially and
>> experimentally. These limits are further beliefs to be transcended. In
>> the mind, there are no limits.... In the province of connected minds,
>> what the network believes to be true, either is true or becomes true
>> within certain limits to be found experientially and experimentally.
>> These limits are further beliefs to be transcended. In the network's
>> mind there are no limits." -- John Lilly
>>
>
>


-- 
cassette tapes - analog TV - film cameras - you

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/CAHrUA37dAQUXxNxaoQM4W25j%2BS4RtLMzXPth-9LO%2BRjH0gm9qQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

[opencog-dev] Re: Pattern Matching performance

Reply via email to