[opencog-dev] Re: Pattern Matching performance

'Nil Geisweiller' via opencog Tue, 13 Mar 2018 00:21:07 -0700

Linas, your email looked perfectly fine to me.

I would advice you to not use Google webmail interface. Personally, Iuse Thunderbird. It's completely community driven (unlike Firefox),allows to read and write emails off-line. The only thing I miss isexternal editor support so I can easily read/write my emails in Emacs (Iused to manage my emails in Emacs via Wonderlust, but it's much slowerthan Thunderbird).


Nil

On 03/13/2018 07:07 AM, Linas Vepstas wrote:

I hope you can read the below. It took me a while to type it in.
I think maybe I'm the victim of some kind of google A/B testing
today, because for me, my own emails are totally unreadable.
Let me know if you suspect you didn't receive everything I sent ...

--linas

On Tue, Mar 13, 2018 at 12:05 AM, Linas Vepstas <[email protected]> wrote:

OK, trying again. Maybe the third time it will work right.

-- Linas

On Tue, Mar 13, 2018 at 12:04 AM, Linas Vepstas <[email protected]> wrote:

Resending. My email agent is formatting and indenting bizarrely.
I don't understand why its 2018 and something as simple as email
has ugly and unreliable formating.  Is this what the heat death of
the universe looks like?

--linas

On Mon, Mar 12, 2018 at 11:55 PM, Linas Vepstas <[email protected]> wrote:

Hi,

To repeat my earlier remarks, and add a few more:

On Mon, Mar 12, 2018 at 3:42 PM, Alexey Potapov <[email protected]> wrote:


Yes. But Pattern Matcher compares all pairs of bounding boxes making it
quadratic.

Yes, there is supposed to be a space-server that is supposed to be optimized
for these kinds of searches. There is one, but its super-minimal, and not at all
integrated with the pattern matcher, or anything else for that matter.  Smart
searches on bounding boxes is completely green-field development, for the
atomspace. We've kind of got very nearly nothing for this.

Yes, but keep in mind that the PM is also Turing complete because you can
call any function within a query (including the PM itself).



This is quite problematic. Basic processes should not execute anything
dangerous that can take too much time or loop forever and cannot be
interrupted. Thus, we should either not treat PM as a basic process, or
should restrict its capabilities (and shift responsibility for evaluating
arbitrary code to other processes).

This is misleading or a mis-understanding. Its not the correct way to think
about it.  The current pattern matcher is 2 or three or 4 things matched into
one:

A) A generic subgraph isomorphism solver. Since this is an NP complete
problem, it's certainly possible to create pathologically slow queries.

B) A way of combining subgraphs using a crisp-logic boolean algebra
(actually a Heyting algebra) which we have very vague intentions of
promoting into something probabilistic.  Which, of course, if this was done,
would layer on an additional combinatoric explosion.  It would be fruitful to
discuss the wisdom or stupidity of this particular task. Or alternative designs
for it.

C) The ability to perform subgraph isomorphism with so-called "axiom schemata".
An "axiom schemata" is roughly an infinite collection of relations, for example
"less than" over the integers or rationals or reals.   This means that the 
pattern
matcher is kind-of-like-ish a "satisfiability modulo theories" (SMT) solver.  
The
API for specifying a theory at this point is rather very simplistic. The "virtual 
link"
is  that API. It says, basically "implement your model theory here, as C++ code,
and we will automatically do the satisfiability modulo your theory for you"

Currently supported theories are the equational theory (EqualLink) and numeric
inequaltiy (GreaterThanLink)  An example of a nice-to-have theory would be
linear algebra - done right, this could  solve your space-time bounding box 
problem
for starters, and linear programming type problems if anyone cared about that.
Maybe matroids. Whatever. Dunno.  Another nice-to-have would be naive set theory
which could help lay a cornerstone of probability done right (TM) but this 
would be
a long and difficult but veryinteresting discussion.

D) Once a matching subgraph is found, you can launch arbitrary C++/scheme/python
code to do something with that subgraph. So that's unbounded.

So, the first thing you should do is build a good benchmark tool, then we,
you and the rest of opencog community, can supply it with a collection of
critical tests.


How do you see such tool?

Unclear. I am interested in a tool that tells me if performance got worse after
a particular code change or bug fix.  Some of our fixes accidentally slow things
down (by a lot) and no one noticies for months or half a year.

We have some datasets of varying sizes and
queries, and simply run PM on these queries and measure the time. What can
be unified/automated?

Yes!?

Running all tests and writing log files with
computation times? Anything else?

I don't care about log files.

I agree that we need several (many?) tests to be sure that some changes
didn't affect any types of queries, but isn't it just a script? Or do you
have something much more complicated in mind?

Take a look at 3D graphics performance: there are two types of benchmark:
triangles per second, lines per second, texture maps per second.

The other type is "for game XYZ, frames per second".

We need both types of benchmarks.  Probably the first more than the second,
because it helps developers more.  The second kind just tells you how screwed
up the system is today, without telling you why, or where to look.

In our case, the query processing time becomes unrealistically large for a
one-minute video.  If we consider the problem of search in the entire
episodic memory, it should be even not linear, but logarithmic. Low degree
polynomial complexity is ok for the task dimensions ~1000, not millions or
billions...

No clue what you are searching here.  Logarithmic searches mean binary tree,
quad-tree, octree.  The space-time sever is an octree, but its not integrated 
with
the pattern matcher, and has a super-naive API.

We have binary-trees and hash tables for atoms-by-name, by-type, but zero
sophistication for numeric values. See above comments about "satisfiability
modulo theories".

Linas


--
cassette tapes - analog TV - film cameras - you




--
cassette tapes - analog TV - film cameras - you




--
cassette tapes - analog TV - film cameras - you


--
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/f64084bb-5f28-e243-1530-a0dfa07254c2%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

[opencog-dev] Re: Pattern Matching performance

Reply via email to