-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Tibor,

I just pushed a new branch to my personal repository, 'nested_parens',
which contains a complete rewrite of SearchQueryParenthesisedParser
which supports arbitrarily nested parthesised search expressions.  It
also outputs more correct search units when an expression has close
proximity to a quoted expression - I haven't investigated why the old
SQPP is broken in this regard.  All the tests pass.

Because it didn't support nested expressions, the old SQPP didn't bother
annotating nested structures.  Grouped items - query expressions in
parentheses - were simply emitted as strings rather than broken into
list terms.  I adapted the new SQPP to do the same thing, though I don't
think it's the right behavior, and it's a 1-line fix to make it emit
nested lists.

This just pushes the problem down the stack, though.  search_engine
doesn't appear to support nested search queries, and I can't think of
any easy way to do it that doesn't involve keeping around a lot of
temporary hitsets.  Which I assume isn't ideal.

So I think that before this new SQPP goes into master, it should
probably be extended to output some kind of flat logical normal form
(conjunctive normal form or disjunctive normal form) for consumption by
the search_engine.  I found a C++ library that manipulates arbitrary
expressions into DNF that doesn't seem like it would be hard to wrap
(boolStuff).  But I also realized today that Norvig's AI text contains a
formula for CNF conversion, and the (free software) related materials on
the book's website, indeed, have working Python code that does CNF
resolution.  Lots of other things, too, but I'm assuming we'd want to
cut it down.

- From my reading of search_engine, it looks like expressions in either
normal form would be handled correctly, and I don't that the CNF time
explosion or the DNF space explosion are real potential problems;
human-input search expressions have a natural limit to their complexity,
probably around 7 subexpressions.

If you could look over the branch and give me your thoughts about having
the parser do first order logic reductions for the user, I'd appreciate it.

Joe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQIcBAEBCAAGBQJL5LB1AAoJEGh+D3e/PaCrnO4P/AqtT017IpUzkm7q1LTvzIZZ
PQwvKPo7Cy653YDW/9V2YYHqC8TJdPSMEkDavL03keeKW8/IG4ig8+dLArOq0f/R
L3HYc4FzzBasLd1QIXcFawTVV9co27vYjOKnLpmZEehOt7eg15E4cgb6bT99jFvL
pVGkszvGT7hdyysNrgoJO7ZotmLK4NcuXV1jVNReFMzxGB/cCaCTLnBvYQr+ZULY
dxiXOWovCfkKS0f4vsjWYHCS2WoBYMRNgtJ8rDy+1Av8xL2VnDKq7c4pP+Rx0LXs
wiHj33LPho1/hAc+lYX+s6z9vWKgaI8FsBUMxg9lWyFHTdGWyC8Ud1A7dfz/mvtn
Nutt2PMovrTilRxhXPAftTHw0bBVoxzcz8WoczrgjqzCratWq3fPLfmEUTQaiSTH
RewdcLjuKZo6dcaorDevKjwKZXV8Ymu0Wchfa6KH6joT5OoFrTVQ8Ogr8eCer9Hj
HL1/XeNN3wn8qY84+0Lf0y8fT26DDyrg9tTkHKFUGt7lBWgcZRqQnO8PvhX8Ek0P
DZFx3U/2s+q3/NdX+Mmc5+9rkjof4RNIq1pnSrzU5uiAk9mnGzuMF6GvWL7cXIsU
2tuWAoku9c3xdh0oUpSTBD96frTe4gjPMPGQl52cgfg1SW0JMoyVlR070QoBw8bi
ZR2BjkkLPF3RufG23isa
=jL7B
-----END PGP SIGNATURE-----

Reply via email to