-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hi Tibor,
I just pushed a new branch to my personal repository, 'nested_parens', which contains a complete rewrite of SearchQueryParenthesisedParser which supports arbitrarily nested parthesised search expressions. It also outputs more correct search units when an expression has close proximity to a quoted expression - I haven't investigated why the old SQPP is broken in this regard. All the tests pass. Because it didn't support nested expressions, the old SQPP didn't bother annotating nested structures. Grouped items - query expressions in parentheses - were simply emitted as strings rather than broken into list terms. I adapted the new SQPP to do the same thing, though I don't think it's the right behavior, and it's a 1-line fix to make it emit nested lists. This just pushes the problem down the stack, though. search_engine doesn't appear to support nested search queries, and I can't think of any easy way to do it that doesn't involve keeping around a lot of temporary hitsets. Which I assume isn't ideal. So I think that before this new SQPP goes into master, it should probably be extended to output some kind of flat logical normal form (conjunctive normal form or disjunctive normal form) for consumption by the search_engine. I found a C++ library that manipulates arbitrary expressions into DNF that doesn't seem like it would be hard to wrap (boolStuff). But I also realized today that Norvig's AI text contains a formula for CNF conversion, and the (free software) related materials on the book's website, indeed, have working Python code that does CNF resolution. Lots of other things, too, but I'm assuming we'd want to cut it down. - From my reading of search_engine, it looks like expressions in either normal form would be handled correctly, and I don't that the CNF time explosion or the DNF space explosion are real potential problems; human-input search expressions have a natural limit to their complexity, probably around 7 subexpressions. If you could look over the branch and give me your thoughts about having the parser do first order logic reductions for the user, I'd appreciate it. Joe -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQIcBAEBCAAGBQJL5LB1AAoJEGh+D3e/PaCrnO4P/AqtT017IpUzkm7q1LTvzIZZ PQwvKPo7Cy653YDW/9V2YYHqC8TJdPSMEkDavL03keeKW8/IG4ig8+dLArOq0f/R L3HYc4FzzBasLd1QIXcFawTVV9co27vYjOKnLpmZEehOt7eg15E4cgb6bT99jFvL pVGkszvGT7hdyysNrgoJO7ZotmLK4NcuXV1jVNReFMzxGB/cCaCTLnBvYQr+ZULY dxiXOWovCfkKS0f4vsjWYHCS2WoBYMRNgtJ8rDy+1Av8xL2VnDKq7c4pP+Rx0LXs wiHj33LPho1/hAc+lYX+s6z9vWKgaI8FsBUMxg9lWyFHTdGWyC8Ud1A7dfz/mvtn Nutt2PMovrTilRxhXPAftTHw0bBVoxzcz8WoczrgjqzCratWq3fPLfmEUTQaiSTH RewdcLjuKZo6dcaorDevKjwKZXV8Ymu0Wchfa6KH6joT5OoFrTVQ8Ogr8eCer9Hj HL1/XeNN3wn8qY84+0Lf0y8fT26DDyrg9tTkHKFUGt7lBWgcZRqQnO8PvhX8Ek0P DZFx3U/2s+q3/NdX+Mmc5+9rkjof4RNIq1pnSrzU5uiAk9mnGzuMF6GvWL7cXIsU 2tuWAoku9c3xdh0oUpSTBD96frTe4gjPMPGQl52cgfg1SW0JMoyVlR070QoBw8bi ZR2BjkkLPF3RufG23isa =jL7B -----END PGP SIGNATURE-----
