On 28 August 2010 at 14:34, Lars Hellström wrote:

> Paul Libbrecht wrote:
>> Well, let's continue the lesson... applying it to canonicalization for
>> search purposes.
>
> Where of course canonicalization of the data may turn out to be too
> expensive an operation to apply to the entire database being searched, and
> other methods must be used to sieve it first, but that is true regardless
> of how you decide equality.
No performance issue here... canonicalization would be applied at "analysis"
time:

- to each query term at query time
- to each of the database's terms at indexing time (once a document changes)

>> It's quite human and I have the impression we're waiting on the invasion
>> of rewriting systems:
>
> They have been around for ages... A problem publicity-wise is that when
> they were new, computers were too small to fit much of anything in memory.

But they still don't seem to enjoy great visibility. Or do they?

>> - uniqueness is the holy grail to be reached, I don't think it can be in
>>   any way *ensured*. I had the impression rewriting systems were on a
>>   safer side
>
> Uniqueness is indeed not ensured by the definition, but something one
> strongly desires if the point of the exercise is to find the canonical
> form of some given expression. (Generative grammars may also be viewed as
> rewriting systems, in which case uniqueness would only hold for singleton
> languages.) Asking questions about arbitrary rewriting systems runs
> straight into the general word problem, but questions asked about specific
> rewriting systems may well be possible to address computationally.

Maybe that's the problem: uniqueness is the grail that requires a "perfect"
transmission of the knowledge of the "index guru" to the search-engine
programme. Such perfection is not achievable at all in word-based search
engines, where precision and recall average around 25% in classical
competitions.

> Normally, you first want to prove termination (e.g. that you avoid silly
> rewrite cycles and rewrites that will forever make the result longer);

Seems necessary indeed; though, again, a fallback strategy such as "ah well,
this one is too complex to get a smart treatment" is quite safe.

> many tools and techniques for this are available (cf.
> http://www.floc-conference.org/WST-home.html).
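The indexing-time/query-time split above can be sketched as follows. This is a minimal illustration, not anything from an actual engine: the rule set, the string-level term representation, and the index layout are all invented here (a real OpenMath system would rewrite structured terms, not strings). Each rule strictly shortens the term, which is one easy way to guarantee termination as discussed above.

```python
# Hypothetical rewrite rules; each strictly shortens the term, so
# repeated application must terminate in a normal form.
RULES = [
    ("+0", ""),    # x+0  -> x
    ("--", ""),    # --x  -> x (double negation)
    ("1*", ""),    # 1*x  -> x
]

def canonicalize(term: str) -> str:
    """Apply the rules until none matches, i.e. a normal form is reached."""
    term = term.replace(" ", "")          # ignore whitespace
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if lhs in term:
                term = term.replace(lhs, rhs)
                changed = True
    return term

# Indexing time: each document term is stored under its normal form,
# once, when the document changes.
index = {}
for doc_id, term in [(1, "x+0"), (2, "1*y"), (3, "--x + 0")]:
    index.setdefault(canonicalize(term), []).append(doc_id)

# Query time: the query term gets the same treatment, so a search for
# "x" finds both "x+0" and "--x + 0".
print(index.get(canonicalize("x")))   # -> [1, 3]
```

The point is that canonicalization runs once per term at analysis time, never over the whole database at search time, which is why there is no performance issue.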
> Then checking whether all critical pairs satisfy the diamond property
> will typically (it depends slightly on what kind of things you allow as
> rules) yield an *algorithm* that for any finite rewrite system either
> proves that it is complete (that normal forms with respect to it are
> unique) or gives an example of an expression that can be rewritten to two
> different normal forms.

That seems way too theoretical for the practical case. I would wish for
statistical approaches...

paul

_______________________________________________
Om mailing list
[email protected]
http://openmath.org/mailman/listinfo/om
