On 28 August 2010, at 14:34, Lars Hellström wrote:

> Paul Libbrecht wrote:
>> Well, let's continue the lesson... applying it to canonicalization for  
>> search purposes.
> 
> Where of course canonicalization of data may turn out to be too expensive 
> an operation to be applied to the entire database being searched, and other 
> methods must be used to sieve it first, but that's true regardless of how you 
> decide equality.

No performance issue here... canonicalization would be applied at "analysis" 
time:
- on each query term at query time
- on each of the database's terms at indexing time (whenever a document changes)
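To make that division of labour concrete, here is a minimal sketch in Python; the canonicalize() below is a made-up toy (lower-casing and sorting the arguments of a commutative "+") standing in for a real rewriting system, and the point is only *where* canonicalization is applied:

```python
def canonicalize(term: str) -> str:
    """Toy canonical form: lower-case and sort the '+' arguments."""
    parts = sorted(p.strip().lower() for p in term.split("+"))
    return "+".join(parts)

class Index:
    def __init__(self):
        self.postings = {}  # canonical term -> set of document ids

    def add_document(self, doc_id, terms):
        # canonicalization happens once per term, at indexing time
        for t in terms:
            self.postings.setdefault(canonicalize(t), set()).add(doc_id)

    def search(self, query_term):
        # ... and once per query term, at query time
        return self.postings.get(canonicalize(query_term), set())

idx = Index()
idx.add_document("doc1", ["b + a", "x"])
print(idx.search("A + B"))  # finds doc1, since both sides canonicalize to "a+b"
```

So the cost is paid once per indexed term and once per query term, never over the whole database at search time.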

>> It's quite human and I have the impression we're waiting on the invasion of 
>> rewriting systems:
> 
> They have been around for ages... A problem publicity-wise is that when it 
> was new then computers were too small to fit much of anything in memory. 

But they still don't seem to enjoy much visibility. Or do they?

>> - uniqueness is the holy grail to be reached; I don't think it can be in 
>> any way *ensured*. I had the impression rewriting systems were on a safer 
>> side
> 
> Uniqueness is indeed not ensured by the definition, but something one 
> strongly desires if the point of the exercise is to find the canonical form 
> of some given expression. (Generative grammars may also be viewed as 
> rewriting systems, in which case uniqueness would only hold for singleton 
> languages.) Asking questions about arbitrary rewriting systems runs straight 
> into the general word problem, but questions asked about specific rewriting 
> systems may well be possible to address computationally.

Maybe that's the problem: uniqueness is the grail that requires a "perfect" 
transmission of the knowledge of the "index guru" to the search-engine 
program. 
Such perfection is out of reach for word-based search engines, where 
precision and recall average around 25% in classical competitions.
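Lars's caveat that normal forms need not be unique is easy to see on a toy string-rewriting system; the two rules below are invented purely for illustration (both shorten the string, so the system terminates, yet it is not confluent):

```python
RULES = [("ab", "a"), ("ba", "b")]  # made-up rules, terminating but not confluent

def successors(s):
    """All strings reachable from s by one rule application."""
    out = set()
    for lhs, rhs in RULES:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def normal_forms(s):
    """Every normal form reachable from s (relies on termination)."""
    seen, todo, nfs = set(), [s], set()
    while todo:
        t = todo.pop()
        if t in seen:
            continue
        seen.add(t)
        nxt = successors(t)
        if not nxt:
            nfs.add(t)  # no rule applies: t is a normal form
        else:
            todo.extend(nxt)
    return nfs

print(normal_forms("aba"))  # → {'aa', 'a'}: two distinct normal forms
```

Starting from "aba", rewriting the "ab" gives "aa" (stuck) while rewriting the "ba" gives "ab" and then "a" — two different canonical forms for the same input, which is exactly what a search index cannot tolerate.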

> Normally, you first want to prove termination (e.g. that you avoid silly 
> rewrite cycles and rewrites that will forever make the result longer);

Seems necessary indeed, though, again... a strategy such as "ah well, this one 
is too complex to get a smart treatment" is quite safe.

> many tools and techniques for this are available (cf. 
> http://www.floc-conference.org/WST-home.html). Then checking whether all 
> critical pairs satisfy the diamond property will typically (it depends 
> slightly on what kind of things you allow as rules) yield an *algorithm* that 
> for any finite rewrite system either proves that it is complete (that normal 
> forms with respect to it are unique) or gives an example of an expression 
> that can be rewritten to two different normal forms.
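For string rewriting at least, the critical-pair test Lars describes fits in a few lines. The sketch below uses the same kind of invented rules as above and is far simpler than what real completion tools handle (terms with variables are considerably subtler): it finds overlapping left-hand sides, forms the two one-step rewrites of the overlapped string, and checks whether they rewrite to the same normal form:

```python
RULES = [("ab", "a"), ("ba", "b")]  # made-up, length-decreasing (hence terminating) rules

def rewrite_to_nf(s):
    """Leftmost rewriting to one normal form (halts because rules shorten s)."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            i = s.find(lhs)
            if i != -1:
                s = s[:i] + rhs + s[i + len(lhs):]
                changed = True
                break
    return s

def critical_pairs():
    """Overlapping left-hand sides give two one-step rewrites of one string."""
    pairs = []
    for l1, r1 in RULES:
        for l2, r2 in RULES:
            for k in range(1, min(len(l1), len(l2))):
                if l1[-k:] == l2[:k]:            # l1 and l2 overlap in k letters
                    s = l1 + l2[k:]              # the overlapped string
                    left = r1 + l2[k:]           # rewrite with rule 1 first
                    right = s[:len(l1) - k] + r2 # rewrite with rule 2 first
                    pairs.append((left, right))
    return pairs

for p, q in critical_pairs():
    joinable = rewrite_to_nf(p) == rewrite_to_nf(q)
    print(f"{p!r} vs {q!r}: {'joinable' if joinable else 'NOT joinable'}")
```

Here the overlap "aba" yields the pair ("aa", "ab"), whose normal forms "aa" and "a" differ, so the check reports the system as non-confluent — the algorithmic counterexample Lars mentions.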

That seems to be way too theoretical for the practical case.
I would wish for statistical approaches...

paul
_______________________________________________
Om mailing list
[email protected]
http://openmath.org/mailman/listinfo/om
