Hi Roman,

On Sat, 18/12/2010 at 12.17 +0100, Roman Chyla wrote:
> I agree this is cool, but something doesn't fit, at least I don't
> understand how this could be used for the task of bibclassify, the
> dict is good if you know (more or less) what you are looking for, but
> the task of bibclassify is to find entities inside the fulltext - and
> to find that out, bibclassify has to search for it - and it is not
> exactly the same thing as the spell checking. I must be missing
> something, could you explain to me what advantage at all there would
> be in using the dict? As a fast cache of single level entries? I could
> see how it would be more useful for the cache, citation links etc.,
> but not for bibclassify.

I am not very familiar with how BibClassify works right now, but if its
final goal is to find the most frequent keywords (from a controlled set)
inside a fulltext, then, postponing the issue of grammar (plurals,
genders, conjugations :-S), I think it would indeed be possible to use
dictd in a way orthogonal to how we currently use ontologies.

Currently, for each word in the ontology (correct me if I am wrong), we
count how many times it appears in the text.

With dict, on the other hand, we might simply take all the words in the
text, filter them against the dictionary (which is built from the
ontology), and then sum up the occurrences of repeated words.

The two methods should accomplish the same goal (if I am right about the
BibClassify algorithm), but the latter should in principle be extremely
fast, unless the grammar issue is the bottleneck.
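
To make the comparison concrete, here is a rough sketch of the two
approaches in Python. It is only an illustration under strong
assumptions: a plain Python set stands in for the dictd lookup, keywords
are single words, grammar is ignored entirely, and all the names
(ONTOLOGY, tokenize, count_per_keyword, count_by_filtering) are made up
for this example and are not BibClassify code.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary standing in for the ontology.
ONTOLOGY = {"quark", "boson", "neutrino"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def count_per_keyword(text, keywords):
    """Approach 1: for each keyword, scan the whole text and count it."""
    words = tokenize(text)
    counts = {}
    for keyword in keywords:
        n = words.count(keyword)  # one full pass over the text per keyword
        if n:
            counts[keyword] = n
    return counts


def count_by_filtering(text, keywords):
    """Approach 2: one pass over the text, keeping only dictionary words."""
    # The set membership test plays the role of the dictionary lookup.
    return dict(Counter(w for w in tokenize(text) if w in keywords))


sample = ("The quark and the boson: a quark is not a neutrino, "
          "nor is a Quark a lepton.")

per_keyword = count_per_keyword(sample, ONTOLOGY)
filtered = count_by_filtering(sample, ONTOLOGY)
```

With K keywords and N words of text, the first function does roughly
K * N comparisons, while the second does N lookups, which is where the
speed-up would come from if the lookup itself is cheap.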

Cheers!
Sam

-- 
Samuele Kaplun
Invenio Developer ** <http://invenio-software.org/>
