Stefan Groschupf wrote:

* Do you plan to store metadata in inverted lists as well (which currently would translate into adding new arbitrary fields to Lucene's Document)? This would be very useful in my scenario - I'm doing language detection and key-phrase extraction to enhance the index, and I'd love to store this information in the index itself to avoid the need for separate storage.


+1. I was asking how to store metadata dynamically for each page, since I wish to do something similar to key-phrase extraction.
I would be interested to hear how you extract your key phrases.

We have built a proprietary solution for this. It works pretty well with most European languages. More details available after signing an NDA.. ;-)

Do you know Kea? It is interesting but does not scale at all.
http://www.text-mining.org/index.jsp?action=showDocumentFromFolder&folderPK=789&documentPK=2500&template=docD

Yes, I'm aware of Kea - but it's GPL, so I don't use it.

--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)




_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
