David Jashi wrote:
> Wow. Does this mean we'll have live indexing out of the box?
If by "live" you mean that you can index a fetched and parsed segment and have it appear in live search immediately after you commit, then yes. Other than that, Nutch still uses segments as the unit of work, so segment generation, fetching, parsing, updatedb, etc. are still batch operations that take time.
> By the way, is there any chance of modifying stemming to process several word forms (tokens) at once, rather than one by one? That would really speed up my external stemming.
You can implement your own analyzer that first caches all tokens from the TokenStream and then passes them to the external process all at once.
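Here is a minimal sketch of that caching idea. It is written in plain Java so it runs without Lucene on the classpath, which means an `Iterator<String>` of terms stands in for a real Lucene `TokenStream`/`TokenFilter`. `BatchStemmer` and `CachingStemFilter` are hypothetical names, and the toy stemmer in `main` (strip a trailing "s") only stands in for the external process. On the first request, the filter drains every upstream token into a list. It then calls the stemmer once with the whole batch and hands back the stems one at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class BatchStemmingSketch {

    /** Hypothetical interface to the external stemmer: one call per batch of words. */
    interface BatchStemmer {
        List<String> stemAll(List<String> words);
    }

    /**
     * Hypothetical stand-in for a caching token filter: it drains the whole
     * upstream token sequence first, sends it to the stemmer in one call,
     * then replays the stemmed tokens one by one.
     */
    static final class CachingStemFilter implements Iterator<String> {
        private final Iterator<String> upstream;
        private final BatchStemmer stemmer;
        private Iterator<String> buffered; // null until the first token is requested

        CachingStemFilter(Iterator<String> upstream, BatchStemmer stemmer) {
            this.upstream = upstream;
            this.stemmer = stemmer;
        }

        private void fillBuffer() {
            if (buffered != null) return;
            List<String> cache = new ArrayList<>();
            while (upstream.hasNext()) {
                cache.add(upstream.next()); // cache every token before stemming
            }
            List<String> stems = stemmer.stemAll(cache); // single call for the whole batch
            if (stems.size() != cache.size()) {
                throw new IllegalStateException("stemmer must return one stem per token");
            }
            buffered = stems.iterator();
        }

        @Override
        public boolean hasNext() {
            fillBuffer();
            return buffered.hasNext();
        }

        @Override
        public String next() {
            fillBuffer();
            return buffered.next();
        }
    }

    public static void main(String[] args) {
        // Toy stand-in for the external stemmer process: strips a trailing "s".
        BatchStemmer toyStemmer = words -> {
            System.out.println("external call with " + words.size() + " tokens");
            List<String> out = new ArrayList<>();
            for (String w : words) {
                out.add(w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
            }
            return out;
        };

        // Naive whitespace tokenization in place of a real Lucene tokenizer.
        List<String> tokens = new ArrayList<>();
        for (String t : "Crawlers fetch pages and parse links".toLowerCase(Locale.ROOT).split("\\s+")) {
            tokens.add(t);
        }

        Iterator<String> filter = new CachingStemFilter(tokens.iterator(), toyStemmer);
        while (filter.hasNext()) {
            System.out.println(filter.next());
        }
    }
}
```

The trade-off is memory, because all of a field's tokens are held at once. For a single document this is usually acceptable. A real Lucene filter would also need to carry each token's offsets and position increment alongside the stemmed text, not just the term string.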
-- 
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
