Re: reuse of TokenStream
On Fri, Feb 18, 2005 at 10:43:07AM -0500, Erik Hatcher wrote:

> I'm confused on how you're reusing a TokenStream object. General
> Lucene usage would not involve a developer dealing with it directly.

Why not? The IndexWriter wants to tokenize a field, so it calls my Analyzer to get a custom-made Tokenizer or TokenStream object for the given field. Since setting up the TokenStream requires some work, I would rather not repeat that work for every document to be indexed.

Harald.

--
Harald Kirsch | [EMAIL PROTECTED] | +44 (0) 1223/49-2593
BioMed Information Extraction: http://www.ebi.ac.uk/Rebholz-srv/whatizit

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
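[Editor's note: the pattern Harald describes above — an Analyzer that builds its TokenStream once and merely re-points it at new input for each field — can be sketched as below. This is a hypothetical, self-contained illustration: `ReusableTokenizer`, `reset`, and the stop-word setup are stand-ins for illustration, not Lucene's actual `org.apache.lucene.analysis` API.]

```java
import java.io.*;
import java.util.*;

public class ReusableAnalyzerSketch {

    /** Hypothetical tokenizer: the expensive setup happens once in the
     *  constructor; reset(Reader) re-points it at new text cheaply. */
    static class ReusableTokenizer {
        private final Set<String> stopWords;   // stands in for costly setup work
        private BufferedReader input;

        ReusableTokenizer(Set<String> stopWords) {
            this.stopWords = stopWords;        // done once, not per document
        }

        void reset(Reader r) { input = new BufferedReader(r); }

        /** Splits on whitespace and drops stop words. */
        List<String> allTokens() {
            List<String> out = new ArrayList<>();
            try {
                String line;
                while ((line = input.readLine()) != null)
                    for (String t : line.split("\\s+"))
                        if (!t.isEmpty() && !stopWords.contains(t)) out.add(t);
            } catch (IOException e) { throw new UncheckedIOException(e); }
            return out;
        }
    }

    // Analyzer-like factory: hands back the same tokenizer every time,
    // reset for each new field, so setup cost is paid only once.
    static final ReusableTokenizer SHARED =
            new ReusableTokenizer(new HashSet<>(Arrays.asList("the", "a")));

    static List<String> tokenize(String fieldText) {
        SHARED.reset(new StringReader(fieldText));  // cheap per-field reset
        return SHARED.allTokens();                  // expensive setup amortised
    }

    public static void main(String[] args) {
        System.out.println(tokenize("the quick fox"));  // [quick, fox]
        System.out.println(tokenize("a lazy dog"));     // [lazy, dog]
    }
}
```

Whether this is safe with the real IndexWriter is exactly the thread-safety question the thread is about; the sketch only shows why reuse is attractive when stream setup is expensive.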
Re: reuse of TokenStream
I'm confused on how you're reusing a TokenStream object. General Lucene usage would not involve a developer dealing with it directly. Could you share an example of what you're up to?

I'm not sure if this is related, but a technique I'm using is to index the same Document instance into two different IndexWriter instances (each uses a different Analyzer), and this is working fine.

Erik

On Feb 17, 2005, at 6:04 AM, Harald Kirsch wrote:

> Hi,
>
> is it thread-safe to reuse the same TokenStream object for several
> fields of a document, or does the IndexWriter try to parallelise
> tokenization of the fields of a single document?
>
> Similar question: is it safe to reuse the same TokenStream object for
> several documents if I use IndexWriter.addDocument() in a loop? Or does
> addDocument only put the work into a queue, where tasks are taken out
> for parallel indexing by several threads?
>
> Thanks,
> Harald.
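[Editor's note: Erik's technique — feeding the same document through two writers that analyze it differently — can be illustrated with a toy inverted index. Everything here (`Writer`, `addDocument`, the analyzer functions) is a hypothetical stand-in for the real IndexWriter/Analyzer pair, kept self-contained for clarity.]

```java
import java.util.*;
import java.util.function.Function;

public class TwoIndexSketch {

    /** A toy inverted index; each writer owns its own analyzer function,
     *  playing the role of an IndexWriter paired with one Analyzer. */
    static class Writer {
        final Function<String, List<String>> analyzer;
        final Map<String, List<Integer>> index = new HashMap<>();
        int nextDoc = 0;

        Writer(Function<String, List<String>> analyzer) { this.analyzer = analyzer; }

        void addDocument(String text) {
            int doc = nextDoc++;
            for (String term : analyzer.apply(text))
                index.computeIfAbsent(term, k -> new ArrayList<>()).add(doc);
        }
    }

    // Two different "analyzers": exact whitespace terms vs. lowercased terms.
    static List<String> whitespace(String s) {
        return Arrays.asList(s.trim().split("\\s+"));
    }

    static List<String> lowercased(String s) {
        return Arrays.asList(s.trim().toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        Writer exact = new Writer(TwoIndexSketch::whitespace);
        Writer folded = new Writer(TwoIndexSketch::lowercased);

        String doc = "Hello World";
        exact.addDocument(doc);   // the same document text...
        folded.addDocument(doc);  // ...indexed under two analyses

        System.out.println(exact.index.keySet());   // contains Hello, World
        System.out.println(folded.index.keySet());  // contains hello, world
    }
}
```

The point of the sketch is that the Document (here just a String) is plain data: handing it to two writers is fine because all the analysis state lives inside each writer, not in the document.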
reuse of TokenStream
Hi,

is it thread-safe to reuse the same TokenStream object for several fields of a document, or does the IndexWriter try to parallelise tokenization of the fields of a single document?

Similar question: is it safe to reuse the same TokenStream object for several documents if I use IndexWriter.addDocument() in a loop? Or does addDocument only put the work into a queue, where tasks are taken out for parallel indexing by several threads?

Thanks,
Harald.
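[Editor's note: if the worry is concurrent tokenization, one defensive pattern is to keep one tokenizer per thread via ThreadLocal, so reuse within a thread is safe no matter how the writer schedules work. Again a hedged, self-contained sketch: `Tokenizer` is a hypothetical stand-in, not Lucene's class, and the assumption that indexing may run on several threads is exactly the open question in the message above.]

```java
import java.util.*;
import java.util.concurrent.*;

public class PerThreadTokenizerSketch {

    /** Hypothetical stateful tokenizer; not safe to share across threads. */
    static class Tokenizer {
        List<String> tokenize(String text) {
            List<String> out = new ArrayList<>();
            for (String t : text.split("\\s+"))
                if (!t.isEmpty()) out.add(t);
            return out;
        }
    }

    // One tokenizer per thread: each thread reuses its own instance,
    // and no instance is ever touched by two threads at once.
    static final ThreadLocal<Tokenizer> TOKENIZERS =
            ThreadLocal.withInitial(Tokenizer::new);

    static List<String> tokenize(String text) {
        return TOKENIZERS.get().tokenize(text);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<List<String>>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int n = i;
            results.add(pool.submit(() -> tokenize("doc " + n)));
        }
        for (Future<List<String>> f : results)
            System.out.println(f.get());
        pool.shutdown();
    }
}
```

The pattern costs one tokenizer per indexing thread instead of one per document, which preserves the setup-cost savings Harald is after while sidestepping the question of whether the writer parallelises internally.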