Re: reuse of TokenStream

2005-02-18 Thread Harald Kirsch
On Fri, Feb 18, 2005 at 10:43:07AM -0500, Erik Hatcher wrote:
> I'm confused on how you're reusing a TokenStream object.  General  
> Lucene usage would not involve a developer dealing with it directly.   

Why not? The IndexWriter wants to tokenize a field, so it calls my
Analyzer to get a custom-made Tokenizer or TokenStream object for the
given field. Since setting up the TokenStream takes some work, I would
rather not repeat that work for every document to be indexed.
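
To make the idea concrete, here is a minimal sketch of the kind of reuse
I mean. It deliberately does not use the Lucene classes: ReusableTokenizer
and CachingAnalyzer are hypothetical stand-ins, and tokenStream(fieldName,
text) only mimics the shape of the call an IndexWriter would make. The
sketch assumes that each thread finishes tokenizing one field before it
asks for the next; whether the IndexWriter guarantees that is exactly the
open question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class TokenStreamReuseSketch {

    /** Hypothetical stand-in for a tokenizer whose construction is costly. */
    static final class ReusableTokenizer {
        static final AtomicInteger SETUPS = new AtomicInteger();
        private String[] parts = new String[0];
        private int pos = 0;

        ReusableTokenizer() {
            SETUPS.incrementAndGet(); // the expensive setup would happen here
        }

        /** Points this instance at new text, discarding any previous state. */
        void reset(String text) {
            String trimmed = text.trim();
            parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            pos = 0;
        }

        /** Returns the next token, or null when the text is exhausted. */
        String next() {
            return pos < parts.length ? parts[pos++] : null;
        }
    }

    /** Hypothetical analyzer that keeps one tokenizer per calling thread. */
    static final class CachingAnalyzer {
        private final ThreadLocal<ReusableTokenizer> cache =
                ThreadLocal.withInitial(ReusableTokenizer::new);

        ReusableTokenizer tokenStream(String fieldName, String text) {
            ReusableTokenizer t = cache.get(); // built at most once per thread
            t.reset(text);                     // cheap compared to the setup
            return t;
        }
    }

    /** Drains the tokenizer for one field, as an indexing loop would. */
    static List<String> tokenize(CachingAnalyzer a, String field, String text) {
        List<String> out = new ArrayList<>();
        ReusableTokenizer t = a.tokenStream(field, text);
        for (String tok = t.next(); tok != null; tok = t.next()) {
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) {
        CachingAnalyzer analyzer = new CachingAnalyzer();
        System.out.println(tokenize(analyzer, "title", "reuse of TokenStream"));
        System.out.println(tokenize(analyzer, "body", "one instance per thread"));
        System.out.println("setups: " + ReusableTokenizer.SETUPS.get());
    }
}
```

Keeping the instance in a ThreadLocal means two threads never share one
tokenizer, so the only remaining risk is a single thread interleaving two
fields on the same instance.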

  Harald.


-- 

Harald Kirsch | [EMAIL PROTECTED] | +44 (0) 1223/49-2593
BioMed Information Extraction: http://www.ebi.ac.uk/Rebholz-srv/whatizit

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: reuse of TokenStream

2005-02-18 Thread Erik Hatcher
I'm confused on how you're reusing a TokenStream object.  General  
Lucene usage would not involve a developer dealing with it directly.   
Could you share an example of what you're up to?

I'm not sure if this is related, but a technique I'm using is to index  
the same Document instance into two different IndexWriter instances  
(each uses a different Analyzer) - and this is working fine.
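
Roughly, the shape of that technique looks like the sketch below. It is
only an illustration and does not use the Lucene API: SketchWriter and
the two analyzer methods are hypothetical stand-ins, and the "document"
is just a string. The point it shows is that each writer only reads the
shared document and applies its own analyzer to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class TwoWritersSketch {

    /** Hypothetical analyzer: splits on whitespace and keeps case. */
    static List<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /** Hypothetical analyzer: splits on whitespace and lowercases. */
    static List<String> lowercase(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** Hypothetical stand-in for an index writer bound to one analyzer. */
    static final class SketchWriter {
        private final Function<String, List<String>> analyzer;
        final List<String> terms = new ArrayList<>();

        SketchWriter(Function<String, List<String>> analyzer) {
            this.analyzer = analyzer;
        }

        /** Analyzes the text with this writer's analyzer; the input is only read. */
        void addDocument(String text) {
            terms.addAll(analyzer.apply(text));
        }
    }

    public static void main(String[] args) {
        String doc = "Reuse of TokenStream"; // the single shared document

        SketchWriter exact = new SketchWriter(TwoWritersSketch::whitespace);
        SketchWriter folded = new SketchWriter(TwoWritersSketch::lowercase);

        // The same document instance goes to both writers.
        exact.addDocument(doc);
        folded.addDocument(doc);

        System.out.println(exact.terms);
        System.out.println(folded.terms);
    }
}
```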

Erik
On Feb 17, 2005, at 6:04 AM, Harald Kirsch wrote:
> Hi,
>
> is it thread safe to reuse the same TokenStream object for several
> fields of a document or does the IndexWriter try to parallelise
> tokenization of the fields of a single document?
>
> Similar question: Is it safe to reuse the same TokenStream object for
> several documents if I use IndexWriter.addDocument() in a loop?  Or
> does addDocument only put the work into a queue where tasks are taken
> out for parallel indexing by several threads?
>
>   Thanks,
>   Harald.


reuse of TokenStream

2005-02-17 Thread Harald Kirsch
Hi,

is it thread safe to reuse the same TokenStream object for several
fields of a document or does the IndexWriter try to parallelise
tokenization of the fields of a single document?

Similar question: Is it safe to reuse the same TokenStream object for
several documents if I use IndexWriter.addDocument() in a loop?  Or
does addDocument only put the work into a queue where tasks are taken
out for parallel indexing by several threads?

  Thanks,
  Harald.
