difficulties for me to understand the index chain

2010-12-26 Thread xu cheng

Hi all:
I'm new to Lucene development. These days I'm reading the Lucene source
code, and I'm having difficulty understanding the indexing chain. I can't
make sense of the complex relationships between the classes. For example,
I don't understand how these classes relate to each other:
DocFieldConsumerPerThread, DocFieldConsumerPerField, DocInvertedPerThread,
DocInverterPerThread...

By the way, what's the advantage of using such a design, the so-called
indexing chain?

Are there any docs about this?

Any suggestions or references are appreciated! Thanks.

Regards.

Re: difficulties for me to understand the index chain

2010-12-26 Thread Li Li

I am also interested in this question, and my understanding may be wrong.


2010/12/27 xu cheng xcheng@gmail.com:
> I don't understand how these classes relate to each other:
> DocFieldConsumerPerThread, DocFieldConsumerPerField, DocInvertedPerThread,
> DocInverterPerThread...

Because segments often have the same fields, the PerField classes are used
to share common per-field state.
To support multithreaded indexing, the PerThread classes are used.
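A minimal sketch of the Consumer / PerThread / PerField relationship described above, under a simplified model: FieldConsumer, FieldConsumerPerThread and FieldConsumerPerField, and their methods, are hypothetical stand-ins and not Lucene's real classes or signatures. Each indexing thread gets one PerThread object, which lazily creates one PerField object per field name and reuses it for later documents.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Consumer -> PerThread -> PerField pattern.
// These classes are illustrative stand-ins, NOT Lucene's real API.
class FieldConsumer {
    // Called once for each indexing thread; that thread then reuses the
    // returned object for every document it processes.
    FieldConsumerPerThread addThread() {
        return new FieldConsumerPerThread();
    }
}

class FieldConsumerPerThread {
    // One PerField object per field name, created the first time the field
    // is seen and reused for later documents, so its state is not rebuilt.
    private final Map<String, FieldConsumerPerField> fields = new HashMap<>();

    FieldConsumerPerField addField(String fieldName) {
        return fields.computeIfAbsent(fieldName, FieldConsumerPerField::new);
    }

    int fieldCount() {
        return fields.size();
    }
}

class FieldConsumerPerField {
    final String name;
    int tokensSeen; // example of state that accumulates across documents

    FieldConsumerPerField(String name) {
        this.name = name;
    }

    // Counts whitespace-separated tokens; a stand-in for real field processing.
    void processField(String value) {
        tokensSeen += value.trim().split("\\s+").length;
    }
}
```

No locking is needed inside FieldConsumerPerThread because, in this model, only its owning thread ever touches it; that is the point of splitting state per thread.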

See the code in DocumentsWriter:

  static final IndexingChain DefaultIndexingChain = new IndexingChain() {

    DocConsumer getChain(DocumentsWriter documentsWriter) {
      /*
      This is the current indexing chain:

      DocConsumer / DocConsumerPerThread
        --> code: DocFieldProcessor / DocFieldProcessorPerThread
          --> DocFieldConsumer / DocFieldConsumerPerThread / DocFieldConsumerPerField
            --> code: DocFieldConsumers / DocFieldConsumersPerThread / DocFieldConsumersPerField
              --> code: DocInverter / DocInverterPerThread / DocInverterPerField
                --> InvertedDocConsumer / InvertedDocConsumerPerThread / InvertedDocConsumerPerField
                  --> code: TermsHash / TermsHashPerThread / TermsHashPerField
                    --> TermsHashConsumer / TermsHashConsumerPerThread / TermsHashConsumerPerField
                      --> code: FreqProxTermsWriter / FreqProxTermsWriterPerThread / FreqProxTermsWriterPerField
                      --> code: TermVectorsTermsWriter / TermVectorsTermsWriterPerThread / TermVectorsTermsWriterPerField
                --> InvertedDocEndConsumer / InvertedDocEndConsumerPerThread / InvertedDocEndConsumerPerField
                  --> code: NormsWriter / NormsWriterPerThread / NormsWriterPerField
              --> code: StoredFieldsWriter / StoredFieldsWriterPerThread / StoredFieldsWriterPerField
      */

      // Build up indexing chain:

      final TermsHashConsumer termVectorsWriter = new TermVectorsTermsWriter(documentsWriter);
      final TermsHashConsumer freqProxWriter = new FreqProxTermsWriter();

      final InvertedDocConsumer termsHash = new TermsHash(documentsWriter, true, freqProxWriter,
          new TermsHash(documentsWriter, false, termVectorsWriter, null));
      final NormsWriter normsWriter = new NormsWriter();
      final DocInverter docInverter = new DocInverter(termsHash, normsWriter);
      return new DocFieldProcessor(documentsWriter, docInverter);
    }
  };
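To see how the objects built in getChain() hand work to one another, here is a toy model of the same wiring. Every class in it (ChainStage, NamedWriter, TermsHashLike, InverterLike, ChainDemo) is hypothetical and far simpler than Lucene's: a field's text is split on whitespace, the primary terms hash feeds its own consumer and then the secondary (term-vectors) terms hash, and the inverter finally notifies the end consumer, which plays the role of NormsWriter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified stand-ins for the stages wired up in
// getChain(). Names echo Lucene's classes, but these are NOT its APIs.
interface ChainStage {
    void consume(String field, List<String> tokens, List<String> trace);
}

// Leaf consumer: plays the role of FreqProxTermsWriter, TermVectorsTermsWriter
// or NormsWriter, and just records that it was called.
class NamedWriter implements ChainStage {
    private final String name;

    NamedWriter(String name) {
        this.name = name;
    }

    public void consume(String field, List<String> tokens, List<String> trace) {
        trace.add(name + ":" + field + ":" + tokens.size());
    }
}

// Like TermsHash: feeds the tokens to its own consumer, then to an optional
// secondary TermsHash (the term-vectors one in the real chain).
class TermsHashLike implements ChainStage {
    private final ChainStage consumer;
    private final ChainStage next; // null if there is no secondary hash

    TermsHashLike(ChainStage consumer, ChainStage next) {
        this.consumer = consumer;
        this.next = next;
    }

    public void consume(String field, List<String> tokens, List<String> trace) {
        consumer.consume(field, tokens, trace);
        if (next != null) {
            next.consume(field, tokens, trace);
        }
    }
}

// Like DocInverter: tokenizes a field, hands the tokens to the inverted-doc
// consumer, then notifies the end consumer (norms).
class InverterLike {
    private final ChainStage termsHash;
    private final ChainStage endConsumer;

    InverterLike(ChainStage termsHash, ChainStage endConsumer) {
        this.termsHash = termsHash;
        this.endConsumer = endConsumer;
    }

    void processField(String field, String text, List<String> trace) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        termsHash.consume(field, tokens, trace);
        endConsumer.consume(field, tokens, trace);
    }
}

class ChainDemo {
    // Mirrors the wiring order used in getChain().
    static InverterLike buildChain() {
        ChainStage termVectorsWriter = new NamedWriter("termVectors");
        ChainStage freqProxWriter = new NamedWriter("freqProx");
        ChainStage termsHash = new TermsHashLike(freqProxWriter,
            new TermsHashLike(termVectorsWriter, null));
        ChainStage normsWriter = new NamedWriter("norms");
        return new InverterLike(termsHash, normsWriter);
    }
}
```

Calling ChainDemo.buildChain().processField("body", "a b c", trace) with an empty ArrayList records freqProx, then termVectors, then norms, matching the nesting in the chain comment.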
> By the way, what's the advantage of using such a design, the so-called
> indexing chain?

I think it is because older versions of Lucene only supported single-threaded
indexing, and the developers designed this architecture so they could reuse
the existing code.
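Separately from that historical reason, the chain comment in DocumentsWriter also lists DocFieldConsumers, a "splitter" that wraps two field consumers as a single one. The sketch below shows only that idea; FieldSink, TeeSink and RecordingSink are hypothetical names, not Lucene classes. One general property of this style is that, because each stage depends only on a small interface, consumers can be combined without knowing about each other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field-consumer interface; NOT Lucene's DocFieldConsumer API.
interface FieldSink {
    void processField(String name, String value);
}

// A "splitter" in the spirit of DocFieldConsumers: it wraps two consumers
// and forwards every field to both, so neither needs to know the other exists.
class TeeSink implements FieldSink {
    private final FieldSink one;
    private final FieldSink two;

    TeeSink(FieldSink one, FieldSink two) {
        this.one = one;
        this.two = two;
    }

    public void processField(String name, String value) {
        one.processField(name, value);
        two.processField(name, value);
    }
}

// Trivial consumer that records what it receives.
class RecordingSink implements FieldSink {
    final List<String> seen = new ArrayList<>();

    public void processField(String name, String value) {
        seen.add(name + "=" + value);
    }
}
```

Since a TeeSink is itself a FieldSink, splitters can be nested to feed more than two consumers.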
> Are there any docs about this?

If you can read Chinese, you may find some useful articles here:
http://forfuture1978.javaeye.com/
But I think reading the code is very helpful.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: difficulties for me to understand the index chain

2010-12-26 Thread xu cheng

Hi Li Li,
Thank you very much for your answer!

> To support multithreaded indexing, the PerThread classes are used.
Multiple threads doing what? Does each thread process one file, one field,
or something else?

Regards
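As far as I understand Lucene 3.x's DocumentsWriter (this is not stated elsewhere in the thread), the threads are simply the application threads that call IndexWriter.addDocument concurrently: each calling thread is given its own thread state and processes whole documents, not individual files or fields. The sketch below models only that thread affinity; SimpleWriter, PerThreadState and ThreadDemo are hypothetical classes, and the per-thread "buffer" is just a list of document strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical per-thread state; only its owning thread writes to it.
class PerThreadState {
    final List<String> bufferedDocs = new ArrayList<>();
}

// Hypothetical writer illustrating thread affinity; NOT Lucene's DocumentsWriter.
class SimpleWriter {
    private final ConcurrentHashMap<Thread, PerThreadState> states = new ConcurrentHashMap<>();

    void addDocument(String doc) {
        // Each calling thread gets (and keeps reusing) its own state object.
        PerThreadState state =
            states.computeIfAbsent(Thread.currentThread(), t -> new PerThreadState());
        // The whole document is processed by this one thread.
        state.bufferedDocs.add(doc);
    }

    int totalDocs() {
        int n = 0;
        for (PerThreadState s : states.values()) {
            n += s.bufferedDocs.size();
        }
        return n;
    }

    int threadStateCount() {
        return states.size();
    }
}

class ThreadDemo {
    // Adds docsPerThread documents from each of `threads` worker threads.
    static SimpleWriter indexConcurrently(int threads, int docsPerThread) {
        SimpleWriter writer = new SimpleWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                futures.add(pool.submit(() -> {
                    for (int d = 0; d < docsPerThread; d++) {
                        writer.addDocument("doc-" + id + "-" + d);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait; also makes the task's writes visible here
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return writer;
    }
}
```

The sketch stops at buffering; how the per-thread buffers are later written out as a segment is outside its scope.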


