difficulties for me to understand the index chain
Hi all:

I'm new to Lucene development, and these days I'm reading the Lucene source code. I'm having difficulty understanding the index chain, in particular the complex relationships between its classes. For example, I don't understand how these classes relate to each other: DocFieldConsumerPerThread, DocFieldConsumerPerField, DocInvertedPerThread, DocInverterPerThread.

Also, what's the advantage of this design, the so-called index chain? Is there any documentation about it?

Any suggestions or references are appreciated. Thanks!

Regards
Re: difficulties for me to understand the index chain
I am also interested in this question, and my understanding may be wrong.

2010/12/27 xu cheng <xcheng@gmail.com>:
> Hi all:
> I'm new to lucene dev. these days I'm reading the lucene source code. and now there are some difficulties for me to understand the index chain. I could not understand the complex relationship between the classes!
> for example: I could not understand the relations between these classes: DocFieldConsumerPerThread, DocFieldConsumerPerField, DocInvertedPerThread, DocInverterPerThread..

Because segments often have the same fields, the PerField classes are used to share what they have in common. To support multithreaded indexing, the PerThread classes are used. See the code in DocumentsWriter:

    static final IndexingChain DefaultIndexingChain = new IndexingChain() {

      DocConsumer getChain(DocumentsWriter documentsWriter) {
        /*
        This is the current indexing chain:

        DocConsumer / DocConsumerPerThread
          --> code: DocFieldProcessor / DocFieldProcessorPerThread
            --> DocFieldConsumer / DocFieldConsumerPerThread / DocFieldConsumerPerField
              --> code: DocFieldConsumers / DocFieldConsumersPerThread / DocFieldConsumersPerField
                --> code: DocInverter / DocInverterPerThread / DocInverterPerField
                  --> InvertedDocConsumer / InvertedDocConsumerPerThread / InvertedDocConsumerPerField
                    --> code: TermsHash / TermsHashPerThread / TermsHashPerField
                      --> TermsHashConsumer / TermsHashConsumerPerThread / TermsHashConsumerPerField
                        --> code: FreqProxTermsWriter / FreqProxTermsWriterPerThread / FreqProxTermsWriterPerField
                        --> code: TermVectorsTermsWriter / TermVectorsTermsWriterPerThread / TermVectorsTermsWriterPerField
                  --> InvertedDocEndConsumer / InvertedDocEndConsumerPerThread / InvertedDocEndConsumerPerField
                    --> code: NormsWriter / NormsWriterPerThread / NormsWriterPerField
                --> code: StoredFieldsWriter / StoredFieldsWriterPerThread / StoredFieldsWriterPerField
        */

        // Build up indexing chain:

        final TermsHashConsumer termVectorsWriter = new TermVectorsTermsWriter(documentsWriter);
        final TermsHashConsumer freqProxWriter = new FreqProxTermsWriter();

        final InvertedDocConsumer termsHash = new TermsHash(documentsWriter, true, freqProxWriter,
            new TermsHash(documentsWriter, false, termVectorsWriter, null));
        final NormsWriter normsWriter = new NormsWriter();
        final DocInverter docInverter = new DocInverter(termsHash, normsWriter);
        return new DocFieldProcessor(documentsWriter, docInverter);
      }
    };

> btw, what 's the advantage of using such a design, the so called index chain??

I think they designed this architecture because older versions of Lucene only supported single-threaded indexing, and they wanted to reuse the existing code.

> Is there any docs about this??

If you can read Chinese, you may find some useful articles here: http://forfuture1978.javaeye.com/
But I think reading the code is very helpful.

> any suggestion or references are appreciated! thanks regards.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
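To make the PerThread / PerField idea a bit more concrete, here is a minimal sketch of the pattern under some simplifying assumptions. All class names in it (FieldConsumer, FieldConsumerPerThread, FieldConsumerPerField, CountingConsumer, FanOut, DocProcessorPerThread) are hypothetical and are not Lucene classes, and a "document" is just a map of field name to text. A chain-level consumer hands out one PerThread object per indexing thread; each PerThread object hands out one PerField object per field name, which is cached and reused for later documents; and a composite stage forwards each field to two downstream consumers, roughly the way DocFieldConsumers combines two consumers when the chain is built up. The model also assumes each thread processes whole documents, which is my understanding of how DocumentsWriter treats the threads calling addDocument, but please check that against the source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Hypothetical, simplified model of the consumer / PerThread / PerField pattern.
// None of these classes exist in Lucene; a "document" is a map of field name -> text.
public class IndexChainSketch {
  // Chain stage shared by all threads: hands out one PerThread object per indexing thread.
  interface FieldConsumer {
    FieldConsumerPerThread addThread();
  }

  // Used by exactly one thread, so its state needs no locking.
  interface FieldConsumerPerThread {
    FieldConsumerPerField addField(String fieldName);
  }

  // State for one field inside one thread, reused by every document that has this field.
  interface FieldConsumerPerField {
    void processField(String value);
  }

  // Terminal stage: sums measure(value) per field name, keeping a private map per thread.
  static class CountingConsumer implements FieldConsumer {
    private final ToIntFunction<String> measure;
    private final List<Map<String, Integer>> perThreadCounts = new ArrayList<>();

    CountingConsumer(ToIntFunction<String> measure) { this.measure = measure; }

    public synchronized FieldConsumerPerThread addThread() {
      Map<String, Integer> counts = new HashMap<>(); // touched only by the calling thread
      perThreadCounts.add(counts);
      return fieldName -> value -> counts.merge(fieldName, measure.applyAsInt(value), Integer::sum);
    }

    // Merge the per-thread results; call only after all indexing threads have finished.
    synchronized Map<String, Integer> totals() {
      Map<String, Integer> totals = new TreeMap<>();
      for (Map<String, Integer> counts : perThreadCounts) {
        counts.forEach((field, n) -> totals.merge(field, n, Integer::sum));
      }
      return totals;
    }
  }

  // Composite stage: forwards every field to two downstream consumers.
  static class FanOut implements FieldConsumer {
    private final FieldConsumer one, two;

    FanOut(FieldConsumer one, FieldConsumer two) { this.one = one; this.two = two; }

    public FieldConsumerPerThread addThread() {
      FieldConsumerPerThread t1 = one.addThread(), t2 = two.addThread();
      return fieldName -> {
        FieldConsumerPerField f1 = t1.addField(fieldName), f2 = t2.addField(fieldName);
        return value -> { f1.processField(value); f2.processField(value); };
      };
    }
  }

  // Head of the chain, one instance per thread: caches the PerField object for each field name.
  static class DocProcessorPerThread {
    private final FieldConsumerPerThread consumer;
    private final Map<String, FieldConsumerPerField> fields = new HashMap<>();

    DocProcessorPerThread(FieldConsumer chain) { this.consumer = chain.addThread(); }

    void processDocument(Map<String, String> doc) {
      for (Map.Entry<String, String> field : doc.entrySet()) {
        fields.computeIfAbsent(field.getKey(), consumer::addField).processField(field.getValue());
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    CountingConsumer tokens = new CountingConsumer(v -> v.split("\\s+").length);
    CountingConsumer chars = new CountingConsumer(String::length);
    FieldConsumer chain = new FanOut(tokens, chars);

    // Each thread indexes whole documents through its own PerThread chain.
    Runnable indexer = () -> {
      DocProcessorPerThread perThread = new DocProcessorPerThread(chain);
      for (int i = 0; i < 3; i++) {
        perThread.processDocument(Map.of("title", "index chain", "body", "per thread per field"));
      }
    };
    Thread a = new Thread(indexer), b = new Thread(indexer);
    a.start(); b.start();
    a.join(); b.join();

    System.out.println(tokens.totals()); // {body=24, title=12}
    System.out.println(chars.totals());  // {body=120, title=66}
  }
}
```

Because every PerThread object belongs to a single thread, the per-field work runs without locks; only addThread() and the final merge synchronize. Merging the private per-thread results at the end is loosely comparable to what happens when buffered documents are flushed to a segment, but treat that comparison as approximate.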
Re: difficulties for me to understand the index chain
Hi Li Li,

Thanks very much for your answer!

> To support multithreads indexing, PerThread class is used.

Multiple threads doing what? Does each thread process one file, one field, or something else?

Regards

2010/12/27 Li Li <fancye...@gmail.com>:
> I am also interested in this question. And my understanding may be wrong.