Re: List of files that Lucene 4.0 generates during indexing
The following files are created on the initial indexing:

_0.fdt _0.fdx _0.fnm _0.si _0_Lucene40_0.frq _0_Lucene40_0.prx _0_Lucene40_0.tim _0_Lucene40_0.tip _0_nrm.cfe _0_nrm.cfs index.v0008 segments.gen segments_1

But in one case, when I added a new document, several other files were generated in addition to the above:

_0.fdt _0.fdx _0.fnm _0.si _0_Lucene40_0.frq _0_Lucene40_0.prx _0_Lucene40_0.tim _0_Lucene40_0.tip _0_nrm.cfe _0_nrm.cfs
_2.fdx _2.fnm _2.si _2_Lucene40_0.frq _2_Lucene40_0.prx _2_Lucene40_0.tim _2_Lucene40_0.tip _2_nrm.cfe _2_nrm.cfs   // what is the significance of these _2 prefix files?
segments_3

Sometimes it creates the _2 prefix files in addition to incrementing the segments_N version. Could anyone please explain why the _2 prefix files are generated in the index directory in the first place, and what their significance is? I haven't seen them generated for other updates, so I would like to understand the concept behind them.

Thanks, Sai.

-- View this message in context: http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993p4037530.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
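When inspecting a directory listing like the one above, it can help to group the files by their leading `_N` prefix. In Lucene, each such prefix names one segment of the index, so a second prefix means a second segment has been written. The following is only a minimal sketch using the plain JDK (no Lucene classes); the class and method names are hypothetical, and the prefix parsing is an assumption based on the file names shown in this thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: groups index file names by segment prefix.
class SegmentFileGrouper {

    // Returns the segment prefix ("_0", "_2", ...) or null for files that
    // do not belong to a single segment (segments_N, segments.gen, ...).
    public static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = 1;
        // Assumption: the segment name is the run of letters/digits after the first underscore.
        while (end < fileName.length() && Character.isLetterOrDigit(fileName.charAt(end))) {
            end++;
        }
        return end > 1 ? fileName.substring(0, end) : null;
    }

    // Maps each segment prefix to the files carrying it, sorted by prefix.
    public static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<String, List<String>>();
        for (String name : fileNames) {
            String segment = segmentOf(name);
            if (segment == null) {
                continue; // index-level file, not tied to one segment
            }
            if (!groups.containsKey(segment)) {
                groups.put(segment, new ArrayList<String>());
            }
            groups.get(segment).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Pass the index directory as the first argument.
        String[] names = args.length > 0 ? new File(args[0]).list() : null;
        if (names == null) {
            System.err.println("usage: java SegmentFileGrouper <index-directory>");
            return;
        }
        for (Map.Entry<String, List<String>> e : groupBySegment(Arrays.asList(names)).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Run against the second listing above, this would report two groups, one for `_0` and one for `_2`, while `segments_3` is left out because it describes the index as a whole rather than one segment.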
Re: List of files that Lucene 4.0 generates during indexing
Hi saisantoshi,

Check out the documentation at http://lucene.apache.org/core/4_1_0/index.html, particularly the File Formats link under Reference Documents.

Steve

On Jan 24, 2013, at 11:41 AM, saisantoshi saisantosh...@gmail.com wrote:

> Is there any documentation on how many files Lucene generates during indexing (with 4.0), and what those files are? Once we migrate to 4.0, we will need to validate, by looking at the index directory, that the files that should be generated were actually created. This helps with debugging. Can someone post a link to the docs for the files 4.0 generates?
>
> Thanks, Sai.
Re: List of files that Lucene 4.0 generates during indexing
Thanks. I checked it out. Here is the list of files that were generated:

_0.fdt _0.fdx _0.fnm _0.si _0_Lucene40_0.frq _0_Lucene40_0.prx _0_Lucene40_0.tim _0_Lucene40_0.tip _0_nrm.cfe _0_nrm.cfs index.v0008 segments.gen segments_1

My question is: are these the right set of files? I would like to know whether there is a checklist of the files that must be present, or whether any files are missing from the list above. The docs describe a number of different file formats but do not explicitly mention which files are necessary for 4.0.

Thanks, Sai.
Re: List of files that Lucene 4.0 generates during indexing
That looks correct, except I don't know what index.v0008 is.

Mike McCandless
http://blog.mikemccandless.com
Re: List of files that Lucene 4.0 generates during indexing
Thanks Michael. The additional file in the list is just a typo.

One more question: we were using 2.4 before, and it only generated a few files:

_0.cfs _0.cfx // segment files

I am assuming that 2.4 has the compound index structure enabled by default. Do we need to set it explicitly in 4.0? Is it not enabled by default? 4.0 seems to be using the multifile index structure. Has the behavior changed in the latest version?

Thanks, Sai.
Re: List of files that Lucene 4.0 generates during indexing
4.0 has a hybrid approach by default: big segments (> 10% of index size, by default) are non-compound files, and small segments are compound files. See TieredMergePolicy.setNoCFSRatio if you want to always use the compound file format.

Mike McCandless
http://blog.mikemccandless.com
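The hybrid rule described above can be illustrated with a few lines of plain Java. This is a simplified model of the decision, not Lucene's actual source: the class and method names are hypothetical, and the treatment of the boundary case (a segment exactly at the ratio) is an assumption.

```java
// Hypothetical, simplified model of the compound-file decision; not Lucene code.
class CfsDecisionSketch {

    // A segment is written as a compound file when it is small relative to
    // the whole index; a ratio of 1.0 means "always use compound files".
    public static boolean usesCompoundFile(long segmentBytes, long totalIndexBytes, double noCFSRatio) {
        if (noCFSRatio >= 1.0) {
            return true;
        }
        return segmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long totalIndexBytes = 1000L * 1024 * 1024; // an index of 1000 MB
        double defaultRatio = 0.1;                   // the 10% default mentioned above

        long smallSegment = 20L * 1024 * 1024;  // 2% of the index
        long bigSegment = 400L * 1024 * 1024;   // 40% of the index

        System.out.println("small segment compound? "
                + usesCompoundFile(smallSegment, totalIndexBytes, defaultRatio));
        System.out.println("big segment compound?   "
                + usesCompoundFile(bigSegment, totalIndexBytes, defaultRatio));
        System.out.println("big segment, ratio 1.0? "
                + usesCompoundFile(bigSegment, totalIndexBytes, 1.0));
    }
}
```

Under this model, a freshly created index with a single segment is always "big" relative to itself, which is consistent with the separate per-extension files listed earlier in the thread.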
Re: List of files that Lucene 4.0 generates during indexing
Thanks. Are there any best practices to follow here, or should we leave the default (the hybrid approach you mentioned)?
Re: List of files that Lucene 4.0 generates during indexing
I would leave the default until/unless something goes wrong ...

Mike McCandless
http://blog.mikemccandless.com
Re: List of files that Lucene 4.0 generates during indexing
Thanks a lot. One last question: how do we set it? IndexWriter.???

Thanks, Ranjith.
Re: List of files that Lucene 4.0 generates during indexing
You get/set the merge policy on IndexWriterConfig (which you pass to IndexWriter). And then you can set this CFS ratio via that merge policy.

Mike McCandless
http://blog.mikemccandless.com
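The wiring described above might look roughly like the following sketch against the Lucene 4.0 API. It assumes lucene-core and lucene-analyzers-common 4.0 are on the classpath; the class name, the choice of StandardAnalyzer, and taking the index path from the command line are illustrative assumptions, not something prescribed in this thread.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Illustrative class name; requires Lucene 4.0 jars to compile and run.
class AlwaysCompoundFileExample {

    public static void main(String[] args) throws IOException {
        // Assumption: the index directory is given as the first argument.
        Directory dir = FSDirectory.open(new File(args[0]));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));

        // Set the CFS ratio on the merge policy, then hand the policy to the config.
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        mergePolicy.setNoCFSRatio(1.0); // 1.0 = always use the compound file format
        config.setMergePolicy(mergePolicy);

        // The config (and with it the merge policy) is passed to IndexWriter.
        IndexWriter writer = new IndexWriter(dir, config);
        try {
            // add or update documents here
            writer.commit();
        } finally {
            writer.close();
        }
    }
}
```

Leaving the merge policy untouched keeps the hybrid default, which is what was recommended earlier in the thread unless a problem shows up.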
Re: List of files that Lucene 4.0 generates during indexing
Thanks. Could you please comment on the following as well?

http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-td4035806.html

I really appreciate your help.

Thanks, Sai.