Re: List of files that Lucene 4.0 generates during indexing

2013-01-30 Thread saisantoshi
The following files were created on initial indexing:

_0.fdt
_0.fdx
_0.fnm
_0.si
_0_Lucene40_0.frq
_0_Lucene40_0.prx
_0_Lucene40_0.tim
_0_Lucene40_0.tip
_0_nrm.cfe
_0_nrm.cfs
index.v0008
segments.gen
segments_1


But when I added a new document, in one case several additional files were
generated besides the ones above:

_0.fdt
_0.fdx
_0.fnm
_0.si
_0_Lucene40_0.frq
_0_Lucene40_0.prx
_0_Lucene40_0.tim
_0_Lucene40_0.tip
_0_nrm.cfe
_0_nrm.cfs
_2.fdx  // what is the significance of these _2 prefix files?
_2.fnm
_2.si
_2_Lucene40_0.frq
_2_Lucene40_0.prx
_2_Lucene40_0.tim
_2_Lucene40_0.tip
_2_nrm.cfe
_2_nrm.cfs
segments_3


Sometimes it creates the _2 prefix files in addition to incrementing the
segments_N version. Could anyone please let me know why those files (the _2
prefix files in the index directory) are generated in the first place, and
what their significance is?

I haven't seen them generated for other updates, hence I would like to
understand the concept behind them.

Thanks,
Sai.
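
In Lucene's file naming, every file that belongs to one segment shares the
same leading prefix (_0, _2, and so on), so the _2 files in the listing above
belong to a different segment than the _0 files. A minimal sketch for grouping
a directory listing by that prefix follows. It uses only the JDK and does not
call Lucene; the class and method names are hypothetical, and the parsing rule
is an assumption based on the file names shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups index file names by the segment prefix (e.g. "_0", "_2") they start with. */
final class SegmentFileGrouper {

    /** Returns the segment prefix of a per-segment file, or null for any other file. */
    static String segmentPrefix(String fileName) {
        if (!fileName.startsWith("_")) {
            return null; // segments_N, segments.gen, and anything else
        }
        // Assumption: the prefix ends at the first '.' or the second '_', whichever comes first.
        int end = fileName.length();
        int dot = fileName.indexOf('.');
        int underscore = fileName.indexOf('_', 1);
        if (dot >= 0) {
            end = Math.min(end, dot);
        }
        if (underscore >= 0) {
            end = Math.min(end, underscore);
        }
        return fileName.substring(0, end);
    }

    /** Maps each segment prefix to its files; non-segment files go under "(index-level)". */
    static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<String, List<String>>();
        for (String name : fileNames) {
            String prefix = segmentPrefix(name);
            String key = (prefix == null) ? "(index-level)" : prefix;
            List<String> group = groups.get(key);
            if (group == null) {
                group = new ArrayList<String>();
                groups.put(key, group);
            }
            group.add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "_0.fdt", "_0.si", "_0_Lucene40_0.frq", "_0_nrm.cfs",
            "_2.fdx", "_2.si", "_2_Lucene40_0.tim", "_2_nrm.cfe", "segments_3");
        for (Map.Entry<String, List<String>> entry : groupBySegment(files).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```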





--
View this message in context: 
http://lucene.472066.n3.nabble.com/List-of-files-that-Lucene-4-0-generates-during-indexing-tp4035993p4037530.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Steve Rowe
Hi saisantoshi,

Check out the documentation: http://lucene.apache.org/core/4_1_0/index.html - 
particularly the File Formats link under Reference Documents.

Steve

On Jan 24, 2013, at 11:41 AM, saisantoshi saisantosh...@gmail.com wrote:

 Is there any doc on how many files Lucene generates during indexing
 (with 4.0) and what those files are? Once we migrate to 4.0, we would need
 to validate, by looking at the index directory, that the files that need to
 be generated were created in the first place. It helps for debugging purposes.
 Can someone post a link to the doc for 4.0 generated files?
 
 Thanks,
 Sai.
 
 
 



Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks. I checked it out.

Here is the list of files that have been generated:

_0.fdt
_0.fdx
_0.fnm
_0.si
_0_Lucene40_0.frq
_0_Lucene40_0.prx
_0_Lucene40_0.tim
_0_Lucene40_0.tip
_0_nrm.cfe
_0_nrm.cfs
index.v0008
segments.gen
segments_1

My question is: are the above files the right set of files that need to be
generated? I just want to know whether there is a checklist to verify which
files must be there, or whether any files are missing from the above. I am
looking to validate the above. The docs describe a number of different file
formats but do not explicitly mention the necessary files for 4.0.

Thanks,
Sai.






Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Michael McCandless
That looks correct, except I don't know what index.v0008 is.

Mike McCandless

http://blog.mikemccandless.com
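
A quick way to spot a stray entry such as index.v0008 is to match each file
name against the naming patterns visible in this thread. The sketch below is
only a name check under stated assumptions: the regular expressions are
inferred from the listing above plus the write.lock file, the class and method
names are hypothetical, and it does not verify that an index is complete or
uncorrupted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Flags file names that do not look like the Lucene 4.0 index files listed in this thread. */
final class IndexFileNameCheck {

    // Assumed shape of per-segment files: "_" + segment name, optional "_suffix", then ".ext".
    private static final Pattern SEGMENT_FILE =
        Pattern.compile("_[0-9a-z]+(_[0-9A-Za-z_]+)?\\.[0-9a-z]+");

    // Assumed index-level files: segments_N, segments.gen, and the write lock.
    private static final Pattern INDEX_LEVEL_FILE =
        Pattern.compile("segments_[0-9a-z]+|segments\\.gen|write\\.lock");

    /** Returns the names that match neither pattern, in their original order. */
    static List<String> unrecognized(List<String> fileNames) {
        List<String> unknown = new ArrayList<String>();
        for (String name : fileNames) {
            boolean known = SEGMENT_FILE.matcher(name).matches()
                || INDEX_LEVEL_FILE.matcher(name).matches();
            if (!known) {
                unknown.add(name);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "_0.fdt", "_0.fdx", "_0.fnm", "_0.si",
            "_0_Lucene40_0.frq", "_0_Lucene40_0.prx",
            "_0_Lucene40_0.tim", "_0_Lucene40_0.tip",
            "_0_nrm.cfe", "_0_nrm.cfs",
            "index.v0008", "segments.gen", "segments_1");
        System.out.println("Unrecognized: " + unrecognized(files));
    }
}
```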






Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks Michael. The additional file in the list is just a typo.

One more question: we were using 2.4 before, and it generated only a few
files:

_0.cfs
_0.cfx
// segment files

I am assuming that the 2.4 version has the compound index structure enabled
by default. Do we need to set it explicitly with the 4.0 version? Is it not
enabled by default? 4.0 seems to be using the multifile index structure. Has
the behavior changed in the latest version?

Thanks,
Sai.






Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Michael McCandless
4.0 has a hybrid approach by default: big segments (> 10% of index size,
by default) are non-compound-files and small segments are compound files.

See TieredMergePolicy.setNoCFSRatio if you want to always use compound file
format.

Mike McCandless

http://blog.mikemccandless.com
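
The hybrid rule described above can be modelled as a simple size comparison.
The following is a simplified sketch of that rule, not Lucene's own code: the
class, method, and parameter names are hypothetical, and sizes are assumed to
be in bytes. A ratio of 0.1 corresponds to the 10% default mentioned above.

```java
/** Simplified model of the compound-file decision controlled by the no-CFS ratio. */
final class CfsRatioSketch {

    /**
     * Returns true if a segment of the given size would be written as a compound file.
     * Assumption: a ratio of 1.0 or more means "always compound"; otherwise the segment
     * stays compound only while it is at most noCFSRatio times the total index size.
     */
    static boolean useCompoundFile(long segmentBytes, long totalIndexBytes, double noCFSRatio) {
        if (noCFSRatio >= 1.0) {
            return true;
        }
        return segmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long total = 1000L * 1024 * 1024; // hypothetical 1000 MB index
        long small = 20L * 1024 * 1024;   // 2% of the index
        long big = 400L * 1024 * 1024;    // 40% of the index
        System.out.println("small segment compound? " + useCompoundFile(small, total, 0.1));
        System.out.println("big segment compound?   " + useCompoundFile(big, total, 0.1));
        System.out.println("big segment, ratio 1.0? " + useCompoundFile(big, total, 1.0));
    }
}
```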






Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks. Are there any best practices to follow here, or should we leave the
default (which is the hybrid approach, as you mentioned)?






Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Michael McCandless
I would leave the default until/unless something goes wrong ...

Mike McCandless

http://blog.mikemccandless.com





Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks a lot. One last question: how do we set it? IndexWriter.???

Thanks,
Ranjith.






Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Michael McCandless
You get/set the merge policy on IndexWriterConfig (which you pass to
IndexWriter).

And then you can set this CFS ratio via that merge policy.

Mike McCandless

http://blog.mikemccandless.com
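
Put together, the steps above might look roughly like the following
configuration sketch for Lucene 4.0. It assumes lucene-core and
lucene-analyzers-common 4.0 are on the classpath; the class name, the index
path, and the choice of StandardAnalyzer are illustrative assumptions, and the
ratio of 1.0 is only one possible setting (the one for always using compound
files). Check the TieredMergePolicy javadoc for your exact version before
relying on it.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/** Sketch: configure the CFS ratio through the merge policy on IndexWriterConfig. */
final class AlwaysCompoundFiles {

    public static void main(String[] args) throws IOException {
        // Hypothetical index location; replace with your own directory.
        Directory dir = FSDirectory.open(new File("/tmp/example-index"));

        IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));

        // The CFS ratio lives on the merge policy, not on IndexWriter itself.
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        mergePolicy.setNoCFSRatio(1.0); // 1.0 = always use the compound file format
        config.setMergePolicy(mergePolicy);

        IndexWriter writer = new IndexWriter(dir, config);
        try {
            // Add or update documents here.
        } finally {
            writer.close();
        }
    }
}
```

As suggested earlier in the thread, leaving the default in place is a
reasonable starting point; a change like this is only needed if the hybrid
behavior causes a problem.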






Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks. Could you please also comment on the following as well?

http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-td4035806.html

Thanks and really appreciate your help.

Thanks,
Sai.


