Re: MergerFactor and MaxMergerDocs effecting num of segments created

2011-07-05 Thread Shawn Heisey

On 7/4/2011 12:51 AM, Romi wrote:

Shawn, when I reindex data using full-import I get:
_0.fdt  3310
_0.fdx  23
_0.frq  857
_0.nrm  31
_0.prx  1748
_0.tis  350
_1.fdt  3310
_1.fdx  23
_1.fnm  1
_1.frq  857
_1.nrm  31
_1.prx  1748
_1.tii  5
_1.tis  350
segments.gen  1
segments_3  1

where all the _1 files are marked as archived (A).

And when I run full-import again (for testing) I get _1 and _2 files, where
all the _2 files are marked as archived. What does that mean?
What I do not understand is this: a full import deletes the old index and
creates a new one, so why am I still getting the old files
again?


By mentioning the Archive bit, it sounds like you are running on 
Windows.  I've only run it on Linux, but I understand from reading 
messages on this list that there are a lot of problems on Windows with 
deleting old files whenever you do anything that results in old segments 
going away -- reindex, optimize, replication, normal segment merging, 
etc.  The current Solr version is 3.3; the previous versions are 3.2, 3.1, 
and then 1.4.1.  Others will have to comment about whether things have 
improved in more recent releases.


The archive bit is simply a DOS/Windows attribute that says this file 
needs to be backed up.  When you create or modify a file in a normal 
way, it is turned on.  Normally the only thing that turns that bit off 
is backup software, but Solr might be programmed to clear it on files 
that are no longer needed, in case the delete fails, so there's a way to 
detect that they should not be backed up.  I don't know if this is 
right, it's just speculation.
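
If you want to check the archive bit from a script rather than from Explorer, something like the following Python sketch might help. It is only an illustration: the helper names (has_archive_bit, archive_status) are made up for this example, the flag value 0x20 is the Windows FILE_ATTRIBUTE_ARCHIVE constant, and the st_file_attributes field is only present when Python runs on Windows, so on other platforms the sketch reports None.

```python
import os

# Windows FILE_ATTRIBUTE_ARCHIVE flag value (0x20), defined here so the
# sketch does not depend on any platform-specific constant being present.
FILE_ATTRIBUTE_ARCHIVE = 0x20


def has_archive_bit(attributes):
    """Return True if the archive flag is set in a Windows attribute mask."""
    return bool(attributes & FILE_ATTRIBUTE_ARCHIVE)


def archive_status(index_dir):
    """Map each file name in index_dir to True/False for the archive bit.

    The value is None when the platform does not expose Windows file
    attributes (for example on Linux).
    """
    status = {}
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        # st_file_attributes exists only on Windows, hence the getattr fallback.
        attrs = getattr(os.stat(path), "st_file_attributes", None)
        status[name] = None if attrs is None else has_archive_bit(attrs)
    return status
```

Calling archive_status on the Solr index directory (the path depends on your installation) would list which segment files currently carry the attribute. It does not say anything about why the bit is set or cleared.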


Thanks,
Shawn



Re: MergerFactor and MaxMergerDocs effecting num of segments created

2011-07-04 Thread Romi
Shawn, when I reindex data using full-import I get:
_0.fdt  3310
_0.fdx  23
_0.frq  857
_0.nrm  31
_0.prx  1748
_0.tis  350
_1.fdt  3310
_1.fdx  23
_1.fnm  1
_1.frq  857
_1.nrm  31
_1.prx  1748
_1.tii  5
_1.tis  350
segments.gen  1
segments_3  1

where all the _1 files are marked as archived (A).

And when I run full-import again (for testing) I get _1 and _2 files, where
all the _2 files are marked as archived. What does that mean?
What I do not understand is this: a full import deletes the old index and
creates a new one, so why am I still getting the old files
again?




-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/MergerFactor-and-MaxMergerDocs-effecting-num-of-segments-created-tp3128897p3136664.html
Sent from the Solr - User mailing list archive at Nabble.com.


MergerFactor and MaxMergerDocs effecting num of segments created

2011-07-01 Thread Romi
These are my index files. I want to see the effect of mergeFactor and
maxMergeDocs on this index. How can I do that?

_0.fdt  3310 KB
_0.fdx  23 KB
_0.fnm  1 KB
_0.frq  857 KB
_0.nrm  31 KB
_0.prx  1748 KB
_0.tii  5 KB
_0.tis  350 KB

I mean, what test cases for mergeFactor and maxMergeDocs can I run to see the
effect on the index files? My current configuration is:

<mergeFactor>2</mergeFactor>
<maxMergeDocs>10</maxMergeDocs>



-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/MergerFactor-and-MaxMergerDocs-effecting-num-of-segments-created-tp3128897p3128897.html


Re: MergerFactor and MaxMergerDocs effecting num of segments created

2011-07-01 Thread Shawn Heisey

On 7/1/2011 4:43 AM, Romi wrote:

These are my index files. I want to see the effect of mergeFactor and
maxMergeDocs on this index. How can I do that?

_0.fdt  3310 KB
_0.fdx  23 KB
_0.fnm  1 KB
_0.frq  857 KB
_0.nrm  31 KB
_0.prx  1748 KB
_0.tii  5 KB
_0.tis  350 KB

I mean, what test cases for mergeFactor and maxMergeDocs can I run to see the
effect on the index files? My current configuration is:

<mergeFactor>2</mergeFactor>
<maxMergeDocs>10</maxMergeDocs>


That is a single index segment, and as it's the initial segment (_0), no 
optimization or merging has taken place.  Further segments would have 
the same file extensions with prefixes like _1, _2, etc.  Once you 
reached _z, the next segment would be _10.
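
The jump from _z to _10 happens because segment names are a counter written in base 36 (digits 0-9 followed by a-z). Here is a small Python sketch of that numbering. The function names are invented for this illustration, and it only models the naming scheme, not anything else Lucene does.

```python
import string

# 0-9 followed by a-z gives the 36 symbols used in segment names.
DIGITS = string.digits + string.ascii_lowercase


def segment_name(counter):
    """Return the segment prefix for a zero-based segment counter."""
    if counter < 0:
        raise ValueError("counter must be non-negative")
    if counter == 0:
        return "_0"
    symbols = []
    while counter:
        counter, remainder = divmod(counter, 36)
        symbols.append(DIGITS[remainder])
    return "_" + "".join(reversed(symbols))


def segment_counter(name):
    """Inverse of segment_name: '_10' -> 36."""
    if not name.startswith("_"):
        raise ValueError("segment names start with an underscore")
    return int(name[1:], 36)
```

With this, segment_name(35) gives "_z" and segment_name(36) gives "_10", so a prefix like _lf corresponds to counter value 771.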


Your index is very small, so small that it only needs one segment when 
it is built all at once.  If you were to add new documents to the index 
(rather than do a full reindex), those new documents would go into a new 
segment.  If you continue to add segments in this way, this is when 
mergeFactor comes into play -- when the number of original segments 
reaches this value, they are merged into a single larger segment.  When 
this continues and you have enough merged segments, they are merged into 
an even larger segment.  I believe that a mergeFactor of 2 is special, 
designed to keep a large starting segment untouched while merging all 
the rest, but I have not confirmed that myself.
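
To make the cascading idea concrete, here is a deliberately simplified Python model of it. This is not Lucene's actual merge policy, which works on segment sizes and is affected by other settings. It only counts segments per "level" and merges whenever a level holds mergeFactor segments. The names add_segment and simulate are made up for this sketch.

```python
def add_segment(levels, merge_factor):
    """Add one freshly flushed segment and cascade merges.

    levels[i] is the number of segments at merge level i, where level 0
    holds unmerged segments and level i+1 holds segments produced by
    merging merge_factor segments of level i.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    if not levels:
        levels.append(0)
    levels[0] += 1
    i = 0
    # When a level fills up, its segments collapse into one segment
    # at the next level, which may in turn fill up that level.
    while levels[i] >= merge_factor:
        levels[i] -= merge_factor
        if i + 1 == len(levels):
            levels.append(0)
        levels[i + 1] += 1
        i += 1


def simulate(flushes, merge_factor):
    """Return the per-level segment counts after the given number of flushes."""
    levels = []
    for _ in range(flushes):
        add_segment(levels, merge_factor)
    return levels


if __name__ == "__main__":
    for flushes in (9, 10, 25):
        print(flushes, simulate(flushes, 10))
```

In this toy model a mergeFactor of 10 leaves nine small segments after nine flushes, one merged segment after ten, and two merged plus five small segments after twenty-five. A real index will not match these counts exactly, but the shape of the behavior is the same: a lower mergeFactor means fewer segments on disk and more merge work.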


I don't know why maxMergeDocs is not taking effect.  It could be that 
during initial indexing, other factors (like ramBufferSizeMB) are 
involved, and maxMergeDocs only takes effect when merging existing segments.


For comparison purposes, here are the first three segments from one of 
my indexes:


-rw-r--r-- 1 ncindex ncindex 6323043528 Jun 30 00:57 _lf.fdt
-rw-r--r-- 1 ncindex ncindex   75766484 Jun 30 00:57 _lf.fdx
-rw-r--r-- 1 ncindex ncindex        382 Jun 30 00:55 _lf.fnm
-rw-r--r-- 1 ncindex ncindex 2833619259 Jun 30 01:04 _lf.frq
-rw-r--r-- 1 ncindex ncindex   28412434 Jun 30 01:05 _lf.nrm
-rw-r--r-- 1 ncindex ncindex    1183860 Jun 30 15:41 _lf_o.del
-rw-r--r-- 1 ncindex ncindex 2455819068 Jun 30 01:04 _lf.prx
-rw-r--r-- 1 ncindex ncindex   23759599 Jun 30 01:04 _lf.tii
-rw-r--r-- 1 ncindex ncindex  926422435 Jun 30 01:04 _lf.tis
-rw-r--r-- 1 ncindex ncindex   18940740 Jun 30 01:06 _lf.tvd
-rw-r--r-- 1 ncindex ncindex 5883186438 Jun 30 01:06 _lf.tvf
-rw-r--r-- 1 ncindex ncindex  151532964 Jun 30 01:06 _lf.tvx
-rw-r--r-- 1 ncindex ncindex  868769283 Jul  1 09:07 _mf.fdt
-rw-r--r-- 1 ncindex ncindex   11279356 Jul  1 09:07 _mf.fdx
-rw-r--r-- 1 ncindex ncindex        372 Jul  1 09:06 _mf.fnm
-rw-r--r-- 1 ncindex ncindex  347906214 Jul  1 09:08 _mf.frq
-rw-r--r-- 1 ncindex ncindex    4229761 Jul  1 09:08 _mf.nrm
-rw-r--r-- 1 ncindex ncindex  284701250 Jul  1 09:08 _mf.prx
-rw-r--r-- 1 ncindex ncindex     960052 Jul  1 09:08 _mf.tii
-rw-r--r-- 1 ncindex ncindex  141775812 Jul  1 09:08 _mf.tis
-rw-r--r-- 1 ncindex ncindex    2818958 Jul  1 09:08 _mf.tvd
-rw-r--r-- 1 ncindex ncindex  735319599 Jul  1 09:08 _mf.tvf
-rw-r--r-- 1 ncindex ncindex   22558708 Jul  1 09:08 _mf.tvx
-rw-r--r-- 1 ncindex ncindex   30888748 Jul  1 09:07 _mg.fdt
-rw-r--r-- 1 ncindex ncindex     385700 Jul  1 09:07 _mg.fdx
-rw-r--r-- 1 ncindex ncindex        372 Jul  1 09:07 _mg.fnm
-rw-r--r-- 1 ncindex ncindex   13709508 Jul  1 09:07 _mg.frq
-rw-r--r-- 1 ncindex ncindex     144640 Jul  1 09:07 _mg.nrm
-rw-r--r-- 1 ncindex ncindex   12683152 Jul  1 09:07 _mg.prx
-rw-r--r-- 1 ncindex ncindex      51848 Jul  1 09:07 _mg.tii
-rw-r--r-- 1 ncindex ncindex    7409698 Jul  1 09:07 _mg.tis
-rw-r--r-- 1 ncindex ncindex      96428 Jul  1 09:07 _mg.tvd
-rw-r--r-- 1 ncindex ncindex   31790084 Jul  1 09:07 _mg.tvf
-rw-r--r-- 1 ncindex ncindex     771396 Jul  1 09:07 _mg.tvx
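
If you want to compare segment sizes in a listing like this without adding them up by hand, a short Python sketch along these lines could total the bytes per segment. It assumes "ls -l" style lines where the fifth field is the size and the last field is the file name. The function names are hypothetical, and the sample lines are a few entries taken from the listing above.

```python
from collections import defaultdict


def segment_of(file_name):
    """Return the segment prefix of an index file name.

    '_mg.fdt' -> '_mg', and a deletes file such as '_lf_o.del' -> '_lf'.
    """
    stem = file_name.split(".")[0]
    return "_" + stem[1:].split("_")[0]


def bytes_per_segment(listing):
    """Sum file sizes per segment from 'ls -l' style text."""
    totals = defaultdict(int)
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and files that are not segment files.
        if len(fields) < 9 or not fields[-1].startswith("_"):
            continue
        totals[segment_of(fields[-1])] += int(fields[4])
    return dict(totals)


SAMPLE = """\
-rw-r--r-- 1 ncindex ncindex        382 Jun 30 00:55 _lf.fnm
-rw-r--r-- 1 ncindex ncindex    1183860 Jun 30 15:41 _lf_o.del
-rw-r--r-- 1 ncindex ncindex        372 Jul  1 09:07 _mg.fnm
-rw-r--r-- 1 ncindex ncindex      51848 Jul  1 09:07 _mg.tii
"""

if __name__ == "__main__":
    for segment, size in sorted(bytes_per_segment(SAMPLE).items()):
        print(segment, size)
```

Feeding it the complete output of "ls -l" from the index directory would show at a glance how much larger the old merged segment is than the newer ones.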

Shawn