Re: unstable results on refresh

2014-10-23 Thread Giovanni Bricconi
My user interface shows some boxes that describe result categories. After
half a day of small updates and deletes I noticed, with various queries,
that the boxes started swapping while browsing.
I clearly relied too much on getting the same results on each call; for now
I'm carrying the category order in request parameters to avoid the blinking
effect while browsing.

The optimize process is really slow, so I can't use it. Since I already have
many other parameters that must be carried along with the request to keep
the navigation consistent, I would like to understand whether there is a
setup that can limit the idf variation and keep it low enough.

I tried with

<indexConfig>
  <mergeFactor>5</mergeFactor>
</indexConfig>

in solrconfig.xml, but this morning /solr/admin/cores?action=STATUS still
reports more than ten segments for every core of the shard. (I'm sure I
reloaded each core after changing the value.)

Now I'm trying expungeDeletes called from SolrJ, but I still don't see the
segment count decrease:

import org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION;
import org.apache.solr.client.solrj.request.UpdateRequest;

UpdateRequest commitRequest = new UpdateRequest();
// setAction(action, waitFlush, waitSearcher, maxSegments, softCommit, expungeDeletes)
commitRequest.setAction(ACTION.COMMIT, true, true, 10, false, true);
commitRequest.process(solrServer);
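
To check whether the call has any effect, I also read the segment count
back from SolrJ. A minimal sketch, assuming the 4.x CoreAdmin STATUS
response layout (a per-core "index" section carrying a "segmentCount"
entry) and a made-up core name:

import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;
import org.apache.solr.common.util.NamedList;

// Same data reported by /solr/admin/cores?action=STATUS
CoreAdminResponse status = CoreAdminRequest.getStatus("mycore", solrServer);
NamedList<Object> indexInfo =
    (NamedList<Object>) status.getCoreStatus().get("mycore").get("index");
System.out.println("segments: " + indexInfo.get("segmentCount"));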



2014-10-22 15:48 GMT+02:00 Erick Erickson erickerick...@gmail.com:

 I would rather ask whether such small differences matter enough to
 do this. Is this something users will _ever_ notice? Optimization
 is quite a heavyweight operation and is generally not recommended
 on indexes that change often, and an index updated every 5 minutes
 is certainly changing too often for optimizing to be recommended.

 There is/has been work done on distributed IDF that should address
 this (I think), but I don't know its current status.

 But other than in a test setup, is it worth the effort?

 Best,
 Erick

 On Wed, Oct 22, 2014 at 3:54 AM, Giovanni Bricconi
 giovanni.bricc...@banzai.it wrote:
  I have made some small patches to the application to make this problem
  less visible, and I'm trying to perform the optimize once per hour;
  yesterday it took 5 minutes, this morning 15 minutes. Today I will
  collect some statistics, but the publication process sends documents
  every 5 minutes, and I think the optimize is taking too much time.

  I have no mergeFactor configured for this collection; do you think that
  setting it to a small value could improve the situation? If I have
  understood correctly, having to merge segments will keep similar stats
  on all nodes. It's OK for the indexing process to be a little slower.
 
 
  2014-10-21 18:44 GMT+02:00 Erick Erickson erickerick...@gmail.com:
 
  Giovanni:
 
  To see how this happens, consider a shard with a leader and two
  followers. Assume your autocommit interval is 60 seconds on each.
 
  This interval can expire at slightly different wall clock times.
  Even if the servers started perfectly in sync, they can get slightly
  out of sync. So, you index a bunch of docs and these replicas close
  the current segment and re-open a new segment with slightly different
  contents.
 
  Now docs come in that replace older docs. The tf/idf statistics
  _include_ deleted document data (which is purged on optimize). Given
  that doc X can be in different segments (or, more accurately, segments
  that get merged at different times on different machines), replica 1
  may have slightly different stats than replica 2, thus computing
  slightly different scores.
 
  Optimizing purges all data related to deleted documents, so it all
  regularizes itself on optimize.
 
  Best,
  Erick
 
  On Tue, Oct 21, 2014 at 11:08 AM, Giovanni Bricconi
  giovanni.bricc...@banzai.it wrote:
    I noticed the problem again, and this time I was able to collect some
    data. In my paste http://pastebin.com/nVwf327c you can see the result
    of the same query issued twice; the 2nd and 3rd groups are swapped.
  
   I pasted also the clusterstate and the core state for each core.
  
    The logs didn't show any problems related to indexing, only some
    malformed queries.
  
   After doing an optimize the problem disappeared.
  
    So, is the problem related to documents that were deleted from the
    index?
  
   The optimization took 5 minutes to complete
  
   2014-10-21 11:41 GMT+02:00 Giovanni Bricconi 
  giovanni.bricc...@banzai.it:
  
    Nice!
    I will monitor the index and try this if the problem comes back.
    Actually the problem was due to small differences in score, so I
    think the problem has the same origin.
  
   2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com:
  
   Hi Giovanni,
  
    We had this problem as well.
    The cause was that the different nodes had slightly different idf
    values.

    We solved it by running an optimize operation, which really removes
    the deleted data.
  
   Ludovic.
  
  
  
   -
   Jouve
   France.

Re: unstable results on refresh

2014-10-23 Thread Shawn Heisey
On 10/23/2014 2:44 AM, Giovanni Bricconi wrote:
 My user interface shows some boxes that describe result categories. After
 half a day of small updates and deletes I noticed, with various queries,
 that the boxes started swapping while browsing.
 I clearly relied too much on getting the same results on each call; for now
 I'm carrying the category order in request parameters to avoid the blinking
 effect while browsing.

 The optimize process is really slow, so I can't use it. Since I already
 have many other parameters that must be carried along with the request to
 keep the navigation consistent, I would like to understand whether there is
 a setup that can limit the idf variation and keep it low enough.

 I tried with

 <indexConfig>
   <mergeFactor>5</mergeFactor>
 </indexConfig>

 in solrconfig.xml, but this morning /solr/admin/cores?action=STATUS still
 reports more than ten segments for every core of the shard. (I'm sure I
 reloaded each core after changing the value.)

 Now I'm trying expungeDeletes called from SolrJ, but I still don't see the
 segment count decrease.

It's completely normal to have more segments than the mergeFactor.
Think about this scenario with a mergeFactor of 5:

You index five segments.  They get merged to one segment.  Let's say
that this happens a total of four times, so you've indexed a total of 20
segments and merging has reduced that to four larger segments.  Let's
say that you now index four more segments.  You'll be completely stable
with eight segments.  If you index another one, that will result in a
fifth larger segment.  This sets conditions up just right for another
merge -- to one even larger segment.  This represents three levels of
merging, and there can be even more levels, each of which can have four
segments and remain stable.  Starting at the last state I described, if
you then indexed 24 more segments, you'd have a stable index with a
total of nine segments - four of them would be normal sized, four of
them would be about five times normal size, and the first one would be
about 25 times normal size.
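
If it helps, here is that arithmetic as a toy simulation (a sketch that
assumes the old sequential mergeFactor behavior, not TieredMergePolicy):

public class MergeToy {
    public static void main(String[] args) {
        int mergeFactor = 5;
        int[] levels = new int[10];  // live segments at each merge level
        for (int flush = 1; flush <= 49; flush++) {
            levels[0]++;  // each flush adds one small segment
            // when a level fills up, mergeFactor segments there merge
            // into one bigger segment at the next level
            for (int lvl = 0; levels[lvl] == mergeFactor; lvl++) {
                levels[lvl] = 0;
                levels[lvl + 1]++;
            }
            int total = 0;
            for (int n : levels) total += n;
            System.out.println("flushes=" + flush + " segments=" + total);
        }
        // The final line prints flushes=49 segments=9: the stable state
        // described above (4 small + 4 medium + 1 large segment).
    }
}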

The Solr default for the merge policy in all recent versions is
TieredMergePolicy, and this can make things slightly more complicated
than I've described, because it can merge *any* segments, not just those
indexed sequentially, and I believe that it can delay merging until the
right number of segments with suitable characteristics appear.
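
For reference, the TieredMergePolicy knobs that replace mergeFactor look
roughly like this in solrconfig.xml (element and parameter names as in the
4.x example configs; verify against your version):

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- how many segments merge at once, and how many a tier may hold -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>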

I've got merge settings equivalent to a mergeFactor of 35, but I
regularly see the segment count approach 100, and there's absolutely
nothing wrong with my merging.

If I understand it correctly, expungeDeletes will not decrease the
segment count.  It will simply rewrite segments that have deleted
documents so there are none.  I'm not 100% sure that I know exactly what
expungeDeletes does, though.

Thanks,
Shawn



Re: unstable results on refresh

2014-10-22 Thread Giovanni Bricconi
I have made some small patches to the application to make this problem less
visible, and I'm trying to perform the optimize once per hour; yesterday it
took 5 minutes, this morning 15 minutes. Today I will collect some
statistics, but the publication process sends documents every 5 minutes, and
I think the optimize is taking too much time.

I have no mergeFactor configured for this collection; do you think that
setting it to a small value could improve the situation? If I have
understood correctly, having to merge segments will keep similar stats on
all nodes. It's OK for the indexing process to be a little slower.


2014-10-21 18:44 GMT+02:00 Erick Erickson erickerick...@gmail.com:

 Giovanni:

 To see how this happens, consider a shard with a leader and two
 followers. Assume your autocommit interval is 60 seconds on each.

 This interval can expire at slightly different wall clock times.
 Even if the servers started perfectly in sync, they can get slightly
 out of sync. So, you index a bunch of docs and these replicas close
 the current segment and re-open a new segment with slightly different
 contents.

 Now docs come in that replace older docs. The tf/idf statistics
 _include_ deleted document data (which is purged on optimize). Given
 that doc X can be in different segments (or, more accurately, segments
 that get merged at different times on different machines), replica 1
 may have slightly different stats than replica 2, thus computing
 slightly different scores.

 Optimizing purges all data related to deleted documents, so it all
 regularizes itself on optimize.

 Best,
 Erick

 On Tue, Oct 21, 2014 at 11:08 AM, Giovanni Bricconi
 giovanni.bricc...@banzai.it wrote:
  I noticed the problem again, and this time I was able to collect some
  data. In my paste http://pastebin.com/nVwf327c you can see the result of
  the same query issued twice; the 2nd and 3rd groups are swapped.
 
  I pasted also the clusterstate and the core state for each core.
 
  The logs didn't show any problems related to indexing, only some
  malformed queries.
 
  After doing an optimize the problem disappeared.
 
  So, is the problem related to documents that were deleted from the
  index?
 
  The optimization took 5 minutes to complete
 
  2014-10-21 11:41 GMT+02:00 Giovanni Bricconi 
 giovanni.bricc...@banzai.it:
 
  Nice!
  I will monitor the index and try this if the problem comes back.
  Actually the problem was due to small differences in score, so I think
  the problem has the same origin.
 
  2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com:
 
  Hi Giovanni,
 
   We had this problem as well.
   The cause was that the different nodes had slightly different idf
   values.

   We solved it by running an optimize operation, which really removes
   the deleted data.
 
  Ludovic.
 
 
 
  -
  Jouve
  France.
 
 
 



Re: unstable results on refresh

2014-10-22 Thread Erick Erickson
I would rather ask whether such small differences matter enough to
do this. Is this something users will _ever_ notice? Optimization
is quite a heavyweight operation and is generally not recommended
on indexes that change often, and an index updated every 5 minutes
is certainly changing too often for optimizing to be recommended.

There is/has been work done on distributed IDF that should address
this (I think), but I don't know its current status.

But other than in a test setup, is it worth the effort?
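
(For reference: the distributed IDF work is tracked in SOLR-1632. If and
when it lands, enabling global stats should look something like the line
below in solrconfig.xml; treat the class name as tentative until it ships:

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
)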

Best,
Erick

On Wed, Oct 22, 2014 at 3:54 AM, Giovanni Bricconi
giovanni.bricc...@banzai.it wrote:
 I have made some small patches to the application to make this problem less
 visible, and I'm trying to perform the optimize once per hour; yesterday it
 took 5 minutes, this morning 15 minutes. Today I will collect some
 statistics, but the publication process sends documents every 5 minutes,
 and I think the optimize is taking too much time.

 I have no mergeFactor configured for this collection; do you think that
 setting it to a small value could improve the situation? If I have
 understood correctly, having to merge segments will keep similar stats on
 all nodes. It's OK for the indexing process to be a little slower.


 2014-10-21 18:44 GMT+02:00 Erick Erickson erickerick...@gmail.com:

 Giovanni:

 To see how this happens, consider a shard with a leader and two
 followers. Assume your autocommit interval is 60 seconds on each.

 This interval can expire at slightly different wall clock times.
 Even if the servers started perfectly in sync, they can get slightly
 out of sync. So, you index a bunch of docs and these replicas close
 the current segment and re-open a new segment with slightly different
 contents.

 Now docs come in that replace older docs. The tf/idf statistics
 _include_ deleted document data (which is purged on optimize). Given
 that doc X can be in different segments (or, more accurately, segments
 that get merged at different times on different machines), replica 1
 may have slightly different stats than replica 2, thus computing
 slightly different scores.

 Optimizing purges all data related to deleted documents, so it all
 regularizes itself on optimize.

 Best,
 Erick

 On Tue, Oct 21, 2014 at 11:08 AM, Giovanni Bricconi
 giovanni.bricc...@banzai.it wrote:
   I noticed the problem again, and this time I was able to collect some
   data. In my paste http://pastebin.com/nVwf327c you can see the result
   of the same query issued twice; the 2nd and 3rd groups are swapped.
 
  I pasted also the clusterstate and the core state for each core.
 
   The logs didn't show any problems related to indexing, only some
   malformed queries.
 
  After doing an optimize the problem disappeared.
 
   So, is the problem related to documents that were deleted from the
   index?
 
  The optimization took 5 minutes to complete
 
  2014-10-21 11:41 GMT+02:00 Giovanni Bricconi 
 giovanni.bricc...@banzai.it:
 
  Nice!
  I will monitor the index and try this if the problem comes back.
  Actually the problem was due to small differences in score, so I think
  the problem has the same origin.
 
  2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com:
 
  Hi Giovanni,
 
   We had this problem as well.
   The cause was that the different nodes had slightly different idf
   values.

   We solved it by running an optimize operation, which really removes
   the deleted data.
 
  Ludovic.
 
 
 
  -
  Jouve
  France.
 
 
 



Re: unstable results on refresh

2014-10-21 Thread lboutros
Hi Giovanni,

We had this problem as well. The cause was that the different nodes had
slightly different idf values.

We solved it by running an optimize operation, which really removes the
deleted data.
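
From SolrJ that optimize is a one-liner; a sketch, assuming an
already-configured SolrServer instance:

// waitFlush, waitSearcher, maxSegments
solrServer.optimize(true, true, 1);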

Ludovic.



-
Jouve
France.


Re: unstable results on refresh

2014-10-21 Thread Giovanni Bricconi
I noticed the problem looking at a grouped query: the returned groups were
sorted on the score of their first result and then shown to the user.
Repeating the same query, I noticed that the order of two groups started
switching.
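
(The query is roughly of this shape; the field names here are made up:

/select?q=something&group=true&group.field=category&group.limit=1&sort=score+desc
)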

Thank you, I will look for the thread you mentioned.

2014-10-20 22:07 GMT+02:00 Alexandre Rafalovitch arafa...@gmail.com:

 What are the differences in? The document count, or things like facets?
 This could be important.

 Also, I think there was a similar thread on the mailing list a week or
 two ago, might be worth looking for it.

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 20 October 2014 04:49, Giovanni Bricconi giovanni.bricc...@banzai.it
 wrote:
  Hello
 
  I have a procedure that sends small data changes during the day to a
  SolrCloud cluster, version 4.8.

  The cluster is made of three nodes and three shards; each node contains
  two shards.

  The procedure has been running for days; I don't know when, but at some
  point one of the cores went out of sync, so repeating the same query
  began to show small differences.

  The core graph was not useful; everything seemed active.

  I solved the problem by reindexing everything, because the collection is
  quite small, but is there a better way to fix this? Supposing I can
  figure out which core returns different results, is there a command to
  force that core to refetch the whole index from its master?
 
  Thanks
 
  Giovanni



Re: unstable results on refresh

2014-10-21 Thread Giovanni Bricconi
Nice!
I will monitor the index and try this if the problem comes back.
Actually the problem was due to small differences in score, so I think the
problem has the same origin.

2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com:

 Hi Giovanni,

 We had this problem as well. The cause was that the different nodes had
 slightly different idf values.

 We solved it by running an optimize operation, which really removes the
 deleted data.

 Ludovic.



 -
 Jouve
 France.



Re: unstable results on refresh

2014-10-21 Thread Giovanni Bricconi
I noticed the problem again, and this time I was able to collect some data.
In my paste http://pastebin.com/nVwf327c you can see the result of the same
query issued twice; the 2nd and 3rd groups are swapped.

I pasted also the clusterstate and the core state for each core.

The logs didn't show any problems related to indexing, only some malformed
queries.

After doing an optimize the problem disappeared.

So, is the problem related to documents that were deleted from the index?

The optimization took 5 minutes to complete

2014-10-21 11:41 GMT+02:00 Giovanni Bricconi giovanni.bricc...@banzai.it:

 Nice!
 I will monitor the index and try this if the problem comes back.
 Actually the problem was due to small differences in score, so I think the
 problem has the same origin.

 2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com:

 Hi Giovanni,

 We had this problem as well. The cause was that the different nodes had
 slightly different idf values.

 We solved it by running an optimize operation, which really removes the
 deleted data.

 Ludovic.



 -
 Jouve
 France.





unstable results on refresh

2014-10-20 Thread Giovanni Bricconi
Hello

I have a procedure that sends small data changes during the day to a
SolrCloud cluster, version 4.8.

The cluster is made of three nodes and three shards; each node contains
two shards.

The procedure has been running for days; I don't know when, but at some
point one of the cores went out of sync, so repeating the same query began
to show small differences.

The core graph was not useful; everything seemed active.

I solved the problem by reindexing everything, because the collection is
quite small, but is there a better way to fix this? Supposing I can figure
out which core returns different results, is there a command to force that
core to refetch the whole index from its master?
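
(I found mentions of the replication handler's fetchindex command; would
something like this work? A sketch, with made-up host and core names:

curl "http://node1:8983/solr/mycore_shard1_replica2/replication?command=fetchindex&masterUrl=http://node2:8983/solr/mycore_shard1_replica1/replication"
)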

Thanks

Giovanni


Re: unstable results on refresh

2014-10-20 Thread Ramzi Alqrainy
Can you please provide us with the exception from when the shard goes out of
sync? Please monitor the logs.





Re: unstable results on refresh

2014-10-20 Thread Alexandre Rafalovitch
What are the differences in? The document count, or things like facets?
This could be important.

Also, I think there was a similar thread on the mailing list a week or
two ago, might be worth looking for it.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 20 October 2014 04:49, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:
 Hello

 I have a procedure that sends small data changes during the day to a
 SolrCloud cluster, version 4.8.

 The cluster is made of three nodes and three shards; each node contains
 two shards.

 The procedure has been running for days; I don't know when, but at some
 point one of the cores went out of sync, so repeating the same query began
 to show small differences.

 The core graph was not useful; everything seemed active.

 I solved the problem by reindexing everything, because the collection is
 quite small, but is there a better way to fix this? Supposing I can figure
 out which core returns different results, is there a command to force that
 core to refetch the whole index from its master?

 Thanks

 Giovanni