Re: Batch update, order of evaluation

2010-09-09 Thread Greg Pendlebury
I can't reproduce it reliably, so I suspect the issue is in our own code.
I'm refactoring to avoid the problem entirely.

Thanks for the response though, Erick.

Greg


Re: Batch update, order of evaluation

2010-09-08 Thread Erick Erickson
This would be surprising behavior; if you can reliably reproduce it,
it's worth a JIRA.

But (and I'm stretching a bit here) are you sure you're committing at the
end of the batch AND are you sure you're looking after the commit? Here's
the scenario: Your updated document is at positions 1 and 100 in your batch.
Somewhere around SOLR processing document 50, an autocommit occurs,
and you're looking at your results before SOLR gets around to committing
document 100. Like I said, it's a stretch.

To test this, you need to be absolutely sure of two things before you
search:
1. the batch has finished processing
2. you've issued a commit after the last document in the batch.

If you're sure of the above and still see the problem, please let us know...
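
The scenario and the two checks above can be illustrated with a toy model. This is only a sketch and not Solr code: ToyIndex, AutocommitScenario and the "old"/"new" document bodies are hypothetical names made up for the illustration, and it assumes that adds are applied in order and that a search only ever sees what has been committed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of an index with pending and committed state; NOT Solr code. */
final class ToyIndex {
    private final Map<String, String> pending = new LinkedHashMap<>();
    private Map<String, String> visible = new LinkedHashMap<>();

    /** Adds a document; a later add with the same id overwrites the earlier one. */
    void add(String id, String body) {
        pending.put(id, body);
    }

    /** Makes everything added so far visible to searches. */
    void commit() {
        visible = new LinkedHashMap<>(pending);
    }

    /** Returns the committed version of a document, or null if none is visible. */
    String search(String id) {
        return visible.get(id);
    }
}

final class AutocommitScenario {
    /**
     * Feeds a 100-document batch in which "doc-A" sits at positions 1 and 100,
     * with a simulated autocommit after position 50. Returns what a search for
     * "doc-A" sees: [before the final commit, after the final commit].
     */
    static String[] run() {
        ToyIndex index = new ToyIndex();
        for (int pos = 1; pos <= 100; pos++) {
            if (pos == 1) {
                index.add("doc-A", "old");
            } else if (pos == 100) {
                index.add("doc-A", "new");
            } else {
                index.add("doc-" + pos, "filler");
            }
            if (pos == 50) {
                index.commit(); // simulated autocommit in the middle of the batch
            }
        }
        String beforeCommit = index.search("doc-A"); // searched too early
        index.commit();                              // explicit commit after the last document
        String afterCommit = index.search("doc-A");
        return new String[] {beforeCommit, afterCommit};
    }
}
```

In this model the early search returns the first copy even though the second copy was sent last, which is the symptom described; only the search after the final commit reflects position 100.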

HTH
Erick


Re: Batch update, order of evaluation

2010-09-08 Thread Greg Pendlebury
Thanks,

I'll create a deliberate test tomorrow and feed some random data through it
several times to see what happens.

I'm also working on simply improving the buffer to handle the situation
internally, but a few hours of testing isn't a big deal.
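
One way a buffer could handle the situation internally is to keep only the latest copy of each document before anything is sent. The sketch below is a guess at the idea, not the actual buffer from this thread: DedupBuffer and its methods are hypothetical, and it assumes each document has a unique id and is already serialised to a string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical ingest buffer that keeps one copy per id, in arrival order. */
final class DedupBuffer {
    private final Map<String, String> docs = new LinkedHashMap<>();

    /** Buffers a document; a newer copy replaces the older one and moves to the end. */
    void put(String id, String doc) {
        docs.remove(id); // re-inserting an existing key would otherwise keep its old position
        docs.put(id, doc);
    }

    /** Number of distinct documents currently buffered. */
    int size() {
        return docs.size();
    }

    /** Returns the buffered documents in order and empties the buffer. */
    List<String> drain() {
        List<String> out = new ArrayList<>(docs.values());
        docs.clear();
        return out;
    }
}
```

With this approach a chunk never contains two versions of the same document, so the result no longer depends on how the server orders updates within a batch.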

Ta,
Greg


Batch update, order of evaluation

2010-09-07 Thread Greg Pendlebury
Does anyone know with certainty whether (and how) document order is
respected when updates are sent as a batch?

Our application internally buffers Solr documents for speed of ingest before
sending them to the server in chunks. Each XML message sent to the Solr
server contains all documents in the order they arrived, without any settings
changed from the defaults (so overwrite = true). We are careful to avoid
things like HashMaps on our side since they'd lose the order, but I can't be
certain what occurs inside Solr.
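
For illustration, a chunk of the kind described above might be assembled roughly as follows. This is a sketch rather than the application's real code: UpdateMessage, its methods and the "id" and "title" fields are hypothetical, while the add/doc/field elements and the overwrite attribute follow the XML update format described on the UpdateXmlMessages wiki page.

```java
import java.util.List;

/** Hypothetical helper that builds an XML update message, preserving list order. */
final class UpdateMessage {
    /** Escapes the characters that are special inside XML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Builds one add message. Each entry is {id, title}; documents are written
     * in exactly the order they appear in the list, so the newest copy of a
     * repeated id comes last.
     */
    static String add(List<String[]> docs) {
        StringBuilder xml = new StringBuilder("<add overwrite=\"true\">");
        for (String[] doc : docs) {
            xml.append("<doc>")
               .append("<field name=\"id\">").append(escape(doc[0])).append("</field>")
               .append("<field name=\"title\">").append(escape(doc[1])).append("</field>")
               .append("</doc>");
        }
        return xml.append("</add>").toString();
    }

    /** The commit message to send after the last chunk. */
    static String commit() {
        return "<commit/>";
    }
}
```

The sketch only guarantees the order on the client side; whether the server applies the documents in that order is the open question.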

Sometimes, if an object has been indexed twice for various reasons, it can
appear twice in the buffer, but the most up-to-date version is always last. I
have, however, observed instances where the first copy of the document is
indexed and the differences in the second copy are missing. Does this sound
likely? And if so, are there any obvious settings I can play with to get the
behavior I desire?

I looked at:
http://wiki.apache.org/solr/UpdateXmlMessages

but there is no mention of order, just the overwrite flag (I'm unsure how
it is applied internally to an update message) and the deprecated
duplicates flag (which I have no idea about).

Would switching to SolrInputDocuments on a CommonsHttpSolrServer help, as
per http://wiki.apache.org/solr/Solrj? There is no mention of order there
either, however.

Thanks to anyone who took the time to read this.

Ta,
Greg