Re: Batch update, order of evaluation
I can't reproduce reliably, so I suspect there are issues in our code. I'm refactoring to avoid the problem entirely. Thanks for the response though, Erick.

Greg

On 8 September 2010 21:51, Greg Pendlebury greg.pendleb...@gmail.com wrote:

> Thanks, I'll create a deliberate test tomorrow and feed some random data through it several times to see what happens. I'm also working on simply improving the buffer to handle the situation internally, but a few hours of testing isn't a big deal.
>
> Ta,
> Greg
>
> On 8 September 2010 21:41, Erick Erickson erickerick...@gmail.com wrote:
>
>> This would be surprising behavior; if you can reliably reproduce this, it's worth a JIRA.
>>
>> But (and I'm stretching a bit here) are you sure you're committing at the end of the batch AND are you sure you're looking after the commit? Here's the scenario: your updated document is at positions 1 and 100 in your batch. Somewhere around SOLR processing document 50, an autocommit occurs, and you're looking at your results before SOLR gets around to committing document 100. Like I said, it's a stretch.
>>
>> To test this, you need to be absolutely sure of two things before you search:
>> 1. the batch is finished processing
>> 2. you've issued a commit after the last document in the batch.
>>
>> If you're sure of the above and still see the problem, please let us know...
>>
>> HTH
>> Erick
>>
>> On Tue, Sep 7, 2010 at 10:32 PM, Greg Pendlebury greg.pendleb...@gmail.com wrote:
>>
>>> Does anyone know with certainty how (or even if) order is evaluated when updates are performed by batch?
>>>
>>> Our application internally buffers Solr documents for speed of ingest before sending them to the server in chunks. The XML documents sent to the Solr server contain all documents in the order they arrived, without any settings changed from the defaults (so overwrite = true). We are careful to avoid things like HashMaps on our side since they'd lose the order, but I can't be certain what occurs inside Solr.
>>>
>>> Sometimes, if an object has been indexed twice for various reasons, it could appear twice in the buffer, but the most up-to-date version is always last. I have, however, observed instances where the first copy of the document is indexed and differences in the second copy are missing. Does this sound likely? And if so, are there any obvious settings I can play with to get the behavior I desire?
>>>
>>> I looked at: http://wiki.apache.org/solr/UpdateXmlMessages but there is no mention of order, just the overwrite flag (I'm unsure how it is applied internally to an update message) and the deprecated duplicates flag (which I have no idea about).
>>>
>>> Would switching to SolrInputDocuments on a CommonsHttpSolrServer help, as per http://wiki.apache.org/solr/Solrj? There is no mention of order there either, however.
>>>
>>> Thanks to anyone who took the time to read this.
>>>
>>> Ta,
>>> Greg
Re: Batch update, order of evaluation
This would be surprising behavior; if you can reliably reproduce this, it's worth a JIRA.

But (and I'm stretching a bit here) are you sure you're committing at the end of the batch AND are you sure you're looking after the commit? Here's the scenario: your updated document is at positions 1 and 100 in your batch. Somewhere around SOLR processing document 50, an autocommit occurs, and you're looking at your results before SOLR gets around to committing document 100. Like I said, it's a stretch.

To test this, you need to be absolutely sure of two things before you search:
1. the batch is finished processing
2. you've issued a commit after the last document in the batch.

If you're sure of the above and still see the problem, please let us know...

HTH
Erick
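The autocommit scenario in this reply can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyIndex`, its count-based `autoCommitEvery` trigger, and the value 60 are all hypothetical, and none of it reflects how Solr is implemented internally. It only separates "added" from "searchable" documents to show why a search issued before the final commit can return the older copy.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Solr's implementation. It separates documents that
// have been added from documents that are visible to searches.
class ToyIndex {
    private final Map<String, String> pending = new HashMap<>();
    private final Map<String, String> searchable = new HashMap<>();
    private final int autoCommitEvery; // hypothetical: autocommit after N adds
    private int addsSinceCommit = 0;

    ToyIndex(int autoCommitEvery) {
        this.autoCommitEvery = autoCommitEvery;
    }

    // Add a document; a later add with the same id replaces the earlier one.
    void add(String id, String body) {
        pending.put(id, body);
        addsSinceCommit++;
        if (addsSinceCommit >= autoCommitEvery) {
            commit(); // simulated autocommit part-way through the batch
        }
    }

    // Make everything added so far visible to searches.
    void commit() {
        searchable.putAll(pending);
        pending.clear();
        addsSinceCommit = 0;
    }

    // Searches only see committed documents.
    String search(String id) {
        return searchable.get(id);
    }

    // Feeds a 100-document batch with "doc-A" at positions 1 and 100.
    // No explicit commit is issued here; the autocommit fires once, at add 60.
    static ToyIndex runBatch() {
        ToyIndex index = new ToyIndex(60);
        index.add("doc-A", "old version");       // position 1
        for (int i = 2; i <= 99; i++) {
            index.add("filler-" + i, "filler");  // positions 2..99
        }
        index.add("doc-A", "new version");       // position 100
        return index;
    }

    public static void main(String[] args) {
        ToyIndex index = runBatch();
        System.out.println("before final commit: " + index.search("doc-A"));
        index.commit(); // explicit commit after the last document
        System.out.println("after final commit:  " + index.search("doc-A"));
    }
}
```

In this model the first search returns the old copy and the second returns the new one, which matches the two checks listed above: wait for the batch to finish, then commit, then search.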
Re: Batch update, order of evaluation
Thanks, I'll create a deliberate test tomorrow and feed some random data through it several times to see what happens. I'm also working on simply improving the buffer to handle the situation internally, but a few hours of testing isn't a big deal.

Ta,
Greg
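One way the buffer could "handle the situation internally" is to keep only the newest copy of each document before anything is sent to the server. The sketch below is a guess at that idea, not the code actually used in this thread: `DedupBuffer`, its method names, and the string payloads are all hypothetical. It relies on `java.util.LinkedHashMap`, which iterates in insertion order, unlike `HashMap`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side buffer: keeps only the newest copy of each id.
class DedupBuffer {
    // LinkedHashMap preserves insertion order, so arrival order survives.
    private final Map<String, String> docs = new LinkedHashMap<>();

    // Re-adding an id drops the older copy and moves the id to the end,
    // so the buffer holds the newest copies in their arrival order.
    void add(String id, String payload) {
        docs.remove(id);
        docs.put(id, payload);
    }

    int size() {
        return docs.size();
    }

    // Returns the buffered payloads in order and empties the buffer.
    List<String> drain() {
        List<String> out = new ArrayList<>(docs.values());
        docs.clear();
        return out;
    }

    public static void main(String[] args) {
        DedupBuffer buffer = new DedupBuffer();
        buffer.add("obj-1", "obj-1 v1");
        buffer.add("obj-2", "obj-2 v1");
        buffer.add("obj-1", "obj-1 v2"); // replaces the first copy of obj-1
        System.out.println(buffer.drain()); // [obj-2 v1, obj-1 v2]
    }
}
```

With duplicates removed on the client side, each chunk contains at most one copy of any id, so the result no longer depends on how the server orders updates within a single batch.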
Batch update, order of evaluation
Does anyone know with certainty how (or even if) order is evaluated when updates are performed by batch?

Our application internally buffers Solr documents for speed of ingest before sending them to the server in chunks. The XML documents sent to the Solr server contain all documents in the order they arrived, without any settings changed from the defaults (so overwrite = true). We are careful to avoid things like HashMaps on our side since they'd lose the order, but I can't be certain what occurs inside Solr.

Sometimes, if an object has been indexed twice for various reasons, it could appear twice in the buffer, but the most up-to-date version is always last. I have, however, observed instances where the first copy of the document is indexed and differences in the second copy are missing. Does this sound likely? And if so, are there any obvious settings I can play with to get the behavior I desire?

I looked at: http://wiki.apache.org/solr/UpdateXmlMessages but there is no mention of order, just the overwrite flag (I'm unsure how it is applied internally to an update message) and the deprecated duplicates flag (which I have no idea about).

Would switching to SolrInputDocuments on a CommonsHttpSolrServer help, as per http://wiki.apache.org/solr/Solrj? There is no mention of order there either, however.

Thanks to anyone who took the time to read this.

Ta,
Greg