How to efficiently find documents that have a specific value for a field OR the field does not exist at all
I'm trying to find documents using this query:

field:value OR (*:* AND NOT field:[* TO *])

which means that either the field is set to value or the field does not exist in the document. I'm running this for ~20 fields in a single query, strung together with ANDs. The query time is high, averaging around 3.5 s.

Does anyone have suggestions on how to optimize this query? As a last resort, using technologies outside of Solr is a possibility. All suggestions are greatly appreciated!

Thanks for your time and efforts,
Artem

PS. For the record, a colleague and I have brainstormed some ideas of our own:

* Adding a meta field to each document that consists of 1s and 0s, where each character represents a field's existence (1 = yes, 0 = no). In this case the query would look like: field:value OR signature:???0??? So we are looking for a certain field (the 0) that definitely does not exist, and we don't care about all the others (wildcards). Note that this would have to be a leading wildcard query, or we could prepend a dummy character to the beginning. A bit of a hack.
* Using bitwise operations to find all documents whose set of fields is a subset of the query's set of fields. This would be more work and would require writing a custom query parser or search handler.
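The query shapes above can be sketched in Python as plain string builders. This is a minimal sketch under stated assumptions: the three field names (`color`, `size`, `brand`) are hypothetical stand-ins for the ~20 real fields, the dummy prefix character `x` is an assumption, and values are assumed to be simple tokens that need no escaping. It only shows how each idea would be expressed; it does not measure or promise any speedup.

```python
from typing import Dict, Iterable, List

# Hypothetical field list standing in for the ~20 real fields.
FIELD_ORDER: List[str] = ["color", "size", "brand"]


def exists_or_value(field: str, value: str) -> str:
    """Original clause: field equals value, or the field is absent."""
    return f"({field}:{value} OR (*:* AND NOT {field}:[* TO *]))"


def build_query(criteria: Dict[str, str]) -> str:
    """AND together one clause per field, as in the original query."""
    return " AND ".join(exists_or_value(f, v) for f, v in criteria.items())


def signature(doc_fields: Iterable[str], prefix: str = "x") -> str:
    """Meta-field value: one '1'/'0' per field in FIELD_ORDER.

    The dummy prefix (an assumption) keeps the wildcard from being leading.
    """
    present = set(doc_fields)
    return prefix + "".join("1" if f in present else "0" for f in FIELD_ORDER)


def signature_clause(field: str, value: str, prefix: str = "x") -> str:
    """field:value OR signature:x??0?? where the 0 marks the absent field."""
    pattern = "".join("0" if f == field else "?" for f in FIELD_ORDER)
    return f"({field}:{value} OR signature:{prefix}{pattern})"


def build_signature_query(criteria: Dict[str, str]) -> str:
    """Same AND-ed structure, using the signature clause per field."""
    return " AND ".join(signature_clause(f, v) for f, v in criteria.items())


def field_mask(fields: Iterable[str]) -> int:
    """Bitwise idea: bit i is set if FIELD_ORDER[i] is present."""
    wanted = set(fields)
    return sum(1 << i for i, f in enumerate(FIELD_ORDER) if f in wanted)


def is_subset(doc_mask: int, query_mask: int) -> bool:
    """True if every field the document has also appears in the query."""
    return (doc_mask & ~query_mask) == 0
```

For example, `signature(["color", "brand"])` gives `x101`, and `signature_clause("size", "L")` gives `(size:L OR signature:x?0?)`. The subset check is what a custom query parser or search handler would have to evaluate per document; here it is only the bit arithmetic.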
DataImportHandler: backups prior to full-import
Does anyone know of any work done to automatically run a backup prior to a DataImportHandler full-import? I asked this question on #solr and was pointed to https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API. That page is helpful, but it doesn't provide an automatic backup in the context of full-imports. I'm wondering whether anyone has done this work yet.

-- Artem Shnayder
Re: DataImportHandler: backups prior to full-import
My typical workflow is a once-a-day full-import with hourly delta-imports. Ideally, the backup would occur only on the full-import commits. Is there a way to differentiate between the two in the replication handler?

On Wed, Mar 28, 2012 at 11:54 AM, Dyer, James james.d...@ingrambook.com wrote:

I don't know of any effort out there to have DIH trigger a backup automatically. However, you can set the replication handler to automatically back up after each commit. This might solve your problem if you aren't committing frequently.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
Re: DataImportHandler: backups prior to full-import
Thanks for your help, James. I'll try that out.

On Wed, Mar 28, 2012 at 12:30 PM, Dyer, James james.d...@ingrambook.com wrote:

Unfortunately, there isn't a good way to solve this. Your best bet is to trigger a backup before the nightly re-index using /replication?command=backup. The problem is that the backup runs asynchronously, so it's hard to script a way to determine whether the backup has finished. What we do is poll the replication handler with /replication?command=details and scrape the response until <str name="snapshotCompletedAt">timestamp_here</str> changes to a new timestamp.
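The backup-then-poll approach James describes could be scripted roughly as follows. This is a sketch under assumptions, not a tested tool: the core URL is hypothetical, the `/dataimport` path assumes DIH is registered under that name in solrconfig.xml, `run_nightly` is a hypothetical job name, and the regular expression simply scrapes the default XML `details` response for `snapshotCompletedAt` rather than parsing it. The HTTP getter is a parameter so the polling logic can be exercised without a live Solr.

```python
import re
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical core URL: adjust host, port and core name for your setup.
CORE_URL = "http://localhost:8983/solr/mycore"

# Scrapes the timestamp from the XML details response (assumes wt=xml default).
_SNAPSHOT_RE = re.compile(r'<str name="snapshotCompletedAt">([^<]*)</str>')


def http_get(url: str) -> str:
    """Issue a GET request and return the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_snapshot_completed_at(details_xml: str) -> Optional[str]:
    """Return the snapshotCompletedAt value, or None if none is reported."""
    match = _SNAPSHOT_RE.search(details_xml)
    return match.group(1) if match else None


def backup_and_wait(core_url: str, get: Callable[[str], str] = http_get,
                    poll_seconds: float = 5.0, timeout: float = 3600.0) -> str:
    """Trigger a backup, then block until snapshotCompletedAt changes."""
    details_url = f"{core_url}/replication?command=details"
    before = parse_snapshot_completed_at(get(details_url))
    # The backup command returns before the backup itself has finished.
    get(f"{core_url}/replication?command=backup")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(poll_seconds)
        after = parse_snapshot_completed_at(get(details_url))
        if after is not None and after != before:
            return after  # a new timestamp means the new backup completed
    raise TimeoutError("no new snapshotCompletedAt before the timeout")


def run_nightly(core_url: str = CORE_URL,
                get: Callable[[str], str] = http_get) -> None:
    """Hypothetical nightly job: back up first, then start the full-import."""
    backup_and_wait(core_url, get=get)
    get(f"{core_url}/dataimport?command=full-import")
```

Calling `run_nightly()` from cron once a day, while leaving the hourly delta-imports as they are, would give a backup only before full-imports. Comparing against the timestamp read before triggering the backup avoids mistaking an older snapshot for the new one.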