How to efficiently find documents that have a specific value for a field OR the field does not exist at all

2012-10-08 Thread Artem Shnayder
I'm trying to find documents using this query:

field:value OR (*:* AND NOT field:[* TO *])

That is, either the field is set to value or the field does not exist in
the document.

I'm running this for ~20 fields in a single query strung together with
ANDs. The query time is high, averaging around 3.5s. Does anyone have
suggestions on how to optimize this query? As a last resort, using
technologies outside of Solr is a possibility.
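For reference, a ~20-clause query like the one above can be generated instead of written by hand. Below is a minimal Python sketch. It assumes field names need no escaping and that values are escaped for the classic Lucene query parser. The `escape` helper and the example field names are hypothetical and not part of the original setup.

```python
from typing import Mapping

# Characters the classic Lucene query parser treats as syntax (assumed set);
# each one, plus whitespace, is escaped with a backslash.
_SPECIAL = set('\\+-!():^[]"{}~*?|&;/')


def escape(value: str) -> str:
    """Backslash-escape query syntax characters inside a field value."""
    return "".join("\\" + c if c in _SPECIAL or c.isspace() else c for c in value)


def field_or_missing(field: str, value: str) -> str:
    """One clause: field:value OR (*:* AND NOT field:[* TO *])."""
    return f"({field}:{escape(value)} OR (*:* AND NOT {field}:[* TO *]))"


def build_query(constraints: Mapping[str, str]) -> str:
    """AND together one field-or-missing clause per (field, value) pair."""
    return " AND ".join(field_or_missing(f, v) for f, v in constraints.items())


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    print(build_query({"color": "red", "size": "L"}))
```

Generating the clauses keeps all the fields consistent. By itself, it does not change how Solr executes the query.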

All suggestions are greatly appreciated!


Thanks for your time and efforts,
Artem



PS. For the record, a colleague and I have brainstormed some ideas of our
own:

* Adding a meta field to each document that consists of 1s and 0s, where
each character represents a field's existence (1 yes, 0 no). In this case
the query would look like: field:value OR signature:???0???
So we are looking for a certain field (the 0) that definitely does not
exist, and we do not care about any of the others (wildcards). Note that this
would have to be a leading wildcard query, or we could prepend a dummy
character to the beginning. A bit of a hack.

* Using bitwise operations to find all documents whose set of fields is a
subset of the query's set of fields. This would be more work and would
require writing a custom query parser or search handler.
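Both ideas above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a Solr implementation. `ALL_FIELDS` is a hypothetical fixed field ordering. The "x" prefix plays the role of the dummy leading character, so that `?` never starts the pattern. The signature field is assumed to be indexed as a single untokenized string (e.g. a `solr.StrField`). `fnmatch` stands in locally for Lucene's wildcard match, where `?` matches exactly one character. The bitmask part only shows the subset test itself. Running it inside Solr would still need the custom query parser or handler mentioned above, and the test ignores field values.

```python
import fnmatch
from typing import Iterable, Sequence

# Hypothetical fixed ordering of the tracked fields; every signature and
# every query pattern must use the same order.
ALL_FIELDS = ["color", "size", "brand", "material"]

# --- Idea 1: '1'/'0' signature string plus a wildcard pattern -------------


def signature(doc_fields: Iterable[str], all_fields: Sequence[str] = ALL_FIELDS,
              prefix: str = "x") -> str:
    """One '1'/'0' per field, after a dummy prefix so '?' is never leading."""
    present = set(doc_fields)
    return prefix + "".join("1" if f in present else "0" for f in all_fields)


def missing_pattern(field: str, all_fields: Sequence[str] = ALL_FIELDS,
                    prefix: str = "x") -> str:
    """Pattern whose only fixed position is a '0' for `field`."""
    pos = all_fields.index(field)
    return prefix + "?" * pos + "0" + "?" * (len(all_fields) - pos - 1)


def pattern_matches(sig: str, pattern: str) -> bool:
    """Local stand-in for a Lucene wildcard match ('?' = exactly one char)."""
    return fnmatch.fnmatchcase(sig, pattern)

# --- Idea 2: bitmasks and a subset test -----------------------------------


def to_mask(fields: Iterable[str], all_fields: Sequence[str] = ALL_FIELDS) -> int:
    """Bit i is set when all_fields[i] is among `fields`."""
    index = {name: i for i, name in enumerate(all_fields)}
    mask = 0
    for f in fields:
        mask |= 1 << index[f]
    return mask


def is_subset(doc_mask: int, query_mask: int) -> bool:
    """True when every field the document has also appears in the query."""
    return (doc_mask & ~query_mask) == 0
```

For example, `f"color:red OR signature:{missing_pattern('color')}"` yields `color:red OR signature:x0???`.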




DataImportHandler: backups prior to full-import

2012-03-28 Thread Artem Shnayder
Does anyone know of any work done to automatically run a backup prior to a
DataImportHandler full-import?

I've asked this question on #solr and was pointed to
https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API
which is helpful but is not an automatic backup in the context of full-imports.
I'm wondering if anyone else has done this work yet.

-- Artem Shnayder


Re: DataImportHandler: backups prior to full-import

2012-03-28 Thread Artem Shnayder
My typical workflow is a once-a-day full-import with hourly delta-imports.
Ideally, the backup would occur only on the full-import commits. Is
there a way to make the replication handler differentiate between the two?

On Wed, Mar 28, 2012 at 11:54 AM, Dyer, James james.d...@ingrambook.com wrote:

 I don't know of any effort out there to have DIH trigger a backup
 automatically.  However, you can set the replication handler to
 automatically back up after each commit.  This might solve your problem if
 you aren't committing frequently.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Artem Shnayder [mailto:artem@gmail.com]
 Sent: Wednesday, March 28, 2012 1:46 PM
 To: solr-user@lucene.apache.org
 Subject: DataImportHandler: backups prior to full-import




Re: DataImportHandler: backups prior to full-import

2012-03-28 Thread Artem Shnayder
Thanks for your help, James. I'll try that out.

On Wed, Mar 28, 2012 at 12:30 PM, Dyer, James james.d...@ingrambook.com wrote:

 Unfortunately there isn't a good way to solve this.  Your best bet is to
 trigger a backup before the nightly re-index using
 /replication?command=backup

 The problem is that the backup runs asynchronously, so it's hard to script a way
 to determine whether the backup has finished.  What we do is poll the
 ReplicationHandler with /replication?command=details and scrape the response
 until <str name="snapshotCompletedAt">timestamp_here</str> changes to a new
 timestamp.
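The workflow described above can be sketched as a minimal Python script. It records the current snapshotCompletedAt, triggers /replication?command=backup, and polls /replication?command=details until the timestamp changes. Only then does it start the DIH full-import. Several details are assumptions to check against your own solrconfig.xml: the base URL, the /dataimport handler path, the poll interval, and the timeout. Following the approach described, it watches only the timestamp and does not check whether the backup reported success. The `get` parameter exists so the HTTP calls can be swapped out for testing.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical core URL; adjust


def fetch(base: str, handler: str, **params: str) -> bytes:
    """GET {base}/{handler}?params and return the raw response body."""
    url = f"{base}/{handler}?{urlencode(params)}"
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def snapshot_completed_at(details_xml: bytes) -> Optional[str]:
    """Find <str name="snapshotCompletedAt"> anywhere in a details response."""
    root = ET.fromstring(details_xml)
    for node in root.iter("str"):
        if node.get("name") == "snapshotCompletedAt":
            return node.text
    return None


def backup_then_full_import(base: str = SOLR_BASE,
                            get: Callable[..., bytes] = fetch,
                            poll_seconds: float = 5.0,
                            timeout_seconds: float = 3600.0) -> None:
    """Back up, wait for the snapshot timestamp to change, then full-import."""
    details = lambda: get(base, "replication", command="details", wt="xml")
    before = snapshot_completed_at(details())
    get(base, "replication", command="backup")  # returns before backup finishes
    deadline = time.monotonic() + timeout_seconds
    while True:
        stamp = snapshot_completed_at(details())
        if stamp is not None and stamp != before:
            break  # a new snapshot has completed
        if time.monotonic() > deadline:
            raise TimeoutError("backup did not report completion in time")
        time.sleep(poll_seconds)
    get(base, "dataimport", command="full-import")  # assumed DIH handler path
```

The hourly delta-imports could keep calling /dataimport?command=delta-import directly, so only the nightly full-import is preceded by a backup.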

 James Dyer


 -Original Message-
 From: Artem Shnayder [mailto:artem@gmail.com]
 Sent: Wednesday, March 28, 2012 1:59 PM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImportHandler: backups prior to full-import
