Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)

2015-03-27 Thread Shawn Heisey
On 3/27/2015 7:07 AM, Russell Taylor wrote:
 Hi Shawn, thanks for the quick reply.
 
 I've looked at both methods and I think that they won't work for a number of 
 reasons:
 
 1)
 uniqueKey:
  I could use the uniqueKey to overwrite the original documents, but I also need 
 to remove the documents which are not on my new input list, and the issue with 
 the uniqueKey method is that I don't know what to delete.
 
 Documents on the index:
 docs: [
 {
 id:1
 keyField:A
 },{
 id:2
 keyField:A
 },{
 id:3
 keyField:B
 }
 ]
 New Documents to go on index
 docs: [
 {
 id:1
 keyField:A
 },{
 id:3
 keyField:B
 }
 ]
 I would never know that id:2 should be deleted. (On some new document lists 
 the delete list could run into the millions.)
 
 2)
 openSearcher:
 My openSearcher is set to false and I've also commented out autoSoftCommit so 
 I don't get a partial list being returned on a query.
 <!--
 <autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
 </autoSoftCommit>
 -->
 
 
 So is there another way to keep the original set of documents until the new 
 set has been added to the index?

If you are 100% in control of when commits with openSearcher=true are
sent, which it sounds like you probably are, then you can do anything
you want from the start of indexing until commit time, and the user will
never see any of it, until the commit happens.  That allows the
following relatively simple paradigm:

1) Delete LOTS of stuff, or perhaps everything in the index with a
deleteByQuery of *:* (for all documents).

2) Index everything you need to index.

3) Commit.
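
The three steps above can be sketched as three requests to Solr's JSON update handler. This is a minimal sketch, not a definitive implementation: the function names and the example URL are hypothetical, and it assumes no autoSoftCommit and no other client sends a commit that opens a searcher while the batch is in flight.

```python
import json
import urllib.request


def build_replace_bodies(delete_query, docs):
    """Return the JSON bodies for delete -> add -> commit, in that order."""
    return [
        # 1) Delete the old documents. Searches keep seeing them until a
        #    commit opens a new searcher.
        json.dumps({"delete": {"query": delete_query}}),
        # 2) Index the replacement documents (a JSON array of docs).
        json.dumps(docs),
        # 3) A single explicit commit; an explicit commit opens a new
        #    searcher by default, so the swap becomes visible here.
        json.dumps({"commit": {}}),
    ]


def post_bodies(update_url, bodies):
    """POST each body in order to a Solr update URL (hypothetical helper)."""
    for body in bodies:
        request = urllib.request.Request(
            update_url,
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            response.read()


# Example usage (assumed core name and host, not taken from this thread):
# bodies = build_replace_bodies("keyField:A", [{"id": "1", "keyField": "A"}])
# post_bodies("http://localhost:8983/solr/mycore/update", bodies)
```

Keeping the commit as the last and only commit in the sequence is what prevents a query from ever seeing the group half-replaced.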

Thanks,
Shawn



RE: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)

2015-03-27 Thread Russell Taylor
Hi Shawn, thanks for the quick reply.

I've looked at both methods and I think that they won't work for a number of 
reasons:

1)
uniqueKey:
 I could use the uniqueKey to overwrite the original documents, but I also need 
to remove the documents which are not on my new input list, and the issue with 
the uniqueKey method is that I don't know what to delete.

Documents on the index:
docs: [
{
id:1
keyField:A
},{
id:2
keyField:A
},{
id:3
keyField:B
}
]
New Documents to go on index
docs: [
{
id:1
keyField:A
},{
id:3
keyField:B
}
]
I would never know that id:2 should be deleted. (On some new document lists the 
delete list could run into the millions.)
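
The problem described above can be illustrated with a small in-memory model. This is only a sketch of uniqueKey overwrite semantics using a plain dictionary, not Solr itself; the function name is hypothetical.

```python
def overwrite_by_unique_key(index, new_docs, key="id"):
    """Model uniqueKey behaviour: a new doc replaces any doc with the same key."""
    updated = dict(index)
    for doc in new_docs:
        updated[doc[key]] = doc
    return updated


# Documents currently on the index, keyed by their uniqueKey value.
index = {
    1: {"id": 1, "keyField": "A"},
    2: {"id": 2, "keyField": "A"},
    3: {"id": 3, "keyField": "B"},
}

# New documents to go on the index.
new_docs = [
    {"id": 1, "keyField": "A"},
    {"id": 3, "keyField": "B"},
]

after = overwrite_by_unique_key(index, new_docs)

# Anything still in the index that was not in the new list is stale.
stale = sorted(set(after) - {doc["id"] for doc in new_docs})
print(stale)  # [2]
```

Overwriting replaces ids 1 and 3 but never touches id 2, which is why overwriting alone cannot remove documents that are missing from the new list.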

2)
openSearcher:
My openSearcher is set to false and I've also commented out autoSoftCommit so I 
don't get a partial list being returned on a query.
<!--
<autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
</autoSoftCommit>
-->


So is there another way to keep the original set of documents until the new set 
has been added to the index?


Thanks


Russ.




-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 26 March 2015 16:06
To: solr-user@lucene.apache.org
Subject: Re: Replacing a group of documents (Delete/Insert) without a query on 
the index ever showing an empty list (Docs)

On 3/26/2015 9:53 AM, Russell Taylor wrote:
 I have an index which is made up of groups of documents, each group is 
 defined by a field called keyField (keyField:A).
 I need to delete all the keyField:A documents and replace them with a 
 brand new set without the index ever returning zero documents on a query.

 At the moment I run a deleteByQuery on keyField:A and then insert a 
 SolrInputDocument list via SolrJ into my index. There is a small time 
 period where somebody doing a q=keyField:A can be returned an empty list.

 FYI: The keyField group might be just 100 documents or up to 10 million.

As long as you don't have any commits with openSearcher=true happening between 
the delete and the insert, that would work ... but why go through the manual 
delete if you don't have to?

If you define a suitable uniqueKey field in your schema, simply indexing a new 
document with the same value in the uniqueKey field as an existing document will 
delete the old document.

https://wiki.apache.org/solr/UniqueKey

Thanks,
Shawn



***
This message (including any files transmitted with it) may contain confidential 
and/or proprietary information, is the property of Interactive Data Corporation 
and/or its subsidiaries, and is directed only to the addressee(s). If you are 
not the designated recipient or have reason to believe you received this 
message in error, please delete this message from your system and notify the 
sender immediately. An unintended recipient's disclosure, copying, 
distribution, or use of this message or any attachments is prohibited and may 
be unlawful. 
***


RE: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)

2015-03-27 Thread Russell Taylor
Yes that works and now I have a better understanding of the soft and hard 
commits to boot.

Thanks again Shawn.


Russ.



Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)

2015-03-27 Thread Erick Erickson
You can simplify things a bit by indexing a batch number guaranteed
to be different between two runs for the same keyField. In fact I'd
make sure it was unique amongst all my runs. Simplest is a timestamp
(assuming you don't start two batches within a millisecond!). So it
looks like this.

Get a new timestamp.
Add it to _every_ doc in my current run and index those docs.
Issue a delete-by-query like 'keyField:A AND timestamp:[* TO <new timestamp>}'.
Commit.

As Shawn says, you have to very carefully control the commits. And
also note that the curly brace at the end is NOT a typo, it excludes
the endpoint.
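
The batch-stamp idea can be sketched as below. This is a sketch under assumptions: the helper names are hypothetical, and the `timestamp` field is assumed to be a numeric (long) field in the schema holding milliseconds since the epoch.

```python
import time


def new_batch_stamp():
    """Millisecond timestamp used as the batch number for one run."""
    return int(time.time() * 1000)


def stamp_docs(docs, batch, field="timestamp"):
    """Return copies of the docs with the batch stamp added to every one."""
    return [dict(doc, **{field: batch}) for doc in docs]


def stale_docs_query(key_value, batch, key_field="keyField", field="timestamp"):
    """Build the delete query matching docs of this group from older batches.

    '[' makes the lower bound inclusive; the closing '}' makes the upper
    bound exclusive, so docs stamped with the current batch are kept.
    """
    return key_field + ":" + key_value + " AND " + field + ":[* TO " + str(batch) + "}"


batch = new_batch_stamp()
docs = stamp_docs([{"id": 1, "keyField": "A"}], batch)
query = stale_docs_query("A", batch)
# Order of operations: index `docs`, delete by `query`, then commit once.
```

Because the new documents are indexed before the delete runs and the range excludes the current stamp, the delete only removes documents left over from earlier runs.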

Best,
Erick



Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)

2015-03-26 Thread Shawn Heisey
On 3/26/2015 9:53 AM, Russell Taylor wrote:
 I have an index which is made up of groups of documents, each group is 
 defined by a field called keyField (keyField:A).
 I need to delete all the keyField:A documents and replace them with a brand 
 new set without the index ever returning
 zero documents on a query.

 At the moment I run a deleteByQuery on keyField:A and then insert a 
 SolrInputDocument list via SolrJ into my index. There is a small time period 
 where somebody doing a q=keyField:A can be returned an empty list.

 FYI: The keyField group might be just 100 documents or up to 10 million.

As long as you don't have any commits with openSearcher=true happening
between the delete and the insert, that would work ... but why go
through the manual delete if you don't have to?

If you define a suitable uniqueKey field in your schema, simply indexing
a new document with the same value in the uniqueKey field as an existing
document will delete the old document.

https://wiki.apache.org/solr/UniqueKey

Thanks,
Shawn