[jira] [Comment Edited] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs

2019-03-15 Thread Scott Blum (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794015#comment-16794015
 ] 

Scott Blum edited comment on SOLR-13320 at 3/15/19 11:19 PM:
-

[~shalinmangar] lemme break this down a bit...

Imagine you're restoring a collection from a backup, but you want to be able to 
accept writes while this is in progress.  You start accepting writes (of new 
data) on the new, empty collection, then in the background you want to backfill 
from your backup copy, but you don't want to overwrite anything that has been 
written recently.

Setting "version:-1" on all the incoming, backfill doc is almost what you want: 
add any documents that don't exist, but don't overwrite any documents that do 
exist.  The problem is that the entire batch gets rejected if even one document 
already exists.  We just want a way to be able to ignore conflicts and quietly 
drop the offending documents rather than rejecting the entire batch.

"ignoreConflicts" might be a better name.


was (Author: dragonsinth):
[~shalinmangar] lemme break this down a bit...

Imagine you're restoring a collection from a backup, but you want to be able to 
accept writes while this is in progress.  You start accepting writes (of new 
data) on the new, empty collection, then in the background you want to backfill 
from your backup copy, but you don't want to overwrite anything that has been 
written recently.

Setting "version:-1" on all the incoming, backfill doc is almost what you 
want-- add any documents that don't exist, but don't overwrite any documents 
that do exist.  The problem is that the entire batch gets rejected if even one 
document already exists.  We just want a way to be able to ignore conflicts and 
quietly drop the offending documents rather than rejecting the entire batch.

"ignoreConflicts" might be a better name.

> add a param ignoreDuplicates=true to updates to not overwrite existing docs
> ---
>
> Key: SOLR-13320
> URL: https://issues.apache.org/jira/browse/SOLR-13320
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> Updates should have an option to ignore duplicate documents and drop them if 
> an option  {{ignoreDuplicates=true}} is specified



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs

2019-03-15 Thread Scott Blum (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794015#comment-16794015
 ] 

Scott Blum edited comment on SOLR-13320 at 3/15/19 11:19 PM:
-

[~shalinmangar] lemme break this down a bit...

Imagine you're restoring a collection from a backup, but you want to be able to 
accept writes while this is in progress.  You start accepting writes (of new 
data) on the new, empty collection, then in the background you want to backfill 
from your backup copy, but you don't want to overwrite anything that has been 
written recently.

Setting "version:-1" on all the incoming, backfill doc is almost what you 
want-- add any documents that don't exist, but don't overwrite any documents 
that do exist.  The problem is that the entire batch gets rejected if even one 
document already exists.  We just want a way to be able to ignore conflicts and 
quietly drop the offending documents rather than rejecting the entire batch.

"ignoreConflicts" might be a better name.


was (Author: dragonsinth):
Shalin lemme break this down a bit...

Imagine you're restoring a collection from a backup, but you want to be able to 
accept writes while this is in progress.  You start accepting writes (of new 
data) on the new, empty collection, then in the background you want to backfill 
from your backup copy, but you don't want to overwrite anything that has been 
written recently.

Setting "version:-1" on all the incoming, backfill doc is almost what you 
want-- add any documents that don't exist, but don't overwrite any documents 
that do exist.  The problem is that the entire batch gets rejected if even one 
document already exists.  We just want a way to be able to ignore conflicts and 
quietly drop the offending documents rather than rejecting the entire batch.

"ignoreConflicts" might be a better name.

> add a param ignoreDuplicates=true to updates to not overwrite existing docs
> ---
>
> Key: SOLR-13320
> URL: https://issues.apache.org/jira/browse/SOLR-13320
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> Updates should have an option to ignore duplicate documents and drop them if 
> an option  {{ignoreDuplicates=true}} is specified



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs

2019-03-12 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791318#comment-16791318
 ] 

Shalin Shekhar Mangar edited comment on SOLR-13320 at 3/13/19 5:36 AM:
---

If the definition of duplicate is just having the same id then that can also be 
done today using optimistic concurrency. Use a negative value for the version. 
See 
https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-OptimisticConcurrency

If duplicate depends on the content of the document then you need to use the 
SignatureUpdateProcessorFactory


was (Author: shalinmangar):
If the definition of duplicate is just having the same id then that can also be 
done today using optimistic concurrency. Use {{_version_}} with a negative 
value. See 
https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-OptimisticConcurrency

If duplicate depends on the content of the document then you need to use the 
SignatureUpdateProcessorFactory

> add a param ignoreDuplicates=true to updates to not overwrite existing docs
> ---
>
> Key: SOLR-13320
> URL: https://issues.apache.org/jira/browse/SOLR-13320
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> Updates should have an option to ignore duplicate documents and drop them if 
> an option  {{ignoreDuplicates=true}} is specified



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs

2019-03-12 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791318#comment-16791318
 ] 

Shalin Shekhar Mangar edited comment on SOLR-13320 at 3/13/19 5:35 AM:
---

If the definition of duplicate is just having the same id then that can also be 
done today using optimistic concurrency. Use {{_version_}} with a negative 
value. See 
https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-OptimisticConcurrency

If duplicate depends on the content of the document then you need to use the 
SignatureUpdateProcessorFactory


was (Author: shalinmangar):
If the definition of duplicate is just having the same id then that can also be 
done today using optimistic concurrency. Use `_version_` with a negative value. 
See 
https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-OptimisticConcurrency

If duplicate depends on the content of the document then you need to use the 
SignatureUpdateProcessorFactory

> add a param ignoreDuplicates=true to updates to not overwrite existing docs
> ---
>
> Key: SOLR-13320
> URL: https://issues.apache.org/jira/browse/SOLR-13320
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> Updates should have an option to ignore duplicate documents and drop them if 
> an option  {{ignoreDuplicates=true}} is specified



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org