[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828873#comment-16828873 ]

Noble Paul commented on SOLR-13320:
---

[~tomasflobbe] Well, no. IIRC {{DocBasedVersionConstraintsProcessor}} can skip the docs based on the {{_version_}} field in the document (not from a request param).

> add a param ignoreDuplicates=true to updates to not overwrite existing docs
> ---
>
> Key: SOLR-13320
> URL: https://issues.apache.org/jira/browse/SOLR-13320
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
>
> Updates should have an option to ignore duplicate documents and drop them if
> an option {{ignoreDuplicates=true}} is specified

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828861#comment-16828861 ]

Tomás Fernández Löbbe commented on SOLR-13320:
---

With {{DocBasedVersionConstraintsProcessor}} you can tell Solr to skip incoming documents whose version is not higher than the version of the document already indexed (see {{ignoreOldUpdates}}). Isn't that what you need?
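A minimal Python model of the behaviour described here, assuming a single user-managed version field (named `my_version_l` purely for illustration) and ignoring the processor's other options such as delete handling. It is a sketch of the semantics, not the processor's code: with `ignore_old_updates=True` an update whose version is not greater than the stored one is silently dropped; otherwise it is rejected, which Solr reports as HTTP 409.

```python
from dataclasses import dataclass, field

VERSION_FIELD = "my_version_l"  # hypothetical name for the user-managed version field


class VersionConflict(Exception):
    """Stands in for the HTTP 409 response returned on a version conflict."""


@dataclass
class Index:
    docs: dict = field(default_factory=dict)  # id -> stored document

    def add(self, doc: dict, ignore_old_updates: bool) -> bool:
        """Apply one update; return True if the document was written."""
        existing = self.docs.get(doc["id"])
        if existing is not None and doc[VERSION_FIELD] <= existing[VERSION_FIELD]:
            if ignore_old_updates:
                return False  # silently dropped; the client still sees success
            raise VersionConflict(f"version conflict for {doc['id']}")
        self.docs[doc["id"]] = doc
        return True
```

Note that the decision is driven by a version value carried in each document, not by a per-request parameter, which is the distinction Noble draws in his reply.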
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827496#comment-16827496 ]

Noble Paul commented on SOLR-13320:
---

How does {{DocBasedVersionConstraintsProcessor}} solve this, [~tomasflobbe]?
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827402#comment-16827402 ]

Tomás Fernández Löbbe commented on SOLR-13320:
---

Isn’t this what {{DocBasedVersionConstraintsProcessor}} does?
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826483#comment-16826483 ]

Scott Blum commented on SOLR-13320:
---

+1!
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825990#comment-16825990 ]

Noble Paul commented on SOLR-13320:
---

{{ignoreVersionConflicts=true}} makes more sense
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825968#comment-16825968 ]

Shalin Shekhar Mangar commented on SOLR-13320:
---

Thanks [~dragonsinth] for explaining the use case and the problem. These are conflicts: a document was not at the version we wanted it to be. Here {{-1}} is just a special version that means the document should not have existed. So I think {{ignoreConflicts}} or {{ignoreVersionConflicts}} is more appropriate than {{ignoreDuplicates}}.

Regardless of what we call the param, returning a list of the IDs of skipped docs would be nice to have, as Gus noted. {{haltBatchOnError}} is definitely too broad, and it is not always possible to recover from errors, e.g. if there is malformed JSON in the middle of a batch.
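The point about malformed JSON can be sketched in Python. Assuming, for illustration only, that a batch arrives as whitespace-separated JSON documents decoded one at a time (not necessarily how Solr's JSON loader works), a syntax error leaves the reader with no reliable way to find where the next document starts, so no per-document error policy can skip past it. `read_docs` is a hypothetical helper.

```python
import json


def read_docs(stream: str):
    """Decode whitespace-separated JSON documents one at a time.

    Returns (decoded_docs, error_message_or_None).
    """
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while True:
        # Skip whitespace between documents; raw_decode does not do this itself.
        while pos < len(stream) and stream[pos].isspace():
            pos += 1
        if pos >= len(stream):
            return docs, None
        try:
            doc, pos = decoder.raw_decode(stream, pos)
        except json.JSONDecodeError as e:
            # The broken document has no trustworthy end, so everything after
            # it is unreachable: only the documents decoded so far survive.
            return docs, f"{e.msg} at char {e.pos}"
        docs.append(doc)
```

With the input `'{"id": "1"}\n{"id": "2",}\n{"id": "3"}'` only the first document is recovered, even though the third one is well formed.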
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825692#comment-16825692 ]

Noble Paul commented on SOLR-13320:
---

That just sounds very complex. We will have a tough time explaining it to people.
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825667#comment-16825667 ]

Gus Heck commented on SOLR-13320:
---

It would be an error if you sent {{_version_=-1}} as Shalin suggested. So {{haltBatchOnError=false}} plus the existing {{_version_=-1}} functionality covers your case, right?
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825658#comment-16825658 ]

Noble Paul commented on SOLR-13320:
---

Well, it's not an error in the strictest sense. Basically, what we want is:
* ignore a document if it already exists, and
* have the response include the IDs of the discarded docs
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825630#comment-16825630 ]

Gus Heck commented on SOLR-13320:
---

Maybe this could be broadened a bit? An option to continue with a batch even if one document has an error, plus a response enumerating the failed docs and their associated messages. I think that would be a generally useful feature. Call it {{haltBatchOnError}}, defaulting to true.
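A rough Python sketch of this proposal. `haltBatchOnError` is the name suggested in the comment, not an existing Solr parameter; `process_batch` and the `apply` callback are hypothetical, with `apply` standing in for whatever per-document indexing step can fail.

```python
from typing import Callable, Iterable


def process_batch(
    docs: Iterable[dict],
    apply: Callable[[dict], None],
    halt_batch_on_error: bool = True,
) -> dict:
    """Apply each document with `apply`, which raises on a per-document error.

    With halt_batch_on_error=True (the proposed default) the first error
    stops the batch and is re-raised. With False, the batch continues and
    every failure is reported as {"id": ..., "message": ...}.
    """
    failed = []
    added = 0
    for doc in docs:
        try:
            apply(doc)
            added += 1
        except Exception as exc:
            if halt_batch_on_error:
                raise
            failed.append({"id": doc.get("id"), "message": str(exc)})
    return {"added": added, "failed": failed}
```

Keeping the default at "halt" preserves today's behaviour for existing clients; only callers that opt in get the per-document failure list.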
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825485#comment-16825485 ]

Noble Paul commented on SOLR-13320:
---

[~shalinmangar] I guess we are good to go, right?
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794790#comment-16794790 ]

Noble Paul commented on SOLR-13320:
---

bq. "ignoreConflicts" might be a better name.

These are not really "conflicts", right?
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794015#comment-16794015 ]

Scott Blum commented on SOLR-13320:
---

Shalin, let me break this down a bit.

Imagine you're restoring a collection from a backup, but you want to be able to accept writes while this is in progress. You start accepting writes (of new data) on the new, empty collection, then in the background you backfill from your backup copy, but you don't want to overwrite anything that has been written recently.

Setting {{_version_=-1}} on all the incoming backfill docs is almost what you want: add any documents that don't exist, but don't overwrite any documents that do exist. The problem is that the entire batch gets rejected if even one document already exists. We just want a way to ignore conflicts and quietly drop the offending documents rather than rejecting the entire batch.

"ignoreConflicts" might be a better name.
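A minimal Python model of this backfill scenario, assuming an in-memory dict in place of the collection and every backfill document being sent with `_version_=-1` ("must not already exist"). `backfill`, `Conflict` and `ignore_conflicts` are hypothetical names (the last mirrors the name floated in the comment); rejecting the whole batch follows the behaviour the comment describes rather than a verified trace of Solr.

```python
class Conflict(Exception):
    """Stands in for the error a negative _version_ produces when the id exists."""


def backfill(index: dict, batch: list, ignore_conflicts: bool = False) -> list:
    """Add each doc only if its id is absent; return the ids that were dropped."""
    staged = dict(index)  # work on a copy so a rejected batch writes nothing
    dropped = []
    for doc in batch:
        if doc["id"] in staged:
            if not ignore_conflicts:
                # Current behaviour as described: one clash rejects the batch.
                raise Conflict(f"document already exists: {doc['id']}")
            # Desired behaviour: quietly drop it and remember its id.
            dropped.append(doc["id"])
            continue
        staged[doc["id"]] = doc
    index.update(staged)  # commit the batch
    return dropped
```

Returning the dropped ids also covers the request elsewhere in the thread that the response list the discarded docs.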
[jira] [Commented] (SOLR-13320) add a param ignoreDuplicates=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791318#comment-16791318 ]

Shalin Shekhar Mangar commented on SOLR-13320:
---

If the definition of a duplicate is just having the same id, then that can already be done today using optimistic concurrency: use {{_version_}} with a negative value. See https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-OptimisticConcurrency

If whether a document is a duplicate depends on its content, then you need to use the {{SignatureUpdateProcessorFactory}}.
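The optimistic-concurrency rules behind this suggestion can be restated as a small Python function. It paraphrases the documented meaning of the `_version_` value sent with an update and is not Solr code; `version_check_passes` is a hypothetical helper, and the `title` field in the sample body is made up. A body like this would be POSTed to a collection's `/update` handler.

```python
import json
from typing import Optional


def version_check_passes(requested: int, current: Optional[int]) -> bool:
    """Return True if an update carrying `requested` as _version_ is accepted.

    `current` is the indexed document's _version_, or None if it does not exist.
    A False result corresponds to Solr rejecting the update with HTTP 409.
    """
    if requested > 1:
        return current == requested  # must match the indexed version exactly
    if requested == 1:
        return current is not None   # document must exist
    if requested < 0:
        return current is None       # document must NOT exist
    return True                      # 0: no version constraint


# JSON update body: add "doc1" only if no document with that id exists yet.
body = json.dumps([{"id": "doc1", "title": "backfilled", "_version_": -1}])
```

The negative case is the one proposed here: it accepts only ids that are not yet indexed.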