[jira] [Updated] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-04 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9918:
---
Attachment: SOLR-9918.patch

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   
> 
>  class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>   true
>   false 
> 
> 
> 
>   
> {noformat}
> and initParams defaults of
> {noformat}
>   skipexisting
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-03 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9918:
---
Attachment: SOLR-9918.patch

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
> Attachments: SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   
> 
>  class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>   true
>   false 
> 
> 
> 
>   
> {noformat}
> and initParams defaults of
> {noformat}
>   skipexisting
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org