[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-05-05 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998441#comment-15998441
 ] 

Amrit Sarkar edited comment on SOLR-9530 at 5/5/17 3:02 PM:


Ah! That was my fault. We made some changes in URPFactory to make all 
URPs SolrCoreAware, which essentially requires Solr to be started with the 
system property "enable.runtime.lib=true".

I have set it explicitly in beforeClass in URPFactoryTest and all the test 
cases are passing successfully now. Sorry for the hiccup.


was (Author: sarkaramr...@gmail.com):
Ah! That's fault at my part. We made some changes in URPFactory to make all 
URPs SolrCoreAware which essentially needs Solr to start with system property 
"enable.runtime.lib=true".

I have set it explicitly in beforeClass in URPFactory and all the test cases 
are passing successfully now. Sorry for the hiccup.

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Noble Paul
> Fix For: 6.6, master (7.0)
>
> Attachments: assertU(...)-works.png, commit()-doesn't-work.png, 
> SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, 
> SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, 
> SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, 
> SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-02-28 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889400#comment-15889400
 ] 

Amrit Sarkar edited comment on SOLR-9530 at 3/1/17 2:54 AM:


Considering Noble's and Ishan's suggestions, cooked up a new patch with the 
following:

1. No solrconfig parameter(s) required for this URP now.

2. The URP will take inline parameters exactly as Noble mentioned:
{code}processor=Atomic_newfield=add=set_i=inc{code}

3. The URP accepts both atomic and conventional updates as incoming documents.
   a. For atomic updates, the atomic operation in the incoming doc must match 
the parameters specified in the processor call.
   e.g. {"id":"1","title":{"set":"A"}}  ||  processor=Atomic=set

4. After the conversion to atomic-style, the latest _version_ is added to the 
updated doc. If _version_ is not present, the doc is sent as it is.

5. If the update hits a version conflict, retry by fetching the latest 
_version_ from the index and updating the SolrInputDoc. The maximum number of 
retries is hardcoded to 5.

6. If the parameters are not sufficient to convert the incoming document to 
atomic-style, abort the update.
   e.g. {"id":"1","title":"A"}  ||  processor=Atomic=set
There is no point in sending this document for update via the URP.
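
The conversion rules in points 3 and 6 can be sketched roughly as follows. This is only an illustration in Python, not the code from the patch: `to_atomic`, `field_ops` (a mapping from field name to atomic operation) and `InsufficientParamsError` are hypothetical names. The sketch also assumes that "parameters are not sufficient" means some field in the document has no operation supplied for it, and it treats any dict value as an already atomic-style update.

```python
class InsufficientParamsError(ValueError):
    """Hypothetical error: the document cannot be converted to atomic-style."""


def to_atomic(doc, field_ops, id_field="id"):
    """Convert a conventional document into an atomic-style update.

    field_ops maps a field name to an atomic operation such as
    "set", "add" or "inc" (an assumed shape for the inline parameters).
    """
    atomic = {}
    for field, value in doc.items():
        if field in (id_field, "_version_"):
            # The unique key and _version_ are passed through unchanged.
            atomic[field] = value
            continue
        op = field_ops.get(field)
        if op is None:
            # Point 6: no operation supplied for this field, so abort.
            raise InsufficientParamsError(
                f"no atomic operation given for field {field!r}")
        if isinstance(value, dict):
            # Point 3a: already atomic-style; the operation must match.
            if set(value) != {op}:
                raise InsufficientParamsError(
                    f"field {field!r} uses {sorted(value)}, expected {op!r}")
            atomic[field] = value
        else:
            # Conventional value: wrap it in the requested operation.
            atomic[field] = {op: value}
    return atomic


if __name__ == "__main__":
    print(to_atomic({"id": "1", "title": "A"}, {"title": "set"}))
```

With `{"title": "set"}` as the operations, both `{"id":"1","title":"A"}` and `{"id":"1","title":{"set":"A"}}` come out as the same atomic-style document, while a document whose fields are not covered by the operations raises the error instead of being sent on.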

{noformat}
new file:   
solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateProcessorFactory.java
new file:   
solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateProcessorFactoryTest.java
{noformat}

Tried to write a test case for multiple threads executing URP simultaneously, 
but was not able to replicate the scenario exactly. The test-method is 
commented out in the patch.


was (Author: sarkaramr...@gmail.com):
Considering Noble's and Ishan's suggestions, cooked up a new patch with the 
following:

1. No solrconfig parameter(s) required for this URP now.

2. The URP will take inline parameters exactly as Noble mentioned:
{code}processor=Atomic_newfield=add=set_i=inc{code}

3. Both atomic and conventional updates as incoming documents to the URP are 
allowed.
   a. for atomic updates, the atomic operation in incoming doc should match 
with the parameters specified in processor call.
   e.g. {"id":"1","title":{"set":"A"}}  |  processor=Atomic=set

4. After the conversion to atomic-style, latest _version_ will be added in the 
updated doc. If _version_, not present, send as it is.

5. if the update faces version conflict, retry by fetching latest _version_ 
from index, updating the SolrInputDoc. Maximum retries set to 5, hardcoded.

6. If the parameters are not sufficient to convert incoming document to 
atomic-style, abort the update.
e.g. {"id":"1","title":"A"} | processor=Atomic=set
there is no point sending this document for update via URP

{noformat}
new file:   
solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateProcessorFactory.java
new file:   
solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateProcessorFactoryTest.java
{noformat}

Tried to write a test case for multiple threads executing URP simultaneously, 
but was not able to replicate the scenario exactly. The test-method is 
commented out in the patch.

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, 
> SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> 
>   
> add
>   
>   
>   
> 
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.






[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-02-27 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885992#comment-15885992
 ] 

Amrit Sarkar edited comment on SOLR-9530 at 2/27/17 3:44 PM:
-

As discussed, the system will handle race conditions gracefully. The URP will 
fetch the _version_ before sending the appropriate atomic operation using 
optimistic concurrency. If the request fails, it will retry with the updated 
_version_.
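
A rough sketch of that fetch-and-retry loop, in Python for brevity. It is not the patch code: `update_with_retry`, `fetch_version`, `send_update` and `VersionConflict` are hypothetical names standing in for whatever the URP actually calls, and the cap on attempts is a parameter chosen here for illustration.

```python
class VersionConflict(Exception):
    """Hypothetical stand-in for a version conflict reported by the index."""


def update_with_retry(doc, fetch_version, send_update, max_attempts=5):
    """Send an atomic update, refreshing _version_ after each conflict.

    fetch_version(doc_id) returns the latest version in the index.
    send_update(doc) raises VersionConflict if the version is stale.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        # Stamp a copy of the doc with the latest known version.
        candidate = dict(doc, _version_=fetch_version(doc["id"]))
        try:
            send_update(candidate)
            return attempt
        except VersionConflict:
            # Another writer got in first; loop and fetch the version again.
            continue
    raise VersionConflict(
        f"gave up on doc {doc['id']!r} after {max_attempts} attempts")
```

The point of re-fetching inside the loop is that each attempt is checked against the version the index holds at that moment, so a concurrent writer causes a retry rather than a silently lost update.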

Working on the latest developments.


was (Author: sarkaramr...@gmail.com):
As discussed, the system will handle race conditions gracefully. The URP will 
fetch the _version_ before sending the appropriate atomic operation using 
optimistic concurrency. if the request fails , it with retry with updated 
{code}_version_ {code}.

Working on the latest developments.

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> 
>   
> add
>   
>   
>   
> 
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.






[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-02-23 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880169#comment-15880169
 ] 

Amrit Sarkar edited comment on SOLR-9530 at 2/23/17 9:29 AM:
-

Ishan and Noble, the comments are spot-on and thank you for correcting me.

I will update the description so that this URP does not touch solrconfig and 
uses in-line parameters as Noble suggested; this makes much more sense. I will 
also add suitable examples of what we are trying to achieve here.

Regarding the optimistic concurrency, I will add the *\_version\_* field in the 
incoming document, which will make sure the second request fails if two updates 
for the same doc are received via different threads. My question is how to deal 
with the failure. Noble suggested retrying so that the doc gets ingested (with 
the latest version info) and the updates are reflected. Ishan suggested just 
throwing _409 version conflict for the doc_ and letting the user decide whether 
to ingest those particular doc-ids again.

Again, I am thankful to both of you for the pointers.


was (Author: sarkaramr...@gmail.com):
Ishan and Noble, the comments are spot-on and thank you for correcting me out.

I will update the description on not touching solrconfig for this URP and use 
in-line parameters like Noble mentioned, this makes much more sense. Also 
suitable examples on what we are trying to achieve here.

Regarding the optimistic concurrency, I will add the *\_version\_* field in the 
incoming document which will make sure the second request fails if two updates 
for a same doc is received via different threads. My doubt is how to deal with 
the failure, Noble mentioned to retry so that it gets ingested(with latest 
version info) and the updates gets reflected. Ishan mentioned just throw _409 
version conflict for the doc_ and let the user decide whether to ingest again 
or not for those particular doc-ids.

Again, I am thankful to both of you for the pointers.

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> 
>   
> add
>   
>   
>   
> 
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.






[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-02-22 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880016#comment-15880016
 ] 

Noble Paul edited comment on SOLR-9530 at 2/23/17 7:07 AM:
---

Let's get rid of any URP configuration from {{solrconfig.xml}}. Let's move 
everything to parameters and define what is required. The problem is that the 
config API does not support URP chains, and there is no plan for it to do so. 
So, let's keep it as simple parameters.

Accept params as follows and nuke all the required configuration:
{code}
processor=Atomic_newfield=add=set_i=inc
{code}


was (Author: noble.paul):
Let's get rid of any URP configuration from {{solrconfig.xml}}. Let's move 
everything to parameters and define what is required.  The problem is , the 
config API does not support URP chain and it does not plan to do so. So, let's 
keep it is simple parameters

accept params as follows and nuke all the configuration required
{code}
Atomic.my_newfield=add=set_i=inc
{code}

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> 
>   
> add
>   
>   
>   
> 
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.






[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-02-22 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879998#comment-15879998
 ] 

Ishan Chattopadhyaya edited comment on SOLR-9530 at 2/23/17 7:04 AM:
-

bq. In case of multiple threads try to update the incoming doc to atomic-type 
update doc, all the threads will end up forming same atomic-type update doc (as 
same set of operations will be performed by 'SET' field).
The problem is with "inc" operations. When two clients see the value to be, say 
100, and want to increase by 50, they can supply the document version along 
with "inc":50. One of them would be executed first, and the second one would be 
rejected since the document version is no longer the same as what this client 
saw. Without optimistic concurrency, the value would end up being 200, but the 
intended value was 150.
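
The scenario above can be played out with a small in-memory model. This is only a sketch in Python: `VersionedCounter`, `VersionConflict` and `run_two_clients` are made-up names, and the version check is a simplified stand-in for Solr's optimistic concurrency on `_version_`.

```python
class VersionConflict(Exception):
    """Raised when the supplied version is not the current one."""


class VersionedCounter:
    """A single numeric field guarded by a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 1

    def inc(self, amount, expected_version=None):
        # With a version supplied, reject the update if the doc has changed.
        if expected_version is not None and expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, found {self.version}")
        self.value += amount
        self.version += 1


def run_two_clients(use_versions):
    """Two clients both read the value 100 and then each send inc:50."""
    doc = VersionedCounter(100)
    seen_by_a = doc.version  # both clients read before either one writes
    seen_by_b = doc.version
    rejected = 0
    for seen in (seen_by_a, seen_by_b):
        try:
            doc.inc(50, expected_version=seen if use_versions else None)
        except VersionConflict:
            rejected += 1
    return doc.value, rejected


if __name__ == "__main__":
    print(run_two_clients(use_versions=True))
    print(run_two_clients(use_versions=False))
```

With versions supplied, the first "inc" is applied and the second is rejected, leaving 150; without them both are applied and the value becomes 200.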

Also, do consider the cases when one client is indexing without this URP, but 
another client is using this URP, both in parallel.


was (Author: ichattopadhyaya):
bq. In case of multiple threads try to update the incoming doc to atomic-type 
update doc, all the threads will end up forming same atomic-type update doc (as 
same set of operations will be performed by 'SET' field).
The problem is with "inc" operations. When two clients see the value to be, say 
100, and want to increase by 50, they can supply the document version along 
with "inc":50. One of them would be executed first, and the second one would be 
rejected since the document version is no longer the same as what this client 
saw.

Also, do consider the cases when one client is indexing without this URP, but 
another client is using this URP, both in parallel. 

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> 
>   
> add
>   
>   
>   
> 
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.






[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-02-14 Thread AMRIT SARKAR (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867142#comment-15867142
 ] 

AMRIT SARKAR edited comment on SOLR-9530 at 2/15/17 3:41 AM:
-

[~noble.paul] [~ichattopadhyaya] Varun,
thank you for looking into the patch.

[~noble.paul], the patch is not thread-safe, as the processor is 
thread-specific and independent of other threads. This processor essentially 
does an in-place conversion of documents (more like a plugin), and the 
converted documents are ultimately passed to DistributedUpdateProcessor (which 
is thread-safe), where the actual "atomic" update to the documents in the index 
takes place. 

If multiple threads try to update the same document in parallel via 
AtomicUpdateProcessor, that case is similar to multiple threads carrying out 
"atomic-style" updates of the same document, which already happens in our 
latest Solr. DistributedUpdateProcessor, being thread-safe, handles the 
resources well and doesn't let the versions conflict with each other.


was (Author: sarkaramr...@gmail.com):
[~noble.paul] [~ichattopadhyaya] Varun,
thank you for looking into the patch.

[~noble.paul], the patch is not thread safe as the processor is thread-specific 
and is mutually independent of other threads. This processor is somehow doing 
in-place conversion of documents (more like a plugin) and will be passed 
ultimately to DistributedUpdateProcessor (which is thread-safe), where the 
actual "atomic" update to the documents in index takes place. 

If multiple threads try to update the same document in parallel via 
AtomicUpdateProcessor, that case will be similar to multiple threads carrying 
"atomic-style" update of same document, which is already happening in our 
latest Solr. DistributedUpdateProcessor, being thread-safe, handles the 
resources well and don't let the versions conflict each other.

Let me know if I am missing out anything in this regard.

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> 
>   
> add
>   
>   
>   
> 
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.






[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-02-14 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865992#comment-15865992
 ] 

Noble Paul edited comment on SOLR-9530 at 2/14/17 4:23 PM:
---

IIUC, you can use this URP and keep calling ADD instead of SET and achieve the 
same?


was (Author: noble.paul):
IIUC , you can use this URP and keep calling ADD instead of UPDATE and achieve 
the same ?

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> 
>   
> add
>   
>   
>   
> 
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.






[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-02-14 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865923#comment-15865923
 ] 

Ishan Chattopadhyaya edited comment on SOLR-9530 at 2/14/17 3:16 PM:
-

As far as I understand, this update processor is only for updates (i.e. when 
the document already exists). I see this as useful in cases where we're 
ingesting CSV files with disjoint information about a document in different 
files. As an example:

||id||country||
|1| Japan |
|2| Russia |

||id||capital||
|1| Tokyo |
|2| Moscow |

Both need to be ingested, and if both are ingested through this Update 
Processor, we would end up with 2 documents with 3 fields each (id, country, 
capital).
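
As a rough illustration of the expected outcome (plain Python, not Solr code): `merge_by_id` is a hypothetical helper, the index is modelled as a dict keyed by id, and each non-id column is applied the way an atomic "set" would be.

```python
def merge_by_id(index, rows, id_field="id"):
    """Merge rows into an in-memory index keyed by the id field.

    Each non-id column is written onto the existing document for that id,
    which is the effect an atomic "set" would have.
    """
    for row in rows:
        doc = index.setdefault(row[id_field], {id_field: row[id_field]})
        for field, value in row.items():
            if field != id_field:
                doc[field] = value
    return index


countries = [{"id": "1", "country": "Japan"},
             {"id": "2", "country": "Russia"}]
capitals = [{"id": "1", "capital": "Tokyo"},
            {"id": "2", "capital": "Moscow"}]

index = {}
merge_by_id(index, countries)
merge_by_id(index, capitals)

if __name__ == "__main__":
    for doc in index.values():
        print(doc)
```

After both files are merged, the index holds two documents, each with the fields id, country and capital.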

Did I understand the motivation correctly?


was (Author: ichattopadhyaya):
As far as I understand, this update processor is only for updates (if document 
pre-exists). I see this as useful in cases where we're ingesting CSV files with 
disjoint information about a document in different files. As an example:

||id||country||
|1| Japan |
|2| Russia |

||id||capital||
|1| Tokyo |
|2| Moscow |

Needs to be both ingested, and hence the if both these are ingested through 
this Update Processor, we would end up with a document with 3 fields (id, 
country, capital).

Did I understand the motivation correctly?

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> 
>   
> add
>   
>   
>   
> 
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.






[jira] [Comment Edited] (SOLR-9530) Add an Atomic Update Processor

2017-02-13 Thread AMRIT SARKAR (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865096#comment-15865096
 ] 

AMRIT SARKAR edited comment on SOLR-9530 at 2/14/17 5:36 AM:
-

Hi Varun, Alexandre, Noble,

SOLR-9530.patch uploaded for a new update processor, AtomicUpdateProcessor, 
which will accept conventional key-value update documents and convert them 
into atomic update documents for the fields specified in the processor 
definition. Fields which are not specified in the processor parameters will be 
updated in the conventional manner.

Files specified in the patch:
1. AtomicUpdateProcessorFactory.java
2. AtomicUpdateProcessorFactoryTest.java (test class for 
AtomicUpdateProcessorFactory)
3. solrconfig-atomic-update-processor.xml (sample solrconfig for 
AtomicUpdateProcessorFactoryTest test cases)

As Alexandre mentioned, it will work as a standalone processor doing the 
conversion, and the updated document will be passed on to the next processor 
defined.
Noble, this patch right now doesn't support accepting request params, as it is 
difficult to assign an atomic operation to the respective field.

I request you to review the patch; your feedback will be deeply 
appreciated.

Thanks
Amrit Sarkar


was (Author: sarkaramr...@gmail.com):
Hi Varun, Alexandre, Noble,

SOLR-9530.patch uploaded for a new update processor - AtomicUpdateProcessor 
which which will accept conventional key-value update document and convert them 
into atomic update document for the fields specified in the processor 
definition. Fields which are not specified in the processor parameters will be 
updated in conventional manner.

Files specified in the patch:
1. AtomicUpdateProcessorFactory
2. AtomicUpdateProcessorFactoryTest (test class for 
AtomicUpdateProcessorFactory)
3. solrconfig-atomic-update-processor.xml (sample solrconfig for 
AtomicUpdateProcessorFactoryTest test cases)

As Alexandre mentioned, it will work as a standalone processor doing the 
conversion and updated document will passed onto next processor defined.
Noble, this patch right now doesn't support accepting request params as it is 
difficult to assign atomic operation to the respective field.

I will request you to review the patch and your feedback will be deeply 
appreciated.

Thanks
Amrit Sarkar

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> 
>   
> add
>   
>   
>   
> 
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.


