[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-30 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15536574#comment-15536574
 ] 

Alexandre Rafalovitch commented on SOLR-9493:
-

Can this case be closed now? It is not a bug and there is no next action on it.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when CloudSolrClient is used, the following happens (found through 
> debugging):
> Using ZK and some logic, URL list of replicas is constructed that looks like 
> this:{code}[http://replica_1_host:port/solr/collection_name/,
>  http://replica_2_host:port/solr/collection_name/,
>  http://replica_3_host:port/solr/collection_name/]{code}
> This code is called:{code}LBHttpSolrClient.Req req = new 
> LBHttpSolrClient.Req(request, theUrlList);
> LBHttpSolrClient.Rsp rsp = lbClient.request(req);
> return rsp.getResponse(

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-21 Thread Yury Kartsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510708#comment-15510708
 ] 

Yury Kartsev commented on SOLR-9493:


Thank you. Then definitely it's better to generate it on the client side before 
indexing. Interesting, coz now I don't see any point at all in automatically 
generating a uniqueKey field in SOLR... unless it's a single instance/1 shard 
and nobody cares about ID on the client side.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when CloudSolrClient is used, the following happens (found through 
> debugging):
> Using ZK and some logic, URL list of replicas is constructed that looks like 
> this:{code}[http://replica_1_host:port/solr/collection_name/,
>  http://replica_2_host:port/solr/collection_name/,
>  http://replica_3_host:port/solr/collection_name/]{co

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-21 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510499#comment-15510499
 ] 

Shawn Heisey commented on SOLR-9493:


bq. return UUID.randomUUID().toString().toLowerCase(Locale.ROOT);

Yes, I am sure.  This one line is the entirety of the code in 
UUIDUpdateProcessorFactory that sets the default value for the field:

{code:java}
  return UUID.randomUUID().toString().toLowerCase(Locale.ROOT);
{code}

This is the function being used there:

https://docs.oracle.com/javase/8/docs/api/java/util/UUID.html#randomUUID--

Although generating an already-used ID would be rare just because of the sheer 
size of the number (128 bits), there's nothing in the code to prevent it.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when CloudSolrClient is used, the following happens (found thro

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-21 Thread Yury Kartsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510395#comment-15510395
 ] 

Yury Kartsev commented on SOLR-9493:


{quote}Solr can generate a UUID value, but it's essentially just a random 
number, and each value has no connection to the other data in the indexed 
document at all. {quote}
That's what I thought is false. That was the whole reason of why I wanted SOLR 
to generate it - to avoid that rare case when UUID matches the existing one. I 
thought SOLR uses some kind of algorithm that somehow eliminates such a case. I 
did not want to generate it on client side solely because of that reason - 
being afraid that one day it will generate an existing one. But if you're 
saying that that's what SOLR may do, then it make no difference form this point 
of view... Are you sure about "has no connection to the other data in the 
indexed document at all"? I.e. doesn't it have "counter" or "sequence-like" 
part in UUID generation algorithm?

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases p

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-21 Thread Yury Kartsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510369#comment-15510369
 ] 

Yury Kartsev commented on SOLR-9493:


{quote}SolrJ is designed to send a doc to the right leader by hashing the 
id.{quote}
Very interesting... Maybe I really missed a point. Just read more about it - I 
really missed this chapter while was reading SOLR In Action book. Thanks for 
pointing it out. Although in this case the whole ability of SOLR to generate 
the uniqueKey now sounds surprising just because of what you said...

Luckily currently I have only one shard (with 3 replicas) - that's what was 
figured out to be the best for our case. But it's a very good point to consider 
in the future when we have more than one shard. Thank you.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when Clou

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-21 Thread Yury Kartsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510370#comment-15510370
 ] 

Yury Kartsev commented on SOLR-9493:


{quote}SolrJ is designed to send a doc to the right leader by hashing the 
id.{quote}
Very interesting... Maybe I really missed a point. Just read more about it - I 
really missed this chapter while was reading SOLR In Action book. Thanks for 
pointing it out. Although in this case the whole ability of SOLR to generate 
the uniqueKey now sounds surprising just because of what you said...

Luckily currently I have only one shard (with 3 replicas) - that's what was 
figured out to be the best for our case. But it's a very good point to consider 
in the future when we have more than one shard. Thank you.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when Clou

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-18 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15501081#comment-15501081
 ] 

Shawn Heisey commented on SOLR-9493:


The UUID processor is mostly useful for Cloud deployments.  For standalone 
indexes, other means of generating random identifiers will suffice, but the 
Update Processor is required for distributed SolrCloud indexes, to guarantee 
that the UUID is generated BEFORE the decision engine forwards the document to 
the final node(s) for indexing.

I think we could likely add a configurable check to the UUID processor, such 
that if the field being generated is the uniqueKey field, the generated could 
be guaranteed within the local Lucene index.  The possible wrinkle is that I do 
not know whether looking up a uniqueKey in an entire SolrCloud collection is 
something an update processor can readily do -- and this would be required.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
>  

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-18 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15501057#comment-15501057
 ] 

Shawn Heisey commented on SOLR-9493:


I'd go even further and say that a uniqueKey field that exists both inside and 
outside Solr is particularly important in the *design* of a Solr index.  A 
unique identifier that a user can obtain from somewhere else and use to look up 
a specific document is extremely useful.

Solr can generate a UUID value, but it's essentially just a random number, and 
each value has no connection to the other data in the indexed document at all.  
When the generated field is used as uniqueKey, the UUIDUpdateProcessorFactory 
code doesn't have a check to make sure that each random value is unique within 
the index -- so two documents could end up with the same UUID value, which 
would cause the second document to overwrite the first.  Because the UUID 
represents a VERY large number, the chance of that happening is EXTREMELY 
small, but it IS possible.

Solr uses the uniqueKey field to locate a previous version of a document so 
that the old one is deleted when you index the same document again.  If that 
identifier is generated by Solr, then you lose the ability to index documents 
and have them automatically replace previous versions.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/coll

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-17 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498475#comment-15498475
 ] 

Noble Paul commented on SOLR-9493:
--

bq.but just because as you said "ID is very important in SolrCloud", I wanted 
SOLR itself to generate that ID

You missed the point.
SolrJ is designed to send a doc to the right leader by hashing the id. If Solr 
is assigns the id, SolrJ has no clue where to send this to. So, the doc is sent 
to a random node. The node assigns the id and it realizes that the doc belongs 
to another shard and it is forwarded there. Essentially, you lost the 
efficiency of using SolrJ by asking Solr to assign an ID.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when CloudSolrClient is used, the following happens (found through 
> debugging):
> Using ZK and some logic, URL list 

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-14 Thread Yury Kartsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490817#comment-15490817
 ] 

Yury Kartsev commented on SOLR-9493:


I was thinking about it, but just because as you said "ID is very important in 
SolrCloud", I wanted SOLR itself to generate that ID (as opposed to the client, 
especially considering that there may be more than one client nodes).

I agree about isolation. Next time will come up with a small sample test to 
make it easier for you guys.

Currently I think the issue is solved (by using 
[IgnoreFieldUpdateProcessorFactory|http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html]),
 but I would like to close that small question I've asked above: what if some 
day I'll need to index documents with custom IDs + having an ability to 
generate it by SOLR as well? Can IgnoreFieldUpdateProcessorFactory be used with 
some condition like "remove only if value = null"? Or the only way is to simply 
generate it in the client? It's a minor question that does not really decide 
anything at this moment, I just want to kinda improve my SOLR knowledge in this 
topic for the future.

Thank you.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replic

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-14 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490252#comment-15490252
 ] 

Noble Paul commented on SOLR-9493:
--

Isn't it much better to add a single line of code to your client
{code:java}
UUID.randomUUID().toString().toLowerCase(Locale.ROOT);
{code}

In solrcloud id field is very important. The SolrJ client sends the right 
server by hashing the id.


When you have an error like this , if you can just isolate it into a small 
sample code that fails, it'll be much more easy for us to write a testcase and 
reproduce it.  

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when CloudSolrClient is used, the following happens (found through 
> debugging):
> Using ZK and some logic, URL list of replicas is constructed that looks like 
> this:{code}[http://replica_1_host:port/solr

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-13 Thread Yury Kartsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487901#comment-15487901
 ] 

Yury Kartsev commented on SOLR-9493:


Although I have tried to use 
[IgnoreFieldUpdateProcessorFactory|http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html]
 here for "id" field instead and *can confirm it works* for both cases (beans 
and documents).

For now it would work fine because I don't plan to add documents with external 
ID. Although what if some day I'll need it? Can 
IgnoreFieldUpdateProcessorFactory be used with some condition like "remove only 
if value = null"? Or maybe there is another alternative?

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when CloudSolrClient is used, the following happens (found through 
> d

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-13 Thread Yury Kartsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487854#comment-15487854
 ] 

Yury Kartsev commented on SOLR-9493:


Tried it today like this:{code}  
  
  
  id
  
  
  {code}

Did not change a bit. Same thing in both cases (addBean as well as 
add(SolrInputDocument) with a null value). The description fir this 
UpdateProcessorFactory states that it "Removes any values found which are 
CharSequence with a length of 0. (ie: empty strings)". I am wondering if it 
really removes a field itself if it's NULL. Looks like 
UUIDUpdateProcessorFactory needs this field to be absent in order to generate 
UUID...

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when CloudSolrClient is used, the following happens (found through 
> debugging):
> Using ZK and some

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-12 Thread Yury Kartsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485887#comment-15485887
 ] 

Yury Kartsev commented on SOLR-9493:


Oh, I see, it's a feature, not a bug... I.e. if I send null value it's 
considered to be a NULL value (as opposed to 'nothing') and will be stored like 
this... Very good to know. Something I've never encountered despite being 
familiar with SOLR to a degree of writing custom similarities :) Thank you, 
I'll try it tomorrow and let you know here.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when CloudSolrClient is used, the following happens (found through 
> debugging):
> Using ZK and some logic, URL list of replicas is constructed that looks like 
> this:{code}[http://replica_1_host:port/solr/collection_name/,
>  http://replica_2_host:port/solr/collecti

[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

2016-09-12 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485792#comment-15485792
 ] 

Alexandre Rafalovitch commented on SOLR-9493:
-

Try adding 
[RemoveBlankFieldUpdateProcessorFactory|http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html]
 into the chain before your UUIDUpdateProcessorFactory. 

This should remove the empty field and then key generator can do its job.

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> --
>
> Key: SOLR-9493
> URL: https://issues.apache.org/jira/browse/SOLR-9493
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yury Kartsev
> Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue when the same application code (using SolrJ) fails 
> indexing a document without a unique key (should be auto-generated by SOLR) 
> in SolrCloud and succeeds indexing it in standalone SOLR instance (or even in 
> cloud mode, but from web interface of one of the replicas). Difference is 
> obviously only between clients (CloudSolrClient vs HttpSolrClient) and SOLR 
> URLs (Zokeeper hostname+port vs standalone SOLR instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to a conclusion that the difference in failing and succeeding calls 
> is simply content type of the POSTing requests. Local proxy clearly shows 
> that the request fails if content is sent as "application/javabin" (see 
> attached screenshot with sensitive data removed) and succeeds if content sent 
> as "application/xml; charset=UTF-8"  (see attached screenshot with sensitive 
> data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> 
> Copying whole description and investigation here as well:
> 
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code} name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> 
> ...
> id{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}
> 
> id
> 
> 
> {code}And made it the default for the 
> UpdateRequestHandler:{code}
>  
>   uuid
>  
> {code}
> Adding new documents with null/absent id works fine as from web-interface of 
> one of the replicas, as when using SOLR in standalone mode (non-cloud) from 
> my application. Although when only I'm using SolrCloud and add document from 
> my application (using CloudSolrClient from SolrJ) it fails with 
> "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
> "add": {
> "doc": {
>  .
> },
> "boost": 1.0,
> "overwrite": true,
> "commitWithin": 1000
> }
> }{code}
> In case when CloudSolrClient is used, the following happens (found through 
> debugging):
> Using ZK and some logic, URL list of replicas is constructed that looks like 
> this:{code}[http://replica_1_host:port/solr/collection_name/,
>  http://replica_2_host:port/solr/collection_name/,
>  http://rep