[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-10-09 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642871#comment-16642871
 ] 

Noble Paul commented on SOLR-12798:
---

I've created a separate issue to track this: SOLR-12843.

> Structural changes in SolrJ since version 7.0.0 have effectively disabled 
> multipart post
> 
>
> Key: SOLR-12798
> URL: https://issues.apache.org/jira/browse/SOLR-12798
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.4
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Major
> Attachments: HOT Balloon Trip_Ultra HD.jpg, 
> SOLR-12798-approach.patch, SOLR-12798-reproducer.patch, 
> SOLR-12798-workaround.patch, SOLR-12798.patch, SOLR-12798.patch, 
> SOLR-12798.patch, no params in url.png, solr-update-request.txt
>
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from 
> SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to 
> SolrJ's HttpSolrClient class that seemingly disable any use of multipart 
> post.  This is critical because ManifoldCF's documents often contain metadata 
> in excess of 4K that therefore cannot be stuffed into a URL.
> The changes in question seem to have been made by Noble Paul on 
> 10/31/2017, with the introduction of the RequestWriter mechanism.  Basically, 
> if a request has a RequestWriter, it is used exclusively to write the 
> request, and that overrides the stream mechanism completely.  I haven't 
> chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of 
> ContentStreamUpdateRequests for all posts meant for Solr Cell, and the 
> creation of UpdateRequests for posts not meant for Solr Cell (as well as for 
> delete and commit requests).  For our release cycle that is taking place 
> right now, we're shipping a modified version of HttpSolrClient that ignores 
> the RequestWriter when dealing with ContentStreamUpdateRequests.  We 
> apparently cannot use multipart for all requests because, when we do, we 
> get "pfountz Should not get here!" errors on the Solr side, which 
> generate HTTP error code 500 responses.  That should not happen either, in my 
> opinion.
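
The size constraint described above can be illustrated with a minimal sketch in plain Java (11+). Everything here is an assumption made for illustration: the class and method names are hypothetical and are not part of SolrJ or ManifoldCF, and the 4096-byte budget simply mirrors the "4K" figure in the description. It form-encodes a parameter map and reports whether the result is too large to carry in the URL, in which case it would have to travel in the request body instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustration only: none of these names come from SolrJ or ManifoldCF. */
class ParamPlacement {
    // Assumed budget, taken from the "4K" figure in the description.
    static final int MAX_URL_PARAM_BYTES = 4096;

    /** Encodes parameters as application/x-www-form-urlencoded text. */
    public static String encode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    /** True when the encoded parameters exceed the assumed URL budget. */
    public static boolean mustGoInBody(Map<String, String> params) {
        return encode(params).getBytes(StandardCharsets.UTF_8).length > MAX_URL_PARAM_BYTES;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "doc-1");
        params.put("literal.acl", "x".repeat(5000)); // oversized metadata value
        System.out.println(mustGoInBody(params));
    }
}
```

A real client would also have to choose how the body is framed (form-encoded or multipart); the sketch only covers the size check.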



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-10-06 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640660#comment-16640660
 ] 

Noble Paul commented on SOLR-12798:
---

with a test case. Using the same contentstream to post multiple file types




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-10-05 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640511#comment-16640511
 ] 

Noble Paul commented on SOLR-12798:
---

We never supported binary payloads from solrJ. For this format, we only support 
binary




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-10-05 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640492#comment-16640492
 ] 

Lucene/Solr QA commented on SOLR-12798:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | compile | 1m 39s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 1m 36s | the patch passed |
| +1 | javac | 1m 36s | the patch passed |
| +1 | Release audit (RAT) | 1m 25s | the patch passed |
| +1 | Check forbidden APIs | 1m 20s | the patch passed |
| +1 | Validate source patterns | 1m 20s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 20m 41s | core in the patch failed. |
| -1 | unit | 4m 57s | solrj in the patch failed. |
| | | 30m 59s | |

|| Reason || Tests ||
| Failed junit tests | solr.schema.TestManagedSchemaAPI |
|   | solr.handler.TestRestoreCore |
|   | solr.cloud.LeaderFailureAfterFreshStartTest |
|   | solr.TestHighlightDedupGrouping |
|   | solr.cloud.TestCloudRecovery |
|   | solr.cloud.autoscaling.IndexSizeTriggerTest |
|   | solr.cloud.TestPullReplicaErrorHandling |
|   | solr.cloud.TriLevelCompositeIdRoutingTest |
|   | solr.cloud.api.collections.CustomCollectionTest |
|   | solr.handler.component.DistributedFacetPivotSmallTest |
|   | solr.schema.TestCloudSchemaless |
|   | solr.cloud.autoscaling.MetricTriggerIntegrationTest |
|   | solr.core.TestDynamicURP |
|   | solr.search.MergeStrategyTest |
|   | solr.handler.component.DistributedFacetPivotWhiteBoxTest |
|   | solr.cloud.ZkFailoverTest |
|   | solr.util.TestSolrCLIRunExample |
|   | solr.cloud.TestCloudPseudoReturnFields |
|   | solr.cloud.cdcr.CdcrWithNodesRestartsTest |
|   | solr.handler.component.DistributedDebugComponentTest |
|   | solr.cloud.LIRRollingUpdatesTest |
|   | solr.response.transform.TestSubQueryTransformerDistrib |
|   | solr.schema.TestBinaryField |
|   | solr.handler.component.DistributedFacetPivotLargeTest |
|   | solr.cloud.TestDownShardTolerantSearch |
|   | solr.cloud.MoveReplicaHDFSFailoverTest |
|   | solr.handler.admin.CoreAdminHandlerTest |
|   | solr.cloud.api.collections.TestCollectionAPI |
|   | solr.TestDistributedMissingSort |
|   | solr.cloud.BasicDistributedZk2Test |
|   | solr.cloud.TestShortCircuitedRequests |
|   | solr.cloud.SolrCloudExampleTest |
|   | solr.core.OpenCloseCoreStressTest |
|   | solr.core.TestDynamicLoading |
|   | solr.cloud.LeaderTragicEventTest |
|   | solr.cloud.MigrateRouteKeyTest |
|   | solr.handler.component.DistributedExpandComponentTest |
|   | solr.TestTolerantSearch |
|   | solr.handler.component.TestDistributedStatsComponentCardinality |
|   | solr.security.BasicAuthIntegrationTest |
|   | solr.handler.TestReplicationHandlerBackup |
|   | solr.cloud.TestCloudSearcherWarming |
|   | solr.cloud.TestCloudPivotFacet |
|   | solr.cloud.RecoveryZkTest |
|   | solr.security.hadoop.TestSolrCloudWithHadoopAuthPlugin |
|   | solr.handler.component.DistributedFacetPivotLongTailTest |
|   | solr.cloud.MissingSegmentRecoveryTest |
|   | solr.search.join.TestCloudNestedDocsSort |
|   | solr.DistributedIntervalFacetingTest |
|   | solr.cloud.TestCryptoKeys |
|   | solr.core.BlobRepositoryCloudTest |
|   | solr.cloud.DistribDocExpirationUpdateProcessorTest |
|   | solr.cloud.TestTolerantUpdateProcessorCloud |
|   | solr.search.stats.TestDefaultStatsCache |
|   | solr.cloud.cdcr.CdcrVersionReplicationTest |
|   | solr.update.SolrIndexSplitterTest |
|   | solr.cloud.TestPullReplica |
|   | solr.request.TestRemoteStreaming |
|   | solr.cloud.TestLeaderElectionWithEmptyReplica |
|   | solr.cloud.autoscaling.SystemLogListenerTest |
|   | solr.metrics.reporters.solr.SolrCloudReportersTest |
|   | 

[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-10-02 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635135#comment-16635135
 ] 

Mikhail Khludnev commented on SOLR-12798:
-

[~noble.paul], how do you propose to pass binary payloads in JSON? I've found 
one SO thread discussing it, and it does not look promising. 




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-10-01 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634537#comment-16634537
 ] 

Mikhail Khludnev commented on SOLR-12798:
-

Is there a Solr Cell ticket to move from those fancy {{literal.foo}} params to 
the Solr docs/fields format? 




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-30 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633256#comment-16633256
 ] 

Karl Wright commented on SOLR-12798:


[~elyograg]

{quote}
I would suggest that you don't do this. At all. Tika is prone to OOM and JVM 
crashes, as Julien Massiera already noted.
{quote}

It's not a very good citizen running inside ManifoldCF either. We have the ability 
to use the external service version, but really that just offshores the problem. 
But I agree it's better to keep user-facing services alive if one can.

For backwards compatibility reasons, we will need to continue to support this 
mode of operation, but we'll recommend against it, and change our defaults 
accordingly as well.

FWIW, we've been steadily pushing tickets into the Tika queue and issues are 
getting addressed.  That's really the best long-term solution.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-29 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633252#comment-16633252
 ] 

Karl Wright commented on SOLR-12798:


[~mkhludnev] Ugly hack has been voted on and shipped.  Hopefully by next round 
(December) there's a better way though.





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-29 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633241#comment-16633241
 ] 

Mikhail Khludnev commented on SOLR-12798:
-

After thinking about it for some time, I agree with [~noble.paul]. Params are supposed to 
be meta info, which should be short and relatively static, whereas the payload is 
big and changes every time. ManifoldCF's current approach (even if we fix multipart) 
doesn't allow passing many docs with params attached to each of them. IMHO that 
points to a design flaw. 
[~daddywri], can you go ahead with the ugly workaround and later migrate to 
passing meta via fields?
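
As a rough sketch of what "passing meta via fields" could look like, the plain-Java helper below moves every `literal.<name>` parameter into a per-document field map keyed by `<name>`. The class and method names are hypothetical and not part of SolrJ or ManifoldCF, and single-valued parameters are assumed for brevity; a real migration would build Solr documents rather than plain maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: a hypothetical helper, not part of SolrJ or ManifoldCF. */
class LiteralParamsToFields {
    static final String PREFIX = "literal.";

    /** Copies each "literal.<name>" parameter into a field map keyed by <name>. */
    public static Map<String, String> toFields(Map<String, String> params) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                fields.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return fields; // non-literal params (e.g. commit flags) are left out
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.title", "Report");
        params.put("commit", "true");
        System.out.println(toFields(params));
    }
}
```

Because the metadata then lives with each document rather than with the request, several documents could be sent together, each carrying its own fields.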




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-29 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633140#comment-16633140
 ] 

Lucene/Solr QA commented on SOLR-12798:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | compile | 2m 9s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 2m 15s | the patch passed |
| +1 | javac | 2m 15s | the patch passed |
| +1 | Release audit (RAT) | 2m 15s | the patch passed |
| +1 | Check forbidden APIs | 2m 15s | the patch passed |
| +1 | Validate source patterns | 2m 15s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 6m 55s | solrj in the patch passed. |
| | | 16m 33s | |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12798 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941838/SOLR-12798.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 964cc88 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_172 |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/194/testReport/ |
| modules | C: solr/solrj U: solr/solrj |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/194/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632114#comment-16632114
 ] 

Jan Høydahl commented on SOLR-12798:


bq. should be what SolrJ *always* does when it's asked to do POST, so URL 
limits aren't exceeded no matter what gets thrown at it.

Sounds like a good way to ensure robustness. The /admin/metrics bug that 
recently surfaced would also benefit if we preferred POST over GET in more 
places.

> Structural changes in SolrJ since version 7.0.0 have effectively disabled 
> multipart post
> 
>
> Key: SOLR-12798
> URL: https://issues.apache.org/jira/browse/SOLR-12798
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.4
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Major
> Attachments: HOT Balloon Trip_Ultra HD.jpg, 
> SOLR-12798-approach.patch, SOLR-12798-reproducer.patch, 
> SOLR-12798-workaround.patch, no params in url.png, solr-update-request.txt
>
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from 
> SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to 
> SolrJ's HttpSolrClient class that seemingly disable any use of multipart 
> post.  This is critical because ManifoldCF's documents often contain metadata 
> in excess of 4K that therefore cannot be stuffed into a URL.
> The changes in question seem to have been performed by Paul Noble on 
> 10/31/2017, with the introduction of the RequestWriter mechanism.  Basically, 
> if a request has a RequestWriter, it is used exclusively to write the 
> request, and that overrides the stream mechanism completely.  I haven't 
> chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of 
> ContentStreamUpdateRequests for all posts meant for Solr Cell, and the 
> creation of UpdateRequests for posts not meant for Solr Cell (as well as for 
> delete and commit requests).  For our release cycle that is taking place 
> right now, we're shipping a modified version of HttpSolrClient that ignores 
> the RequestWriter when dealing with ContentStreamUpdateRequests.  We 
> apparently cannot use multipart for all requests because on the Solr side we 
> get "pfountz Should not get here!" errors on the Solr side when we do, which 
> generate HTTP error code 500 responses.  That should not happen either, in my 
> opinion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread Shawn Heisey (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631860#comment-16631860
 ] 

Shawn Heisey commented on SOLR-12798:
-

bq.  How do you suggest we handle binary data that is meant for SolrCell?

I would suggest that you don't do this.  At all.  Tika is prone to OOM and JVM 
crashes, as [~julienFL] already noted.  When this happens in SolrCell, Solr 
goes down too.  So it's strongly recommended for all users to never use 
SolrCell in production, which in my opinion means that MCF should not be using 
SolrCell.  Tika should be separate, so if it explodes, the Solr server keeps 
running.

That said... I think support for multi-part POST should be first class in 
SolrJ, and I would even say that sending separate parts for parameters and the 
actual body should be what SolrJ *always* does when it's asked to do POST, so 
URL limits aren't exceeded no matter what gets thrown at it.  And we need to 
make sure that multi-part handling on the server side is rock-solid.  (I'm not 
suggesting there are any problems there ... but if any are found, they need 
attention.)

It's probably a good idea to support multiple *data* streams as well in SolrJ.  
This would probably require some changes on the server side, and a separate 
Jira issue.

If MCF creates SolrInputDocument objects, it can put everything there.  MCF 
wouldn't need to be concerned about format (the JSON mentioned earlier), only 
one POST part is required, URL parameters are not needed, and the standard 
/update handler can be used, even without a change for this issue.
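
To illustrate the suggestion above of sending parameters and the actual body 
as separate POST parts, here is a minimal sketch that assembles a 
multipart/form-data body using only the JDK. It is not SolrJ code: the class 
name MultipartSketch, the part name "file", and the boundary handling are 
assumptions made for illustration, and no escaping of part names is attempted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: shows how parameters of any size can
// travel in the request body as their own parts, next to the binary payload.
final class MultipartSketch {

    static byte[] build(String boundary, Map<String, String> params,
                        String fileName, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // One part per parameter, so nothing has to be squeezed into the URL.
        for (Map.Entry<String, String> e : params.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            write(out, "Content-Disposition: form-data; name=\"" + e.getKey() + "\"\r\n\r\n");
            write(out, e.getValue() + "\r\n");
        }
        // One part for the binary document itself ("file" is an assumed part name).
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n");
        write(out, "Content-Type: application/octet-stream\r\n\r\n");
        out.write(payload, 0, payload.length);
        // Closing boundary terminates the multipart body.
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }
}
```

The request would carry the header Content-Type: multipart/form-data with the 
same boundary value. A real client would also have to pick a boundary that does 
not occur in the payload and stream large payloads rather than buffer them.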




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631644#comment-16631644
 ] 

Mikhail Khludnev commented on SOLR-12798:
-

[~noble.paul]
I'm not sure why we need to revamp the handlers that expect a content stream 
so that they can read doc fields. The other concern is that a request with a 
single content stream now works like a minefield: when one adds params that 
are too long, it blows up unexpectedly. Always stripping params from POST URLs 
makes it far more predictable for users.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631608#comment-16631608
 ] 

Julien Massiera commented on SOLR-12798:


Assume I have a PDF file that contains an image that can be OCRed. I have a 
process that sends the PDF to a Tika server, which extracts the metadata of 
the PDF file plus the text extracted from the image using Tesseract. At the 
end of the Tika job, the process retrieves two elements: a list of metadata as 
an ArrayList and a file containing the text extracted from the image inside 
the PDF file. Now, to the metadata list I add the ACLs of the PDF file (which 
are huge), and I need the metadata and the file to be sent as one document to 
Solr for indexing.
What are your recommendations, in terms of code, for doing this in the most 
efficient way (in terms of memory consumption and performance, of course) 
using SolrJ? And which handler would you use on the Solr side?
I will test it and see if I experience the URL limit issue.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631581#comment-16631581
 ] 

Jan Høydahl commented on SOLR-12798:


{quote}here you can see how ManifoldCF accomplish content stream blob with long 
params
{quote}
This code is for posting to ExtractingHandler, and contains a limited amount of 
literal metadata, unless the ACLs are huge, which I suppose they may very well 
be. And that would warrant the multi-part requirement in itself.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631565#comment-16631565
 ] 

Noble Paul commented on SOLR-12798:
---

[~mkhludnev] 
We will post the data as follows:
{code:java}
{
  "docs": [
    {"params": {"a": "b", "c": "d"},
     "payload": ""
    },
    {"params": {"p": "q", "r": "s"},
     "payload": ""
    }
  ]
}
{code}

On the server side, we unmarshal the params first and then read the payload 
stream.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631548#comment-16631548
 ] 

Mikhail Khludnev commented on SOLR-12798:
-

[~janhoy], here you can see how ManifoldCF sends a content stream blob with 
long params: 
https://github.com/apache/manifoldcf/blob/11f8021c22c7fc141d237970b713b197992b5921/connectors/solr/connector/src/main/java/org/apache/manifoldcf/agents/output/solr/HttpPoster.java#L1224
You can see the particular params attached. 
I'm just answering your question literally, regardless of my (lack of) 
understanding of, or opinion on, this design. 




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631520#comment-16631520
 ] 

Jan Høydahl commented on SOLR-12798:


{quote}How we can avoid multipart when we have one big chunk as a content 
stream and one chunk with huge params?
{quote}
I have still not seen the use case for this. Why would there be huge params 
when you post a binary content stream to SolrCell? The params would come from 
the metadata inside the binary docs, which are unpacked on the Solr server 
side. You could of course have large metadata about a PDF sitting in a 
database on the client and want to post that with the binary doc, but as I 
understand the use case for MCF, the huge metadata is parsed from the binary 
doc by Tika on the server side?




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631514#comment-16631514
 ] 

Mikhail Khludnev commented on SOLR-12798:
-

[~noble.paul] not sure I follow. How can we avoid multipart when we have one 
big chunk as a content stream and one chunk with huge params?




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631502#comment-16631502
 ] 

Noble Paul commented on SOLR-12798:
---

The ideal fix is to avoid multipart altogether because we support both ends of 
the communication. 





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-28 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631430#comment-16631430
 ] 

Mikhail Khludnev commented on SOLR-12798:
-

OK. It turns out that to trigger passing params as part of a multipart 
request, one needs to pass at least _two_ _named_ streams. See 
[^SOLR-12798-workaround.patch]. [~kwri...@metacarta.com], would you mind 
evaluating a quick workaround? After the binary payload is added as a content 
stream, can it also add a named, empty stream, as in the patch below? 
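{code}
// "up" is the update request that already carries the binary payload stream.
// The second stream is empty; it exists only so that two named streams are present.
up.addContentStream(new ContentStreamBase.StringStream("") {
  {
    setName("multipart trigger. SOLR-12798");
  }
});
{code}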
 
Regarding a more appropriate fix: should we always pass params as multipart 
with POST, or try to estimate their size and do so only for long ones?  
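
The second option in the question above (estimate the size of the params and 
move only long ones out of the URL) could look roughly like the sketch below. 
It uses only the JDK; the class name ParamPlacement and the 4096-byte limit are 
assumptions for illustration (the limit echoes the "4K" figure from the issue 
description and is not a SolrJ constant).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: decides whether request parameters
// are small enough to stay in the URL or should move into the POST body.
final class ParamPlacement {

    // Assumed limit for the encoded query string; real servers and proxies differ.
    static final int MAX_QUERY_CHARS = 4096;

    // Encodes the parameters the way they would appear after '?' in a URL.
    static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // True when the encoded parameters fit under the assumed URL limit.
    static boolean fitsInUrl(Map<String, String> params) {
        // The encoded form is pure ASCII, so characters and bytes coincide.
        return encode(params).length() <= MAX_QUERY_CHARS;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(ex);
        }
    }
}
```

The first option (always sending params in the body on POST) needs no such 
check and avoids a size-dependent change in behaviour, which is the 
predictability argument raised elsewhere in this thread.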




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631337#comment-16631337
 ] 

Noble Paul commented on SOLR-12798:
---


An ideal solution would be:
* Be able to construct a SolrInputDocument with a binary payload + metadata 
parameters for that doc
* When this is sent to Solr, SolrJ should send the payload+parameters in the 
body
* This ensures that the query string length is always constant
* This also helps in inter-node communication where the documents are sent 
between replicas

I'm not sure if we can achieve this without some changes on the server side 
too. Meanwhile, we may need a custom HttpSolrClient implementation that can do a 
multipart request.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Michael Schumann (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631211#comment-16631211
 ] 

Michael Schumann commented on SOLR-12798:
-

I wanted to chime in here because we have run into the problem of the body of 
large POST requests getting encoded in the URL in a different scenario and it 
would be nice if there was a solution for this. To work around the problem we 
have had to copy and modify Solr classes.

Our use case is not a common one: we sometimes make query requests to a custom 
handler with a very large number of integer values encoded into a 
RoaringBitMap. On the client side it is not a big problem: we created a 
subclass of {{HttpSolrClient.Builder}} that sets {{UseMultiPartPost}} to true. 
This is passed in to the {{LBHttpSolrClient}}, which in turn is passed into 
{{CloudSolrClient}}.

The problem that was harder to solve was in the {{HttpShardHandler}} on the 
Solr nodes, which ends up encoding the parameters in the URL. The workaround 
we came up with was to duplicate and modify {{HttpShardHandler}} so we could 
again set {{UseMultiPartPost}} to true. We also had to subclass 
{{HttpShardHandlerFactory}} and {{HttpSolrClient.Builder}}.

It would be great if there were a way to force both the requests on the SolrJ 
client side and the requests made between the nodes to use multipart.

 




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631149#comment-16631149
 ] 

Karl Wright commented on SOLR-12798:


[~janhoy]:

{quote}
That would be for case 1) where  you don't do Tika stuff on the MCF side but 
want Solr to handle the binary stream. In this case there should be no problem 
with huge metadata request params. And I agree that SolrJ should support this 
case (ContentStreamUpdateRequest?).
{quote}

Ok.  At the moment that sort of request seems to be transmitted with standard 
POST with metadata stuffed into the URL.  So a fix is needed for that.

{quote}
I got confused by your other use case where you parse the file with Tika on the 
MCF side and still sent the text to /extract
{quote}

While Julien has a custom Solr handler, that's not what we typically do, and we 
recommend that already-Tika-extracted content and metadata be sent to the 
/update handler.  In that case, we build a SolrInputDocument from the content 
stream, and add it into an UpdateRequest.  This mode of usage also seems to use 
standard POST or even PUT, and it puts all the metadata parameters on the URL.  
This is transmitted to the /update handler.  Do you want to support the case 
where the metadata parameters are sizable enough that the URL exceeds 8192 
bytes?









[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631127#comment-16631127
 ] 

Julien Massiera commented on SOLR-12798:


[~janhoy],

Your proposal could resolve the long URL problem, but how would you create a 
Solr document (XML, JSON or CSV, because if I am not wrong these are the only 
three formats that the Solr update handler can manage) from some 
metadata and a content file (which in my case is pure text) without having to 
read the entire content file in order to inject it into the Solr document? 
I think it would have a huge performance impact when one has to crawl millions, 
if not billions, of documents. 
The Solr output connector of MCF currently just constructs a simple POST 
request with the document metadata as parameters and the content file as a stream. 
Your solution would add a significant step before sending the document. Am I 
wrong? 




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Jan Høydahl (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631107#comment-16631107
 ] 

Jan Høydahl commented on SOLR-12798:


{quote}How do you suggest we handle binary data that is meant for SolrCell?
{quote}
That would be for case 1), where you don't do Tika stuff on the MCF side but 
want Solr to handle the binary stream. In this case there should be no problem 
with huge metadata request params. And I agree that SolrJ should support this 
case ({{ContentStreamUpdateRequest}}?). I got confused by your other use case, 
where you parse the file with Tika on the MCF side and still send the text to 
/extract.

As I understand it, this Jira issue is really mainly about the classic use case 
where you do NOT invoke Tika on the client side but stream binary content to 
SolrCell and still need some URL parameters, and doing this in SolrJ is broken 
somehow. In this case there will NOT be huge metadata to pass as URL 
parameters, right?




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631084#comment-16631084
 ] 

Karl Wright commented on SOLR-12798:


[~janhoy], if you didn't mean that the metadata and content should be sent in 
the content body, then I'm completely missing what your suggestion is.

{quote}
My cURL examples were just to discuss what "metadata" might mean in this context.
{quote}

Repositories that are crawled by ManifoldCF have documents that are represented 
as follows:
- A long content stream, binary
- N pairs of name/value data, called metadata, which is fielded data associated 
with the document

If the metadata is extracted in a ManifoldCF pipeline from the content stream, 
it's done via Tika, from a binary stream, which changes the binary content 
stream to a simple text stream, and also supplies more metadata generated as a 
result of the extraction.  In other words, your JSON example is not like 
anything we do at all at this time.

If you want this translated into cURL, you can do it one of two ways:
(1) Put the metadata onto the URL as & parameters, e.g. 
name1=value1&name2=value2 etc., or
(2) Send the metadata as sections in a multipart post.  This too can be set up 
in cURL if you want me to propose an example.  Each section in a multipart post 
has a name, and you can thus transmit a section for every metadata name/value 
pair, as well as one for the content part (which has its own name, that is in 
fact used by SolrCell for metadata of its own).

Hope this helps.
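
Option (2) can be illustrated with a small plain-Java sketch that assembles a multipart/form-data body by hand: one named text section per metadata pair, followed by one named binary section for the content. This is not how SolrJ or ManifoldCF build their requests; the class name MultipartSketch, the boundary value, and the field names are all made up for the example, and section names are assumed to need no escaping.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a multipart/form-data body; not SolrJ code.
final class MultipartSketch {

    /** Writes one text section per metadata pair, then one binary section for the content. */
    static byte[] build(String boundary, Map<String, String> metadata,
                        String contentName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            ascii(out, "--" + boundary + "\r\n");
            // Assumes names contain no quotes or non-ASCII characters.
            ascii(out, "Content-Disposition: form-data; name=\"" + e.getKey() + "\"\r\n");
            ascii(out, "Content-Type: text/plain; charset=UTF-8\r\n\r\n");
            put(out, e.getValue().getBytes(StandardCharsets.UTF_8));
            ascii(out, "\r\n");
        }
        ascii(out, "--" + boundary + "\r\n");
        ascii(out, "Content-Disposition: form-data; name=\"" + contentName
                + "\"; filename=\"" + contentName + "\"\r\n");
        ascii(out, "Content-Type: application/octet-stream\r\n\r\n");
        put(out, content);
        // The closing delimiter carries a trailing "--".
        ascii(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void ascii(ByteArrayOutputStream out, String s) {
        put(out, s.getBytes(StandardCharsets.US_ASCII));
    }

    private static void put(ByteArrayOutputStream out, byte[] bytes) {
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("name1", "value1");
        metadata.put("name2", "value2");
        byte[] body = build("example-boundary", metadata, "content", new byte[] {1, 2, 3});
        System.out.println(body.length + " bytes");
    }
}
```

With this layout the URL stays the same length no matter how many metadata pairs are sent, since each pair only adds a section to the body.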





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Jan Høydahl (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631066#comment-16631066
 ] 

Jan Høydahl commented on SOLR-12798:


{quote}so your suggestion is to use JSON format
{quote}
Not at all. My cURL examples were just to discuss what "metadata" might mean in 
this context. In a pure type-2) case where Tika runs in MCF, one would construct 
documents with all metadata as fields in those documents. So I still don't 
understand why/how you'd get those long URLs at all in this scenario, since all 
the content goes into the streamed body. But I have not tested this streaming 
use of SolrJ myself; I have just built in-memory SolrInputDocuments 
as usual. I understand that you want to be memory efficient here and stream 
those docs as far as possible.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631059#comment-16631059
 ] 

Karl Wright commented on SOLR-12798:


[~janhoy], so your suggestion is to use JSON format for the body, and put the 
metadata into that.  How do you suggest we handle binary data that is meant for 
SolrCell?  Encoding the binary in a JSON document is possible but in practice 
this is quite verbose, yielding 3 or 4 bytes to one.  Is that nevertheless your 
official suggestion?
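
The size cost of carrying binary inside JSON depends on the encoding, which the thread does not specify. The plain-Java sketch below compares two common choices under that caveat: a Base64 string, and a JSON array of unsigned byte values. The class name BinaryInJsonOverhead and its methods are invented for this illustration.

```java
import java.util.Base64;

// Hypothetical measurement helper; the encodings compared are assumptions.
final class BinaryInJsonOverhead {

    /** Number of characters in the Base64 text for the payload (padding included). */
    static int base64Length(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload).length();
    }

    /** Number of characters in a JSON array of unsigned byte values, e.g. [0,17,255]. */
    static int jsonArrayLength(byte[] payload) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < payload.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(payload[i] & 0xFF); // treat the byte as 0..255
        }
        return sb.append(']').length();
    }

    public static void main(String[] args) {
        // One of each possible byte value.
        byte[] payload = new byte[256];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        System.out.println("raw bytes:  " + payload.length);
        System.out.println("base64:     " + base64Length(payload));
        System.out.println("json array: " + jsonArrayLength(payload));
    }
}
```

For this 256-byte payload the array form comes to 915 characters, roughly 3.6 per byte, which is in line with the "3 or 4 bytes to one" figure; Base64 comes to 344 characters, about 4 for every 3 bytes.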





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631057#comment-16631057
 ] 

Karl Wright commented on SOLR-12798:


[~mkhludnev], your walkthrough in the code is fine but (a) when we use 
ContentStreamUpdateHandler in the manner you describe to the update/extract 
handler, we still wind up going through the contentWriter clause above where 
you stop, and (b) when we use UpdateHandler in the manner you describe we also 
go through that same path.  In fact I could find no way to send the content 
through any other path with the code as it exists in master right now, because 
in our usage there's always a contentWriter and the check for its presence 
excludes all else that happens after that.  So I don't understand where the 
disconnect is.  Perhaps if you attach the exact code you are testing we can 
resolve this.






[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631041#comment-16631041
 ] 

Mikhail Khludnev commented on SOLR-12798:
-

Karl, FWIW {{SolrExampleTests.testMultiContentStreamRequest()}} bypasses the 
code path you pointed me to. I still don't fully understand: why not pass 
everything it needs via {{ContentStreamUpdateRequest.addFile()}} and {{.setParam()}} 
instead of a {{ContentWriter}}? I've checked that long {{wparams}} are encoded and 
passed as a separate part, keeping the URL short. 
 !no params in url.png!
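
For readers unfamiliar with the wire format, here is a minimal sketch of what "passed as a separate part" means in a multipart/form-data body. It uses only the JDK and is not SolrJ's implementation; the class name {{MultipartBodySketch}}, the part name {{file}}, the boundary string and the URL are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: builds a multipart/form-data body by hand.
final class MultipartBodySketch {

    /** Writes every parameter as its own form-data part, followed by one file part. */
    static byte[] build(String boundary, Map<String, String> params,
                        String fileName, byte[] fileBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> e : params.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            // Note: names are not escaped here; real code must handle quotes and non-ASCII.
            write(out, "Content-Disposition: form-data; name=\"" + e.getKey() + "\"\r\n");
            write(out, "Content-Type: text/plain; charset=UTF-8\r\n\r\n");
            write(out, e.getValue() + "\r\n");
        }
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"file\"; filename=\""
                + fileName + "\"\r\n");
        write(out, "Content-Type: application/octet-stream\r\n\r\n");
        out.write(fileBytes);
        // The closing delimiter carries a trailing "--".
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("b", "a".repeat(1000)); // a long parameter, as in the test above
        byte[] body = build("sketch-boundary", params, "doc.bin", new byte[] {1, 2, 3});
        // The URL carries no parameters at all; everything travels in the body.
        System.out.println("URL:  http://localhost:8983/solr/foo/update");
        System.out.println("Body: " + body.length + " bytes");
    }
}
```

The point of the sketch is only that the parameter length affects the body size, not the URL length.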




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630999#comment-16630999
 ] 

Jan Høydahl commented on SOLR-12798:


If by "metadata" you mean the {{literal.xxx=value}} HTTP parameters that 
the ExtractingRequestHandler expects, then why would you send those on a normal 
update request containing a SolrInputDocument with all fields embedded?

I.e. instead of this (which does not even make sense since JSON update handler 
does not support literal param)
{code:java}
curl -XPOST "http://localhost:8983/solr/foo/update?literal.id=1&literal.author=George&literal.title=Hello"{code}
you post all metadata as fields in the body:
{code:java}
curl -XPOST http://localhost:8983/solr/foo/update \
  -H "Content-type: application/json" \
  -d '[{"id":"1", "author":"George", "title":"Hello"}]'{code}
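
The second curl call can be sketched in plain Java (JDK 11+ {{java.net.http}}, not SolrJ) to show that the URL then carries no parameters. The class name {{JsonBodyPostSketch}} and the core name {{foo}} are assumptions for illustration; the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) a JSON update request.
final class JsonBodyPostSketch {

    /** All document fields travel in the POST body; the URL has no query string. */
    static HttpRequest buildUpdateRequest(String updateUrl, String jsonDocs) {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocs))
                .build();
    }

    public static void main(String[] args) {
        String docs = "[{\"id\":\"1\", \"author\":\"George\", \"title\":\"Hello\"}]";
        HttpRequest req = buildUpdateRequest("http://localhost:8983/solr/foo/update", docs);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(req.bodyPublisher().get().contentLength() + " body bytes");
    }
}
```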




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630851#comment-16630851
 ] 

Karl Wright commented on SOLR-12798:


Please examine the following code from master HttpSolrClient.java:

{code}
      if (contentWriter != null) {
        String fullQueryUrl = url + wparams.toQueryString();
        HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod() ?
            new HttpPost(fullQueryUrl) : new HttpPut(fullQueryUrl);
        postOrPut.addHeader("Content-Type", contentWriter.getContentType());
        postOrPut.setEntity(new BasicHttpEntity() {
          @Override
          public boolean isStreaming() {
            return true;
          }

          @Override
          public void writeTo(OutputStream outstream) throws IOException {
            contentWriter.write(outstream);
          }
        });
        return postOrPut;

      } else if (streams == null || isMultipart) {
{code}

The request is formed by taking all the parameters in wparams (which include 
the metadata fields AFAICT) and putting them into the URL:

{code}
HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod() ?
    new HttpPost(fullQueryUrl) : new HttpPut(fullQueryUrl);
{code}

There is no other way in the SolrJ request handling code for PUT and POST 
requests to transmit metadata to Solr.  

Indeed, right now, both documents added to an UpdateRequest and documents 
specified via ContentStreamUpdateRequest go by this route.  We did verify, 
using the 7.5.0 version of SolrJ with all ManifoldCF custom code removed, 
that documents exceed the maximum URL length if their metadata is long 
enough.
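
The branch order in the snippet above can be reduced to a small model. This is a simplified sketch of the two conditions quoted from HttpSolrClient, not the real method; the class name, the enum and its constants are hypothetical, and the remaining branches of the original code are not shown in this thread, so they are lumped together.

```java
// Hypothetical model of the branch precedence quoted above; not SolrJ code.
final class RequestPathSketch {

    enum Path {
        CONTENT_WRITER,  // first branch: parameters go into the URL
        SECOND_BRANCH,   // the "streams == null || isMultipart" branch
        REMAINING        // whatever follows in the original method (not quoted here)
    }

    static Path choose(boolean hasContentWriter, boolean hasStreams, boolean isMultipart) {
        if (hasContentWriter) {
            // isMultipart is never consulted once a content writer is present.
            return Path.CONTENT_WRITER;
        } else if (!hasStreams || isMultipart) {
            return Path.SECOND_BRANCH;
        }
        return Path.REMAINING;
    }

    public static void main(String[] args) {
        // A request that has a content writer takes the first branch
        // even when multipart was requested.
        System.out.println(choose(true, true, true));
        System.out.println(choose(false, true, true));
    }
}
```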





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630625#comment-16630625
 ] 

Mikhail Khludnev commented on SOLR-12798:
-

I'm trying to understand what the problem is. Given that the challenge is to 
send a huge file in the body together with a long param, I took this test: 

[https://github.com/apache/lucene-solr/blob/c587410f99375005c680ece5e24a4dfd40d8d3eb/solr/solrj/src/test/org/apache/solr/client/solrj/SolrExampleTests.java#L675]

and added a long param to it:

{{up.setParam(CommonParams.HEADER_ECHO_PARAMS, 
CommonParams.EchoParamStyle.ALL.toString());}}
{{ { // added long param}}
{{ StringBuilder sb = new StringBuilder();}}
{{ for(int i=0; i<1000; i++) {}}
{{ sb.append((char)('a'+((char)(i%26))));}}
{{ }}}
{{ String longparam = sb.toString();}}
{{ //System.out.println(longparam.length());}}
{{ up.setParam("b", longparam);}}
{{ }}}
{{ up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);}}

 

Then I ran SolrExampleJettyTest and it passed. Could ManifoldCF build its 
request the same way? 




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630605#comment-16630605
 ] 

Julien Massiera commented on SOLR-12798:


[~janhoy], considering the discussion thread, I don't think that having us send 
you what we do will convince you that we do it the proper way. I think it would 
be more helpful for us if you showed us the SolrJ code that you envision for 
creating a Solr document with some content and some metadata and streaming it 
to Solr via the POST method. 




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630547#comment-16630547
 ] 

Jan Høydahl commented on SOLR-12798:


{quote}you will note that all parameters and metadata are folded into the URL 
for the ContentWriter transmission mechanism
{quote}
I don't get it. What parameters and metadata are we talking about here, that 
you wish to send to Solr's standard {{/update}} handler? All the document 
fields and metadata would go in the POST body, no? Please give an example of 
this type 2) request. It does not need to be a large request, just any request 
using MCF's Tika component, showing what things look like when attempting to 
POST that content to Solr's {{/update}} endpoint.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630531#comment-16630531
 ] 

Karl Wright commented on SOLR-12798:


{quote}
This looks to me like a plain Solr document post to /update handler, in 
whatever format you'd like? If you can take advantage of Noble Paul's 
enhancements to stream the content this can still be a plain document not 
needing multipart, and no need sending data in http params?
{quote}

The streaming part is great.  But if you look at the current master 
implementation of HttpSolrClient, you will note that all parameters and 
metadata are folded into the URL for the ContentWriter transmission mechanism.  
This fails for us because the URL size can easily exceed 8192 bytes.  That is 
why we need the multipart post handling even for 
UpdateRequest/SolrInputDocument requests.
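
To make the size concern concrete, here is a minimal JDK-only sketch that encodes parameters into a query string and checks the resulting URL against the 8192-byte figure mentioned above. The class name, the parameter names and the constant are assumptions for illustration; actual limits depend on the server and on any proxies in between.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of SolrJ.
final class UrlLengthSketch {

    // Assumed limit, taken from the figure quoted in this thread.
    static final int ASSUMED_MAX_URL_BYTES = 8192;

    /** Encodes params as "?k1=v1&k2=v2"; returns "" when there are none. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&", "?", "");
        joiner.setEmptyValue("");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    /** True when base URL plus encoded params stays within the assumed limit. */
    static boolean fitsInUrl(String baseUrl, Map<String, String> params) {
        String full = baseUrl + toQueryString(params);
        return full.getBytes(StandardCharsets.US_ASCII).length <= ASSUMED_MAX_URL_BYTES;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "1");
        params.put("literal.description", "x".repeat(9000)); // long metadata value
        System.out.println(fitsInUrl("http://localhost:8983/solr/foo/update", params));
    }
}
```

Percent-encoding can also inflate values well beyond their raw length, so metadata with many non-ASCII or reserved characters reaches the limit sooner.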





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630508#comment-16630508
 ] 

Jan Høydahl commented on SOLR-12798:


Ok, let's keep the discussion about standard handlers. So when MCF is not 
going to stream a huge binary file to Solr, it will instead send one Solr 
document with one potentially huge plain-text content field and several other 
metadata fields. This looks to me like a plain Solr document post to /update handler, in 
whatever format you'd like? If you can take advantage of Noble Paul's 
enhancements to stream the content this can still be a plain document not 
needing multipart, and no need sending data in http params?

However, if you have a use case where you need both to post some binary blob to 
Solr Cell and to pass huge metadata in literal params, then things would be 
different. But I have not seen such a use case yet.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630406#comment-16630406
 ] 

Julien Massiera commented on SOLR-12798:


[~kwri...@metacarta.com], [~janhoy],

actually the provided example IS of type 2). As I mentioned, the handler used 
on the Solr side is a modified /update handler, not an /extract one. The name 
is misleading; I would have renamed it /update/no-tika. Here is its 
declaration in the solrconfig.xml file:
{code:java}

  
true
ignored_
ignored_
ignored_
datafari
  

{code}
It does not use Tika and it understands literal.xxx parameters, so, from my 
point of view, there is no need to discuss this further...
 




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630203#comment-16630203
 ] 

Karl Wright commented on SOLR-12798:


[~janhoy], the example we provided is using type (1), as Julien noted.  Do you 
want a type (2) example?





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630151#comment-16630151
 ] 

Jan Høydahl commented on SOLR-12798:


{quote}typically for case (2) the /update handler is used, not the 
/update/extract handler.
{quote}
But the /update handler does not support the {{literal.xxx}} parameters, so 
that makes no sense, does it?

> Structural changes in SolrJ since version 7.0.0 have effectively disabled 
> multipart post
> 
>
> Key: SOLR-12798
> URL: https://issues.apache.org/jira/browse/SOLR-12798
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.4
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Major
> Attachments: HOT Balloon Trip_Ultra HD.jpg, 
> SOLR-12798-approach.patch, solr-update-request.txt
>
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from 
> SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to 
> SolrJ's HttpSolrClient class that seemingly disable any use of multipart 
> post.  This is critical because ManifoldCF's documents often contain metadata 
> in excess of 4K that therefore cannot be stuffed into a URL.
> The changes in question seem to have been performed by Paul Noble on 
> 10/31/2017, with the introduction of the RequestWriter mechanism.  Basically, 
> if a request has a RequestWriter, it is used exclusively to write the 
> request, and that overrides the stream mechanism completely.  I haven't 
> chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of 
> ContentStreamUpdateRequests for all posts meant for Solr Cell, and the 
> creation of UpdateRequests for posts not meant for Solr Cell (as well as for 
> delete and commit requests).  For our release cycle that is taking place 
> right now, we're shipping a modified version of HttpSolrClient that ignores 
> the RequestWriter when dealing with ContentStreamUpdateRequests.  We 
> apparently cannot use multipart for all requests because, when we do, we 
> get "pfountz Should not get here!" errors on the Solr side, which 
> generate HTTP error code 500 responses.  That should not happen either, in my 
> opinion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630133#comment-16630133
 ] 

Karl Wright commented on SOLR-12798:


Hi [~janhoy], typically for case (2) the /update handler is used, not the 
/update/extract handler.





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630027#comment-16630027
 ] 

Julien Massiera commented on SOLR-12798:


Hi [~janhoy],

The /extract update handler is misleading in the case of the log I shared with 
you, as it is not Solr's original update/extract handler but a custom one that 
does NOT use Tika because, as you said, the document has already been parsed by 
MCF's Tika. 

Option 1) that you mentioned is definitely not a recommended solution in a 
production environment because, so far, I have experienced many OOM errors when 
Tika has to deal with exotic files. As we use Solr as the search engine and 
cannot afford an interruption of service during the indexing phase, this 
proposal is not an option. 

 




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629971#comment-16629971
 ] 

Jan Høydahl commented on SOLR-12798:


If I understand correctly, you now have a choice in MCF whether to
 # Stream the original binary document to Solr's extracting request handler and 
use Solr's built-in Tika to parse it. 
In this case there will NOT be a problem since you won't have much metadata as 
request params, just the few you would have configured statically.
 # Let MCF do the Tika conversion using the Tika Content Extractor 
([https://manifoldcf.apache.org/release/release-1.10/en_US/end-user-documentation.html#tikaextractor]). 
In this case MCF will have all the various metadata parsed from the docs, which 
it may want to send to Solr alongside the plain-text parsed version of the 
document.

For 1) you don't have an issue, as you send the binary stream to the /extract 
endpoint.

For 2) I wonder why you use {{/extract}} at all, since Tika has already been 
invoked on the MCF side. This seems like an anti-pattern. The best way would be 
to construct a SolrInputDocument in which each {{literal.xyz}} param becomes a 
separate {{xyz}} field and the text body is put into a (configurable) 
{{content}} field, and to send everything to {{/update}} as opposed to 
{{/extract}}. In the case of jpg files the body text would of course be empty, 
as there is only metadata to be indexed.
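
The mapping suggested for case 2) can be sketched in plain Java. This is only an illustration under stated assumptions: {{LiteralParamMapper}} and {{toFields}} are hypothetical names, not part of SolrJ or ManifoldCF, and the sketch handles single-valued params only. In real client code each resulting entry would presumably be passed to {{SolrInputDocument.addField(name, value)}} and the document sent to {{/update}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ or ManifoldCF.
class LiteralParamMapper {
    static final String PREFIX = "literal.";

    // Turns request params such as {"literal.author": "x"} into a flat
    // field map {"author": "x"} and adds the parsed body text, if any,
    // under the configurable content field name.
    static Map<String, String> toFields(Map<String, String> params,
                                        String bodyText,
                                        String contentField) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = e.getKey();
            // Only literal.* params become fields; other params are skipped.
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                fields.put(name.substring(PREFIX.length()), e.getValue());
            }
        }
        // Metadata-only documents (e.g. a jpg) have no body text to index.
        if (bodyText != null && !bodyText.isEmpty()) {
            fields.put(contentField, bodyText);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "doc-1");
        params.put("literal.author", "Jane Doe");
        params.put("commitWithin", "1000"); // not a literal.* param
        System.out.println(toFields(params, "", "content"));
    }
}
```

Because the fields travel in the request body of an update rather than as {{literal.xyz}} URL parameters, large metadata values would not count against any URL length limit in this approach.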




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629864#comment-16629864
 ] 

Karl Wright commented on SOLR-12798:


I've attached a patch, not meant to be applied, which shows the general 
approach I'd like to explore for a fix.
The biggest problem I've had in making this stuff work is figuring out when 
multipart ought to be used in the HttpSolrClient code.  I therefore propose 
that there be an explicit METHOD type created for multipart post, and that 
HttpSolrClient pay attention to that when assembling its payload.  The payload 
would be assembled solely using the ContentWriter mechanism, but the metadata 
would go into multipart form fields rather than the URL.  The patch does not 
contain the modifications to HttpSolrClient yet; I just wanted to initiate the 
discussion.  Does anyone see a problem with this?
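
To make the proposal concrete, here is a rough plain-Java sketch of what "metadata in multipart form fields rather than the URL" means on the wire. It is not the attached patch and not SolrJ code: {{MultipartSketch}}, {{build}}, the boundary, and the field and file names are all assumptions made for illustration, and the sketch does not escape quotes or line breaks inside names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of SolrJ.
class MultipartSketch {

    // Builds a multipart/form-data body: one text part per metadata field,
    // followed by one binary part carrying the document content.
    static byte[] build(String boundary, Map<String, String> fields,
                        String fileField, String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            write(out, "Content-Disposition: form-data; name=\"" + e.getKey() + "\"\r\n");
            write(out, "Content-Type: text/plain; charset=UTF-8\r\n\r\n");
            write(out, e.getValue() + "\r\n");
        }
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"\r\n");
        write(out, "Content-Type: application/octet-stream\r\n\r\n");
        out.write(content, 0, content.length);
        // The closing delimiter ends the whole body.
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("literal.title", "Hot balloon trip");
        byte[] body = build("XYZ", fields, "myfile", "a.jpg", new byte[] {1, 2, 3});
        System.out.println(body.length + " bytes in body");
    }
}
```

With this layout the request URL stays short however much metadata a document carries, since every {{literal.xyz}} value is in the body.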





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629860#comment-16629860
 ] 

Karl Wright commented on SOLR-12798:


I should also note that other prime examples of this issue *cannot* be added to 
this ticket for security reasons.  Most of ManifoldCF's clients are 
integrators; they don't generally have permission to include company content 
without obtaining specific company permission.   Luckily FranceLabs has a few 
examples hanging around or it would be a real challenge to put together a 
real-world example for you guys.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629853#comment-16629853
 ] 

Karl Wright commented on SOLR-12798:


{quote}
Specifically the example that generates meaningful metadata and body 
(multipart) both of which are ending-up used in Solr. 
{quote}

The data has now been provided, and the Solr [INFO] log line for it as well.  
Are you still asking for the multipart request that *should* be generated by 
SolrJ for that request?  As I've stated, we have had to modify chunks of SolrJ 
in order to generate that multipart request; with some work we can probably 
capture it in an HttpClient wire log, but it *is* some work.





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629146#comment-16629146
 ] 

Julien Massiera commented on SOLR-12798:


Hi [~noble.paul], [~arafalov], [~kwri...@metacarta.com]

I am a ManifoldCF user/committer. Attached you will find an example of an 
update request that is sent to Solr after being analyzed by Tika 
(solr-update-request.txt), along with the corresponding original file.
I also have an entity extractor that produces a lot of metadata for files, 
which exceeds the URL limits.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628912#comment-16628912
 ] 

Noble Paul commented on SOLR-12798:
---

bq. How many examples do you need to convince yourselves that we're not making 
this up?

It looks like you don't understand the objectives here. We design the SolrJ 
client and server with certain use cases in mind. While doing that, we assume 
that we meet the needs of most/all users. The fact that you had to implement a 
custom client suggests that either we have failed in that or you have failed to 
understand how SolrJ works. I'm sure you wouldn't open a ticket to waste our 
time. We have also come across many cases where users are "holding it wrong". 
That is why a specific example is useful. If we realize that there is a genuine 
use case that cannot be satisfied by the state-of-the-art SolrJ client, we will 
work towards improving our code so that you don't have to do the dirty work. 
The objective of Solr is not to support multipart form posts. It is designed 
to take in docs/commands and return query results. The multipart mechanism is 
just a means to an end. Imagine Solr working over a non-HTTP protocol: in that 
case we would still need to support all these use cases. So, please be patient 
while we try to get details.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Alexandre Rafalovitch (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628903#comment-16628903
 ] 

Alexandre Rafalovitch commented on SOLR-12798:
--

Karl, we totally believe you that it is happening. We just don't have enough 
knowledge about your use cases to easily visualize our side of it. I think one 
or two simple examples would be sufficient; no need to put out an all-points 
bulletin. Clearly, even though your use case was working for a long time, we 
somehow missed it in our tests/reasoning. So, this discussion is explicitly 
trying to do better on it than the last time.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628879#comment-16628879
 ] 

Karl Wright commented on SOLR-12798:


{quote}
The data may be generic, but it has to be fed into Solr in one of the accepted 
parameters.
{quote}

Um, this stuff has been working for more than a decade.  Yes, we're using 
accepted parameters.

{quote}
This reason why we insist on an example is because we want to know which 
parameters are sent as part of query string.
{quote}

Ok, if that's what you need, I will put out an all points bulletin on the 
ManifoldCF user list for a Solr INFO message that contains an example of long 
metadata.  How many examples do you need to convince yourselves that we're not 
making this up?








[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628827#comment-16628827
 ] 

Noble Paul commented on SOLR-12798:
---

bq. there's no general answer to that question, because there's no one 
definitive example of metadata.

The data may be generic, but it has to be fed into Solr in one of the accepted 
parameters. The reason we insist on an example is that we want to know 
which parameters are sent as part of the query string. We also want to find out 
if you are using it wrong.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Alexandre Rafalovitch (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628813#comment-16628813
 ] 

Alexandre Rafalovitch commented on SOLR-12798:
--

[~kwri...@metacarta.com] I am with Shalin on this. While I appreciate that MCF 
(which we do refer people to from Solr) is a very general framework, I think it 
would be very useful to have a concrete example that shows what kind of 
information actually goes over the wire.

Specifically, an example that generates meaningful metadata and a body 
(multipart), both of which end up being used in Solr. This would really help us 
visualize the kinds of use cases that are very obvious to your project. The 
link example was about forcing multipart, so it was not quite representative. 
Similarly, Tika generates one part with all parameters. An example that has 2 
(3?) meaningful parts would be most helpful, I feel. And maybe even something 
that could go into a Solr test (so it does not need to be very long, just truly 
multipart).




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628630#comment-16628630
 ] 

Karl Wright commented on SOLR-12798:


[~shalinmangar], there's no general answer to that question, because there's no 
one definitive example of metadata.

I refer you to the project page for ManifoldCF here:

https://manifoldcf.apache.org/en_US/index.html#What+Is+Apache+ManifoldCF%3F






[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628586#comment-16628586
 ] 

Shalin Shekhar Mangar commented on SOLR-12798:
--

[~kwri...@metacarta.com] - One thing that wasn't very clear to me from reading 
the issue description and comments is what the metadata is for and why it is 
supposed to go through the request URL. I'd appreciate it if you could give an 
example of the metadata for my understanding. Thanks!




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628354#comment-16628354
 ] 

Karl Wright commented on SOLR-12798:


Ok, thanks for the clarification.

I will propose SolrJ changes to allow multipart form transport as a first-class 
citizen, using the ContentWriter construct, and attach those as a patch to this 
ticket.  The other fixes I will propose separately.  Or, if you want to tackle 
this, I'd be happy to hand it to you.  Please let me know.
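
For readers unfamiliar with the wire format under discussion, the following is a minimal sketch of a multipart/form-data body in which metadata travels as form fields and the document travels as a streamed part. It is not the patch proposed on this ticket and it does not use SolrJ's ContentWriter; the class name `MultipartSketch`, the part name `myfile`, and the sample field names are assumptions chosen purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: writes a multipart/form-data body.
class MultipartSketch {

    // Streams the body to 'out': one text part per metadata field, then one
    // binary part for the document. The content is copied through a small
    // buffer, so the whole document never has to sit in memory.
    // Caveat: field and file names are written as-is, with no escaping.
    static void write(OutputStream out, String boundary,
                      Map<String, String> fields,
                      String fileName, InputStream content) throws IOException {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            ascii(out, "--" + boundary + "\r\n");
            ascii(out, "Content-Disposition: form-data; name=\"" + e.getKey() + "\"\r\n\r\n");
            out.write(e.getValue().getBytes(StandardCharsets.UTF_8));
            ascii(out, "\r\n");
        }
        ascii(out, "--" + boundary + "\r\n");
        // "myfile" is an assumed part name for this sketch.
        ascii(out, "Content-Disposition: form-data; name=\"myfile\"; filename=\"" + fileName + "\"\r\n");
        ascii(out, "Content-Type: application/octet-stream\r\n\r\n");
        byte[] buf = new byte[8192];
        int n;
        while ((n = content.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        // Closing delimiter: the boundary followed by two extra dashes.
        ascii(out, "\r\n--" + boundary + "--\r\n");
    }

    // Convenience for small demos and tests only: this variant does buffer
    // the whole body in memory.
    static byte[] toBytes(String boundary, Map<String, String> fields,
                          String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            write(out, boundary, fields, fileName, new ByteArrayInputStream(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    private static void ascii(OutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Because the metadata fields sit in the body rather than in the query string, their total size is not constrained by URL length limits, which is the property the ticket is concerned with.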




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628338#comment-16628338
 ] 

Noble Paul commented on SOLR-12798:
---

{quote}So there's a fix for multipart post usage? Is this committed to master? 
How do you turn it on, or does it do this automatically?
{quote}

I never bothered with multipart post. I wanted to ensure that we don't write 
the docs to memory before we post to the server. That's the fix. As long as you 
can generate docs in a streaming fashion, there is no limit to the number of 
docs that we can write in a single request in the client.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-26 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628307#comment-16628307
 ] 

Karl Wright commented on SOLR-12798:


{quote}
this no longer is the case
{quote}

That's good news; I can change things in ManifoldCF accordingly, since in that 
case we no longer have to enforce a maximum document size limit.

{quote}
I have fixed this problem in the current SolrJ
{quote}

So there's a fix for multipart post usage?  Is this committed to master?  How 
do you turn it on, or does it do this automatically?

Once that's there, it would be straightforward to add my other fixes; I'm a 
Lucene/Solr committer now as well, so I can ticket and propose them and they 
will get done this time.





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628265#comment-16628265
 ] 

Noble Paul commented on SOLR-12798:
---

bq. there are two problems with using UpdateRequest. First, as you point out, 
the entire document has to hit memory. 

This is no longer the case. The reason I changed the interface is to ensure 
that we don't write everything to memory. You can provide a request that 
creates documents on the fly, and the memory consumption is trivial.

bq.Yes, of course it can, but the way SolrJ is constructed it makes no use of 
this. In fact, it currently doesn't use multipart post at all, unless I 
override much functionality in order to force it to do so.

I have fixed this problem in the current SolrJ




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627826#comment-16627826
 ] 

Karl Wright commented on SOLR-12798:


[~dsmiley], although it doesn't seem to have been appreciated, SolrJ did have 
reasonable support for multipart post a few major versions ago, but I 
appreciate the fact that this is no longer a priority.  I'm happy to help get 
this back to the point that MCF needs.





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627718#comment-16627718
 ] 

David Smiley commented on SOLR-12798:
-

Okay, I appreciate your points.  It'd be nice if SolrJ could be enhanced to 
support multi-part to avoid long URL construction.  Please help make this a 
first-class supported feature if it matters to you/ManifoldCF.

I don't think this is a bug though, and thus not a regression.  Before 7.5, by 
your account, you really had to go out of your way to make multi-part work with 
SolrJ.  The internals changed, which thwarted your efforts (a shame), but that 
doesn't represent a bug.  I appreciate that it's a frustrating, unexpected turn 
of events nonetheless.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627698#comment-16627698
 ] 

Karl Wright commented on SOLR-12798:


[~dsmiley], there are two problems with using UpdateRequest.  First, as you 
point out, the entire document has to hit memory.  This is problematic because 
sometimes these documents are massive and nevertheless Tika needs all of them 
to extract stuff from them.  So we allow two modes of operation:

(1) Via Solr Cell, in which case we use ContentStreamUpdateRequest, which 
embeds a stream and forms the request without having the entire document hit 
memory, and
(2) Via UpdateRequest and SolrInputDocument, but only after Tika has been 
invoked, and with a length limit.  Even then we have problems with people 
running out of memory unless they are very careful, given that there are 
sometimes dozens of indexing requests active at any one time.

This information, by the way, has nothing to do with length limits on the URL, 
since those are determined solely by metadata, which can be large and is 
independent of the main content stream.  URL limits get in the way just as 
readily when we use mode (2) as when we use mode (1).
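
To make the URL-length concern concrete, here is a minimal sketch that percent-encodes metadata as query-string parameters and compares the result against a length budget. The 4096 figure is an assumption taken from the "in excess of 4K" wording in the issue description; actual limits depend on the servlet container and on any proxies in between. The class name `QueryStringBudget` and the sample parameter names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: estimates how long a query string
// becomes when metadata is sent as URL parameters.
class QueryStringBudget {

    // Assumed budget, based on the "4K" figure in the issue description.
    static final int ASSUMED_URL_LIMIT = 4096;

    // Builds "k1=v1&k2=v2..." with form-style percent-encoding of keys and
    // values. URLEncoder.encode(String, Charset) requires Java 10 or later.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // True if the encoded parameters alone stay within the assumed budget.
    static boolean fitsInUrl(Map<String, String> params) {
        return toQueryString(params).length() <= ASSUMED_URL_LIMIT;
    }
}
```

Note that the check depends only on the metadata parameters, not on the size of the document body, which matches the point that the limit applies equally in modes (1) and (2).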





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627674#comment-16627674
 ] 

David Smiley commented on SOLR-12798:
-

[~arafalov] I believe this issue is about SolrJ (client-side), not the Solr 
server.

[~kwri...@metacarta.com] why must ManifoldCF rely on HTTP Multipart in 
particular – can't it compose a SolrInputDocument and just send it like 
basically all Solr clients I've ever seen?  Is the issue about the "infinite 
length" content stream, which I presume maps to some sort of body text?  Note 
the existence of {{UpdateRequest.setDocIterator(Iterator<SolrInputDocument>)}}, 
which can be helpful in streaming and materializing documents on the fly.
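
As a rough illustration of what "materializing documents on the fly" means, the sketch below shows an iterator that builds each document only when it is requested, so at most one document exists at a time. To keep it compilable without SolrJ on the classpath, a plain `Map` stands in for `SolrInputDocument`; the class name `LazyDocIterator` and the field names are assumptions. With SolrJ, an equivalent `Iterator<SolrInputDocument>` would be what gets handed to `UpdateRequest.setDocIterator`.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in: each Map plays the role of one document.
class LazyDocIterator implements Iterator<Map<String, Object>> {
    private final int total; // how many documents to produce
    private int next = 0;    // index of the next document to build

    LazyDocIterator(int total) {
        this.total = total;
    }

    @Override
    public boolean hasNext() {
        return next < total;
    }

    // Builds the document at call time instead of holding a prebuilt list.
    @Override
    public Map<String, Object> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc-" + next);
        doc.put("body", "generated on demand for " + next);
        next++;
        return doc;
    }
}
```

In a real connector, `next()` would pull from the upstream source (a crawler queue, a database cursor) rather than synthesizing values.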

> Structural changes in SolrJ since version 7.0.0 have effectively disabled 
> multipart post
> 
>
> Key: SOLR-12798
> URL: https://issues.apache.org/jira/browse/SOLR-12798
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.4
>Reporter: Karl Wright
>Priority: Major
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from 
> SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to 
> SolrJ's HttpSolrClient class that seemingly disable any use of multipart 
> post.  This is critical because ManifoldCF's documents often contain metadata 
> in excess of 4K that therefore cannot be stuffed into a URL.
> The changes in question seem to have been performed by Paul Noble on 
> 10/31/2017, with the introduction of the RequestWriter mechanism.  Basically, 
> if a request has a RequestWriter, it is used exclusively to write the 
> request, and that overrides the stream mechanism completely.  I haven't 
> chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of 
> ContentStreamUpdateRequests for all posts meant for Solr Cell, and the 
> creation of UpdateRequests for posts not meant for Solr Cell (as well as for 
> delete and commit requests).  For our release cycle that is taking place 
> right now, we're shipping a modified version of HttpSolrClient that ignores 
> the RequestWriter when dealing with ContentStreamUpdateRequests.  We 
> apparently cannot use multipart for all requests because when we do, we get 
> "pfountz Should not get here!" errors on the Solr side, which generate HTTP 
> error code 500 responses.  That should not happen either, in my opinion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Alexandre Rafalovitch (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627589#comment-16627589
 ] 

Alexandre Rafalovitch commented on SOLR-12798:
--

As far as I understand, we do advertise multipart upload in several places:
 * [https://lucene.apache.org/solr/guide/7_5/content-streams.html#content-stream-sources]
 * the multipartUploadLimitInKB parameter in solrconfig.xml: 
   [https://lucene.apache.org/solr/guide/7_5/requestdispatcher-in-solrconfig.html#requestparsers-element]

If the current issue changes those expectations without explicitly discussing 
them and including the new user-visible limitation in the migration guide for 
7.5, we have a clear case of a critical regression on our hands. I cannot see 
any tests in the codebase for this, but, if my assessment is correct, perhaps 
one should exist.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627547#comment-16627547
 ] 

Karl Wright commented on SOLR-12798:


[~noble.paul] 'We are assuming your use case can only be implemented using a 
multipart request. Can we see what you send in the request parameters?'

That's kind of a silly question, if you don't mind me saying so.  MCF is a 
framework with dozens of connectors for accessing different kinds of document 
repositories.  A "document" in ManifoldCF consists of:

- A content stream of infinite length
- Unlimited metadata, in the form of name/value-list pairs

Documents that have large amounts of metadata are common.  The details vary 
considerably by source repository.  As just one example, we have one client 
who seemingly specializes in indexing image content.  The images are run 
through Tika, which produces a zero-length text file and sometimes 100K bytes 
of metadata text, spread across multiple metadata fields.

I hope that's enough to demonstrate why it is impossible to expect all the 
metadata for a document to fit in the URL.
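
The size argument can be made concrete with a small JDK-only sketch. It encodes name/value-list metadata as a query string and checks the result against a limit. The class name, the helper names, and the 4096-character limit are assumptions chosen for illustration (the issue description only says "in excess of 4K"); real limits depend on the server and any proxies in between.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;
import java.util.Map;

public class MetadataUrlCheck {
  // Assumed limit for illustration only.
  static final int ASSUMED_MAX_URL_LENGTH = 4096;

  /** Encodes name/value-list pairs as a query string: ?n=v1&n=v2&... */
  static String toQueryString(Map<String, List<String>> metadata) {
    StringBuilder sb = new StringBuilder();
    try {
      for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
        for (String value : e.getValue()) {
          sb.append(sb.length() == 0 ? '?' : '&');
          sb.append(URLEncoder.encode(e.getKey(), "UTF-8"));
          sb.append('=');
          sb.append(URLEncoder.encode(value, "UTF-8"));
        }
      }
    } catch (UnsupportedEncodingException ex) {
      throw new IllegalStateException(ex); // UTF-8 is always available
    }
    return sb.toString();
  }

  /** True if the base URL plus the encoded metadata stays within the assumed limit. */
  static boolean fitsInUrl(String baseUrl, Map<String, List<String>> metadata) {
    return baseUrl.length() + toQueryString(metadata).length() <= ASSUMED_MAX_URL_LENGTH;
  }
}
```

A single 100K metadata value of the kind described above fails this check by a wide margin, whatever the exact limit is.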





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627408#comment-16627408
 ] 

Noble Paul commented on SOLR-12798:
---

ContentWriter is mostly implemented by anonymous classes. StringPayloadWriter 
is just a helper class.

I still think this is an XY problem. We are assuming your use case can only 
be implemented using a multipart request. Can we see what you send in the 
request parameters?




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627138#comment-16627138
 ] 

Karl Wright commented on SOLR-12798:


It looks like the only implementer of ContentWriter is 
StringPayloadContentWriter, which just furnishes a string for output, correct?

In order to work within that framework, ContentStreamUpdateRequest would need a 
streaming ContentWriter implementation that pulls from the input and writes to 
the output.  That seems to be missing.  And then this has nothing whatsoever to 
do with how the content is actually transmitted -- it seems that the assumption 
is that the new ContentWriter stuff all goes via PUT with metadata in the URL.  
That's not good for two reasons: first, the URL length problems I've already 
mentioned, and second -- Solr Cell uses the "name" part of the multipart post 
to inject its own bit of metadata into the document, and there would be no way 
to transmit that anymore.  Logic is still therefore going to be needed to use 
multipart forms under specific circumstances.  Maybe there needs to be a 
useMultipart() method in all Requests, and HttpSolrClient should look at that 
to decide whether to use multipart or standard PUT?
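
As a rough illustration of the "streaming ContentWriter" idea above, here is a JDK-only sketch. The nested ContentWriter interface is a local stand-in modeled loosely on a write-to-output-stream callback, not the SolrJ type itself, and the class and factory names are assumptions. It only shows the copy loop; it does not address how the request is transmitted (PUT versus multipart).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamingWriterSketch {
  /** Local stand-in for a writer that emits a request body to an output stream. */
  interface ContentWriter {
    void write(OutputStream os) throws IOException;

    String getContentType();
  }

  /** Copies the input to the output in fixed-size chunks, never buffering it whole. */
  static ContentWriter streaming(final InputStream in, final String contentType) {
    return new ContentWriter() {
      @Override
      public void write(OutputStream os) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
          os.write(buffer, 0, n);
        }
        os.flush();
      }

      @Override
      public String getContentType() {
        return contentType;
      }
    };
  }
}
```

Because the copy uses a fixed 8 KB buffer, memory use stays constant regardless of the length of the content stream.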







[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627098#comment-16627098
 ] 

Karl Wright commented on SOLR-12798:


Hi [~noble.paul], as I explained before, we have document metadata in excess of 
the maximum URL length quite often.  In fact, it's the typical case.  That is 
why we must use multipart post in this application.  My rough estimate of the 
percentage of ManifoldCF users who fall into this category is greater than 90%.






[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627038#comment-16627038
 ] 

Noble Paul commented on SOLR-12798:
---

Pardon me, but I'm still wondering what the real reason is that you must use 
a multipart request. What is stopping you from using a standard update request 
for all these operations instead of a multipart request?




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627028#comment-16627028
 ] 

Karl Wright commented on SOLR-12798:


[~noble.paul] We have a custom implementation because SolrJ and indeed 
HttpComponents/HttpClient have problems we're forced to work around.  These 
have been raised before but so far not taken too seriously, apparently.  The 
need to work around things has become even more significant with the latest 
release.

ModifiedHttpSolrClient is a derivation of HttpSolrClient.  The method 
overridden, createMethod(), is a direct copy of HttpSolrClient.createMethod() 
with certain very specific changes.  These are apparently all still necessary.  
I've included the method code below. 

If I disable this custom method, and use standard code, I *never* get multipart 
form posts at all.  That is unacceptable in this application.

With the current modifications included below, I get multipart posts for 
everything, including for deletions, which breaks because Solr doesn't like 
that.

I'm asking for advice as to how to get multipart posts only for documents, 
either ones transmitted by ContentStreamUpdateRequest or 
UpdateRequest.add(SolrInputDocument).

{code}
  @Override
  protected HttpRequestBase createMethod(SolrRequest request, String collection)
      throws IOException, SolrServerException {
    if (request instanceof V2RequestSupport) {
      request = ((V2RequestSupport) request).getV2Request();
    }
    SolrParams params = request.getParams();
    RequestWriter.ContentWriter contentWriter = requestWriter.getContentWriter(request);
    Collection<ContentStream> streams =
        contentWriter == null ? requestWriter.getContentStreams(request) : null;
    String path = requestWriter.getPath(request);
    if (path == null || !path.startsWith("/")) {
      path = DEFAULT_PATH;
    }

    ResponseParser parser = request.getResponseParser();
    if (parser == null) {
      parser = this.parser;
    }

    // The parser 'wt=' and 'version=' params are used instead of the original
    // params
    ModifiableSolrParams wparams = new ModifiableSolrParams(params);
    if (parser != null) {
      wparams.set(CommonParams.WT, parser.getWriterType());
      wparams.set(CommonParams.VERSION, parser.getVersion());
    }
    if (invariantParams != null) {
      wparams.add(invariantParams);
    }

    String basePath = baseUrl;
    if (collection != null)
      basePath += "/" + collection;

    if (request instanceof V2Request) {
      if (System.getProperty("solr.v2RealPath") == null) {
        basePath = baseUrl.replace("/solr", "/api");
      } else {
        basePath = baseUrl + "/v2";
      }
    }

    if (SolrRequest.METHOD.GET == request.getMethod()) {
      if (streams != null || contentWriter != null) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "GET can't send streams!");
      }

      return new HttpGet(basePath + path + toQueryString(wparams, false));
    }

    if (SolrRequest.METHOD.DELETE == request.getMethod()) {
      return new HttpDelete(basePath + path + toQueryString(wparams, false));
    }

    if (SolrRequest.METHOD.POST == request.getMethod()
        || SolrRequest.METHOD.PUT == request.getMethod()) {

      // UpdateRequest uses PUT now, and ContentStreamUpdateHandler uses POST.
      // We must override PUT with POST if multipart is on.
      // If useMultipart is on, we fall back to getting streams directly from the request.
      final boolean mustUseMultipart;
      final SolrRequest.METHOD methodToUse;
      if (this.useMultiPartPost) {
        final Collection<ContentStream> requestStreams = request.getContentStreams();
        mustUseMultipart = requestStreams != null && requestStreams.size() > 0;
        if (mustUseMultipart) {
          System.out.println("Overriding with multipart post");
          streams = requestStreams;
          methodToUse = SolrRequest.METHOD.POST;
        } else {
          methodToUse = request.getMethod();
        }
      } else {
        mustUseMultipart = false;
        methodToUse = request.getMethod();
      }

      //System.out.println("Post or put");
      String url = basePath + path;
      /*
      boolean hasNullStreamName = false;
      if (streams != null) {
        for (ContentStream cs : streams) {
          if (cs.getName() == null) {
            hasNullStreamName = true;
            break;
          }
        }
      }
      */

      /*
      final boolean isMultipart = ((this.useMultiPartPost && SolrRequest.METHOD.POST == methodToUse)
          || (streams != null && streams.size() > 1)) && !hasNullStreamName;
      */
      final boolean isMultipart = this.useMultiPartPost
          && SolrRequest.METHOD.POST == methodToUse
          && (streams != null && streams.size() >= 1);

      System.out.println("isMultipart = " + isMultipart);

      LinkedList<NameValuePair> postOrPutParams = new LinkedList<>();

  

[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627010#comment-16627010
 ] 

Karl Wright commented on SOLR-12798:


I'm looking for workarounds -- initially, at least.

What I've tried is adding the following code in the POST/PUT section of the 
HttpSolrClient code:

{code}
  // UpdateRequest uses PUT now, and ContentStreamUpdateHandler uses POST.
  // We must override PUT with POST if multipart is on.
  // If useMultipart is on, we fall back to getting streams directly from the request.
  final boolean mustUseMultipart;
  final SolrRequest.METHOD methodToUse;
  if (this.useMultiPartPost) {
    final Collection<ContentStream> requestStreams = request.getContentStreams();
    mustUseMultipart = requestStreams != null && requestStreams.size() > 0;
    if (mustUseMultipart) {
      System.out.println("Overriding with multipart post");
      streams = requestStreams;
      methodToUse = SolrRequest.METHOD.POST;
    } else {
      methodToUse = request.getMethod();
    }
  } else {
    mustUseMultipart = false;
    methodToUse = request.getMethod();
  }

  //System.out.println("Post or put");
  String url = basePath + path;
  /*
  boolean hasNullStreamName = false;
  if (streams != null) {
    for (ContentStream cs : streams) {
      if (cs.getName() == null) {
        hasNullStreamName = true;
        break;
      }
    }
  }
  */

  /*
  final boolean isMultipart = ((this.useMultiPartPost && SolrRequest.METHOD.POST == methodToUse)
      || (streams != null && streams.size() > 1)) && !hasNullStreamName;
  */
  final boolean isMultipart = this.useMultiPartPost
      && SolrRequest.METHOD.POST == methodToUse
      && (streams != null && streams.size() >= 1);

  System.out.println("isMultipart = " + isMultipart);
{code}

The problem is that when multipart post is used for document delete requests, 
they fail because the stream is empty.  And the code above doesn't distinguish 
between UpdateRequests that include real documents and UpdateRequests that are 
delete requests.  Any ideas?
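
One way to picture the missing distinction, as a sketch only: decide on the transport from what the request actually carries rather than from its HTTP method. Everything below is hypothetical and JDK-only. The Update class is a simplified stand-in for an update request, and choose() is not part of SolrJ or of any fix discussed in this thread.

```java
import java.util.List;

public class MultipartDecisionSketch {
  enum Transport { MULTIPART_POST, PLAIN_BODY }

  /** Hypothetical, simplified view of an update request. */
  static final class Update {
    final List<String> documents; // stand-in for documents to add
    final List<String> deleteIds; // stand-in for delete-by-id entries

    Update(List<String> documents, List<String> deleteIds) {
      this.documents = documents;
      this.deleteIds = deleteIds;
    }
  }

  /** Multipart only when it is enabled AND the request really carries documents. */
  static Transport choose(boolean useMultiPartPost, Update u) {
    boolean hasDocuments = u.documents != null && !u.documents.isEmpty();
    return (useMultiPartPost && hasDocuments) ? Transport.MULTIPART_POST : Transport.PLAIN_BODY;
  }
}
```

Under this rule a delete-only request never goes out as multipart, so the empty-stream failure described above would not be triggered by it.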






[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626970#comment-16626970
 ] 

Noble Paul commented on SOLR-12798:
---

You have a custom implementation of HttpSolrClient. Is it not possible for you 
to start using the {{ public ContentWriter getContentWriter(SolrRequest req) }} 
method?

The {{getContentStreams}} method is not the appropriate method to use.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-25 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626903#comment-16626903
 ] 

Karl Wright commented on SOLR-12798:


Thinking about this: I think the fix might well be to add a decent 
implementation of getContentStreams() to BinaryRequestWriter, and then to 
prioritize the use of content streams in HttpSolrClient when useMultipart is 
true.  That would fix the basic problem, if it doesn't introduce other ones.

For the ManifoldCF project's immediate release concerns, I'd have to create a 
ModifiedUpdateRequest class and a ModifiedBinaryRequestWriter class, if they're 
not locked down anyway, and use ModifiedUpdateRequest instead of UpdateRequest 
whenever I need to add SolrInputDocuments.  I'll check out whether this would 
work.  That makes some six SolrJ classes that ManifoldCF needs to override, 
however, just to get multipart post to work properly.  I think it's time to 
make multipart post a first-class citizen for SolrJ, no?
 

> Structural changes in SolrJ since version 7.0.0 have effectively disabled 
> multipart post
> 
>
> Key: SOLR-12798
> URL: https://issues.apache.org/jira/browse/SOLR-12798
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.4
>Reporter: Karl Wright
>Priority: Major
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from 
> SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to 
> SolrJ's HttpSolrClient class that seemingly disable any use of multipart 
> post.  This is critical because ManifoldCF's documents often contain metadata 
> in excess of 4K that therefore cannot be stuffed into a URL.
> The changes in question seem to have been performed by Paul Noble on 
> 10/31/2017, with the introduction of the RequestWriter mechanism.  Basically, 
> if a request has a RequestWriter, it is used exclusively to write the 
> request, and that overrides the stream mechanism completely.  I haven't 
> chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of 
> ContentStreamUpdateRequests for all posts meant for Solr Cell, and the 
> creation of UpdateRequests for posts not meant for Solr Cell (as well as for 
> delete and commit requests).  For our release cycle that is taking place 
> right now, we're shipping a modified version of HttpSolrClient that ignores 
> the RequestWriter when dealing with ContentStreamUpdateRequests.  We 
> apparently cannot use multipart for all requests because, when we do, we get 
> "pfountz Should not get here!" errors on the Solr side, which generate HTTP 
> error code 500 responses.  That should not happen either, in my opinion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626079#comment-16626079
 ] 

Karl Wright commented on SOLR-12798:


[~noble.paul], I can verify that the problem still exists on Solr 7.5.





[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-24 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625869#comment-16625869
 ] 

Noble Paul commented on SOLR-12798:
---

Can you test with the latest SolrJ release {{7.5}} and confirm whether this 
problem still exists?  I shall take a look after that.




[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625809#comment-16625809
 ] 

Karl Wright commented on SOLR-12798:


The status is as follows:

(1) I've confirmed that the RequestWriter override only permits multipart form 
requests for the "commit" request.  Neither "Update" nor "Delete" allows this 
pathway at all.

(2) If I change the logic for all POST and PUT requests to disable the 
contentWriter clause, POST requests of documents work properly, but delete 
document requests fail.

(3) Conditionally disabling contentWriter when the request is of class 
ContentStreamUpdateRequest allows things to work partly.  Text documents that 
are indexed via standard UpdateRequest do not use multipart post, however.  So 
we need a better solution.
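
The conditional workaround in the last item above can be sketched roughly as 
follows.  This is a toy model, not the modified HttpSolrClient itself: the 
*StandIn classes and useContentWriter are hypothetical names introduced for 
illustration, and only the instanceof check reflects what is described here.

```java
/**
 * Toy model of the interim workaround: skip the content writer only for
 * ContentStreamUpdateRequest. All classes here are hypothetical stand-ins,
 * not the real SolrJ types.
 */
class WorkaroundSketch {

    /** Stand-in for any request type. */
    public static class RequestStandIn {}

    /** Stand-in for UpdateRequest (posts not meant for Solr Cell, deletes, commits). */
    public static class UpdateRequestStandIn extends RequestStandIn {}

    /** Stand-in for ContentStreamUpdateRequest (posts meant for Solr Cell). */
    public static class ContentStreamUpdateRequestStandIn extends RequestStandIn {}

    /**
     * Returns true when the content writer should write the request body.
     * The writer is ignored for content-stream requests, so those fall
     * through to the stream (multipart) path instead.
     */
    public static boolean useContentWriter(RequestStandIn request, boolean writerAvailable) {
        if (request instanceof ContentStreamUpdateRequestStandIn) {
            return false;
        }
        return writerAvailable;
    }

    public static void main(String[] args) {
        System.out.println("Solr Cell post uses writer: "
            + useContentWriter(new ContentStreamUpdateRequestStandIn(), true));
        System.out.println("Plain update uses writer:   "
            + useContentWriter(new UpdateRequestStandIn(), true));
    }
}
```

In this model a plain update still goes through the writer whenever one is 
available, which is why text documents indexed via UpdateRequest do not get 
multipart post under this workaround.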






[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625558#comment-16625558
 ] 

Karl Wright commented on SOLR-12798:


We're researching the actual issue that is blocking release.  It seems that 
deleting documents using a Solr Cloud installation may not be working; for each 
document, we're seeing a 400 error with the following message: 

Error from server at http://localhost:8983/solr/FileShare_shard1_replica_n1: 
missing content stream: Error from server at 
http://localhost:8983/solr/FileShare_shard1_replica_n1: missing content stream

Furthermore, a check of the Solr index shows that none of the documents have 
been removed.
This is obviously severe, and we're now trying to confirm whether this also 
happens without our modifications to HttpSolrClient.






[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625544#comment-16625544
 ] 

Karl Wright commented on SOLR-12798:


[~noble.paul], any help would be welcome.  We're in a ManifoldCF release cycle 
now, and SolrJ issues are blocking it.


