[jira] [Commented] (CONNECTORS-1410) Use Body as Content at Emails
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970158#comment-15970158 ] Karl Wright commented on CONNECTORS-1410: - r1791549 (release branch) > Use Body as Content at Emails > - > > Key: CONNECTORS-1410 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1410 > Project: ManifoldCF > Issue Type: Improvement > Components: Email connector > Affects Versions: ManifoldCF 2.6 > Reporter: Furkan KAMACI > Assignee: Furkan KAMACI > Fix For: ManifoldCF 2.7 > > Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch, > CONNECTORS-1410.patch, CONNECTORS-1410.patch > > > Previously, we indexed e-mails and their attachments together. With CONNECTORS-1375 we changed this logic so that an e-mail and its attachments are indexed separately. > However, there is a problem: the content field of an e-mail that has attachment(s) includes both the body and the attachments' binary content as plain text. > Since we now index attachments separately, we can index just the body as content instead of appending the e-mail body and all attachments' binary data as plain text. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
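As a minimal sketch of the change described above, assuming a simplified model of an e-mail's MIME parts, the body can be collected from the non-attachment text parts while attachments are handed off to be indexed on their own. The `Part` class and the `bodyContent`/`attachments` helpers are hypothetical stand-ins for illustration, not the Email connector's actual code or the `javax.mail` API, and nested `multipart/alternative` structures are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Sketch only: keep the inline text parts as the document content and leave
 * attachments to be indexed as separate documents.
 * Part is a hypothetical stand-in for a MIME body part.
 */
class BodyAsContent {

    static final class Part {
        final String disposition; // e.g. "inline", "attachment", or null
        final String contentType; // e.g. "text/plain", "application/pdf"
        final String text;        // decoded text for text parts, null otherwise

        Part(String disposition, String contentType, String text) {
            this.disposition = disposition;
            this.contentType = contentType;
            this.text = text;
        }

        boolean isAttachment() {
            return "attachment".equalsIgnoreCase(disposition);
        }
    }

    /** Concatenates only the non-attachment text parts, i.e. the e-mail body. */
    static String bodyContent(List<Part> parts) {
        StringBuilder sb = new StringBuilder();
        for (Part p : parts) {
            if (p.isAttachment() || p.text == null) {
                continue; // attachments are indexed on their own
            }
            if (p.contentType != null
                    && p.contentType.toLowerCase(Locale.ROOT).startsWith("text/")) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(p.text);
            }
        }
        return sb.toString();
    }

    /** Returns the parts that should become separate attachment documents. */
    static List<Part> attachments(List<Part> parts) {
        List<Part> result = new ArrayList<>();
        for (Part p : parts) {
            if (p.isAttachment()) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Part> parts = List.of(
                new Part(null, "text/plain", "Hello, see the attached report."),
                new Part("attachment", "application/pdf", null));
        System.out.println(bodyContent(parts));        // Hello, see the attached report.
        System.out.println(attachments(parts).size()); // 1
    }
}
```

With this split, an attachment's bytes never reach the e-mail document's content field, which is the behavior the issue asks for.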
[jira] [Commented] (CONNECTORS-1410) Use Body as Content at Emails
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970157#comment-15970157 ] Furkan KAMACI commented on CONNECTORS-1410: --- r1791548
[jira] [Updated] (CONNECTORS-1410) Use Body as Content at Emails
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1410: Fix Version/s: (was: ManifoldCF 2.8) ManifoldCF 2.7
[jira] [Updated] (CONNECTORS-1410) Use Body as Content at Emails
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated CONNECTORS-1410: -- Summary: Use Body as Content at Emails (was: Binary Attachment Data as Plain Text at Email Content)
[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970156#comment-15970156 ] Karl Wright commented on CONNECTORS-1410: - [~kamaci] Please go ahead and commit to trunk.
[jira] [Updated] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated CONNECTORS-1410: -- Issue Type: Improvement (was: Bug)
[jira] [Updated] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated CONNECTORS-1410: -- Attachment: CONNECTORS-1410.patch
[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970153#comment-15970153 ] Furkan KAMACI commented on CONNECTORS-1410: --- [~kwri...@metacarta.com] Oops, nice catch! I meant to keep it searchable but accidentally removed it from BASIC_SEARCHABLE_ATTRIBUTES.
[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970147#comment-15970147 ] Karl Wright commented on CONNECTORS-1410: - [~kamaci] The patch looks good, except that I think you could still allow the body to be searchable; that would be fine. Once that is restored, please go ahead and commit to trunk. I will pull it up to the release branch.
[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970143#comment-15970143 ] Furkan KAMACI commented on CONNECTORS-1410: --- [~kwri...@metacarta.com], you are right. Could you check my latest patch? I've tested it and it works fine.
[jira] [Updated] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated CONNECTORS-1410: -- Attachment: CONNECTORS-1410.patch
[jira] [Commented] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970114#comment-15970114 ] Cihad Guzel commented on CONNECTORS-1408: - Thanks [~kwri...@metacarta.com], it runs successfully. > Request-URI Too Long > > > Key: CONNECTORS-1408 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1408 > Project: ManifoldCF > Issue Type: Bug > Components: Email connector, Solr 6.x component > Affects Versions: ManifoldCF 2.6 > Reporter: Cihad Guzel > Assignee: Karl Wright > Fix For: ManifoldCF 2.7 > > Attachments: http-wire2.log, http-wire.log > > > I ran an email connector job and followed "Simple History" in the UI. I saw the following error: > {code} > Error from server at http://localhost:8983/solr/mycore: non ok status: 414, > message:Request-URI Too Long > {code} > It is sent by Solr. > Solr logs say: > {code} > HttpParser - URI is too large >8192 > {code} > and > {code} > HttpParser - bad HTTP parsed: 414 for > HttpChannelOverHttp@2b6931dd{r=0,c=false,a=IDLE,uri=null} > > {code} > ManifoldCF ModifiedHttpSolrClient.java has the following code: > {code} > // It is has one stream, it is the post body, put the params in the URL > else { > String pstr = toQueryString(wparams, false); > HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == > request.getMethod() ? > new HttpPost(url + pstr) : new HttpPut(url + pstr); > {code} > The "pstr" string is appended to the URL. It holds all of the Solr params, including the email content, so we get a "URI is too large" error when an email has large content.
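To illustrate why a large e-mail triggers the 414, here is a rough sketch, assuming the params are URL-encoded into a query string and appended to the request path as the quoted `ModifiedHttpSolrClient` code does. The `literal.*` parameter names, the `/solr/mycore/update/extract` path, and the helper names are assumptions for illustration; the 8192 limit is taken from the Solr log above. The check counts characters of the path plus query string, which equal bytes here because URL-encoded text is ASCII.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringLength {

    // Limit reported in the Solr log ("URI is too large >8192").
    static final int MAX_URI_LENGTH = 8192;

    /** Builds "?k1=v1&k2=v2", roughly what a params-in-URL POST sends. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("?");
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 1) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** True if appending the params to the request path would exceed the limit. */
    static boolean exceedsLimit(String path, Map<String, String> params) {
        return (path + toQueryString(params)).length() > MAX_URI_LENGTH;
    }

    public static void main(String[] args) {
        String path = "/solr/mycore/update/extract"; // hypothetical handler path
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "msg-1");
        params.put("literal.body", "x".repeat(10_000)); // a large e-mail body
        System.out.println(exceedsLimit(path, params)); // true
    }
}
```

Sending the same data as a multipart POST body avoids the limit entirely, since only the path and a short query string remain in the request line.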
[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970111#comment-15970111 ] Karl Wright commented on CONNECTORS-1410: - [~kamaci] I think having the BODY present twice in the indexed document is confusing and unnecessary. Since we've already broken backwards compatibility for this connector, if we're going to index the body as the main document content, I think we might as well remove the body from all consideration for the metadata. What do you think?
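A sketch of this proposal, assuming the connector collects a map of message fields and a user-selected list of attributes to include: the body becomes the document content and is skipped when the metadata is built, so it is indexed only once. The field names and the `EmailDocument` type are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: the e-mail body is the main content and is never copied into the
 * metadata, even if "body" appears in the list of included attributes.
 */
class EmailDocument {
    static final String BODY = "body"; // hypothetical attribute name

    final String content;
    final Map<String, String> metadata;

    EmailDocument(String content, Map<String, String> metadata) {
        this.content = content;
        this.metadata = metadata;
    }

    static EmailDocument build(Map<String, String> fields, List<String> includedAttributes) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String name : includedAttributes) {
            if (BODY.equals(name)) {
                continue; // the body is indexed once, as the content
            }
            String value = fields.get(name);
            if (value != null) {
                metadata.put(name, value);
            }
        }
        return new EmailDocument(fields.getOrDefault(BODY, ""), metadata);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("subject", "Quarterly report");
        fields.put("from", "alice@example.com");
        fields.put(BODY, "Please find the report attached.");

        EmailDocument doc = build(fields, List.of("subject", "from", BODY));
        System.out.println(doc.content);  // Please find the report attached.
        System.out.println(doc.metadata); // {subject=Quarterly report, from=alice@example.com}
    }
}
```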
Re: [VOTE] Release Apache ManifoldCF 2.7, RC0
RC0 is withdrawn because a fix has been found for CONNECTORS-1408. Karl On Sat, Apr 15, 2017 at 7:58 AM, Karl Wright wrote: > Please vote on whether to release Apache ManifoldCF 2.7, RC0. This > release has many major changes, including a new version of Tika, a new UI, > and major improvements to some connectors, e.g. the Email Connector. > > You can download the release from: > > https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.7 > > There is also a release tag here: > > https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.7-RC0 > > Thanks, > Karl > >
[jira] [Resolved] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-1408. - Resolution: Fixed r1791542 (trunk) r1791543 (release branch)
[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970107#comment-15970107 ] Furkan KAMACI commented on CONNECTORS-1410: --- [~kwri...@metacarta.com] Do you want me to remove the body from the included metadata?
[jira] [Commented] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970105#comment-15970105 ] Karl Wright commented on CONNECTORS-1408: - [~cguzel], it looks like that is the key. For the email connector, the file name field is always set, but it may be set to null: {code} RepositoryDocument rd = new RepositoryDocument(); rd.setFileName(msg.getFileName()); {code} When the Solr Connector sees a null file name, it won't use a multipart POST. It so happens that connectors that do not set the file name work fine, because RepositoryDocument sets a default value for the file name: {code} protected String fileName = "docname"; {code} I guess the right thing to do is to modify the Solr connector to ensure that the file name is never null. I will commit a fix for this and we'll spin a new 2.7 with this change on Monday.
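The fix described here can be sketched as a null-safe fallback, assuming the same `"docname"` default that `RepositoryDocument` uses. The class and helper names are hypothetical; this is an illustration of the idea, not the committed change.

```java
import java.util.Objects;

class FileNameFallback {
    // Same default as RepositoryDocument's fileName field.
    static final String DEFAULT_FILE_NAME = "docname";

    /** Returns a non-null stream name so the Solr request can stay multipart. */
    static String nonNullFileName(String fileName) {
        return Objects.requireNonNullElse(fileName, DEFAULT_FILE_NAME);
    }

    public static void main(String[] args) {
        String fromMessage = null; // e.g. msg.getFileName() for an e-mail with no file name
        System.out.println(nonNullFileName(fromMessage));  // docname
        System.out.println(nonNullFileName("report.pdf")); // report.pdf
    }
}
```

Applying the fallback in the Solr connector, rather than in each repository connector, covers every connector that might pass a null name.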
[jira] [Updated] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1408: Fix Version/s: (was: ManifoldCF 2.8) ManifoldCF 2.7
[jira] [Comment Edited] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970104#comment-15970104 ] Cihad Guzel edited comment on CONNECTORS-1408 at 4/15/17 7:55 PM: -- Yes Karl, hasNullStreamName is set to true was (Author: cguzel): Yes Karl, hasNullStreamName is true
[jira] [Commented] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970104#comment-15970104 ] Cihad Guzel commented on CONNECTORS-1408: - Yes Karl, hasNullStreamName is true
[jira] [Comment Edited] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970101#comment-15970101 ] Karl Wright edited comment on CONNECTORS-1408 at 4/15/17 7:48 PM: -- Ok, why is isMultipart false? I can't see any way for that to happen. {code} boolean isMultipart = ((this.useMultiPartPost && SolrRequest.METHOD.POST == request.getMethod()) || (streams != null && streams.size() > 1)) && !hasNullStreamName; {code} About the only way we can get isMultipart set to false is if hasNullStreamName is set to true. was (Author: kwri...@metacarta.com): Ok, why is isMultipart false? I can't see any way for that to happen.
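To check this reasoning, the quoted condition can be lifted into a standalone function. This is a sketch, assuming `hasNullStreamName` is true whenever any content stream has a null name; the string `"POST"` stands in for `SolrRequest.METHOD.POST`, and the list of stream names stands in for the request's content streams.

```java
import java.util.Collections;
import java.util.List;

class MultipartDecision {

    /** Mirrors the quoted isMultipart expression with simplified inputs. */
    static boolean isMultipart(boolean useMultiPartPost, String method, List<String> streamNames) {
        boolean hasNullStreamName = false;
        if (streamNames != null) {
            for (String name : streamNames) {
                if (name == null) {
                    hasNullStreamName = true; // assumed derivation of the flag
                    break;
                }
            }
        }
        return ((useMultiPartPost && "POST".equals(method))
                || (streamNames != null && streamNames.size() > 1))
               && !hasNullStreamName;
    }

    public static void main(String[] args) {
        // E-mail with no file name: a single stream whose name is null.
        System.out.println(isMultipart(true, "POST", Collections.singletonList(null)));      // false
        // Same request after falling back to "docname".
        System.out.println(isMultipart(true, "POST", Collections.singletonList("docname"))); // true
    }
}
```

Because `!hasNullStreamName` is ANDed with everything else, a single null stream name forces the non-multipart branch, which is the branch that appends all params to the URL.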
[jira] [Updated] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1410: The intent was for the main document stream to be the body of the email, and not for there to be a body attribute. If you get rid of the body attribute, I will accept your patch in 2.7.
[jira] [Updated] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1408: Ok, why is isMultipart false? I can't see any way for that to happen.
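Whether hasNullStreamName is the only remaining cause depends on the other inputs of the isMultipart expression discussed in this thread. The sketch below (class and method names are hypothetical; only the boolean expression comes from the client code) re-evaluates it for a request with exactly one stream, the single-body case here. It suggests isMultipart is also false whenever useMultiPartPost is off or the method is not POST, so the client's multipart setting is worth checking alongside the stream names.

```java
import java.util.ArrayList;
import java.util.List;

public class IsMultipartCheck {

  // Same boolean expression as in the Solr client; parameters replace its fields and locals.
  static boolean isMultipart(boolean useMultiPartPost, boolean isPost,
                             List<String> streams, boolean hasNullStreamName) {
    return ((useMultiPartPost && isPost) || (streams != null && streams.size() > 1))
        && !hasNullStreamName;
  }

  // Collects the settings under which a single-stream request is NOT multipart
  // even though hasNullStreamName is false.
  static List<String> falseCasesWithOneStream() {
    List<String> cases = new ArrayList<>();
    List<String> oneStream = List.of("email body");
    for (boolean useMultiPartPost : new boolean[] {false, true}) {
      for (boolean isPost : new boolean[] {false, true}) {
        if (!isMultipart(useMultiPartPost, isPost, oneStream, false)) {
          cases.add("useMultiPartPost=" + useMultiPartPost + ", POST=" + isPost);
        }
      }
    }
    return cases;
  }

  public static void main(String[] args) {
    // Prints every combination except useMultiPartPost=true with POST.
    falseCasesWithOneStream().forEach(System.out::println);
  }
}
```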
[jira] [Updated] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated CONNECTORS-1410: -- Attachment: CONNECTORS-1410.patch
[jira] [Updated] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated CONNECTORS-1410: -- Attachment: (was: CONNECTORS-1410.patch)
[jira] [Commented] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970066#comment-15970066 ] Cihad Guzel commented on CONNECTORS-1408:
{code}
boolean isMultipart = ((this.useMultiPartPost && SolrRequest.METHOD.POST == request.getMethod())
    || (streams != null && streams.size() > 1)) && !hasNullStreamName;

LinkedList<NameValuePair> postOrPutParams = new LinkedList<>();
if (streams == null || isMultipart) {
{code}
"isMultipart" is false in this case, so the else block always runs:
{code}
...
else {
  String pstr = toQueryString(wparams, false);
  HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod() ?
    new HttpPost(url + pstr) : new HttpPut(url + pstr);
{code}
The lines you mentioned are in the "if" block.
[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970063#comment-15970063 ] Furkan KAMACI commented on CONNECTORS-1410:
[~kwri...@metacarta.com] No, my point is that we already get the body via:
{code:java}
...
mbp.getContent().toString()
...
{code}
So I've just set it as the content too. You can check my updated patch.
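Whether mbp.getContent().toString() yields the body depends on the part: JavaMail returns a String from getContent() for text parts, but other objects (such as an InputStream or a nested Multipart) for other types, and toString() on those is not readable text. A minimal sketch of the filtering discussed in this issue, using a hypothetical SimplePart class in place of JavaMail's BodyPart so it runs without the mail library: only parts that are text/plain and not marked with an attachment disposition contribute to the body.

```java
import java.util.List;

public class BodyExtraction {

  // Hypothetical, simplified stand-in for a JavaMail BodyPart.
  static final class SimplePart {
    final String disposition; // "attachment", "inline" or null, like Part.getDisposition()
    final String mimeType;    // e.g. "text/plain" or "application/pdf"
    final Object content;     // String for text parts, byte[] for binary parts in this model

    SimplePart(String disposition, String mimeType, Object content) {
      this.disposition = disposition;
      this.mimeType = mimeType;
      this.content = content;
    }
  }

  // Joins the inline text/plain parts with newlines; attachment content is never appended.
  static String extractBody(List<SimplePart> parts) {
    StringBuilder body = new StringBuilder();
    for (SimplePart part : parts) {
      boolean isAttachment = "attachment".equalsIgnoreCase(part.disposition);
      boolean isPlainText = "text/plain".equalsIgnoreCase(part.mimeType);
      if (!isAttachment && isPlainText && part.content instanceof String) {
        if (body.length() > 0) {
          body.append('\n');
        }
        body.append((String) part.content);
      }
    }
    return body.toString();
  }

  public static void main(String[] args) {
    List<SimplePart> parts = List.of(
        new SimplePart(null, "text/plain", "Please see the attached report."),
        new SimplePart("attachment", "application/pdf", new byte[] {0x25, 0x50, 0x44, 0x46}));
    System.out.println(extractBody(parts)); // Please see the attached report.
  }
}
```

Note that an attached .txt file is also skipped here, because its disposition, not its MIME type, decides whether it belongs to the body.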
[jira] [Updated] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cihad Guzel updated CONNECTORS-1408: Attachment: http-wire2.log I've attached a new log file.
[jira] [Commented] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969978#comment-15969978 ] Karl Wright commented on CONNECTORS-1408: - [~cguzel], I don't even see the verb in the request. I would expect to see POST, but it's not there. The content type is not the multipart form: {code} DEBUG 2017-04-14 18:14:20,968 (Thread-8588) - http-outgoing-8 >> "Content-Type: text/plain[\r][\n]" {code} This is a complete mystery to me at this point; it doesn't look like any HTTP request I've ever seen before.
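When reading a wire log like the excerpt above, it can help to pull out only the outgoing lines of one connection and look for the HTTP request line. A minimal sketch, assuming every log line has the same shape as the quoted one (http-outgoing-N >> "payload[\r][\n]"); the class and method names are hypothetical. A null result matches what Karl saw, but it may also just mean the captured excerpt starts after the request line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WireLogRequest {

  // Outgoing side of a wire-log line: http-outgoing-<id> >> "<payload>"
  private static final Pattern OUTGOING = Pattern.compile("http-outgoing-(\\d+) >> \"(.*)\"$");
  private static final Pattern REQUEST_LINE =
      Pattern.compile("(GET|POST|PUT|DELETE|HEAD|OPTIONS) \\S+ HTTP/1\\.[01]");

  // Payloads sent on one connection, with the literal [\r][\n] markers removed.
  static List<String> outgoingLines(List<String> log, String connectionId) {
    List<String> lines = new ArrayList<>();
    for (String entry : log) {
      Matcher m = OUTGOING.matcher(entry);
      if (m.find() && m.group(1).equals(connectionId)) {
        lines.add(m.group(2).replace("[\\r][\\n]", ""));
      }
    }
    return lines;
  }

  // First payload that looks like an HTTP/1.x request line, or null if none was logged.
  static String requestLine(List<String> outgoing) {
    for (String line : outgoing) {
      if (REQUEST_LINE.matcher(line).matches()) {
        return line;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<String> log = List.of(
        "DEBUG 2017-04-14 18:14:20,968 (Thread-8588) - http-outgoing-8 >> \"Content-Type: text/plain[\\r][\\n]\"");
    List<String> out = outgoingLines(log, "8");
    System.out.println(out);              // [Content-Type: text/plain]
    System.out.println(requestLine(out)); // null
  }
}
```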
[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969959#comment-15969959 ] Furkan KAMACI commented on CONNECTORS-1410: --- [~kwri...@metacarta.com] What do you think about this fix?
[jira] [Created] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
Furkan KAMACI created CONNECTORS-1410:
Summary: Binary Attachment Data as Plain Text at Email Content
Key: CONNECTORS-1410
URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
Project: ManifoldCF
Issue Type: Bug
Components: Email connector
Affects Versions: ManifoldCF 2.6
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Fix For: ManifoldCF 2.8

Previously, we indexed e-mails and their attachments together. CONNECTORS-1375 changed this logic so that an e-mail and its attachments are indexed separately.
However, there is a problem: the content field of an e-mail that has attachments includes both the body and the attachments' binary content as plain text.
Since we now index attachments separately, we can index just the body as the content instead of concatenating it with all the attachments' binary data as plain text.
[jira] [Commented] (CONNECTORS-1290) Nuxeo Repository and Authority Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969940#comment-15969940 ] Karl Wright commented on CONNECTORS-1290:
There are two major issues with the CONNECTORS-1290-3 code: (1) logging no longer works for MCF in general, and (2) the Nuxeo connector tests fail, as follows:
{code}
Running org.apache.manifoldcf.crawler.connectors.nuxeo.tests.NuxeoConnectorTest
Tests run: 6, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.455 sec <<< FAILURE! - in org.apache.manifoldcf.crawler.connectors.nuxeo.tests.NuxeoConnectorTest
mockSimpleIngestion(org.apache.manifoldcf.crawler.connectors.nuxeo.tests.NuxeoConnectorTest)  Time elapsed: 0.009 sec  <<< ERROR!
java.lang.NullPointerException: null
  at org.apache.manifoldcf.crawler.connectors.nuxeo.NuxeoRepositoryConnector.getUrl(NuxeoRepositoryConnector.java:364)
  at org.apache.manifoldcf.crawler.connectors.nuxeo.NuxeoRepositoryConnector.processDocumentInternal(NuxeoRepositoryConnector.java:619)
  at org.apache.manifoldcf.crawler.connectors.nuxeo.NuxeoRepositoryConnector.processDocument(NuxeoRepositoryConnector.java:549)
  at org.apache.manifoldcf.crawler.connectors.nuxeo.NuxeoRepositoryConnector.processDocuments(NuxeoRepositoryConnector.java:515)
  at org.apache.manifoldcf.crawler.connectors.nuxeo.tests.NuxeoConnectorTest.mockSimpleIngestion(NuxeoConnectorTest.java:171)

checkMockInjection(org.apache.manifoldcf.crawler.connectors.nuxeo.tests.NuxeoConnectorTest)  Time elapsed: 0.006 sec  <<< ERROR!
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Parameter protocol required but not set
  at org.apache.manifoldcf.crawler.connectors.nuxeo.NuxeoRepositoryConnector.initNuxeoClient(NuxeoRepositoryConnector.java:322)
  at org.apache.manifoldcf.crawler.connectors.nuxeo.NuxeoRepositoryConnector.check(NuxeoRepositoryConnector.java:303)
  at org.apache.manifoldcf.crawler.connectors.nuxeo.tests.NuxeoConnectorTest.checkMockInjection(NuxeoConnectorTest.java:68)

Results :

Tests in error:
  NuxeoAuthorityTest.check:48 » ManifoldCF Parameter protocol required but not s...
  NuxeoConnectorTest.mockSimpleIngestion:171 » NullPointer
  NuxeoConnectorTest.checkMockInjection:68 » ManifoldCF Parameter protocol requi...

Tests run: 9, Failures: 0, Errors: 3, Skipped: 0
{code}
> Nuxeo Repository and Authority Connector
> Key: CONNECTORS-1290
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1290
> Project: ManifoldCF
> Issue Type: Improvement
> Affects Versions: ManifoldCF 2.3
> Reporter: Antonio David Pérez Morales
> Assignee: Antonio David Pérez Morales
> Labels: gsoc, gsoc2016, mentor
> Fix For: ManifoldCF 2.8
>
> Nuxeo [1] provides an open source content management platform, also known as the Nuxeo Platform, which offers different types of information management solutions such as Document Management, Digital Asset Management, and Case Management.
> Nuxeo provides a REST API [2] that lets developers access document information using standard communication and transport protocols.
> Furthermore, Nuxeo provides some adapters to the REST API that expose further information, such as document ACLs [3].
> [1] https://doc.nuxeo.com/display/NXDOC/Quick+Overview
> [2] https://doc.nuxeo.com/display/NXDOC/REST+API
> [3] https://doc.nuxeo.com/display/NXDOC/Web+Adapters+for+the+REST+API
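Both test failures involve connection parameters: checkMockInjection reports that protocol is missing, and the NullPointerException is raised in getUrl, possibly for a related reason. A minimal sketch of validating configuration up front so that a missing value produces one descriptive error instead of a NullPointerException later. Only the protocol parameter name comes from the test output; host, port, path and the /nuxeo default are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NuxeoConfigCheck {

  // "protocol" appears in the test failure; "host" and "port" are hypothetical names.
  static final List<String> REQUIRED = List.of("protocol", "host", "port");

  // Collects every missing or blank required parameter instead of failing on the first null.
  static List<String> missingParameters(Map<String, String> config) {
    List<String> missing = new ArrayList<>();
    for (String name : REQUIRED) {
      String value = config.get(name);
      if (value == null || value.isBlank()) {
        missing.add(name);
      }
    }
    return missing;
  }

  // Builds the base URL only after validation, so callers never see a NullPointerException.
  static String baseUrl(Map<String, String> config) {
    List<String> missing = missingParameters(config);
    if (!missing.isEmpty()) {
      throw new IllegalArgumentException("Required parameters not set: " + String.join(", ", missing));
    }
    String path = config.getOrDefault("path", "/nuxeo"); // hypothetical default context path
    return config.get("protocol") + "://" + config.get("host") + ":" + config.get("port") + path;
  }

  public static void main(String[] args) {
    System.out.println(baseUrl(Map.of("protocol", "http", "host", "localhost", "port", "8080")));
    // http://localhost:8080/nuxeo
    try {
      baseUrl(Map.of("host", "localhost", "port", "8080"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Required parameters not set: protocol
    }
  }
}
```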
[jira] [Commented] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969930#comment-15969930 ] Karl Wright commented on CONNECTORS-1408: - No feedback, and I need to get the release process started, so I'm moving this to 2.8.
[jira] [Updated] (CONNECTORS-1408) Request-URI Too Long
[ https://issues.apache.org/jira/browse/CONNECTORS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1408: Fix Version/s: (was: ManifoldCF 2.7) ManifoldCF 2.8
[jira] [Commented] (CONNECTORS-1290) Nuxeo Repository and Authority Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969894#comment-15969894 ] Karl Wright commented on CONNECTORS-1290: - I've rebased again and modified the code so that it's at the 2.8-SNAPSHOT level. The branch name is branches/CONNECTORS-1290-3.
[jira] [Commented] (CONNECTORS-1290) Nuxeo Repository and Authority Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969891#comment-15969891 ] Karl Wright commented on CONNECTORS-1290: - The branch (CONNECTORS-1290-2) has been updated with the new code. It turns out that (a) the current client dependency (2.5-SNAPSHOT) is unavailable; (b) there needed to be huge formatting changes (tabs->spaces specifically); (c) there is a dependency on log4j 2.x which basically required porting all of MCF to log4j 2.x, and (d) there were (and are) significant code deficiencies also, some of which I addressed, but others which I didn't have time for. Based especially on (a) and (c), I've decided we'll have to postpone this until 2.8. But I really don't want my work to go away. Can we make sure to use this branch as a basis for future refinements please?
[jira] [Updated] (CONNECTORS-1290) Nuxeo Repository and Authority Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-1290: Fix Version/s: (was: ManifoldCF 2.7) ManifoldCF 2.8
Re: I would like to put together an MCF 2.7 release candidate by the end of the week
Yes, I agree. I put some effort into trying to get ManifoldCF to work with log4j 2.x, and was partly successful, but logs are still going to stdout, so obviously I'm not done yet. It's too destabilizing; this will have to wait until 2.8.

Karl

On Sat, Apr 15, 2017 at 5:12 AM, Rafa Haro wrote:
> Hi Karl,
>
> Honestly, I just downloaded the new connector from the student's GitHub and have not checked anything else, so that's the main source of the problems. I know he has been testing it so far by packaging it in a single jar and including it in the ManifoldCF classpath.
>
> I would suggest not including it in this release and giving me some time to properly test and fix it.
>
> Wdyt?
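The Nuxeo client failed with a NoClassDefFoundError for org/apache/logging/log4j/LogManager, meaning it expects the log4j 2.x API on the classpath. A minimal sketch of a startup check a connector could run to report such a missing dependency clearly, rather than failing later with "Could not initialize class"; the class and method names are hypothetical. Class.forName with initialize set to false only loads the class and does not run its static initializer.

```java
public class ClasspathCheck {

  // True if the loader can find the class. initialize=false loads it without running
  // its static initializer, so the check itself does not execute any class setup code.
  static boolean isOnClasspath(String className, ClassLoader loader) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = ClasspathCheck.class.getClassLoader();
    // The log4j 2.x entry point the Nuxeo client needed; the result depends on the runtime classpath.
    String log4j2 = "org.apache.logging.log4j.LogManager";
    if (!isOnClasspath(log4j2, loader)) {
      System.err.println("Missing dependency: " + log4j2 + " (log4j 2.x API jar)");
    }
    System.out.println(isOnClasspath("java.util.List", loader)); // true
  }
}
```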
Re: I would like to put together an MCF 2.7 release candidate by the end of the week
Hi Karl,

Honestly, I just downloaded the new connector from the student's GitHub and have not checked anything else, so that's the main source of the problems. I know he has been testing it so far by packaging it in a single jar and including it in the ManifoldCF classpath.

I would suggest not including it in this release and giving me some time to properly test and fix it.

Wdyt?

On Sat, 15 Apr 2017 at 7:45, Karl Wright wrote:
> FWIW, the reason for the job crawl failure is the version mismatch for log4j, so we'll need to figure out what to do about that:
>
> >>
> org.nuxeo.client.internals.spi.NuxeoClientException: org/apache/logging/log4j/LogManager
>   at org.nuxeo.client.api.objects.NuxeoEntity.getCall(NuxeoEntity.java:223)
>   at org.nuxeo.client.api.objects.NuxeoEntity.getResponse(NuxeoEntity.java:120)
>   at org.nuxeo.client.api.objects.Repository.query(Repository.java:142)
>   at org.apache.manifoldcf.crawler.connectors.nuxeo.NuxeoRepositoryConnector.getDocsByDate(NuxeoRepositoryConnector.java:484)
>   at org.apache.manifoldcf.crawler.connectors.nuxeo.NuxeoRepositoryConnector.addSeedDocuments(NuxeoRepositoryConnector.java:409)
>   at org.apache.manifoldcf.crawler.system.StartupThread.run(StartupThread.java:154)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at org.nuxeo.client.api.objects.NuxeoEntity.getCall(NuxeoEntity.java:217)
>   ... 5 more
> Caused by: java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager
>   at org.nuxeo.client.api.marshaller.NuxeoResponseConverterFactory.<clinit>(NuxeoResponseConverterFactory.java:56)
>   at org.nuxeo.client.api.marshaller.NuxeoConverterFactory.responseBodyConverter(NuxeoConverterFactory.java:74)
>   at retrofit2.Retrofit.nextResponseBodyConverter(Retrofit.java:325)
>   at retrofit2.Retrofit.responseBodyConverter(Retrofit.java:308)
>   at retrofit2.ServiceMethod$Builder.createResponseConverter(ServiceMethod.java:679)
>   at retrofit2.ServiceMethod$Builder.build(ServiceMethod.java:166)
>   at retrofit2.Retrofit.loadServiceMethod(Retrofit.java:166)
>   at retrofit2.Retrofit$1.invoke(Retrofit.java:145)
>   at com.sun.proxy.$Proxy2.query(Unknown Source)
>   ... 10 more
> Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.LogManager
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:798)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 19 more
> Exception:
> status: 666
> type: Error
> exception: org/apache/logging/log4j/LogManager
> throwable: java.lang.reflect.InvocationTargetException
> <<
>
> Karl
>
> On Sat, Apr 15, 2017 at 1:41 AM, Karl Wright wrote:
>> Well, when I run it using made-up host names, the job dies with the following:
>>
>> >>
>> Error: Repeated service interruptions during startup: Could not initialize class org.nuxeo.client.api.marshaller.NuxeoResponseConverterFactory
>> <<
>>
>> This is against client version 2.5, not version 2.5-SNAPSHOT, which I cannot get.
>>
>> I would say that the connector is not yet ready to ship, both because of the dependency issues and because of the execution problems. As far as the code itself is concerned, I think it needs work in the area of exception handling; it treats all errors as being service interruptions, which is really not going to cut it.
>>
>> Karl
>>
>> On Fri, Apr 14, 2017 at 8:49 PM, Karl Wright wrote:
>>> There were a lot of problems with the code -- everything from missing Apache headers to broken HTML to broken password management. I've fixed what I could find. It would be great if someone could try this connector out enough to confirm that it's working.
>>>
>>> It's also a bit disconcerting that you always get back "connection working". Is there any way to check the connection?
>>>
>>> Karl
>>>
>>> On Fri, Apr 14, 2017 at 6:33 PM, Karl Wright wrote:
>>>> Ok, I've added everything needed. There is a version difference that I don't want to tackle though right now: the nuxeo client
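Karl's objection that the connector treats every error as a service interruption suggests separating retryable failures from permanent ones. A minimal sketch of such a policy, under stated assumptions: the Action values and the status-code mapping are hypothetical choices, not ManifoldCF's API. RETRY_LATER is meant to correspond roughly to throwing a ManifoldCF ServiceInterruption, while the other outcomes would surface as real errors instead of endless retries.

```java
import java.io.IOException;

public class ErrorClassifier {

  // Hypothetical outcomes for a repository error.
  enum Action { RETRY_LATER, SKIP_DOCUMENT, FAIL_JOB }

  // Classifies an HTTP error status returned by the repository.
  static Action forHttpStatus(int status) {
    if (status == 429 || status >= 500) {
      return Action.RETRY_LATER;   // overloaded or temporarily failing server
    }
    if (status == 401 || status == 403) {
      return Action.FAIL_JOB;      // bad credentials or permissions: retrying will not help
    }
    if (status >= 400) {
      return Action.SKIP_DOCUMENT; // e.g. 404: this document is gone
    }
    throw new IllegalArgumentException("Not an error status: " + status);
  }

  // Classifies an exception thrown while talking to the repository.
  static Action forThrowable(Throwable t) {
    if (t instanceof LinkageError) {
      return Action.FAIL_JOB;      // e.g. NoClassDefFoundError from a missing dependency
    }
    if (t instanceof IOException) {
      return Action.RETRY_LATER;   // network problems, including socket timeouts, are often transient
    }
    return Action.FAIL_JOB;        // anything else is treated as a bug or configuration error
  }

  public static void main(String[] args) {
    System.out.println(forHttpStatus(503)); // RETRY_LATER
    System.out.println(forHttpStatus(404)); // SKIP_DOCUMENT
    System.out.println(forThrowable(new NoClassDefFoundError("org/apache/logging/log4j/LogManager"))); // FAIL_JOB
  }
}
```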