Hi Karl,

I added a LOG line for testing. It looks like attachmentIndex is null.
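
For illustration only, here is a minimal sketch of a tolerant parse for the attachment index. It rests on the observation that Integer.parseInt throws NumberFormatException for both a null argument and a non-numeric string, which matches the trace quoted below. The class and method names (AttachmentIndexSketch, parseAttachmentIndex) are hypothetical and are not part of the ManifoldCF email connector or of either patch.

```java
import java.util.OptionalInt;

// Hypothetical helper, not ManifoldCF code: shows one way to avoid a
// NumberFormatException when the attachment index is missing or malformed.
class AttachmentIndexSketch {

    // Returns the attachment index when the text is a non-negative integer,
    // and an empty OptionalInt when it is null, blank, non-numeric, or negative.
    static OptionalInt parseAttachmentIndex(String text) {
        if (text == null) {
            return OptionalInt.empty();
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return OptionalInt.empty();
        }
        try {
            int value = Integer.parseInt(trimmed);
            return value >= 0 ? OptionalInt.of(value) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            // A folder name plus message id, for example, ends up here.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // Assumed sample inputs; the real identifier layout may differ.
        String[] samples = { null, "", "2", "myFolder/test:<id@example>" };
        for (String sample : samples) {
            OptionalInt index = parseAttachmentIndex(sample);
            System.out.println(sample + " -> present=" + index.isPresent());
        }
    }
}
```

With a guard like this, a caller could treat an empty result as "this identifier refers to the message itself, not to an attachment", though whether that is the right interpretation depends on how the patch builds its identifiers.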

2017-02-08 0:11 GMT+03:00 Karl Wright <[email protected]>:

> I attached a second patch (to apply on top of the first patch). Please
> let me know if that fixes the issue.
>
> Karl
>
>
> On Tue, Feb 7, 2017 at 3:59 PM, Cihad Guzel <[email protected]> wrote:
>
>> Hi Karl,
>>
>> I have an error as follows:
>>
>> FATAL 2017-02-07 23:56:09,483 (Worker thread '29') - Error tossed: For
>> input string: "myFolder/test:<CADNgPDgSXHeWo
>> [email protected]>"
>> java.lang.NumberFormatException: For input string: "myFolder/test:<
>> cadngpdgsxhewo0gdnul6s2sogusxua9mx2wxot23wi37hog...@mail.gmail.com>"
>> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>> at java.lang.Integer.parseInt(Integer.java:580)
>> at java.lang.Integer.parseInt(Integer.java:615)
>> at org.apache.manifoldcf.crawler.connectors.email.EmailConnector.processDocuments(EmailConnector.java:705)
>> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>
>>
>> 2017-02-07 22:50 GMT+03:00 Cihad Guzel <[email protected]>:
>>
>>> Thanks Karl,
>>>
>>> I will try it.
>>>
>>> Regards
>>> Cihad Guzel
>>>
>>> 2017-02-07 22:36 GMT+03:00 Karl Wright <[email protected]>:
>>>
>>>> I've created a ticket and attached a patch to it: CONNECTORS-1375.
>>>> Please let me know if it works for you; if not, I'll fix what doesn't
>>>> work.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Feb 7, 2017 at 1:19 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Correction: the only metadata attribute we set is the attachment(s)
>>>>> mimetype (as a multivalued field) -- this doesn't currently include the
>>>>> attachment data.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Tue, Feb 7, 2017 at 1:14 PM, Karl Wright <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Cihad,
>>>>>>
>>>>>> The email connector is providing the attachment data unextracted to
>>>>>> the output connector as metadata attribute data. There are no
>>>>>> transformation connectors that look at this metadata. Solr cell also
>>>>>> probably does not handle binary in random metadata attributes the
>>>>>> proper way.
>>>>>>
>>>>>> The connector's attachment code therefore seems to be designed only
>>>>>> to deal with textual attachments. The right solution is to have
>>>>>> individual IDs for each attachment. But that would also require there
>>>>>> to be a URL we could construct for each attachment. We could provide
>>>>>> an additional URI template for attachments, but I'd wonder if your
>>>>>> system has the ability to serve attachments by their own URLs. Please
>>>>>> let me know if this would work and if so I can create a ticket and
>>>>>> work on making these changes.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I tried the email connector with Gmail. I attached the file [1] to a
>>>>>>> new email and sent it to my test email address.
>>>>>>>
>>>>>>> My mail content body is like: "this is test mail for mfc"
>>>>>>>
>>>>>>> Then I ran my email job and the email was indexed to Solr
>>>>>>> successfully. But Solr's content field does not contain my
>>>>>>> attachment's content body. The Solr content field looks like:
>>>>>>>
>>>>>>> "content":" \n \n \n \n \n \n \n \n \n \n
>>>>>>> --94eb2c1910841bc55f0547f43443\r\nContent-Type:
>>>>>>> multipart/alternative; boundary=94eb2c1910841bc553054
>>>>>>> 7f43441\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>>>>>>> text/plain; charset=UTF-8\r\n\r\nthis is test mail for
>>>>>>> mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>>>>>>> text/html; charset=UTF-8\r\n\r\n<div dir=\"ltr\">this is test mail for
>>>>>>> mfc.\r\n</div>\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--
>>>>>>> 94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf;
>>>>>>> name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment;
>>>>>>> filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding:
>>>>>>> base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjY
>>>>>>> NJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDA
>>>>>>> vRSAx\r\nNDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2J
>>>>>>> qDSAgICAgICAgICAgICAgICAg\r\nDQp4cmVmDQozNyAzNA0KMDAwMDAwMDA
>>>>>>> xNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\nMDAwMDE1MjIgMDAwM
>>>>>>> ..."
>>>>>>>
>>>>>>> Does the MFC email connector know that the attachment's file type is
>>>>>>> pdf? Does it not extract the contents?
>>>>>>>
>>>>>>> [1] http://www.orimi.com/pdf-test.pdf
>>>>>>> --
>>>>>>> Regards
>>>>>>> Cihad Güzel
>>>>>>>
>>>
>>> --
>>> Thanks
>>> Cihad Güzel
>>>
>>
>> --
>> Thanks
>> Cihad Güzel
>>
>

--
Thanks
Cihad Güzel
