Thanks! I'll apply it and let you know how it goes.

On Mon, Feb 8, 2016 at 6:51 PM, Karl Wright <[email protected]> wrote:

Ok, I have a patch. It's actually pretty tiny; the bug is in our code, not Commons-IO, but a Commons-IO change exposed it.

I've created a ticket (CONNECTORS-1271) and attached the patch to it.
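For anyone who hits this before applying the patch: the general class of bug here is a Reader that violates the java.io.Reader end-of-stream contract, which one version of a library's copy loop tolerates and another does not. A minimal illustration of that class of bug (illustrative only, not the actual ManifoldCF code; see the patch attached to CONNECTORS-1271 for the real fix):

    import java.io.IOException;
    import java.io.Reader;

    // Illustrative sketch: a Reader that returns 0 instead of -1 once the
    // stream is exhausted.  The java.io.Reader contract requires -1 at end
    // of stream; callers such as Commons-IO's copy loops test for -1, so a
    // 0 return can make them misbehave in version-dependent ways.
    public class BrokenReader extends Reader {
      private final String data = "{\"document\":\"content\"}";
      private int pos = 0;

      @Override
      public int read(char[] cbuf, int off, int len) throws IOException {
        int count = 0;
        while (count < len && pos < data.length())
          cbuf[off + count++] = data.charAt(pos++);
        return count; // BUG: should be -1 once nothing is left to read
      }

      @Override
      public void close() throws IOException {
        // nothing to release in this sketch
      }
    }

Thanks!
Karl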
On Mon, Feb 8, 2016 at 4:27 PM, Karl Wright <[email protected]> wrote:

I have chased this down to a completely broken Apache Commons-IO library. It no longer works with the JSONReader objects in ManifoldCF at all, and refuses to read anything from them. Unfortunately I can't change versions of that library, because other things depend upon it, so I'll need to write my own code to replace its functionality. That will take some amount of time to do.

This probably happened the last time our dependencies were updated. My apologies.

Karl

On Mon, Feb 8, 2016 at 4:18 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Thanks,

I don't know if it'll help, but removing the use of JSONObjectReader in addOrReplaceDocumentWithException, and posting to Amazon chunk by chunk instead of using the JSONArrayReader in flushDocuments, changed the error I was getting from Amazon. Maybe those objects are failing to convert the content to JSON.

On Mon, Feb 8, 2016 at 6:04 PM, Karl Wright <[email protected]> wrote:

Ok, I'm debugging away, and I can confirm that no data is getting through. I'll have to open a ticket and create a patch when I find the problem.

Karl

On Mon, Feb 8, 2016 at 3:15 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Thank you very much.

On Mon, Feb 8, 2016 at 5:13 PM, Karl Wright <[email protected]> wrote:

Ok, thanks, this is helpful -- it clearly sounds like Amazon is unhappy about the JSON format we are sending it. The deprecation message is probably a strong clue. I'll experiment here with logging document contents so that I can give you further advice. Stay tuned.

Karl

On Mon, Feb 8, 2016 at 3:07 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

I'm actually not seeing anything on the Amazon side. The CloudSearch connector fails when sending the request to Amazon CloudSearch:

    AmazonCloudSearch: Error sending document chunk 0: '{"status": "error",
    "errors": [{"message": "[*Deprecated*: Use the outer message field]
    Encountered unexpected end of file"}], "adds": 0, "__type":
    "#DocumentServiceException", "message": "{ [\"Encountered unexpected
    end of file\"] }", "deletes": 0}'

    ERROR 2016-02-08 20:04:16,544 (Job notification thread) -
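For comparison, the batch format the CloudSearch documents/batch endpoint expects (per the AWS docs) is a JSON array of add/delete operations, along these lines (field names made up):

    [
      {
        "type": "add",
        "id": "doc1",
        "fields": {
          "title": "Example document",
          "body": "Extracted text goes here"
        }
      },
      {
        "type": "delete",
        "id": "doc2"
      }
    ]

So an empty or truncated request body would be consistent with the "Encountered unexpected end of file" message.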
On Mon, Feb 8, 2016 at 5:00 PM, Karl Wright <[email protected]> wrote:

If you can possibly include a snippet of the JSON you are seeing on the Amazon end, that would be great.

Karl

On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <[email protected]> wrote:

More likely this is a bug.

I take it that it is the body string that is not coming out, correct? Do all the other JSON fields look reasonable? Does the body clause exist but come out empty, or is it not there at all?

Karl

On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Hi,

When running a copy of the job with Solr as the target instead, I see the expected content being posted to Solr, so it may not be an issue with Tika. After adding some more logging to the CloudSearch connector, I think the data is getting lost just before it is passed to the DocumentChunkManager, which then inserts the empty records into the DB. Could it be that the JSONObjectReader doesn't like my data?

Thanks,

On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <[email protected]> wrote:

Hi Juan,

I'd try to reproduce as much of the pipeline as possible using a Solr output connection. If you include the Tika extractor in the pipeline, you will want to configure the Solr connection to not use the extracting update handler; there's a checkbox on the Schema tab you need to uncheck for that. Once you do that, you can see pretty exactly what is being sent to Solr, since it all gets logged in the INFO messages written to the Solr log. This should help you figure out whether or not the problem is your Tika configuration.

Please give this a try and let me know what happens.

Karl

On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Hi,

I've successfully sent data to the File System and Solr outputs, but for Amazon CloudSearch I'm seeing that only empty messages are being sent to my domain. I think this may be an issue with how I've set up the Tika Extractor transformation or the field mapping. The database where the records are supposed to be staged before flushing to Amazon appears to be storing empty content.

I've tried to find documentation on how to set up the Tika transformation, but I haven't been able to find any.
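To rule out extraction itself, I was planning to sanity-check Tika outside ManifoldCF with something like this (a rough standalone sketch using the plain Tika API, not the connector code):

    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;

    public class TikaCheck {
      public static void main(String[] args) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        Metadata metadata = new Metadata();
        try (InputStream in = new FileInputStream(args[0])) {
          // Extract the body text of the given file using type auto-detection
          parser.parse(in, handler, metadata, new ParseContext());
        }
        System.out.println("Extracted " + handler.toString().length() + " characters");
      }
    }

If that prints a non-zero length for one of the files in the job, the empty content is presumably being introduced somewhere downstream.

If someone could provide an example of a job setup to send from a FileSystem to CloudSearch, that'd be great!

Thanks in advance,

-- 
Juan Pablo Diaz-Vaz Varas,
Full Stack Developer - MC+A Chile
+56 9 84265890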
