Ok, I'm debugging away, and I can confirm that no data is getting through. I'll have to open a ticket and create a patch when I find the problem.

Karl

On Mon, Feb 8, 2016 at 3:15 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

> Thank you very much.
>
> On Mon, Feb 8, 2016 at 5:13 PM, Karl Wright <[email protected]> wrote:
>
>> Ok, thanks, this is helpful -- it clearly sounds like Amazon is unhappy
>> about the JSON format we are sending it. The deprecation message is
>> probably a strong clue. I'll experiment here with logging document
>> contents so that I can give you further advice. Stay tuned.
>>
>> Karl
>>
>> On Mon, Feb 8, 2016 at 3:07 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>>
>>> I'm actually not seeing anything on Amazon. The CloudSearch connector
>>> fails when sending the request to amazon cloudsearch:
>>>
>>> AmazonCloudSearch: Error sending document chunk 0: '{"status": "error",
>>> "errors": [{"message": "[*Deprecated*: Use the outer message field]
>>> Encountered unexpected end of file"}], "adds": 0, "__type":
>>> "#DocumentServiceException", "message": "{ [\"Encountered unexpected end of
>>> file\"] }", "deletes": 0}'
>>>
>>> ERROR 2016-02-08 20:04:16,544 (Job notification thread) -
>>>
>>> On Mon, Feb 8, 2016 at 5:00 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> If you can possibly include a snippet of the JSON you are seeing on the
>>>> Amazon end, that would be great.
>>>>
>>>> Karl
>>>>
>>>> On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> More likely this is a bug.
>>>>>
>>>>> I take it that it is the body string that is not coming out, correct?
>>>>> Do all the other JSON fields look reasonable? Does the body clause exist
>>>>> and is just empty, or is it not there at all?
>>>>>
>>>>> Karl
>>>>>
>>>>> On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> When running a copy of the job, but with SOLR as a target, I'm seeing
>>>>>> the expected content being posted to SOLR, so it may not be an issue
>>>>>> with TIKA.
>>>>>> After adding some more logging to the CloudSearch connector, I think
>>>>>> the data is getting lost just before passing it to the
>>>>>> DocumentChunkManager, which inserts the empty records to the DB. Could
>>>>>> it be that the JSONObjectReader doesn't like my data?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Juan,
>>>>>>>
>>>>>>> I'd try to reproduce as much of the pipeline as possible using a
>>>>>>> solr output connection. If you include the tika extractor in the
>>>>>>> pipeline, you will want to configure the solr connection to not use
>>>>>>> the extracting update handler. There's a checkbox on the Schema tab
>>>>>>> you need to uncheck for that. But if you do that you can see what is
>>>>>>> being sent to Solr pretty exactly; it all gets logged in the INFO
>>>>>>> messages dumped to solr log. This should help you figure out if the
>>>>>>> problem is your tika configuration or not.
>>>>>>>
>>>>>>> Please give this a try and let me know what happens.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've successfully sent data to FileSystems and SOLR, but for Amazon
>>>>>>>> CloudSearch I'm seeing that only empty messages are being sent to my
>>>>>>>> domain. I think this may be an issue on how I've setup the TIKA
>>>>>>>> Extractor Transformation or the field mapping. I think the Database
>>>>>>>> where the records are supposed to be stored before flushing to
>>>>>>>> Amazon, is storing empty content.
>>>>>>>>
>>>>>>>> I've tried to find documentation on how to setup the TIKA
>>>>>>>> Transformation, but I haven't been able to find any.
>>>>>>>>
>>>>>>>> If someone could provide an example of a job setup to send from a
>>>>>>>> FileSystem to CloudSearch, that'd be great!
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>>>> Full Stack Developer - MC+A Chile
>>>>>>>> +56 9 84265890
>>>>>>
>>>>>> --
>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>> Full Stack Developer - MC+A Chile
>>>>>> +56 9 84265890
>>>
>>> --
>>> Juan Pablo Diaz-Vaz Varas,
>>> Full Stack Developer - MC+A Chile
>>> +56 9 84265890
>
> --
> Juan Pablo Diaz-Vaz Varas,
> Full Stack Developer - MC+A Chile
> +56 9 84265890
