Thank you very much.

On Mon, Feb 8, 2016 at 5:13 PM, Karl Wright <[email protected]> wrote:
> Ok, thanks, this is helpful -- it clearly sounds like Amazon is unhappy
> about the JSON format we are sending it. The deprecation message is
> probably a strong clue. I'll experiment here with logging document
> contents so that I can give you further advice. Stay tuned.
>
> Karl
>
>
> On Mon, Feb 8, 2016 at 3:07 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>
>> I'm actually not seeing anything on Amazon. The CloudSearch connector
>> fails when sending the request to amazon cloudsearch:
>>
>> AmazonCloudSearch: Error sending document chunk 0: '{"status": "error",
>> "errors": [{"message": "[*Deprecated*: Use the outer message field]
>> Encountered unexpected end of file"}], "adds": 0, "__type":
>> "#DocumentServiceException", "message": "{ [\"Encountered unexpected end of
>> file\"] }", "deletes": 0}'
>>
>> ERROR 2016-02-08 20:04:16,544 (Job notification thread) -
>>
>>
>> On Mon, Feb 8, 2016 at 5:00 PM, Karl Wright <[email protected]> wrote:
>>
>>> If you can possibly include a snippet of the JSON you are seeing on the
>>> Amazon end, that would be great.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> More likely this is a bug.
>>>>
>>>> I take it that it is the body string that is not coming out, correct?
>>>> Do all the other JSON fields look reasonable? Does the body clause exist
>>>> and is just empty, or is it not there at all?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> When running a copy of the job, but with SOLR as a target, I'm seeing
>>>>> the expected content being posted to SOLR, so it may not be an issue with
>>>>> TIKA. After adding some more logging to the CloudSearch connector, I think
>>>>> the data is getting lost just before passing it to the
>>>>> DocumentChunkManager, which inserts the empty records to the DB. Could it
>>>>> be that the JSONObjectReader doesn't like my data?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> Hi Juan,
>>>>>>
>>>>>> I'd try to reproduce as much of the pipeline as possible using a solr
>>>>>> output connection. If you include the tika extractor in the pipeline,
>>>>>> you will want to configure the solr connection to not use the
>>>>>> extracting update handler. There's a checkbox on the Schema tab you
>>>>>> need to uncheck for that. But if you do that you can see what is being
>>>>>> sent to Solr pretty exactly; it all gets logged in the INFO messages
>>>>>> dumped to solr log. This should help you figure out if the problem is
>>>>>> your tika configuration or not.
>>>>>>
>>>>>> Please give this a try and let me know what happens.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've successfully sent data to FileSystems and SOLR, but for Amazon
>>>>>>> CloudSearch I'm seeing that only empty messages are being sent to my
>>>>>>> domain. I think this may be an issue on how I've setup the TIKA
>>>>>>> Extractor Transformation or the field mapping. I think the Database
>>>>>>> where the records are supposed to be stored before flushing to Amazon,
>>>>>>> is storing empty content.
>>>>>>>
>>>>>>> I've tried to find documentation on how to setup the TIKA
>>>>>>> Transformation, but I haven't been able to find any.
>>>>>>>
>>>>>>> If someone could provide an example of a job setup to send from a
>>>>>>> FileSystem to CloudSearch, that'd be great!
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> --
>>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>>> Full Stack Developer - MC+A Chile
>>>>>>> +56 9 84265890
>>>>>>>
>>>>>
>>>>> --
>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>> Full Stack Developer - MC+A Chile
>>>>> +56 9 84265890
>>>>>
>>
>> --
>> Juan Pablo Diaz-Vaz Varas,
>> Full Stack Developer - MC+A Chile
>> +56 9 84265890
>>

--
Juan Pablo Diaz-Vaz Varas,
Full Stack Developer - MC+A Chile
+56 9 84265890
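
For anyone following the thread: the "Encountered unexpected end of file" error quoted above is what one would expect if the request body is empty or truncated JSON. Below is a minimal Python sketch of a pre-send check, assuming the CloudSearch document batch layout (a JSON array of operations with "type", "id" and, for adds, "fields"). The helper names build_batch and check_batch are hypothetical and are not part of ManifoldCF or any AWS SDK; this is only a way to inspect a logged payload by hand, not the connector's actual logic.

```python
import json


def build_batch(docs):
    """Serialize (doc_id, fields) pairs into a CloudSearch-style JSON batch.

    Hypothetical helper for illustration; assumes every document is an 'add'.
    """
    batch = [{"type": "add", "id": doc_id, "fields": fields}
             for doc_id, fields in docs]
    return json.dumps(batch)


def check_batch(payload):
    """Return a list of problems found in a serialized batch (empty if none)."""
    try:
        batch = json.loads(payload)
    except json.JSONDecodeError as exc:
        # An empty or cut-off body ends up here.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(batch, list):
        return ["top level must be a JSON array"]

    problems = []
    for i, op in enumerate(batch):
        if not isinstance(op, dict):
            problems.append(f"operation {i}: not a JSON object")
            continue
        if op.get("type") not in ("add", "delete"):
            problems.append(f"operation {i}: type must be 'add' or 'delete'")
        if not op.get("id"):
            problems.append(f"operation {i}: missing id")
        if op.get("type") == "add" and not op.get("fields"):
            problems.append(f"operation {i}: 'add' has no fields")
    return problems


if __name__ == "__main__":
    good = build_batch([("doc1", {"content": "hello"})])
    print(check_batch(good))       # well-formed batch
    print(check_batch(""))         # empty body, as an empty DB record might yield
    print(check_batch(good[:-5]))  # truncated body
```

Running the logged chunk contents through a check like this would show whether the record stored before flushing is empty, cut off, or well-formed JSON with an empty fields object.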
