Ok, thanks, this is helpful -- it clearly sounds like Amazon is unhappy about the JSON format we are sending it. The deprecation message is probably a strong clue. I'll experiment here with logging document contents so that I can give you further advice. Stay tuned.
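
For reference, the document batch we post to the CloudSearch endpoint
should look roughly like this (just a sketch; the field names here are
illustrative, not your actual mapping):

  [
    {
      "type": "add",
      "id": "doc1",
      "fields": {
        "title": "Example document",
        "body": "text extracted by Tika"
      }
    }
  ]

A missing "fields" value or a truncated batch would be consistent with
the "Encountered unexpected end of file" complaint.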

Karl

On Mon, Feb 8, 2016 at 3:07 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

> I'm actually not seeing anything on Amazon. The CloudSearch connector
> fails when sending the request to Amazon CloudSearch:
>
> AmazonCloudSearch: Error sending document chunk 0: '{"status": "error",
> "errors": [{"message": "[*Deprecated*: Use the outer message field]
> Encountered unexpected end of file"}], "adds": 0, "__type":
> "#DocumentServiceException", "message": "{ [\"Encountered unexpected end of
> file\"] }", "deletes": 0}'
>
> ERROR 2016-02-08 20:04:16,544 (Job notification thread) -
>
> On Mon, Feb 8, 2016 at 5:00 PM, Karl Wright <[email protected]> wrote:
>
>> If you can possibly include a snippet of the JSON you are seeing on the
>> Amazon end, that would be great.
>>
>> Karl
>>
>> On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <[email protected]> wrote:
>>
>>> More likely this is a bug.
>>>
>>> I take it that it is the body string that is not coming out, correct?
>>> Do all the other JSON fields look reasonable? Does the body clause exist
>>> and is it just empty, or is it not there at all?
>>>
>>> Karl
>>>
>>> On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> When running a copy of the job, but with Solr as a target, I'm seeing
>>>> the expected content being posted to Solr, so it may not be an issue
>>>> with Tika. After adding some more logging to the CloudSearch connector,
>>>> I think the data is getting lost just before it is passed to the
>>>> DocumentChunkManager, which then inserts the empty records into the DB.
>>>> Could it be that the JSONObjectReader doesn't like my data?
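>>>>
>>>> The extra logging is roughly along these lines (a simplified sketch;
>>>> I'm assuming the agents-side Logging.ingest logger here), right where
>>>> the connector receives the document:
>>>>
>>>>   long length = document.getBinaryLength();
>>>>   Logging.ingest.info("CloudSearch: " + documentURI
>>>>       + " binary length = " + length);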
>>>>
>>>> Thanks,
>>>>
>>>> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Juan,
>>>>>
>>>>> I'd try to reproduce as much of the pipeline as possible using a Solr
>>>>> output connection. If you include the Tika extractor in the pipeline,
>>>>> you will want to configure the Solr connection not to use the
>>>>> extracting update handler. There's a checkbox on the Schema tab you
>>>>> need to uncheck for that. If you do that, you can see pretty much
>>>>> exactly what is being sent to Solr; it all gets logged in the INFO
>>>>> messages dumped to the Solr log. This should help you figure out
>>>>> whether or not the problem is your Tika configuration.
>>>>>
>>>>> Please give this a try and let me know what happens.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've successfully sent data to the FileSystem and Solr outputs, but
>>>>>> for Amazon CloudSearch I'm seeing that only empty messages are being
>>>>>> sent to my domain. I think this may be an issue with how I've set up
>>>>>> the Tika Extractor transformation or the field mapping. I think the
>>>>>> database where the records are supposed to be stored before flushing
>>>>>> to Amazon is storing empty content.
>>>>>>
>>>>>> I've tried to find documentation on how to set up the Tika
>>>>>> transformation, but I haven't been able to find any.
>>>>>>
>>>>>> If someone could provide an example of a job setup to send from a
>>>>>> FileSystem to CloudSearch, that'd be great!
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> --
>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>> Full Stack Developer - MC+A Chile
>>>>>> +56 9 84265890
>>>>
>>>> --
>>>> Juan Pablo Diaz-Vaz Varas,
>>>> Full Stack Developer - MC+A Chile
>>>> +56 9 84265890
>
> --
> Juan Pablo Diaz-Vaz Varas,
> Full Stack Developer - MC+A Chile
> +56 9 84265890
