Thanks. I don't know if it'll help, but removing the use of JSONObjectReader in addOrReplaceDocumentWithException and posting to Amazon chunk by chunk, instead of using the JSONArrayReader in flushDocuments, changed the error I was getting from Amazon.
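
To show what I mean by "chunk by chunk", here is a minimal sketch, not the connector's actual code. The endpoint, class name, document id, and field names below are made up for illustration, and authentication/request signing is left out; the only thing I'm relying on is CloudSearch's documented batch format (a JSON array of add/delete operations posted to /2013-01-01/documents/batch with Content-Type application/json).

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class CloudSearchChunkPostSketch {

      // Hypothetical document endpoint; the real one comes from the
      // output connection configuration.
      private static final String DOC_ENDPOINT =
          "https://doc-mydomain-xxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com/2013-01-01/documents/batch";

      public static void main(String[] args) throws Exception {
        // One chunk = one SDF batch: a JSON array of operations like
        // [{"type":"add","id":"...","fields":{...}}, {"type":"delete","id":"..."}]
        // "title" and "content" are just example index fields.
        String batch = "["
            + "{\"type\":\"add\",\"id\":\"doc1\","
            + "\"fields\":{\"title\":\"Test document\",\"content\":\"Extracted body text\"}}"
            + "]";

        // An empty or truncated body is exactly the kind of thing that
        // produces "Encountered unexpected end of file", so refuse to post one.
        if (batch.isEmpty() || "[]".equals(batch)) {
          throw new IllegalStateException("Refusing to post an empty batch");
        }

        byte[] payload = batch.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(DOC_ENDPOINT).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setFixedLengthStreamingMode(payload.length);
        try (OutputStream out = conn.getOutputStream()) {
          out.write(payload);  // post the chunk as one fully materialized string
        }
        System.out.println("CloudSearch responded: " + conn.getResponseCode());
      }
    }

Building each chunk as a plain string like this also makes it easy to log the exact bytes before they go out, which should help tell whether the empty content appears before or after the DocumentChunkManager.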
Maybe those objects are failing on parsing the content to JSON.

On Mon, Feb 8, 2016 at 6:04 PM, Karl Wright <[email protected]> wrote:

> Ok, I'm debugging away, and I can confirm that no data is getting
> through. I'll have to open a ticket and create a patch when I find the
> problem.
>
> Karl
>
>
> On Mon, Feb 8, 2016 at 3:15 PM, Juan Pablo Diaz-Vaz <[email protected]>
> wrote:
>
>> Thank you very much.
>>
>> On Mon, Feb 8, 2016 at 5:13 PM, Karl Wright <[email protected]> wrote:
>>
>>> Ok, thanks, this is helpful -- it clearly sounds like Amazon is unhappy
>>> about the JSON format we are sending it. The deprecation message is
>>> probably a strong clue. I'll experiment here with logging document
>>> contents so that I can give you further advice. Stay tuned.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Feb 8, 2016 at 3:07 PM, Juan Pablo Diaz-Vaz <
>>> [email protected]> wrote:
>>>
>>>> I'm actually not seeing anything on Amazon. The CloudSearch connector
>>>> fails when sending the request to amazon cloudsearch:
>>>>
>>>> AmazonCloudSearch: Error sending document chunk 0: '{"status": "error",
>>>> "errors": [{"message": "[*Deprecated*: Use the outer message field]
>>>> Encountered unexpected end of file"}], "adds": 0, "__type":
>>>> "#DocumentServiceException", "message": "{ [\"Encountered unexpected end of
>>>> file\"] }", "deletes": 0}'
>>>>
>>>> ERROR 2016-02-08 20:04:16,544 (Job notification thread) -
>>>>
>>>> On Mon, Feb 8, 2016 at 5:00 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> If you can possibly include a snippet of the JSON you are seeing on
>>>>> the Amazon end, that would be great.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> More likely this is a bug.
>>>>>>
>>>>>> I take it that it is the body string that is not coming out,
>>>>>> correct? Do all the other JSON fields look reasonable? Does the body
>>>>>> clause exist and is just empty, or is it not there at all?
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> When running a copy of the job, but with SOLR as a target, I'm
>>>>>>> seeing the expected content being posted to SOLR, so it may not be an issue
>>>>>>> with TIKA. After adding some more logging to the CloudSearch connector, I
>>>>>>> think the data is getting lost just before passing it to the
>>>>>>> DocumentChunkManager, which inserts the empty records to the DB. Could it
>>>>>>> be that the JSONObjectReader doesn't like my data?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Juan,
>>>>>>>>
>>>>>>>> I'd try to reproduce as much of the pipeline as possible using a
>>>>>>>> solr output connection. If you include the tika extractor in the pipeline,
>>>>>>>> you will want to configure the solr connection to not use the extracting
>>>>>>>> update handler. There's a checkbox on the Schema tab you need to uncheck
>>>>>>>> for that. But if you do that you can see what is being sent to Solr pretty
>>>>>>>> exactly; it all gets logged in the INFO messages dumped to solr log. This
>>>>>>>> should help you figure out if the problem is your tika configuration or not.
>>>>>>>>
>>>>>>>> Please give this a try and let me know what happens.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've successfully sent data to FileSystems and SOLR, but for
>>>>>>>>> Amazon CloudSearch I'm seeing that only empty messages are being sent to my
>>>>>>>>> domain. I think this may be an issue on how I've setup the TIKA Extractor
>>>>>>>>> Transformation or the field mapping. I think the Database where the records
>>>>>>>>> are supposed to be stored before flushing to Amazon, is storing empty
>>>>>>>>> content.
>>>>>>>>>
>>>>>>>>> I've tried to find documentation on how to setup the TIKA
>>>>>>>>> Transformation, but I haven't been able to find any.
>>>>>>>>>
>>>>>>>>> If someone could provide an example of a job setup to send from a
>>>>>>>>> FileSystem to CloudSearch, that'd be great!
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>>>>> Full Stack Developer - MC+A Chile
>>>>>>>>> +56 9 84265890

--
Juan Pablo Diaz-Vaz Varas,
Full Stack Developer - MC+A Chile
+56 9 84265890
