If you can possibly include a snippet of the JSON you are seeing on the Amazon end, that would be great.

Karl

On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <[email protected]> wrote:

> More likely this is a bug.
>
> I take it that it is the body string that is not coming out, correct? Do
> all the other JSON fields look reasonable? Does the body clause exist and
> is just empty, or is it not there at all?
>
> Karl
>
>
> On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>
>> Hi,
>>
>> When running a copy of the job, but with SOLR as a target, I'm seeing the
>> expected content being posted to SOLR, so it may not be an issue with TIKA.
>> After adding some more logging to the CloudSearch connector, I think the
>> data is getting lost just before passing it to the DocumentChunkManager,
>> which inserts the empty records to the DB. Could it be that the
>> JSONObjectReader doesn't like my data?
>>
>> Thanks,
>>
>> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Juan,
>>>
>>> I'd try to reproduce as much of the pipeline as possible using a solr
>>> output connection. If you include the tika extractor in the pipeline, you
>>> will want to configure the solr connection to not use the extracting update
>>> handler. There's a checkbox on the Schema tab you need to uncheck for
>>> that. But if you do that you can see what is being sent to Solr pretty
>>> exactly; it all gets logged in the INFO messages dumped to solr log. This
>>> should help you figure out if the problem is your tika configuration or not.
>>>
>>> Please give this a try and let me know what happens.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've successfully sent data to FileSystems and SOLR, but for Amazon
>>>> CloudSearch I'm seeing that only empty messages are being sent to my
>>>> domain. I think this may be an issue on how I've setup the TIKA Extractor
>>>> Transformation or the field mapping. I think the Database where the records
>>>> are supposed to be stored before flushing to Amazon, is storing empty
>>>> content.
>>>>
>>>> I've tried to find documentation on how to setup the TIKA
>>>> Transformation, but I haven't been able to find any.
>>>>
>>>> If someone could provide an example of a job setup to send from a
>>>> FileSystem to CloudSearch, that'd be great!
>>>>
>>>> Thanks in advance,
>>>>
>>>> --
>>>> Juan Pablo Diaz-Vaz Varas,
>>>> Full Stack Developer - MC+A Chile
>>>> +56 9 84265890
>>>>
>>>
>>>
>>
>>
>> --
>> Juan Pablo Diaz-Vaz Varas,
>> Full Stack Developer - MC+A Chile
>> +56 9 84265890
>>
>
>
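
For reference, here is a minimal sketch of one way to check a JSON snippet captured on the Amazon end. It assumes the snippet follows the CloudSearch document batch format: a JSON array of operations, each with "type", "id" and, for adds, "fields". The helper name find_empty_fields, the sample documents, and the field name "body" are illustrative assumptions and are not taken from the connector code.

```python
import json


def find_empty_fields(batch_json, field="body"):
    """Return (document id, reason) pairs for add operations
    whose given field is missing or blank."""
    problems = []
    for op in json.loads(batch_json):
        if op.get("type") != "add":
            continue  # delete operations carry no fields
        fields = op.get("fields", {})
        if field not in fields:
            problems.append((op.get("id"), "missing"))
        elif not str(fields[field]).strip():
            problems.append((op.get("id"), "empty"))
    return problems


# Hypothetical batch: one good document, one with an empty body,
# one with no body field at all, and one delete operation.
sample = json.dumps([
    {"type": "add", "id": "doc1", "fields": {"title": "a.txt", "body": "hello"}},
    {"type": "add", "id": "doc2", "fields": {"title": "b.txt", "body": ""}},
    {"type": "add", "id": "doc3", "fields": {"title": "c.txt"}},
    {"type": "delete", "id": "doc4"},
])

print(find_empty_fields(sample))
```

Running it against a real captured batch would separate the two cases asked about in the thread: a body clause that exists but is empty, versus one that is not there at all.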
