Sure; please blow away the database instance first, and then you should be all set.

Karl
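For the quick-start example, blowing away the database instance typically means stopping ManifoldCF and deleting the example's database files so they are recreated on the next start. For a Postgres-backed deployment, clearing the document chunk table is a single DELETE. A minimal JDBC sketch follows; the table name "documentchunks" and the connection settings are assumptions, not the connector's actual schema, so check your database before running anything like this.

// Hedged sketch: clear out possibly-corrupted rows from the connector's
// document chunk table in a Postgres-backed deployment. The table name
// "documentchunks" and all connection parameters below are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ClearChunkTable {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/dbname", "manifoldcf", "password");
         Statement stmt = conn.createStatement()) {
      // Remove all buffered document chunks so the next crawl rebuilds them.
      int deleted = stmt.executeUpdate("DELETE FROM documentchunks");
      System.out.println("Deleted " + deleted + " rows");
    }
  }
}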
On Tue, Feb 9, 2016 at 9:43 AM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

I'm using the quick start; I'll try a fresh start.

On Tue, Feb 9, 2016 at 11:42 AM, Karl Wright <[email protected]> wrote:

Hi Juan,

It occurs to me that you may have records in the document chunk table that were corrupted by the earlier version of the connector, and that is what is being sent. Are you using the quick-start example, or Postgres? If Postgres, I'd recommend just deleting all rows in the document chunk table.

Karl

On Tue, Feb 9, 2016 at 9:13 AM, Karl Wright <[email protected]> wrote:

This is a puzzle; the only way this could occur is if some of the records being produced generated absolutely no JSON. Since there is an ID and a type record for all of them, I can't see how this could happen. So we must somehow be adding records for documents that don't exist? I'll have to look into it.

Karl

On Tue, Feb 9, 2016 at 8:49 AM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Hi,

The patch worked, and now at least the POST has content. Amazon is responding with a parsing error, though.

I logged the message before it gets posted to Amazon, and it's not valid JSON: it has extra commas and stray delimiter characters where the records were concatenated. I don't know whether this is an issue with my setup or with the JSONArrayReader.

[{
"id": "100D84BAF0BF348EC6EC593E5F5B1F49585DF555",
"type": "add",
"fields": {
<record fields>
}
}, , {
"id": "1E6DC8BA1E42159B14658321FDE0FC2DC467432C",
"type": "add",
"fields": {
<record fields>
}
}, , , , , , , , , , , , , , , , {
"id": "92C7EDAD8398DAC797A7DEA345C1859E6E9897FB",
"type": "add",
"fields": {
<record fields>
}
}, , , ]

Thanks,

On Mon, Feb 8, 2016 at 7:17 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Thanks! I'll apply it and let you know how it goes.

On Mon, Feb 8, 2016 at 6:51 PM, Karl Wright <[email protected]> wrote:

Ok, I have a patch. It's actually pretty tiny; the bug is in our code, not Commons-IO, but a Commons-IO change exposed it.

I've created a ticket (CONNECTORS-1271) and attached the patch to it.

Thanks!
Karl

On Mon, Feb 8, 2016 at 4:27 PM, Karl Wright <[email protected]> wrote:

I have chased this down to a completely broken Apache Commons-IO library. It no longer works with the JSONReader objects in ManifoldCF at all, and refuses to read anything from them. Unfortunately, I can't change versions of that library because other things depend on it, so I'll need to write my own code to replace its functionality. That will take some time to do.

This probably happened the last time our dependencies were updated. My apologies.

Karl
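The failure described above, a copy utility reading nothing at all from the JSON readers, is consistent with a Reader that returns 0 from read() when it momentarily has no buffered output: java.io.Reader's contract expects read(cbuf, off, len) to return at least one character for a non-empty request, or -1 at end of stream. Below is a minimal sketch of a contract-conforming reader; the class and its fragment source are hypothetical illustrations, not the actual ManifoldCF code or the CONNECTORS-1271 patch.

// Hedged sketch (not the actual CONNECTORS-1271 patch): a Reader that
// assembles its output from an underlying iterator of text fragments must
// loop until it has at least one character to hand back, or knows it is
// done. Returning 0 for a non-empty request confuses copy utilities.
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;

public class FragmentReader extends Reader {
  private final Iterator<String> fragments; // hypothetical source of JSON text
  private String current = "";
  private int pos = 0;

  public FragmentReader(Iterator<String> fragments) {
    this.fragments = fragments;
  }

  @Override
  public int read(char[] cbuf, int off, int len) throws IOException {
    if (len == 0) return 0;
    // Keep pulling fragments until there is something to return;
    // never return 0 for a non-empty request.
    while (pos >= current.length()) {
      if (!fragments.hasNext()) return -1; // true end of stream
      current = fragments.next(); // may be empty; loop again if so
      pos = 0;
    }
    int n = Math.min(len, current.length() - pos);
    current.getChars(pos, pos + n, cbuf, off);
    pos += n;
    return n;
  }

  @Override
  public void close() {}
}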
On Mon, Feb 8, 2016 at 4:18 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Thanks,

I don't know if it'll help, but removing the use of JSONObjectReader in addOrReplaceDocumentWithException, and posting to Amazon chunk-by-chunk instead of using the JSONArrayReader in flushDocuments, changed the error I was getting from Amazon.

Maybe those objects are failing when parsing the content into JSON.

On Mon, Feb 8, 2016 at 6:04 PM, Karl Wright <[email protected]> wrote:

Ok, I'm debugging away, and I can confirm that no data is getting through. I'll have to open a ticket and create a patch when I find the problem.

Karl

On Mon, Feb 8, 2016 at 3:15 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Thank you very much.

On Mon, Feb 8, 2016 at 5:13 PM, Karl Wright <[email protected]> wrote:

Ok, thanks, this is helpful -- it clearly sounds like Amazon is unhappy about the JSON format we are sending it. The deprecation message is probably a strong clue. I'll experiment here with logging document contents so that I can give you further advice. Stay tuned.

Karl

On Mon, Feb 8, 2016 at 3:07 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

I'm actually not seeing anything on Amazon. The CloudSearch connector fails when sending the request to Amazon CloudSearch:

AmazonCloudSearch: Error sending document chunk 0: '{"status": "error", "errors": [{"message": "[*Deprecated*: Use the outer message field] Encountered unexpected end of file"}], "adds": 0, "__type": "#DocumentServiceException", "message": "{ [\"Encountered unexpected end of file\"] }", "deletes": 0}'

ERROR 2016-02-08 20:04:16,544 (Job notification thread) -

On Mon, Feb 8, 2016 at 5:00 PM, Karl Wright <[email protected]> wrote:

If you can possibly include a snippet of the JSON you are seeing on the Amazon end, that would be great.

Karl

On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <[email protected]> wrote:

More likely this is a bug.

I take it that it is the body string that is not coming out, correct? Do all the other JSON fields look reasonable? Does the body clause exist and is just empty, or is it not there at all?

Karl

On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Hi,

When running a copy of the job, but with Solr as a target, I'm seeing the expected content being posted to Solr, so it may not be an issue with Tika.
After adding some more logging to the CloudSearch connector, I think the data is getting lost just before it is passed to the DocumentChunkManager, which then inserts the empty records into the DB. Could it be that the JSONObjectReader doesn't like my data?

Thanks,

On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <[email protected]> wrote:

Hi Juan,

I'd try to reproduce as much of the pipeline as possible using a Solr output connection. If you include the Tika extractor in the pipeline, you will want to configure the Solr connection not to use the extracting update handler; there's a checkbox on the Schema tab you need to uncheck for that. If you do that, you can see almost exactly what is being sent to Solr, since it all gets logged in the INFO messages written to the Solr log. This should help you figure out whether or not the problem is your Tika configuration.

Please give this a try and let me know what happens.

Karl

On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

Hi,

I've successfully sent data to file system and Solr outputs, but for Amazon CloudSearch I'm seeing that only empty messages are being sent to my domain. This may be an issue with how I've set up the Tika Extractor transformation or the field mapping. I think the database where the records are supposed to be stored before flushing to Amazon is storing empty content.

I've tried to find documentation on how to set up the Tika transformation, but I haven't been able to find any.

If someone could provide an example of a job setup to send from a FileSystem to CloudSearch, that'd be great!

Thanks in advance,

--
Juan Pablo Diaz-Vaz Varas,
Full Stack Developer - MC+A Chile
+56 9 84265890
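The ", ," runs in the rejected batch earlier in this thread are exactly what you get when per-document JSON fragments are joined with unconditional commas and some documents serialize to nothing, which matches the empty records observed in the chunk table. A minimal sketch of batch assembly that tolerates empty fragments; the names are illustrative, not the connector's actual code.

// Hedged sketch of batch assembly: if some documents serialize to an empty
// string and the fragments are joined with unconditional commas, the result
// contains ", ," runs exactly like the invalid batch above. Skipping empty
// fragments keeps the array valid.
import java.util.List;
import java.util.StringJoiner;

public class BatchBuilder {
  // Join per-document JSON objects into one CloudSearch-style batch array.
  public static String buildBatch(List<String> documentJson) {
    StringJoiner joiner = new StringJoiner(",", "[", "]");
    for (String doc : documentJson) {
      if (doc == null || doc.isEmpty()) {
        continue; // an empty record must not contribute a separator
      }
      joiner.add(doc);
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    // The empty strings stand in for documents that produced no JSON.
    System.out.println(buildBatch(List.of(
        "{\"id\":\"a\",\"type\":\"add\"}", "", "",
        "{\"id\":\"b\",\"type\":\"add\"}")));
    // -> [{"id":"a","type":"add"},{"id":"b","type":"add"}]
  }
}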
