For ListS3, you will want to split those across the Bucket and Prefix properties: Bucket set to "part-d-prescription-drug" and Prefix set to "unstructured/".
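As a rough illustration of that split (not NiFi code, just a sketch), the combined value from the screenshot can be separated at the first slash. The helper name `split_s3_location` is made up for this example, and the trailing slash on the prefix simply follows the form James suggested; I am assuming that is what you want in the Prefix property.

```python
from typing import Tuple


def split_s3_location(location: str) -> Tuple[str, str]:
    """Split 'bucket/some/folder' into (bucket, prefix).

    Hypothetical helper: the first path segment is treated as the bucket,
    and everything after it becomes the prefix with a trailing slash.
    """
    # Drop leading/trailing slashes, then cut at the first remaining slash.
    bucket, _, rest = location.strip("/").partition("/")
    # An empty remainder means there is no prefix at all.
    prefix = rest + "/" if rest else ""
    return bucket, prefix


bucket, prefix = split_s3_location("part-d-prescription-drug/unstructured")
print("Bucket:", bucket)
print("Prefix:", prefix)
```

The two printed values are what would go into the Bucket and Prefix properties respectively, instead of putting the whole path into Bucket.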
> On Dec 13, 2017, at 9:34 AM, Aruna Sankaralingam
> <[email protected]> wrote:
>
> James,
>
> “part-d-prescription-drug” is the main folder in S3 and “unstructured” is the
> sub folder inside the main folder.
>
> From: James Wing [mailto:[email protected]]
> Sent: Wednesday, December 13, 2017 1:34 AM
> To: [email protected]
> Subject: Re: ListS3 Processor Error
>
> Are you able to list the bucket with the AWS CLI (aws s3 ls)? It can be
> helpful to compare performance between NiFi and the AWS CLI, especially if
> you are able to do so from the same machine, with the same permissions, and
> as similar bucket and prefix settings as you can manage.
>
> In the screenshot above, the bucket is shown as
> "part-d-prescription-drug/unstructured", which looks unusual to me. Is the
> bucket "part-d-prescription-drug" and the prefix "unstructured/"?
>
> Thanks,
>
> James
>
> On Tue, Dec 12, 2017 at 7:34 AM, Aruna Sankaralingam
> <[email protected]> wrote:
> Joe,
>
> No, I don’t have anything in between AWS and NiFi.
> NiFi is installed in one of the EC2 instance in AWS – N.Virginia Region
> S3 is also in N.Virginia Region
>
> From: Joe Witt [mailto:[email protected]]
> Sent: Monday, December 11, 2017 1:28 PM
> To: [email protected]
> Subject: Re: ListS3 Processor Error
>
> The XML response is truncated for some reason as implied by the following. Do
> you have any devices/software/systems/proxies in between your NiFi and the
> amazon service? Are you able to manually issue the request and get the
> response you expect?
>
> 2017-12-11 18:01:02,875 ERROR [Timer-Driven Process Thread-6]
> org.apache.nifi.processors.aws.s3.ListS3
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due
> to com.amazonaws.SdkClientException: Failed to parse XML document with
> handler class
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
> {}
> com.amazonaws.SdkClientException: Failed to parse XML document with handler
> class
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:156)
>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
>     at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1444)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1151)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:964)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:676)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:650)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:633)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:601)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:583)
>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:447)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4137)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4079)
>     at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:819)
>     at org.apache.nifi.processors.aws.s3.ListS3$S3ObjectBucketLister.listVersions(ListS3.java:314)
>     at org.apache.nifi.processors.aws.s3.ListS3.onTrigger(ListS3.java:208)
>     at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>     at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
>     at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
>     at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
>     at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.xml.sax.SAXParseException: Premature end of file.
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
>     at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
>     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:142)
>     ... 32 common frames omitted
>
>
> On Mon, Dec 11, 2017 at 1:07 PM, Aruna Sankaralingam
> <[email protected]> wrote:
> Attached my nifi-app.log. Could you please let me know what went wrong?
>
> From: Joe Witt [mailto:[email protected]]
> Sent: Friday, December 08, 2017 4:04 PM
> To: [email protected]
> Subject: Re: ListS3 Processor Error
>
> Here is an example I found for another processor
>
> https://mail-archives.apache.org/mod_mbox/nifi-dev/201509.mbox/%3CCAFddr26AEVqnoQ=mWr7DSNDFVrr9NuYy9GCcXg=4fyycqab...@mail.gmail.com%3E
>
> Thanks
>
> On Fri, Dec 8, 2017 at 4:02 PM, Aruna Sankaralingam
> <[email protected]> wrote:
> Joe,
> Could you please let me know how to turn on the debug logging?
>
> From: Joe Witt [mailto:[email protected]]
> Sent: Friday, December 08, 2017 3:59 PM
> To: [email protected]
> Subject: Re: ListS3 Processor Error
>
> What version of NiFi?
>
> Looks like either a classpath/classloader issue OR the amazon client library
> cannot parse the response it is getting back...
>
> The logs/nifi-app.log should have the full stack trace. If not you can turn
> on debug logging for that processor and perhaps then it will.
>
> Thanks
>
> On Fri, Dec 8, 2017 at 3:56 PM, Aruna Sankaralingam
> <[email protected]> wrote:
> I am trying to get a pdf file from S3 and load to Elastic Search. The ListS3
> processor is giving me this error. Could someone please let me know where I
> am going wrong?
>
> 20:52:25 UTC
> ERROR
> 37d7226e-0160-1000-6049-d4c489cd32f3
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due
> to com.amazonaws.SdkClientException: Failed to parse XML document with
> handler class
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
> Failed to parse XML document with handler class
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
> 20:52:25 UTC
> WARNING
> 37d7226e-0160-1000-6049-d4c489cd32f3
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively
> Yielded for 1 sec due to processing failure
> 20:52:26 UTC
> ERROR
> 37d7226e-0160-1000-6049-d4c489cd32f3
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process due to
> com.amazonaws.SdkClientException: Failed to parse XML document with handler
> class
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler;
> rolling back session: Failed to parse XML document with handler class
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
> 20:52:26 UTC
> ERROR
> 37d7226e-0160-1000-6049-d4c489cd32f3
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due
> to com.amazonaws.SdkClientException: Failed to parse XML document with
> handler class
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
> Failed to parse XML document with handler class
> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
> 20:52:26 UTC
> WARNING
> 37d7226e-0160-1000-6049-d4c489cd32f3
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively
> Yielded for 1 sec due to processing failure
> Auto-refresh
>
> <image001.png>
