S3 presents the illusion of a folder tree, but the S3 API draws a clear distinction between the name of the bucket and a key prefix used to filter objects within that bucket. ListS3 mirrors that terminology, so the Bucket property should hold only the bucket name. I suspect you are trying to list the bucket "part-d-prescription-drug" and filter its objects to read only keys under the prefix "unstructured/". I recommend configuring your ListS3 processor properties as follows:

* Bucket: part-d-prescription-drug
* Prefix: unstructured/
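If it helps to see the split spelled out, here is a minimal Python sketch. The helper names split_s3_path and filter_keys and the sample keys are hypothetical, not part of NiFi or the AWS SDK. It assumes the first path segment is the bucket name and that the rest is a folder-like prefix, so a trailing "/" is added. It only mimics locally how S3 applies a prefix: a key is listed if it begins with that string.

```python
from typing import Iterable, List, Tuple


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split a console-style path such as 'bucket/folder' into (bucket, prefix).

    Assumption: everything before the first '/' is the bucket name and
    everything after it is a folder-like key prefix, so a trailing '/' is
    appended when missing.
    """
    if path.startswith("s3://"):
        path = path[len("s3://"):]  # drop an optional s3:// scheme
    bucket, _, prefix = path.partition("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return bucket, prefix


def filter_keys(keys: Iterable[str], prefix: str) -> List[str]:
    """Mimic S3 prefix filtering: keep only keys that start with the prefix."""
    return [key for key in keys if key.startswith(prefix)]


if __name__ == "__main__":
    bucket, prefix = split_s3_path("part-d-prescription-drug/unstructured")
    print(bucket)  # part-d-prescription-drug
    print(prefix)  # unstructured/

    # Hypothetical object keys, for illustration only.
    keys = [
        "unstructured/report.pdf",
        "unstructured/2017/summary.pdf",
        "unstructured-old/draft.pdf",
        "structured/data.csv",
    ]
    print(filter_keys(keys, prefix))
    # ['unstructured/report.pdf', 'unstructured/2017/summary.pdf']
    print(filter_keys(keys, "unstructured"))
    # also includes 'unstructured-old/draft.pdf'
```

The last call shows why the trailing slash matters. A prefix of "unstructured" without the slash would also match keys under a sibling such as "unstructured-old/".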
Even better, you can experiment with the AWS CLI to find the combination of bucket and prefix that yields the results you want, using something like this:

    aws s3 ls s3://part-d-prescription-drug/unstructured/

The AWS CLI can help confirm your bucket and prefix. It also gives you an independent check of your credentials and permissions.

Thanks,

James

On Wed, Dec 13, 2017 at 11:32 AM, Aruna Sankaralingam <[email protected]> wrote:

> James, I am sorry, I am not sure I follow that. Could you please give an example?
>
> *From:* James Wing [mailto:[email protected]]
> *Sent:* Wednesday, December 13, 2017 12:55 PM
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
> For ListS3, you will want to separate those in the Bucket and Prefix properties.
>
> On Dec 13, 2017, at 9:34 AM, Aruna Sankaralingam <[email protected]> wrote:
>
> James,
>
> “part-d-prescription-drug” is the main folder in S3 and “unstructured” is the subfolder inside the main folder.
>
> *From:* James Wing [mailto:[email protected] <[email protected]>]
> *Sent:* Wednesday, December 13, 2017 1:34 AM
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
> Are you able to list the bucket with the AWS CLI (aws s3 ls)? It can be helpful to compare results between NiFi and the AWS CLI. This works best if you can run both from the same machine, with the same permissions, and with bucket and prefix settings as similar as you can manage.
>
> In the screenshot above, the bucket is shown as "part-d-prescription-drug/unstructured", which looks unusual to me. Is the bucket "part-d-prescription-drug" and the prefix "unstructured/"?
>
> Thanks,
>
> James
>
> On Tue, Dec 12, 2017 at 7:34 AM, Aruna Sankaralingam <[email protected]> wrote:
>
> Joe,
>
> No, I don’t have anything in between AWS and NiFi.
>
> NiFi is installed on one of the EC2 instances in AWS – N. Virginia Region.
>
> S3 is also in the N. Virginia Region.
>
> *From:* Joe Witt [mailto:[email protected]]
> *Sent:* Monday, December 11, 2017 1:28 PM
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
> The XML response is truncated for some reason, as implied by the following. Do you have any devices/software/systems/proxies between your NiFi and the Amazon service? Are you able to manually issue the request and get the response you expect?
>
> 2017-12-11 18:01:02,875 ERROR [Timer-Driven Process Thread-6] org.apache.nifi.processors.aws.s3.ListS3 ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: {}
>
> com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:156)
>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
>     at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1444)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1151)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:964)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:676)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:650)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:633)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:601)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:583)
>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:447)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4137)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4079)
>     at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:819)
>     at org.apache.nifi.processors.aws.s3.ListS3$S3ObjectBucketLister.listVersions(ListS3.java:314)
>     at org.apache.nifi.processors.aws.s3.ListS3.onTrigger(ListS3.java:208)
>     at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>     at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
>     at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
>     at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
>     at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.xml.sax.SAXParseException: Premature end of file.
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
>     at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
>     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:142)
>     ... 32 common frames omitted
>
> On Mon, Dec 11, 2017 at 1:07 PM, Aruna Sankaralingam <[email protected]> wrote:
>
> Attached is my nifi-app.log. Could you please let me know what went wrong?
>
> *From:* Joe Witt [mailto:[email protected]]
> *Sent:* Friday, December 08, 2017 4:04 PM
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
> Here is an example I found for another processor:
>
> https://mail-archives.apache.org/mod_mbox/nifi-dev/201509.mbox/%3CCAFddr26AEVqnoQ=mWr7DSNDFVrr9NuYy9GCcXg=4fyycqab...@mail.gmail.com%3E
>
> Thanks
>
> On Fri, Dec 8, 2017 at 4:02 PM, Aruna Sankaralingam <[email protected]> wrote:
>
> Joe,
>
> Could you please let me know how to turn on debug logging?
>
> *From:* Joe Witt [mailto:[email protected]]
> *Sent:* Friday, December 08, 2017 3:59 PM
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
> What version of NiFi?
>
> Looks like either a classpath/classloader issue OR the Amazon client library cannot parse the response it is getting back...
>
> The logs/nifi-app.log should have the full stack trace. If not, you can turn on debug logging for that processor, and perhaps then it will.
>
> Thanks
>
> On Fri, Dec 8, 2017 at 3:56 PM, Aruna Sankaralingam <[email protected]> wrote:
>
> I am trying to get a PDF file from S3 and load it into Elasticsearch. The ListS3 processor is giving me this error. Could someone please let me know where I am going wrong?
>
> *20:52:25 UTC* *ERROR* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>
> *20:52:25 UTC* *WARNING* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure
>
> *20:52:26 UTC* *ERROR* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler; rolling back session: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>
> *20:52:26 UTC* *ERROR* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>
> *20:52:26 UTC* *WARNING* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure
>
> <image001.png>
