S3 presents the illusion of a folder tree, but the S3 API requires a clear
distinction between the name of the bucket and the key prefix used to filter
objects within that bucket.  ListS3 tries to mirror that terminology.  I
suspect that you are trying to list the bucket "part-d-prescription-drug",
filtering the objects in the bucket to read only keys that begin with the
prefix "unstructured/".  I recommend configuring your ListS3 processor
properties as follows:
* Bucket: part-d-prescription-drug
* Prefix: unstructured/
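
As a rough illustration of the split above, here is a minimal Python sketch
that takes a combined "bucket/folder" path and separates it into the two
values.  The function name split_bucket_and_prefix is my own invention for
this example and is not part of NiFi or the AWS SDK.

```python
def split_bucket_and_prefix(path):
    """Split 'bucket/some/folder' into ('bucket', 'some/folder/')."""
    # Drop an optional s3:// scheme and any leading or trailing slashes.
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    path = path.strip("/")
    # Everything before the first slash is the bucket; the rest is the key prefix.
    bucket, _, prefix = path.partition("/")
    # A trailing slash keeps the prefix from also matching sibling names
    # such as 'unstructured-old/...'.
    if prefix:
        prefix += "/"
    return bucket, prefix


bucket, prefix = split_bucket_and_prefix("part-d-prescription-drug/unstructured")
print("Bucket:", bucket)
print("Prefix:", prefix)
```

The two printed values are what would go into the Bucket and Prefix
properties respectively.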

Even better, you can experiment with the AWS CLI to find the right
combination of bucket and prefix that yields the results you want, using
something like this:

aws s3 ls s3://part-d-prescription-drug/unstructured/

Using the AWS CLI can help confirm your bucket and prefix, and it also gives
you an independent check on the security settings.
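
When comparing results, it may help to remember that object keys are flat
strings and a prefix is matched against the start of each key.  The sketch
below imitates that matching locally in Python; the key names are made up for
illustration, and filter_by_prefix is a hypothetical helper, not an AWS or
NiFi call.

```python
# Hypothetical object keys; each one is a flat string, not a nested folder.
keys = [
    "unstructured/report.pdf",
    "unstructured/2017/notes.pdf",
    "unstructured-old/archive.pdf",
    "structured/data.csv",
]


def filter_by_prefix(keys, prefix):
    """Return the keys that begin with the given prefix."""
    return [key for key in keys if key.startswith(prefix)]


# With the trailing slash, only keys under "unstructured/" match.
print(filter_by_prefix(keys, "unstructured/"))
# Without it, "unstructured-old/archive.pdf" matches as well.
print(filter_by_prefix(keys, "unstructured"))
```

This is why I suggest "unstructured/" with the trailing slash for the Prefix
property.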

Thanks,

James

On Wed, Dec 13, 2017 at 11:32 AM, Aruna Sankaralingam <
[email protected]> wrote:

> James, I am sorry I am not sure if I follow that. Could you please give an
> example?
>
>
>
> *From:* James Wing [mailto:[email protected]]
> *Sent:* Wednesday, December 13, 2017 12:55 PM
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
>
>
> For ListS3, you will want to separate those in the Bucket and Prefix
> properties.
>
>
> On Dec 13, 2017, at 9:34 AM, Aruna Sankaralingam <
> [email protected]> wrote:
>
> James,
>
>
>
> “part-d-prescription-drug” is the main folder in S3 and “unstructured” is
> the sub folder inside the main folder.
>
>
>
> *From:* James Wing [mailto:[email protected] <[email protected]>]
> *Sent:* Wednesday, December 13, 2017 1:34 AM
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
>
>
> Are you able to list the bucket with the AWS CLI (aws s3 ls)?  It can be
> helpful to compare performance between NiFi and the AWS CLI, especially if
> you are able to do so from the same machine, with the same permissions, and
> with bucket and prefix settings as similar as you can manage.
>
> In the screenshot above, the bucket is shown as
> "part-d-prescription-drug/unstructured", which looks unusual to me.  Is the
> bucket "part-d-prescription-drug" and the prefix "unstructured/"?
>
> Thanks,
>
> James
>
>
>
> On Tue, Dec 12, 2017 at 7:34 AM, Aruna Sankaralingam <
> [email protected]> wrote:
>
> Joe,
>
>
>
> No, I don’t have anything in between AWS and NiFi.
>
> NiFi is installed on one of the EC2 instances in AWS, in the N. Virginia
> region.
>
> S3 is also in the N. Virginia region.
>
>
>
> *From:* Joe Witt [mailto:[email protected]]
> *Sent:* Monday, December 11, 2017 1:28 PM
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
>
>
> The XML response is truncated for some reason, as implied by the following.
> Do you have any devices/software/systems/proxies in between your NiFi and
> the Amazon service?  Are you able to manually issue the request and get the
> response you expect?
>
>
>
> 2017-12-11 18:01:02,875 ERROR [Timer-Driven Process Thread-6] org.apache.nifi.processors.aws.s3.ListS3 ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: {}
> com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:156)
>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
>     at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
>     at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
>     at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1444)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1151)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:964)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:676)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:650)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:633)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:601)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:583)
>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:447)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4137)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4079)
>     at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:819)
>     at org.apache.nifi.processors.aws.s3.ListS3$S3ObjectBucketLister.listVersions(ListS3.java:314)
>     at org.apache.nifi.processors.aws.s3.ListS3.onTrigger(ListS3.java:208)
>     at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>     at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
>     at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
>     at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
>     at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.xml.sax.SAXParseException: Premature end of file.
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
>     at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
>     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
>     at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:142)
>     ... 32 common frames omitted
>
>
>
>
>
> On Mon, Dec 11, 2017 at 1:07 PM, Aruna Sankaralingam <
> [email protected]> wrote:
>
> Attached my nifi-app.log. Could you please let me know what went wrong?
>
>
>
> *From:* Joe Witt [mailto:[email protected]]
> *Sent:* Friday, December 08, 2017 4:04 PM
>
>
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
>
>
> Here is an example I found for another processor
>
>
>
>   https://mail-archives.apache.org/mod_mbox/nifi-dev/201509.mbox/%3CCAFddr26AEVqnoQ=mWr7DSNDFVrr9NuYy9GCcXg=4fyycqab...@mail.gmail.com%3E
>
>
>
> Thanks
>
>
>
> On Fri, Dec 8, 2017 at 4:02 PM, Aruna Sankaralingam <
> [email protected]> wrote:
>
> Joe,
>
> Could you please let me know how to turn on the debug logging?
>
>
>
> *From:* Joe Witt [mailto:[email protected]]
> *Sent:* Friday, December 08, 2017 3:59 PM
> *To:* [email protected]
> *Subject:* Re: ListS3 Processor Error
>
>
>
> What version of NiFi?
>
>
>
> Looks like either a classpath/classloader issue OR the amazon client
> library cannot parse the response it is getting back...
>
>
>
> The logs/nifi-app.log should have the full stack trace.  If not you can
> turn on debug logging for that processor and perhaps then it will.
>
>
>
> Thanks
>
>
>
> On Fri, Dec 8, 2017 at 3:56 PM, Aruna Sankaralingam <
> [email protected]> wrote:
>
> I am trying to get a PDF file from S3 and load it into Elasticsearch. The
> ListS3 processor is giving me this error. Could someone please let me know
> where I am going wrong?
>
>
>
> *20:52:25 UTC* *ERROR* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>
> *20:52:25 UTC* *WARNING* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure
>
> *20:52:26 UTC* *ERROR* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler; rolling back session: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>
> *20:52:26 UTC* *ERROR* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>
> *20:52:26 UTC* *WARNING* *37d7226e-0160-1000-6049-d4c489cd32f3*
> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure
>
>
>
>
> <image001.png>
>
>
>
>
>
>
>
>
>
>
