Looking at the code it suggests the two cases where it would come up with nothing for listing (when there are items to list) is if there is state already tracking lastModified of a previously pulled object or previously pulled object with the same key. Since you're not even getting to the point where state is being persisted it suggests it really is getting nothing back on the listing request.
Just in looking at the docs I wonder if you'll need to explicitly set the prefix value to something like '/'? JeffStorck/JamesWing: Any ideas? We should update the code to provide debug information when listed objects are skipped. Thanks Joe On Thu, Jul 20, 2017 at 2:44 PM, Laurens Vets <[email protected]> wrote: > I enabled DEBUG logging and I see the following: > > > 2017-07-20 11:39:08,670 DEBUG [StandardProcessScheduler Thread-1] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Using aws credentials for > creating client > 2017-07-20 11:39:08,670 INFO [StandardProcessScheduler Thread-1] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Creating client with AWS > credentials > 2017-07-20 11:39:08,672 INFO [StandardProcessScheduler Thread-1] > o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled > ListS3[id=6119854d-015d-1000-341f-b294838980af] to run with 1 threads > 2017-07-20 11:39:08,674 DEBUG [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Returning CLUSTER State: > StandardStateMap[version=-1, values={}] > 2017-07-20 11:39:09,089 INFO [Flow Service Tasks Thread-2] > o.a.nifi.controller.StandardFlowService Saved flow controller > org.apache.nifi.controller.FlowController@7c10f421 // Another save pending = > false > 2017-07-20 11:39:09,249 INFO [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Successfully listed S3 > bucket BUCKETNAME in 575 millis > 2017-07-20 11:39:09,249 DEBUG [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] No new objects in S3 bucket > BUCKETNAME to list. Yielding. > 2017-07-20 11:39:09,249 DEBUG [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] has chosen to yield its > resources; will not be scheduled to run again for 1000 milliseconds > 2017-07-20 11:39:10,246 INFO [Write-Ahead Local State Provider Maintenance] > org.wali.MinimalLockingWriteAheadLog > org.wali.MinimalLockingWriteAheadLog@2480acc3 checkpointed with 0 Records > and 0 Swap Files in 9 milliseconds (Stop-the-world time = 1 milliseconds, > Clear Edit Logs time = 0 millis), max Transaction ID -1 > 2017-07-20 11:39:10,250 DEBUG [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Returning CLUSTER State: > StandardStateMap[version=-1, values={}] > 2017-07-20 11:39:10,288 INFO [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Successfully listed S3 > bucket BUCKETNAME in 37 millis > 2017-07-20 11:39:10,288 DEBUG [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] No new objects in S3 bucket > BUCKETNAME to list. Yielding. > 2017-07-20 11:39:10,288 DEBUG [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] has chosen to yield its > resources; will not be scheduled to run again for 1000 milliseconds > 2017-07-20 11:39:10,558 INFO [pool-8-thread-1] > o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile > Repository > 2017-07-20 11:39:10,633 INFO [pool-8-thread-1] > org.wali.MinimalLockingWriteAheadLog > org.wali.MinimalLockingWriteAheadLog@1773faf8 checkpointed with 0 Records > and 0 Swap Files in 74 milliseconds (Stop-the-world time = 34 milliseconds, > Clear Edit Logs time = 30 millis), max Transaction ID -1 > 2017-07-20 11:39:10,633 INFO [pool-8-thread-1] > o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile > Repository with 0 records in 75 milliseconds > 2017-07-20 11:39:11,289 DEBUG [Timer-Driven Process Thread-10] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Returning CLUSTER State: > StandardStateMap[version=-1, values={}] > 2017-07-20 11:39:11,328 INFO [Timer-Driven Process Thread-10] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Successfully listed S3 > bucket BUCKETNAME in 39 millis > 2017-07-20 11:39:11,328 DEBUG [Timer-Driven Process Thread-10] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] No new objects in S3 bucket > BUCKETNAME to list. Yielding. > 2017-07-20 11:39:11,328 DEBUG [Timer-Driven Process Thread-10] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] has chosen to yield its > resources; will not be scheduled to run again for 1000 milliseconds > 2017-07-20 11:39:12,329 DEBUG [Timer-Driven Process Thread-2] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Returning CLUSTER State: > StandardStateMap[version=-1, values={}] > 2017-07-20 11:39:12,376 INFO [Timer-Driven Process Thread-2] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Successfully listed S3 > bucket BUCKETNAME in 46 millis > 2017-07-20 11:39:12,376 DEBUG [Timer-Driven Process Thread-2] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] No new objects in S3 bucket > BUCKETNAME to list. Yielding. > 2017-07-20 11:39:12,376 DEBUG [Timer-Driven Process Thread-2] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] has chosen to yield its > resources; will not be scheduled to run again for 1000 milliseconds > 2017-07-20 11:39:13,377 DEBUG [Timer-Driven Process Thread-2] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Returning CLUSTER State: > StandardStateMap[version=-1, values={}] > 2017-07-20 11:39:13,411 INFO [Timer-Driven Process Thread-2] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Successfully listed S3 > bucket BUCKETNAME in 34 millis > 2017-07-20 11:39:13,411 DEBUG [Timer-Driven Process Thread-2] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] No new objects in S3 bucket > BUCKETNAME to list. Yielding. > 2017-07-20 11:39:13,412 DEBUG [Timer-Driven Process Thread-2] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] has chosen to yield its > resources; will not be scheduled to run again for 1000 milliseconds > 2017-07-20 11:39:14,413 DEBUG [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Returning CLUSTER State: > StandardStateMap[version=-1, values={}] > 2017-07-20 11:39:14,449 INFO [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Successfully listed S3 > bucket BUCKETNAME in 36 millis > 2017-07-20 11:39:14,450 DEBUG [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] No new objects in S3 bucket > BUCKETNAME to list. Yielding. > 2017-07-20 11:39:14,450 DEBUG [Timer-Driven Process Thread-4] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] has chosen to yield its > resources; will not be scheduled to run again for 1000 milliseconds > 2017-07-20 11:39:15,451 DEBUG [Timer-Driven Process Thread-8] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Returning CLUSTER State: > StandardStateMap[version=-1, values={}] > 2017-07-20 11:39:15,506 INFO [Timer-Driven Process Thread-8] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] Successfully listed S3 > bucket BUCKETNAME in 54 millis > 2017-07-20 11:39:15,506 DEBUG [Timer-Driven Process Thread-8] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] No new objects in S3 bucket > BUCKETNAME to list. Yielding. > 2017-07-20 11:39:15,506 DEBUG [Timer-Driven Process Thread-8] > org.apache.nifi.processors.aws.s3.ListS3 > ListS3[id=6119854d-015d-1000-341f-b294838980af] has chosen to yield its > resources; will not be scheduled to run again for 1000 milliseconds > > My S3 log structure is: > > BUCKETNAME/AWSLogs/ARN/CloudTrail-Digest/ap-northeast-1/2017/07/03/869964652807_CloudTrail-Digest_ap-northeast-1_cloudtrail-orca_us-west-2_20170703T192938Z.json.gz > > Any idea why it would not recurse into the BUCKETNAME? > > On 2017-07-20 09:31, Laurens Vets wrote: > > There's no state currently, ie state is empty. > > I would think that when there's no state, ListS3 would start from the > beginning? > > FYI, the only items I've filled in in the ListS3 processor are: > > - Bucket: Our bucketname. > > - Region: Apparently I have to choose one, this is set to us-west-2 > > - Access Key: <set> > > - Secret Key: <set> > > I'm pretty sure the above settings are correct because when I do "aws s3 ls > s3://<bucketname>" with the above keys, I do get output. > > On 2017-07-20 09:18, Pierre Villard wrote: > > Can you check what's the current state of the processor? (right click / view > state) > Are you sure there is data to retrieve more recent that what is currently in > the processor's state? > > Pierre > > 2017-07-20 18:16 GMT+02:00 Laurens Vets <[email protected]>: >> >> I'm running 1.3.0 at the moment... I'm tempted to go back to 1.2.0 as I >> remember I got something working with S3. >> >> Can I just downgrade? >> >> On 2017-07-20 09:12, Adam Lamar wrote: >> >> Hi Laurens, >> >> What NiFi version are you running? There was an issue where ListS3 would >> spin like that on buckets with many files, but it was fixed in version 1.1.0 >> IIRC. >> >> Hope that helps, >> Adam >> >> >> On Thu, Jul 20, 2017 at 10:05 AM, Laurens Vets <[email protected]> wrote: >>> >>> Hello, >>> >>> I'm trying to ingest AWS CloudTrail logs with NiFi. I think I configured >>> ListS3 correctly, but it has been running for hours & hours without showing >>> anything (except for the # of tasks). >>> >>> How long does it take before I should see _any_ output/state/something in >>> the ListS3 processor? >> >> > >
