Thanks! Let's move the discussion over to the PR.
Matt

Sent from my iPhone

> On Sep 10, 2016, at 12:14 AM, Peter Wicks (pwicks) <[email protected]> wrote:
>
> Matt,
>
> I’ve identified the source of the issue, created a patch/unit test, and a PR.
> In StandardFlowFileQueue: writeSwapFilesIfNecessary. When it calculates the
> `numSwapFiles`, if the number of FlowFiles in the queue is perfectly
> splittable (in my case 100000/20000 = 5) and the Active Queue is empty then
> ALL files move to swap and none are left in Active.
>
> https://github.com/apache/nifi/pull/1000
>
> If you have a chance to take a look at the PR I’d appreciate it.
>
> Thanks,
> Peter
>
> From: Matt Gilman [mailto:[email protected]]
> Sent: Friday, September 09, 2016 5:00 PM
> To: [email protected]
> Subject: Re: Erroneous Queue has No FlowFiles message
>
> Peter,
>
> Thanks for the confirmation. I think there is some case we're hitting here
> where some flowfiles are being swapped instead of added back to the active
> queue. The queue listing only returns the top 100 entries in the active
> queue. Haven't identified the case that's causing it yet but definitely have
> a better idea what's going on now.
>
> Thanks
>
> Matt
>
> Sent from my iPhone
>
> On Sep 9, 2016, at 5:42 PM, Peter Wicks (pwicks) <[email protected]> wrote:
>
> Matt,
>
> I followed the swapping train of thought and debugged the code. When I debug
> the code where it gets the files the `size` variable looks like this:
>
> FlowFile Queue Size[ ActiveQueue=[0, 0 Bytes], Swap Queue=[100000, 26600000
> Bytes], Swap Files=[10], Unacknowledged=[0, 0 Bytes] ]
>
> But the List FlowFiles command only looks at the Active queue…
>
> That looks like the root cause; what I don’t know is if this is by design.
>
> --Peter
>
> From: Peter Wicks (pwicks)
> Sent: Friday, September 09, 2016 3:28 PM
> To: '[email protected]' <[email protected]>
> Subject: RE: Erroneous Queue has No FlowFiles message
>
> Matt,
>
> You also asked in an earlier email if I could still reproduce it, and if so
> to try enabling DEBUG level logging. I am able to reproduce, so I enabled it:
>
> 2016-09-09 21:27:28,352 DEBUG [List FlowFiles for Connection
> 0f620e2d-0157-1000-4a1d-fd988c59e290] o.a.n.controller.StandardFlowFileQueue
> FlowFileQueue[id=0f620e2d-0157-1000-4a1d-fd988c59e290] Acquired lock to
> perform listing of FlowFiles
>
> 2016-09-09 21:27:28,353 DEBUG [List FlowFiles for Connection
> 0f620e2d-0157-1000-4a1d-fd988c59e290] o.a.n.controller.StandardFlowFileQueue
> FlowFileQueue[id=0f620e2d-0157-1000-4a1d-fd988c59e290] Finished listing
> FlowFiles for active queue with a total of 0 results
>
> 2016-09-09 21:27:29,656 INFO [NiFi Web Server-112]
> o.a.n.controller.StandardFlowFileQueue Canceling ListFlowFile Request with ID
> 10d92339-0157-1000-42f4-464c37340fdb
>
> Thanks,
> Peter
>
> From: Peter Wicks (pwicks)
> Sent: Friday, September 09, 2016 3:15 PM
> To: [email protected]
> Subject: RE: Erroneous Queue has No FlowFiles message
>
> Matt,
>
> PutSQL is the end of the line, no downstream processors.
> Batch size is 1000, yes I have fragmented transactions set to false.
>
> nifi.queue.swap.threshold=20000
>
> --Peter
>
>
> From: Matt Gilman [mailto:[email protected]]
> Sent: Friday, September 09, 2016 2:23 PM
> To: [email protected]
> Subject: Re: Erroneous Queue has No FlowFiles message
>
> Peter,
>
> Would you be able to share what you've configured for the batch size of
> PutSQL (assuming that 'fragmented transactions' is disabled) and what your
> swap threshold is configured to (nifi.queue.swap.threshold in
> nifi.properties)?
>
> Also, what is following the PutSQL? Had any of those connections exceeded
> their configured back pressure threshold?
>
> Thanks again.
>
> Matt
>
> On Fri, Sep 9, 2016 at 11:18 AM, Peter Wicks (pwicks) <[email protected]>
> wrote:
> PutSQL. The 100k FlowFiles are all SQL Insert queries with associated
> attributes, generated by a JSONToSQL processor.
>
> From: Matt Gilman [mailto:[email protected]]
> Sent: Friday, September 09, 2016 8:51 AM
>
> To: [email protected]
> Subject: Re: Erroneous Queue has No FlowFiles message
>
> Peter,
>
> What is the processor downstream of the connection in question? Thanks.
>
> Matt
>
> On Fri, Sep 9, 2016 at 10:39 AM, Matt Gilman <[email protected]> wrote:
> Peter,
>
> Thanks for the answers. Still not quite sure what's causing this and am
> trying to narrow down the possible cause. Are you still able to replicate the
> issue? If so, can you enable debug level logging for
>
> org.apache.nifi.controller.StandardFlowFileQueue
>
> and see if there are any meaningful messages in the nifi-app.log?
>
> Thanks!
>
> Matt
>
>
> On Fri, Sep 9, 2016 at 9:52 AM, Peter Wicks (pwicks) <[email protected]>
> wrote:
> Matt,
>
> This is not a cluster.
> Yes, it’s secured. Kerberos.
>
> The thing that gets me is I can list another queue on the same graph/same
> processor group.
>
> --Peter
>
>
> From: Matt Gilman [mailto:[email protected]]
> Sent: Friday, September 09, 2016 5:25 AM
>
> To: [email protected]
> Subject: Re: Erroneous Queue has No FlowFiles message
>
> Peter,
>
> Thanks for the details! These will be very helpful in investigating what's
> happening here. A couple follow-up questions...
>
> - Is this a cluster?
> - Is this instance secured?
>
> Thanks
>
> Matt
>
> On Fri, Sep 9, 2016 at 12:13 AM, Peter Wicks (pwicks) <[email protected]>
> wrote:
> Gunjan,
>
> Thanks for the response. I included those messages to emphasize the
> difference between a normal Queue List and mine. In a normal queue list the
> GET step includes a non-empty “flowFileSummaries” array, assuming there are
> FlowFiles to show.
> When I list my other queue, the one with 23 FlowFiles in it, I get back an
> array with 23 entries. Based on the JSON I’m assuming that my queue with
> 100,000 files in it should return 100, but instead I get 0.
>
> Thanks,
> Peter
>
> From: Gunjan Dave [mailto:[email protected]]
> Sent: Thursday, September 08, 2016 9:26 PM
> To: [email protected]
> Subject: Re: Erroneous Queue has No FlowFiles message
>
> Hi Peter, once you post the request, your first step, you get a listing
> request reference handle UUID as part of the response.
> This UUID is used to perform all the operations on the queue.
> This UUID is active until a DELETE request is sent. Once you delete the
> active request, you get the message you mentioned in the logs; this is not an
> issue.
> If you check the developer panel in Chrome, you will see all 3 operations,
> post-get-delete in succession.
>
>
> On Fri, Sep 9, 2016, 8:48 AM Peter Wicks (pwicks) <[email protected]> wrote:
> Running NiFi 1.0.0, I’m listing a queue that has 100k files queued. I’ve
> stopped both the incoming and outgoing processors, so the files are just
> hanging out in the queue, no possible motion.
>
> I get a “The queue has no FlowFiles” message.
> Here are the actual responses from the REST calls:
>
> POST - Listing-requests
> {"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https://localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 GMT+00:00","percentCompleted":0,"finished":false,"maxResults":100,"state":"Waiting for other queue requests to complete","queueSize":{"byteCount":25400000,"objectCount":100000},"sourceRunning":false,"destinationRunning":false}}
>
> GET
> {"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https://localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 GMT+00:00","percentCompleted":100,"finished":true,"maxResults":100,"state":"Completed successfully","queueSize":{"byteCount":25400000,"objectCount":100000},"flowFileSummaries":[],"sourceRunning":false,"destinationRunning":false}}
>
> DELETE
> {"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https://localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 GMT+00:00","percentCompleted":100,"finished":true,"maxResults":100,"state":"Completed successfully","queueSize":{"byteCount":25400000,"objectCount":100000},"sourceRunning":false,"destinationRunning":false}}
>
> On a subsequent test (thus the difference in IDs) I checked the nifi-app.log
> file and found this single message:
>
> 2016-09-09 03:15:50,043 INFO [NiFi Web Server-828]
> o.a.n.controller.StandardFlowFileQueue Canceling ListFlowFile Request with ID
> 0cf1b178-0157-1000-9111-9b889415bcdc
>
> Not clear why it was canceled.
>
> I went up one step in the process, and that queue has 23 items in it. I was
> able to list it without issue.
>
> Any ideas why I can’t list the queue?
>
> Thanks,
> Peter Wicks
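
For readers following the thread, here is a rough sketch of the arithmetic Peter describes: when the queued count divides evenly by the swap threshold, whole swap files account for every FlowFile and the active queue ends up empty, so a listing that only looks at the active queue returns nothing. This is not NiFi's code. The class and method names are hypothetical, and the only values taken from the thread are the threshold (`nifi.queue.swap.threshold=20000`), the queue size (100000), and the listing cap (`"maxResults":100`).

```java
// Toy model of the swap arithmetic discussed in the thread.
// All names here are hypothetical; this is NOT StandardFlowFileQueue.
final class SwapSketch {
    // Value of nifi.queue.swap.threshold quoted in the thread.
    static final int SWAP_THRESHOLD = 20000;
    // "maxResults":100 from the listing-request responses.
    static final int MAX_LISTING_RESULTS = 100;

    // Number of whole swap files, using integer division as in Peter's example.
    static int numSwapFiles(int queued) {
        return queued / SWAP_THRESHOLD;
    }

    // FlowFiles left in the active queue once whole swap files are written out.
    static int activeAfterSwap(int queued) {
        return queued - numSwapFiles(queued) * SWAP_THRESHOLD;
    }

    // A listing that only inspects the active queue, capped at the max results.
    static int listedEntries(int active) {
        return Math.min(active, MAX_LISTING_RESULTS);
    }

    public static void main(String[] args) {
        int queued = 100000; // queue size reported in the thread
        int active = activeAfterSwap(queued);
        System.out.println("swap files: " + numSwapFiles(queued)); // 5
        System.out.println("active:     " + active);               // 0
        System.out.println("listed:     " + listedEntries(active)); // 0
    }
}
```

With a count that is not an exact multiple, such as 100023, the same model leaves 23 FlowFiles in the active queue and the listing returns them, which is why only the evenly divisible case shows an empty list.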
