Thanks for the info on the JIRA.

Does anyone have any input on the PutToConfluentKafka idea?
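For context, a PutToConfluentKafka processor would essentially pair a Kafka producer with Confluent's schema registry: the registry-aware Avro serializer registers the record schema and prefixes each message with the registry-assigned schema ID. A rough sketch of the client-side settings such a processor would have to manage (the broker and registry URLs are placeholders, not from this thread):

```python
# Producer settings a hypothetical PutToConfluentKafka would manage, beyond
# what the existing PutToKafka exposes. URLs below are illustrative.
producer_props = {
    "bootstrap.servers": "broker1:9092",  # standard Kafka setting
    "key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
    # Confluent's serializer registers the Avro schema with the registry and
    # embeds the assigned schema ID in every message:
    "value.serializer": "io.confluent.kafka.serializers.KafkaAvroSerializer",
    # The one setting plain PutToKafka has no property for:
    "schema.registry.url": "http://schema-registry:8081",
}

def uses_schema_registry(props):
    """True if the config wires in Confluent's registry-aware serializer."""
    return ("schema.registry.url" in props
            and props["value.serializer"].endswith("KafkaAvroSerializer"))
```

A processor along these lines would mostly be plumbing: expose the registry URL and subject naming as processor properties, and hand serialization off to Confluent's serializer.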


> On Sep 25, 2015, at 8:55 AM, Matt Gilman <[email protected]> wrote:
> 
> Yep. JIRA is already created [1] as well as other features we'll be 
> supporting regarding queue management [2].
> 
> Matt
> 
> [1] https://issues.apache.org/jira/browse/NIFI-730
> [2] https://issues.apache.org/jira/browse/NIFI-108
> 
> On Fri, Sep 25, 2015 at 9:52 AM, Ryan Ward <[email protected]> wrote:
> This is actually very easy to overlook and miss. Oftentimes we change the 
> file expiration on a queue simply to empty it. 
> 
> Could we add a right-click "Empty queue" option, with an "Are you sure?" 
> prompt? Is there already a JIRA for this feature?
> 
> Thanks,
> Ryan
> 
> On Fri, Sep 25, 2015 at 9:12 AM, Jeff <[email protected]> wrote:
> 
> That was a rookie mistake.
> 
> Indeed, the JSON_to_Avro queue was set to 5 sec.  Is there information in a 
> log stating that a flow file was expired?  
> 
> My ultimate goal is to put all of this data into a Confluent Kafka topic, 
> taking advantage of the schema registry. I do not believe the current 
> PutToKafka provides the ability to use this registry, correct?  I’m curious 
> whether anyone is working on a PutToConfluentKafka processor.
> 
> Thanks for your help.
> 
> Jeff
> 
>> On Sep 25, 2015, at 7:52 AM, Matt Gilman <[email protected]> wrote:
>> 
>> Jeff,
>> 
>> What is the expiration setting on your connections? The little clock icon 
>> indicates that they are configured to automatically expire flowfiles of a 
>> certain age.
>> 
>> Matt
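To make the effect of that setting concrete: anything that has sat in a connection's queue longer than the expiration threshold is silently dropped rather than handed to the downstream processor. A toy model of that behavior (the threshold, timestamps, and queue entries below are illustrative, not NiFi internals):

```python
# Toy model of a connection's "FlowFile Expiration" setting: entries older
# than the threshold are expired instead of delivered downstream.
def drain(queue, expiration_sec, now):
    """Split (enqueue_time, name) entries into (delivered, expired) lists."""
    delivered, expired = [], []
    for enqueued_at, name in queue:
        if now - enqueued_at > expiration_sec:
            expired.append(name)    # too old: dropped, never delivered
        else:
            delivered.append(name)  # young enough: goes downstream
    return delivered, expired
```

With a 5 sec expiration, any flowfile that waits in the queue longer than 5 seconds (e.g. behind a large upstream batch) never reaches the next processor, which matches the symptom described in this thread.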
>> 
>> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <[email protected]> wrote:
>> 
>> Hi Aldrin, 
>> 
>> After the DDA_Processor.
>> 
>> The image below shows that GetFile processed 174.6 MB and that the 
>> DDA_Processor is working on 1 file (the 1 in the upper right of the 
>> DDA_Processor box).
>> 
>> <unknown.gif>
>> 
>> The image below shows that the DDA_Processor is complete, but the data did 
>> not make it to ConvertJSONToAvro.  No errors are being generated.  
>> DDA_Processor takes fixed-width data and converts it to JSON.  
>> 
>> <unknown.gif>
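For anyone unfamiliar with that conversion step: turning a fixed-width row into JSON is essentially slicing each line by column offsets. A minimal sketch, with made-up field names and offsets (the real DDA layout is not shown in the thread):

```python
import json

# Hypothetical column layout: (name, start, end) offsets into each row.
# The actual DDA_Processor layout is not given in this thread.
LAYOUT = [("account", 0, 10), ("amount", 10, 20), ("memo", 20, 40)]

def row_to_json(row):
    """Slice one fixed-width row into a dict and serialize it as JSON."""
    record = {name: row[start:end].strip() for name, start, end in LAYOUT}
    return json.dumps(record)
```

For example, a 40-character row sliced at offsets 10 and 20 yields three trimmed string fields, one JSON object per input row.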
>> 
>> Thanks
>> 
>> 
>>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <[email protected]> wrote:
>>> 
>>> Jeff,
>>> 
>>> With regards to:
>>> 
>>> "Anything over that, the GetFile and DDA_Processor show data movement, but 
>>> no other downstream processor shows movement."
>>> 
>>> Are you referencing downstream processors starting immediately after the 
>>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the 
>>> ConvertJsonToAvro processor?
>>> 
>>> In the case of starting immediately after the DDA_Processor: as it is a 
>>> custom processor, we would need some additional information about how that 
>>> processor is behaving.  In the second case, any additional context about 
>>> the format of the problematic data (the effective "schema" of the JSON) 
>>> would be helpful in tracking down the issue.
>>> 
>>> Thanks!
>>> Aldrin
>>> 
>>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <[email protected]> wrote:
>>> Hi Adam,
>>> 
>>> 
>>> I have a flow that does the following:
>>> 
>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>> 
>>> My source file has 182897 rows at 1001 bytes per row.  With any number of 
>>> rows under ~15000, an output file is created.  Anything over that, the 
>>> GetFile and DDA_Processor show data movement, but no other downstream 
>>> processor shows movement.  
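As a sanity check, those numbers line up with the 174.6 MB that GetFile reports earlier in the thread (assuming the UI is showing binary megabytes):

```python
# 182897 rows of 1001 bytes each, converted to binary megabytes (MiB).
rows, bytes_per_row = 182_897, 1_001
total_bytes = rows * bytes_per_row   # 183,079,897 bytes
mib = total_bytes / 2**20            # bytes -> MiB
print(round(mib, 1))                 # -> 174.6
```

So the whole file is being read in as one large flowfile, which also explains why processing it takes long enough to trip a 5 sec queue expiration.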
>>> 
>>> I confirmed that it is not a data problem by processing a 10,000-row file 
>>> successfully, then concatenating those same 10,000 rows into one file twice.  
>>> 
>>> Thanks for your insight.
>>> 
>>> Jeff
>>> <Mail Attachment.gif> 
>>> 
>>> 
>>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <[email protected]> wrote:
>>>> 
>>>> Jeff,
>>>> 
>>>> This seems to be a bit different, as the processor shows data as having 
>>>> been written and lists one FlowFile of 381 MB being transferred out from 
>>>> the processor.  Could you provide additional information on how the data 
>>>> is not being sent out in the manner anticipated?  If you can track the 
>>>> issue down further, let us know.  It may be helpful to start another 
>>>> thread so we can track the issues separately as we work through them.
>>>> 
>>>> Thanks!
>>>> 
>>>> Adam,
>>>> 
>>>> Found a sizable JSON file to work against and have been doing some initial 
>>>> exploration.  With large files, it is certainly a nontrivial process.  
>>>> At cursory inspection, a good portion of the processing time seems to be 
>>>> spent on validation.  There are some ways to tweak the strictness of this 
>>>> with the supporting library, but I will have to dive in a bit more.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <[email protected]> wrote:
>>>> 
>>>> 
>>>> 
>>>> I’m having a very similar problem.  The process picks up the file, and a 
>>>> custom processor does its thing, but no data is sent out.
>>>> 
>>>> <unknown.gif>
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
