Hi Joe,

The Confluent Kafka platform I’ve been working with is open source.
Below are some links, but I’m not sure if this is what you are looking for.

https://github.com/confluentinc
http://www.confluent.io/product
http://docs.confluent.io/1.0.1/platform.html

> On Sep 25, 2015, at 11:38 AM, Joe Witt <[email protected]> wrote:
>
> If whatever it would mean is open source friendly it sounds like a
> fine idea. It seems unlikely we'd need to have something vendor
> specific. Jeff, are there any docs you can direct us to for this?
>
> On Fri, Sep 25, 2015 at 11:33 AM, Jeff <[email protected]> wrote:
>>
>> Thanks for the info on the JIRA.
>>
>> Does anyone have any input on the PutToConfluentKafka idea?
>>
>> On Sep 25, 2015, at 8:55 AM, Matt Gilman <[email protected]> wrote:
>>
>> Yep. The JIRA is already created [1], as well as others for features we'll
>> be supporting regarding queue management [2].
>>
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-730
>> [2] https://issues.apache.org/jira/browse/NIFI-108
>>
>> On Fri, Sep 25, 2015 at 9:52 AM, Ryan Ward <[email protected]> wrote:
>>>
>>> This is actually very easy to overlook and miss. Oftentimes we change the
>>> file expiration on a queue simply to empty the queue.
>>>
>>> Could we add a right-click "empty queue" option, with an "are you sure?"
>>> prompt? Is there already a JIRA for this feature?
>>>
>>> Thanks,
>>> Ryan
>>>
>>> On Fri, Sep 25, 2015 at 9:12 AM, Jeff <[email protected]> wrote:
>>>>
>>>> That was a rookie mistake.
>>>>
>>>> Indeed, the JSON_to_Avro queue was set to 5 sec. Is there information in
>>>> a log that states a flow file was expired?
>>>>
>>>> My ultimate goal is to put all of this data into a Confluent Kafka topic,
>>>> taking advantage of the schema registry. I do not believe the current
>>>> PutToKafka provides the ability to use this registry, correct? I’m curious
>>>> whether anyone is working on a PutToConfluentKafka processor?
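[Editor's note: on the schema registry question above, a minimal sketch of the piece a PutToConfluentKafka processor would add over plain PutToKafka: registering an Avro schema with the Confluent Schema Registry's REST API before producing. The record name, field layout, and subject naming here are assumptions for illustration, not taken from the thread.]

```python
# Hedged sketch: building the request body the Confluent Schema Registry
# expects when registering a schema. The registry takes a JSON object whose
# "schema" field is the Avro schema itself, serialized as an escaped string.
# The schema contents below (DdaRecord, its fields) are hypothetical.
import json

def registration_payload(avro_schema: dict) -> str:
    """Wrap an Avro schema dict in the registry's registration envelope."""
    return json.dumps({"schema": json.dumps(avro_schema)})

value_schema = {
    "type": "record",
    "name": "DdaRecord",  # hypothetical record name
    "fields": [
        {"name": "account", "type": "string"},
        {"name": "amount", "type": "string"},
    ],
}

payload = registration_payload(value_schema)
print(payload)

# A real registration would POST this payload to the registry, e.g.
#   POST http://<registry-host>:8081/subjects/<topic>-value/versions
# The HTTP call is omitted here since it needs a running Schema Registry.
```

The double `json.dumps` is the easy part to get wrong: the outer call builds the envelope, the inner one turns the schema into the escaped string the registry requires.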
>>>>
>>>> Thanks for your help.
>>>>
>>>> Jeff
>>>>
>>>> On Sep 25, 2015, at 7:52 AM, Matt Gilman <[email protected]> wrote:
>>>>
>>>> Jeff,
>>>>
>>>> What is the expiration setting on your connections? The little clock icon
>>>> indicates that they are configured to automatically expire flowfiles over
>>>> a certain age.
>>>>
>>>> Matt
>>>>
>>>> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <[email protected]> wrote:
>>>>>
>>>>> Hi Aldrin,
>>>>>
>>>>> After the DDA_Processor.
>>>>>
>>>>> The image below shows that GetFile processed 174.6 MB and that the
>>>>> DDA_Processor is working on 1 file (the 1 in the upper right of the
>>>>> DDA_Processor box).
>>>>>
>>>>> <unknown.gif>
>>>>>
>>>>> The image below shows that the DDA_Processor has completed but the data
>>>>> did not make it to ConvertJSONtoAvro. No errors are being generated.
>>>>> DDA_Processor takes fixed-width data and converts it to JSON.
>>>>>
>>>>> <unknown.gif>
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <[email protected]> wrote:
>>>>>
>>>>> Jeff,
>>>>>
>>>>> With regard to:
>>>>>
>>>>> "Anything over, the GetFile and DDA_Processor shows data movement but
>>>>> the no other downstream processor shows movement."
>>>>>
>>>>> Are you referencing downstream processors starting immediately after the
>>>>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
>>>>> ConvertJsonToAvro processor?
>>>>>
>>>>> In the case of starting immediately after the DDA_Processor: as it is a
>>>>> custom processor, we would need some additional information about how the
>>>>> processor is behaving. In the second case, any additional context about
>>>>> the format of the data that is problematic, relative to what you are
>>>>> seeing (the effective "schema" of the JSON), would be helpful in tracking
>>>>> down the issue.
>>>>>
>>>>> Thanks!
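[Editor's note: the thread says the custom DDA_Processor "takes fixed-width data and converts it to JSON." A minimal sketch of that kind of conversion is below; the column layout, field names, and sample row are invented for illustration, since the real DDA record layout is not given in the thread.]

```python
# Hedged sketch of a fixed-width-to-JSON conversion like the one DDA_Processor
# is described as performing. LAYOUT is a hypothetical (name, start, end)
# column map -- the real layout is unknown.
import json

LAYOUT = [
    ("account", 0, 10),   # hypothetical 10-char account number
    ("name", 10, 30),     # hypothetical 20-char name field
    ("amount", 30, 40),   # hypothetical 10-char amount field
]

def row_to_json(row: str) -> str:
    """Slice one fixed-width row into fields and emit a JSON object."""
    record = {name: row[start:end].strip() for name, start, end in LAYOUT}
    return json.dumps(record)

# Sample 40-character row built from the layout above.
sample = "0012345678" + "Jane Q Public".ljust(20) + "0000123.45"
print(row_to_json(sample))
```

In a NiFi processor this slicing would run per line inside the FlowFile's stream callback; the point here is only the field-slicing step, which downstream ConvertJSONToAvro then needs to match with a compatible Avro schema.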
>>>>> Aldrin
>>>>>
>>>>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <[email protected]> wrote:
>>>>>>
>>>>>> Hi Adam,
>>>>>>
>>>>>> I have a flow that does the following:
>>>>>>
>>>>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>>>>>
>>>>>> My source file has 182,897 rows at 1,001 bytes per row. For any number
>>>>>> of rows under ~15,000, an output file is created. Anything over, the
>>>>>> GetFile and DDA_Processor show data movement but no other downstream
>>>>>> processor shows movement.
>>>>>>
>>>>>> I confirmed that it is not a data problem by processing a 10,000-row
>>>>>> file successfully, then concatenating those 10,000 rows into one file
>>>>>> twice.
>>>>>>
>>>>>> Thanks for your insight.
>>>>>>
>>>>>> Jeff
>>>>>> <Mail Attachment.gif>
>>>>>>
>>>>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <[email protected]> wrote:
>>>>>>
>>>>>> Jeff,
>>>>>>
>>>>>> This seems to be a bit different, as the processor is showing data as
>>>>>> having been written, and there is a listing of one FlowFile of 381 MB
>>>>>> being transferred out from the processor. Could you provide additional
>>>>>> information on how data is not being sent out in the manner anticipated?
>>>>>> If you can track the issue down further, let us know. It may be helpful
>>>>>> to create another message so we can track the issues separately as we
>>>>>> work through them.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Adam,
>>>>>>
>>>>>> I found a sizable JSON file to work against and have been doing some
>>>>>> initial exploration. With the large files, it certainly is a nontrivial
>>>>>> process. On cursory inspection, a good portion of processing seems to be
>>>>>> spent on validation. There are some ways to tweak the strictness of this
>>>>>> with the supporting library, but I will have to dive in a bit more.
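[Editor's note: Jeff's isolation test above (a known-good 10,000-row file, then the same data doubled by concatenation) can be scripted. A hedged sketch follows; the file names are assumptions, and each generated row is 1,000 placeholder characters plus a newline, matching the 1,001 bytes per row mentioned in the thread.]

```shell
# Hedged sketch of the isolation test described in the thread: build a
# known-good 10,000-row fixed-width file, then double it by concatenation
# to cross the ~15,000-row threshold with data already proven good.
set -eu

# One 1,000-character placeholder row ('X' repeated; real rows would be DDA data).
row=$(printf 'X%.0s' $(seq 1 1000))

yes "$row" | head -n 10000 > ten_k.txt      # known-good size: 10,000 rows
cat ten_k.txt ten_k.txt > twenty_k.txt      # same data, above the threshold

wc -c ten_k.txt twenty_k.txt
# Drop each file into the GetFile input directory and watch where the flow stalls.
```

Because `twenty_k.txt` is byte-for-byte two copies of a file that processed cleanly, any stall with it points at size handling rather than data content, which is exactly the argument Jeff makes above.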
>>>>>>
>>>>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <[email protected]> wrote:
>>>>>>>
>>>>>>> I’m having a very similar problem. The process picks up the file, and a
>>>>>>> custom processor does its thing, but no data is sent out.
>>>>>>>
>>>>>>> <unknown.gif>
