Re: NIFI 1.9.2 stuck in cluster mode

2019-12-03 Thread nayan sharma
Hi Mark,
Thanks for your valuable suggestion. It helped a lot. Now I understand that 
there is no point in load balancing between FetchSFTP and CompressContent.

After making all the changes it worked, but some of the flow files are stuck 
between CompressContent and PutHDFS: https://i.imgur.com/oSYkYuA.png

Second, 10 FlowFiles have been sitting between ListSFTP and FetchSFTP for a 
long time:
https://i.imgur.com/Q44VDW6.png

Please suggest where I can start debugging these two issues.

Meanwhile, we are migrating to 1.10.0. This time we are doing it through HDF, 
which has NiFi 1.9.0 as its latest version. We are planning to replace the 
libraries and content of 1.9.0 with 1.10.0. Can we go ahead with this 
approach, or is there another way?

Currently 1.9.2 is an independent cluster. 



On 2019/12/03 14:30:43, Mark Payne  wrote: 
> Nayan,
> 
> Looking at the screenshot, I can see two different connections there that are 
> load balanced. One of them holds the nearly 100 GB of data.
> 
> There are a handful of bugs related to load-balanced connections in 1.9.2 
> that were addressed in 1.10.0. If you're relying on load-balanced connections 
> to spread data across the cluster (and this particular flow clearly is), then 
> I would strongly encourage you to upgrade to 1.10.0 because at least one of 
> these bugs does cause the flow to appear to stop flowing.
> 
> That being said, there are two other things that you may want to consider:
> 
> 1. You're trying to load balance 100 GB of data spread across 6 files. So 
> each file is nearly 20 GB of data. It may take a little while to push that 
> from Node A to Node B. If the data is queued up, waiting to go to another 
> node, or is on the way to another node, it will not be shown in the FlowFile 
> listing. That will only show FlowFiles that are queued up to be processed on 
> the node they currently live on.
> 
> 2. You should not be using a load balanced connection between FetchSFTP and 
> CompressContent. The way that these processors are designed, the listing 
> should be performed, and then the connection between ListSFTP and FetchSFTP 
> should be load balanced. Once that has happened, the listing has been 
> federated across the cluster, so whichever node receives the listing for File 
> A should be responsible for fetching and processing it. Since the listing has 
> already been spread across the cluster, there is no benefit to fetching the 
> data, and then re-spreading it across the cluster. This will be very 
> expensive with little to no benefit. Similarly, you don't want to load 
> balance between CompressContent and PutHDFS. Simply load balance the listing 
> itself (which is very cheap because the FlowFiles have no content) and the 
> data will automatically be balanced across the cluster.
> 
> Thanks
> -Mark
> 
> 
> > On Dec 3, 2019, at 9:18 AM, nayan sharma  wrote:
> > 
> > Hi,
> > Thanks for your reply.
> > Please find the attachment. Flow files have been stuck for the last 7 
> > days, and when listing flow files it says the queue has no FlowFiles.
> > Let me know your thoughts.
> > 
> > Thanks & Regards,
> > Nayan Sharma
> >  +91-8095382952
> > 
> >   
> > 
> > 
> > On Tue, Dec 3, 2019 at 7:34 PM Bryan Bende wrote:
> > Hello,
> > 
> > It would be helpful if you could upload a screenshot of your flow
> > somewhere and send a link.
> > 
> > Thanks,
> > 
> > Bryan
> > 
> > On Tue, Dec 3, 2019 at 6:06 AM nayan sharma wrote:
> > >
> > > Hi,
> > > I am using a 2-node cluster.
> > > Node config: 48 GB max heap, 64-core machines.
> > > Processor flow
> > > ListSFTP--->FetchSFTP(all nodes with 10 threads)--->CompressContent(all 
> > > nodes,10 threads)-->PutHDFS
> > >
> > > The queue shows 96 GB queued, but when I list the queue it shows no 
> > > FlowFiles.
> > >
> > > Everything seems stuck, nothing is moving.
> > >
> > > I am wondering what I am doing wrong, or which config parameter is off, 
> > > even with such powerful machines.
> > >
> > > I couldn't find a solution by myself, so I am reaching out here. Any 
> > > help or suggestion would be highly appreciated.
> > >
> > > Thanks,
> > > Nayan
> > 
> 
> 


Re: NIFI 1.9.2 stuck in cluster mode

2019-12-03 Thread Mark Payne
Nayan,

Looking at the screenshot, I can see two different connections there that are 
load balanced. One of them holds the nearly 100 GB of data.

There are a handful of bugs related to load-balanced connections in 1.9.2 that 
were addressed in 1.10.0. If you're relying on load-balanced connections to 
spread data across the cluster (and this particular flow clearly is), then I 
would strongly encourage you to upgrade to 1.10.0 because at least one of these 
bugs does cause the flow to appear to stop flowing.

That being said, there are two other things that you may want to consider:

1. You're trying to load balance 100 GB of data spread across 6 files. So each 
file is nearly 20 GB of data. It may take a little while to push that from Node 
A to Node B. If the data is queued up, waiting to go to another node, or is on 
the way to another node, it will not be shown in the FlowFile listing. That 
will only show FlowFiles that are queued up to be processed on the node they 
currently live on.

2. You should not be using a load balanced connection between FetchSFTP and 
CompressContent. The way that these processors are designed, the listing should 
be performed, and then the connection between ListSFTP and FetchSFTP should be 
load balanced. Once that has happened, the listing has been federated across 
the cluster, so whichever node receives the listing for File A should be 
responsible for fetching and processing it. Since the listing has already been 
spread across the cluster, there is no benefit to fetching the data, and then 
re-spreading it across the cluster. This will be very expensive with little to 
no benefit. Similarly, you don't want to load balance between CompressContent 
and PutHDFS. Simply load balance the listing itself (which is very cheap 
because the FlowFiles have no content) and the data will automatically be 
balanced across the cluster.
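
To put that in flow terms, here is a sketch of the recommended topology, in 
the same notation as your flow description (assuming ListSFTP is scheduled on 
the Primary Node only, which is the usual pattern for List* processors in a 
cluster; the thread counts are carried over from your setup):

    ListSFTP (Primary Node only)
      --[ Load Balance Strategy: Round Robin ]-->
    FetchSFTP (All Nodes, 10 threads)
      --[ no load balancing ]-->
    CompressContent (All Nodes, 10 threads)
      --[ no load balancing ]-->
    PutHDFS (All Nodes)

With this layout, only the zero-content listing FlowFiles cross the network 
between nodes; each 20 GB file is fetched, compressed, and written to HDFS on 
whichever node received its listing entry.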

Thanks
-Mark


> On Dec 3, 2019, at 9:18 AM, nayan sharma  wrote:
> 
> Hi,
> Thanks for your reply.
> Please find the attachment. Flow files have been stuck for the last 7 days, 
> and when listing flow files it says the queue has no FlowFiles.
> Let me know your thoughts.
> 
> Thanks & Regards,
> Nayan Sharma
>  +91-8095382952
> 
>   
> 
> 
> On Tue, Dec 3, 2019 at 7:34 PM Bryan Bende wrote:
> Hello,
> 
> It would be helpful if you could upload a screenshot of your flow
> somewhere and send a link.
> 
> Thanks,
> 
> Bryan
> 
> On Tue, Dec 3, 2019 at 6:06 AM nayan sharma wrote:
> >
> > Hi,
> > I am using a 2-node cluster.
> > Node config: 48 GB max heap, 64-core machines.
> > Processor flow
> > ListSFTP--->FetchSFTP(all nodes with 10 threads)--->CompressContent(all 
> > nodes,10 threads)-->PutHDFS
> >
> > The queue shows 96 GB queued, but when I list the queue it shows no 
> > FlowFiles.
> >
> > Everything seems stuck, nothing is moving.
> >
> > I am wondering what I am doing wrong, or which config parameter is off, 
> > even with such powerful machines.
> >
> > I couldn't find a solution by myself, so I am reaching out here. Any help 
> > or suggestion would be highly appreciated.
> >
> > Thanks,
> > Nayan
> 



Re: Sanity check on a use case

2019-12-03 Thread Mark Payne
Agreed. It is a bit unclear whether you're looking to partition by the date 
field or sort by the date field, or both. If you want to partition, you'd use 
PartitionRecord. If you want to sort, we do not have a SortRecord processor. 
However, we do have QueryRecord, which can easily sort the data using the 
query:

SELECT *
FROM FLOWFILE
ORDER BY date

Assuming that 'date' is the name of the field that you want to order by.
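
For example, a hypothetical QueryRecord configuration might look like this 
(the reader and writer services are assumptions, so pick whatever matches 
your data format; note that the dynamic property name, here 'sorted', becomes 
an outbound relationship):

    Record Reader : JsonTreeReader
    Record Writer : JsonRecordSetWriter
    sorted        : SELECT * FROM FLOWFILE ORDER BY date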

Hope this helps!
-Mark

> On Dec 3, 2019, at 9:12 AM, Joe Witt  wrote:
> 
> I read/replied too fast - if you mean that you want them together but sorted 
> by date then it makes sense we'd offer a SortRecord processor.  If you wanted 
> to simply group them by month then PartitionRecord should do the trick.
> 
> On Tue, Dec 3, 2019 at 8:10 AM Joe Witt wrote:
> Sounds like a perfect use of PartitionRecord.  And if you wanted larger 
> bundles of such things you could then follow it with MergeRecord correlated 
> on that same partitioned value.
> 
> Thanks
> 
> On Tue, Dec 3, 2019 at 8:09 AM Mike Thomsen wrote:
> We need to be able to split a record set by examining a date field and 
> sorting the messages by month into new record sets. The reason is that 
> they're going to be fed to an Elastic cluster that uses an index template to 
> build new indexes based on a date convention. We have a simple solution for 
> now that matches our volume, but I'd like to know if there is a better way to 
> do this out of the box than an ExecuteScript and if there might be others 
> who'd benefit from a broader solution.
> 
> Thanks,
> 
> Mike



Re: Sanity check on a use case

2019-12-03 Thread Bryan Bende
Sounds like PartitionRecord by month.

On Tue, Dec 3, 2019 at 9:12 AM Joe Witt  wrote:
>
> I read/replied too fast - if you mean that you want them together but sorted 
> by date then it makes sense we'd offer a SortRecord processor.  If you wanted 
> to simply group them by month then PartitionRecord should do the trick.
>
> On Tue, Dec 3, 2019 at 8:10 AM Joe Witt  wrote:
>>
>> Sounds like a perfect use of PartitionRecord.  And if you wanted larger 
>> bundles of such things you could then follow it with MergeRecord correlated 
>> on that same partitioned value.
>>
>> Thanks
>>
>> On Tue, Dec 3, 2019 at 8:09 AM Mike Thomsen  wrote:
>>>
>>> We need to be able to split a record set by examining a date field and 
>>> sorting the messages by month into new record sets. The reason is that 
>>> they're going to be fed to an Elastic cluster that uses an index template 
>>> to build new indexes based on a date convention. We have a simple solution 
>>> for now that matches our volume, but I'd like to know if there is a better 
>>> way to do this out of the box than an ExecuteScript and if there might be 
>>> others who'd benefit from a broader solution.
>>>
>>> Thanks,
>>>
>>> Mike


Re: Sanity check on a use case

2019-12-03 Thread Joe Witt
I read/replied too fast - if you mean that you want them together but
sorted by date then it makes sense we'd offer a SortRecord processor.  If
you wanted to simply group them by month then PartitionRecord should do the
trick.

On Tue, Dec 3, 2019 at 8:10 AM Joe Witt  wrote:

> Sounds like a perfect use of PartitionRecord.  And if you wanted larger
> bundles of such things you could then follow it with MergeRecord correlated
> on that same partitioned value.
>
> Thanks
>
> On Tue, Dec 3, 2019 at 8:09 AM Mike Thomsen wrote:
>
>> We need to be able to split a record set by examining a date field and
>> sorting the messages by month into new record sets. The reason is that
>> they're going to be fed to an Elastic cluster that uses an index template
>> to build new indexes based on a date convention. We have a simple solution
>> for now that matches our volume, but I'd like to know if there is a better
>> way to do this out of the box than an ExecuteScript and if there might be
>> others who'd benefit from a broader solution.
>>
>> Thanks,
>>
>> Mike
>>
>


Re: Sanity check on a use case

2019-12-03 Thread Joe Witt
Sounds like a perfect use of PartitionRecord.  And if you wanted larger
bundles of such things you could then follow it with MergeRecord correlated
on that same partitioned value.
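
For instance, a sketch of that pairing (assuming the records carry an actual 
date field named 'date' -- a string field would need toDate() first -- and 
assuming the RecordPath format() function is available in your NiFi version; 
the property name 'month' is just illustrative):

    PartitionRecord dynamic property:
        month = format( /date, 'yyyy-MM' )

    MergeRecord:
        Merge Strategy             : Bin-Packing Algorithm
        Correlation Attribute Name : month

Each FlowFile coming out of PartitionRecord carries a 'month' attribute, so a 
downstream Elasticsearch processor could reference it in its index name via 
Expression Language, e.g. events-${month} (hypothetical index naming).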

Thanks

On Tue, Dec 3, 2019 at 8:09 AM Mike Thomsen  wrote:

> We need to be able to split a record set by examining a date field and
> sorting the messages by month into new record sets. The reason is that
> they're going to be fed to an Elastic cluster that uses an index template
> to build new indexes based on a date convention. We have a simple solution
> for now that matches our volume, but I'd like to know if there is a better
> way to do this out of the box than an ExecuteScript and if there might be
> others who'd benefit from a broader solution.
>
> Thanks,
>
> Mike
>


Sanity check on a use case

2019-12-03 Thread Mike Thomsen
We need to be able to split a record set by examining a date field and
sorting the messages by month into new record sets. The reason is that
they're going to be fed to an Elastic cluster that uses an index template
to build new indexes based on a date convention. We have a simple solution
for now that matches our volume, but I'd like to know if there is a better
way to do this out of the box than an ExecuteScript and if there might be
others who'd benefit from a broader solution.
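
For context, the index template side looks roughly like this (a hypothetical 
legacy index template; all names here are made up):

    PUT _template/events
    {
      "index_patterns": [ "events-*" ]
    }

so records from November 2019 would land in an index such as events-2019-11, 
created on first write.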

Thanks,

Mike


Re: NIFI 1.9.2 stuck in cluster mode

2019-12-03 Thread Bryan Bende
Hello,

It would be helpful if you could upload a screenshot of your flow
somewhere and send a link.

Thanks,

Bryan

On Tue, Dec 3, 2019 at 6:06 AM nayan sharma  wrote:
>
> Hi,
> I am using a 2-node cluster.
> Node config: 48 GB max heap, 64-core machines.
> Processor flow
> ListSFTP--->FetchSFTP(all nodes with 10 threads)--->CompressContent(all 
> nodes,10 threads)-->PutHDFS
>
> The queue shows 96 GB queued, but when I list the queue it shows no 
> FlowFiles.
>
> Everything seems stuck, nothing is moving.
>
> I am wondering what I am doing wrong, or which config parameter is off, even 
> with such powerful machines.
>
> I couldn't find a solution by myself, so I am reaching out here. Any help or 
> suggestion would be highly appreciated.
>
> Thanks,
> Nayan


NIFI 1.9.2 stuck in cluster mode

2019-12-03 Thread nayan sharma
Hi,
I am using a 2-node cluster.
Node config: 48 GB max heap, 64-core machines.
Processor flow
ListSFTP--->FetchSFTP(all nodes with 10 threads)--->CompressContent(all 
nodes,10 threads)-->PutHDFS

The queue shows 96 GB queued, but when I list the queue it shows no FlowFiles.

Everything seems stuck, nothing is moving.

I am wondering what I am doing wrong, or which config parameter is off, even 
with such powerful machines.

I couldn't find a solution by myself, so I am reaching out here. Any help or 
suggestion would be highly appreciated.

Thanks,
Nayan