Re: Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

2018-02-12 Thread Martini, Adam
Hi Koji and all others who responded,

Thanks for getting this PR out so quickly.  Your original response chain was 
sent to the CC address (Maxwell Eng) so I only now got this email.

I made one comment on your PR, but otherwise it was the same fix I made to the
AMI we are now testing, with great performance.

Thanks again!

Adam

On 2/12/18, 2:16 PM, "Eng, Maxwell"  wrote:



On 2/9/18, 7:29 PM, "Koji Kawamura"  wrote:

Hi,

The PR is ready for review. I confirmed that the performance issue is
addressed.

https://github.com/apache/nifi/pull/2464

I was also testing whether
nifi-hbase_1_1_2-client-service-nar-1.6.0-SNAPSHOT.nar can be used in
a NiFi 1.5.0 environment. Unfortunately, it doesn't seem we can drop it
in as is.
A validation error occurs saying, 'HBase_1_1_2_ClientService
-1.6.0-SNAPSHOT from org.apache.nifi -
nifi-hbase_1_1_2-client-service-nar is not compatible with
HBaseClientService -1.5.0 from org.apache.nifi -
nifi-standard-services-api-nar'.
It looks like nifi-standard-services needs to be updated, too, but I
think that's a bit risky, as it may affect other services.

So, I've written a Gist to work around this, using
nifi-hbase_1_1_2-client-service-nar-1.5.0_nifi-4866.nar, built from the
1.5.0 release commit with the performance fix cherry-picked.

https://gist.github.com/ijokarumawak/85db60ca71f1825f543c18c62bf7c3fd

Thanks,
Koji



On Sat, Feb 10, 2018 at 10:37 AM, Koji Kawamura  wrote:
> Hi Adam,
>
> Thank you very much for reporting the performance issue.
> I created NIFI-4866 and started fixing the issue by moving the
> problematic code block to createConnection.
> After confirming that this addresses the performance issue, I will send a PR to
> get it merged.
>
> Koji
>
>
> On Sat, Feb 10, 2018 at 9:25 AM, Joe Witt  wrote:
>> adam
>>
>> you should also be able to put the old hbase nar in and switch to that
>> version.
>>
>> we now support multiple versions of the same component.
>>
>> thanks
>>
>> On Feb 9, 2018 7:10 PM, "Mike Thomsen"  wrote:
>>
>>> Adam,
>>>
>>> If you're doing bulk ingestion of JSON, I would recommend using
>>> PutHBaseRecord. I wrote it/contributed it when my team ran into similar
>>> limitations doing genomic data ingestion (several 10s of billions of Puts
>>> from the 1000 genomes project). If you run into problems with it, just post
>>> them and poke me.
>>>
>>> Mike
>>>
>>> On Fri, Feb 9, 2018 at 6:56 PM, Joe Witt  wrote:
>>>
>>> > adam
>>> >
>>> > thanks for reporting and if you can do a contrib that would be great!
>>> >
>>> > thanks
>>> > joe
>>> >
>>> > On Feb 9, 2018 6:56 PM, "Martini, Adam"  wrote:
>>> >
>>> > > Hello NiFi Dev Community,
>>> > >
>>> > > This commit hash (part of the NiFi 1.5.0 release) created serious
>>> > > performance issues for HBase Put operations:
>>> > > "116c8463428c1fb51bfb7a8adfcf23c32fded964".
>>> > >
>>> > > The override of the “toTransitUri” method makes a call to
>>> > > “connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()”
>>> > > upon every flow file transfer, which essentially doubles the traffic
>>> > > through the HBase connector.  The performance of our PutHBaseJSON processor
>>> > > dropped to 1/3 after deploying NiFi 1.5.0.
>>> > >
>>> > > Please let us know a timeline for a fix.  We are building and testing our
>>> > > own tar ball in the interim to fix the issue and are happy to contribute
>>> > > our code back to the project if you would like.
>>> > >
>>> > > All the best and thank you.
>>> > >
>>> > > Adam Martini
>>> > > Senior Developer, Nike Digital
>>> > >
>>> > >
>>> > >
>>> >
>>>






Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

2018-02-09 Thread Koji Kawamura
Hi,

The PR is ready for review. I confirmed that the performance issue is addressed.
https://github.com/apache/nifi/pull/2464

I was also testing whether
nifi-hbase_1_1_2-client-service-nar-1.6.0-SNAPSHOT.nar can be used in
a NiFi 1.5.0 environment. Unfortunately, it doesn't seem we can drop it
in as is.
A validation error occurs saying, 'HBase_1_1_2_ClientService
-1.6.0-SNAPSHOT from org.apache.nifi -
nifi-hbase_1_1_2-client-service-nar is not compatible with
HBaseClientService -1.5.0 from org.apache.nifi -
nifi-standard-services-api-nar'.
It looks like nifi-standard-services needs to be updated, too, but I
think that's a bit risky, as it may affect other services.

So, I've written a Gist to work around this, using
nifi-hbase_1_1_2-client-service-nar-1.5.0_nifi-4866.nar, built from the
1.5.0 release commit with the performance fix cherry-picked.
https://gist.github.com/ijokarumawak/85db60ca71f1825f543c18c62bf7c3fd

Thanks,
Koji



On Sat, Feb 10, 2018 at 10:37 AM, Koji Kawamura  wrote:
> Hi Adam,
>
> Thank you very much for reporting the performance issue.
> I created NIFI-4866 and started fixing the issue by moving the
> problematic code block to createConnection.
> After confirming that this addresses the performance issue, I will send a PR to
> get it merged.
>
> Koji
>
>
> On Sat, Feb 10, 2018 at 9:25 AM, Joe Witt  wrote:
>> adam
>>
>> you should also be able to put the old hbase nar in and switch to that
>> version.
>>
>> we now support multiple versions of the same component.
>>
>> thanks
>>
>> On Feb 9, 2018 7:10 PM, "Mike Thomsen"  wrote:
>>
>>> Adam,
>>>
>>> If you're doing bulk ingestion of JSON, I would recommend using
>>> PutHBaseRecord. I wrote it/contributed it when my team ran into similar
>>> limitations doing genomic data ingestion (several 10s of billions of Puts
>>> from the 1000 genomes project). If you run into problems with it, just post
>>> them and poke me.
>>>
>>> Mike
>>>
>>> On Fri, Feb 9, 2018 at 6:56 PM, Joe Witt  wrote:
>>>
>>> > adam
>>> >
>>> > thanks for reporting and if you can do a contrib that would be great!
>>> >
>>> > thanks
>>> > joe
>>> >
>>> > On Feb 9, 2018 6:56 PM, "Martini, Adam"  wrote:
>>> >
>>> > > Hello NiFi Dev Community,
>>> > >
>>> > > This commit hash (part of the NiFi 1.5.0 release) created serious
>>> > > performance issues for HBase Put operations:
>>> > > "116c8463428c1fb51bfb7a8adfcf23c32fded964".
>>> > >
>>> > > The override of the “toTransitUri” method makes a call to
>>> > > “connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()”
>>> > > upon every flow file transfer, which essentially doubles the traffic
>>> > > through the HBase connector.  The performance of our PutHBaseJSON processor
>>> > > dropped to 1/3 after deploying NiFi 1.5.0.
>>> > >
>>> > > Please let us know a timeline for a fix.  We are building and testing our
>>> > > own tar ball in the interim to fix the issue and are happy to contribute
>>> > > our code back to the project if you would like.
>>> > >
>>> > > All the best and thank you.
>>> > >
>>> > > Adam Martini
>>> > > Senior Developer, Nike Digital
>>> > >
>>> > >
>>> > >
>>> >
>>>


Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

2018-02-09 Thread Koji Kawamura
Hi Adam,

Thank you very much for reporting the performance issue.
I created NIFI-4866 and started fixing the issue by moving the
problematic code block to createConnection.
After confirming that this addresses the performance issue, I will send a PR to
get it merged.
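
To make the idea concrete, here is a minimal, self-contained sketch of the
change described above: resolve the master address once when the connection
is created, instead of on every toTransitUri call. This is not the code from
the PR. The class names (TransitUriSketch, CountingMasterLookup,
PerCallService, CachedService), the Supplier standing in for the
connection.getAdmin().getClusterStatus().getMaster().getHostAndPort() chain,
the example host name, and the URI layout are all assumptions made for
illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class TransitUriSketch {

    /** Hypothetical stand-in for the remote master lookup; counts its invocations. */
    static final class CountingMasterLookup implements Supplier<String> {
        private final AtomicInteger calls = new AtomicInteger();
        private final String hostAndPort;

        CountingMasterLookup(String hostAndPort) {
            this.hostAndPort = hostAndPort;
        }

        @Override
        public String get() {
            // In a real service this would be a round trip to the cluster.
            calls.incrementAndGet();
            return hostAndPort;
        }

        int calls() {
            return calls.get();
        }
    }

    /** Assumed URI layout, used only to give both variants the same output. */
    static String buildUri(String master, String table, String rowKey) {
        return "hbase://" + master + "/" + table + "/" + rowKey;
    }

    /** Reported pattern: the lookup runs on every toTransitUri call. */
    static final class PerCallService {
        private final Supplier<String> masterLookup;

        PerCallService(Supplier<String> masterLookup) {
            this.masterLookup = masterLookup;
        }

        String toTransitUri(String table, String rowKey) {
            return buildUri(masterLookup.get(), table, rowKey);
        }
    }

    /** Described fix: look the address up once in createConnection and reuse it. */
    static final class CachedService {
        private final Supplier<String> masterLookup;
        private volatile String masterAddress;

        CachedService(Supplier<String> masterLookup) {
            this.masterLookup = masterLookup;
        }

        void createConnection() {
            masterAddress = masterLookup.get();
        }

        String toTransitUri(String table, String rowKey) {
            if (masterAddress == null) {
                throw new IllegalStateException("createConnection() has not been called");
            }
            return buildUri(masterAddress, table, rowKey);
        }
    }

    public static void main(String[] args) {
        CountingMasterLookup perCallLookup = new CountingMasterLookup("master.example.com:16000");
        CountingMasterLookup cachedLookup = new CountingMasterLookup("master.example.com:16000");
        PerCallService perCall = new PerCallService(perCallLookup);
        CachedService cached = new CachedService(cachedLookup);
        cached.createConnection();

        // Simulate 1000 flow file transfers through each variant.
        for (int i = 0; i < 1000; i++) {
            perCall.toTransitUri("events", "row-" + i);
            cached.toTransitUri("events", "row-" + i);
        }
        System.out.println("per-call lookups: " + perCallLookup.calls());
        System.out.println("cached lookups:   " + cachedLookup.calls());
    }
}
```

The trade-off in a sketch like this is that the cached address goes stale if
the master changes while the connection is open; whether that matters for a
provenance transit URI is a judgment call for the real fix.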

Koji


On Sat, Feb 10, 2018 at 9:25 AM, Joe Witt  wrote:
> adam
>
> you should also be able to put the old hbase nar in and switch to that
> version.
>
> we now support multiple versions of the same component.
>
> thanks
>
> On Feb 9, 2018 7:10 PM, "Mike Thomsen"  wrote:
>
>> Adam,
>>
>> If you're doing bulk ingestion of JSON, I would recommend using
>> PutHBaseRecord. I wrote it/contributed it when my team ran into similar
>> limitations doing genomic data ingestion (several 10s of billions of Puts
>> from the 1000 genomes project). If you run into problems with it, just post
>> them and poke me.
>>
>> Mike
>>
>> On Fri, Feb 9, 2018 at 6:56 PM, Joe Witt  wrote:
>>
>> > adam
>> >
>> > thanks for reporting and if you can do a contrib that would be great!
>> >
>> > thanks
>> > joe
>> >
>> > On Feb 9, 2018 6:56 PM, "Martini, Adam"  wrote:
>> >
>> > > Hello NiFi Dev Community,
>> > >
>> > > This commit hash (part of the NiFi 1.5.0 release) created serious
>> > > performance issues for HBase Put operations:
>> > > "116c8463428c1fb51bfb7a8adfcf23c32fded964".
>> > >
>> > > The override of the “toTransitUri” method makes a call to
>> > > “connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()”
>> > > upon every flow file transfer, which essentially doubles the traffic
>> > > through the HBase connector.  The performance of our PutHBaseJSON processor
>> > > dropped to 1/3 after deploying NiFi 1.5.0.
>> > >
>> > > Please let us know a timeline for a fix.  We are building and testing our
>> > > own tar ball in the interim to fix the issue and are happy to contribute
>> > > our code back to the project if you would like.
>> > >
>> > > All the best and thank you.
>> > >
>> > > Adam Martini
>> > > Senior Developer, Nike Digital
>> > >
>> > >
>> > >
>> >
>>


Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

2018-02-09 Thread Joe Witt
adam

you should also be able to put the old hbase nar in and switch to that
version.

we now support multiple versions of the same component.

thanks

On Feb 9, 2018 7:10 PM, "Mike Thomsen"  wrote:

> Adam,
>
> If you're doing bulk ingestion of JSON, I would recommend using
> PutHBaseRecord. I wrote it/contributed it when my team ran into similar
> limitations doing genomic data ingestion (several 10s of billions of Puts
> from the 1000 genomes project). If you run into problems with it, just post
> them and poke me.
>
> Mike
>
> On Fri, Feb 9, 2018 at 6:56 PM, Joe Witt  wrote:
>
> > adam
> >
> > thanks for reporting and if you can do a contrib that would be great!
> >
> > thanks
> > joe
> >
> > On Feb 9, 2018 6:56 PM, "Martini, Adam"  wrote:
> >
> > > Hello NiFi Dev Community,
> > >
> > > > This commit hash (part of the NiFi 1.5.0 release) created serious
> > > > performance issues for HBase Put operations:
> > > > "116c8463428c1fb51bfb7a8adfcf23c32fded964".
> > > >
> > > > The override of the “toTransitUri” method makes a call to
> > > > “connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()”
> > > > upon every flow file transfer, which essentially doubles the traffic
> > > > through the HBase connector.  The performance of our PutHBaseJSON processor
> > > > dropped to 1/3 after deploying NiFi 1.5.0.
> > > >
> > > > Please let us know a timeline for a fix.  We are building and testing our
> > > > own tar ball in the interim to fix the issue and are happy to contribute
> > > > our code back to the project if you would like.
> > >
> > > All the best and thank you.
> > >
> > > Adam Martini
> > > Senior Developer, Nike Digital
> > >
> > >
> > >
> >
>


Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

2018-02-09 Thread Mike Thomsen
Adam,

If you're doing bulk ingestion of JSON, I would recommend using
PutHBaseRecord. I wrote it/contributed it when my team ran into similar
limitations doing genomic data ingestion (several 10s of billions of Puts
from the 1000 genomes project). If you run into problems with it, just post
them and poke me.

Mike

On Fri, Feb 9, 2018 at 6:56 PM, Joe Witt  wrote:

> adam
>
> thanks for reporting and if you can do a contrib that would be great!
>
> thanks
> joe
>
> On Feb 9, 2018 6:56 PM, "Martini, Adam"  wrote:
>
> > Hello NiFi Dev Community,
> >
> > This commit hash (part of the NiFi 1.5.0 release) created serious
> > performance issues for HBase Put operations:
> > "116c8463428c1fb51bfb7a8adfcf23c32fded964".
> >
> > The override of the “toTransitUri” method makes a call to
> > “connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()”
> > upon every flow file transfer, which essentially doubles the traffic
> > through the HBase connector.  The performance of our PutHBaseJSON processor
> > dropped to 1/3 after deploying NiFi 1.5.0.
> >
> > Please let us know a timeline for a fix.  We are building and testing our
> > own tar ball in the interim to fix the issue and are happy to contribute
> > our code back to the project if you would like.
> >
> > All the best and thank you.
> >
> > Adam Martini
> > Senior Developer, Nike Digital
> >
> >
> >
>


Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

2018-02-09 Thread Joe Witt
adam

thanks for reporting and if you can do a contrib that would be great!

thanks
joe

On Feb 9, 2018 6:56 PM, "Martini, Adam"  wrote:

> Hello NiFi Dev Community,
>
> This commit hash (part of the NiFi 1.5.0 release) created serious
> performance issues for HBase Put operations:
> "116c8463428c1fb51bfb7a8adfcf23c32fded964".
>
> The override of the “toTransitUri” method makes a call to
> “connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()”
> upon every flow file transfer, which essentially doubles the traffic
> through the HBase connector.  The performance of our PutHBaseJSON processor
> dropped to 1/3 after deploying NiFi 1.5.0.
>
> Please let us know a timeline for a fix.  We are building and testing our
> own tar ball in the interim to fix the issue and are happy to contribute
> our code back to the project if you would like.
>
> All the best and thank you.
>
> Adam Martini
> Senior Developer, Nike Digital
>
>
>


NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

2018-02-09 Thread Martini, Adam
Hello NiFi Dev Community,

This commit hash (part of the NiFi 1.5.0 release) created serious performance 
issues for HBase Put operations: "116c8463428c1fb51bfb7a8adfcf23c32fded964".

The override of the “toTransitUri” method makes a call to 
“connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()” upon 
every flow file transfer, which essentially doubles the traffic through the 
HBase connector.  The performance of our PutHBaseJSON processor dropped to 1/3 
after deploying NiFi 1.5.0.

Please let us know a timeline for a fix.  We are building and testing our own 
tar ball in the interim to fix the issue and are happy to contribute our code 
back to the project if you would like.

All the best and thank you.

Adam Martini
Senior Developer, Nike Digital