[ANNOUNCE] Apache Phoenix 5.0.0 released

2018-07-13 Thread rajeshb...@apache.org
The Apache Phoenix team is pleased to announce the release of its next major
version, 5.0.0, compatible with HBase 2.0+. Apache Phoenix enables SQL-based
OLTP and operational analytics for Apache Hadoop, using Apache HBase as its
backing store and providing integration with other projects in the Apache
ecosystem such as Spark, Hive, Pig, Flume, and MapReduce.

The 5.0.0 release has feature parity with the recently released 4.14.0.
Highlights of the release include:

* Cleaned up deprecated APIs and moved to newer, more performant APIs
* Refactored coprocessor implementations to use the new Coprocessor and
Observer APIs in HBase 2.0
* Hive and Spark integration now works with the latest versions of Hive
(3.0.0) and Spark (2.3.0), respectively

For more details, visit our blog here [1] and download source and binaries
here [2].

Thanks,
Rajeshbabu (on behalf of the Apache Phoenix team)

[1]
https://blogs.apache.org/phoenix/entry/apache-phoenix-releases-next-major
[2] http://phoenix.apache.org/download.html


Re: Upsert is EXTREMELY slow

2018-07-13 Thread alchemist
Thanks so much for your response.

Now I am getting better performance, i.e. 15K per minute, after making two changes.
I disabled Phoenix transactions:

<property>
  <name>phoenix.transactions.enabled</name>
  <value>false</value>
</property>

And I removed the connection.commit() call for the transaction. Logically this
should not make any difference, because transactions are disabled by default.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Upsert is EXTREMELY slow

2018-07-13 Thread Josh Elser
Sorry, I was brief and didn't get my point across. I meant to say the 
same thing you did.


Someone manually submitting two updates to an index is naively faster 
than what Phoenix goes through to do this automatically (and safely).


On 7/13/18 12:07 PM, James Taylor wrote:
Phoenix won't be slower to update secondary indexes than a hand-coded use 
case would be. Both have to write to a second table to keep it in sync.





Re: Upsert is EXTREMELY slow

2018-07-13 Thread James Taylor
Phoenix won't be slower to update secondary indexes than a hand-coded use case
would be. Both have to write to a second table to keep it in sync.

On Fri, Jul 13, 2018 at 8:39 AM Josh Elser wrote:

> Also, they're relying on Phoenix to do secondary index updates for them.
>
> Obviously, you can do this faster than Phoenix can if you know the exact
> use-case.
>


Re: Upsert is EXTREMELY slow

2018-07-13 Thread Josh Elser

Also, they're relying on Phoenix to do secondary index updates for them.

Obviously, you can do this faster than Phoenix can if you know the exact 
use-case.


On 7/12/18 6:31 PM, Pedro Boado wrote:
A tip for performance: reuse the same PreparedStatement; just 
clearParameters(), set values, and executeUpdate() over and over again. 
Don't close the statement or connection after each upsert. Also, I 
haven't seen any noticeable benefit from using JDBC batches, as Phoenix 
controls batching by when commit() is called.


Keep an eye on not calling commit() after every executeUpdate() (that's a 
real performance killer). Batch commits every ~1k upserts.


Also, that attempt at asynchronous code is probably another performance 
killer. Are you creating a new Runnable per database write and opening 
and closing DB connections per write? Just spawn a few threads (5 to 10; 
if client CPU is not maxed, keep increasing it) and send upserts in a for 
loop, reusing the PreparedStatement and connections.


With a cluster that size I would expect to see tens of thousands of 
writes per second.


Finally, have you checked that all RegionServers receive the same traffic?
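As an illustrative sketch of the pattern described above (autocommit off, one reused PreparedStatement, a commit every ~1k upserts, and a final flush at the end) — the table name, columns, and batch size here are assumptions, not values from the thread:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

class BatchUpserter {
    static final int BATCH_SIZE = 1000; // assumed "~1k" batch size

    // Pure helper: commit once every BATCH_SIZE upserts.
    static boolean shouldCommit(long upsertCount) {
        return upsertCount % BATCH_SIZE == 0;
    }

    static void upsertAll(Connection conn, List<String[]> rows) throws Exception {
        conn.setAutoCommit(false); // let Phoenix batch the writes
        long count = 0;
        // One PreparedStatement reused for every row; never closed per upsert.
        try (PreparedStatement ps =
                 conn.prepareStatement("UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
            for (String[] row : rows) {
                ps.clearParameters();
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.executeUpdate();
                if (shouldCommit(++count)) {
                    conn.commit(); // one commit per ~1k upserts, not per row
                }
            }
        }
        conn.commit(); // flush the final partial batch
    }
}
```

The same upsertAll loop could then be run from a handful of threads, each with its own Connection, as suggested above.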

On Thu, 12 Jul 2018, 23:10 Pedro Boado wrote:


I believe it's related to your client code - in our use case we easily do
15k writes/sec in a cluster lower-specced than yours.

Check that your JDBC connection has autocommit off so Phoenix can
batch writes, and that the table has a reasonable UPDATE_CACHE_FREQUENCY
( more than 6 ).
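The table-level setting mentioned above can be applied with DDL like the following; the table name and the 60000 ms interval are illustrative assumptions, not values from the thread:

```sql
ALTER TABLE MY_TABLE SET UPDATE_CACHE_FREQUENCY = 60000;
```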



On Thu, 12 Jul 2018, 21:54 alchemist wrote:

Thanks a lot for your help.
Our test is inserting new rows individually. For our use case, we are
benchmarking whether we can insert 10,000 new rows per minute, using a
cluster of writers if needed. When executing the inserts with the Phoenix
API (UPSERT), we have been able to get up to 6,000 new rows per minute.

We changed our test to perform the inserts individually using the HBase API
(Put) rather than the Phoenix API (UPSERT) and got an improvement of more
than 10x (up to 60,000 rows per minute).

What would explain this difference? I assume that in both cases HBase must
grab the locks individually in the same way.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/