Re: Upsert is EXTREMELY slow

2018-07-13 Thread alchemist
Thanks so much for your response.

Now I am getting better performance, i.e. 15K per minute. I made two changes.
I disabled Phoenix transactions:


<property>
  <name>phoenix.transactions.enabled</name>
  <value>false</value>
</property>

And I removed the connection.commit() call. Logically this should not make
any difference, because transactions are disabled by default.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Upsert is EXTREMELY slow

2018-07-13 Thread Josh Elser
Sorry, I was brief and didn't get my point across. I meant to say the 
same thing you did.


Someone manually submitting two updates to an index is naively faster 
than what Phoenix goes through to automatically (and safely) do this.


On 7/13/18 12:07 PM, James Taylor wrote:
Phoenix won’t be slower to update secondary indexes than a use case 
would be. Both have to do the writes to a second table to keep it in sync.


On Fri, Jul 13, 2018 at 8:39 AM Josh Elser wrote:


Also, they're relying on Phoenix to do secondary index updates for them.

Obviously, you can do this faster than Phoenix can if you know the exact
use-case.

On 7/12/18 6:31 PM, Pedro Boado wrote:
 > A tip for performance is reusing the same preparedStatement , just
 > clearParameters() , set values and executeUpdate() over and over again.
 > Don't close the statement or connections after each upsert. Also, I
 > haven't seen any noticeable benefit on using jdbc batches as Phoenix
 > controls batching by when commit() is called.
 >
 > Keep an eye on not calling commit after every executeUpdate (that's a
 > real performance killer). Batch commits in every ~1k upserts.
 >
 > Also that attempt of asynchronous code is probably another performance
 > killer. Are you creating a new Runnable per database write and opening
 > and closing dB connections per write? Just spawn a few threads (5 to 10,
 > if client cpu is not maxed keep increasing it) and send upserts in a for
 > loop reusing preparedStatement and connections.
 >
 > With a cluster that size I would expect seeing tens of thousands of
 > writes per second.
 >
 > Finally have you checked that all RS receive same traffic?
 >
 > On Thu, 12 Jul 2018, 23:10 Pedro Boado, <pedro.bo...@gmail.com> wrote:
 >
 >     I believe it's related to your client code - In our use case we do
 >     easily 15k writes/sec in a cluster lower specced than yours.
 >
 >     Check that your jdbc connection has autocommit off so Phoenix can
 >     batch writes and that table has a reasonable UPDATE_CACHE_FREQUENCY
 >     ( more than 6 ).
 >
 >     On Thu, 12 Jul 2018, 21:54 alchemist, <alchemistsrivast...@gmail.com> wrote:
 >
 >         Thanks a lot for your help.
 >         Our test is inserting new rows individually. For our use case,
 >         we are benchmarking that we could be able to get 10,000 new rows
 >         in a minute, using a cluster of writers if needed.
 >         When executing the inserts with Phoenix API (UPSERT) we have
 >         been able to get up to 6,000 new rows per minute.
 >
 >         We changed our test to perform the inserts individually using
 >         the HBase API (Put) rather than Phoenix API (UPSERT) and got an
 >         improvement of more than 10x (up to 60,000 rows per minute).
 >
 >         What would explain this difference? I assume that in both cases
 >         HBase must grab the locks individually in the same way.
 >
 >         --
 >         Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
 >



Re: Upsert is EXTREMELY slow

2018-07-13 Thread James Taylor
Phoenix won’t be slower to update secondary indexes than a use case would
be. Both have to do the writes to a second table to keep it in sync.

On Fri, Jul 13, 2018 at 8:39 AM Josh Elser wrote:

> Also, they're relying on Phoenix to do secondary index updates for them.
>
> Obviously, you can do this faster than Phoenix can if you know the exact
> use-case.
>
> On 7/12/18 6:31 PM, Pedro Boado wrote:
> > A tip for performance is reusing the same preparedStatement , just
> > clearParameters() , set values and executeUpdate() over and over again.
> > Don't close the statement or connections after each upsert. Also, I
> > haven't seen any noticeable benefit on using jdbc batches as Phoenix
> > controls batching by when commit() is called.
> >
> > Keep an eye on not calling commit after every executeUpdate (that's a
> > real performance killer) . Batch commits in every ~1k upserts .
> >
> > Also that attempt of asynchronous code is probably another performance
> > killer. Are you creating a new Runnable per database write and opening
> > and closing dB connections per write? Just spawn a few threads (5 to 10,
> > if client cpu is not maxed keep increasing it) and send upserts in a for
> > loop reusing preparedStatement and connections.
> >
> > With a cluster that size I would expect seeing tens of thousands of
> > writes per second.
> >
> > Finally have you checked that all RS receive same traffic ?
> >
> > On Thu, 12 Jul 2018, 23:10 Pedro Boado, <pedro.bo...@gmail.com> wrote:
> >
> > I believe it's related to your client code - In our use case we do
> > easily 15k writes/sec in a cluster lower specced than yours.
> >
> > Check that your jdbc connection has autocommit off so Phoenix can
> > batch writes and that table has a reasonable UPDATE_CACHE_FREQUENCY
> > ( more than 6 ).
> >
> > On Thu, 12 Jul 2018, 21:54 alchemist, <alchemistsrivast...@gmail.com> wrote:
> >
> > Thanks a lot for your help.
> > Our test is inserting new rows individually. For our use case, we are
> > benchmarking that we could be able to get 10,000 new rows in a minute,
> > using a cluster of writers if needed.
> > When executing the inserts with Phoenix API (UPSERT) we have been able
> > to get up to 6,000 new rows per minute.
> >
> > We changed our test to perform the inserts individually using the HBase
> > API (Put) rather than Phoenix API (UPSERT) and got an improvement of
> > more than 10x (up to 60,000 rows per minute).
> >
> > What would explain this difference? I assume that in both cases HBase
> > must grab the locks individually in the same way.
> >
> > --
> > Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
> >
>


Re: Upsert is EXTREMELY slow

2018-07-13 Thread Josh Elser

Also, they're relying on Phoenix to do secondary index updates for them.

Obviously, you can do this faster than Phoenix can if you know the exact 
use-case.


On 7/12/18 6:31 PM, Pedro Boado wrote:
A tip for performance is reusing the same preparedStatement , just 
clearParameters() , set values and executeUpdate() over and over again. 
Don't close the statement or connections after each upsert. Also, I 
haven't seen any noticeable benefit on using jdbc batches as Phoenix 
controls batching by when commit() is called.


Keep an eye on not calling commit after every executeUpdate (that's a 
real performance killer) . Batch commits in every ~1k upserts .


Also that attempt of asynchronous code is probably another performance 
killer. Are you creating a new Runnable per database write and opening 
and closing dB connections per write? Just spawn a few threads (5 to 10, 
if client cpu is not maxed keep increasing it) and send upserts in a for 
loop reusing preparedStatement and connections.


With a cluster that size I would expect seeing tens of thousands of 
writes per second.


Finally have you checked that all RS receive same traffic ?

On Thu, 12 Jul 2018, 23:10 Pedro Boado, <pedro.bo...@gmail.com> wrote:


I believe it's related to your client code - In our use case we do
easily 15k writes/sec in a cluster lower specced than yours.

Check that your jdbc connection has autocommit off so Phoenix can
batch writes and that table has a reasonable UPDATE_CACHE_FREQUENCY 
( more than 6 ).



On Thu, 12 Jul 2018, 21:54 alchemist, <alchemistsrivast...@gmail.com> wrote:

Thanks a lot for your help.
Our test is inserting new rows individually. For our use case, we are
benchmarking that we could be able to get 10,000 new rows in a minute,
using a cluster of writers if needed.
When executing the inserts with Phoenix API (UPSERT) we have been able to
get up to 6,000 new rows per minute.

We changed our test to perform the inserts individually using the HBase API
(Put) rather than Phoenix API (UPSERT) and got an improvement of more than
10x (up to 60,000 rows per minute).

What would explain this difference? I assume that in both cases HBase must
grab the locks individually in the same way.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/



Re: Upsert is EXTREMELY slow

2018-07-12 Thread Pedro Boado
A tip for performance is reusing the same preparedStatement , just
clearParameters() , set values and executeUpdate() over and over again.
Don't close the statement or connections after each upsert. Also, I haven't
seen any noticeable benefit on using jdbc batches as Phoenix controls
batching by when commit() is called.

Keep an eye on not calling commit after every executeUpdate (that's a real
performance killer) . Batch commits in every ~1k upserts .

Also that attempt of asynchronous code is probably another performance
killer. Are you creating a new Runnable per database write and opening and
closing dB connections per write? Just spawn a few threads (5 to 10, if
client cpu is not maxed keep increasing it) and send upserts in a for loop
reusing preparedStatement and connections.

With a cluster that size I would expect seeing tens of thousands of writes
per second.

Finally have you checked that all RS receive same traffic ?
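A minimal sketch of that write loop, assuming a Phoenix JDBC URL and a
hypothetical table MY_TABLE (names and row values are illustrative, not
from this thread):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UpsertLoop implements Runnable {
    private final String connStr;

    UpsertLoop(String connStr) { this.connStr = connStr; }

    public void run() {
        // One connection and one PreparedStatement per thread, reused for
        // every row instead of being recreated per write.
        try (Connection conn = DriverManager.getConnection(connStr)) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO MY_TABLE (PK, COL1) VALUES (?, ?)")) {
                for (int i = 0; i < 100_000; i++) {
                    ps.clearParameters();
                    ps.setString(1, "row-" + i);
                    ps.setString(2, "value-" + i);
                    ps.executeUpdate();
                    if (i % 1000 == 999) {
                        conn.commit();   // commit every ~1k upserts, not per row
                    }
                }
                conn.commit();           // flush the tail of the batch
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}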

On Thu, 12 Jul 2018, 23:10 Pedro Boado, <pedro.bo...@gmail.com> wrote:

> I believe it's related to your client code - In our use case we do easily
> 15k writes/sec in a cluster lower specced than yours.
>
> Check that your jdbc connection has autocommit off so Phoenix can batch
> writes and that table has a reasonable UPDATE_CACHE_FREQUENCY  ( more than
> 6 ).
>
>
> On Thu, 12 Jul 2018, 21:54 alchemist, <alchemistsrivast...@gmail.com> wrote:
>
>> Thanks a lot for your help.
>> Our test is inserting new rows individually. For our use case, we are
>> benchmarking that we could be able to get 10,000 new rows in a minute,
>> using
>> a cluster of writers if needed.
>> When executing the inserts with Phoenix API (UPSERT) we have been able to
>> get up to 6,000 new rows per minute.
>>
>> We changed our test to perform the inserts individually using the HBase
>> API
>> (Put) rather than Phoenix API (UPSERT) and got an improvement of more than
>> 10x (up to 60,000 rows per minute).
>>
>> What would explain this difference? I assume that in both cases HBase must
>> grab the locks individually in the same way.
>>
>>
>>
>> --
>> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>>
>


Re: Upsert is EXTREMELY slow

2018-07-12 Thread Pedro Boado
I believe it's related to your client code - In our use case we do easily
15k writes/sec in a cluster lower specced than yours.

Check that your jdbc connection has autocommit off so Phoenix can batch
writes and that table has a reasonable UPDATE_CACHE_FREQUENCY ( more than 6 ).
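A sketch of applying both of those settings from the JDBC client. The URL
matches the one used elsewhere in this thread; the 60000 ms frequency is an
assumed example value, not one taken from this conversation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class WriteSettings {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase")) {
            // Let Phoenix buffer mutations until commit() instead of
            // flushing after every statement.
            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement()) {
                // Cache table metadata client-side so each UPSERT does not
                // re-read SYSTEM.CATALOG; 60000 ms is an assumed value.
                stmt.execute("ALTER TABLE VBQL_PHOENIX_TRANSCRIPT "
                           + "SET UPDATE_CACHE_FREQUENCY = 60000");
            }
        }
    }
}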


On Thu, 12 Jul 2018, 21:54 alchemist, <alchemistsrivast...@gmail.com> wrote:

> Thanks a lot for your help.
> Our test is inserting new rows individually. For our use case, we are
> benchmarking that we could be able to get 10,000 new rows in a minute,
> using
> a cluster of writers if needed.
> When executing the inserts with Phoenix API (UPSERT) we have been able to
> get up to 6,000 new rows per minute.
>
> We changed our test to perform the inserts individually using the HBase API
> (Put) rather than Phoenix API (UPSERT) and got an improvement of more than
> 10x (up to 60,000 rows per minute).
>
> What would explain this difference? I assume that in both cases HBase must
> grab the locks individually in the same way.
>
>
>
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>


Re: Upsert is EXTREMELY slow

2018-07-12 Thread alchemist
Thanks a lot for your help.
Our test is inserting new rows individually. For our use case, we are
benchmarking that we could be able to get 10,000 new rows in a minute, using
a cluster of writers if needed.
When executing the inserts with Phoenix API (UPSERT) we have been able to
get up to 6,000 new rows per minute.

We changed our test to perform the inserts individually using the HBase API
(Put) rather than Phoenix API (UPSERT) and got an improvement of more than
10x (up to 60,000 rows per minute).

What would explain this difference? I assume that in both cases HBase must
grab the locks individually in the same way.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Upsert is EXTREMELY slow

2018-07-12 Thread Josh Elser
HBase must grab a lock for the row which is being updated. Normally, for 
a batch of updates sent to a region server, the RS will grab as many row 
locks as it can at once. If you only send one row to update at a time, 
you obviously get no amortization.


It's just the normal semantics of batching which you are completely 
missing out on. There are multiple manifestations of this. Row-locks are 
just one; network overhead, serialization, and RPC scheduling/execution 
are three others I can easily see.
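For illustration, here is roughly what a batched update looks like with the
plain HBase client API, where one round trip carries many rows (the column
family and value reuse names from this thread; the row keys are invented):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedPuts {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("VBQL_PHOENIX_TRANSCRIPT"))) {
            List<Put> batch = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                Put put = new Put(Bytes.toBytes("row-" + i));
                put.addColumn(Bytes.toBytes("IMMUTABLES"), Bytes.toBytes("STATUS"),
                              Bytes.toBytes("FINISHED"));
                batch.add(put);
            }
            // One RPC per region server; the RS takes many row locks at once.
            // Calling table.put(put) once per row would pay the lock and RPC
            // cost for every single row.
            table.put(batch);
        }
    }
}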


On 7/11/18 4:10 PM, alchemist wrote:

Josh Elser-2 wrote

Josh thanks so much for all your help.  I do not understand why you said
"However, you are still fighting yourself when you have threads all trying
to grab the same lock to write their data."  My understanding is if we
have many processes running many threads, the data will be logged into the
WAL as it comes and then flushed to disk.  Where is the contention in this
case? Do we have one write per region/region server?



On 7/11/18 11:33 AM, alchemist wrote:

Thanks so much Josh!  I am unable to understand why performance is
extremely
slow.

1.  If I perform update using PreparedStatement addBatch and executeBatch
then I get nearly 6000 transactions per minute.

2.  But in our case we need to save each transaction so we cannot perform a
batch update, so I am using PreparedStatement executeQuery and commit(),
getting nearly 100 transactions per minute.

These numbers seem extremely slow, therefore I am wondering whether I am
doing something very incorrect.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/







--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/



Re: Upsert is EXTREMELY slow

2018-07-12 Thread Josh Elser
Phoenix does not recommend connection pooling because Phoenix 
connections are not expensive to create the way most DB connections are.


The first connection you make from a JVM is expensive. Every subsequent 
one is cheap.
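In other words, opening a connection per unit of work is fine; a minimal
sketch (the URL is the one used elsewhere in this thread, and the table and
columns come from the DDL posted later):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PerTaskConnection {
    private static final String URL = "jdbc:phoenix:localhost:2181:/hbase";

    // The first getConnection() in a JVM pays the one-time startup cost
    // (cluster metadata, underlying HBase connection); later calls are
    // cheap, so there is no need for a pool.
    static void writeOne(String pk, String status) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL);
             PreparedStatement ps = conn.prepareStatement(
                 "UPSERT INTO VBQL_PHOENIX_TRANSCRIPT (PK, IMMUTABLES.STATUS) "
               + "VALUES (?, ?)")) {
            ps.setString(1, pk);
            ps.setString(2, status);
            ps.executeUpdate();
            conn.commit();
        }
    }
}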


On 7/11/18 2:55 PM, alchemist wrote:

Since Phoenix does not recommend connection pooling, even if we have
multiple threads and processes, each thread will take time to get a
connection to the Phoenix server, execute the upsert operation, then commit
the operation.  This whole process will take some time.  And no matter what
level of threading we have, creating connections will take time.  Not sure
if my understanding is correct.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/



Re: Upsert is EXTREMELY slow

2018-07-11 Thread alchemist
Josh Elser-2 wrote
> Josh thanks so much for all your help.  I do not understand why you said
> "However, you are still fighting yourself when you have threads all trying
> to grab the same lock to write their data."  My understanding is if we
> have many processes running many threads, the data will be logged into the
> WAL as it comes and then flushed to disk.  Where is the contention in this
> case? Do we have one write per region/region server?
> 
> 
> 
> On 7/11/18 11:33 AM, alchemist wrote:
>> Thanks so much Josh!  I am unable to understand why performance is
>> extremely
>> slow.
>> 
>> 1.  If I perform update using PreparedStatement addBatch and executeBatch
>> then I get nearly 6000 transactions per minute.
>> 
>> 2.  But in our case we need to save each transaction so we cannot perform
>> a batch update, so I am using PreparedStatement executeQuery and commit(),
>> getting nearly 100 transactions per minute.
>> 
>> These numbers seem extremely slow, therefore I am wondering whether I am
>> doing something very incorrect.
>> 
>> 
>> 
>> --
>> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>>





--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Upsert is EXTREMELY slow

2018-07-11 Thread alchemist
Since Phoenix does not recommend connection pooling, even if we have
multiple threads and processes, each thread will take time to get a
connection to the Phoenix server, execute the upsert operation, then commit
the operation.  This whole process will take some time.  And no matter what
level of threading we have, creating connections will take time.  Not sure
if my understanding is correct.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Upsert is EXTREMELY slow

2018-07-11 Thread alchemist
Thanks Josh!  My use case is very simple. I have data in HBase on EMR,
backed by S3.  I need to access this data from outside the EMR cluster; in
this case I created three EC2 nodes that execute a multithreaded Java
program in parallel.  These multithreaded programs on different EC2 nodes
get a connection using Phoenix, call upsert to insert data, and commit.

The issue is that even with 3 nodes running the multithreaded program that
inserts data into HBase, I am not getting 10K transactions per minute.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Upsert is EXTREMELY slow

2018-07-11 Thread Josh Elser
Your real-world situation is not a single-threaded application, is it? 
You will have multiple threads which are all updating Phoenix concurrently.


Given the semantics that your application needs from the requirements 
you stated, I'm not sure what else you can do differently. You can get 
low-latency out of HBase, but that's at the cost of throughput (not a 
unique characteristic of HBase).


Denormalizing your tables will reduce the amount of work each update 
will have to execute. Every secondary index is another update that needs 
to be executed to satisfy your UPSERT.


Updates in HBase go in memory and to the WAL. New updates to HBase are 
blocked when the memstore fills up and needs to flush to disk. Thus, it 
is optimal to keep flush-times short so that you don't have many threads 
blocked. However, you are still fighting yourself when you have threads 
all trying to grab the same lock to write their data.


You can also try reaching out to your vendor (EMR) to see what other 
tunings they recommend. I don't know what this architecture looks like.


On 7/11/18 11:33 AM, alchemist wrote:

Thanks so much Josh!  I am unable to understand why performance is extremely
slow.

1.  If I perform update using PreparedStatement addBatch and executeBatch
then I get nearly 6000 transactions per minute.

2.  But in our case we need to save each transaction so we cannot perform a
batch update, so I am using PreparedStatement executeQuery and commit(),
getting nearly 100 transactions per minute.

These numbers seem extremely slow, therefore I am wondering whether I am
doing something very incorrect.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/



Re: Upsert is EXTREMELY slow

2018-07-11 Thread alchemist
Thanks so much Josh!  I am unable to understand why performance is extremely
slow.  

1.  If I perform update using PreparedStatement addBatch and executeBatch
then I get nearly 6000 transactions per minute.

2.  But in our case we need to save each transaction so we cannot perform a
batch update, so I am using PreparedStatement executeQuery and commit(),
getting nearly 100 transactions per minute.

These numbers seem extremely slow, therefore I am wondering whether I am
doing something very incorrect.
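For reference, a sketch of what variant (1) typically looks like, i.e.
addBatch/executeBatch with one commit per batch (the table name and URL are
assumptions for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchUpsert {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO MY_TABLE (PK, COL1) VALUES (?, ?)")) {
                for (int i = 0; i < 10_000; i++) {
                    ps.setString(1, "row-" + i);
                    ps.setString(2, "value-" + i);
                    ps.addBatch();
                    if (i % 1000 == 999) {
                        ps.executeBatch();
                        conn.commit();   // one commit per ~1k rows, not per row
                    }
                }
                ps.executeBatch();       // flush any remaining rows
                conn.commit();
            }
        }
    }
}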



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Upsert is EXTREMELY slow

2018-07-11 Thread Josh Elser

Some thoughts:

* Please _remove_ commented lines before sharing configuration next 
time. We don't need to see all of the things you don't have set :)


* 100 salt buckets is really excessive for a 4 node cluster. Salt 
buckets are not synonymous with pre-splitting HBase tables. This many 
salt buckets will have a negative impact on the read performance of your 
cluster.
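For contrast, a sketch of pre-splitting a Phoenix table with explicit split
points instead of salting (the split points here are invented for
illustration; real ones should match your key distribution):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PreSplitExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase");
             Statement stmt = conn.createStatement()) {
            // SPLIT ON pre-creates regions at chosen key boundaries without
            // prefixing every row key with a salt byte; salting forces every
            // read to fan out across all SALT_BUCKETS chunks.
            stmt.execute("CREATE TABLE IF NOT EXISTS VBQL_PHOENIX_TRANSCRIPT ("
                       + " PK VARCHAR NOT NULL PRIMARY KEY,"
                       + " IMMUTABLES.STATUS VARCHAR)"
                       + " SPLIT ON ('2', '5', '8', 'B', 'E')");
        }
    }
}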


* m4.10xlarge should be more than enough to run HBase extremely well.

* You seem to have set your HBase heapsize to be 1KB? I can't imagine 
that HBase would even run with that, so I'm left to assume that you're 
using the Java default heap size of 1GB instead.


export HBASE_HEAPSIZE=1024

You probably want to see what heapsize your regionservers are actually 
running with and validate that.


On 7/11/18 10:53 AM, alchemist wrote:

Thanks so much for all your help and response.  Here are the requested
details.

I have deployed HBase backed by S3 on an EMR cluster, and I am performing
upserts from an EC2 node outside EMR, using a simple multithreaded Java
client to perform atomic updates.

I have created a table like the following:

CREATE TABLE IF NOT EXISTS VBQL_PHOENIX_TRANSCRIPT ( PK VARCHAR NOT NULL
PRIMARY KEY, IMMUTABLES.VBMETAJSON VARCHAR, IMMUTABLES.ACCOUNTID VARCHAR,
IMMUTABLES.DATECREATED VARCHAR, IMMUTABLES.DATEFINISHED VARCHAR,
IMMUTABLES.MEDIAID VARCHAR, IMMUTABLES.JOBID VARCHAR, IMMUTABLES.STATUS
VARCHAR, UPDATABLE.METADATA VARCHAR, CATEGORIES.C_ACOUNTID_CATEGORYNAME
VARCHAR, COMPUTED.ADDITIONALMETRICS VARCHAR) SALT_BUCKETS =100;

with a secondary index like this:

CREATE INDEX  VBQL_PHOENIX_TRANSCRIPT_INDEX5  ON  VBQL_PHOENIX_TRANSCRIPT5
(IMMUTABLES.MEDIAID) ;


Sample Upsert
UPSERT INTO VBQL_PHOENIX_TRANSCRIPT2  ( PK , IMMUTABLES.ACCOUNTID ,
IMMUTABLES.DATECREATED , IMMUTABLES.DATEFINISHED ,
IMMUTABLES.MEDIAID , IMMUTABLES.JOBID , IMMUTABLES.STATUS  )
VALUES
('5DAD32BA-9656-41F3-BD38-BBF890B85CD62018-05-18T18:09:38.6075D681A95C-8CDA-47B2-93BE-C165B1DEC7D8',
'AAA5DAD32BA-9656',
'2018-04-18T18:09:38.607+',
'2018-05-18T18:09:38.607+','5D681A95C-8CDA-47B2-93BE-C165B1DEC7D8',
'JOB123', 'FINISHED');


HBASE IS INSTALLED ON THE EMR CLUSTER; THE TABLE IS CREATED USING THE ABOVE
CREATE TABLE COMMANDS.

The EMR cluster is a 4 node m4.4xlarge cluster (32 vCore, 64 GiB memory,
EBS-only storage, EBS storage: 32 GiB).

The client is a Java program running in EC2 (m4.10xlarge: 40 vCPU, 160 GiB
RAM, 10 Gbps network, EBS only, 4,000 Mbps). The client is a multithreaded
program that creates connections to HBase and performs atomic inserts.


The client hbase-site.xml looks like the following:

<configuration>

  <property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>10.16.129.55</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>s3://dev-mock-transcription/</value>
  </property>

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.rest.port</name>
    <value>8070</value>
  </property>

  <property>
    <name>hbase.replication</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.balancer.tablesOnMaster</name>
    <value>hbase:meta</value>
  </property>

  <property>
    <name>hbase.bucketcache.size</name>
    <value>8192</value>
  </property>

  <property>
    <name>hbase.master.balancer.uselocality</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.master.startup.retainassign</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.wal.dir</name>
    <value>hdfs://10.16.129.55:8020/user/hbase/WAL</value>
  </property>

  <property>
    <name>hbase.bulkload.retries.retryOnIOException</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>files:/mnt1/hbase/bucketcache</value>
  </property>

  <property>
    <name>hbase.rpc.timeout</name>
    <value>180</value>
  </property>

  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>1800</value>
  </property>

  <property>
    <name>phbase.regionserver.lease.period</name>
    <value>1800</value>
  </property>

  <property>
    <name>hbase.client.scanner.caching</name>
    <value>18</value>
  </property>

  <property>
    <name>phbase.client.scanner.timeout.period</name>
    <value>1800</value>
  </property>

  <property>
    <name>index.writer.threads.max</name>
    <value>30</value>
  </property>

  <property>
    <name>index.builder.threads.max</name>
    <value>30</value>
  </property>

  <property>
    <name>phoenix.query.threadPoolSize</name>
    <value>256</value>
  </property>

  <property>
    <name>index.builder.threads.keepalivetime</name>
    <value>9</value>
  </property>

  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>9</value>
  </property>

</configuration>






HBASE ENV LOOKS LIKE FOLLOWING:

[ec2-user@ip-10-16-129-55 conf]$ cat hbase-env.sh
#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# * http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script 

Re: Upsert is EXTREMELY slow

2018-07-11 Thread alchemist
Thanks so much for all your help and response.  Here are the requested
details.

I have deployed HBase backed by S3 on an EMR cluster, and I am performing
upserts from an EC2 node outside EMR, using a simple multithreaded Java
client to perform atomic updates.

I have created a table like the following:

CREATE TABLE IF NOT EXISTS VBQL_PHOENIX_TRANSCRIPT ( PK VARCHAR NOT NULL
PRIMARY KEY, IMMUTABLES.VBMETAJSON VARCHAR, IMMUTABLES.ACCOUNTID VARCHAR,
IMMUTABLES.DATECREATED VARCHAR, IMMUTABLES.DATEFINISHED VARCHAR,
IMMUTABLES.MEDIAID VARCHAR, IMMUTABLES.JOBID VARCHAR, IMMUTABLES.STATUS
VARCHAR, UPDATABLE.METADATA VARCHAR, CATEGORIES.C_ACOUNTID_CATEGORYNAME
VARCHAR, COMPUTED.ADDITIONALMETRICS VARCHAR) SALT_BUCKETS =100;

with a secondary index like this:

CREATE INDEX  VBQL_PHOENIX_TRANSCRIPT_INDEX5  ON  VBQL_PHOENIX_TRANSCRIPT5
(IMMUTABLES.MEDIAID) ;


Sample Upsert
UPSERT INTO VBQL_PHOENIX_TRANSCRIPT2  ( PK , IMMUTABLES.ACCOUNTID ,
IMMUTABLES.DATECREATED , IMMUTABLES.DATEFINISHED ,
IMMUTABLES.MEDIAID , IMMUTABLES.JOBID , IMMUTABLES.STATUS  ) 
VALUES
('5DAD32BA-9656-41F3-BD38-BBF890B85CD62018-05-18T18:09:38.6075D681A95C-8CDA-47B2-93BE-C165B1DEC7D8',
'AAA5DAD32BA-9656', 
'2018-04-18T18:09:38.607+',
'2018-05-18T18:09:38.607+','5D681A95C-8CDA-47B2-93BE-C165B1DEC7D8',
'JOB123', 'FINISHED');


HBASE IS INSTALLED ON THE EMR CLUSTER; THE TABLE IS CREATED USING THE ABOVE
CREATE TABLE COMMANDS.

The EMR cluster is a 4 node m4.4xlarge cluster (32 vCore, 64 GiB memory,
EBS-only storage, EBS storage: 32 GiB).

The client is a Java program running in EC2 (m4.10xlarge: 40 vCPU, 160 GiB
RAM, 10 Gbps network, EBS only, 4,000 Mbps). The client is a multithreaded
program that creates connections to HBase and performs atomic inserts.


The client hbase-site.xml looks like the following:

<configuration>

  <property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>10.16.129.55</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>s3://dev-mock-transcription/</value>
  </property>

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.rest.port</name>
    <value>8070</value>
  </property>

  <property>
    <name>hbase.replication</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.balancer.tablesOnMaster</name>
    <value>hbase:meta</value>
  </property>

  <property>
    <name>hbase.bucketcache.size</name>
    <value>8192</value>
  </property>

  <property>
    <name>hbase.master.balancer.uselocality</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.master.startup.retainassign</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.wal.dir</name>
    <value>hdfs://10.16.129.55:8020/user/hbase/WAL</value>
  </property>

  <property>
    <name>hbase.bulkload.retries.retryOnIOException</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>files:/mnt1/hbase/bucketcache</value>
  </property>

  <property>
    <name>hbase.rpc.timeout</name>
    <value>180</value>
  </property>

  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>1800</value>
  </property>

  <property>
    <name>phbase.regionserver.lease.period</name>
    <value>1800</value>
  </property>

  <property>
    <name>hbase.client.scanner.caching</name>
    <value>18</value>
  </property>

  <property>
    <name>phbase.client.scanner.timeout.period</name>
    <value>1800</value>
  </property>

  <property>
    <name>index.writer.threads.max</name>
    <value>30</value>
  </property>

  <property>
    <name>index.builder.threads.max</name>
    <value>30</value>
  </property>

  <property>
    <name>phoenix.query.threadPoolSize</name>
    <value>256</value>
  </property>

  <property>
    <name>index.builder.threads.keepalivetime</name>
    <value>9</value>
  </property>

  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>9</value>
  </property>

</configuration>






HBASE ENV LOOKS LIKE FOLLOWING:

[ec2-user@ip-10-16-129-55 conf]$ cat hbase-env.sh 
#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# * http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an
hbase process,
# so try to keep things idempotent unless you want to take an even deeper
look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.7+ required.
# export JAVA_HOME=/usr/java/jdk1.6.0/

# Extra Java CLASSPATH elements.  Optional.
export HBASE_CLASSPATH=/etc/hadoop/conf

# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G
export HBASE_HEAPSIZE=1024

# Uncomment below if you intend to use off heap cache. For example, to
allocate 8G of 
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="$HBASE_OPTS 

Re: Upsert is EXTREMELY slow

2018-07-11 Thread Josh Elser
The explain plan for your tables isn't a substitute for the DDLs. Please 
provide those.


How about sharing your complete hbase-site.xml and hbase-env.sh files, 
rather than just snippets like you have? A full picture is often needed.


Given that HBase cannot directly run on S3, please also describe how you 
have this setup functioning. Do you have an HDFS instance running for the 
HBase WALs? Finally, assuming that you're using EC2 if you're using S3, 
what kind of nodes do you have?


As a courtesy, please take the time to consolidate your thoughts into 
one email instead of sending multiple in rapid succession like this. 
Remember that Apache communities are volunteer based.


On 7/11/18 7:53 AM, alchemist wrote:


I have written a threaded program to batch upsert data into Phoenix.  I am
using Phoenix because of the secondary index capability.  I am getting
extremely slow write performance.

The explain plan looks like the following:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT5;

+---------------------------------------------------------------------------------------------------------------+-----------------+-------+
| PLAN                                                                                                            | EST_BYTES_READ  | EST_  |
+---------------------------------------------------------------------------------------------------------------+-----------------+-------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5   | 314572800       | 6838  |
| SERVER FILTER BY FIRST KEY ONLY                                                                                 | 314572800       | 6838  |
| SERVER AGGREGATE INTO SINGLE ROW                                                                                | 314572800       | 6838  |
+---------------------------------------------------------------------------------------------------------------+-----------------+-------+

The EXPLAIN for the secondary index looks like this:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT_INDEX5;

+---------------------------------------------------------------------------------------------------------------+-----------------+-------+
| PLAN                                                                                                            | EST_BYTES_READ  | EST_  |
+---------------------------------------------------------------------------------------------------------------+-----------------+-------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5   | 314572800       | 6838  |
| SERVER FILTER BY FIRST KEY ONLY                                                                                 | 314572800       | 6838  |
| SERVER AGGREGATE INTO SINGLE ROW                                                                                | 314572800       | 6838  |
+---------------------------------------------------------------------------------------------------------------+-----------------+-------+


Tuning settings used in HBase are below. The problem is that it is very
hard to scale: I tried adding more nodes to the HBase cluster, and I also
tried adding more threads to the client program, but it is not scaling
beyond 6K per minute, which is VERY SLOW.  Any help is greatly appreciated.

  
<property>
  <name>index.writer.threads.max</name>
  <value>30</value>
</property>
<property>
  <name>index.builder.threads.max</name>
  <value>30</value>
</property>
<property>
  <name>phoenix.query.threadPoolSize</name>
  <value>256</value>
</property>
<property>
  <name>index.builder.threads.keepalivetime</name>
  <value>9</value>
</property>
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>9</value>
</property>

The rest of the settings I am using are defaults.

Source Code (simple multithreaded; each write needs to be saved
individually / we cannot use batch writes):

 public void generateAndInsertDataToHbase(int iterationCount) {
     try {
         int iterations = 0;
         Instant start = Instant.now();
         ExecutorService executorService = Executors.newCachedThreadPool();
         while (true) {
             if (iterations == iterationCount) {
                 verifyRowCountAndExit(start, executorService);
             }
             Future<Void> future = executorService.submit(new Callable<Void>() {
                 public Void call() throws Exception {
                     List<VbqlBean> vbqlList =
                         VbqlUtils.generateRandomAccountVbql(accountCategoryMap);
                     if (vbqlList.size() > 0) {
                         addVbqlAccountToHbaseAsBatch(vbqlList, connStr);
                     }
                     return null;
                 }
             });
             // future.get() blocks until the task finishes, so the writes are
             // effectively serial despite the executor.
             future.get();
             iterations++;
         }
     } catch (Exception e) {
         e.printStackTrace();
     }
 }


 public void addVbqlAccountToHbaseAsBatch(List<VbqlBean> vbqlList, String connStr)
         throws Exception {
     Connection connection = null;
     try {
         // A new connection is opened for every batch of writes.
         connection = DBUtils.initSingleConnection(connStr);
         // connection.setAutoCommit(false);
         if (vbqlList.size() > 0) {
             for (VbqlBean vbqlBean : vbqlList) {
                 DBUtils.executeSingleVbqlQuery(connStr, vbqlBean, tableName);
             }
         }
         connection.commit();
     } finally {

Re: Upsert is EXTREMELY slow

2018-07-11 Thread alchemist
Are there any configurations I am missing?



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Upsert is EXTREMELY slow

2018-07-11 Thread alchemist
I am doing a POC where I am using Phoenix for single writes that update the
database after each write, so I cannot batch the writes.  I am getting
1 transaction per second as TPS.  I have a 3 node EMR cluster.  I am using
HBase with S3 as the backend.

I tried the tuning parameters I found online and I created a threaded app,
but performance is still extremely slow.



--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Upsert is EXTREMELY slow

2018-07-11 Thread alchemist
I have also tried adding cache frequency.


<property>
  <name>phoenix.default.update.cache.frequency</name>
  <value>30</value>
</property>

--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Upsert is EXTREMELY slow

2018-07-11 Thread alchemist


I have written a threaded program to batch upsert data into Phoenix.  I am
using Phoenix because of the secondary index capability.  I am getting
extremely slow write performance.

The explain plan looks like the following:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT5;

+---------------------------------------------------------------------------------------------------------------+-----------------+-------+
| PLAN                                                                                                            | EST_BYTES_READ  | EST_  |
+---------------------------------------------------------------------------------------------------------------+-----------------+-------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5   | 314572800       | 6838  |
| SERVER FILTER BY FIRST KEY ONLY                                                                                 | 314572800       | 6838  |
| SERVER AGGREGATE INTO SINGLE ROW                                                                                | 314572800       | 6838  |
+---------------------------------------------------------------------------------------------------------------+-----------------+-------+

The EXPLAIN for the secondary index looks like this:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT_INDEX5;

+---------------------------------------------------------------------------------------------------------------+-----------------+-------+
| PLAN                                                                                                            | EST_BYTES_READ  | EST_  |
+---------------------------------------------------------------------------------------------------------------+-----------------+-------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5   | 314572800       | 6838  |
| SERVER FILTER BY FIRST KEY ONLY                                                                                 | 314572800       | 6838  |
| SERVER AGGREGATE INTO SINGLE ROW                                                                                | 314572800       | 6838  |
+---------------------------------------------------------------------------------------------------------------+-----------------+-------+


Tuning settings used in HBase are below. The problem is that it is very
hard to scale: I tried adding more nodes to the HBase cluster, and I also
tried adding more threads to the client program, but it is not scaling
beyond 6K per minute, which is VERY SLOW.  Any help is greatly appreciated.

 
<property>
  <name>index.writer.threads.max</name>
  <value>30</value>
</property>
<property>
  <name>index.builder.threads.max</name>
  <value>30</value>
</property>
<property>
  <name>phoenix.query.threadPoolSize</name>
  <value>256</value>
</property>
<property>
  <name>index.builder.threads.keepalivetime</name>
  <value>9</value>
</property>
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>9</value>
</property>

The rest of the settings I am using are defaults.

Source Code (simple multithreaded; each write needs to be saved
individually / we cannot use batch writes):

 public void generateAndInsertDataToHbase(int iterationCount) {
     try {
         int iterations = 0;
         Instant start = Instant.now();
         ExecutorService executorService = Executors.newCachedThreadPool();
         while (true) {
             if (iterations == iterationCount) {
                 verifyRowCountAndExit(start, executorService);
             }
             Future<Void> future = executorService.submit(new Callable<Void>() {
                 public Void call() throws Exception {
                     List<VbqlBean> vbqlList =
                         VbqlUtils.generateRandomAccountVbql(accountCategoryMap);
                     if (vbqlList.size() > 0) {
                         addVbqlAccountToHbaseAsBatch(vbqlList, connStr);
                     }
                     return null;
                 }
             });
             // future.get() blocks until the task finishes, so the writes are
             // effectively serial despite the executor.
             future.get();
             iterations++;
         }
     } catch (Exception e) {
         e.printStackTrace();
     }
 }


 public void addVbqlAccountToHbaseAsBatch(List<VbqlBean> vbqlList, String connStr)
         throws Exception {
     Connection connection = null;
     try {
         // A new connection is opened for every batch of writes.
         connection = DBUtils.initSingleConnection(connStr);
         // connection.setAutoCommit(false);
         if (vbqlList.size() > 0) {
             for (VbqlBean vbqlBean : vbqlList) {
                 DBUtils.executeSingleVbqlQuery(connStr, vbqlBean, tableName);
             }
         }
         connection.commit();
     } finally {
         if (connection != null) try { connection.close(); } catch (SQLException ignore) {}
     }
 }





--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/