Re: unsubscribe

2016-03-28 Thread F21
Send your unsubscribe request to user-unsubscr...@phoenix.apache.org. :)


On 29/03/2016 4:54 PM, Dor Ben Dov wrote:



Exception while trying to connect to Phoenix client

2016-03-28 Thread Chagarlamudi, Prasanth
Hello,
We just upgraded Phoenix from 4.5.3 to 4.6. When trying to start the client, I 
see this exception.
Not sure if I am missing anything here. Any suggestions/help would be greatly 
appreciated.


Error: org.apache.phoenix.exception.PhoenixIOException: 
org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.AbstractMethodError
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2065)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.AbstractMethodError
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2278)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
        ... 4 more (state=08000,code=101)
org.apache.phoenix.exception.PhoenixIOException: 
org.apache.phoenix.exception.PhoenixIOException: 
org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.AbstractMethodError
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2065)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.AbstractMethodError
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2278)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
        ... 4 more

        at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
        at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:558)
        at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
        at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84)
        at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111)
        at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:85)
        at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:771)
        at sqlline.SqlLine.getColumnNames(SqlLine.java:1128)
        at sqlline.SqlCompleter.<init>(SqlCompleter.java:81)
        at sqlline.DatabaseConnection.setCompletions(DatabaseConnection.java:84)
        at sqlline.SqlLine.setCompletions(SqlLine.java:1730)
        at sqlline.Commands.connect(Commands.java:1066)
        at sqlline.Commands.connect(Commands.java:996)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
        at sqlline.SqlLine.dispatch(SqlLine.java:804)
        at sqlline.SqlLine.initArgs(SqlLine.java:588)
        at sqlline.SqlLine.begin(SqlLine.java:656)
        at sqlline.SqlLine.start(SqlLine.java:398)
        at sqlline.SqlLine.main(SqlLine.java:292)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.phoenix.exception.PhoenixIOException: 
org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.AbstractMethodError
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2065)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.AbstractMethodError
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2278)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
        ... 4 more

Re: Speeding Up Group By Queries

2016-03-28 Thread Mujtaba Chohan
Here's the chart of the time it takes for each of the parallel scans after the
split. On the RS where data is not read from disk, the scan comes back in ~20
secs, but for the RS which has the 6 regions it's ~45 secs.

[image: Inline image 2]

> Yes I see disk reads with 607 ios/second on the hosts that store the 6 regions

Two things that you should try to reduce disk reads, or maybe a combination
of both:

1. Have only the columns used in your group by query in a separate column
family:

CREATE TABLE T (K integer primary key, GRPBYCF.UNIT_CNT_SOLD integer,
GRPBYCF.TOTAL_SALES integer, GRPBYCF.T_COUNTRY varchar, ...)

2. Turn on Snappy compression for your table, followed by a major compaction:

ALTER TABLE T SET COMPRESSION='SNAPPY'

> I tried to compact the table from the HBase web UI

You need to do *major_compact* from the HBase shell. From the web UI it's
only a minor compaction.
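
For example, a quick sketch (the table in this thread is TRANSACTIONS):

$ hbase shell
hbase(main):001:0> major_compact 'TRANSACTIONS'

Note that major_compact is asynchronous: it queues the compaction and returns
before the compaction actually finishes.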

- mujtaba

On Mon, Mar 28, 2016 at 12:32 AM, Amit Shah  wrote:

> Thanks Mujtaba and James for replying back.
>
> Mujtaba, Below are details to your follow up queries
>
> 1. How wide is your table
>
>
> I have 26 columns in the TRANSACTIONS table, with a couple of columns
> combined to form the primary key
>
> 2. How many region servers is your data distributed on and what's the heap
>> size?
>
>
> When I posted the initial readings of the query taking around 2 minutes, I
> had one region server storing 4 regions for the 10-million-record TRANSACTIONS
> table. The heap size on the master server is 1 GB, while the region server
> has a 3.63 GB heap setting.
>
> Later I added 2 more region servers to the cluster and configured them as
> data nodes and region servers. After this step, the regions got split across
> two region servers, with 2 regions on one region server and 6 on the other.
> I didn't follow what action caused this region split, or whether it was done
> automatically by HBase (the load balancer?).
>
> 3. Do you see lots of disk I/O on region servers during aggregation?
>
>
>  Yes I see disk reads with 607 ios/second on the hosts that store the 6
> regions. Kindly find the disk I/O statistics attached as images.
>
> 4. Can you try your query after major compacting your table?
>
>
> I tried to compact the table from the HBase web UI. For some reason, the
> compaction table attribute on the web UI is still shown as NONE. After
> these changes, the query time is down to *42 secs*.
> Is compression different from compaction? Would the query performance
> improve by compressing the data with one of the algorithms? Logically it
> doesn't sound right though.
>
> Can you also replace log4j.properties with the attached one and reply back
>> with phoenix.log created by executing your query in sqlline?
>
>
> After replacing the log4j.properties, I have captured the logs for the
> group by query execution and attached them.
>
>
> James,
> If I follow the queries that you pasted, I see the index getting used, but
> if I try to explain the query plan on the pre-loaded TRANSACTIONS table I
> do not see the index being used. Probably the query plan changes based
> on whether the table has data or not.
>
> The query time is down to 42 secs right now. Let me know if you have more
> suggestions on how to improve it further.
>
> Thanks,
> Amit.
>
> On Sat, Mar 26, 2016 at 4:21 AM, James Taylor 
> wrote:
>
>> Hi Amit,
>> Using the 4.7.0-HBase-1.1 release, I see the index being used for that query
>> (see below). An index will help some, as the aggregation can be done in
>> place as the scan over the index is occurring (as opposed to having to hold
>> the distinct values found during grouping in memory per chunk of work and
>> sorting each chunk on the client). It's not going to prevent the entire
>> index from being scanned though. You'll need a WHERE clause to prevent that.
>>
>> 0: jdbc:phoenix:localhost> create table TRANSACTIONS (K integer primary
>> key, UNIT_CNT_SOLD integer, TOTAL_SALES integer, T_COUNTRY varchar);
>> No rows affected (1.32 seconds)
>> 0: jdbc:phoenix:localhost> CREATE INDEX TRANSACTIONS_COUNTRY_INDEX ON
>> TRANSACTIONS (T_COUNTRY) INCLUDE (UNIT_CNT_SOLD, TOTAL_SALES);
>> No rows affected (6.452 seconds)
>> 0: jdbc:phoenix:localhost> explain SELECT SUM(UNIT_CNT_SOLD),
>> SUM(TOTAL_SALES) FROM TRANSACTIONS GROUP BY T_COUNTRY;
>>
>> +--+
>> |   PLAN
>>   |
>>
>> +--+
>> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TRANSACTIONS_COUNTRY_INDEX
>>  |
>> | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["T_COUNTRY"]
>>   |
>> | CLIENT MERGE SORT
>>  |
>>
>> +--+
>> 3 rows selected (0.028 seconds)
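>>
>> (Illustrative follow-up, with a hypothetical filter value: since T_COUNTRY
>> is the leading column of the index, a WHERE clause on it should turn the
>> FULL SCAN above into a RANGE SCAN over the index rather than a scan of the
>> entire index:)
>>
>> 0: jdbc:phoenix:localhost> explain SELECT SUM(UNIT_CNT_SOLD),
>> SUM(TOTAL_SALES) FROM TRANSACTIONS WHERE T_COUNTRY = 'US' GROUP BY T_COUNTRY;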
>>
>> Thanks,
>> James
>>
>>
>> On Fri, Mar 25, 2016 at 10:37 AM, Mujtaba Chohan 
>> wrote:
>>
>>> That seems excessively slow for 10M rows which should be in order of few
>>> seconds at most without 

Re: Slow metadata update queries during upsert

2016-03-28 Thread James Taylor
Hi Ankur,
Try setting UPDATE_CACHE_FREQUENCY on your table (4.7.0 or above) to
prevent the client from checking with the server on whether or not your
table metadata is up to date. See here[1] for more information. You can
issue a command like this, which will hold on to your metadata on the client
for 15 minutes (900000 ms) before checking back with the server to get
metadata or statistics updates on your table:

ALTER TABLE my_table SET UPDATE_CACHE_FREQUENCY=900000
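
A minimal JDBC sketch of issuing the same command from code (assuming the
Phoenix 4.7+ client jar is on the classpath; the table name MY_TABLE and the
localhost ZooKeeper quorum are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SetUpdateCacheFrequency {
    public static void main(String[] args) throws Exception {
        // Connect through the Phoenix JDBC driver.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            // Cache table metadata on the client for 15 minutes (900000 ms)
            // before checking back with the server for updates.
            stmt.execute("ALTER TABLE MY_TABLE SET UPDATE_CACHE_FREQUENCY=900000");
        }
    }
}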

Thanks,
James

[1] https://phoenix.apache.org/#Altering

On Mon, Mar 28, 2016 at 12:33 AM, Ankur Jain  wrote:

> Hi
>
> We are using Phoenix as our transactional data store (though we are not
> yet using its new transaction feature). Earlier we had our own custom
> query layer built on top of hbase that we are trying to replace.
>
> During tests we found that inserts are very slow compared to regular
> hbase puts. There is always 7-8ms of additional time associated with each
> upsert query. This time is spent mostly during the validate phase, where the
> cache is updated with the latest table metadata. Is there a way to avoid
> refreshing this cache every time?
>
> Out of 15ms for a typical upsert query in our case, 11ms are taken just to
> update the metadata cache of that table. The remaining 3ms are spent in the
> actual hbase batch call and 1ms in all other Phoenix processing.
>
> We have two use cases:
> 1. Our table metadata is always static and we know we are not going to add
> any new columns, at least at runtime.
> We would like to avoid the cost of this metadata update so that our
> inserts are faster. Is this possible with the existing code base?
>
> 2. We add columns to our tables on the fly.
> Adding new columns on the fly is generally a rare event. Is there a
> control where we can explicitly invalidate the cache when a column is
> updated, while caching metadata indefinitely otherwise?
>
> Is the metadata cache at the connection level or at the global level? We
> are always creating new connections.
>
> I have also observed that CsvToKeyValueMapper is fast because it avoids
> the connection.commit() step and does all the validations upfront, avoiding
> the update-cache step during commit.
>
> Just to add another analysis where Phoenix inserts are much slower than
> native hbase puts: https://issues.apache.org/jira/browse/YARN-2928.
> TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf states that
> clearly. I believe this might be related.
>
> Thanks,
> Ankur Jain
>


Re: Slow metadata update queries during upsert

2016-03-28 Thread Ankur Jain
Please ignore the same query sent from another email id of mine. I was getting 
failure notifications while sending emails from the other id, but after a few 
hours they somehow showed up. Sorry for the spam.

Thanks,
Ankur Jain

From: Ankur Jain
Reply-To: "user@phoenix.apache.org"
Date: Monday, 28 March 2016 1:03 pm
To: "user@phoenix.apache.org"
Subject: Slow metadata update queries during upsert
Subject: Slow metadata update queries during upsert

Hi

We are using Phoenix as our transactional data store (though we are not yet 
using its new transaction feature). Earlier we had our own custom query 
layer built on top of hbase that we are trying to replace.

During tests we found that inserts are very slow compared to regular hbase 
puts. There is always 7-8ms of additional time associated with each upsert 
query. This time is spent mostly during the validate phase, where the cache is 
updated with the latest table metadata. Is there a way to avoid refreshing 
this cache every time?

Out of 15ms for a typical upsert query in our case, 11ms are taken just to 
update the metadata cache of that table. The remaining 3ms are spent in the 
actual hbase batch call and 1ms in all other Phoenix processing.

We have two use cases:
1. Our table metadata is always static and we know we are not going to add any 
new columns, at least at runtime.
We would like to avoid the cost of this metadata update so that our 
inserts are faster. Is this possible with the existing code base?

2. We add columns to our tables on the fly.
Adding new columns on the fly is generally a rare event. Is there a control 
where we can explicitly invalidate the cache when a column is updated, while 
caching metadata indefinitely otherwise?

Is the metadata cache at the connection level or at the global level? We are 
always creating new connections.

I have also observed that CsvToKeyValueMapper is fast because it avoids the 
connection.commit() step and does all the validations upfront, avoiding the 
update-cache step during commit.
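
On the commit cost specifically: a common Phoenix idiom is to run many UPSERTs
on one connection and commit once per batch, so the per-commit work is
amortized across rows. A minimal sketch (the table and column names here are
hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedUpserts {
    public static void main(String[] args) throws Exception {
        // Phoenix connections have auto-commit off by default, so upserts are
        // buffered on the client until commit() is called.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
            for (int i = 0; i < 100000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "value-" + i);
                ps.executeUpdate();      // buffered client-side
                if ((i + 1) % 1000 == 0) {
                    conn.commit();       // flush a batch of 1000 rows to HBase
                }
            }
            conn.commit();               // flush any remaining rows
        }
    }
}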

Just to add another analysis where Phoenix inserts are much slower than native 
hbase puts: https://issues.apache.org/jira/browse/YARN-2928. 
TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf states that clearly. 
I believe this might be related.

Thanks,
Ankur Jain

