Phoenix Index tool deletes data from main table and does not populate local index

2022-08-22 Thread Pradheep Shanmugam via user
Hi,

I am using CDP 7.3.1.
I am trying to create a Phoenix local index for an already existing table with ~14M rows.
Following are the steps I did:
1. Take a snapshot of the table from HBase (the table is salted in Phoenix).
2. Clone the snapshot to a new test table.
3. Create the table in Phoenix to link Phoenix to the new HBase table created from the cloned snapshot. I can see all rows in the Phoenix table.

4. Create the local index in async mode:
CREATE LOCAL INDEX MILESTONE_LOCAL_INDEX ON MILESTONE_TEST (h.eventTimestamp) 
ASYNC;

5. Run the Phoenix IndexTool MR job:

./hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table 
MILESTONE_TEST --index-table MILESTONE_LOCAL_INDEX --output-path 
/hbase/data/default/MILESTONE_LOCAL_INDEX_HFILE

The MR job says it succeeded, but the row count in both the main test table and the local index is now 0, even though the index is marked ACTIVE.
I don't see any explicit error in the MR job.
What could be the issue with the MR job?
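Before digging into the MR counters, it may help to check what the job actually left behind. A sketch, assuming the usual SYSTEM.CATALOG layout (INDEX_STATE is a one-letter code there, and whether a local index can be queried by name varies by version):

select count(*) from MILESTONE_TEST;
select count(*) from MILESTONE_LOCAL_INDEX;

-- index lifecycle state as recorded in the Phoenix metadata
select TABLE_NAME, INDEX_STATE from SYSTEM.CATALOG
where TABLE_NAME = 'MILESTONE_LOCAL_INDEX' and INDEX_STATE is not null;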

Another question on local index usage.
With a small table of about 5 rows, I tried to see if the local index is used:

explain select * from MILESTONE_TEST where eventTimestamp <= TO_TIMESTAMP('2022-07-25 14:03:22.559');

+-------------------------------------------------------------------------------------------+
| PLAN                                                                                       |
+-------------------------------------------------------------------------------------------+
| CLIENT 10-CHUNK 0 ROWS 0 BYTES PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER MILESTONE_TEST   |
| SERVER FILTER BY H.EVENTTIMESTAMP <= TIMESTAMP '2022-07-25 14:03:22.559'                   |
+-------------------------------------------------------------------------------------------+

I am not seeing the local index name in the explain plan, which suggests it is not being used.
Does Phoenix skip the local index when there are only a few rows? That is why I am trying with millions of rows, so that Phoenix will try to use the local index. Can I expect to see the local index name in the explain plan once I have a few million rows in the table and local index?
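Independent of row count, an index hint should reveal whether the optimizer considers the local index usable at all; a sketch:

explain select /*+ INDEX(MILESTONE_TEST MILESTONE_LOCAL_INDEX) */ *
from MILESTONE_TEST
where eventTimestamp <= TO_TIMESTAMP('2022-07-25 14:03:22.559');

Note that for a local index the plan may still show a range scan over MILESTONE_TEST rather than the index name, since local index rows live in a shadow column family of the data table.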

Thanks,
Pradheep


Using Secondary index hint for many-to-many relationship model

2019-07-17 Thread Pradheep Shanmugam
Hi,

We have a table with the parentid and childid combination as unique. We have to query it sometimes by parent id and sometimes by child id, so we have a secondary (global) index on child id. As we don't want to store all the fields again in the secondary index, we wanted to use index hinting. We also pre-split on the first field of the table, as well as of the secondary index, to distribute the writes. We pre-calculate Secondary_Dist_salt and write it to a column in the primary table, which is used to populate the index.

Table(TEST)

Primary_Dist_salt
Parentid
Childid

Other columns

Secondary Index: (TESTINDEX)

Secondary_Dist_salt
Childid
Parentid

If I run a query like the one below, I get the plan shown and see that the index is used. I am worried that the full scan on the primary table will affect performance in the future, as the table may grow to several hundred million rows.
I also see that Primary_Dist_salt is added implicitly when I create the secondary index without it, so I thought we have all the keys required to do a point lookup on the primary table to get the rest of the columns.
I presume the range scan runs in parallel and hence a full scan occurs?
Can you please throw some light on this? Is there some way we can avoid that full scan?

explain select /*+ INDEX(TEST TESTINDEX) */ * from TEST where Childid
='10' and Secondary_dist_salt='3k';

CLIENT 10-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST
SKIP-SCAN-JOIN TABLE 0
   CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER TESTINDEX ['3k','10']
SERVER FILTER BY FIRST KEY ONLY
DYNAMIC SERVER FILTER BY ("TEST.PRIMARY_DIST_SALT", "TEST.PARENTID", "TEST.CHILDID") IN (($2.$4, $2.$5, $2.$6))

Thanks,
Pradheep


Re: Phoenix Client threads

2018-05-22 Thread Pradheep Shanmugam
Hi Sergey,

Thanks for your response. I found the issue.
Could you please throw some light on Phoenix threads and HConnections? Is it the case that 1 Phoenix thread creates 30 HConnections?
Assuming we are running 30 concurrent read queries, we would need 30 * 30 = 900 HConnections, but I guess we have only 256 (is there a parameter to tweak this?). Will the parallel scans be converted to serial scans, reducing query performance?
Does increasing phoenix.query.threadPoolSize (from 128 to 256) also increase the HConnection thread pool size, and by how much?
Should we definitely consider reducing the salting (from 30 to 10) given this huge multiplication factor in the number of threads? Initially we decided to use 30 salt buckets as this is a write-heavy table and the total number of rows could grow to several billions.

Thanks,
Pradheep

From: <sergey.solda...@gmail.com> on behalf of Sergey Soldatov 
<sergeysolda...@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Tuesday, May 22, 2018 at 1:45 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Phoenix Client threads

The salting byte is calculated using a hash function over the whole row key (all PK columns). So if you are using only one of the PK columns in the WHERE clause, Phoenix is unable to identify which salting byte (bucket number) should be used, so it runs scans for all salting bytes. All those threads are lightweight, mostly waiting for a response from the HBase server, so you may consider adjusting the nproc limit. Or you may decrease the number of Phoenix threads via the phoenix.query.threadPoolSize property. Decreasing the number of salt buckets works as well.
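Note that SALT_BUCKETS cannot be changed on an existing table, so reducing the salting means recreating and reloading the table. A rough sketch with guessed column types and layout (TAB_NEW is hypothetical):

CREATE TABLE TAB_NEW (
    type     INTEGER NOT NULL,
    id1      VARCHAR NOT NULL,
    id2      VARCHAR NOT NULL,
    isActive INTEGER,
    a        VARCHAR,
    b        VARCHAR,
    CONSTRAINT pk PRIMARY KEY (type, id1, id2)
) SALT_BUCKETS = 10;

-- copy the data over, then swap the tables; for very large tables a
-- MapReduce bulk load may be preferable to a single UPSERT SELECT
UPSERT INTO TAB_NEW (type, id1, id2, isActive, a, b)
    SELECT type, id1, id2, isActive, a, b FROM TAB;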

Thanks,
Sergey

On Tue, May 22, 2018 at 8:52 AM, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:
Hi,

We have a table with key (type, id1, id2) (type is the same for all rows, whereas id1 and id2 are unique for each row) which is salted (30 salt buckets).
The load on this table is about 30 queries/sec, with each query taking ~6ms.
We are using the Phoenix 4.7.0 non-thin client.
We have a query like the one below:

SELECT tab.a, tab.b
FROM tab
WHERE tab.id1 = '1F64F5DY0J0A03692'
AND tab.type = 4
AND tab.isActive = 1;

CLIENT 30-CHUNK 0 ROWS 0 BYTES PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER TAB 
[0,4, '1F64F5DY0J0A03692']
SERVER FILTER BY TS.ISACTIVE = 1

Here I could see that about 30 threads are being used for this query. Here 'type' is the same for all rows, and I thought that is the reason for looking into all the chunks to find the key and hence using 30 threads.

Then I ran the same query on a similar table with the keys rearranged as (id1, id2, type) and salted (30).

But I still see the same 30 threads being used, though Phoenix can uniquely identify a row with the given id1, which should be in one of the chunks (is this due to salting, so that it does not know where the key is?).

CLIENT 30-CHUNK PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER TAB [0, 
'1F64F5DY0J0A03692']
SERVER FILTER BY (TYPE = 4 AND TS.ISACTIVE = 1)

Currently I am exceeding the nproc limit set on my app server (Phoenix threads 128 plus HConnection threads reaching 256 = 384 threads). Can you please throw some light on Phoenix connections and HConnections and how to reduce them to a reasonable level, and also on the above query plans? Should we consider reducing the salt number to 10 (we have 10 region servers)?

Thanks,
Pradheep



Phoenix Client threads

2018-05-22 Thread Pradheep Shanmugam
Hi,

We have a table with key (type, id1, id2) (type is the same for all rows, whereas id1 and id2 are unique for each row) which is salted (30 salt buckets).
The load on this table is about 30 queries/sec, with each query taking ~6ms.
We are using the Phoenix 4.7.0 non-thin client.
We have a query like the one below:

SELECT tab.a, tab.b
FROM tab
WHERE tab.id1 = '1F64F5DY0J0A03692'
AND tab.type = 4
AND tab.isActive = 1;

CLIENT 30-CHUNK 0 ROWS 0 BYTES PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER TAB 
[0,4, '1F64F5DY0J0A03692']
SERVER FILTER BY TS.ISACTIVE = 1

Here I could see that about 30 threads are being used for this query. Here 'type' is the same for all rows, and I thought that is the reason for looking into all the chunks to find the key and hence using 30 threads.

Then I ran the same query on a similar table with the keys rearranged as (id1, id2, type) and salted (30).

But I still see the same 30 threads being used, though Phoenix can uniquely identify a row with the given id1, which should be in one of the chunks (is this due to salting, so that it does not know where the key is?).

CLIENT 30-CHUNK PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER TAB [0, 
'1F64F5DY0J0A03692']
SERVER FILTER BY (TYPE = 4 AND TS.ISACTIVE = 1)

Currently I am exceeding the nproc limit set on my app server (Phoenix threads 128 plus HConnection threads reaching 256 = 384 threads). Can you please throw some light on Phoenix connections and HConnections and how to reduce them to a reasonable level, and also on the above query plans? Should we consider reducing the salt number to 10 (we have 10 region servers)?

Thanks,
Pradheep


increase column datatype length

2017-10-17 Thread Pradheep Shanmugam
Hi,

I have a requirement to increase a varchar column's length from 500 to 2000.
Looking at the thread below:
http://apache-phoenix-user-list.1124778.n5.nabble.com/Can-I-change-a-String-column-s-size-and-preserve-the-data-tt2664.html#none
How does this change the datatype length at the HBase layer? Should a major compaction be done at the HBase layer?
Will the above solution work in Phoenix 4.4?

The other option I see is to create a new table with the updated columns and copy the data over from the old table.

Are there any other options?
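For context on the HBase layer: Phoenix stores VARCHAR values as plain variable-length byte arrays, so the declared maximum length is client-side metadata only; nothing needs rewriting in HBase and no major compaction should be required. The approach in the linked thread amounts to editing that metadata directly, roughly like the unsupported sketch below (MYSCHEMA, MYTABLE, and MYCOL are hypothetical; back up SYSTEM.CATALOG first and verify the column names against your 4.4 catalog):

-- widens only the declared max length of an existing column
UPSERT INTO SYSTEM."CATALOG"
    (TENANT_ID, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, COLUMN_FAMILY, COLUMN_SIZE)
VALUES (NULL, 'MYSCHEMA', 'MYTABLE', 'MYCOL', '0', 2000);
-- clients cache table metadata, so reconnect/restart clients afterwards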

Thanks,
Pradheep


Re: Salt Number

2017-09-08 Thread Pradheep Shanmugam
Hi James,


We have a table where multiple customers could have rows.

Some of them may be large and some very small in terms of number of rows.

We have a row key based on customerid+type+orderid. If not salted, all the rows of a large customer will end up in a few regions, leading to hotspotting (large customers' rows also being the most frequently accessed).


Thanks,

Pradheep


From: James Taylor <jamestay...@apache.org>
Sent: Friday, September 8, 2017 12:56:31 PM
To: user
Subject: Re: Salt Number

Hi Pradheep,
Would you be able to describe your use case and why you're salting? We really 
only recommend salting if you have write hotspotting. Otherwise, it increases 
the overall load on your cluster.
Thanks,
James

On Fri, Sep 8, 2017 at 9:13 AM, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:

Hi,


As the salt number cannot be changed later, what is the best number we can give in different cases for a cluster with 10 region servers with, say, 6 cores each?

Should we consider cores while deciding the number?

In some places I see the number can be in the range 1-256, and in some places I see that it should equal the number of region servers. Can the number be a multiple of the region server count (say 20, 30, etc.)? The cases are:

a read-heavy large table (several hundred million rows) with range scans

a write-heavy large table with less frequent range scans

a large table with a hybrid load and range scans


Thanks,

Pradheep



optimistic lock in Phoenix

2017-08-30 Thread Pradheep Shanmugam
Hi,


We have a table in Phoenix 4.4 which has a modifyrevision column.

When a thread updates the row, the modifyrevision has to be incremented.

The only option I can see is to use UPSERT SELECT to get the latest revision, but since it is not atomic, how do we avoid dirty reads? Is there any other way this can be done?
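If upgrading is an option, Phoenix 4.9 added atomic server-side upserts, which express this directly (not available in 4.4; MYTABLE and ID are hypothetical names):

-- increments the revision atomically on the region server, avoiding
-- the read-then-write race entirely (Phoenix 4.9+ syntax)
UPSERT INTO MYTABLE (ID, MODIFYREVISION) VALUES ('row-1', 1)
ON DUPLICATE KEY UPDATE MODIFYREVISION = MODIFYREVISION + 1;

On 4.4 itself, the usual workaround is HBase's checkAndPut through the native API, comparing the old modifyrevision before writing the new one.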


Thanks,

Pradheep


Phoenix log

2017-04-26 Thread Pradheep Shanmugam
Hi,

I am doing a point lookup query based on the row key, fetching back a column from another column family (H.DATA), and it takes about 100ms through sqlline. What is a typical point lookup time?
I am not sure what takes the 54ms at the beginning; any clue? Also, the times logged do not sum to the time reported by sqlline, so I am assuming the rest is network time.

Phoenix log is as below:

2017-04-26 18:47:12 DEBUG ParallelIterators:97 - Id: 
4595e0a3-0d93-4ab7-b300-a5e590d7f838, Time: 54ms, Scan: 
{"timeRange":[0,1493232432648],"batch":-1,"startRow":"","stopRow":"\\x01","loadColumnFamiliesOnDemand":null,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":2097152,"maxVersions":1,"filter":"FilterList
 AND (2/2): [COLUMN_NAME IS NOT NULL, ]","caching":1}

2017-04-26 18:48:10 DEBUG ParallelIterators:97 - Id: 
d6ad1212-e56e-4b00-87f9-c8a4b534bbc4, Time: 4ms, Scan: 
{"timeRange":[0,1493232491081],"batch":-1,"startRow":"\\x00\\x00MSBO_PHOENIX_COMP_ROWKEY\\x00\\x01","stopRow":"\\x00\\x00MSBO_PHOENIX_COMP_ROWKEY\\x01","loadColumnFamiliesOnDemand":null,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":2097152,"maxVersions":1,"filter":"FilterList
 AND (2/2): [COLUMN_FAMILY IS NULL, ]","caching":1}

2017-04-26 18:48:10 DEBUG ParallelIterators:97 - Id: 
bd8f3eb7-9259-45bc-b488-89e510881816, Time: 1ms, Scan: 
{"timeRange":[0,1493232491022],"batch":-1,"startRow":"\\x07\\x00\\x00\\x00dPO\\x00\\x00\\x00\\x00\\x00-s\\x0C\\x04\\x00\\x00\\x00\\x07","stopRow":"\\x07\\x00\\x00\\x00dPO\\x00\\x00\\x00\\x00\\x00-s\\x0C\\x04\\x00\\x00\\x00\\x08","loadColumnFamiliesOnDemand":null,"totalColumns":2,"cacheBlocks":true,"families":{"0":["_0"],"H":["DATA"]},"maxResultSize":2097152,"maxVersions":1,"caching":1}

Thanks,
Pradheep


Could not find hash cache error/time out errors

2017-03-13 Thread Pradheep Shanmugam
Hi,

I run a Phoenix (4.4.0, thick client) left join query from JMeter.
When I run the same query from 3 to 5 parallel threads, I run into the following error:

org.apache.phoenix.exception.PhoenixIOException: 
org.apache.phoenix.exception.PhoenixIOException: 
org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache for 
joinId: �l͖LW||. The cache might have expired and have been removed.
at 
org.apache.phoenix.coprocessor.HashJoinRegionScanner.(HashJoinRegionScanner.java:96)
at 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:149)
at 
org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:177)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1318)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1748)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1712)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1313)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2259)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

When I increase the heap allocated to JMeter from 4 GB to 6 GB, I do not get this error.
Also, I have to allocate memory proportional to the number of parallel threads I run; allocating more memory in the client application seems a costly option.
When I tried increasing phoenix.coprocessor.maxServerCacheTimeToLiveMs from 30 to 120 seconds, I started getting:

org.apache.phoenix.exception.PhoenixIOException: 
org.apache.phoenix.exception.PhoenixIOException: 89367ms passed since the last 
invocation, timeout is currently set to 6

The default Phoenix client timeout is 10 minutes.

Any idea on how to resolve this?
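One option that sidesteps the server-side hash cache entirely is to force a sort-merge join, which streams both sides instead of building the RHS on the client and broadcasting it to the region servers; a sketch with placeholder tables, since the original query is not shown:

SELECT /*+ USE_SORT_MERGE_JOIN */ a.id, a.val, b.val
FROM table_a a
LEFT JOIN table_b b ON a.id = b.id;

A sort-merge join is often slower than a well-fitting hash join, but it removes both the client memory pressure and the hash-cache expiry under concurrency.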

Thanks,
Pradheep




phoenix client config and memory

2017-03-07 Thread Pradheep Shanmugam
Hi,


  1.  When using the Phoenix thick client (4.4.0) in the application, where does the client hbase-site.xml reside? I don't see one. Or does the client pull hbase-site.xml from the server before starting up? I do see the Phoenix query timeout set in the server-side hbase-site.xml, though it is a client setting.
  2.  When using the Phoenix thick client, does the client need extra memory allocated in general to run queries, and for operations like the client merge sort, with multiple such queries running at the same time?

Thanks,
Pradheep


Re: Phoenix query performance

2017-02-23 Thread Pradheep Shanmugam
Hi Arvind,

The row key is PARENTID, OWNERORGID, MILESTONETYPEID, PARENTTYPE.
Each parentid will have a list of MILESTONETYPEIDs (19661, 1, 2, etc.), so your query will return all the parentids. I am looking for parentids that do not have a given MILESTONETYPEID.
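A rewrite that keeps those anti-join semantics without the derived table, assuming Phoenix's NOT EXISTS subquery support (4.2+); a sketch:

select distinct msbo1.PARENTID
from msbo_phoenix_comp_rowkey msbo1
where msbo1.PARENTTYPE = 'SHIPMENT'
  and msbo1.OWNERORGID = 100
  and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
  and not exists (
    select 1
    from msbo_phoenix_comp_rowkey msbo2
    where msbo2.PARENTID = msbo1.PARENTID
      and msbo2.PARENTTYPE = 'SHIPMENT'
      and msbo2.OWNERORGID = 100
      and msbo2.MILESTONETYPEID = 19661
      and msbo2.PARENTREFERENCETIME between 1479964000 and 1480464000
  )
order by msbo1.PARENTID;

Phoenix should plan this as an anti-join, i.e. the same msbo2.MILESTONETYPEID IS NULL pattern but without the extra GROUP BY on the RHS.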

Thanks,
Pradheep

From: Arvind S <arvind18...@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Thursday, February 23, 2017 at 1:19 AM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Phoenix query performance

Why can't you reduce your query to:

  select msbo1.PARENTID
  from msbo_phoenix_comp_rowkey msbo1
  where msbo1.PARENTTYPE = 'SHIPMENT'
   and msbo1.OWNERORGID = 100
   and msbo1.MILESTONETYPEID != 19661
   and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
  group by msbo1.PARENTID
  order by msbo1.PARENTID

?

Looks like that's what you get from this left outer join.



Cheers !!
Arvind

On 22 February 2017 at 22:58, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:
Hi,

We have an HBase cluster with 8 region servers with 20G memory.
We have a table with 1 column family, along with a secondary index.
The following query took only a few milliseconds when we had less data (< 1 million rows).
After adding more data (~30M rows), performance declined and the query took about a minute or more (not stable).

select msbo1.PARENTID
  from msbo_phoenix_comp_rowkey msbo1
  left outer join (
 select PARENTID,MILESTONETYPEID
   from msbo_phoenix_comp_rowkey
  where PARENTREFERENCETIME between 1479964000 and 1480464000
and OWNERORGID = 100
and PARENTTYPE = 'SHIPMENT'
and MILESTONETYPEID = 19661
group by PARENTID,MILESTONETYPEID
 ) msbo2
  on msbo1.PARENTID = msbo2.PARENTID
  where msbo1.PARENTTYPE = 'SHIPMENT'
   and msbo1.OWNERORGID = 100
   and msbo2.MILESTONETYPEID is null
   and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
group by msbo1.PARENTID
  order by msbo1.PARENTID

The RHS returns about 500K rows, the LHS about 18M rows, and the final result is about 500K rows.

MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
  is the index

Query plan:
CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER 
MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
 [0,'SHIPMENT',100]
SERVER FILTER BY FIRST KEY ONLY AND
(TO_UNSIGNED_LONG("PARENTREFERENCETIME") >= 1477958400 AND
TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
CLIENT MERGE SORT
PARALLEL LEFT-JOIN TABLE 0
CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER 
MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
 [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,464,000]
SERVER FILTER BY FIRST KEY ONLY
SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID", 
"MILESTONETYPEID"]
CLIENT MERGE SORT
AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL

Attached the phoenix log.
I see the scanner caching is set to 100 and "maxResultSize":2097152; is that something that, if tuned, would help?
Is the client merge sort consuming most of the time, and can it be improved? Is there any other tuning possible?

Thanks,
Pradheep



Re: Phoenix query performance

2017-02-22 Thread Pradheep Shanmugam
Hi Maryann

Please find my answers inline.

Thanks,
Pradheep

From: Maryann Xue <maryann@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Wednesday, February 22, 2017 at 2:22 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Phoenix query performance

Hi Pradheep,

Thank you for posting the query and the log file! There are two things going on 
on the server side at the same time here. I think it'd be a good idea to 
isolate the problem first. So a few questions:
1. When you say data size went from "< 1M" to 30M, did the data from both LHS 
and RHS grow proportionately?
It is basically the same table; the query is like a self join. Yes, you can say that it is proportional.
2. If yes to (1), what if we only increase the data in LHS, but keep it small 
for RHS? Would the query run significantly faster?
When RHS count is 420336, time taken is 37 seconds
When RHS count is 63575, time taken is 32 seconds (not a significant difference)
3. What if we only do group by on LHS? Would the query time be linear to the 
data size?
After removing the GROUP BY on the RHS:
When RHS count is 420336, time taken is 34 seconds
When RHS count is 63575, time taken is 32 seconds
4. How was GC when running the query?
About 12ms on one RS, 10ms on another, 4-5ms on a couple of them, and less than 1ms on the rest of the region servers while the query is running.

Thanks,
Maryann

On Wed, Feb 22, 2017 at 9:28 AM, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:
Hi,

We have an HBase cluster with 8 region servers with 20G memory.
We have a table with 1 column family, along with a secondary index.
The following query took only a few milliseconds when we had less data (< 1 million rows).
After adding more data (~30M rows), performance declined and the query took about a minute or more (not stable).

select msbo1.PARENTID
  from msbo_phoenix_comp_rowkey msbo1
  left outer join (
 select PARENTID,MILESTONETYPEID
   from msbo_phoenix_comp_rowkey
  where PARENTREFERENCETIME between 1479964000 and 1480464000
and OWNERORGID = 100
and PARENTTYPE = 'SHIPMENT'
and MILESTONETYPEID = 19661
group by PARENTID,MILESTONETYPEID
 ) msbo2
  on msbo1.PARENTID = msbo2.PARENTID
  where msbo1.PARENTTYPE = 'SHIPMENT'
   and msbo1.OWNERORGID = 100
   and msbo2.MILESTONETYPEID is null
   and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
group by msbo1.PARENTID
  order by msbo1.PARENTID

The RHS returns about 500K rows, the LHS about 18M rows, and the final result is about 500K rows.

MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
  is the index

Query plan:
CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER 
MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
 [0,'SHIPMENT',100]
SERVER FILTER BY FIRST KEY ONLY AND 
(TO_UNSIGNED_LONG("PARENTREFERENCETIME") >= 1477958400 AND 
TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
CLIENT MERGE SORT
PARALLEL LEFT-JOIN TABLE 0
CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER 
MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
 [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,464,000]
SERVER FILTER BY FIRST KEY ONLY
SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID", 
"MILESTONETYPEID"]
CLIENT MERGE SORT
AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL

Attached the phoenix log.
I see the scanner caching is set to 100 and "maxResultSize":2097152; is that something that, if tuned, would help?
Is the client merge sort consuming most of the time, and can it be improved? Is there any other tuning possible?

Thanks,
Pradheep



Managed Splitting of phoenix salted tables

2017-01-20 Thread Pradheep Shanmugam
HI,

1. I have a couple of salted Phoenix tables. I am assuming that all the regions will grow uniformly across the region servers.
Based on that, I expect region splitting to happen across all the region servers at about the same time, which will impact my performance once the regions get big.
I am considering manual region splitting to avoid this. But given that the default split policy is IncreasingToUpperBoundRegionSplitPolicy,
I cannot really disable splitting just by increasing hbase.hregion.max.filesize to, say, 100 GB, as the effective split size is the one computed by the policy,
which will stay below 100 GB for the first several splits, so automatic splitting will continue.
Should I consider changing the policy back to ConstantSizeRegionSplitPolicy and then effectively disabling splits by setting the size to (say) 100 GB?
Is there a better solution to avoid this problem?

2. Or should I not bother with manual splitting, since the chance of all regions splitting at the same time is low?
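If per-table control is enough, the split policy and maximum file size are HBase table descriptor attributes, and Phoenix DDL generally passes table-level HBase properties through; a sketch (MYTABLE is hypothetical; verify the property pass-through on your Phoenix version, or set the same attributes from the HBase shell):

ALTER TABLE MYTABLE SET SPLIT_POLICY = 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
-- 100 GB, effectively disabling size-based splits for this table
ALTER TABLE MYTABLE SET MAX_FILESIZE = 107374182400;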

Thanks,
Pradheep


Phoenix tracing did not start

2017-01-18 Thread Pradheep Shanmugam
Hi,

I am using HDP 2.4.3.0-227. I am trying to enable Phoenix tracing to monitor queries and analyze performance. I followed the steps outlined here:
https://phoenix.apache.org/tracing.html
I placed hadoop-metrics2-hbase.properties in /etc/hbase/conf and
hadoop-metrics2-phoenix.properties in /usr/hdp/2.4.3.0-227/phoenix/bin on all region servers.

I also added the following properties to hbase-site.xml:
phoenix.trace.statsTableName = SYSTEM.TRACING_STATS
phoenix.trace.frequency = always

I am not clear where to run the DDL for SYSTEM.TRACING_STATS. Also, I could not find ./bin/traceserver.py to start.
Please advise.

Thanks,
Pradheep


Salting a secondary index

2016-11-23 Thread Pradheep Shanmugam
Hi,

I have an HBase table created using Phoenix which is salted.
Since the queries on the table required a secondary index, I created the index using Phoenix.
Can this index also be salted, so that its rows are placed randomly across different region servers (RSs)?
Even if the index is not salted, will the index be useful when we salt the actual table?
Please advise.
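For reference, a sketch of the DDL in question (MY_TABLE, MY_IDX, and SOME_COL are hypothetical); CREATE INDEX accepts the same SALT_BUCKETS option as CREATE TABLE for global indexes:

CREATE INDEX MY_IDX ON MY_TABLE (SOME_COL) SALT_BUCKETS = 10;

As far as I know, a global index has its own row key and its own regions, so salting the data table alone does not distribute the index writes; if SALT_BUCKETS is not given, the index of a salted table is typically salted to match the table.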

Thanks,
Pradheep


Re: Using Apache perf with Hbase 1.1

2016-10-18 Thread Pradheep Shanmugam
Hi,

All the regions are online acoording to 16010/master-status web UI.
Phoenix is not able connect properly due to error "Caused by: 
java.io.IOException: hconnection-0xa59a583 closed"

Thanks,
Pradheep

From: Mujtaba Chohan <mujt...@apache.org>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Tuesday, October 18, 2016 at 5:36 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Using Apache perf with Hbase 1.1

Cannot get all table regions

Check that there are no offline regions. See the related thread here:
http://apache-phoenix-user-list.1124778.n5.nabble.com/Help-w-table-that-suddenly-keeps-timing-out-tt2393.html#a2404

On Tue, Oct 18, 2016 at 2:11 PM, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:
Hi,

I am trying to connect Pherf to an HBase cluster running HBase 1.1 using the command below.
Could you please help me connect Pherf to an HBase cluster running v1.1?

java -Xms512m -Xmx3072m  -cp 
"/home/ambari/pherf/phoenix/bin/../phoenix-pherf/config:/etc/hbase/conf:/home/ambari/pherf/phoenix/bin/../phoenix-client/target/phoenix-server-client-4.7.0-HBase-1.1.jar:/home/ambari/pherf/phoenix/bin/../phoenix-pherf/target/phoenix-pherf-4.8.1-HBase-1.1.jar"
 -Dlog4j.configuration=file:/home/ambari/pherf/phoenix/bin/log4j.properties 
org.apache.phoenix.pherf.Pherf -drop all -l -q -z hbase-perf-rs1 -schemaFile 
'.*user_defined_schema.sql' -scenarioFile '.*user_defined_scenario.xml'
And I get the below exception:

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/commons/cli/ParseException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2615)
at java.lang.Class.getMethod0(Class.java:2856)
at java.lang.Class.getMethod(Class.java:1668)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: 
org.apache.commons.cli.ParseException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 6 more

When I tried to connect using the command below,
java -Xms512m -Xmx3072m  -cp 
"/home/ambari/pherf/phoenix/bin/../phoenix-pherf/config:/etc/hbase/conf:/home/ambari/pherf/phoenix/bin/../phoenix-client/target/phoenix-4.9.0-HBase-1.2-SNAPSHOT-client.jar:/home/ambari/pherf/phoenix/bin/../phoenix-pherf/target/phoenix-pherf-4.9.0-HBase-1.2-SNAPSHOT-minimal.jar"
 -Dlog4j.configuration=file:/home/ambari/pherf/phoenix/bin/log4j.properties 
org.apache.phoenix.pherf.Pherf -drop all -l -q -z hbase-perf-rs1 -schemaFile 
'.*user_defined_schema.sql' -scenarioFile '.*user_defined_scenario.xml'

I got the below error. I thought it could be because of connecting to HBase 1.1 with a 1.2 client?

java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.
at 
org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:457)
at 
org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:549)
at 
org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:542)
at 
org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:477)
at 
org.apache.phoenix.iterate.BaseResultIterators.(BaseResultIterators.java:370)
at 
org.apache.phoenix.iterate.ParallelIterators.(ParallelIterators.java:60)
at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:218)
at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:341)
at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:206)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:290)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:270)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:269)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:1476)
at 
org.apache.phoenix.jdbc.PhoenixDatabaseMetaData.getTables(PhoenixDatabaseMetaData.java:1149)
at 
org.apache.phoenix.pherf.util.PhoenixUtil.getTableMetaData(PhoenixUtil.java:220)
at org.apache.phoenix.pherf.util.PhoenixUtil.d

Using Apache perf with Hbase 1.1

2016-10-18 Thread Pradheep Shanmugam
Hi,

I am trying to connect Pherf to an HBase cluster running HBase 1.1 using the command below.
Could you please help me connect Pherf to an HBase cluster running v1.1?

java -Xms512m -Xmx3072m  -cp 
"/home/ambari/pherf/phoenix/bin/../phoenix-pherf/config:/etc/hbase/conf:/home/ambari/pherf/phoenix/bin/../phoenix-client/target/phoenix-server-client-4.7.0-HBase-1.1.jar:/home/ambari/pherf/phoenix/bin/../phoenix-pherf/target/phoenix-pherf-4.8.1-HBase-1.1.jar"
 -Dlog4j.configuration=file:/home/ambari/pherf/phoenix/bin/log4j.properties 
org.apache.phoenix.pherf.Pherf -drop all -l -q -z hbase-perf-rs1 -schemaFile 
'.*user_defined_schema.sql' -scenarioFile '.*user_defined_scenario.xml'
And I get the below exception:

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/commons/cli/ParseException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2615)
at java.lang.Class.getMethod0(Class.java:2856)
at java.lang.Class.getMethod(Class.java:1668)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: 
org.apache.commons.cli.ParseException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 6 more

When I tried to connect using the command below,
java -Xms512m -Xmx3072m  -cp 
"/home/ambari/pherf/phoenix/bin/../phoenix-pherf/config:/etc/hbase/conf:/home/ambari/pherf/phoenix/bin/../phoenix-client/target/phoenix-4.9.0-HBase-1.2-SNAPSHOT-client.jar:/home/ambari/pherf/phoenix/bin/../phoenix-pherf/target/phoenix-pherf-4.9.0-HBase-1.2-SNAPSHOT-minimal.jar"
 -Dlog4j.configuration=file:/home/ambari/pherf/phoenix/bin/log4j.properties 
org.apache.phoenix.pherf.Pherf -drop all -l -q -z hbase-perf-rs1 -schemaFile 
'.*user_defined_schema.sql' -scenarioFile '.*user_defined_scenario.xml'

I got the below error. I thought it could be because of connecting to HBase 1.1 with a 1.2 client?

java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.
at 
org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:457)
at 
org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:549)
at 
org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:542)
at 
org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:477)
at 
org.apache.phoenix.iterate.BaseResultIterators.(BaseResultIterators.java:370)
at 
org.apache.phoenix.iterate.ParallelIterators.(ParallelIterators.java:60)
at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:218)
at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:341)
at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:206)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:290)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:270)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:269)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:1476)
at 
org.apache.phoenix.jdbc.PhoenixDatabaseMetaData.getTables(PhoenixDatabaseMetaData.java:1149)
at 
org.apache.phoenix.pherf.util.PhoenixUtil.getTableMetaData(PhoenixUtil.java:220)
at org.apache.phoenix.pherf.util.PhoenixUtil.deleteTables(PhoenixUtil.java:192)
at org.apache.phoenix.pherf.Pherf.run(Pherf.java:234)
at org.apache.phoenix.pherf.Pherf.main(Pherf.java:188)
Caused by: java.io.IOException: hconnection-0xa59a583 closed
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(ConnectionManager.java:1685)
at 
org.apache.hadoop.hbase.client.ZooKeeperRegistry.isTableOnlineState(ZooKeeperRegistry.java:122)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isTableDisabled(ConnectionManager.java:979)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1148)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1136)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:957)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:535)