Re: Hash aggregation

2018-05-22 Thread Maryann Xue
Since the performance of running a group-by aggregation on the client side is
most likely poor, it's usually not desirable. The original implementation was
for functional completeness only, so it took the easiest route and reused some
existing classes. In some cases, though, the client-side group-by can still be
tolerable if there aren't many distinct keys. So yes, please open a JIRA for
implementing hash aggregation on the client side. Thank you!


Thanks,
Maryann
On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gsang...@23andme.com>
wrote:

> Hello,
>
> Any guidance or thoughts on the thread below?
>
> Thanks,
> Gerald
>
>
> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gsang...@23andme.com>
> wrote:
>
>> Maryann,
>>
>> Can Phoenix provide hash aggregation on the client side? Are there design
>> / implementation reasons not to, or should I file a ticket for this?
>>
>> Thanks,
>> Gerald
>>
>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <maryann@gmail.com>
>> wrote:
>>
>>> Hi Gerald,
>>>
>>> Phoenix does have hash aggregation. The reason why sort-based
>>> aggregation is used in your query plan is that the aggregation happens on
>>> the client side. And that is because sort-merge join is used (as hinted)
>>> which is a client driven join, and after that join stage all operations can
>>> only be on the client-side.
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gsang...@23andme.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
>>>> should I file a ticket? We have aggregation queries that do not require
>>>> sorted results.
>>>>
>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>
>>>> CREATE TABLE unsalted (
>>>>     keyA BIGINT NOT NULL,
>>>>     keyB BIGINT NOT NULL,
>>>>     val SMALLINT,
>>>>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>>>> );
>>>>
>>>> EXPLAIN
>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c
>>>> FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA)
>>>> GROUP BY t1.val, t2.val;
>>>>
>>>> +----------------------------------------------------------+----------------+---------------+
>>>> |                           PLAN                           | EST_BYTES_READ | EST_ROWS_READ |
>>>> +----------------------------------------------------------+----------------+---------------+
>>>> | SORT-MERGE-JOIN (INNER) TABLES                           | null           | null          |
>>>> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED    | null           | null          |
>>>> | AND                                                      | null           | null          |
>>>> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED    | null           | null          |
>>>> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]            | null           | null          |
>>>> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]  | null           | null          |
>>>> +----------------------------------------------------------+----------------+---------------+
>>>> Thanks,
>>>> Gerald
>>>>
>>>
>>>
>>
>


Re: Hash aggregation

2018-05-18 Thread Maryann Xue
Hi Gerald,

Phoenix does have hash aggregation. The reason sort-based aggregation is used
in your query plan is that the aggregation happens on the client side. That,
in turn, is because a sort-merge join is used (as hinted), which is a
client-driven join, and after that join stage all operations can only run on
the client side.
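
For comparison, a sketch of the un-hinted form (assuming the same UNSALTED
table from your example): without the USE_SORT_MERGE_JOIN hint, Phoenix should
be free to choose its default hash join, which lets the GROUP BY aggregation
run on the server side:

EXPLAIN
SELECT t1.val v1, t2.val v2, COUNT(*) c
FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA)
GROUP BY t1.val, t2.val;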


Thanks,
Maryann

On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi 
wrote:

> Hello,
>
> Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
> should I file a ticket? We have aggregation queries that do not require
> sorted results.
>
> For example, this EXPLAIN plan shows a CLIENT SORT.
>
> CREATE TABLE unsalted (
>     keyA BIGINT NOT NULL,
>     keyB BIGINT NOT NULL,
>     val SMALLINT,
>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
> );
>
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c
> FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA)
> GROUP BY t1.val, t2.val;
>
> +----------------------------------------------------------+----------------+---------------+
> |                           PLAN                           | EST_BYTES_READ | EST_ROWS_READ |
> +----------------------------------------------------------+----------------+---------------+
> | SORT-MERGE-JOIN (INNER) TABLES                           | null           | null          |
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED    | null           | null          |
> | AND                                                      | null           | null          |
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED    | null           | null          |
> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]            | null           | null          |
> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]  | null           | null          |
> +----------------------------------------------------------+----------------+---------------+
> Thanks,
> Gerald
>


Re: SORT_MERGE_JOIN on non-leading key: server-side sorting

2018-05-17 Thread Maryann Xue
E) l\n" +
"JOIN " + peopleTable + " ds ON ds.PERSON_ID = l.LOCALID";
String query3 = "SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)\n" +
"FROM " + myTable + " t1 JOIN " + myTable + " t2\n" +
"ON t1.DSID = t2.DSID\n" +
"WHERE t1.LOCALID = 'X001' AND t2.LOCALID = 'X002'";

        // For each query: check the EXPLAIN plan first, then actually run the query.
        for (String q : new String[]{query1, query2, query3}) {
            ResultSet rs = conn.createStatement().executeQuery("explain " + q);
            String plan = QueryUtil.getExplainPlan(rs);
            // Both join inputs are already ordered on their primary keys,
            // so the plan should not contain a server-side re-sort.
            assertFalse("Tables should not be sorted over their PKs:\n" + plan,
                    plan.contains("SERVER SORTED BY"));

            rs = conn.createStatement().executeQuery(q);
            assertTrue(rs.next());
            //assertEquals(2, rs.getInt(1));
            assertFalse(rs.next());
        }
    }



On Thu, May 17, 2018 at 3:48 PM, Gerald Sangudi <gsang...@23andme.com>
wrote:

> Hi Maryann,
>
> Would you mind sharing the EXPLAIN output you are getting?
>
> I'm not able to find a download for phoenix-4.13.2-HBase or 4.14.0-HBase.
> The *-cdh downloads do not work with Amazon EMR. I tried building against
> the 4.14.0-rc0 tag. This produced 4.14.0 for phoenix-core.jar, but all the
> other jars produced are 4.13.1, including the client jar. When I deploy
> these jars on EMR and run queries via sqlline, I get a NoClassDefFoundError.
>
> Are there plans to provide 4.13.2-HBase as a public download?
>
> Thanks,
> Gerald
>
> On Wed, May 16, 2018 at 10:40 AM, Maryann Xue <maryann@gmail.com>
> wrote:
>
>> Hi Gerald,
>>
>> I checked again. Unfortunately this fix is included with 4.13.2 but not
>> 4.13.1. Would you mind upgrading your library to 4.13.2?
>>
>> Thanks,
>> Maryann
>>
>> On Wed, May 16, 2018 at 9:41 AM, Maryann Xue <maryann@gmail.com>
>> wrote:
>>
>>> Sorry for the late response. Yes, sure, I will try it right away.
>>>
>>> On Wed, May 16, 2018 at 9:40 AM Gerald Sangudi <gsang...@23andme.com>
>>> wrote:
>>>
>>>> Hi Maryann,
>>>>
>>>> Following up again -- I ran EXPLAIN on 4.13.1 and saw the same
>>>> behavior, a sort on the RHS. Is it possible for you to try it on 4.13.1?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Fri, May 11, 2018 at 10:52 AM, Gerald Sangudi <gsang...@23andme.com>
>>>> wrote:
>>>>
>>>>> Hi Maryann,
>>>>>
>>>>> Thanks for verifying against latest. However, I did not detect the fix
>>>>> in Phoenix 4.13.1. AWS EMR currently provides Phoenix 4.13.0. I manually
>>>>> upgraded an AWS EMR cluster to Phoenix 4.13.1 and rebooted the cluster. 
>>>>> The
>>>>> EXPLAIN plan still shows the same issue.
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>> On Wed, May 9, 2018 at 11:51 AM, Maryann Xue <maryann@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Gerald,
>>>>>>
>>>>>> I have verified against latest Phoenix code that this problem has
>>>>>> been fixed. I have also checked Phoenix 4.13 release tags. Looks like all
>>>>>> versions of 4.13 packages now include that fix. Would you mind getting 
>>>>>> the
>>>>>> latest Phoenix-4.13 package and testing it again? Thank you!
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>> On Tue, May 8, 2018 at 12:24 PM, Gerald Sangudi <gsang...@23andme.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm running Phoenix 4.13 on AWS EMR and getting the following
>>>>>>> EXPLAIN plan:
>>>>>>>
>>>>>>> Table:
>>>>>>>
>>>>>>> CREATE TABLE salted (
>>>>>>>     keyA BIGINT NOT NULL,
>>>>>>>     keyB BIGINT NOT NULL,
>>>>>>>     val SMALLINT,
>>>>>>>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>>>>>>> ) SALT_BUCKETS = 64;
>>>>>>> EXPLAIN:
>>>>>>>
>>>>>>> EXPLAIN
>>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) c
>>>>>>> FROM salted t1 JOIN salted t2 ON (t1.keyB = t2.keyB)
>>>>>>> WHERE t1.keyA = 10 AND t2.keyA = 20;
>>>>>>>
>>>>>>> *+--+-+-+|
>>>>>>>   

Re: SORT_MERGE_JOIN on non-leading key: server-side sorting

2018-05-16 Thread Maryann Xue
Hi Gerald,

I checked again. Unfortunately this fix is included with 4.13.2 but not
4.13.1. Would you mind upgrading your library to 4.13.2?

Thanks,
Maryann

On Wed, May 16, 2018 at 9:41 AM, Maryann Xue <maryann@gmail.com> wrote:

> Sorry for the late response. Yes, sure, I will try it right away.
>
> On Wed, May 16, 2018 at 9:40 AM Gerald Sangudi <gsang...@23andme.com>
> wrote:
>
>> Hi Maryann,
>>
>> Following up again -- I ran EXPLAIN on 4.13.1 and saw the same behavior,
>> a sort on the RHS. Is it possible for you to try it on 4.13.1?
>>
>> Thanks,
>> Gerald
>>
>> On Fri, May 11, 2018 at 10:52 AM, Gerald Sangudi <gsang...@23andme.com>
>> wrote:
>>
>>> Hi Maryann,
>>>
>>> Thanks for verifying against latest. However, I did not detect the fix
>>> in Phoenix 4.13.1. AWS EMR currently provides Phoenix 4.13.0. I manually
>>> upgraded an AWS EMR cluster to Phoenix 4.13.1 and rebooted the cluster. The
>>> EXPLAIN plan still shows the same issue.
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Wed, May 9, 2018 at 11:51 AM, Maryann Xue <maryann@gmail.com>
>>> wrote:
>>>
>>>> Hi Gerald,
>>>>
>>>> I have verified against latest Phoenix code that this problem has been
>>>> fixed. I have also checked Phoenix 4.13 release tags. Looks like all
>>>> versions of 4.13 packages now include that fix. Would you mind getting the
>>>> latest Phoenix-4.13 package and testing it again? Thank you!
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Tue, May 8, 2018 at 12:24 PM, Gerald Sangudi <gsang...@23andme.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm running Phoenix 4.13 on AWS EMR and getting the following EXPLAIN
>>>>> plan:
>>>>>
>>>>> Table:
>>>>>
>>>>> CREATE TABLE salted (
>>>>>     keyA BIGINT NOT NULL,
>>>>>     keyB BIGINT NOT NULL,
>>>>>     val SMALLINT,
>>>>>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>>>>> ) SALT_BUCKETS = 64;
>>>>>
>>>>> EXPLAIN:
>>>>>
>>>>> EXPLAIN
>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) c
>>>>> FROM salted t1 JOIN salted t2 ON (t1.keyB = t2.keyB)
>>>>> WHERE t1.keyA = 10 AND t2.keyA = 20;
>>>>>
>>>>> +--------------------------------------------------------------------------+----------------+
>>>>> |                                   PLAN                                   | EST_BYTES_READ |
>>>>> +--------------------------------------------------------------------------+----------------+
>>>>> | SORT-MERGE-JOIN (INNER) TABLES                                           | null           |
>>>>> | CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,10] - [63,10]  | null           |
>>>>> | SERVER FILTER BY FIRST KEY ONLY                                          | null           |
>>>>> | CLIENT MERGE SORT                                                        | null           |
>>>>> | AND (SKIP MERGE)                                                         | null           |
>>>>> | CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,20] - [63,20]  | null           |
>>>>> | SERVER FILTER BY FIRST KEY ONLY                                          | null           |
>>>>> | SERVER SORTED BY [T2.KEYB]                                               | null           |
>>>>> | CLIENT MERGE SORT                                                        | null           |
>>>>> | CLIENT AGGREGATE INTO SINGLE ROW                                         | null           |
>>>>> +--------------------------------------------------------------------------+----------------+
>>>>>
>>>>> In the EXPLAIN output, I see "SERVER SORTED BY [T2.KEYB]". Is this
>>>>> sort necessary? For both JOIN terms T1 and T2, the value of keyA, the
>>>>> leading part of the primary key, is fixed. Furthermore, there is no
>>>>> corresponding sort of T1.KEYB.
>>>>>
>>>>> When I EXPLAIN the same query on a non-salted table, neither T1.KEYB
>>>>> nor T2.KEYB is re-sorted. I'm hoping the sort is unnecessary. If so, is
>>>>> there an open ticket? I would be happy to file a ticket and to contribute
>>>>> to a fix. I would appreciate any guidance.
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>>
>>>>
>>>
>>


Re: SORT_MERGE_JOIN on non-leading key: server-side sorting

2018-05-16 Thread Maryann Xue
Sorry for the late response. Yes, sure, I will try it right away.
On Wed, May 16, 2018 at 9:40 AM Gerald Sangudi <gsang...@23andme.com> wrote:

> Hi Maryann,
>
> Following up again -- I ran EXPLAIN on 4.13.1 and saw the same behavior, a
> sort on the RHS. Is it possible for you to try it on 4.13.1?
>
> Thanks,
> Gerald
>
> On Fri, May 11, 2018 at 10:52 AM, Gerald Sangudi <gsang...@23andme.com>
> wrote:
>
>> Hi Maryann,
>>
>> Thanks for verifying against latest. However, I did not detect the fix in
>> Phoenix 4.13.1. AWS EMR currently provides Phoenix 4.13.0. I manually
>> upgraded an AWS EMR cluster to Phoenix 4.13.1 and rebooted the cluster. The
>> EXPLAIN plan still shows the same issue.
>>
>> Thanks,
>> Gerald
>>
>> On Wed, May 9, 2018 at 11:51 AM, Maryann Xue <maryann@gmail.com>
>> wrote:
>>
>>> Hi Gerald,
>>>
>>> I have verified against latest Phoenix code that this problem has been
>>> fixed. I have also checked Phoenix 4.13 release tags. Looks like all
>>> versions of 4.13 packages now include that fix. Would you mind getting the
>>> latest Phoenix-4.13 package and testing it again? Thank you!
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Tue, May 8, 2018 at 12:24 PM, Gerald Sangudi <gsang...@23andme.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm running Phoenix 4.13 on AWS EMR and getting the following EXPLAIN
>>>> plan:
>>>>
>>>> Table:
>>>>
>>>> CREATE TABLE salted (
>>>>     keyA BIGINT NOT NULL,
>>>>     keyB BIGINT NOT NULL,
>>>>     val SMALLINT,
>>>>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>>>> ) SALT_BUCKETS = 64;
>>>> EXPLAIN:
>>>>
>>>> EXPLAIN
>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) c
>>>> FROM salted t1 JOIN salted t2 ON (t1.keyB = t2.keyB)
>>>> WHERE t1.keyA = 10 AND t2.keyA = 20;
>>>>
>>>> +--------------------------------------------------------------------------+----------------+
>>>> |                                   PLAN                                   | EST_BYTES_READ |
>>>> +--------------------------------------------------------------------------+----------------+
>>>> | SORT-MERGE-JOIN (INNER) TABLES                                           | null           |
>>>> | CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,10] - [63,10]  | null           |
>>>> | SERVER FILTER BY FIRST KEY ONLY                                          | null           |
>>>> | CLIENT MERGE SORT                                                        | null           |
>>>> | AND (SKIP MERGE)                                                         | null           |
>>>> | CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,20] - [63,20]  | null           |
>>>> | SERVER FILTER BY FIRST KEY ONLY                                          | null           |
>>>> | SERVER SORTED BY [T2.KEYB]                                               | null           |
>>>> | CLIENT MERGE SORT                                                        | null           |
>>>> | CLIENT AGGREGATE INTO SINGLE ROW                                         | null           |
>>>> +--------------------------------------------------------------------------+----------------+
>>>>
>>>> In the EXPLAIN output, I see "SERVER SORTED BY [T2.KEYB]". Is this sort
>>>> necessary? For both JOIN terms T1 and T2, the value of keyA, the leading
>>>> part of the primary key, is fixed. Furthermore, there is no corresponding
>>>> sort of T1.KEYB.
>>>>
>>>> When I EXPLAIN the same query on a non-salted table, neither T1.KEYB
>>>> nor T2.KEYB is re-sorted. I'm hoping the sort is unnecessary. If so, is
>>>> there an open ticket? I would be happy to file a ticket and to contribute
>>>> to a fix. I would appreciate any guidance.
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>>
>>>
>>
>


Re: SORT_MERGE_JOIN on non-leading key: server-side sorting

2018-05-09 Thread Maryann Xue
Hi Gerald,

I have verified against latest Phoenix code that this problem has been
fixed. I have also checked Phoenix 4.13 release tags. Looks like all
versions of 4.13 packages now include that fix. Would you mind getting the
latest Phoenix-4.13 package and testing it again? Thank you!


Thanks,
Maryann

On Tue, May 8, 2018 at 12:24 PM, Gerald Sangudi 
wrote:

> Hello,
>
> I'm running Phoenix 4.13 on AWS EMR and getting the following EXPLAIN plan:
>
> Table:
>
> CREATE TABLE salted (
>     keyA BIGINT NOT NULL,
>     keyB BIGINT NOT NULL,
>     val SMALLINT,
>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
> ) SALT_BUCKETS = 64;
> EXPLAIN:
>
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) c
> FROM salted t1 JOIN salted t2 ON (t1.keyB = t2.keyB)
> WHERE t1.keyA = 10 AND t2.keyA = 20;
>
> +--------------------------------------------------------------------------+----------------+
> |                                   PLAN                                   | EST_BYTES_READ |
> +--------------------------------------------------------------------------+----------------+
> | SORT-MERGE-JOIN (INNER) TABLES                                           | null           |
> | CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,10] - [63,10]  | null           |
> | SERVER FILTER BY FIRST KEY ONLY                                          | null           |
> | CLIENT MERGE SORT                                                        | null           |
> | AND (SKIP MERGE)                                                         | null           |
> | CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,20] - [63,20]  | null           |
> | SERVER FILTER BY FIRST KEY ONLY                                          | null           |
> | SERVER SORTED BY [T2.KEYB]                                               | null           |
> | CLIENT MERGE SORT                                                        | null           |
> | CLIENT AGGREGATE INTO SINGLE ROW                                         | null           |
> +--------------------------------------------------------------------------+----------------+
>
> In the EXPLAIN output, I see "SERVER SORTED BY [T2.KEYB]". Is this sort
> necessary? For both JOIN terms T1 and T2, the value of keyA, the leading
> part of the primary key, is fixed. Furthermore, there is no corresponding
> sort of T1.KEYB.
>
> When I EXPLAIN the same query on a non-salted table, neither T1.KEYB nor
> T2.KEYB is re-sorted. I'm hoping the sort is unnecessary. If so, is there
> an open ticket? I would be happy to file a ticket and to contribute to a
> fix. I would appreciate any guidance.
>
> Thanks,
> Gerald
>
>


Re: SORT_MERGE_JOIN on non-leading key: server-side sorting

2018-05-08 Thread Maryann Xue
Hi Gerald,

Thank you for finding this issue! I think it is similar to PHOENIX-4508.
I'll verify your case on the latest Phoenix branch and see if it has been
fixed.


Thanks,
Maryann

On Tue, May 8, 2018 at 12:24 PM, Gerald Sangudi 
wrote:

> Hello,
>
> I'm running Phoenix 4.13 on AWS EMR and getting the following EXPLAIN plan:
>
> Table:
>
> CREATE TABLE salted (
>     keyA BIGINT NOT NULL,
>     keyB BIGINT NOT NULL,
>     val SMALLINT,
>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
> ) SALT_BUCKETS = 64;
> EXPLAIN:
>
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) c
> FROM salted t1 JOIN salted t2 ON (t1.keyB = t2.keyB)
> WHERE t1.keyA = 10 AND t2.keyA = 20;
>
> +--------------------------------------------------------------------------+----------------+
> |                                   PLAN                                   | EST_BYTES_READ |
> +--------------------------------------------------------------------------+----------------+
> | SORT-MERGE-JOIN (INNER) TABLES                                           | null           |
> | CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,10] - [63,10]  | null           |
> | SERVER FILTER BY FIRST KEY ONLY                                          | null           |
> | CLIENT MERGE SORT                                                        | null           |
> | AND (SKIP MERGE)                                                         | null           |
> | CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,20] - [63,20]  | null           |
> | SERVER FILTER BY FIRST KEY ONLY                                          | null           |
> | SERVER SORTED BY [T2.KEYB]                                               | null           |
> | CLIENT MERGE SORT                                                        | null           |
> | CLIENT AGGREGATE INTO SINGLE ROW                                         | null           |
> +--------------------------------------------------------------------------+----------------+
>
> In the EXPLAIN output, I see "SERVER SORTED BY [T2.KEYB]". Is this sort
> necessary? For both JOIN terms T1 and T2, the value of keyA, the leading
> part of the primary key, is fixed. Furthermore, there is no corresponding
> sort of T1.KEYB.
>
> When I EXPLAIN the same query on a non-salted table, neither T1.KEYB nor
> T2.KEYB is re-sorted. I'm hoping the sort is unnecessary. If so, is there
> an open ticket? I would be happy to file a ticket and to contribute to a
> fix. I would appreciate any guidance.
>
> Thanks,
> Gerald
>
>


Re: SELECT + ORDER BY vs self-join

2017-10-31 Thread Maryann Xue
I suspect this problem is similar to PHOENIX-4288.
On Mon, Oct 30, 2017 at 11:26 PM James Taylor 
wrote:

> Please file a JIRA and include the explain plan for each of the queries. I
> suspect your index is not being used in the first query due to the
> selection of all the columns. You can try hinting the query to force your
> index to be used. See
> https://phoenix.apache.org/secondary_indexing.html#Index_Usage
>
> Thanks,
> James
>
> On Mon, Oct 30, 2017 at 7:02 AM, Marcin Januszkiewicz <
> januszkiewicz.mar...@gmail.com> wrote:
>
>> We have a wide table with 100M records created with the following DDL:
>>
>> CREATE TABLE traces (
>>   rowkey VARCHAR PRIMARY KEY,
>>   time VARCHAR,
>>   number VARCHAR,
>>   +40 more columns)
>>
>> We want to select a large (~30M records) subset of this data with the
>> query:
>>
>> SELECT *all columns*
>>   FROM traces
>>   WHERE (UPPER(number) LIKE 'PO %')
>>   ORDER BY time DESC, ROWKEY
>>   LIMIT 101;
>>
>> This times out after 15 minutes and puts a huge load on our cluster.
>> We have an alternate way of selecting this data:
>>
>> SELECT t.rowkey, *all columns*
>> FROM TRACES t
>> JOIN (
>>   SELECT rowkey
>>   FROM TRACES
>>   WHERE (UPPER(number) LIKE 'PO %')
>>   ORDER BY time DESC, ROWKEY
>>   LIMIT 101
>> ) ix
>> ON t.ROWKEY = ix.ROWKEY
>> order by t.ROWKEY;
>>
>> Which completes in just under a minute.
>> Is there a better way to construct this query?
>> When is using the self-join a worse choice than the simple select?
>> Given that we have a functional index on UPPER(number), could this
>> potentially be a statistics-based optimizer decision?
>>
>> --
>> Pozdrawiam,
>> Marcin Januszkiewicz
>>
>
>
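
For reference, the hinted form James suggests would look roughly like the
following; TRACES_UPPER_NUMBER_IDX is a made-up placeholder for whatever the
functional index on UPPER(number) is actually called:

SELECT /*+ INDEX(traces TRACES_UPPER_NUMBER_IDX) */ *all columns*
  FROM traces
  WHERE (UPPER(number) LIKE 'PO %')
  ORDER BY time DESC, ROWKEY
  LIMIT 101;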


Re: Phoenix query performance

2017-02-22 Thread Maryann Xue
Hi Pradheep,

Thank you for the answers! Please see my response inline.


On Wed, Feb 22, 2017 at 12:39 PM, Pradheep Shanmugam <
pradheep.shanmu...@infor.com> wrote:

> Hi Maryann
>
> Please find my answers inline.
>
> Thanks,
> Pradheep
>
> From: Maryann Xue <maryann@gmail.com>
> Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> Date: Wednesday, February 22, 2017 at 2:22 PM
> To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> Subject: Re: Phoenix query performance
>
> Hi Pradheep,
>
> Thank you for posting the query and the log file! There are two things
> going on on the server side at the same time here. I think it'd be a good
> idea to isolate the problem first. So a few questions:
> 1. When you say data size went from "< 1M" to 30M, did the data from both
> LHS and RHS grow proportionately?
> *It is basically the same table..the query is like a self join..yes, you
> can say that it is proportional.*
>

Sorry that I didn't read the query well enough. Just went through it again,
but it looks to me that instead of a LEFT OUTER join, you are actually
trying to do an ANTI join, is that correct? i.e.,
SELECT PARENTID
FROM msbo_phoenix_comp_rowkey msbo1
WHERE  AND NOT EXISTS (
SELECT 1 FROM msbo_phoenix_comp_rowkey
WHERE  AND PARENTID = msbo1. PARENTID)
If the query can be rewritten to an ANTI join, the join operation can be
more efficient.
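
Filling in the filters from the original query, a complete sketch of that ANTI
join rewrite (for illustration only, not tested against your schema) would be:

SELECT msbo1.PARENTID
  FROM msbo_phoenix_comp_rowkey msbo1
 WHERE msbo1.PARENTTYPE = 'SHIPMENT'
   AND msbo1.OWNERORGID = 100
   AND msbo1.PARENTREFERENCETIME BETWEEN 1479964000 AND 1480464000
   AND NOT EXISTS (
         SELECT 1
           FROM msbo_phoenix_comp_rowkey
          WHERE PARENTTYPE = 'SHIPMENT'
            AND OWNERORGID = 100
            AND MILESTONETYPEID = 19661
            AND PARENTREFERENCETIME BETWEEN 1479964000 AND 1480464000
            AND PARENTID = msbo1.PARENTID)
 GROUP BY msbo1.PARENTID
 ORDER BY msbo1.PARENTID;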


> 2. If yes to (1), what if we only increase the data in LHS, but keep it
> small for RHS? Would the query run significantly faster?
> *When RHS count is 420336, time taken is 37 seconds*
> *When RHS count is 63575, time taken is 32 seconds (not a significant
> difference)*
> 3. What if we only do group by on LHS? Would the query time be linear to
> the data size?
> After Removing group by on RHS
> *When RHS count is 420336, time taken is 34 seconds*
> *When RHS count is 63575, time taken is 32 seconds*
>

Just to confirm, are you saying that you removed GROUP BY and kept the
JOIN, and it's taking a long time? One more question, how long would it
take to further remove the JOIN?


> 4. How was GC when running the query?
> *About 12ms in 1 RS, 10ms in 1 RS, 4-5ms in couple of them and less than
> 1ms in the rest of the region servers when the query is runnning.*
>
> Thanks,
> Maryann
>
> On Wed, Feb 22, 2017 at 9:28 AM, Pradheep Shanmugam <
> pradheep.shanmu...@infor.com> wrote:
>
>> Hi,
>>
>> We have a hbase cluster with 8 region servers with 20G memory
>> We have a table  with 1 column family along with a secondary index.
>> Following query took only few milliseconds when we had less data(< 1
>> million)
>> After adding more data(~30M rows) the performance declined and took about
>> a minute or more(not stable)
>>
>> select msbo1.PARENTID
>>   from msbo_phoenix_comp_rowkey msbo1
>>   left outer join (
>>  select PARENTID,MILESTONETYPEID
>>from msbo_phoenix_comp_rowkey
>>   where PARENTREFERENCETIME between 1479964000 and 1480464000
>> and OWNERORGID = 100
>> and PARENTTYPE = 'SHIPMENT'
>> and MILESTONETYPEID = 19661
>> group by PARENTID,MILESTONETYPEID
>>  ) msbo2
>>   on msbo1.PARENTID = msbo2.PARENTID
>>   where msbo1.PARENTTYPE = 'SHIPMENT'
>>and msbo1.OWNERORGID = 100
>>and msbo2.MILESTONETYPEID is null
>>and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
>> group by msbo1.PARENTID
>>   order by msbo1.PARENTID
>>
>> The RHS return about a 500K rows ..LHS about 18M rows…final result about
>> 500K rows
>>
>> MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
>>  is the index
>>
>> *Query plan:*
>> CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER
>> MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
>> [0,'SHIPMENT',100]
>> SERVER FILTER BY FIRST KEY ONLY AND 
>> (TO_UNSIGNED_LONG("PARENTREFERENCETIME")
>> >= 1477958400 AND TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
>> SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
>> CLIENT MERGE SORT
>> PARALLEL LEFT-JOIN TABLE 0
>> CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER
>> MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
>> [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,
>> 464,000]
>> SERVER FILTER BY FIRST KEY ONLY
>> SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID",
>> "MILESTONETYPEID"]
>> CLIENT MERGE SORT
>> AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL
>>
>> Attached the phoenix log.
>> I see the caching to set as 100..and "maxResultSize”:2097152..is that
>> something that can be tuned will help?
>> Is that the client merge sort consuming more time can be improved? Is
>> there any other tuning possible?
>>
>> Thanks,
>> Pradheep
>>
>
>


Re: Phoenix query performance

2017-02-22 Thread Maryann Xue
Hi Pradheep,

Thank you for posting the query and the log file! There are two things
going on on the server side at the same time here. I think it'd be a good
idea to isolate the problem first. So a few questions:
1. When you say data size went from "< 1M" to 30M, did the data from both
LHS and RHS grow proportionately?
2. If yes to (1), what if we only increase the data in LHS, but keep it
small for RHS? Would the query run significantly faster?
3. What if we only do group by on LHS? Would the query time be linear to
the data size?
4. How was GC when running the query?


Thanks,
Maryann

On Wed, Feb 22, 2017 at 9:28 AM, Pradheep Shanmugam <
pradheep.shanmu...@infor.com> wrote:

> Hi,
>
> We have a hbase cluster with 8 region servers with 20G memory
> We have a table  with 1 column family along with a secondary index.
> Following query took only few milliseconds when we had less data(< 1
> million)
> After adding more data(~30M rows) the performance declined and took about
> a minute or more(not stable)
>
> select msbo1.PARENTID
>   from msbo_phoenix_comp_rowkey msbo1
>   left outer join (
>  select PARENTID,MILESTONETYPEID
>from msbo_phoenix_comp_rowkey
>   where PARENTREFERENCETIME between 1479964000 and 1480464000
> and OWNERORGID = 100
> and PARENTTYPE = 'SHIPMENT'
> and MILESTONETYPEID = 19661
> group by PARENTID,MILESTONETYPEID
>  ) msbo2
>   on msbo1.PARENTID = msbo2.PARENTID
>   where msbo1.PARENTTYPE = 'SHIPMENT'
>and msbo1.OWNERORGID = 100
>and msbo2.MILESTONETYPEID is null
>and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
> group by msbo1.PARENTID
>   order by msbo1.PARENTID
>
> The RHS return about a 500K rows ..LHS about 18M rows…final result about
> 500K rows
>
> MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
>  is the index
>
> *Query plan:*
> CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_
> INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
> [0,'SHIPMENT',100]
> SERVER FILTER BY FIRST KEY ONLY AND 
> (TO_UNSIGNED_LONG("PARENTREFERENCETIME")
> >= 1477958400 AND TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
> SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
> CLIENT MERGE SORT
> PARALLEL LEFT-JOIN TABLE 0
> CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER
> MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
> [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,
> 464,000]
> SERVER FILTER BY FIRST KEY ONLY
> SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID",
> "MILESTONETYPEID"]
> CLIENT MERGE SORT
> AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL
>
> Attached the phoenix log.
> I see the caching to set as 100..and "maxResultSize”:2097152..is that
> something that can be tuned will help?
> Is that the client merge sort consuming more time can be improved? Is
> there any other tuning possible?
>
> Thanks,
> Pradheep
>


Re: Hash join confusion

2016-10-06 Thread Maryann Xue
Assigned. Thanks a lot for filing the issue, Sumit!

On Thu, Oct 6, 2016 at 10:19 AM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

> Hi Maryann,
>
> I created https://issues.apache.org/jira/browse/PHOENIX-3354 for this
> issue. I could not assign to you.
>
> Best regards,
> Sumit
>
> ----------
> *From:* Maryann Xue <maryann@gmail.com>
> *To:* "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam <
> sumit_o...@yahoo.com>
> *Sent:* Wednesday, October 5, 2016 11:27 AM
> *Subject:* Re: Hash join confusion
>
> Not sure if it's related, coz your DDL does not have DESC columns, but we
> do have a sort-merge-join bug fix in 4.8.0: https://issues.apache.org/
> jira/browse/PHOENIX-2894.
>
> Otherwise could you please just file a JIRA and assign to me? Thanks a lot!
>
>
> Thanks,
> Maryann
>
> On Tue, Oct 4, 2016 at 8:24 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:
>
> Hi Maryann,
>
> *Here are the 2 DDLs (for data and index tables)*:
>
> CREATE TABLE IF NOT EXISTS "ldmns:exDocStoreb" (CURRENT_TIMESTAMP BIGINT
> NOT NULL, ID VARCHAR(96), BINARY_CURR_EXDOC VARBINARY, CURR_CHECKSUM
> VARCHAR(32), BINARY_PREV_EXDOC VARBINARY, PREV_CHECKSUM VARCHAR(32),
> PREV_TIMESTAMP BIGINT, SUMMARY VARCHAR, OBJ_SUMMARY VARCHAR, PARAM_SAMPLES
> VARCHAR, BULK_PUBLISH_UUID  VARCHAR, TOTAL_FACTS INTEGER, CURR_EXDOC
> VARCHAR, PREV_EXDOC VARCHAR *CONSTRAINT PK PRIMARY KEY(CURRENT_TIMESTAMP,
> ID)*) COMPRESSION = 'SNAPPY', BLOCKCACHE =  false, *SALT_BUCKETS = 36*
>
>
> CREATE INDEX IF NOT EXISTS "ldmns:indx_exdocb" ON "ldmns:exDocStoreb"(ID)
> INCLUDE (SUMMARY, OBJ_SUMMARY, PARAM_SAMPLES, BULK_PUBLISH_UUID)
>
>
>
> *Here is the upsert query for this table*:
>
> UPSERT INTO "ldmns:exDocStoreb" (CURRENT_TIMESTAMP, BULK_PUBLISH_UUID, ID,
> CURR_CHECKSUM, CURR_EXDOC, SUMMARY, PREV_EXDOC, PREV_CHECKSUM,
> PREV_TIMESTAMP, OBJ_SUMMARY, PARAM_SAMPLES, TOTAL_FACTS, BINARY_CURR_EXDOC,
> BINARY_PREV_EXDOC) VALUES (TO_NUMBER(CURRENT_TIME()), ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?)
>
>
> *Here is explain plan of a SELECT with merge sort:*
>
> explain select /*+ USE_SORT_MERGE_JOIN */ ID, CURR_EXDOC, BINARY_CURR_EXDOC,
> CURRENT_TIMESTAMP, CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC,
> PREV_CHECKSUM, PREV_TIMESTAMP
> from "ldmns:exDocStoreb" as a
> inner join (select max(CURRENT_TIMESTAMP) as mct, ID as tid
>             from "ldmns:exDocStoreb"
>             where ID like '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%'
>             group by ID) as tmp
> on a.ID = tmp.tid and a.CURRENT_TIMESTAMP = tmp.mct
> where id like '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%';
>
> +-----------------------------------------------------------------------------------------------+
> |                                              PLAN                                              |
> +-----------------------------------------------------------------------------------------------+
> | SORT-MERGE-JOIN (INNER) TABLES                                                                  |
> | CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:exDocStoreb [0]                           |
> | SERVER FILTER BY ID LIKE '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%'    |
> | SERVER SORTED BY [A.ID, A.CURRENT_TIMESTAMP]                                                    |
> | CLIENT MERGE SORT                                                                               |
> | AND (SKIP MERGE)                                                                                |
> | CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'006389 |
> | SERVER FILTER BY FIRST KEY ONLY                                                                 |
> | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"]                                           |
> | CLIENT MERGE SORT                                                                               |
> | CLIENT SORTED BY ["ID", MAX("CURRENT_TIMESTAMP")]                                               |
> +-----------------------------------------------------------------------------------------------+
> 11 rows selected (0.025 seconds)
>
>
> *Here is explain plan with default join:*
>
> explain SELECT   ID, CURR_EXDOC, BINARY_CURR_EXDOC, CURRENT_TIMESTAMP,
> CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP
> from "ldmns:exDocStoreb" as a inner join (select max(CURRENT_TIMESTAMP) as
> mct, ID as tid from "ldmns:exDocStoreb" where ID like '
> 42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bf%' group
> by ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where ID like
> ' 42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bf%' ;
> +- -+
> |   PLAN   |
> +- -+
> | CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
> | SERVER FILTER BY ID LIKE ' 42ecf4abd4bd7e7606025dc8eee3de
> 6a3cc04418cbc2619ddc01f54d88d7 

Re: Hash join confusion

2016-10-04 Thread Maryann Xue
; --
> *From:* Sumit Nigam <sumit_o...@yahoo.com>
> *To:* "user@phoenix.apache.org" <user@phoenix.apache.org>
> *Sent:* Wednesday, October 5, 2016 12:13 AM
>
> *Subject:* Re: Hash join confusion
>
> Thanks Maryann.
>
> I will share the details in a few hours.
>
> Under heavy load scenario, the default hash join failed with time-out (and
> memory issue), so I switched to sort-merge. But sort-merge is missing data
> randomly. So, as of now I am not sure what is the issue with sort-merge
> join.
>
> Hash join does not miss any data but has the issue of not fitting in
> memory (the actual issue with which I started this thread).
>
> Thanks again!
> Sumit
>
> --
> *From:* Maryann Xue <maryann@gmail.com>
> *To:* Sumit Nigam <sumit_o...@yahoo.com>; "user@phoenix.apache.org" <
> user@phoenix.apache.org>
> *Sent:* Tuesday, October 4, 2016 10:04 PM
> *Subject:* Re: Hash join confusion
>
> Hi Sumit,
>
> Thank you for the update! Would you mind sharing the queries and their
> plans, as well as the DDL for both the data tables and the index?
>
> And just to confirm, you are saying hash joins are working, is it with
> changes to the config or without?
>
> Thanks,
> Maryann
> On Tue, Oct 4, 2016 at 9:17 AM Sumit Nigam <sumit_o...@yahoo.com> wrote:
>
> Thank you Maryann.
>
> From the time I have moved to sort-merge join, my use cases have stopped
> working. However, if I remove the hint (and fall back to hash), then they
> all work. I am on phoenix 4.6/ hbase 1.1
> I thought just changing the join algorithm would be enough. I would
> assume that changing the hash join to sort-merge join would not alter the
> query results, right? Do I need to re-write my query?
>
> I am using global index.
>
> Thanks,
> Sumit
>
> --
> *From:* Maryann Xue <maryann@gmail.com>
> *To:* Sumit Nigam <sumit_o...@yahoo.com>
> *Cc:* "user@phoenix.apache.org" <user@phoenix.apache.org>
> *Sent:* Sunday, October 2, 2016 5:30 AM
>
> *Subject:* Re: Hash join confusion
>
> So if either or both sides of a sort-merge-join will have to be sorted
> simply depends on whether this side is already ordered on the join key.
>
> So far we don't have any documentation specifically for explain plan yet,
> but the Phoenix website does have some examples for different types of
> queries or functionalities, including join queries.
>
>
> Thanks,
> Maryann
>
> On Wed, Sep 28, 2016 at 10:52 PM, Sumit Nigam <sumit_o...@yahoo.com>
> wrote:
>
> Thanks Maryann.
>
> Yes let me switch to merge sort join because the other query uses lots
> more columns. Also, if I just change the hint to use merge sort would that
> be enough or I need to sort both the driving query and subquery with same
> order by for merge sort?
>
> As an aside, is there a document to interpret explain plan?
>
> Thanks,
> Sumit
>
> --
> *From:* Maryann Xue <maryann@gmail.com>
> *To:* Sumit Nigam <sumit_o...@yahoo.com>
> *Cc:* "user@phoenix.apache.org" <user@phoenix.apache.org>
> *Sent:* Thursday, September 29, 2016 11:03 AM
> *Subject:* Re: Hash join confusion
>
> Thank you Sumit, for trying this out! So right now it's very clear that
> the table to be cached IS too big so there should be no point of using hash
> join in this case. Is the other table much smaller, or it is about the same
> size or even bigger? If it's considerably smaller you can probably rewrite
> your query to do the join the other way, otherwise let's just stick to
> sort-merge join.
>
>
> Thanks,
> Maryann
>
> On Wed, Sep 28, 2016 at 10:29 PM, Sumit Nigam <sumit_o...@yahoo.com>
> wrote:
>
> Thank you Maryann.
>
> I am not using multi-tenancy for these tables. Increasing
> phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache
> size config just delayed the error.
>
> I have also started seeing some memory problem -
>
> Caused by: org.apache.phoenix.memory.*InsufficientMemoryException*: Requested
> memory of 22871932 bytes could not be allocated from remaining memory of
> 776776654 bytes from global pool of 778469376 bytes after waiting for 1ms.
>   at org.apache.phoenix.memory.GlobalMemoryManager.allocateBytes(GlobalMemoryManager.java:78)
>   at org.apache.phoenix.memory.GlobalMemoryManager.access$300(GlobalMemoryManager.java:30)
>   at org.apache.phoenix.memory.GlobalMemoryManager$GlobalMemoryChunk.resize(GlobalMemoryManager.java:139)
>
>
> What I am having

Re: Hash join confusion

2016-09-28 Thread Maryann Xue
Thank you, Sumit, for trying this out! It's now very clear that the table to be
cached IS too big, so there is no point in using a hash join in this case. Is
the other table much smaller, or is it about the same size or even bigger? If
it's considerably smaller you can probably rewrite your query to do the join
the other way; otherwise let's just stick to sort-merge join.
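
As a rough illustration of "doing the join the other way" (made-up tables
BIG_T and SMALL_T): with Phoenix's default hash join, the right-hand side of
an inner join is the side built into the server-side hash cache, so the
smaller input belongs on the right:

SELECT b.ID, s.LABEL
FROM BIG_T b
INNER JOIN SMALL_T s ON s.ID = b.ID;  -- SMALL_T is the side that gets hashed and cached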


Thanks,
Maryann

On Wed, Sep 28, 2016 at 10:29 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

> Thank you Maryann.
>
> I am not using multi-tenancy for these tables. Increasing phoenix.
> coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache size
> config just delayed the error.
>
> I have also started seeing some memory problem -
>
> Caused by: org.apache.phoenix.memory.*InsufficientMemoryException*: Requested 
> memory of 22871932 bytes could not be allocated from remaining memory of 
> 776776654 bytes from global pool of 778469376 bytes after waiting for 1ms.
>   at 
> org.apache.phoenix.memory.GlobalMemoryManager.allocateBytes(GlobalMemoryManager.java:78)
>   at 
> org.apache.phoenix.memory.GlobalMemoryManager.access$300(GlobalMemoryManager.java:30)
>   at 
> org.apache.phoenix.memory.GlobalMemoryManager$GlobalMemoryChunk.resize(GlobalMemoryManager.java:139)
>
>
> What I am having trouble with is, that the total size of csv produced by
> sub-query is only ~7Mb. I have 12 region servers with 5GB heap each. So,
> when this result gets sent across to all region servers to perform the
> server side join, not sure why a memory issue should show up (or a time out
> occur). Any insights?
>
> These tables are salted. Not sure if it is the
> https://issues.apache.org/jira/browse/PHOENIX-2900 issue.
>
> Switching to sort merge join helped. But not sure if that is the right
> solution going forward.
>
> Thanks again!
> Sumit
>
>
> --
> *From:* Maryann Xue <maryann@gmail.com>
> *To:* "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam <
> sumit_o...@yahoo.com>
> *Sent:* Wednesday, September 28, 2016 11:36 PM
> *Subject:* Re: Hash join confusion
>
> Yes, Sumit, the sub-query will get cached in hash join. Are you using
> multi-tenancy for these tables? If yes, you might want to checkout Phoenix
> 4.7 or 4.8, since a related bug fix got in the 4.7 release.
> https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%
> 3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22
>
> Otherwise I think it's the hash cache timeout issue, in which case
> changing phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.
>
>
> Thanks,
> Maryann
>
> On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <sumit_o...@yahoo.com>
> wrote:
>
> Hi,
>
> Is there any document which can help me understand explain plan output in
> detail? Or, which piece of code should I look at, to get an idea?
>
> Here is explain plan for inner join query below. Can anyone help in
> explaining it to me? Like, as per the plan which table is being cached,
> etc.?
> Here, *indx_exdocb* is index table* (on ID) *and *exDocStoreb *is main
> table with rowkey as (current_timestamp, ID).
>
> +- -+
> |   PLAN   |
> +- -+
> | CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER *exDocStoreb* |
> | PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
> | CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER *indx_exdocb*
> [0,' 42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bf'] -
> [0,' 42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bg' |
> | SERVER FILTER BY FIRST KEY ONLY |
> | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> | CLIENT MERGE SORT|
> | DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID <http://a.id/>)
> IN ((TMP.MCT, TMP.TID)) |
> +- -+
>
> Also, is there a way to turn ON more verbose explain plan? Like, seeing
> number of bytes, rows that each step results in?
>
> Thanks,
> Sumit
>
> --
> *From:* Sumit Nigam <sumit_o...@yahoo.com>
> *To:* Users Mail List Phoenix <user@phoenix.apache.org>
> *Sent:* Tuesday, September 27, 2016 9:17 PM
> *Subject:* Hash join confusion
>
> Hi,
>
> I am using hbase 1.1 with phoenix 4.6.
>
> I have a table with row key as (current_timestamp, id) which is salted and
> index on (id). This table has ~3 million records.
>
> I have a query like given below.
>
> SELECT  ID, CURRENT_TIMESTAMP,  from TBL
> 

Re: Hash join confusion

2016-09-28 Thread Maryann Xue
Yes, Sumit, the sub-query will get cached in hash join. Are you using
multi-tenancy for these tables? If yes, you might want to checkout Phoenix
4.7 or 4.8, since a related bug fix got in the 4.7 release.
https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22

Otherwise I think it's the hash cache timeout issue, in which case changing
phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.


Thanks,
Maryann

On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam  wrote:

> Hi,
>
> Is there any document which can help me understand explain plan output in
> detail? Or, which piece of code should I look at, to get an idea?
>
> Here is explain plan for inner join query below. Can anyone help in
> explaining it to me? Like, as per the plan which table is being cached,
> etc.?
> Here, *indx_exdocb* is index table* (on ID) *and *exDocStoreb *is main
> table with rowkey as (current_timestamp, ID).
>
> +--+
> |   PLAN   |
> +--+
> | CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER *exDocStoreb* |
> | PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
> | CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER *indx_exdocb*
> [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] -
> [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg' |
> | SERVER FILTER BY FIRST KEY ONLY |
> | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> | CLIENT MERGE SORT|
> | DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT,
> TMP.TID)) |
> +--+
>
> Also, is there a way to turn ON more verbose explain plan? Like, seeing
> number of bytes, rows that each step results in?
>
> Thanks,
> Sumit
>
> --
> *From:* Sumit Nigam 
> *To:* Users Mail List Phoenix 
> *Sent:* Tuesday, September 27, 2016 9:17 PM
> *Subject:* Hash join confusion
>
> Hi,
>
> I am using hbase 1.1 with phoenix 4.6.
>
> I have a table with row key as (current_timestamp, id) which is salted and
> index on (id). This table has ~3 million records.
>
> I have a query like given below.
>
> SELECT  ID, CURRENT_TIMESTAMP,  from TBL
>as a inner join (
> select max(CURRENT_TIMESTAMP) as mct, ID
> as tid from TBL where ID like '42ecf4abd4bd7e7606025dc8eee3de
> 6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by ID) as tmp
>on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct
>
>
> The query hangs for long and finally fails with a timeout. I have 12
> region servers each with 5GB heap and also the total records satisfying the
> above query is 62K whose CSV dump is ~10MB only.
>
> DoNotRetryIOException: Could not find *hash cache for join Id*: Ӧ�8�D�.
> The cache might have expired and have been removed
>
> and -
>
> Caused by: java.sql.SQLException: Encountered exception in sub plan [0] 
> execution.
>   at 
> org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:175)
>   at 
> com.infa.products.ldm.ingestion.server.java.hadoop.impl.FixPhoenixIngestInputFormat.getQueryPlan(FixPhoenixIngestInputFormat.java:94)
>   ... 22 more
>
>
> andCaused by: java.sql.SQLException:
> java.util.concurrent.*TimeoutException*
>
> at 
> org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:264)
>
>
> I can try playing around with parameters such as 
> phoenix.coprocessor.maxServerCacheTimeToLiveMs and switching to 
> sort_merge_join actually helped.
>
> But my question is as per Joins | Apache Phoenix 
>  in a case such as *lhs* INNER JOIN 
> *rhs, *it is *rhs* which will be built as hash table in server cache. So, in 
> the above query I assume this gets cached?
>
>  select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like 
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by 
> ID) as tmp
>
> Thanks,
> Sumit
>
>
>
>


Re: Using COUNT() with columns that don't use COUNT() when the table is join fails

2016-09-19 Thread Maryann Xue
Thank you very much for your answer, Michael! Yes, what Cheyenne tried to
use was simply not the right grammar.


Thanks,
Maryann

On Mon, Sep 19, 2016 at 10:47 AM, Michael McAllister <
mmcallis...@homeaway.com> wrote:

> This is really an ANSI SQL question. If you use an aggregate function,
> then you need to specify what columns to group by. Any columns not being
> referenced in the aggregate function(s) need to be in the GROUP BY
> statement.
>
>
>
> Michael McAllister
>
> Staff Data Warehouse Engineer | Decision Systems
>
> mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha
>  | webex: https://h.a/mikewebex
>
> This electronic communication (including any attachment) is confidential.
> If you are not an intended recipient of this communication, please be
> advised that any disclosure, dissemination, distribution, copying or other
> use of this communication or any attachment is strictly prohibited.  If you
> have received this communication in error, please notify the sender
> immediately by reply e-mail and promptly destroy all electronic and printed
> copies of this communication and any attachment.
>
>
>
> *From: *Cheyenne Forbes 
> *Reply-To: *"user@phoenix.apache.org" 
> *Date: *Monday, September 19, 2016 at 10:50 AM
> *To: *"user@phoenix.apache.org" 
> *Subject: *Re: Using COUNT() with columns that don't use COUNT() when the
> table is join fails
>
>
>
> I was wondering because it seems extra wordy
>
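
A minimal illustration of the rule Michael describes (made-up tables ORDERS_T
and CUSTOMERS_T): every selected column that is not wrapped in an aggregate
function has to appear in the GROUP BY:

SELECT c.NAME, COUNT(*) AS ORDER_COUNT
FROM ORDERS_T o
JOIN CUSTOMERS_T c ON c.ID = o.CUSTOMER_ID
GROUP BY c.NAME;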


Re: high client cpu usage

2016-08-25 Thread Maryann Xue
Hi John,

Would you mind sharing the query plan for this query (by running "EXPLAIN
")?


Thanks,
Maryann
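
That just means prefixing the reported query with EXPLAIN in sqlline, e.g.:

EXPLAIN
SELECT SUM(L_EXTENDEDPRICE) / 7.0 AS AVG_YEARLY
FROM TPCH.LINEITEM, TPCH.PART
WHERE P_PARTKEY = L_PARTKEY
  AND P_BRAND = 'BRAND#23'
  AND P_CONTAINER = 'MED BOX'
  AND L_QUANTITY < (
    SELECT 0.2 * AVG(L_QUANTITY)
    FROM TPCH.LINEITEM
    WHERE L_PARTKEY = P_PARTKEY);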

On Thu, Aug 25, 2016 at 11:19 AM, John Leach  wrote:

> Yeah, this query.
>
> QUERY:
> SELECT SUM(L_EXTENDEDPRICE) / 7.0 AS AVG_YEARLY
> FROM
>  TPCH.LINEITEM,
>  TPCH.PART
> WHERE
>  P_PARTKEY = L_PARTKEY
>  AND P_BRAND = 'BRAND#23'
>  AND P_CONTAINER = 'MED BOX'
>  AND L_QUANTITY < (
>SELECT 0.2 * AVG(L_QUANTITY)
>FROM
>  TPCH.LINEITEM
>WHERE
>  L_PARTKEY = P_PARTKEY
>  );
>
> ROW COUNTS:
> TPCH.REGION 5
> TPCH.NATION25
> TPCH.SUPPLIER 100
> TPCH.PART2000
> TPCH.PARTSUPP8000
> TPCH.ORDERS 15000
> TPCH.CUSTOMER1500
> TPCH.LINEITEM   600037902
>
> Does the client have to sort all of LINEITEM and PART?
>
> Thanks James.
>
> Regards,
> John
>
>
>
> On Aug 25, 2016, at 1:16 PM, James Taylor  wrote:
>
> The client typically does a merge sort when combining results from
> paralllel scans. Not sure if this would explain the CPU/memory usage you're
> seeing. Can you narrow it down to a particular query that's causing the
> issue?
>
> Thanks,
> James
>
> On Thu, Aug 25, 2016 at 6:49 AM, John Leach 
> wrote:
>
>> Can anyone explain why the client would be burning so much CPU and memory
>> if the result is a single row?
>>
>> I suspect we configured something wrong on Phoenix but we are having a
>> hard time figuring it out.
>>
>> Thanks in advance.
>>
>> Regards,
>> John
>>
>> > On Aug 24, 2016, at 9:54 AM, Aaron Molitor 
>> wrote:
>> >
>> > Seeing higher than expected CPU/MEM usage for the java process started
>> by the sqlline.py client.
>> >
>> > From top:
>> > top - 14:37:32 up 7 days, 22:15,  2 users,  load average: 25.52, 9.74,
>> 7.89
>> > Tasks: 509 total,   1 running, 508 sleeping,   0 stopped,   0 zombie
>> > Cpu(s): 61.2%us,  6.3%sy,  0.0%ni, 31.0%id,  0.5%wa,  0.0%hi,  1.0%si,
>> 0.0%st
>> > Mem:  65920564k total, 31913580k used, 34006984k free,   647004k buffers
>> > Swap: 33030140k total,0k used, 33030140k free, 10464056k cached
>> >
>> >  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>> > 8729 splice20   0 29.9g  14g 889m S 1616.8 22.4  26:19.21 java
>> >
>> > Why is a the JDBC client using so much memory/cpu?  The expected result
>> is only a single row.
>> >
>> > QUERY:
>> > SELECT SUM(L_EXTENDEDPRICE) / 7.0 AS AVG_YEARLY
>> > FROM
>> >  TPCH.LINEITEM,
>> >  TPCH.PART
>> > WHERE
>> >  P_PARTKEY = L_PARTKEY
>> >  AND P_BRAND = 'BRAND#23'
>> >  AND P_CONTAINER = 'MED BOX'
>> >  AND L_QUANTITY < (
>> >SELECT 0.2 * AVG(L_QUANTITY)
>> >FROM
>> >  TPCH.LINEITEM
>> >WHERE
>> >  L_PARTKEY = P_PARTKEY
>> >  );
>> >
>> > ROW COUNTS:
>> > TPCH.REGION 5
>> > TPCH.NATION25
>> > TPCH.SUPPLIER 100
>> > TPCH.PART2000
>> > TPCH.PARTSUPP8000
>> > TPCH.ORDERS 15000
>> > TPCH.CUSTOMER1500
>> > TPCH.LINEITEM   600037902
>> >
>> > Thanks,
>> > Aaron
>>
>>
>
>


Re: querying time for Apache Phoenix

2016-07-29 Thread Maryann Xue
Hi James,

I have filed a JIRA https://issues.apache.org/jira/browse/PHOENIX-3129 for
using global index for such queries without hint. Feel free to watch and
comment on this issue.


Thanks,
Maryann

On Wed, Jul 27, 2016 at 12:29 PM, James Taylor 
wrote:

> On Wed, Jul 27, 2016 at 8:07 AM, Heather, James (ELS) <
> james.heat...@elsevier.com> wrote:
>
>> - select * from documents where profile_id = ? (multiple rows returned,
>> doing a full scan)  --- 5 sec
>>
>> See https://phoenix.apache.org/secondary_indexing.html#Index_Usage for
>> when an index is used/not used, but by default a global index won't be used
>> if not all columns are contained in the index (i.e. covered index).
>>
>>
>> If the index on profile_id isn't covered, it's much quicker, it seems, to
>> do
>>
>> SELECT * FROM documents WHERE id IN (SELECT id FROM documents WHERE
>> profile_id = ?)
>>
>> where 'id' is the primary key. In other words, do it in two stages:
>> retrieve the primary key using the global index, and then look up the rows
>> using the primary key.
>>
>> Is there any reason that Phoenix doesn't do this by default, to avoid the
>> full scan?
>>
>
> In general, Phoenix doesn't know how many rows will be returned for a
> given profile_id, so rather than performing a broadcast join (and
> potentially failing because too many rows are returned) it takes the
> conservative approach. You can hint the query to force the index to be
> used, though.
>
> We could definitely be smarter about this when there's an index in place,
> as we have a good estimate (with stats) on how rows will be scanned in that
> case. In fact we have an optimization already in place that'll do a kind of
> batched lookup of PKs as the join in being processed (kind of like a nested
> loop join). Definitely worth filing a JIRA.
>
>
>> - select profile_id from documents_test order by added desc limit 1
>> (index on added) --- 5.5 sec
>>
>>
>> If PROFILE_ID is not in the primary key constraint of the table, make
>> sure to include it in your index on the ADDED column.
>>
>>
>> I don't think there's a similar trick with an ORDER BY clause, is there?
>>
>
> Yes, you can hint the query to use the index (see link I sent before).
>
>
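
A sketch of what that hinted form could look like; DOCS_ADDED_IDX is a
placeholder name for the index on the ADDED column:

SELECT /*+ INDEX(documents_test DOCS_ADDED_IDX) */ profile_id
  FROM documents_test
  ORDER BY added DESC
  LIMIT 1;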


Re: Phoenix Upsert with SELECT behaving strange

2016-05-17 Thread Maryann Xue
Hi Radha,

Thanks for reporting this issue! Would you mind trying it with latest
Phoenix version?

Thanks,
Maryann

On Tue, May 17, 2016 at 8:19 AM, Radha krishna  wrote:

> Hi I am performing some join operation in phoenix console and storing the
> result into another table but the same query some time showing below error
> messages and some times it is inserting the result into the table.
>
> Error Messages:
>
> 1)
>
> Error: ERROR 201 (22000): Illegal data. ERROR 201 (22000): Illegal data.
> Expected length of at least 96 bytes, but had 15 (state=22000,code=201)
> java.sql.SQLException: ERROR 201 (22000): Illegal data. ERROR 201 (22000):
> Illegal data. Expected length of at least 96 bytes, but had 15
> at
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:395)
> at
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at
> org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:131)
> at
> org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:115)
> at
> org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:104)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:538)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:510)
> at
> org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
> at
> org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
> at
> org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
> at
> org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:737)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:305)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:297)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:295)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1255)
> at sqlline.Commands.execute(Commands.java:822)
> at sqlline.Commands.sql(Commands.java:732)
> at sqlline.SqlLine.dispatch(SqlLine.java:808)
> at sqlline.SqlLine.begin(SqlLine.java:681)
> at sqlline.SqlLine.start(SqlLine.java:398)
> at sqlline.SqlLine.main(SqlLine.java:292)
>
>
> 2)
> Error: ERROR 201 (22000): Illegal data. ERROR 201 (22000): Illegal data.
> Expected length of at least 48 bytes, but had 47 (state=22000,code=201)
> java.sql.SQLException: ERROR 201 (22000): Illegal data. ERROR 201 (22000):
> Illegal data. Expected length of at least 48 bytes, but had 47
> at
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:395)
> at
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at
> org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:131)
> at
> org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:115)
> at
> org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:104)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:538)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:510)
> at
> org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
> at
> org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
> at
> org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
> at
> org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:737)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:305)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:297)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:295)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1255)
> at sqlline.Commands.execute(Commands.java:822)
> at sqlline.Commands.sql(Commands.java:732)
> at sqlline.SqlLine.dispatch(SqlLine.java:808)
> at sqlline.SqlLine.begin(SqlLine.java:681)
> at sqlline.SqlLine.start(SqlLine.java:398)
> at sqlline.SqlLine.main(SqlLine.java:292)
> 0: jdbc:phoenix:g4t7565.houston.hpecorp.net:2>
>
>
> 3)
> Error: ERROR 201 (22000): Illegal data. ERROR 201 (22000): Illegal data.
> Expected length of at least 48 bytes, but had 47 

Re: JOIN returning incomplete result

2016-05-12 Thread Maryann Xue
Hi Pierre,

Thank you very much for reporting this issue! Can you create a JIRA with
all the information you've attached above along with the table DDL info?
I'll take a look at it.


Thanks,
Maryann

On Thu, May 12, 2016 at 6:18 AM, pierre lacave  wrote:

> Hi
>
> I am seeing weird result with joins where the output seems to be incomplete
> I tried to summarise the problem with the queries bellow.
>
> in query 1, I do a join over a period for which I would have expected to
> return a dozen of rows, but only one is returned for a time T1,
> in query 2, I do the same join but filtering for one of the missing row at
> time T2 which is now returned ?!
>
> I re-ran query 1 to make sure it was not a timing issue, but had the same
> wrong partial result.
>
>
> Surely that is not the expected behaviour, what would be the next step to
> get to the bottom of this?
>
>
> Thanks
>
>
> The two tables are using a salt of 2.
> Using Phoenix 4.7, Hbase 1.1
>
>
>
>
>
>
> Query #1
> 0: jdbc:phoenix:localhost:2181> SELECT C.BUCKET, C.TIMESTAMP FROM (
> . . . . . . . . . . . . . . . > SELECT E.BUCKET as BUCKET, L.BUCKET as
> LBUCKET, E.TIMESTAMP as TIMESTAMP, L.TIMESTAMP as LTIMESTAMP FROM
> . . . . . . . . . . . . . . . >(SELECT BUCKET, TIMESTAMP FROM
> EVENT_COUNT
> . . . . . . . . . . . . . . . > WHERE BUCKET = '5SEC' AND
> LOCATION = 'Tr/Bal'
> . . . . . . . . . . . . . . . > AND TIMESTAMP <=
> 14629935200 AND TIMESTAMP > 14629934200
> . . . . . . . . . . . . . . . > GROUP BY BUCKET, TIMESTAMP,
> LOCATION
> . . . . . . . . . . . . . . . >) E
> . . . . . . . . . . . . . . . >JOIN
> . . . . . . . . . . . . . . . > (SELECT BUCKET, TIMESTAMP FROM
> EVENT_LATENCY
> . . . . . . . . . . . . . . . > WHERE BUCKET = '5SEC' AND
> SRC_LOCATION = 'Tr/Bal' AND SRC_LOCATION = DST_LOCATION
> . . . . . . . . . . . . . . . > AND TIMESTAMP <=
> 14629935200 AND TIMESTAMP > 14629934200
> . . . . . . . . . . . . . . . > GROUP BY BUCKET, TIMESTAMP,
> SRC_LOCATION, DST_LOCATION
> . . . . . . . . . . . . . . . > ) L
> . . . . . . . . . . . . . . . > ON L.BUCKET = E.BUCKET AND L.TIMESTAMP
> = E.TIMESTAMP
> . . . . . . . . . . . . . . . > ) C
> . . . . . . . . . . . . . . . > GROUP BY C.BUCKET, C.TIMESTAMP;
> +---+--+
> | E.BUCKET  | E.TIMESTAMP  |
> +---+--+
> | 5SEC  | 14629934300  |
> +---+--+
> 1 row selected (0.169 seconds)
>
>
>
> Query #2
> 0: jdbc:phoenix:localhost:2181> SELECT C.BUCKET, C.TIMESTAMP FROM (
> . . . . . . . . . . . . . . . > SELECT E.BUCKET as BUCKET, L.BUCKET as
> LBUCKET, E.TIMESTAMP as TIMESTAMP, L.TIMESTAMP as LTIMESTAMP FROM
> . . . . . . . . . . . . . . . >(SELECT BUCKET, TIMESTAMP FROM
> EVENT_COUNT
> . . . . . . . . . . . . . . . > WHERE BUCKET = '5SEC' AND
> LOCATION = 'Tr/Bal'
> . . . . . . . . . . . . . . . > AND TIMESTAMP <=
> 14629935200 AND TIMESTAMP > 14629934200 *AND TIMESTAMP =
> 14629935200*
> . . . . . . . . . . . . . . . > GROUP BY BUCKET, TIMESTAMP,
> LOCATION
> . . . . . . . . . . . . . . . >) E
> . . . . . . . . . . . . . . . >JOIN
> . . . . . . . . . . . . . . . > (SELECT BUCKET, TIMESTAMP FROM
> EVENT_LATENCY
> . . . . . . . . . . . . . . . > WHERE BUCKET = '5SEC' AND
> SRC_LOCATION = 'Tr/Bal' AND SRC_LOCATION = DST_LOCATION
> . . . . . . . . . . . . . . . > AND TIMESTAMP <=
> 14629935200 AND TIMESTAMP > 14629934200 *AND TIMESTAMP =
> 14629935200*
> . . . . . . . . . . . . . . . > GROUP BY BUCKET, TIMESTAMP,
> SRC_LOCATION, DST_LOCATION
> . . . . . . . . . . . . . . . > ) L
> . . . . . . . . . . . . . . . > ON L.BUCKET = E.BUCKET AND L.TIMESTAMP
> = E.TIMESTAMP
> . . . . . . . . . . . . . . . > ) C
> . . . . . . . . . . . . . . . > GROUP BY C.BUCKET, C.TIMESTAMP;
> +---+--+
> | E.BUCKET  | E.TIMESTAMP  |
> +---+--+
> | 5SEC  | *14629935200*  |
> +---+--+
> 1 row selected (0.081 seconds)
>
>
>
>
>
>
>
> For reference the content of each table and the plan of each query bellow
>
> 0: jdbc:phoenix:localhost:2181> SELECT BUCKET, TIMESTAMP as T FROM
> EVENT_LATENCY
> . . . . . . . . . . . . . . . > WHERE BUCKET = '5SEC' AND SRC_LOCATION
> = 'Tr/Bal' AND SRC_LOCATION = DST_LOCATION
> . . . . . . . . . . . . . . . > AND TIMESTAMP <=
> 14629935200 AND TIMESTAMP > 14629934200
> . . . . . . . . . . . . . . . > GROUP BY BUCKET, TIMESTAMP,
> SRC_LOCATION, DST_LOCATION ORDER BY  T DESC;
> +-+--+
> | BUCKET  |  T   |
> +-+--+
> | 5SEC| 

Re: Error "could not find hash cache for joinId" when doing Inner Join with any table or view with Multi_Tenant=true

2015-11-05 Thread Maryann Xue
Thanks a lot, Don! I have assigned the issue to myself and will look into
it shortly.

On Thu, Nov 5, 2015 at 7:10 PM, Don Brinn <don.br...@d2l.com> wrote:

> Hi Maryann,
>
>
>
> I have created this defect report in JIRA:
> https://issues.apache.org/jira/browse/PHOENIX-2381
>
>
>
> Thanks,
>
>
>
> Don Brinn
>
>
>
> *From:* Maryann Xue [mailto:maryann@gmail.com]
> *Sent:* Thursday, November 5, 2015 5:13 PM
> *To:* user@phoenix.apache.org
> *Subject:* Re: Error "could not find hash cache for joinId" when doing
> Inner Join with any table or view with Multi_Tenant=true
>
>
>
> Hi Don,
>
> Thank you very much for finding the issue. Would mind filing a Phoenix
> JIRA?
>
>
> Thanks,
> Maryann
>
>
>
> On Thu, Nov 5, 2015 at 3:08 PM Don Brinn <don.br...@d2l.com> wrote:
>
> Hi,
>
>
>
> I am seeing the following error when doing an INNER JOIN of a view with
> MULTI_TENANT=true with any other table or view:
>
> java.lang.RuntimeException:
> org.apache.phoenix.exception.PhoenixIOException:
> org.apache.phoenix.exception.PhoenixIOException:
> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache
> for joinId: Ys�0��%�. The cache might have expired and have been removed.
>
> at
> org.apache.phoenix.coprocessor.HashJoinRegionScanner.(HashJoinRegionScanner.java:95)
>
> at
> org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:212)
>
> at
> org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:178)
>
> at
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1931)
>
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3178)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
>
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
>
> at sqlline.SqlLine.print(SqlLine.java:1653)
>
> at sqlline.Commands.execute(Commands.java:833)
>
> at sqlline.Commands.sql(Commands.java:732)
>
> at sqlline.SqlLine.dispatch(SqlLine.java:808)
>
> at sqlline.SqlLine.begin(SqlLine.java:681)
>
> at sqlline.SqlLine.start(SqlLine.java:398)
>
> at sqlline.SqlLine.main(SqlLine.java:292)
>
>
>
> This is with Phoenix version 4.6.0 and HBase version
> 0.98.4.2.2.6.0-2800-hadoop2.
>
>
>
> This seems very strongly related to the MULTI_TENANT=true property on a
> view or table.  I see the error whenever the view has MULTI_TENANT=true and
> I have a tenant-specific connection to Phoenix.  I do not see the problem
> if the MULTI_TENANT=true property is not set on the view or if I do not
> have a tenant-specific connection to Phoenix.
>
>
>
> Here is an example SQL statement that has this error when the view
> INVENTORY has the MULTI_TENANT=true property and I have a tenant-specific
> connection, but that succeeds in other cases. (The view PRODUCT_IDS is
> not Multi-Tenant.)
>
> SELECT * FROM INVENTORY INNER JOIN PRODUCT_IDS ON (PRODUCT_ID =
> INVENTORY.ID)
>
>
>
> Note:  “INNER JOIN” fails under these conditions, as does “LEFT OUTER
> JOIN”.  However, “RIGHT OUTER JOIN” and “FULL OUTER JOIN” do work.  Also,
> if I tell Phoenix to use a Sort Join for the Inner Join or Left Outer Join
> then it does work, e.g.  SELECT /*+ USE_SORT_MERGE_JOIN*/ * FROM
> INVENTORY INNER JOIN PRODUCT_IDS ON (PRODUCT_ID = INVENTORY.ID); works.
>
>
>
> This seems to be the same problem that was discussed previously in this
> mailing list:
> https://mail-archives.apache.org/mod_mbox/phoenix-user/201507.mbox/%3ccaotkwx5xfbwkjf--0k-zj91tfdqwfq6rmuqw0r_lojcnj1a...@mail.gmail.com%3E
>
>
>
> Is this a known issue?  Is there a fix or work-around, or something I
> should be doing differently to avoid the problem?
>
>
>
>
>
> Thanks,
>
>
>
> Don Brinn
>
>
>
>
>
>


Re: When will be the stats based join selector be implemented?

2015-10-05 Thread Maryann Xue
Hi Li,

We are moving towards integrating with Calcite as our stats based
optimization now. You can checkout our calcite
<https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=shortlog;h=refs/heads/calcite>
branch and play with it if you are interested. It's still under
development, but you can already see some amazing optimization examples in
our test file CalciteIT.java. You can also go to
http://www.slideshare.net/HBaseCon/ecosystem-session-2-49044349 for more
information.


Thanks,
Maryann




On Mon, Oct 5, 2015 at 2:08 PM, Li Gao  wrote:

> Hi all,
>
> I am currently looking into getting optimized joins based on table stats.
> I noticed in the QueryCompile at line 232-234 is still saying "TODO".
>
>
> https://github.com/apache/phoenix/blob/4.x-HBase-1.0/phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java
>
> We have a need to get the selector enabled based on the size of the the
> LHS and RHS table.
>
> Thanks,
> Li
>


Re: When will be the stats based join selector be implemented?

2015-10-05 Thread Maryann Xue
Hi Li,

Sorry, I forgot to mention that this calcite branch is now depending on
Apache Calcite's master branch instead of any of its releases. So you need
to checkout Calcite (git://github.com/apache/incubator-calcite.git) first
and run `mvn install` for that project before going back to the Phoenix
project and run mvn commands.

On Mon, Oct 5, 2015 at 6:43 PM, Li Gao <g...@marinsoftware.com> wrote:

> Hi Maryann,
>
> This looks great. Thanks for pointing me to the right branch!  For some
> reason I am getting the following errors when I do mvn package
>
> [WARNING] The POM for
> org.apache.calcite:calcite-avatica:jar:1.5.0-incubating-SNAPSHOT is
> missing, no dependency information available
>
> [WARNING] The POM for
> org.apache.calcite:calcite-core:jar:1.5.0-incubating-SNAPSHOT is missing,
> no dependency information available
>
> [WARNING] The POM for
> org.apache.calcite:calcite-core:jar:tests:1.5.0-incubating-SNAPSHOT is
> missing, no dependency information available
>
> [WARNING] The POM for
> org.apache.calcite:calcite-linq4j:jar:1.5.0-incubating-SNAPSHOT is missing,
> no dependency information available
>
> Where can I find these dependencies?
>
> Thanks,
>
> Li
>
>
>
> On Mon, Oct 5, 2015 at 12:19 PM, Maryann Xue <maryann@gmail.com>
> wrote:
>
>> Hi Li,
>>
>> We are moving towards integrating with Calcite as our stats based
>> optimization now. You can checkout our calcite
>> <https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=shortlog;h=refs/heads/calcite>
>> branch and play with it if you are interested. It's still under
>> development, but you can already see some amazing optimization examples in
>> our test file CalciteIT.java. You can also go to
>> http://www.slideshare.net/HBaseCon/ecosystem-session-2-49044349 for more
>> information.
>>
>>
>> Thanks,
>> Maryann
>>
>>
>>
>>
>> On Mon, Oct 5, 2015 at 2:08 PM, Li Gao <g...@marinsoftware.com> wrote:
>>
>>> Hi all,
>>>
>>> I am currently looking into getting optimized joins based on table
>>> stats. I noticed in the QueryCompile at line 232-234 is still saying "TODO".
>>>
>>>
>>> https://github.com/apache/phoenix/blob/4.x-HBase-1.0/phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java
>>>
>>> We have a need to get the selector enabled based on the size of the the
>>> LHS and RHS table.
>>>
>>> Thanks,
>>> Li
>>>
>>
>>
>


Re: failing integration tests in DerivedTableIT

2015-09-14 Thread Maryann Xue
Thank you, James! I have assigned the issue to myself.

On Mon, Sep 14, 2015 at 7:39 AM James Heather 
wrote:

> Reported as
>
> https://issues.apache.org/jira/browse/PHOENIX-2257
>
> On 14/09/15 12:24, James Heather wrote:
> > I also have two failing integration tests in DerivedTableIT:
> >
> > Failed tests:
> >   DerivedTableIT.testDerivedTableWithGroupBy:320 expected:<['e']> but
> > was:<['b', 'c', 'e']>
> >   DerivedTableIT.testDerivedTableWithGroupBy:320 expected:<['e']> but
> > was:<['b', 'c', 'e']>
> >
> > Full output of test attached. You'll see there are a couple of
> > TableInfoMissingExceptions in there, which might be related.
> >
> > It is possible that they're related to the previous unit test issue I
> > reported; I don't know.
> >
> > It's tricky to set up a CDH5-compatible build until I have a vanilla
> > build that passes.
> >
> > James
>
>


Re: failing integration tests in DerivedTableIT

2015-09-14 Thread Maryann Xue
Noted. Thanks, James!

On Mon, Sep 14, 2015 at 1:48 PM, James Heather <james.heat...@mendeley.com>
wrote:

> Thanks!
>
> Note James's comments on the unit tests: probably this is another issue
> that fails on Java 8 but succeeds on Java 7.
>
> That's likely to indicate a fairly subtle bug, or reliance on a Java 7
> implementation detail that isn't contractual...
>
> James
>
>
> On 14/09/15 18:38, Maryann Xue wrote:
>
> Thank you, James! I have assigned the issue to myself.
>
> On Mon, Sep 14, 2015 at 7:39 AM James Heather <james.heat...@mendeley.com>
> wrote:
>
>> Reported as
>>
>> https://issues.apache.org/jira/browse/PHOENIX-2257
>>
>> On 14/09/15 12:24, James Heather wrote:
>> > I also have two failing integration tests in DerivedTableIT:
>> >
>> > Failed tests:
>> >   DerivedTableIT.testDerivedTableWithGroupBy:320 expected:<['e']> but
>> > was:<['b', 'c', 'e']>
>> >   DerivedTableIT.testDerivedTableWithGroupBy:320 expected:<['e']> but
>> > was:<['b', 'c', 'e']>
>> >
>> > Full output of test attached. You'll see there are a couple of
>> > TableInfoMissingExceptions in there, which might be related.
>> >
>> > It is possible that they're related to the previous unit test issue I
>> > reported; I don't know.
>> >
>> > It's tricky to set up a CDH5-compatible build until I have a vanilla
>> > build that passes.
>> >
>> > James
>>
>>
>


Re: Error: Encountered exception in sub plan [0] execution.

2015-09-11 Thread Maryann Xue
Hi Alberto,

Could you please check in your server log if there's an ERROR, probably
something like InsufficientMemoryException?


Thanks,
Maryann

On Fri, Sep 11, 2015 at 7:04 AM, Alberto Gonzalez Mesas  wrote:

> Hi!
>
> I create two tables:
>
> CREATE TABLE "Customers2" ("CustomerID" VARCHAR NOT NULL PRIMARY KEY,
> "C"."CustomerName" VARCHAR,
> "C"."Country" VARCHAR
> )
>
> and
>
> CREATE TABLE "Orders2" ("OrderID" VARCHAR NOT NULL PRIMARY KEY,
> "O"."CustomerID" VARCHAR,
> "O"."Date" VARCHAR,
> "O"."ItemID" VARCHAR,
> "O"."Quantity" VARCHAR
> )
>
> When I have a join, I received this msg:
>
> "Error: Encountered exception in sub plan [0] execution.
> SQLState:  null
> ErrorCode: 0"
>
> My phoenix version is 4.5.0
>
> Thanks!
>
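
For readers hitting the same error: the original join statement was not included in the thread, but with the two tables above it would typically look something like the following (a hypothetical illustration, not the poster's actual query):

SELECT c."CustomerName", o."Date", o."ItemID", o."Quantity"
FROM "Customers2" c
JOIN "Orders2" o ON o."CustomerID" = c."CustomerID";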


Re: yet another question...perhaps dumb...JOIN with two conditions

2015-09-11 Thread Maryann Xue
Hi Aaron,

As Jaime pointed out, it is a non-equi join. And unfortunately it is
handled as CROSS join in Phoenix and thus is not very efficient. For each
row from the left side, it will be joined with all of the rows from the
right side before the condition is applied to filter the joined result.
Try switching the left table and the right table in your query to see if it
will work a little better?


Thanks,
Maryann
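
To make the point concrete, here is a made-up sketch (hypothetical tables and columns, not taken from the thread): a non-equi join like the one below has no equality key to hash on, so Phoenix evaluates it as a cross join and filters afterwards, and swapping the two sides, as suggested above, changes how much intermediate data is produced.

-- non-equi join: every row of events is paired with every row of ranges,
-- then the range condition filters the joined result
SELECT e.id, r.label
FROM events e
JOIN ranges r ON e.ts >= r.start_ts AND e.ts < r.end_ts;

-- the same query with the two sides switched, per the suggestion above
SELECT e.id, r.label
FROM ranges r
JOIN events e ON e.ts >= r.start_ts AND e.ts < r.end_ts;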

On Fri, Sep 11, 2015 at 10:06 AM, M. Aaron Bossert 
wrote:

> Not sure where the problem is, but when I run the suggested query, I get
> the following error...and when I try is with the sort/merge join hint, I
> get yet a different error:
>
> java.lang.RuntimeException:
> org.apache.phoenix.exception.PhoenixIOException:
> org.apache.phoenix.exception.PhoenixIOException:
> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache
> for joinId: C}^U. The cache might have expired and have been removed.
>
> at
> org.apache.phoenix.coprocessor.HashJoinRegionScanner.(HashJoinRegionScanner.java:95)
>
> at
> org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:212)
>
> at
> org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:178)
>
> at
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1845)
>
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3173)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
>
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
>
> at sqlline.SqlLine.print(SqlLine.java:1653)
>
> at sqlline.Commands.execute(Commands.java:833)
>
> at sqlline.Commands.sql(Commands.java:732)
>
> at sqlline.SqlLine.dispatch(SqlLine.java:808)
>
> at sqlline.SqlLine.begin(SqlLine.java:681)
>
> at sqlline.SqlLine.start(SqlLine.java:398)
>
> at sqlline.SqlLine.main(SqlLine.java:292)
>
>
> and then the following is with the sort/merge join hint:
>
>
> 15/09/11 08:39:56 WARN client.ScannerCallable: Ignore, probably already
> closed
>
> org.apache.hadoop.hbase.UnknownScannerException:
> org.apache.hadoop.hbase.UnknownScannerException: Name: 658, already closed?
>
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3145)
>
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
>
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
> at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
>
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>
> at
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>
> at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>
> at
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:287)
>
> at
> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:303)
>
> at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:159)
>
> at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:58)
>
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:115)
>
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:91)
>
> at
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:481)
>
> at
> org.apache.phoenix.iterate.ScanningResultIterator.close(ScanningResultIterator.java:49)
>
> at
> org.apache.phoenix.iterate.TableResultIterator.close(TableResultIterator.java:95)
>
> at
> org.apache.phoenix.iterate.LookAheadResultIterator$1.close(LookAheadResultIterator.java:42)
>
> at
> org.apache.phoenix.iterate.ConcatResultIterator.close(ConcatResultIterator.java:70)
>
> at
> org.apache.phoenix.iterate.RoundRobinResultIterator$RoundRobinIterator.close(RoundRobinResultIterator.java:298)
>
> at
> org.apache.phoenix.iterate.RoundRobinResultIterator.close(RoundRobinResultIterator.java:134)
>
> at
> 

Could not find hash cache for joinId

2015-07-08 Thread Maryann Xue
Hi Alex,

Could you please try this new patch?


Thanks,
Maryann

On Wed, Jul 8, 2015 at 3:53 PM, Maryann Xue maryann@gmail.com wrote:

 Thanks again for all this information! Would you mind checking a couple
 more things for me? For test.table1, does it have its regions on all region
 servers in your cluster? And for region servers whose logs have that error
 message, do they have table1's regions and what are the startkeys of those
 regions?


 Thanks,
 Maryann

 On Wed, Jul 8, 2015 at 3:05 PM, Alex Kamil alex.ka...@gmail.com wrote:

 Maryann,


 - the patch didn't help when applied to the client (we havent put it on
 the server yet)
 - starting another client instance in a separate jvm and running the
 query there after the query fails on the first client  - returns the same
 error
 - the counts are : table1: 68834 rows, table2: 2138 rows
 - to support multitenancy we currently set MULTI_TENANT=true in the
 CREATE stmt
 - we use tenant-based connection with apache dbcp connection pool using
 this code:

 BasicDataSource ds = new BasicDataSource();

 ds.setDriverClassName("org.apache.phoenix.jdbc.PhoenixDriver");

 ds.setUrl("jdbc:phoenix:" + url);

 ds.setInitialSize(50);

 if (tenant != null) ds.setConnectionProperties("TenantId=" + tenant);

 return ds;
 - when we don't use tenant based connection there is no error
 - verified that the tenant_id used in tenant connection has access to
 the records (created with the same tenant_id)
 - the problem occurs only on the cluster but works in stand-alone mode

 - are there any settings to be set on server or client side in the code
 or in hbase-site.xml to enable multitenancy?
 - were there any bug fixes related to multitenancy or cache management in
 joins since 3.3.0

 thanks
 Alex

 On Tue, Jul 7, 2015 at 2:22 PM, Maryann Xue maryann@gmail.com wrote:

 It could be not the real cache expiration (which should not be
 considered a bug), since your increasing the cache live time didn't solve
 the problem. So the problem might be the cache had not been sent over to
 that server at all, which then would be a bug, and most likely it would
 be because the client didn't do it right.

 So starting a new client after the problem happens should be a good test
 of the above theory.

 Anyway, what's the approximate time of running a count(*) on your
 test.table2?


 Thanks,
 Maryann

 On Tue, Jul 7, 2015 at 1:53 PM, Alex Kamil alex.ka...@gmail.com wrote:

 Maryann,

 is this patch only for the client? as we saw the error in regionserver
 logs and it seems that server side cache has expired

 also by start a new process doing the same query do you mean start
 two client instances and run the query from one then from the other client?

 thanks
 Alex

 On Tue, Jul 7, 2015 at 1:20 PM, Maryann Xue maryann@gmail.com wrote:

 My question was actually if the problem appears on your cluster, will
 it go away if you just start a new process doing the same query? I do have
 a patch, but it only fixes the problem I assume here, and it might be
 something else.


 Thanks,
 Maryann

 On Tue, Jul 7, 2015 at 12:59 PM, Alex Kamil alex.ka...@gmail.com wrote:

 a patch would be great, we saw that this problem goes away in
 standalone mode but reappears on the cluster

 On Tue, Jul 7, 2015 at 12:56 PM, Alex Kamil alex.ka...@gmail.com wrote:

 sure, sounds good

 On Tue, Jul 7, 2015 at 10:57 AM, Maryann Xue maryann@gmail.com wrote:

 Hi Alex,

 I suspect it's related to using cached region locations that might
 have been invalid. A simple way to verify this is try starting a new 
 java
 process doing this query and see if the problem goes away.


 Thanks,
 Maryann

 On Mon, Jul 6, 2015 at 10:56 PM, Maryann Xue maryann@gmail.com wrote:

 Thanks a lot for the details, Alex! That might be a bug if it
 failed only on cluster and increasing cache alive time didn't
 help.
 Would you mind testing it out for me if I provide a simple patch 
 tomorrow?


 Thanks,
 Maryann

 On Mon, Jul 6, 2015 at 9:09 PM, Alex Kamil alex.ka...@gmail.com wrote:

 one more thing - the same query (via tenant connection) works in
 standalone mode but fails on a cluster.
 I've tried modifying
 phoenix.coprocessor.maxServerCacheTimeToLiveMs
 https://phoenix.apache.org/tuning.html from the default
 3(ms) to 30 with no effect

 On Mon, Jul 6, 2015 at 7:35 PM, Alex Kamil alex.ka...@gmail.com wrote

Re: Could not find hash cache for joinId

2015-07-07 Thread Maryann Xue
It could be not the real cache expiration (which should not be considered a
bug), since your increasing the cache live time didn't solve the problem.
So the problem might be the cache had not been sent over to that server at
all, which then would be a bug, and most likely it would be because the
client didn't do it right.

So starting a new client after the problem happens should be a good test of
the above theory.

Anyway, what's the approximate time of running a count(*) on your
test.table2?


Thanks,
Maryann

On Tue, Jul 7, 2015 at 1:53 PM, Alex Kamil alex.ka...@gmail.com wrote:

 Maryann,

 is this patch only for the client? as we saw the error in regionserver
 logs and it seems that server side cache has expired

 also by start a new process doing the same query do you mean start two
 client instances and run the query from one then from the other client?

 thanks
 Alex

 On Tue, Jul 7, 2015 at 1:20 PM, Maryann Xue maryann@gmail.com wrote:

 My question was actually if the problem appears on your cluster, will it
 go away if you just start a new process doing the same query? I do have a
 patch, but it only fixes the problem I assume here, and it might be
 something else.


 Thanks,
 Maryann

 On Tue, Jul 7, 2015 at 12:59 PM, Alex Kamil alex.ka...@gmail.com wrote:

 a patch would be great, we saw that this problem goes away in standalone
 mode but reappears on the cluster

 On Tue, Jul 7, 2015 at 12:56 PM, Alex Kamil alex.ka...@gmail.com
 wrote:

 sure, sounds good

 On Tue, Jul 7, 2015 at 10:57 AM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Alex,

 I suspect it's related to using cached region locations that might
 have been invalid. A simple way to verify this is try starting a new java
 process doing this query and see if the problem goes away.


 Thanks,
 Maryann

 On Mon, Jul 6, 2015 at 10:56 PM, Maryann Xue maryann@gmail.com
 wrote:

 Thanks a lot for the details, Alex! That might be a bug if it failed
 only on cluster and increasing cache alive time didn't help. Would
 you
 mind testing it out for me if I provide a simple patch tomorrow?


 Thanks,
 Maryann

 On Mon, Jul 6, 2015 at 9:09 PM, Alex Kamil alex.ka...@gmail.com
 wrote:

 one more thing - the same query (via tenant connection) works in
 standalone mode but fails on a cluster.
 I've tried modifying phoenix.coprocessor.maxServerCacheTimeToLiveMs
 https://phoenix.apache.org/tuning.html from the default 3(ms)
 to 30 with no effect

 On Mon, Jul 6, 2015 at 7:35 PM, Alex Kamil alex.ka...@gmail.com
 wrote:

 also pls note that it only fails with tenant-specific connections

 On Mon, Jul 6, 2015 at 7:17 PM, Alex Kamil alex.ka...@gmail.com
 wrote:

 Maryann,

 here is the query, I don't see warnings
 SELECT '\''||C.ROWKEY||'\'' AS RK, C.VS FROM  test.table1 AS C
 JOIN (SELECT DISTINCT B.ROWKEY, B.VS FROM test.table2 AS B) B ON
 (C.ROWKEY=B.ROWKEY AND C.VS=B.VS) LIMIT 2147483647;

 thanks
 Alex

 On Fri, Jul 3, 2015 at 10:36 PM, Maryann Xue 
 maryann@gmail.com wrote:

 Hi Alex,

 Most likely what happened was as suggested by the error message:
 the cache might have expired. Could you please check if there are any
 Phoenix warnings in the client log and share your query?



 Thanks,
 Maryann

 On Fri, Jul 3, 2015 at 4:01 PM, Alex Kamil alex.ka...@gmail.com
 wrote:

 getting this error with phoenix 3.3.0/hbase 0.94.15, any ideas?


 org.apache.phoenix.exception.PhoenixIOException: 
 org.apache.phoenix.exception.PhoenixIOException: 
 org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash 
 cache for joinId: ???Z
 ^XI??. The cache might have expired

 and have been removed.

 at 
 org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:96)

 at 
 org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:511)

 at 
 org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)

 at 
 org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84)

 at 
 org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111)

 at 
 org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)

 at 
 org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)

 at 
 org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)

 at 
 org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)

 at 
 org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)

 thanks
 Alex














Re: Could not find hash cache for joinId

2015-07-07 Thread Maryann Xue
Hi Alex,

I suspect it's related to using cached region locations that might have
been invalid. A simple way to verify this is try starting a new java
process doing this query and see if the problem goes away.


Thanks,
Maryann

On Mon, Jul 6, 2015 at 10:56 PM, Maryann Xue maryann@gmail.com wrote:

 Thanks a lot for the details, Alex! That might be a bug if it failed only
 on cluster and increasing cache alive time didn't help. Would you mind
 testing it out for me if I provide a simple patch tomorrow?


 Thanks,
 Maryann

 On Mon, Jul 6, 2015 at 9:09 PM, Alex Kamil alex.ka...@gmail.com wrote:

 one more thing - the same query (via tenant connection) works in
 standalone mode but fails on a cluster.
 I've tried modifying phoenix.coprocessor.maxServerCacheTimeToLiveMs
 https://phoenix.apache.org/tuning.html from the default 3(ms) to
 30 with no effect

 On Mon, Jul 6, 2015 at 7:35 PM, Alex Kamil alex.ka...@gmail.com wrote:

 also pls note that it only fails with tenant-specific connections

 On Mon, Jul 6, 2015 at 7:17 PM, Alex Kamil alex.ka...@gmail.com wrote:

 Maryann,

 here is the query, I don't see warnings
 SELECT '\''||C.ROWKEY||'\'' AS RK, C.VS FROM  test.table1 AS C JOIN
 (SELECT DISTINCT B.ROWKEY, B.VS FROM test.table2 AS B) B ON
 (C.ROWKEY=B.ROWKEY AND C.VS=B.VS) LIMIT 2147483647;

 thanks
 Alex

 On Fri, Jul 3, 2015 at 10:36 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Alex,

 Most likely what happened was as suggested by the error message: the
 cache might have expired. Could you please check if there are any Phoenix
 warnings in the client log and share your query?



 Thanks,
 Maryann

 On Fri, Jul 3, 2015 at 4:01 PM, Alex Kamil alex.ka...@gmail.com
 wrote:

 getting this error with phoenix 3.3.0/hbase 0.94.15, any ideas?


 org.apache.phoenix.exception.PhoenixIOException: 
 org.apache.phoenix.exception.PhoenixIOException: 
 org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache 
 for joinId: ???Z
 ^XI??. The cache might have expired

 and have been removed.

 at 
 org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:96)

 at 
 org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:511)

 at 
 org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)

 at 
 org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84)

 at 
 org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111)

 at 
 org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)

 at 
 org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)

 at 
 org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)

 at 
 org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)

 at 
 org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)

 thanks
 Alex









Re: Could not find hash cache for joinId

2015-07-03 Thread Maryann Xue
Hi Alex,

Most likely what happened was as suggested by the error message: the cache
might have expired. Could you please check if there are any Phoenix
warnings in the client log and share your query?



Thanks,
Maryann

On Fri, Jul 3, 2015 at 4:01 PM, Alex Kamil alex.ka...@gmail.com wrote:

 getting this error with phoenix 3.3.0/hbase 0.94.15, any ideas?


 org.apache.phoenix.exception.PhoenixIOException: 
 org.apache.phoenix.exception.PhoenixIOException: 
 org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache for 
 joinId: ???Z
 ^XI??. The cache might have expired

 and have been removed.

 at 
 org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:96)

 at 
 org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:511)

 at 
 org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)

 at 
 org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84)

 at 
 org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111)

 at 
 org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)

 at 
 org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)

 at 
 org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)

 at 
 org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)

 at 
 org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)

 thanks
 Alex



Re: Join create OOM with java heap space on phoenix client

2015-06-30 Thread Maryann Xue
Yes, you are right Krunal. The sort merge join performs sorting on both
sides of the join first. Looks like this won't work for you until PHOENIX-990 is
resolved.

What is the heap size of your Phoenix client? This is also related to the
exception you got when doing hash join (the very first exception).


Thanks,
Maryann

On Mon, Jun 29, 2015 at 7:08 PM, Krunal Varajiya krunal.varaj...@ask.com
wrote:

  Hi Maryann,

  Here is the error message with “USE_SORT_MERGE_JOIN” hint:

  java.lang.RuntimeException: java.io.IOException: Map failed

 at
 org.apache.phoenix.iterate.MappedByteBufferQueue$MappedByteBufferSegmentQueue$SegmentQueueFileIterator.readNext(MappedByteBufferQueue.java:399)

 at
 org.apache.phoenix.iterate.MappedByteBufferQueue$MappedByteBufferSegmentQueue$SegmentQueueFileIterator.next(MappedByteBufferQueue.java:379)

 at
 org.apache.phoenix.execute.SortMergeJoinPlan$MappedByteBufferTupleQueue$2.next(SortMergeJoinPlan.java:575)

 at
 org.apache.phoenix.execute.SortMergeJoinPlan$MappedByteBufferTupleQueue$2.next(SortMergeJoinPlan.java:557)

 at
 org.apache.phoenix.execute.SortMergeJoinPlan$BasicJoinIterator.next(SortMergeJoinPlan.java:263)

 at
 org.apache.phoenix.iterate.LookAheadResultIterator$1.advance(LookAheadResultIterator.java:47)

 at
 org.apache.phoenix.iterate.LookAheadResultIterator.next(LookAheadResultIterator.java:67)

 at
 org.apache.phoenix.iterate.BaseGroupedAggregatingResultIterator.next(BaseGroupedAggregatingResultIterator.java:79)

 at
 org.apache.phoenix.iterate.LookAheadResultIterator$1.advance(LookAheadResultIterator.java:47)

 at
 org.apache.phoenix.iterate.LookAheadResultIterator.init(LookAheadResultIterator.java:59)

 at
 org.apache.phoenix.iterate.LookAheadResultIterator.next(LookAheadResultIterator.java:65)

 at
 org.apache.phoenix.iterate.BaseGroupedAggregatingResultIterator.next(BaseGroupedAggregatingResultIterator.java:64)

 at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:764)

 at
 org.apache.phoenix.jdbc.PhoenixConnection.executeStatements(PhoenixConnection.java:283)

 at
 org.apache.phoenix.util.PhoenixRuntime.executeStatements(PhoenixRuntime.java:229)

 at org.apache.phoenix.util.PhoenixRuntime.main(PhoenixRuntime.java:175)

 Caused by: java.io.IOException: Map failed

 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:907)

 at
 org.apache.phoenix.iterate.MappedByteBufferQueue$MappedByteBufferSegmentQueue$SegmentQueueFileIterator.readNext(MappedByteBufferQueue.java:397)

 ... 15 more

 Caused by: java.lang.OutOfMemoryError: Map failed

 at sun.nio.ch.FileChannelImpl.map0(Native Method)

 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:904)

 ... 16 more

 Java HotSpot(TM) 64-Bit Server VM warning: Java HotSpot(TM) 64-Bit Server
 VM warning: INFO: os::commit_memory(0x7fef5d6d7000, 12288, 0) failed;
 error='Cannot allocate memory' (errno=12)

 Java HotSpot(TM) 64-Bit Server VM warning: INFO:
 os::commit_memory(0x7fee59edf000, 12288, 0) failed; error='Cannot
 allocate memory' (errno=12)

 INFO: os::commit_memory(0x7fef5f0f1000, 12288, 0) failed;
 error='Cannot allocate memory' (errno=12)[thread 140661688760064 also had
 an error]


  [thread 140666069784320 also had an error]

 #

 # There is insufficient memory for the Java Runtime Environment to
 continue.

 # Native memory allocation (mmap) failed to map 12288 bytes for committing
 reserved memory.

 # An error report file with more information is saved as:

 # /home/varajiyak/phoenix/bin/hs_err_pid34045.log


  Is this related to https://issues.apache.org/jira/browse/PHOENIX-990? Or
 like GC issue?

  Thanks,
 Krunal.


   From: Maryann Xue maryann@gmail.com
 Date: Tuesday, June 16, 2015 at 3:20 PM
 To: Krunal krunal.varaj...@ask.com
 Cc: user@phoenix.apache.org user@phoenix.apache.org
 Subject: Re: Join create OOM with java heap space on phoenix client

   Hi Krunal,

  Can you try with merge join by specifying the hint
 USE_SORT_MERGE_JOIN? and if it still does not work, would you mind
 posting the exact error message of running this merge-join?


  Thanks,
 Maryann

 On Tue, Jun 16, 2015 at 6:12 PM, Krunal Varajiya krunal.varaj...@ask.com
 wrote:

  Does anybody has any idea why below join is throwing OOM error? I will
 really appreciate any help here. We are stuck here is as none of our join
 works even with 5M rows.

   From: Krunal krunal.varaj...@ask.com
 Reply-To: user@phoenix.apache.org user@phoenix.apache.org
 Date: Wednesday, June 10, 2015 at 3:49 PM
 To: Maryann Xue maryann@gmail.com

 Cc: user@phoenix.apache.org user@phoenix.apache.org
 Subject: Re: Join create OOM with java heap space on phoenix client

   Hey Maryann,

  Sorry I just realized that I am using Phoenix 4.3.0.

  I am using sample tables generated using performance.py script from
 Phoenix package! I have generated 5M, 25M, 100M using this script and
 running join in these tables!

  Here is table definition:

   *   TABLE_CAT** | **
 TABLE_SCHEM

Re: Strategy on joining on partial keys

2015-06-30 Thread Maryann Xue
Hi Yiannis,

Could you please post your UPSERT query and the approximate size of both
tables? And does it happen every time you try to do the query?


Thanks,
Maryann

On Mon, Jun 29, 2015 at 6:19 AM, Yiannis Gkoufas johngou...@gmail.com
wrote:

 Hi there,

 I have two tables I want to join.

 TABLE_A: ( (A,B), C, D, E)  where (A,B) is the composite key
 TABLE_B: ( (A), C, D, E) where A is the key

 I basically want to join TABLE_A and TABLE_B on A and update TABLE_A with
 the values C, D, E coming from TABLE_B
 When I try to use UPSERT SELECT JOIN statement, it causes the regionserver
 to die, without any logs.
 TABLE_B is really small compared to TABLE_A

 Any hints on how to approach this?

 Thanks a lot!
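
For context, the statement described above would look roughly like the sketch below (table and column names follow the description in the email; this is an illustration only, not the query from the thread):

UPSERT INTO TABLE_A (A, B, C, D, E)
SELECT a.A, a.B, b.C, b.D, b.E
FROM TABLE_A a
JOIN TABLE_B b ON b.A = a.A;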



Re: count distinct

2015-06-23 Thread Maryann Xue
Which version of Phoenix are you using?

On Tuesday, June 23, 2015, Michael McAllister mmcallis...@homeaway.com
wrote:

  Hi

  (This questions relates to Phoenix 4.2 on HDP 2.2)

  I have a situation where I want to count the distinct combination of a
 couple of columns.

  When I try the following:-

  select count(distinct a.col1, b.col2)
 from table tab1 a
 inner join tab2 b on b.joincol = a.joincol
 where a.col3 = ‘some condition’
 and b.col4 = ‘some other condition';

  I get the following error:-

  Error: ERROR 605 (42P00): Syntax error. Unknown function:
 DISTINCT_COUNT. (state=42P00,code=605)

  Playing around with this it looks like count(distinct) works with a
 single column, but not more.

  So I try this:-

  SELECT count(*)
 FROM
   (SELECT a.col1,
   b.col2
FROM TABLE tab1 a
INNER JOIN tab2 b ON b.joincol = a.joincol
WHERE a.col3 = ‘SOME condition’
  AND b.col4 = ‘SOME other condition'
GROUP BY a.col1,
 b.col2) ;

  I get the following error:-

  Error: Complex nested queries not supported. (state=,code=0)

  So, my question … is there any way to get what I’m looking for?

  Regards,

  Mike





Re: count distinct

2015-06-23 Thread Maryann Xue
Sorry, I missed the first line. Your second query should work with Phoenix
4.3 or later.


I will investigate the problem with the first one and get back to you.


Thanks,
Maryann
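
As a possible interim workaround on 4.2, where COUNT(DISTINCT ...) only accepts a single expression, counting one concatenated expression is sometimes enough. This is an untested sketch, not something confirmed in the thread: it assumes both columns are VARCHAR (or are cast to VARCHAR) and that the separator character never appears in the data.

select count(distinct a.col1 || '|' || b.col2)
from tab1 a
inner join tab2 b on b.joincol = a.joincol
where a.col3 = 'some condition'
and b.col4 = 'some other condition';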

On Tuesday, June 23, 2015, Michael McAllister mmcallis...@homeaway.com
wrote:

  Hi

  (This questions relates to Phoenix 4.2 on HDP 2.2)

  I have a situation where I want to count the distinct combination of a
 couple of columns.

  When I try the following:-

  select count(distinct a.col1, b.col2)
 from table tab1 a
 inner join tab2 b on b.joincol = a.joincol
 where a.col3 = ‘some condition’
 and b.col4 = ‘some other condition';

  I get the following error:-

  Error: ERROR 605 (42P00): Syntax error. Unknown function:
 DISTINCT_COUNT. (state=42P00,code=605)

  Playing around with this it looks like count(distinct) works with a
 single column, but not more.

  So I try this:-

  SELECT count(*)
 FROM
   (SELECT a.col1,
   b.col2
FROM TABLE tab1 a
INNER JOIN tab2 b ON b.joincol = a.joincol
WHERE a.col3 = ‘SOME condition’
  AND b.col4 = ‘SOME other condition'
GROUP BY a.col1,
 b.col2) ;

  I get the following error:-

  Error: Complex nested queries not supported. (state=,code=0)

  So, my question … is there any way to get what I’m looking for?

  Regards,

  Mike





Re: StackOverflowError

2015-06-22 Thread Maryann Xue
Hi Bahubali,

Could you please share your query?


Thanks,
Maryann

On Mon, Jun 22, 2015 at 12:51 PM, Bahubali Jain bahub...@gmail.com wrote:

 Hi,
 I am running into below error when I execute a query which has a
 join,group by and order by.
 But when I run the same query with hint /*+ USE_SORT_MERGE_JOIN*/  , it
 runs well.
 Can anybody please shed some light on this.

 Error: Encountered exception in sub plan [0] execution. (state=,code=0)
 java.sql.SQLException: Encountered exception in sub plan [0] execution.
 at
 org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:156)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:251)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:241)
 at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:240)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1250)
 at sqlline.Commands.execute(Commands.java:822)
 at sqlline.Commands.sql(Commands.java:732)
 at sqlline.SqlLine.dispatch(SqlLine.java:808)
 at sqlline.SqlLine.begin(SqlLine.java:681)
 at sqlline.SqlLine.start(SqlLine.java:398)
 at sqlline.SqlLine.main(SqlLine.java:292)
 Caused by: java.lang.StackOverflowError
 at
 org.apache.phoenix.iterate.RoundRobinResultIterator$RoundRobinIterator.close(RoundRobinResultIterator.java:298)
 at
 org.apache.phoenix.iterate.RoundRobinResultIterator$RoundRobinIterator.close(RoundRobinResultIterator.java:298)




Re: Limitation with limit?

2015-06-20 Thread Maryann Xue
Hi Bahubali,

Thanks for reporting the issue! Could you please file a JIRA and add some
details? I'll verify and fix it as soon as I can.


Thanks,
Maryann

On Fri, Jun 19, 2015 at 3:53 AM, Bahubali Jain bahub...@gmail.com wrote:

 Hi,
 Is there any issue related to usage of limit ?
 select table1.x1,count(table.x2) from table1
 join
 table2 on table1.x1 = table2.x1
 group by x1 limit 20;

 This query always picks the first 20 rows from joined table for the group
 by computation

 Thanks,
 Baahu
 --
 Twitter:http://twitter.com/Baahu
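
Until the underlying issue is fixed, one thing worth trying is to push the aggregation into a derived table so that the LIMIT applies to the grouped result rather than to the joined rows. A rough, untested sketch based on the query above (derived tables over joins may require Phoenix 4.3 or later):

select * from (
select t1.x1, count(t2.x2) c
from table1 t1
join table2 t2 on t1.x1 = t2.x1
group by t1.x1
) limit 20;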




Re: Join create OOM with java heap space on phoenix client

2015-06-01 Thread Maryann Xue
Hi Krunal,

Thanks for the explanation! I am back to work now. Could you please also
post the table definition? I have some guess here, but need more
information to confirm. And which version of Phoenix are you using?


Thanks,
Maryann

On Wed, May 27, 2015 at 4:55 PM, Krunal Varajiya krunal.varaj...@ask.com
wrote:

   Hey Maryann

  Thanks for your reply! I understand it is difficult to access some stuff
 in China! I can wait till you come back, meantime I will keep you posted on
 if we make any progress on this!

  My comments are below in blue:

  Have a fun time in China!
  - Krunal


   From: Maryann Xue maryann@gmail.com
 Reply-To: user@phoenix.apache.org user@phoenix.apache.org
 Date: Tuesday, May 26, 2015 at 5:45 PM
 To: user@phoenix.apache.org user@phoenix.apache.org
 Subject: Re: Join create OOM with java heap space on phoenix client

  Hi Krunal,

  Sorry for the late reply. I have been on vacation.

  1. Can you make sure that the connection/statement is closed after each
 run of your query (even with exception)?

  *I think this is related to second issue mentioned below as it happens
 for join. I tried some other queries they are working fine!*

  2. You might want to try switching the join tables in your query first
 by putting the larger table as LHS, and if it still does not work, you can
 force a merge join by adding hint USE_SORT_MERGE_JOIN.

   *I have tried changing lhs and rhs, but no luck. I have also tried
 USE_SORT_MERGE_JOIN but for some reason it crashes region server while
 scanning result to local.*
  *I have also tried changing some tuning parameters, but none of the
 settings worked!*
  *One thing I am curious is why it is dumping data in local heap, I see
 millions of instances for org.apache.phoenix.expression.literalexpression.
 Shouldn't it execute joins on server side? *
  *I tried increasing memory upto 64gb and still it fails! **This is just
 one single connection and only one query running at a time, I am not sure
 what will happen with multiple connection and multiple queries!*
 *A**nd if you see in example my data size for both LHS and RHS is not
 that big. These are sample tables come with Phoenix.*

  Thanks,
 Maryann

 On Thursday, May 21, 2015, Krunal Varajiya krunal.varaj...@ask.com
 wrote:

  Hi

  I have 2 issues with phoenix client:

1. Heap memory is not cleanup after each query is finished. So, it
keeps increasing every time when we submit new query.
2. I am try to do a normal join operation on two tables but getting
exception. Below is the details:

  These are some sample queries I tried:

1. select p1.host, count(1) from PERFORMANCE_500 p1,
PERFORMANCE_2500 p2 where p1.host = p2.host group by p1.host;
2. select p1.host from PERFORMANCE_500 p1, PERFORMANCE_2500
p2 where p1.host = p2.host group by p1.host;
3. select count(1) from PERFORMANCE_500 p1, PERFORMANCE_2500
p2 where p1.host = p2.host group by p1.host;

  Here is explain plan:

  explain  select count(1) from PERFORMANCE_500 p1,
 PERFORMANCE_2500 p2 where p1.host = p2.host group by p1.host;

 +--+

 |   PLAN   |

 +--+

 | CLIENT 9-CHUNK PARALLEL 1-WAY FULL SCAN OVER PERFORMANCE_500 |

 | SERVER FILTER BY FIRST KEY ONLY  |

 | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [HOST] |

 | CLIENT MERGE SORT|

 | PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |

 | CLIENT 18-CHUNK PARALLEL 1-WAY FULL SCAN OVER
 PERFORMANCE_2500 |

 | SERVER FILTER BY FIRST KEY ONLY |

 | DYNAMIC SERVER FILTER BY HOST IN (P2.HOST) |

 +--+

 8 rows selected (0.127 seconds)

   Phoenix client heap size is 16GB. ( noticed that above queries are
 dumping data in local heap, I see millions of instances for
 org.apache.phoenix.expression.literalexpression)

 and my exceptions are:

 java.sql.SQLException: Encountered exception in sub plan [0] execution.

 at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:156)

 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:235)

 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:226)

 at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)

 at
 org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:225)

 at
 org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1066)

 at sqlline.Commands.execute(Commands.java:822)

 at sqlline.Commands.sql(Commands.java:732)

 at sqlline.SqlLine.dispatch(SqlLine.java:808)

 at sqlline.SqlLine.begin(SqlLine.java:681)

 at sqlline.SqlLine.start(SqlLine.java:398)

 at sqlline.SqlLine.main(SqlLine.java:292)

 Caused by: java.sql.SQLException:
 java.util.concurrent.ExecutionException: java.lang.Exception:
 java.lang.OutOfMemoryError: Java heap space

Re: Join create OOM with java heap space on phoenix client

2015-05-26 Thread Maryann Xue
Hi Krunal,

Sorry for the late reply. I have been on vacation.

1. Can you make sure that the connection/statement is closed after each run
of your query (even with exception)?

2. You might want to try switching the join tables in your query first by
putting the larger table as LHS, and if it still does not work, you can
force a merge join by adding hint USE_SORT_MERGE_JOIN.


Thanks,
Maryann
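
Concretely, the two suggestions above would look roughly like this, reusing the (truncated) table names as they appear later in the thread and assuming PERFORMANCE_2500 is the larger of the two; these are illustrative sketches only:

-- 1) put the larger table on the left-hand side of the join
select p1.host, count(1)
from PERFORMANCE_2500 p2, PERFORMANCE_500 p1
where p2.host = p1.host
group by p1.host;

-- 2) force a sort-merge join with a hint
select /*+ USE_SORT_MERGE_JOIN */ p1.host, count(1)
from PERFORMANCE_500 p1, PERFORMANCE_2500 p2
where p1.host = p2.host
group by p1.host;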

On Thursday, May 21, 2015, Krunal Varajiya krunal.varaj...@ask.com wrote:

  Hi

  I have 2 issues with phoenix client:

1. Heap memory is not cleanup after each query is finished. So, it
keeps increasing every time when we submit new query.
2. I am try to do a normal join operation on two tables but getting
exception. Below is the details:

  These are some sample queries I tried:

1. select p1.host, count(1) from PERFORMANCE_500 p1,
PERFORMANCE_2500 p2 where p1.host = p2.host group by p1.host;
2. select p1.host from PERFORMANCE_500 p1, PERFORMANCE_2500 p2
where p1.host = p2.host group by p1.host;
3. select count(1) from PERFORMANCE_500 p1, PERFORMANCE_2500
p2 where p1.host = p2.host group by p1.host;

  Here is explain plan:

  explain  select count(1) from PERFORMANCE_500 p1,
 PERFORMANCE_2500 p2 where p1.host = p2.host group by p1.host;

 +--+

 |   PLAN   |

 +--+

 | CLIENT 9-CHUNK PARALLEL 1-WAY FULL SCAN OVER PERFORMANCE_500 |

 | SERVER FILTER BY FIRST KEY ONLY  |

 | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [HOST] |

 | CLIENT MERGE SORT|

 | PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |

 | CLIENT 18-CHUNK PARALLEL 1-WAY FULL SCAN OVER
 PERFORMANCE_2500 |

 | SERVER FILTER BY FIRST KEY ONLY |

 | DYNAMIC SERVER FILTER BY HOST IN (P2.HOST) |

 +--+

 8 rows selected (0.127 seconds)

   Phoenix client heap size is 16GB. ( noticed that above queries are
 dumping data in local heap, I see millions of instances for
 org.apache.phoenix.expression.literalexpression)

 and my exceptions are:

 java.sql.SQLException: Encountered exception in sub plan [0] execution.

 at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:156)

 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:235)

 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:226)

 at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)

 at
 org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:225)

 at
 org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1066)

 at sqlline.Commands.execute(Commands.java:822)

 at sqlline.Commands.sql(Commands.java:732)

 at sqlline.SqlLine.dispatch(SqlLine.java:808)

 at sqlline.SqlLine.begin(SqlLine.java:681)

 at sqlline.SqlLine.start(SqlLine.java:398)

 at sqlline.SqlLine.main(SqlLine.java:292)

 Caused by: java.sql.SQLException: java.util.concurrent.ExecutionException:
 java.lang.Exception: java.lang.OutOfMemoryError: Java heap space

 at
 org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:247)

 at
 org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.java:83)

 at
 org.apache.phoenix.execute.HashJoinPlan$HashSubPlan.execute(HashJoinPlan.java:338)

 at org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:135)

 at java.util.concurrent.FutureTask.run(FutureTask.java:266)

 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

 at java.lang.Thread.run(Thread.java:745)

 Caused by: java.util.concurrent.ExecutionException: java.lang.Exception:
 java.lang.OutOfMemoryError: Java heap space

 at java.util.concurrent.FutureTask.report(FutureTask.java:122)

 at java.util.concurrent.FutureTask.get(FutureTask.java:206)

 at
 org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:239)

 ... 7 more

 Caused by: java.lang.Exception: java.lang.OutOfMemoryError: Java heap space

 at
 org.apache.phoenix.cache.ServerCacheClient$1.call(ServerCacheClient.java:212)

 at
 org.apache.phoenix.cache.ServerCacheClient$1.call(ServerCacheClient.java:182)

 ... 4 more

 Caused by: java.lang.OutOfMemoryError: Java heap space

 May 20, 2015 4:58:01 PM ServerCommunicatorAdmin reqIncoming

 WARNING: The server has decided to close this client connection.

 15/05/20 16:56:43 WARN client.HTable: Error calling coprocessor service
 org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService
 for row CSGoogle\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

 java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java
 heap space

 at java.util.concurrent.FutureTask.report(FutureTask.java:122)

 at 

Re: Error when using aggregates with correlated subqueries

2015-04-26 Thread Maryann Xue
Glad that your queries worked. Please do let us know if any further
problems.

The bug is with EXISTS. Right now Phoenix does not handle aggregate
function calls correctly within an EXISTS subquery or EXISTS nested
subquery. I have opened a JIRA:
https://issues.apache.org/jira/browse/PHOENIX-1923.


Thanks,
Maryann

On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad khellom...@gmail.com
wrote:


 Dear Maryann,

 I tried your query and it worked. I also executed a more complex query
 which I need in my testing, which is:

 select distinct W.rowId, W.md5_current, W.size_current from Wiki
 AS W
 where EXISTS (select * from History AS H
 where (REGEXP_SUBSTR(SUBSTR(H.rowId,19),'[^:]+')
 = W.rowId)
 AND H.timestamp = (select MAX(H2.timestamp)
 from History AS H2
   where
 (REGEXP_SUBSTR(SUBSTR(H2.rowId,19),'[^:]+') =

 REGEXP_SUBSTR(SUBSTR(H.rowId,19),'[^:]+'))
   AND (H2.status
 = 'approved')) AND (TO_NUMBER(H.value)  1000))
 AND NOT EXISTS (select * from History AS H3
   where
 (REGEXP_SUBSTR(SUBSTR(H3.rowId,19),'[^:]+') = W.rowId)
 AND (H3.status = 'pending') AND
 (TO_NUMBER(H3.value)  1000));

 and it also worked after I understood your fix of the original query.

 So the trick here is that we can use the reference to the outer query
 within the next subquery level only, and not up to two levels as I was
 doing? Maybe this limitation exists because Phoenix joins the tables from
 the outer and the inner correlated query, but it can perform this join up
 to one level only?



 Best,
 Khaleel


 On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Khaleel,

 Mind if you try the following query? I think it's the same semantics as
 you meant for your query.

 select distinct W.rowId, W.md5_current, W.size_current from Wiki
 AS W
 where
 W.rowId in (select HrowId from History AS H
  where H.timestamp =
 (select MAX(H2.timestamp) from
 History AS H2 where H2.rowId = H.rowId)
  AND
  H.status = 'approved')



 Thanks,
 Maryann

 On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad khellom...@gmail.com
 wrote:

 Dear Maryann,

 Thanks for your question. You are right: the query that I was writing
 wasn't the correct one for my purpose. The query that will satisfy my
 request would be:

 select distinct W.rowId, W.md5_current, W.size_current from Wiki
 AS W
 where
 EXISTS (select * from History AS H where (H.rowId = W.rowId) AND
 H.timestamp = (select MAX(H2.timestamp) from History AS H2
 where (H2.rowId = W.rowId)) AND
  (H.status = 'approved') )

 In this query I specify that I need to select the exact version which
 has the Maximum timestamp among all versions of the same data item by using
 MAX(H2.timestamp) within the inner subquery within EXISTS.

 However I tried such query and it still produces the same error as the
 old query, which is: (Aggregate may not contain columns not in GROUP BY.
 )

 Thank you for your help.


 Best,
 Khaleel


 On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Khaleel,

 Thanks for the explanation! But my question was since this is an
 EXISTS, I assume if there ever is a H.timestamp with an 'approved'
 status, the EXISTS will return true regardless of whether you are testing
 H.timestamp or Max(H.timestamp). Is that correct? or have I missed
 something?


 Thanks,
 Maryann

 On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad khellom...@gmail.com
 wrote:


 Hello Maryann,

 Thanks very much for your reply. Hopefully this bug gets fixed in the
 next release so that I can continue working with this part in my research
 project. Thanks for keeping me posted.

 With respect to your question, I am using the History table as a
 data store of all versions of a certain data item. When I say Max(
 H.timestamp) inside the query, I am selecting the latest version
 (most recent) that is approved (which is checked using the condition 
 H.status
 = 'approved')



 Best Regards,
 Khaleel

 On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Khaleel,

 Thanks a lot for reporting the problem, which looks like a bug. I
 will file a JIRA and keep you posted.

 One question though, why would we use MAX(H.timestamp) instead of 
 H.timestamp?
 What difference would it make?


 Thanks,
 Maryann

 On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad 
 khellom...@gmail.com wrote:


 Hello,

 I tried executing the following query in Phoenix:
 select distinct W.rowId, W.md5_current, W.size_current from
 Wiki AS W where EXISTS( select MAX(H.timestamp) from History AS H
 where (H.rowId = W.rowId) AND (H.status = 'approved') );

 I got the following error:
 ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP
 BY. (state=42Y27,code

Re: Error when using aggregates with correlated subqueries

2015-04-26 Thread Maryann Xue
Hi Khaleel,

Thanks for looking into the problem! But there IS a bug with EXISTS having
an aggregate function, say max(H.timestamp) as used in your very first
query that did not work. Otherwise, if subquery is SELECT * or SELECT
column1 or SELECT substr(column1), it's fine coz there is no aggregate.

And interestingly the query with W.rowId failed because it referenced the
outermost query that had EXISTS. Otherwise, if using H.rowId, it only had to
deal with SELECT * from H, which was a comparison subquery and so it worked
fine.

Anyway, I will verify if there is any problem with multiple level outer
reference in correlated subqueries, as an independent issue without EXISTS.

Given that our EXISTS support is currently incomplete, you may want to
rewrite your EXISTS subqueries with equivalent IN subqueries, which would
most likely just work.
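
For example, a rough sketch of that kind of rewrite (using the Wiki/History
tables from this thread, with a simplified filter): an EXISTS test such as

select distinct W.rowId from Wiki AS W
where EXISTS (select * from History AS H
              where H.rowId = W.rowId AND H.status = 'approved');

can be expressed as the equivalent IN form

select distinct W.rowId from Wiki AS W
where W.rowId IN (select H.rowId from History AS H
                  where H.status = 'approved');

which avoids the EXISTS handling altogether.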


Thanks,
Maryann


On Sun, Apr 26, 2015 at 11:54 AM, khaleel mershad khellom...@gmail.com
wrote:


 I don't think that the bug is with EXISTS. As you can see with the query
 from my last email, it contains a nested correlated subquery which contains
 an aggregate within *EXISTS* and it worked after replacing the reference
 to the outer table (W.rowId) with a reference to the inner subquery table
 (H.rowId) within the aggregate subquery. In other words, the following
 query generates an error:

 select distinct W.rowId, W.md5_current, W.size_current from Wiki
 AS W
 where EXISTS (select * from History AS H
 where (REGEXP_SUBSTR(SUBSTR(H.rowId,19),'[^:]+')
 = W.rowId)
 AND H.timestamp = (select MAX(H2.timestamp)
 from History AS H2
   where (*REGEXP_SUBSTR(SUBSTR(H2.rowId,19),'[^:]+') = W.rowId*)
   AND (H2.status
 = 'approved')) AND (TO_NUMBER(H.value)  1000))
 AND NOT EXISTS (select * from History AS H3
   where
 (REGEXP_SUBSTR(SUBSTR(H3.rowId,19),'[^:]+') = W.rowId)
 AND (H3.status = 'pending') AND
 (TO_NUMBER(H3.value)  1000));


 But if I replace *W.rowId* with 
 REGEXP_SUBSTR(SUBSTR(*H.rowId*,19),'[^:]+'),
 the query works fine (which is the query from my last email). So I think
 the problem is not with EXISTS, but with the fact that a reference to a
 table from the outer query (W in my query) can be done up to a maximum one
 nested level, and not more than that. In other words, referring to a table
 from the outer query from within a subquery that is in the second or more
 nested level will generate an error.



 Best,
 Khaleel



 On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue maryann@gmail.com
 wrote:

 Glad that your queries worked. Please do let us know if any further
 problems.

 The bug is with EXISTS. Right now Phoenix does not handle aggregate
 function calls correctly within an EXISTS subquery or EXISTS nested
 subquery. I have opened a JIRA:
 https://issues.apache.org/jira/browse/PHOENIX-1923.


 Thanks,
 Maryann


 On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad khellom...@gmail.com
 wrote:


 Dear Maryann,

 I tried your query and it worked. I also executed a more complex query
 which I need in my testing, which is:

 select distinct W.rowId, W.md5_current, W.size_current from Wiki
 AS W
 where EXISTS (select * from History AS H
 where
 (REGEXP_SUBSTR(SUBSTR(H.rowId,19),'[^:]+') = W.rowId)
 AND H.timestamp = (select MAX(H2.timestamp)
 from History AS H2
   where
 (REGEXP_SUBSTR(SUBSTR(H2.rowId,19),'[^:]+') =

 REGEXP_SUBSTR(SUBSTR(H.rowId,19),'[^:]+'))
   AND
 (H2.status = 'approved')) AND (TO_NUMBER(H.value)  1000))
 AND NOT EXISTS (select * from History AS H3
   where
 (REGEXP_SUBSTR(SUBSTR(H3.rowId,19),'[^:]+') = W.rowId)
 AND (H3.status = 'pending') AND
 (TO_NUMBER(H3.value)  1000));

 and it also worked after I understood your fix of the original query.

 So the trick here is that we can use the reference to the outer query
 within the next subquery level only, and not up to two levels as I was
 doing? Maybe this limitation exists because Phoenix joins the tables from
 the outer and the inner correlated query, but it can perform this join up
 to one level only?



 Best,
 Khaleel


 On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Khaleel,

 Mind if you try the following query? I think it's the same semantics as
 you meant for your query.

 select distinct W.rowId, W.md5_current, W.size_current from
 Wiki AS W
 where
 W.rowId in (select H.rowId from History AS H
  where H.timestamp =
 (select MAX(H2.timestamp) from
 History AS H2 where H2.rowId = H.rowId)
  AND
  H.status = 'approved')



 Thanks

Re: Error when using aggregates with correlated subqueries

2015-04-26 Thread Maryann Xue
Thank you in advance! Just let me know.


Thanks,
Maryann

On Sunday, April 26, 2015, khaleel mershad khellom...@gmail.com wrote:


 Thanks for the explanation. I see what you mean. I will also try testing a
 query that contains nested correlated subqueries with multiple level
 outer reference, and with using IN instead of EXISTS and I will see if it
 produces an error.



 Best,
 Khaleel

 On Sun, Apr 26, 2015 at 7:14 PM, Maryann Xue maryann@gmail.com wrote:

 Hi Khaleel,

 Thanks for looking into the problem! But there IS a bug with EXISTS
 having an aggregate function, say max(H.timestamp) as used in your very
 first query that did not work. Otherwise, if subquery is SELECT * or SELECT
 column1 or SELECT substr(column1), it's fine coz there is no aggregate.

 And interestingly the query with W.rowId failed because it referenced the
 outermost query that had EXISTS. Otherwise, if using H.rowId, it only had to
 deal with SELECT * from H, which was a comparison subquery and so it worked
 fine.

 Anyway, I will verify if there is any problem with multiple level outer
 reference in correlated subqueries, as an independent issue without EXISTS.

 Given that our EXISTS support is currently incomplete, you may want to
 rewrite your EXISTS subqueries with equivalent IN subqueries, which would
 most likely just work.


 Thanks,
 Maryann


 On Sun, Apr 26, 2015 at 11:54 AM, khaleel mershad khellom...@gmail.com wrote:


 I don't think that the bug is with EXISTS. As you can see with the query
 from my last email, it contains a nested correlated subquery which contains
 an aggregate within *EXISTS* and it worked after replacing the
 reference to the outer table (W.rowId) with a reference to the inner
 subquery table (H.rowId) within the aggregate subquery. In other words,
 the following query generates an error:

 select distinct W.rowId, W.md5_current, W.size_current from Wiki
 AS W
 where EXISTS (select * from History AS H
 where
 (REGEXP_SUBSTR(SUBSTR(H.rowId,19),'[^:]+') = W.rowId)
 AND H.timestamp = (select MAX(H2.timestamp)
 from History AS H2
   where (*REGEXP_SUBSTR(SUBSTR(H2.rowId,19),'[^:]+') = W.rowId*)
   AND
 (H2.status = 'approved')) AND (TO_NUMBER(H.value)  1000))
 AND NOT EXISTS (select * from History AS H3
   where
 (REGEXP_SUBSTR(SUBSTR(H3.rowId,19),'[^:]+') = W.rowId)
 AND (H3.status = 'pending') AND
 (TO_NUMBER(H3.value)  1000));


 But if I replace *W.rowId* with 
 REGEXP_SUBSTR(SUBSTR(*H.rowId*,19),'[^:]+'),
 the query works fine (which is the query from my last email). So I think
 the problem is not with EXISTS, but with the fact that a reference to a
 table from the outer query (W in my query) can be done up to a maximum one
 nested level, and not more than that. In other words, referring to a table
 from the outer query from within a subquery that is in the second or more
 nested level will generate an error.



 Best,
 Khaleel



 On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue maryann@gmail.com wrote:

 Glad that your queries worked. Please do let us know if any further
 problems.

 The bug is with EXISTS. Right now Phoenix does not handle aggregate
 function calls correctly within an EXISTS subquery or EXISTS nested
 subquery. I have opened a JIRA:
 https://issues.apache.org/jira/browse/PHOENIX-1923.


 Thanks,
 Maryann


 On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad khellom...@gmail.com wrote:


 Dear Maryann,

 I tried your query and it worked. I also executed a more complex query
 which I need in my testing, which is:

 select distinct W.rowId, W.md5_current, W.size_current from
 Wiki AS W
 where EXISTS (select * from History AS H
 where
 (REGEXP_SUBSTR(SUBSTR(H.rowId,19),'[^:]+') = W.rowId)
 AND H.timestamp = (select
 MAX(H2.timestamp) from History AS H2
   where
 (REGEXP_SUBSTR(SUBSTR(H2.rowId,19),'[^:]+') =

 REGEXP_SUBSTR(SUBSTR(H.rowId,19),'[^:]+'))
   AND
 (H2.status = 'approved')) AND (TO_NUMBER(H.value)  1000))
 AND NOT EXISTS (select * from History AS H3
   where
 (REGEXP_SUBSTR(SUBSTR(H3.rowId,19),'[^:]+') = W.rowId)
 AND (H3.status = 'pending') AND
 (TO_NUMBER(H3.value)  1000));

 and it also worked after I understood your fix of the original query.

 So the trick here is that we can use the reference to the outer query
 within the next subquery level only, and not up to two levels as I was
 doing? Maybe

Re: Error when using aggregates with correlated subqueries

2015-04-25 Thread Maryann Xue
Hi Khaleel,

Mind if you try the following query? I think it's the same semantics as you
meant for your query.

select distinct W.rowId, W.md5_current, W.size_current from Wiki AS
W
where
W.rowId in (select H.rowId from History AS H
 where H.timestamp =
(select MAX(H2.timestamp) from History
AS H2 where H2.rowId = H.rowId)
 AND
 H.status = 'approved')



Thanks,
Maryann

On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad khellom...@gmail.com
wrote:

 Dear Maryann,

 Thanks for your question. You are right: the query that I was writing
 wasn't the correct one for my purpose. The query that will satisfy my
 request would be:

 select distinct W.rowId, W.md5_current, W.size_current from Wiki
 AS W
 where
 EXISTS (select * from History AS H where (H.rowId = W.rowId) AND
 H.timestamp = (select MAX(H2.timestamp) from History AS H2
 where (H2.rowId = W.rowId)) AND
  (H.status = 'approved') )

 In this query I specify that I need to select the exact version which has
 the Maximum timestamp among all versions of the same data item by using
 MAX(H2.timestamp) within the inner subquery within EXISTS.

 However I tried such query and it still produces the same error as the old
 query, which is: (Aggregate may not contain columns not in GROUP BY.)

 Thank you for your help.


 Best,
 Khaleel


 On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Khaleel,

 Thanks for the explanation! But my question was since this is an EXISTS,
 I assume if there ever is a H.timestamp with an 'approved' status, the
 EXISTS will return true regardless of whether you are testing H.timestamp
 or Max(H.timestamp). Is that correct? or have I missed something?


 Thanks,
 Maryann

 On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad khellom...@gmail.com
 wrote:


 Hello Maryann,

 Thanks very much for your reply. Hopefully this bug gets fixed in the
 next release so that I can continue working with this part in my research
 project. Thanks for keeping me posted.

 With respect to your question, I am using the History table as a data
 store of all versions of a certain data item. When I say Max(
 H.timestamp) inside the query, I am selecting the latest version
 (most recent) that is approved (which is checked using the condition 
 H.status
 = 'approved')



 Best Regards,
 Khaleel

 On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Khaleel,

 Thanks a lot for reporting the problem, which looks like a bug. I will
 file a JIRA and keep you posted.

 One question though, why would we use MAX(H.timestamp) instead of 
 H.timestamp?
 What difference would it make?


 Thanks,
 Maryann

 On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad khellom...@gmail.com
 wrote:


 Hello,

 I tried executing the following query in Phoenix:
 select distinct W.rowId, W.md5_current, W.size_current from
 Wiki AS W where EXISTS( select MAX(H.timestamp) from History AS H
 where (H.rowId = W.rowId) AND (H.status = 'approved') );

 I got the following error:
 ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP BY.
 (state=42Y27,code=1018)

 If I replace W.rowId (in the where clause) with a constant value
 (for example: '3587'), the query works fine. Also, if I replace the
 aggregate MAX(H.timestamp) with H.timestamp, the query also works 
 fine.

 So it seems that Phoenix generates an error when using a reference to an
 outer query while using an aggregate within the inner query.

 Any solutions?



 Regards,

 Khaleel Mershad, Ph.D.
 Research Associate

 American University of Beirut
 Department of Electrical and Computer Engineering
 Bliss Street, Beirut, Lebanon
 email: kw...@aub.edu.lb








Re: Non-equi joins

2015-03-25 Thread Maryann Xue
Yes, 4.3 only.

On Wed, Mar 25, 2015 at 1:25 PM, Jaime Solano jdjsol...@gmail.com wrote:

 Thanks for your response, Maryann!

 Again, this suggestion is for 4.3 only, right?
 On Mar 25, 2015 12:34 PM, Maryann Xue maryann@gmail.com wrote:

 Actually we do in 4.3, but apparently not in an efficient way. If no equi
 conditions are specified, Phoenix simply does cross join and applies a post
 filter to the cross joined results. And we do not support non-equi
 conditions in ON clause, so non-equi outer join is currently impossible.
 But if you are doing inner joins, you can adjust your query as:

 SELECT * FROM T1, T2 WHERE T1.COL1 = T2.COL2 AND T1.COL1 = T2.COL3;
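
 As an illustrative sketch (the >= / <= range conditions here are only
 assumed examples), a non-equi inner join written with an ON clause, e.g.

  SELECT * FROM T1 INNER JOIN T2 ON T1.COL1 >= T2.COL2 AND T1.COL1 <= T2.COL3;

 is not accepted, but moving the conditions into the WHERE clause of a comma
 join, where they are applied as a post filter on the cross join, should work:

  SELECT * FROM T1, T2 WHERE T1.COL1 >= T2.COL2 AND T1.COL1 <= T2.COL3;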


 On Wed, Mar 25, 2015 at 9:29 AM, Jaime Solano jdjsol...@gmail.com
 wrote:

 Hi guys,
 I'm trying to figure out a way to join two tables with non-equi
 conditions. Basically, something like:
 SELECT * FROM T1 LEFT JOIN T2 ON T1.COL1 = T2.COL2 AND T1.COL1 =
 T2.COL3;
 I'm using Phoenix 4.2.0, which doesn't support non-equi joins.
 Is this feature supported in version 4.3? If not, is there a workaround
 to make it work in 4.2.0?
 Thanks in advance!
 -Jaime





Re: Non-equi joins

2015-03-25 Thread Maryann Xue
No. Both the cross join and the post filter are done on the server side if
a hash join is possible, otherwise on the client side by a sort merge join.


Thanks,
Maryann

On Wed, Mar 25, 2015 at 2:27 PM, Abe Weinograd a...@flonet.com wrote:

 Is this all applied on the client?

 Thanks,
 Abe

 On Wed, Mar 25, 2015 at 1:59 PM, Maryann Xue maryann@gmail.com
 wrote:

 Yes, 4.3 only.

 On Wed, Mar 25, 2015 at 1:25 PM, Jaime Solano jdjsol...@gmail.com
 wrote:

 Thanks for your response, Maryann!

 Again, this suggestion is for 4.3 only, right?
 On Mar 25, 2015 12:34 PM, Maryann Xue maryann@gmail.com wrote:

 Actually we do in 4.3, but apparently not in an efficient way. If no
 equi conditions are specified, Phoenix simply does cross join and applies a
 post filter to the cross joined results. And we do not support non-equi
 conditions in ON clause, so non-equi outer join is currently impossible.
 But if you are doing inner joins, you can adjust your query as:

 SELECT * FROM T1, T2 WHERE T1.COL1 = T2.COL2 AND T1.COL1 = T2.COL3;


 On Wed, Mar 25, 2015 at 9:29 AM, Jaime Solano jdjsol...@gmail.com
 wrote:

 Hi guys,
 I'm trying to figure out a way to join two tables with non-equi
 conditions. Basically, something like:
 SELECT * FROM T1 LEFT JOIN T2 ON T1.COL1 = T2.COL2 AND T1.COL1 =
 T2.COL3;
 I'm using Phoenix 4.2.0, which doesn't support non-equi joins.
 Is this feature supported in version 4.3? If not, is there a
 workaround to make it work in 4.2.0?
 Thanks in advance!
 -Jaime







Re: Using Hints in Phoenix

2015-03-19 Thread Maryann Xue
Hi Matt,

Thank you very much for the investigation! Actually I tried with Java code
in a unit test (for I don't have an standalone HBase environment right
now), but could not reproduce it anyway. Then I thought it could be using
createStatement() vs prepareStatement() after seeing your
investigation, but again that was not it.

But anyway, it just occurred to me that there might be an easy way to force
running sort-merge-join (just for the purpose of testing), by running a
query with FULL OUTER JOIN. Could you please try that out and see what
happens?
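
For instance, a quick sketch using the testtable1/testtable2 tables from your
explain plan (illustrative only):

explain SELECT * FROM testtable1 t1
FULL OUTER JOIN testtable2 t2 ON t2.rowid = t1.rowid;

Since a full outer join can only be executed through the sort-merge path, the
plan should come back starting with SORT-MERGE-JOIN (FULL) TABLES.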


Thanks,
Maryann


On Thu, Mar 19, 2015 at 7:54 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Maryann,



 Sending this to the whole user group now I have a bit more investigation.
 Managed to do some digging and also figured out how to use sqlline.py, and
 got the following results.



 With sqlline.py:



 *0: jdbc:phoenix:sales1,sales2,sales3 explain SELECT /*+
 USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2 on t2.rowid
 = t1.rowid;*



 *+--+*

 *|   PLAN   |*

 *+--+*

 *| SORT-MERGE-JOIN (INNER) TABLES   |*

 *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1 |*

 *| AND  |*

 *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2 |*

 *| SERVER SORTED BY [T2.ROWID]  |*

 *| CLIENT MERGE SORT|*

 *+--+*

 *6 rows selected (0.035 seconds)*





 So it does seem to work from sqlline!! Then, I decided to try and use it
 from inside a Java class:



  Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

  Connection conn = DriverManager.getConnection(
      "jdbc:phoenix:sales1,sales2,sales3", "", "");

  // Create a Statement class to execute the SQL statement
  Statement stmtLimited = conn.createStatement();

  System.out.println("Executing statement");

  // Execute the SQL statement and get the results in a ResultSet
  ResultSet rsLimited = stmtLimited.executeQuery("explain SELECT /*+ USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2 on t2.rowid = t1.rowid");
  while (rsLimited.next()) {
      String plan = rsLimited.getString("PLAN");
      System.out.println(plan);
  }

  stmtLimited.close();
  conn.close();



 And I get:



 Executing statement

 CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1

 PARALLEL INNER-JOIN TABLE 0

 CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2

 DYNAMIC SERVER FILTER BY ROWID BETWEEN MIN/MAX OF (T2.ROWID)





 So it seems that when executed from Java code (Squirrel is also a
 Java-based client) it ignores the hint! What do you think? Is there a way
 around this?



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 17 March 2015 16:23
 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 Sorry that I still could not reproduce the issue. Could you try
 reproducing it in a unit test?





 Thanks,

 Maryann



 On Fri, Mar 13, 2015 at 6:20 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 All hbases in my cluster have:



 *phoenix-4.3.0-server.jar*



 in the lib folder, and my client is using:



 *phoenix-4.3.0-client.jar*



 I generated the MD5 in case something has gone wrong with the versioning:



 *e0ade979e7b444fb8f8f4b7b5578edab*



 And I have opened up the jar, and can see the new class
 *SortMergeJoinPlan.class*, so presumably I have the right version – is
 there anything else I can check?



 Cheers,

 Matt



 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 12 March 2015 23:02


 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 I checked on my side. Sort-merge-join would work for both tables and
 views. And I also verified that 4.3 branch does have the corresponding
 check-in (
 https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=ebc7ee42cdb2b05a293f54dc687ca975db9acbc3)
 although the check-in message had a little mistake there.



 Could you please verify your Phoenix library version again, Matt?
 Especially the client.





 Thanks,

 Maryann



 On Thu, Mar 12, 2015 at 6:00 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Matt,



 Thanks for sharing the query. Using that hint should supposedly force
 sort-merge join no matter what. I will go ahead and verify that.





 Thanks,

 Maryann



 On Thu, Mar 12, 2015 at 2:25 PM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 My views:



 *create view mytestview1 *

 *(*

 *rowid varchar primary

Re: Using Hints in Phoenix

2015-03-19 Thread Maryann Xue
By the way, to answer your previous questions, Phoenix joins have not
started to use stats so far, but the hints are parsed and handled in a
universal way regardless of what type of query it is.

Thanks,
Maryann

On Thu, Mar 19, 2015 at 12:03 PM, Maryann Xue maryann@gmail.com wrote:

 Hi Matt,

 Thank you very much for the investigation! Actually I tried with Java code
 in a unit test (for I don't have a standalone HBase environment right
 now), but could not reproduce it anyway. Then I thought it could be using
 createStatement() vs prepareStatement() after seeing your
 investigation, but again that was not it.

 But anyway, it just occurred to me that there might be an easy way to
 force running sort-merge-join (just for the purpose of testing), by running
 a query with FULL OUTER JOIN. Could you please try that out and see what
 happens?


 Thanks,
 Maryann


 On Thu, Mar 19, 2015 at 7:54 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 Sending this to the whole user group now I have a bit more investigation.
 Managed to do some digging and also figured out how to use sqlline.py, and
 got the following results.



 With sqlline.py:



 *0: jdbc:phoenix:sales1,sales2,sales3 explain SELECT /*+
 USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2 on t2.rowid
 = t1.rowid;*



 *+--+*

 *|   PLAN   |*

 *+--+*

 *| SORT-MERGE-JOIN (INNER) TABLES   |*

 *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1 |*

 *| AND  |*

 *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2 |*

 *| SERVER SORTED BY [T2.ROWID]  |*

 *| CLIENT MERGE SORT|*

 *+--+*

 *6 rows selected (0.035 seconds)*





 So it does seem to work from sqlline!! Then, I decided to try and use it
 from inside a Java class:



  Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

  Connection conn = DriverManager.getConnection(
      "jdbc:phoenix:sales1,sales2,sales3", "", "");

  // Create a Statement class to execute the SQL statement
  Statement stmtLimited = conn.createStatement();

  System.out.println("Executing statement");

  // Execute the SQL statement and get the results in a ResultSet
  ResultSet rsLimited = stmtLimited.executeQuery("explain SELECT /*+ USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2 on t2.rowid = t1.rowid");
  while (rsLimited.next()) {
      String plan = rsLimited.getString("PLAN");
      System.out.println(plan);
  }

  stmtLimited.close();
  conn.close();



 And I get:



 Executing statement

 CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1

 PARALLEL INNER-JOIN TABLE 0

 CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2

 DYNAMIC SERVER FILTER BY ROWID BETWEEN MIN/MAX OF (T2.ROWID)





 So it seems that when executed from Java code (Squirrel is also a
 Java-based client) it ignores the hint! What do you think? Is there a way
 around this?



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 17 March 2015 16:23
 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 Sorry that I still could not reproduce the issue. Could you try
 reproducing it in a unit test?





 Thanks,

 Maryann



 On Fri, Mar 13, 2015 at 6:20 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 All hbases in my cluster have:



 *phoenix-4.3.0-server.jar*



 in the lib folder, and my client is using:



 *phoenix-4.3.0-client.jar*



 I generated the MD5 in case something has gone wrong with the versioning:



 *e0ade979e7b444fb8f8f4b7b5578edab*



 And I have opened up the jar, and can see the new class
 *SortMergeJoinPlan.class*, so presumably I have the right version – is
 there anything else I can check?



 Cheers,

 Matt



 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 12 March 2015 23:02


 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 I checked on my side. Sort-merge-join would work for both tables and
 views. And I also verified that 4.3 branch does have the corresponding
 check-in (
 https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=ebc7ee42cdb2b05a293f54dc687ca975db9acbc3)
 although the check-in message had a little mistake there.



 Could you please verify your Phoenix library version again, Matt?
 Especially the client.





 Thanks,

 Maryann



 On Thu, Mar 12, 2015 at 6:00 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Matt,



 Thanks for sharing the query

Re: Using Hints in Phoenix

2015-03-19 Thread Maryann Xue
Hi Matt,

Ah, there we go. I think your Squirrel and Java program somehow referred to
the old client JAR and that's why.


Thanks,
Maryann

On Thu, Mar 19, 2015 at 12:28 PM, Matthew Johnson matt.john...@algomi.com
wrote:

 Thanks Maryann for looking into it. I tried to run the following:



 *// Execute the SQL statement and get the results in a Resultset*

 *   ResultSet rsLimited = stmtLimited.executeQuery(**explain SELECT
 /*+ USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 full outer join testtable2
 t2 on t2.rowid = t1.rowid**);*

*while**(rsLimited.next()) {*

 *  String plan = rsLimited.getString(**PLAN**);*

 *  System.**out**.println(plan);*

 *   }*



 But got this error:



 Executing statement

 *java.sql.SQLFeatureNotSupportedException*: Full joins not supported.

   at org.apache.phoenix.compile.QueryCompiler.compileJoinQuery(
 *QueryCompiler.java:248*)

   at org.apache.phoenix.compile.QueryCompiler.compile(
 *QueryCompiler.java:138*)

   at
 org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(
 *PhoenixStatement.java:322*)

   at
 org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(
 *PhoenixStatement.java:305*)

   at
 org.apache.phoenix.jdbc.PhoenixStatement$ExecutableExplainStatement.compilePlan(
 *PhoenixStatement.java:380*)

   at
 org.apache.phoenix.jdbc.PhoenixStatement$ExecutableExplainStatement.compilePlan(
 *PhoenixStatement.java:360*)

   at org.apache.phoenix.jdbc.PhoenixStatement$1.call(
 *PhoenixStatement.java:221*)

   at org.apache.phoenix.jdbc.PhoenixStatement$1.call(
 *PhoenixStatement.java:217*)

   at org.apache.phoenix.call.CallRunner.run(*CallRunner.java:53*)

   at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(
 *PhoenixStatement.java:216*)

   at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(
 *PhoenixStatement.java:1030*)

   at com.algomi.phoenix.PhoenixExplainPlan.main(
 *PhoenixExplainPlan.java:24*)





 Although this, too, seems to work from sqlline:



 *0: jdbc:phoenix:sales1,sales2,sales3 explain SELECT * FROM testtable1 t1
 full outer join testtable2 t2 on t2.rowid = t1.rowid;*

 *+--+*

 *|   PLAN   |*

 *+--+*

 *| SORT-MERGE-JOIN (FULL) TABLES|*

 *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1 |*

 *| SERVER SORTED BY [T1.ROWID]  |*

 *| CLIENT MERGE SORT|*

 *| AND  |*

 *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2 |*

 *| SERVER SORTED BY [T2.ROWID]  |*

 *| CLIENT MERGE SORT|*

 *+--+*

 *8 rows selected (0.032 seconds)*



 Does Sqlline use the same phoenix-client jar to make the connection? Or
 does it work differently?



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 19 March 2015 16:08

 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 By the way, to answer your previous questions, Phoenix joins have not
 started to use stats so far, but the hints are parsed and handled in a
 universal way regardless of what type of query it is.



 Thanks,

 Maryann



 On Thu, Mar 19, 2015 at 12:03 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Matt,



 Thank you very much for the investigation! Actually I tried with Java code
 in a unit test (for I don't have a standalone HBase environment right
 now), but could not reproduce it anyway. Then I thought it could be using
 createStatement() vs prepareStatement() after seeing your
 investigation, but again that was not it.



 But anyway, it just occurred to me that there might be an easy way to
 force running sort-merge-join (just for the purpose of testing), by running
 a query with FULL OUTER JOIN. Could you please try that out and see what
 happens?





 Thanks,

 Maryann





 On Thu, Mar 19, 2015 at 7:54 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 Sending this to the whole user group now I have a bit more investigation.
 Managed to do some digging and also figured out how to use sqlline.py, and
 got the following results.



 With sqlline.py:



 *0: jdbc:phoenix:sales1,sales2,sales3 explain SELECT /*+
 USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2 on t2.rowid
 = t1.rowid;*



 *+--+*

 *|   PLAN   |*

 *+--+*

 *| SORT-MERGE-JOIN (INNER) TABLES   |*

 *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1 |*

 *| AND  |*

 *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2 |*

 *| SERVER SORTED BY [T2.ROWID]  |*

 *| CLIENT MERGE SORT

Re: Using Hints in Phoenix

2015-03-12 Thread Maryann Xue
Hi Matt,

Thanks for sharing the query. Using that hint should supposedly force
sort-merge join no matter what. I will go ahead and verify that.


Thanks,
Maryann

On Thu, Mar 12, 2015 at 2:25 PM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Maryann,



 My views:



 *create view mytestview1 *

 *(*

 *rowid varchar primary key, *

 *metadata.bId varchar, *

 *metadata.dId varchar*

 *) *



 *create view mytestview2 *

 *(*

 *rowid varchar primary key, *

 *data.bId varchar,*

 *data.details varchar,*

 *data.comment varchar*

 *) *



 The amount of data:



 *SELECT count(*) FROM mytestview1;*

 *  -- 78,549*



 *SELECT count(*) FROM mytestview2;*

 *  -- 2,130,905*



 The query:



 Without hint:



 *EXPLAIN select count(*) *

 *from mytestview1 m1*

 *inner join mytestview2 m2*

 *on m1.bId = m2.bId*



 *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1*

 *SERVER AGGREGATE INTO SINGLE ROW*

 *PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)*

 *CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2*



 With hint:



 *EXPLAIN select /*+ USE_SORT_MERGE_JOIN */ count(*) *

 *from mytestview1 m1*

 *inner join mytestview2 m2*

 *on m1.bId = m2.bId*



 *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1*

 *SERVER AGGREGATE INTO SINGLE ROW*

 *PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)*

 *CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2*





 I know that when joining I should ideally do large table join small table,
 but this is on my test environment and in production both tables are roughly
 the same size, so I’m trying to force it to use the sort
 merge before running the query in prod. My region servers in test don’t
 have that much heap space (about 2 gigs) if that makes a difference. Do I
 need to force a major compaction, generate statistics, anything like that?



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 11 March 2015 20:16

 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 Views or tables should not matter in this. Would you mind sharing your
 query and DDLs?





 Thanks,

 Maryann





 On Wed, Mar 11, 2015 at 6:06 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 I am using 4.3.0 (I upgraded as soon as it was released, largely so I
 could pick up this feature). I am actually joining views rather than tables
 – would this make a difference?



 Cheers,

 Matt



 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 10 March 2015 20:54


 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 Which version of Phoenix are you using? Sort-merge join is only available
 in Phoenix 4.3.





 Thanks,

 Maryann



 On Tue, Mar 10, 2015 at 6:11 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 Thanks for clarifying that for me. I’ve been playing with the hint
 *USE_SORT_MERGE_JOIN* to try and solve an issue when joining two very
 large tables:



 *Error: Encountered exception in sub plan [0] execution.*

 *SQLState:  null*

 *ErrorCode: 0*



 Which I believe is related to a lack of memory for building the hash
 table? I thought that using *SELECT /*+ USE_SORT_MERGE_JOIN*/* would make
 the join much slower but would work on an unlimited data set – did I
 misunderstand? The explain plan does not change with or without this hint:



 CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1

 SERVER AGGREGATE INTO SINGLE ROW

 PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)

 CLIENT 15-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2



 Cheers,

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 09 March 2015 15:00
 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 So far in Phoenix, hints are only supported as specified right after
 keywords SELECT, UPSERT and DELETE. Same for join queries. It is currently
 impossible to hint a certain join algorithm for a specific join node in a
 multiple join query. However, for subqueries, the inner query can have its
 own hints, independent of the outer query, like SELECT /*+ INDEX(t idx1)*/
 col1, col2 FROM t WHERE col3 IN (SELECT /*+ NO_INDEX*/ id FROM r WHERE name
 = 'x').





 Thanks,

 Maryann



 On Mon, Mar 9, 2015 at 7:26 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi guys,



 This is more of a general question than a problem – but I’m just wondering
 if someone can clarify for me what the syntax rules are for hints in
 Phoenix. Does it matter where in the query they go? Do they always go
 something like *SELECT insert hint x from y*? Or, if the hint is for a
 join (eg Sort Merge) does it go in the join part (*SELECT x from y inner
 join insert hint z on j = k*)?



 Couldn’t seem to find anything specific on this in the docs, and haven’t
 worked much with database hints in general so maybe there is a convention
 that I am not aware of – apologies if it’s a stupid question

Re: Using Hints in Phoenix

2015-03-12 Thread Maryann Xue
Hi Matt,

I checked on my side. Sort-merge-join would work for both tables and views.
And I also verified that 4.3 branch does have the corresponding check-in (
https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=ebc7ee42cdb2b05a293f54dc687ca975db9acbc3)
although the check-in message had a little mistake there.

Could you please verify your Phoenix library version again, Matt?
Especially the client.


Thanks,
Maryann

On Thu, Mar 12, 2015 at 6:00 PM, Maryann Xue maryann@gmail.com wrote:

 Hi Matt,

 Thanks for sharing the query. Using that hint should supposedly force
 sort-merge join no matter what. I will go ahead and verify that.


 Thanks,
 Maryann

 On Thu, Mar 12, 2015 at 2:25 PM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 My views:



 *create view mytestview1 *

 *(*

 *rowid varchar primary key, *

 *metadata.bId varchar, *

 *metadata.dId varchar*

 *) *



 *create view mytestview2 *

 *(*

 *rowid varchar primary key, *

 *data.bId varchar,*

 *data.details varchar,*

 *data.comment varchar*

 *) *



 The amount of data:



 *SELECT count(*) FROM mytestview1;*

 *  -- 78,549*



 *SELECT count(*) FROM mytestview2;*

 *  -- 2,130,905*



 The query:



 Without hint:



 *EXPLAIN select count(*) *

 *from mytestview1 m1*

 *inner join mytestview2 m2*

 *on m1.bId = m2.bId*



 *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1*

 *SERVER AGGREGATE INTO SINGLE ROW*

 *PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)*

 *CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2*



 With hint:



 *EXPLAIN select /*+ USE_SORT_MERGE_JOIN */ count(*) *

 *from mytestview1 m1*

 *inner join mytestview2 m2*

 *on m1.bId = m2.bId*



 *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1*

 *SERVER AGGREGATE INTO SINGLE ROW*

 *PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)*

 *CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2*





 I know that when joining I should ideally do large table join small table,
 but this is on my test environment and in production both tables are roughly
 the same size, so I’m trying to force it to use the sort
 merge before running the query in prod. My region servers in test don’t
 have that much heap space (about 2 gigs) if that makes a difference. Do I
 need to force a major compaction, generate statistics, anything like that?



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 11 March 2015 20:16

 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 Views or tables should not matter in this. Would you mind sharing your
 query and DDLs?





 Thanks,

 Maryann





 On Wed, Mar 11, 2015 at 6:06 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 I am using 4.3.0 (I upgraded as soon as it was released, largely so I
 could pick up this feature). I am actually joining views rather than tables
 – would this make a difference?



 Cheers,

 Matt



 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 10 March 2015 20:54


 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 Which version of Phoenix are you using? Sort-merge join is only available
 in Phoenix 4.3.





 Thanks,

 Maryann



 On Tue, Mar 10, 2015 at 6:11 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 Thanks for clarifying that for me. I’ve been playing with the hint
 *USE_SORT_MERGE_JOIN* to try and solve an issue when joining two very
 large tables:



 *Error: Encountered exception in sub plan [0] execution.*

 *SQLState:  null*

 *ErrorCode: 0*



 Which I believe is related to a lack of memory for building the hash
 table? I thought that using *SELECT /*+ USE_SORT_MERGE_JOIN*/* would
 make the join much slower but would work on an unlimited data set – did I
 misunderstand? The explain plan does not change with or without this hint:



 CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1

 SERVER AGGREGATE INTO SINGLE ROW

 PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)

 CLIENT 15-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2



 Cheers,

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 09 March 2015 15:00
 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 So far in Phoenix, hints are only supported as specified right after
 keywords SELECT, UPSERT and DELETE. Same for join queries. It is currently
 impossible to hint a certain join algorithm for a specific join node in a
 multiple join query. However, for subqueries, the inner query can have its
 own hints, independent of the outer query, like SELECT /*+ INDEX(t idx1)*/
 col1, col2 FROM t WHERE col3 IN (SELECT /*+ NO_INDEX*/ id FROM r WHERE name
 = 'x').





 Thanks,

 Maryann



 On Mon, Mar 9, 2015 at 7:26 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi guys,



 This is more of a general question than a problem – but I’m just
 wondering if someone can

Re: Using Hints in Phoenix

2015-03-11 Thread Maryann Xue
Hi Matt,

Views or tables should not matter in this. Would you mind sharing your
query and DDLs?


Thanks,
Maryann


On Wed, Mar 11, 2015 at 6:06 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Maryann,



 I am using 4.3.0 (I upgraded as soon as it was released, largely so I
 could pick up this feature). I am actually joining views rather than tables
 – would this make a difference?



 Cheers,

 Matt



 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 10 March 2015 20:54

 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 Which version of Phoenix are you using? Sort-merge join is only available
 in Phoenix 4.3.





 Thanks,

 Maryann



 On Tue, Mar 10, 2015 at 6:11 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 Thanks for clarifying that for me. I’ve been playing with the hint
 *USE_SORT_MERGE_JOIN* to try and solve an issue when joining two very
 large tables:



 *Error: Encountered exception in sub plan [0] execution.*

 *SQLState:  null*

 *ErrorCode: 0*



 Which I believe is related to a lack of memory for building the hash
 table? I thought that using *SELECT /*+ USE_SORT_MERGE_JOIN*/* would make
 the join much slower but would work on an unlimited data set – did I
 misunderstand? The explain plan does not change with or without this hint:



 CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1

 SERVER AGGREGATE INTO SINGLE ROW

 PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)

 CLIENT 15-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2



 Cheers,

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 09 March 2015 15:00
 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 So far in Phoenix, hints are only supported as specified right after
 keywords SELECT, UPSERT and DELETE. Same for join queries. It is currently
 impossible to hint a certain join algorithm for a specific join node in a
 multiple join query. However, for subqueries, the inner query can have its
 own hints, independent of the outer query, like SELECT /*+ INDEX(t idx1)*/
 col1, col2 FROM t WHERE col3 IN (SELECT /*+ NO_INDEX*/ id FROM r WHERE name
 = 'x').





 Thanks,

 Maryann



 On Mon, Mar 9, 2015 at 7:26 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi guys,



 This is more of a general question than a problem – but I’m just wondering
 if someone can clarify for me what the syntax rules are for hints in
 Phoenix. Does it matter where in the query they go? Do they always go
 something like *SELECT insert hint x from y*? Or, if the hint is for a
 join (eg Sort Merge) does it go in the join part (*SELECT x from y inner
 join insert hint z on j = k*)?



 Couldn’t seem to find anything specific on this in the docs, and haven’t
 worked much with database hints in general so maybe there is a convention
 that I am not aware of – apologies if it’s a stupid question!



 Cheers,

 Matt









Re: Using Hints in Phoenix

2015-03-10 Thread Maryann Xue
Hi Matt,

Which version of Phoenix are you using? Sort-merge join is only available
in Phoenix 4.3.


Thanks,
Maryann

On Tue, Mar 10, 2015 at 6:11 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Maryann,



 Thanks for clarifying that for me. I’ve been playing with the hint
 *USE_SORT_MERGE_JOIN* to try and solve an issue when joining two very
 large tables:



 *Error: Encountered exception in sub plan [0] execution.*

 *SQLState:  null*

 *ErrorCode: 0*



 Which I believe is related to a lack of memory for building the hash
 table? I thought that using *SELECT /*+ USE_SORT_MERGE_JOIN*/* would make
 the join much slower but would work on an unlimited data set – did I
 misunderstand? The explain plan does not change with or without this hint:



 CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1

 SERVER AGGREGATE INTO SINGLE ROW

 PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)

 CLIENT 15-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2



 Cheers,

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 09 March 2015 15:00
 *To:* user@phoenix.apache.org
 *Subject:* Re: Using Hints in Phoenix



 Hi Matt,



 So far in Phoenix, hints are only supported as specified right after
 keywords SELECT, UPSERT and DELETE. Same for join queries. It is currently
 impossible to hint a certain join algorithm for a specific join node in a
 multiple join query. However, for subqueries, the inner query can have its
 own hints, independent of the outer query, like SELECT /*+ INDEX(t idx1)*/
 col1, col2 FROM t WHERE col3 IN (SELECT /*+ NO_INDEX*/ id FROM r WHERE name
 = 'x').





 Thanks,

 Maryann



 On Mon, Mar 9, 2015 at 7:26 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi guys,



 This is more of a general question than a problem – but I’m just wondering
 if someone can clarify for me what the syntax rules are for hints in
 Phoenix. Does it matter where in the query they go? Do they always go
 something like *SELECT insert hint x from y*? Or, if the hint is for a
 join (eg Sort Merge) does it go in the join part (*SELECT x from y inner
 join insert hint z on j = k*)?



 Couldn’t seem to find anything specific on this in the docs, and haven’t
 worked much with database hints in general so maybe there is a convention
 that I am not aware of – apologies if it’s a stupid question!



 Cheers,

 Matt







Re: Using Hints in Phoenix

2015-03-09 Thread Maryann Xue
Hi Matt,

So far in Phoenix, hints are only supported as specified right after
keywords SELECT, UPSERT and DELETE. Same for join queries. It is currently
impossible to hint a certain join algorithm for a specific join node in a
multiple join query. However, for subqueries, the inner query can have its
own hints, independent of the outer query, like SELECT /*+ INDEX(t idx1)*/
col1, col2 FROM t WHERE col3 IN (SELECT /*+ NO_INDEX*/ id FROM r WHERE name
= 'x').
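
So for a join hint like USE_SORT_MERGE_JOIN, the placement would look like
this (a sketch reusing the placeholder names x, y, z, j, k from the question
below):

SELECT /*+ USE_SORT_MERGE_JOIN */ x FROM y INNER JOIN z ON y.j = z.k;

rather than putting the hint next to the JOIN keyword.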


Thanks,
Maryann

On Mon, Mar 9, 2015 at 7:26 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi guys,



 This is more of a general question than a problem – but I’m just wondering
 if someone can clarify for me what the syntax rules are for hints in
 Phoenix. Does it matter where in the query they go? Do they always go
 something like *SELECT insert hint x from y*? Or, if the hint is for a
 join (eg Sort Merge) does it go in the join part (*SELECT x from y inner
 join insert hint z on j = k*)?



 Couldn’t seem to find anything specific on this in the docs, and haven’t
 worked much with database hints in general so maybe there is a convention
 that I am not aware of – apologies if it’s a stupid question!



 Cheers,

 Matt





Re: Inner Join not returning any results in Phoenix

2015-02-24 Thread Maryann Xue
Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY
PageFilter 100* does look like a bug here. I will try again to reproduce
it.


Thanks,
Maryann

On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Maryann,



 Thanks for that - I will schedule an update to the latest version of
 Phoenix then for later this week (and try out the merge-join hints).



 In the meantime, here are my explain plans:



 *JOIN WITH NO SQUIRREL LIMIT*



 *PLAN*

 *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

 *PARALLEL INNER-JOIN TABLE 0*

 *CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2*



 *JOIN WITH SQUIRREL LIMIT 100*



 *PLAN*

 *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

 *CLIENT 100 ROW LIMIT*

 *PARALLEL INNER-JOIN TABLE 0*

 *CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2*

 *SERVER FILTER BY PageFilter 100*

 *SERVER 100 ROW LIMIT*

 *CLIENT 100 ROW LIMIT*





 I’m not really sure how to read that, but it does seem to suggest that
 ‘mytable2’ is being limited to 100 – thoughts?



 Cheers,

 Matt



 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 23 February 2015 18:10

 *To:* user@phoenix.apache.org
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 Yes, the upgrade is as easy as that. I believe things will work fine with
 existing tables.

 I tried with a similar query but didn't see that it was a Phoenix bug. So
 could you please try the following explain statement and see the execution
 plan:



 EXPLAIN *SELECT * FROM mytable1 hc*

 *INNER JOIN “mytable2” bs*

 *On hc.myId = bs.”myId”*





 Thanks,

 Maryann





 On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then
 that would explain why it’s not working. Is upgrading versions of Phoenix
 as simple as removing the previous jar from HBase lib folder and dropping
 the new Phoenix jar in (and restarting HBase)? Will all the existing
 Phoenix tables and views be backwards-compatible and work with the new
 version?



 Cheers,

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 20 February 2015 17:46


 *To:* user@phoenix.apache.org
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Which version of Phoenix are you using, Matt? This feature is only
 available in the latest releases of 4.3/3.3.



 On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 Unfortunately my two tables are roughly the same size (~500k), but I have
 tested a different join where one table is ~500k and the other is ~20k and
 putting the larger one first is definitely far more performant. I believe
 you are right about running out of memory, I can see this repeated a few
 times in the region server logs followed by what appears to be a restart or
 disconnect:



 *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
 machine (eg GC): pause of approximately 1083ms*



 I have been looking at the Phoenix page on joins (
 http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge
 joins for large tables by using a hint. I have tried this though with no
 success:



 *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc*

 *INNER JOIN “mytable2” bs*

 *On hc.”myId” = bs.”myId”*



 Am I putting the hint in the wrong place? Does it need to go next to the
 JOIN rather than the SELECT?



 I will try increasing the memory available to the Region Servers as well
 to see if that helps.



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 20 February 2015 16:28
 *To:* user@phoenix.apache.org


 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 The error you got with Limit Rows off might be related to insufficient
 memory on region servers for one of your tables. Which is the larger table
 between table1 and table2? You might want to try putting the larger table
 as the first table in your join query and see if it works.



 And I will quickly check if the LIMIT problem is a Phoenix bug and will
 keep you posted.





 Thanks,

 Maryann





 On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Abe,



 Glad to hear I’m not alone! Will try and figure out exactly what’s
 happening and maybe raise a Jira :-)





 @Constantin – I have tried with and without the “Limit Rows” – but without
 it, and without any indexes, the query runs for a while (about 10 minutes?)
 and then throws an error:



 *Error: Encountered exception in sub plan [0] execution.*



 Which I’m guessing is either HBase or Zookeeper timeout. The weird thing
 is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit
 Rows” does not affect any aggregate functions like ‘count’, because the
 actual number

Re: Inner Join not returning any results in Phoenix

2015-02-24 Thread Maryann Xue
Thanks, James and Matt!

I successfully reproduced the bug using setMaxRows(). Adding a LIMIT to the query
itself won't cause such a bug.

Please watch on https://issues.apache.org/jira/browse/PHOENIX-1680. I will
post a patch right away.


Maryann


On Tue, Feb 24, 2015 at 12:57 PM, Matthew Johnson matt.john...@algomi.com
wrote:

 Thanks James.



 Maryann, with the upserts I sent, I run the following code (using
 phoenix-client-4.2.2.jar):



  // Register JDBC Driver
  Class.forName("org.apache.phoenix.jdbc.PhoenixDriver").newInstance();

  Connection conn = DriverManager.getConnection(
      "jdbc:phoenix:box1,box2,box3", "", "");

  // Create a Statement class to execute the SQL statement
  Statement stmtLimited = conn.createStatement();
  stmtLimited.setMaxRows(10);

  // Execute the SQL statement and get the results in a ResultSet
  ResultSet rsLimited = stmtLimited.executeQuery("select * from mytable1 m1 inner join mytable2 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'");
  int rsLimitedCount = 0;
  while (rsLimited.next()) {
      rsLimitedCount++;
  }

  // Create a Statement class to execute the SQL statement
  Statement stmtNoLimit = conn.createStatement();

  // Execute the SQL statement and get the results in a ResultSet
  ResultSet rsNoLimit = stmtNoLimit.executeQuery("select * from mytable1 m1 inner join mytable2 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'");
  int rsNoLimitedCount = 0;
  while (rsNoLimit.next()) {
      rsNoLimitedCount++;
  }

  System.out.println("Results found when LIMIT 10 was " + rsLimitedCount + " but with NO LIMIT was " + rsNoLimitedCount);
 LIMIT 10 was  + rsLimitedCount +  but with NO LIMIT was  +
 rsNoLimitedCount);





 And the output is:



 Results found when LIMIT 10 was 0 but with NO LIMIT was 1





 Hope that helps!



 Cheers,

 Matt





 *From:* James Taylor [mailto:jamestay...@apache.org]
 *Sent:* 24 February 2015 17:27

 *To:* user
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Maryann,

 You can just call Statement.setMaxRows(10) to emulate what SQuirrel is
 doing.

 Thanks,

 James



 On Tue, Feb 24, 2015 at 9:09 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 I have two environments, one with just some dummy data, and one with real
 data in it. The dummy data one gave me results when I queried and the real
 data didn’t, but turns out that is just because the dummy data was much
 more coherent (because it was manually created for a specific test) so the
 RHS table always matches the LHS and therefore the join gives me results.



 I have attached a script that demonstrates my problem (create 2 Phoenix
 tables, insert some rows, and run a query using a join). When I run this on
 my cluster, I consistently see the issue I am having. If I set the LIMIT in
 Squirrel to 10, I get no results, but if I set it to 26, I get a result
 (since I have 26 rows in the RHS table and I am intentionally querying for
 the last one).



 Please give it a go and see if it reproduces for you – are you using
 Squirrel? If so, what version?



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 24 February 2015 16:41


 *To:* user@phoenix.apache.org
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 I just noticed these lines in your very first message:



 *PS* Something that may or may not be of note: In the environments I am
 using:

 WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*

 FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*



 What do you mean by WORKING and FAILING?



 I still cannot reproduce the bug here. Could you please post DDLs you used
 for related tables?





 Thanks,

 Maryann







 On Tue, Feb 24, 2015 at 11:27 AM, James Taylor jamestay...@apache.org
 wrote:

 FYI, SQuirrel sets the max rows to return as 100. You can change this in
 the tool, though.



 On Tuesday, February 24, 2015, Maryann Xue maryann@gmail.com wrote:

 Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY
 PageFilter 100* does look like a bug here. I will try again to reproduce
 it.





 Thanks,

 Maryann



 On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 Thanks for that - I will schedule an update to the latest version of
 Phoenix then for later this week (and try out the merge-join hints

Re: Inner Join not returning any results in Phoenix

2015-02-20 Thread Maryann Xue
Hi Matt,

The error you got with Limit Rows off might be related to insufficient
memory on region servers for one of your tables. Which is the larger table
between table1 and table2? You might want to try putting the larger table
as the first table in your join query and see if it works.

And I will quickly check if the LIMIT problem is a Phoenix bug and will
keep you posted.


Thanks,
Maryann


On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Abe,



 Glad to hear I’m not alone! Will try and figure out exactly what’s
 happening and maybe raise a Jira :-)





 @Constantin – I have tried with and without the “Limit Rows” – but without
 it, and without any indexes, the query runs for a while (about 10 minutes?)
 and then throws an error:



 *Error: Encountered exception in sub plan [0] execution.*



 Which I’m guessing is either an HBase or Zookeeper timeout. The weird thing
 is that in standard SQL databases (eg Oracle, MySQL etc) the “Limit
 Rows” does not affect any aggregate functions like ‘count’, because the
 actual number of result rows for a count is just 1 row (the count itself).
 But in HBase it seems that the Row Limit, as Abe mentioned, is applied to
 one of the tables BEFORE it does the join, so it affects the results of the
 ‘count’ function.



 When I try to create my indexes so I am able to do the join without Row
 Limit, I get the following error:



 ERROR 1029 (42Y88): Mutable secondary indexes must have the
 hbase.regionserver.wal.codec property set to
 org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the
 hbase-sites.xml of every region server



 Which I am happy to do (will have to wait until outside of business hours
 though), but I am curious, will this have any impact on the rest of my
 cluster and could it have any unforeseen consequences?



 Thanks again for the input!



 Cheers,

 Matt





 *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
 *Sent:* 20 February 2015 15:48

 *To:* user@phoenix.apache.org
 *Subject:* RE: Inner Join not returning any results in Phoenix



 Hello Matt,



 http://codingclues.eu/2008/the-squirrel-100-rows-problem/



 Can you please test again after unchecking “Contents- Limit rows” and “SQL
 – Limit rows”?



 [image: SQL tab]



 P.S. Off-topic – it’s as funny as this “problem” (not possible to send an
 email for more than 500 miles away :-)
 http://www.ibiblio.org/harris/500milemail.html )



 Regards,

   Constantin



 *From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com]
 *Sent:* Friday, February 20, 2015 4:18 PM
 *To:* user
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Matt,



 I have seen this same issue.  When passing a LIMIT to a query with joins
 (most query tools do it implicitly), Phoenix seems to apply that to the
 table on the right of the join I believe.  I hadn't had a chance to play
 with it more and file a JIRA, but what you are describing is consistent
 with what I have seen.



 Abe



 On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Constantin,



 Many thanks for your reply – the quotes were both of the same type (double
 quotes for table and column names, single quotes for string literals), it
 is just my email client that formatted them weirdly, sorry!



 I have discovered what I believe is an important piece of the puzzle to my
 problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has
 a “Limit Rows” feature. When I try and count the number of rows in a single
 table:



 *select count(*) from “mytable1”*



 I get the expected number of results (eg 20,000). But when I join two
 tables together, it seems that the “Limit Rows” from Squirrel is somehow
 being applied before the join is performed, and if “Limit Rows” is set to
 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000
 rows in a table but only 100 of these will join with a second table) then I
 believe it tries to join the first 100 it finds and returns no results. In
 my experience of Oracle or MySQL, joins are done entirely on server side
 and then you just get back the number of rows you limited, rather than what
 appears to be happening which is the row limit is applied to the first
 table before the join is attempted with the second table. Is that how
 Phoenix works?



 I have also discovered that I get different results (with “Limit Rows”
 turned on) depending on which order I join the tables:



 *SELECT count(*) FROM “mytable1” hc*

 *INNER JOIN “mytable2” bs*

 *On hc.”myId” = bs.”myId”*



 Gives me a very different number of results than:



 *SELECT count(*) FROM “mytable2” bs*

 *INNER JOIN “mytable1” hc*

 *On hc.”myId” = bs.”myId”*





 Unfortunately I cannot test whether I get the same number of results with
 “Limit Rows” turned off because my query times out! So I am now looking at
 creating secondary indexes on the “myId” column in both tables to see if I
 

Re: Inner Join not returning any results in Phoenix

2015-02-20 Thread Maryann Xue
Which version of Phoenix are you using, Matt?

On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Maryann,



 Unfortunately my two tables are roughly the same size (~500k), but I have
 tested a different join where one table is ~500k and the other is ~20k and
 putting the larger one first is definitely far more performant. I believe
 you are right about running out of memory, I can see this repeated a few
 times in the region server logs followed by what appears to be a restart or
 disconnect:



 *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
 machine (eg GC): pause of approximately 1083ms*



 I have been looking at the Phoenix page on joins (
 http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge
 joins for large tables by using a hint. I have tried this though with no
 success:



 *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc*

 *INNER JOIN “mytable2” bs*

 *On hc.”myId” = bs.”myId”*



 Am I putting the hint in the wrong place? Does it need to go next to the
 JOIN rather than the SELECT?



 I will try increasing the memory available to the Region Servers as well
 to see if that helps.



 Thanks!

 Matt
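
 [For reference: Phoenix hints are written in a comment immediately after the
 SELECT keyword, so the placement in the query above matches the usual form. A
 minimal sketch, assuming an open Phoenix connection 'conn' and the quoted
 identifiers used in this thread:]

    String sql =
        "SELECT /*+ USE_SORT_MERGE_JOIN */ count(*) " +
        "FROM \"mytable1\" hc " +
        "INNER JOIN \"mytable2\" bs ON hc.\"myId\" = bs.\"myId\"";
    ResultSet rs = conn.createStatement().executeQuery(sql);
    // Running "EXPLAIN " + sql instead shows whether a SORT-MERGE-JOIN plan
    // was actually chosen.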





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 20 February 2015 16:28
 *To:* user@phoenix.apache.org

 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 The error you got with Limit Rows off might be related to insufficient
 memory on region servers for one of your tables. Which is the larger table
 between table1 and table2? You might want to try putting the larger table
 as the first table in your join query and see if it works.



 And I will quickly check if the LIMIT problem is a Phoenix bug and will
 keep you posted.





 Thanks,

 Maryann





 On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Abe,



 Glad to hear I’m not alone! Will try and figure out exactly what’s
 happening and maybe raise a Jira :-)





 @Constantin – I have tried with and without the “Limit Rows” – but without
 it, and without any indexes, the query runs for a while (about 10 minutes?)
 and then throws an error:



 *Error: Encountered exception in sub plan [0] execution.*



 Which I’m guessing is either an HBase or Zookeeper timeout. The weird thing
 is that in standard SQL databases (eg Oracle, MySQL etc) the “Limit
 Rows” does not affect any aggregate functions like ‘count’, because the
 actual number of result rows for a count is just 1 row (the count itself).
 But in HBase it seems that the Row Limit, as Abe mentioned, is applied to
 one of the tables BEFORE it does the join, so it affects the results of the
 ‘count’ function.



 When I try to create my indexes so I am able to do the join without Row
 Limit, I get the following error:



 ERROR 1029 (42Y88): Mutable secondary indexes must have the
 hbase.regionserver.wal.codec property set to
 org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the
 hbase-sites.xml of every region server



 Which I am happy to do (will have to wait until outside of business hours
 though), but I am curious, will this have any impact on the rest of my
 cluster and could it have any unforeseen consequences?



 Thanks again for the input!



 Cheers,

 Matt





 *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
 *Sent:* 20 February 2015 15:48


 *To:* user@phoenix.apache.org
 *Subject:* RE: Inner Join not returning any results in Phoenix



 Hello Matt,



 http://codingclues.eu/2008/the-squirrel-100-rows-problem/



 Can you please test again after unchecking “Contents- Limit rows” and “SQL
 – Limit rows”?



 [image: SQL tab]



 P.S. Off-topic – it’s as funny as this “problem” (not possible to send an
 email for more than 500 miles away :-)
 http://www.ibiblio.org/harris/500milemail.html )



 Regards,

   Constantin



 *From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com]
 *Sent:* Friday, February 20, 2015 4:18 PM
 *To:* user
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Matt,



 I have seen this same issue.  When passing a LIMIT to a query with joins
 (most query tools do it implicitly), Phoenix seems to apply that to the
 table on the right of the join I believe.  I hadn't had a chance to play
 with it more and file a JIRA, but what you are describing is consistent
 with what I have seen.



 Abe



 On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Constantin,



 Many thanks for your reply – the quotes were both of the same type (double
 quotes for table and column names, single quotes for string literals), it
 is just my email client that formatted them weirdly, sorry!



 I have discovered what I believe is an important piece of the puzzle to my
 problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has
 a “Limit Rows” feature. When I try and count the number of rows

Re: Phoenix Subqueries with ‘IN’

2015-01-15 Thread Maryann Xue
Hi Xiaoguo,

Do you mean you have hit a bug in Phoenix? The query is expected to return
nothing but returns all rows?


Thanks,
Maryann

On Thu, Jan 15, 2015 at 9:02 PM, 【小郭】 guoqid...@qq.com wrote:

 Hi guys:
  When using a subquery with 'IN', if the subquery returns no rows, the
 query will find all rows.
  For example:
 Query:
 SELECT * FROM SYS_USER WHERE ID IN (SELECT USER_ID FROM SYS_ORG
 WHERE ID = 0);
 If the subquery 'SELECT USER_ID FROM SYS_ORG WHERE ID = 0' has no
 result, the query will return all rows of table 'SYS_USER'.

 How could I get the same result as Oracle when the subquery has no result?

 HBase 0.98
 Phoenix 4.2

 Thank you for your help,
 xiaoguo







Re: Phoenix Subqueries with ‘IN’

2015-01-15 Thread Maryann Xue
This has been verified as a bug. Just filed
https://issues.apache.org/jira/browse/PHOENIX-1591 for it.
Thank you very much for reporting this, Xiaoguo! You can expect it to be
fixed in Phoenix 4.3.
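
[Until the fix is available, one way to get the expected empty result is to
spell the semi-join as an inner join on a derived table; a sketch only, using
the table names from the report below and assuming an open Phoenix connection
'conn'. Note that unlike IN, this can return duplicate rows if USER_ID is not
unique in SYS_ORG:]

    String sql =
        "SELECT u.* FROM SYS_USER u " +
        "JOIN (SELECT USER_ID FROM SYS_ORG WHERE ID = 0) o ON u.ID = o.USER_ID";
    ResultSet rs = conn.createStatement().executeQuery(sql);
    // If the inner query matches no rows, the inner join produces no rows.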

On Thu, Jan 15, 2015 at 10:43 PM, Maryann Xue maryann@gmail.com wrote:

 Hi Xiaoguo,

 Do you mean you have hit a bug in Phoenix? The query is expected to return
 nothing but returns all rows?


 Thanks,
 Maryann

 On Thu, Jan 15, 2015 at 9:02 PM, 【小郭】 guoqid...@qq.com wrote:

 Hi guys:
  When using a subquery with 'IN', if the subquery returns no rows, the
 query will find all rows.
  For example:
 Query:
 SELECT * FROM SYS_USER WHERE ID IN (SELECT USER_ID FROM SYS_ORG
 WHERE ID = 0);
 If the subquery 'SELECT USER_ID FROM SYS_ORG WHERE ID = 0' has no
 result, the query will return all rows of table 'SYS_USER'.

 How could I get the same result as Oracle when the subquery has no result?

 HBase 0.98
 Phoenix 4.2

 Thank you for your help,
 xiaoguo









Re: Query performance question

2014-12-18 Thread Maryann Xue
No problem. Thanks for reporting the problem, Ralph!

On Thu, Dec 18, 2014 at 11:49 AM, Perko, Ralph J ralph.pe...@pnnl.gov
wrote:

   Thank you for your quick response to this.  I built and deployed 4.2.3
 and there is a dramatic difference in performance.  Both the subselect and
 join query went from around 177s to 8s!  The explain plan now shows the
 entire pk being used.

__
 *Ralph Perko*
 Pacific Northwest National Laboratory
   (509) 375-2272
 ralph.pe...@pnnl.gov


   From: Maryann Xue maryann@gmail.com
 Reply-To: user@phoenix.apache.org user@phoenix.apache.org
 Date: Monday, December 15, 2014 at 3:14 PM

 To: user@phoenix.apache.org user@phoenix.apache.org
 Subject: Re: Query performance question

   Thanks again for the input, Ralph!

  Pinpointed the problem and filed a JIRA (
 https://issues.apache.org/jira/browse/PHOENIX-1533). Will try to work out
 a solution ASAP.

 On Mon, Dec 15, 2014 at 3:38 PM, Perko, Ralph J ralph.pe...@pnnl.gov
 wrote:

   I am using version 4.2.2


  From: Perko, Ralph Perko ralph.pe...@pnnl.gov
 Reply-To: user@phoenix.apache.org user@phoenix.apache.org
 Date: Monday, December 15, 2014 at 12:37 PM

 To: user@phoenix.apache.org user@phoenix.apache.org
 Subject: Re: Query performance question

DDL is attached – thanks!

  Ralph


   From: Maryann Xue maryann@gmail.com
 Reply-To: user@phoenix.apache.org user@phoenix.apache.org
 Date: Monday, December 15, 2014 at 12:21 PM
 To: user@phoenix.apache.org user@phoenix.apache.org
 Subject: Re: Query performance question

   Hi Ralph,

  Thank you very much for the information! Very helpful for your
 questions.
 The numbers look reasonable as opposed to the query plan. But the only
 problem is both file_id and recnum should be used in DYNAMIC SERVER
 FILTER (which will turn into a skip-scan), but the plan you got there only
 used file_id alone.
 So could you please attach the DDL of your table and index so that I can
 try reproducing the problem?


  Thanks,
 Maryann


 On Mon, Dec 15, 2014 at 2:46 PM, Perko, Ralph J ralph.pe...@pnnl.gov
 wrote:

   My apologies for the delay.  I had to switch clusters and use a
 smaller dataset.  The discrepancy still exists but the numbers are a little
 different:

  I ran the same queries as in the original email (below)

  Total records: 581M

  Simple query based on secondary index value used in the subselect:
 1747 recs – 0.256 sec

  Subselect query:  177s

  Join query: 179s

  The answers to your questions are below.

  1. What is the Primary Key definition of your BULK_TABLE?

  CONSTRAINT pkey PRIMARY KEY (file_id,recnum)

  2. How many (approximately) distinct file_id values are there in the
 BULK_TABLE? (If you don't know for sure, you can just run a query to find
 out).

  “select count(distinct(file_id)) from BULK_TABLE” - 7902 - 92s

  “select distinct(file_id) from BULK_TABLE”  - returns in 25m

  3. How long does it take to run a full-scan query on BULK_TABLE, like
 “select * from BULK_TABLE”?

  Results began returning after about 25min

  4. How long does it take to run a full-scan join query on BULK_TABLE,
 like “select * from BULK_TABLE join (select file_id, recnum from
 BULK_TABLE) as SS on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum
 = SS.recnum”?

  The full-scan join fails with a MaxServerCacheSizeExceededException  -
 server cache set to 1G.

  Custom hbase/phoenix settings are attached.

  Thanks,
 Ralph


   From: Maryann Xue maryann@gmail.com
 Reply-To: user@phoenix.apache.org user@phoenix.apache.org
 Date: Friday, December 12, 2014 at 8:07 AM
 To: user@phoenix.apache.org user@phoenix.apache.org
 Subject: Re: Query performance question

   Hi Ralph,

  Thanks for the question!
 According to the explain result you got, the optimization worked
 exactly as expected with this query:

  DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID) means a skip-scan
 instead of a full-scan over BULK_TABLE will be executed at runtime based on
 the values of file_id it got from the inner query.

  So I need to know a few more things:
 1. What is the Primary Key definition of your BULK_TABLE?
 2. How many (approximately) distinct file_id values are there in the
 BULK_TABLE? (If you don't know for sure, you can just run a query to find
 out).
 3. How long does it take to run a full-scan query on BULK_TABLE, like
 select * from BULK_TABLE?
 4. How long does it take to run a full-scan join query on BULK_TABLE,
 like select * from BULK_TABLE join (select file_id, recnum from
 BULK_TABLE) as SS on BULK_TABLE.file_id = SS.file_id and
 BULK_TABLE.recnum = SS.recnum?


  Thanks,
 Maryann


 On Thu, Dec 11, 2014 at 6:28 PM, Perko, Ralph J ralph.pe...@pnnl.gov
 wrote:

  Hi,

  Thanks for all your help thus far with Phoenix.

  I am trying to understand the best way to construct a query that
 returns all the fields from a table but still takes advantage of a single
 field secondary index.   I have a table with upwards

Re: Query performance question

2014-12-15 Thread Maryann Xue
Hi Ralph,

Thank you very much for the information! Very helpful for your questions.
The numbers look reasonable as opposed to the query plan. But the only
problem is both file_id and recnum should be used in DYNAMIC SERVER
FILTER (which will turn into a skip-scan), but the plan you got there only
used file_id alone.
So could you please attach the DDL of your table and index so that I can
try reproducing the problem?


Thanks,
Maryann


On Mon, Dec 15, 2014 at 2:46 PM, Perko, Ralph J ralph.pe...@pnnl.gov
wrote:

   My apologies for the delay.  I had to switch clusters and use a smaller
 dataset.  The discrepancy still exists but the numbers are a little
 different:

  I ran the same queries as in the original email (below)

  Total records: 581M

   Simple query based on secondary index value used in the subselect:
  1747 recs – 0.256 sec

  Subselect query:  177s

  Join query: 179s

  The answers to your questions are below.

   1. What is the Primary Key definition of your BULK_TABLE?

  CONSTRAINT pkey PRIMARY KEY (file_id,recnum)

  2. How many (approximately) distinct file_id values are there in the
 BULK_TABLE? (If you don't know for sure, you can just run a query to find
 out).

  “select count(distinct(file_id)) from BULK_TABLE” - 7902 - 92s

  “select distinct(file_id) from BULK_TABLE”  - returns in 25m

  3. How long does it take to run a full-scan query on BULK_TABLE, like
 “select * from BULK_TABLE”?

  Results began returning after about 25min

  4. How long does it take to run a full-scan join query on BULK_TABLE,
 like “select * from BULK_TABLE join (select file_id, recnum from
 BULK_TABLE) as SS on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum
 = SS.recnum”?

  The full-scan join fails with a MaxServerCacheSizeExceededException  -
 server cache set to 1G.

  Custom hbase/phoenix settings are attached.

  Thanks,
  Ralph


   From: Maryann Xue maryann@gmail.com
 Reply-To: user@phoenix.apache.org user@phoenix.apache.org
 Date: Friday, December 12, 2014 at 8:07 AM
 To: user@phoenix.apache.org user@phoenix.apache.org
 Subject: Re: Query performance question

   Hi Ralph,

  Thanks for the question!
 According to the explain result you got, the optimization worked exactly
 as expected with this query:

  DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID) means a skip-scan
 instead of a full-scan over BULK_TABLE will be executed at runtime based on
 the values of file_id it got from the inner query.

  So I need to know a few more things:
 1. What is the Primary Key definition of your BULK_TABLE?
 2. How many (approximately) distinct file_id values are there in the
 BULK_TABLE? (If you don't know for sure, you can just run a query to find
 out).
 3. How long does it take to run a full-scan query on BULK_TABLE, like
 select * from BULK_TABLE?
 4. How long does it take to run a full-scan join query on BULK_TABLE, like
 select * from BULK_TABLE join (select file_id, recnum from BULK_TABLE)
 as SS on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum =
 SS.recnum?


  Thanks,
 Maryann


 On Thu, Dec 11, 2014 at 6:28 PM, Perko, Ralph J ralph.pe...@pnnl.gov
 wrote:

  Hi,

  Thanks for all your help thus far with Phoenix.

  I am trying to understand the best way to construct a query that
 returns all the fields from a table but still takes advantage of a single
 field secondary index.   I have a table with upwards of 50 fields and do
 not wish to index them all but the use case exists to return them all.

  My general approach is to first select the records I want using an
 indexed field then use the returned pk values to get the entire record in
 the form of a subselect or join.

  The initial select executes very fast, sub-second , returning close to
 3000 records.  When used as a subselect or join the entire query takes very
 long (over 15min)  or does not return.   Based on the processing plans it
 appears there is a lot more going on than just a simple look-up of the
 values returned in the subselect.  Is there a way to do this using Phoenix
 SQL syntax?

  Any suggestions are appreciated.

  Initial indexed query (very fast):

  SELECT file_id,recnum

 FROM BULK_TABLE

 WHERE saddr IN (ip1,ip2,ip3))


  file_id and recnum make up the primary key


  Plan:

 | CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER BULK_TABLE_SADDR_IDX
 [0,1,000,004,076] - [9,1,000,142,114] |

 | CLIENT MERGE SORT |


  Used as a subselect (times out):

   SELECT * FROM BULK_TABLE

 WHERE (file_id,recnum) IN(SELECT file_id,recnum

  FROM BULK_TABLE

   WHERE saddr IN (ip1,ip2,ip3));

 Plan:

 | CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |

 | CLIENT MERGE SORT |

 | PARALLEL SEMI-JOIN TABLE 0 (SKIP MERGE) |

 | CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS
 OVER BULK_TABLE_SADDR_IDX [0,1,000,004,076] - [9,1,000,142,114] |

 | SERVER AGGREGATE INTO DISTINCT ROWS BY [RECNUM, FILE_ID] |

 | CLIENT MERGE SORT |

 | DYNAMIC SERVER FILTER BY FILE_ID IN ($1.$3

Re: Query performance question

2014-12-15 Thread Maryann Xue
And one more thing: the version of Phoenix you are running.

On Mon, Dec 15, 2014 at 3:21 PM, Maryann Xue maryann@gmail.com wrote:

 Hi Ralph,

 Thank you very much for the information! Very helpful for your questions.
 The numbers look reasonable as opposed to the query plan. But the only
 problem is both file_id and recnum should be used in DYNAMIC SERVER
 FILTER (which will turn into a skip-scan), but the plan you got there only
 used file_id alone.
 So could you please attach the DDL of your table and index so that I can
 try reproducing the problem?


 Thanks,
 Maryann


 On Mon, Dec 15, 2014 at 2:46 PM, Perko, Ralph J ralph.pe...@pnnl.gov
 wrote:

   My apologies for the delay.  I had to switch clusters and use a
 smaller dataset.  The discrepancy still exists but the numbers are a little
 different:

  I ran the same queries as in the original email (below)

  Total records: 581M

   Simple query based on secondary index value used in the subselect:
  1747 recs – 0.256 sec

  Subselect query:  177s

  Join query: 179s

  The answers to your questions are below.

   1. What is the Primary Key definition of your BULK_TABLE?

  CONSTRAINT pkey PRIMARY KEY (file_id,recnum)

  2. How many (approximately) distinct file_id values are there in the
 BULK_TABLE? (If you don't know for sure, you can just run a query to find
 out).

  “select count(distinct(file_id)) from BULK_TABLE” - 7902 - 92s

  “select distinct(file_id) from BULK_TABLE”  - returns in 25m

  3. How long does it take to run a full-scan query on BULK_TABLE, like
 “select * from BULK_TABLE”?

  Results began returning after about 25min

  4. How long does it take to run a full-scan join query on BULK_TABLE,
 like “select * from BULK_TABLE join (select file_id, recnum from
 BULK_TABLE) as SS on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum
 = SS.recnum”?

  The full-scan join fails with a MaxServerCacheSizeExceededException  -
 server cache set to 1G.

  Custom hbase/phoenix settings are attached.

  Thanks,
  Ralph


   From: Maryann Xue maryann@gmail.com
 Reply-To: user@phoenix.apache.org user@phoenix.apache.org
 Date: Friday, December 12, 2014 at 8:07 AM
 To: user@phoenix.apache.org user@phoenix.apache.org
 Subject: Re: Query performance question

   Hi Ralph,

  Thanks for the question!
 According to the explain result you got, the optimization worked
 exactly as expected with this query:

  DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID) means a skip-scan
 instead of a full-scan over BULK_TABLE will be executed at runtime based on
 the values of file_id it got from the inner query.

  So I need to know a few more things:
 1. What is the Primary Key definition of your BULK_TABLE?
 2. How many (approximately) distinct file_id values are there in the
 BULK_TABLE? (If you don't know for sure, you can just run a query to find
 out).
 3. How long does it take to run a full-scan query on BULK_TABLE, like
 select * from BULK_TABLE?
 4. How long does it take to run a full-scan join query on BULK_TABLE,
 like select * from BULK_TABLE join (select file_id, recnum from
 BULK_TABLE) as SS on BULK_TABLE.file_id = SS.file_id and
 BULK_TABLE.recnum = SS.recnum?


  Thanks,
 Maryann


 On Thu, Dec 11, 2014 at 6:28 PM, Perko, Ralph J ralph.pe...@pnnl.gov
 wrote:

  Hi,

  Thanks for all your help thus far with Phoenix.

  I am trying to understand the best way to construct a query that
 returns all the fields from a table but still takes advantage of a single
 field secondary index.   I have a table with upwards of 50 fields and do
 not wish to index them all but the use case exists to return them all.

  My general approach is to first select the records I want using an
 indexed field then use the returned pk values to get the entire record in
 the form of a subselect or join.

  The initial select executes very fast, sub-second , returning close to
 3000 records.  When used as a subselect or join the entire query takes very
 long (over 15min)  or does not return.   Based on the processing plans it
 appears there is a lot more going on than just a simple look-up of the
 values returned in the subselect.  Is there a way to do this using Phoenix
 SQL syntax?

  Any suggestions are appreciated.

  Initial indexed query (very fast):

  SELECT file_id,recnum

 FROM BULK_TABLE

 WHERE saddr IN (ip1,ip2,ip3))


  file_id and recnum make up the primary key


  Plan:

 | CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER BULK_TABLE_SADDR_IDX
 [0,1,000,004,076] - [9,1,000,142,114] |

 | CLIENT MERGE SORT |


  Used as a subselect (times out):

   SELECT * FROM BULK_TABLE

 WHERE (file_id,recnum) IN(SELECT file_id,recnum

  FROM BULK_TABLE

   WHERE saddr IN (ip1,ip2,ip3));

 Plan:

 | CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |

 | CLIENT MERGE SORT |

 | PARALLEL SEMI-JOIN TABLE 0 (SKIP MERGE) |

 | CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS
 OVER BULK_TABLE_SADDR_IDX [0,1,000,004,076] - [9,1,000,142,114

Re: Query performance question

2014-12-12 Thread Maryann Xue
Hi Ralph,

Thanks for the question!
According to the explain result you got, the optimization worked exactly
as expected with this query:

DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID) means a skip-scan
instead of a full-scan over BULK_TABLE will be executed at runtime based on
the values of file_id it got from the inner query.

So I need to know a few more things:
1. What is the Primary Key definition of your BULK_TABLE?
2. How many (approximately) distinct file_id values are there in the
BULK_TABLE? (If you don't know for sure, you can just run a query to find
out).
3. How long does it take to run a full-scan query on BULK_TABLE, like
select * from BULK_TABLE?
4. How long does it take to run a full-scan join query on BULK_TABLE, like
select * from BULK_TABLE join (select file_id, recnum from BULK_TABLE) as
SS on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum = SS.recnum?


Thanks,
Maryann
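
[A small sketch of how the plan can be checked programmatically for the
DYNAMIC SERVER FILTER line discussed above; 'conn' is assumed to be an open
Phoenix connection, the query is the subselect variant from Ralph's message,
and 'ip1'..'ip3' are placeholder values:]

    ResultSet plan = conn.createStatement().executeQuery(
        "EXPLAIN SELECT * FROM BULK_TABLE " +
        "WHERE (file_id, recnum) IN (SELECT file_id, recnum FROM BULK_TABLE " +
        "WHERE saddr IN ('ip1','ip2','ip3'))");
    while (plan.next()) {
        System.out.println(plan.getString(1));  // each row is one line of the plan
    }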


On Thu, Dec 11, 2014 at 6:28 PM, Perko, Ralph J ralph.pe...@pnnl.gov
wrote:

  Hi,

  Thanks for all your help thus far with Phoenix.

  I am trying to understand the best way to construct a query that returns
 all the fields from a table but still takes advantage of a single field
 secondary index.   I have a table with upwards of 50 fields and do not wish
 to index them all but the use case exists to return them all.

  My general approach is to first select the records I want using an
 indexed field then use the returned pk values to get the entire record in
 the form of a subselect or join.

  The initial select executes very fast, sub-second , returning close to
 3000 records.  When used as a subselect or join the entire query takes very
 long (over 15min)  or does not return.   Based on the processing plans it
 appears there is a lot more going on than just a simple look-up of the
 values returned in the subselect.  Is there a way to do this using Phoenix
 SQL syntax?

  Any suggestions are appreciated.

  Initial indexed query (very fast):

  SELECT file_id,recnum

 FROM BULK_TABLE

 WHERE saddr IN (ip1,ip2,ip3))


  file_id and recnum make up the primary key


  Plan:

 | CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER BULK_TABLE_SADDR_IDX
 [0,1,000,004,076] - [9,1,000,142,114] |

 | CLIENT MERGE SORT |


   Used as a subselect (times out):

   SELECT * FROM BULK_TABLE

 WHERE (file_id,recnum) IN(SELECT file_id,recnum

   FROM BULK_TABLE

   WHERE saddr IN (ip1,ip2,ip3));

 Plan:

 | CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |

 | CLIENT MERGE SORT |

 | PARALLEL SEMI-JOIN TABLE 0 (SKIP MERGE) |

 | CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS
 OVER BULK_TABLE_SADDR_IDX [0,1,000,004,076] - [9,1,000,142,114] |

 | SERVER AGGREGATE INTO DISTINCT ROWS BY [RECNUM, FILE_ID] |

 | CLIENT MERGE SORT |

 | DYNAMIC SERVER FILTER BY FILE_ID IN ($1.$3) |


  Another approach using using a join instead:


  SELECT *

 FROM BULK_TABLE

 JOIN

 (SELECT file_id, recnum

  FROM BULK_TABLE

  WHERE saddr in (ip1,ip2,ip3)) AS SS

 ON BULK_TABLE.file_id = SS.file_id AND BULK_TABLE.recnum = SS.recnum;


  Runs faster but still can take about 15min


  Plan:


  | CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |

 | CLIENT MERGE SORT |

 | PARALLEL INNER-JOIN TABLE 0 |

 | CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS
 OVER BULK_TABLE_SADDR_IDX [0,1,000,004,076] - [9,1,000,142,114] |

 | CLIENT MERGE SORT |

 | DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID) |



   Is there a more efficient way to run a query such as this?

  Thanks!
  Ralph







-- 
Thanks,
Maryann


Re: FW: Exception in sub plan[0] exception - multi inner join

2014-12-10 Thread Maryann Xue
Hi Siddharth,


Thank you for attaching the log file! I didn't find any insufficient memory
error, so my previous guess should be wrong. But unfortunately I couldn't
seem to find any other useful information from the log regarding the
exception you got. So the best way to identify the problem is to get full
stack trace of your exception. Maybe you can try running the query in your
JDBC environment instead of in sqlline.

As to the configuration file, you just need to make sure that the
hbase-site.xml file is included in your CLASSPATH both for the server
side and the client side, and you can just put all your needed attributes
into that file.


Thanks,
Maryann


On Wed, Dec 10, 2014 at 9:35 AM, Siddharth Ubale 
siddharth.ub...@syncoms.com wrote:

  Hi ,



 I am adding the log file(the  exception log captured at 19:45) for the
 case where I have received the exception. Also I am sending my
 hbase-site.xml .

 Do let me know if any inputs from your side.



 Thanks,

 Siddharth



 *From:* Siddharth Ubale
 *Sent:* Wednesday, December 10, 2014 7:43 PM
 *To:* 'user@phoenix.apache.org'
 *Subject:* Exception in sub plan[0] exception - multi inner join



 Hi ,



 We have been trying to run a multi-table inner join query. However, very
 often we get an error message: “Encountered exception in sub plan [0]
 execution.”

 Since we posted this message earlier to this forum, we received a
 response saying that there are three parameters which have to be added,
 which are:



 1. <property>
      <name>phoenix.query.maxServerCacheBytes</name>
      <value>409,715,200</value>
    </property>

 2. <property>
      <name>phoenix.query.maxGlobalMemoryPercentage</name>
      <value>39</value>
    </property>

 3. <property>
      <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
      <value>6</value>
    </property>







 Along with this we have also gone through the Phoenix tuning parameters on
 phoenix tuning page as well as Hbase configuration settings and added the
 recommended ones.



 But still we are getting the above mentioned error. Also, we are seeing a
 lag of 120 – 200 secs on the query which is :



 select /*+ NO_STAR_JOIN */  * from Address_1 as a

 inner join Customers_1 as c on c.C_Id = a.A_C_Id

 inner join Orders_1 as o on o.O_C_Id = a.A_C_Id

 inner join Order_Details_1 as od on od.O_Id = o.O_Id

 inner join Payterms as p on p.Payt_Id = o.O_Pay_Terms

 inner join Shipterms as s on s.Shipt_Id =
 o.O_Ship_Terms

 where c.C_Id = '69088';



 all the above tables have more than 2million + records.



 Also, if somebody could let us know whether hbase-site.xml in phoenix
 installation folder also needs to be updated with the following changes??
 Or it should contain only the one property mentioned there??







 Thanks,

 Siddharth Ubale,

 *Synchronized Communications *

 *#43, Velankani Tech Park, Block No. II, *

 *3rd Floor, Electronic City Phase I,*

 *Bangalore – 560 100*

 *Tel : +91 80 3202 4060*

 *Web:* *www.syncoms.com* http://www.syncoms.com/


 *London*|*Bangalore*|*Orlando*



 *we innovate, plan, execute, and transform the business​*






-- 
Thanks,
Maryann


Re: Phoenix-136 did not support aggregate queries with derived tables in from clause

2014-12-05 Thread Maryann Xue
Hi Sun,

Which version of Phoenix are you using? This feature is supported from 3.1
and 4.1. And there is no such error message in Phoenix code base now.


Thanks,
Maryann
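
[An aside: if the goal is only the number of distinct groups, the nested form
can often be collapsed into a single aggregate. A sketch, with a placeholder
table name and an open Phoenix connection 'conn'; note that COUNT(DISTINCT ...)
ignores NULLs, whereas GROUP BY would count a NULL group:]

    ResultSet rs = conn.createStatement().executeQuery(
        "SELECT COUNT(DISTINCT COL1) FROM my_table WHERE 1 = 1");
    if (rs.next()) {
        System.out.println("distinct groups: " + rs.getLong(1));
    }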

On Fri, Dec 5, 2014 at 3:16 AM, su...@certusnet.com.cn 
su...@certusnet.com.cn wrote:

 Hi,all
 Notice that PHOENIX-136
 https://issues.apache.org/jira/browse/PHOENIX-136 has already supported
 derived tables in from clause, however,
 aggregate queries would throw error like the following:
Error: Complex nested queries not supported. (state=,code=0)

 The example queries are like : SELECT COUNT (1) FROM (SELECT COUNT (1)
 FROM table WHERE 1=1 GROUP BY COL1)
 while we are expecting to get the final results of distinct groups counts.

 Is there any mistake that I am missing or taking the correct grammer? If
 so, please corrects me kindly.
 Any available advice will be greately appreciated.

 Thanks,
 Sun.

 --
 --





-- 
Thanks,
Maryann


Re: PhoenixIOException - GlobalMemoryManager

2014-11-17 Thread Maryann Xue
Hi Ralph,

I think this is a known issue reported as PHOENIX-1011 (
https://issues.apache.org/jira/browse/PHOENIX-1011). We are still looking
at it. Will give you an update once it is solved.

Thanks a lot for the very detailed information, Ralph!


Thanks,
Maryann

On Mon, Nov 17, 2014 at 12:24 PM, Perko, Ralph J ralph.pe...@pnnl.gov
wrote:

  Hi, while importing data using the CsvBulkLoadTool I’ve run into an
 issue trying to query the data using sqlline.py.  The bulk load tool was
 successful.  There were no errors.  However when I attempt to query the
 data I get some exceptions:



 java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException

 at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)



 followed by many GlobalMemoryManger errors:



 WARN memory.GlobalMemoryManager: Orphaned chunk of  bytes found during
 finalize



 Not all queries, but most, produce this error and it seems related to the
 existence of a secondary index table:



 select * from TABLE limit 10;  --ERROR – index not used

 select un-indexed field from TABLE limit 10 -- ERROR



 If I run a query on an INTEGER column with a secondary index I do not get
 this error:



 select distinct(fieldx) from TABLE limit 10;  -- SUCCESS!



 However, a similar query on an indexed VARCHAR field produces a timeout
 error:

 java.lang.RuntimeException: … PhoenixIOException: Failed after retry of
 OutOfOrderScannerNextException: was there a rpc timeout?

 at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)



 select count(*) … times out as well



 Details:

 Total records imported: 7.2B

 Cluster size: 30 nodes

 Splits: 40 (salted)



 Phoenix version: 4.2.0

 HBase version: 0.98

 HDP distro 2.1.5



 I can scan the data with no errors from hbase shell



 Basic Phoenix table def:



 CREATE TABLE IF NOT EXISTS

 t1_csv_data

 (

 timestamp BIGINT NOT NULL,

 location VARCHAR NOT NULL,

 fileid VARCHAR NOT NULL,

 recnum INTEGER NOT NULL,

 field5 VARCHAR,

 ...

 field45 VARCHAR,

 CONSTRAINT pkey PRIMARY KEY (timestamp,

 location, fileid,recnum)

 )

  IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,
  SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';



  -- indexes

  CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
  COMPRESSION='SNAPPY',
  SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

  CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2)
  COMPRESSION='SNAPPY',
  SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

  CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3)
  COMPRESSION='SNAPPY',
  SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';



 Thanks for your help,

 Ralph






-- 
Thanks,
Maryann


Re: PhoenixIOException - GlobalMemoryManager

2014-11-17 Thread Maryann Xue
Hi Ralph,

You may want to check this problem against the latest release of Phoenix,
coz we just incorporated a fix for a similar issue in our 3.2.1 RC1 and
4.2.1 RC1.


Thanks,
Maryann

On Mon, Nov 17, 2014 at 6:32 PM, Maryann Xue maryann@gmail.com wrote:

 Hi Ralph,

 I think this is a known issue reported as PHOENIX-1011 (
 https://issues.apache.org/jira/browse/PHOENIX-1011). We are still looking
 at it. Will give you an update once it is solved.

 Thanks a lot for the very detailed information, Ralph!


 Thanks,
 Maryann

 On Mon, Nov 17, 2014 at 12:24 PM, Perko, Ralph J ralph.pe...@pnnl.gov
 wrote:

  Hi, while importing data using the CsvBulkLoadTool I’ve run into an
 issue trying to query the data using sqlline.py.  The bulk load tool was
 successful.  There were no errors.  However when I attempt to query the
 data I get some exceptions:



 java.lang.RuntimeException:
 org.apache.phoenix.exception.PhoenixIOException

 at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)



 followed by many GlobalMemoryManger errors:



 WARN memory.GlobalMemoryManager: Orphaned chunk of  bytes found
 during finalize



 Not all queries, but most, produce this error and it seems related to the
 existence of a secondary index table:



 select * from TABLE limit 10;  --ERROR – index not used

 select un-indexed field from TABLE limit 10 -- ERROR



 If I run a query on an INTEGER column with a secondary index I do not get
 this error:



 select distinct(fieldx) from TABLE limit 10;  -- SUCCESS!



 However, a similar query on an indexed VARCHAR field produces a timeout
 error:

 java.lang.RuntimeException: … PhoenixIOException: Failed after retry of
 OutOfOrderScannerNextException: was there a rpc timeout?

 at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)



 select count(*) … times out as well



 Details:

 Total records imported: 7.2B

 Cluster size: 30 nodes

 Splits: 40 (salted)



 Phoenix version: 4.2.0

 HBase version: 0.98

 HDP distro 2.1.5



 I can scan the data with no errors from hbase shell



 Basic Phoenix table def:



 CREATE TABLE IF NOT EXISTS

 t1_csv_data

 (

 timestamp BIGINT NOT NULL,

 location VARCHAR NOT NULL,

 fileid VARCHAR NOT NULL,

 recnum INTEGER NOT NULL,

 field5 VARCHAR,

 ...

 field45 VARCHAR,

 CONSTRAINT pkey PRIMARY KEY (timestamp,

 location, fileid,recnum)

 )

  IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,
  SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';



  -- indexes

  CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
  COMPRESSION='SNAPPY',
  SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

  CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2)
  COMPRESSION='SNAPPY',
  SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

  CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3)
  COMPRESSION='SNAPPY',
  SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';



 Thanks for your help,

 Ralph






 --
 Thanks,
 Maryann




-- 
Thanks,
Maryann


Re: Getting InsufficientMemoryException

2014-09-30 Thread Maryann Xue
Hi Ashish,

Could you please let us see your error message?


Thanks,
Maryann

On Tue, Sep 30, 2014 at 12:58 PM, ashish tapdiya ashishtapd...@gmail.com
wrote:

 Hey Maryann,

 Thanks for your input. I tried both the properties but no luck.

 ~Ashish

 On Sun, Sep 28, 2014 at 8:31 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Ashish,

 The global cache size is set to either 
 *phoenix.query.maxGlobalMemorySize* or 
 phoenix.query.maxGlobalMemoryPercentage
 * heapSize (Sorry about the mistake I made earlier). The 
 phoenix.query.maxServerCacheBytes is a client parameter and is most
 likely NOT the thing you should worry about. So you can try adjusting 
 phoenix.query.maxGlobalMemoryPercentage and the heap size in region
 server configurations and see how it works.


 Thanks,
 Maryann

 On Fri, Sep 26, 2014 at 10:48 PM, ashish tapdiya ashishtapd...@gmail.com
  wrote:

 I have tried that as well...but phoenix.query.maxServerCacheBytes
 remains the default value of 100 MB. I get to see it when join fails.

 Thanks,
 ~Ashish

 On Fri, Sep 26, 2014 at 8:02 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Ashish,

 The global cache size is set to either phoenix.query.maxServerCacheBytes
 or phoenix.query.maxGlobalMemoryPercentage * heapSize, whichever is
 *smaller*. You can try setting phoenix.query.maxGlobalMemoryPercentage
 instead, which is recommended, and see how it goes.


 Thanks,
 Maryann

 On Fri, Sep 26, 2014 at 5:37 PM, ashish tapdiya 
 ashishtapd...@gmail.com wrote:

 Hi Maryann,

 I am having the same issue where star join is failing with 
 MaxServerCacheSizeExceededException.
 I set phoenix.query.maxServerCacheBytes to 1 GB both in client and
 server hbase-site.xml's. However, it does not take effect.

 Phoenix 3.1
 HBase .94

 Thanks,
 ~Ashish

 On Fri, Sep 26, 2014 at 2:56 PM, Maryann Xue maryann@gmail.com
 wrote:

 Yes, you should make your modification on each region server, since
 this is a server-side configuration.


 On Thu, Sep 25, 2014 at 4:15 AM, G.S.Vijay Raajaa 
 gsvijayraa...@gmail.com wrote:

 Hi Xue,

   Thanks for replying. I did modify the hbase-site.xml by
 increasing the default value of phoenix.query.maxGlobalMemoryPercentage
 . Also increased the Region server heap space memory . The
 change didn't get reflected and I still get the error with an indication
 that global pool of 319507660 bytes is present. Should I modify
 the hbase-site.xml in every region server or just the file present in
 the class path of Phoenix client?

 Regards,
 Vijay Raajaa G S

 On Thu, Sep 25, 2014 at 1:47 AM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Vijay,

 I think here the query plan is scanning table *CUSTOMER_3 *while
 joining the other two tables at the same time, which means the region
 server memory for Phoenix should be large enough to hold 2 tables 
 together
 and you also need to expect some memory expansion for java objects.

 Do you mean that after you had modified the parameters you
 mentioned, you were still getting the same error message with exactly 
 the
 same numbers as global pool of 319507660 bytes? Did you make
 sure that the parameters actually took effect after modification?


 Thanks,
 Maryann

 On Tue, Sep 23, 2014 at 1:43 AM, G.S.Vijay Raajaa 
 gsvijayraa...@gmail.com wrote:

 Hi,

 I am trying to do a join of three tables usng the following
 query:

 *select c.c_first_name, ca.ca_city, cd.cd_education_status from
 CUSTOMER_3 c join CUSTOMER_DEMOGRAPHICS_1 cd on 
 c.c_current_cdemo_sk =
 cd.cd_demo_sk join CUSTOMER_ADDRESS_1 ca on c.c_current_addr_sk =
 ca.ca_address_sk group by ca.ca_city, cd.cd_education_status,
 c.c_first_name;*

 *The size of CUSTOMER_3 is 4.1 GB with 30million records.*

 *I get the following error:*

 ./psql.py 10.10.5.55 test.sql
 java.sql.SQLException: Encountered exception in hash plan [0]
 execution.
 at
 org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:146)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:211)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:204)
 at
 org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:54)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:204)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:193)
 at
 org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:147)
 at
 org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:152)
 at
 org.apache.phoenix.jdbc.PhoenixConnection.executeStatements(PhoenixConnection.java:220)
 at
 org.apache.phoenix.util.PhoenixRuntime.executeStatements(PhoenixRuntime.java:193)
 at
 org.apache.phoenix.util.PhoenixRuntime.main(PhoenixRuntime.java:140)
 Caused by: java.sql.SQLException:
 java.util.concurrent.ExecutionException:
 java.lang.reflect.UndeclaredThrowableException
 at
 org.apache.phoenix.cache.ServerCacheClient.addServerCache

Re: Getting InsufficientMemoryException

2014-09-28 Thread Maryann Xue
Hi Ashish,

The global cache size is set to either *phoenix.query.maxGlobalMemorySize*
or phoenix.query.maxGlobalMemoryPercentage * heapSize (Sorry about the
mistake I made earlier). The phoenix.query.maxServerCacheBytes is a
client parameter and is most likely NOT the thing you should worry about.
So you can try adjusting phoenix.query.maxGlobalMemoryPercentage and the
heap size in region server configurations and see how it works.


Thanks,
Maryann
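
[A rough back-of-the-envelope sketch of how that product works out; the heap
size and percentage below are made-up illustrative values, not defaults read
from any configuration:]

    long regionServerHeapBytes = 2L * 1024 * 1024 * 1024;  // e.g. a 2 GB heap
    int  maxGlobalMemoryPercentage = 15;                    // phoenix.query.maxGlobalMemoryPercentage
    long globalPoolBytes = regionServerHeapBytes * maxGlobalMemoryPercentage / 100;
    System.out.println(globalPoolBytes);  // ~322 million bytes, i.e. roughly 300 MB

[Numbers in that range are consistent with the "global pool of 319507660 bytes"
figure quoted in the InsufficientMemoryException further down this thread, which
is why raising either the percentage or the region server heap moves the limit.]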

On Fri, Sep 26, 2014 at 10:48 PM, ashish tapdiya ashishtapd...@gmail.com
wrote:

 I have tried that as well...but phoenix.query.maxServerCacheBytes
 remains the default value of 100 MB. I get to see it when join fails.

 Thanks,
 ~Ashish

 On Fri, Sep 26, 2014 at 8:02 PM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Ashish,

 The global cache size is set to either phoenix.query.maxServerCacheBytes
 or phoenix.query.maxGlobalMemoryPercentage * heapSize, whichever is
 *smaller*. You can try setting phoenix.query.maxGlobalMemoryPercentage
 instead, which is recommended, and see how it goes.


 Thanks,
 Maryann

 On Fri, Sep 26, 2014 at 5:37 PM, ashish tapdiya ashishtapd...@gmail.com
 wrote:

 Hi Maryann,

 I am having the same issue where star join is failing with 
 MaxServerCacheSizeExceededException.
 I set phoenix.query.maxServerCacheBytes to 1 GB both in client and
 server hbase-site.xml's. However, it does not take effect.

 Phoenix 3.1
 HBase .94

 Thanks,
 ~Ashish

 On Fri, Sep 26, 2014 at 2:56 PM, Maryann Xue maryann@gmail.com
 wrote:

 Yes, you should make your modification on each region server, since
 this is a server-side configuration.


 On Thu, Sep 25, 2014 at 4:15 AM, G.S.Vijay Raajaa 
 gsvijayraa...@gmail.com wrote:

 Hi Xue,

   Thanks for replying. I did modify the hbase-site.xml by
 increasing the default value of phoenix.query.maxGlobalMemoryPercentage
 . Also increased the Region server heap space memory . The
 change didn't get reflected and I still get the error with an indication
 that global pool of 319507660 bytes is present. Should I modify the
 hbase-site.xml in every region server or just the file present in the 
 class
 path of Phoenix client?

 Regards,
 Vijay Raajaa G S

 On Thu, Sep 25, 2014 at 1:47 AM, Maryann Xue maryann@gmail.com
 wrote:

 Hi Vijay,

 I think here the query plan is scanning table *CUSTOMER_3 *while
 joining the other two tables at the same time, which means the region
 server memory for Phoenix should be large enough to hold 2 tables 
 together
 and you also need to expect some memory expansion for java objects.

 Do you mean that after you had modified the parameters you mentioned,
 you were still getting the same error message with exactly the same 
 numbers
 as global pool of 319507660 bytes? Did you make sure that the
 parameters actually took effect after modification?


 Thanks,
 Maryann

 On Tue, Sep 23, 2014 at 1:43 AM, G.S.Vijay Raajaa 
 gsvijayraa...@gmail.com wrote:

 Hi,

 I am trying to do a join of three tables usng the following
 query:

 *select c.c_first_name, ca.ca_city, cd.cd_education_status from
 CUSTOMER_3 c join CUSTOMER_DEMOGRAPHICS_1 cd on 
 c.c_current_cdemo_sk =
 cd.cd_demo_sk join CUSTOMER_ADDRESS_1 ca on c.c_current_addr_sk =
 ca.ca_address_sk group by ca.ca_city, cd.cd_education_status,
 c.c_first_name;*

 *The size of CUSTOMER_3 is 4.1 GB with 30million records.*

 *I get the following error:*

 ./psql.py 10.10.5.55 test.sql
 java.sql.SQLException: Encountered exception in hash plan [0]
 execution.
 at
 org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:146)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:211)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:204)
 at
 org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:54)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:204)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:193)
 at
 org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:147)
 at
 org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:152)
 at
 org.apache.phoenix.jdbc.PhoenixConnection.executeStatements(PhoenixConnection.java:220)
 at
 org.apache.phoenix.util.PhoenixRuntime.executeStatements(PhoenixRuntime.java:193)
 at
 org.apache.phoenix.util.PhoenixRuntime.main(PhoenixRuntime.java:140)
 Caused by: java.sql.SQLException:
 java.util.concurrent.ExecutionException:
 java.lang.reflect.UndeclaredThrowableException
 at
 org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:199)
 at
 org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.java:78)
 at
 org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:119)
 at
 org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:114)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303

Re: Subqueries: Missing LPAREN

2014-09-25 Thread Maryann Xue
Hi JM,

Sorry that I made a mistake earlier. Your query should be covered by
https://issues.apache.org/jira/browse/PHOENIX-945. Will keep you updated on
the progress of that issue.


Thanks,
Maryann
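
[While the subquery work is in progress, one way to express the same intent
without a correlated subquery is a join against a grouped derived table; a
sketch only, using the column names from JM's query below and assuming an open
Phoenix connection 'conn':]

    String sql =
        "SELECT n.* FROM metadata n " +
        "JOIN (SELECT A, MAX(R) AS max_r FROM metadata GROUP BY A) m " +
        "ON n.A = m.A AND n.R = m.max_r " +
        "WHERE n.L = 1";
    ResultSet rs = conn.createStatement().executeQuery(sql);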


On Wed, Sep 24, 2014 at 3:41 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi Maryann,

 We have already spotted PHOENIX-1168 and tracking it ;) Thanks for the
 patch!

 We have already downloaded it and will give it a try.

 JM

 2014-09-24 15:39 GMT-04:00 Maryann Xue maryann@gmail.com:

 Hi JM,

 Think this sub-query feature is covered by PHOENIX-1168, for which a
 check-in is expected very soon.


 Thanks,
 Maryann

 On Wed, Sep 24, 2014 at 9:06 AM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Hi,

 Is it possible to run sub-queries with Phoenix? Something like this:

 select * from metadata n where L = 1 AND R = (select max(R) from
 metadata z where n.A = z.A);

 Goal is to get all rows where L=1 and R=max. Field A is the key.

 Thanks,

 JM




 --
 Thanks,
 Maryann





-- 
Thanks,
Maryann


Re: JOIN and limit

2014-09-24 Thread Maryann Xue
Hi Abe,

The expected behavior should be pushing the LIMIT to table a (since it's a left
outer join) while checking the limit again against the final joined
results. Since it does not work as expected, it should be a bug.

Could you please verify it and report an issue with a test case attached?


Thanks,
Maryann
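
[Rough shape of the kind of test case that could be attached to such a JIRA,
restating the expectation above; table and column names are the ones from
Abe's query, and 'conn' is an open Phoenix connection:]

    ResultSet rs = conn.createStatement().executeQuery(
        "SELECT * FROM a LEFT OUTER JOIN b ON a.col2 = b.col2 AND b.col3 = 'X' " +
        "WHERE a.col1 = 'Y' LIMIT 1000");
    int n = 0;
    while (rs.next()) {
        n++;
    }
    // Expected: n == min(1000, number of joined rows for a.col1 = 'Y'),
    // i.e. the LIMIT applies to the joined result, not to table b on its own.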

On Thu, Sep 18, 2014 at 8:51 PM, Abe Weinograd a...@flonet.com wrote:

 Given the following query


 select * from a left outer join b on a.col2 = b.col2 and b.col3 = 'X'
 WHERE a.col1 = 'Y' LIMIT 1000


 After playing with this for a while and getting results that didn't make
 sense, it seems the LIMIT is being pushed on b or something like it before
 the join is applied and not after the full resultset is computed.  I was
 digging around a little bit.  Is that expected behavior?

 Thanks,
 Abe




-- 
Thanks,
Maryann


Re: Getting InsufficientMemoryException

2014-09-24 Thread Maryann Xue
Hi Vijay,

I think here the query plan is scanning table *CUSTOMER_3 *while
joining the other two tables at the same time, which means the region
server memory for Phoenix should be large enough to hold 2 tables together
and you also need to expect some memory expansion for java objects.

Do you mean that after you had modified the parameters you mentioned, you
were still getting the same error message with exactly the same
numbers as global
pool of 319507660 bytes? Did you make sure that the parameters actually
took effect after modification?


Thanks,
Maryann
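
[One knob that may be worth experimenting with here (a sketch, not a verified
fix): the NO_STAR_JOIN hint asks Phoenix not to build hash caches for both
right-hand tables at once, which can lower the peak server-side memory needed
for a multi-way join. Table and column names are the ones from the query below;
'conn' is an open Phoenix connection:]

    String sql =
        "SELECT /*+ NO_STAR_JOIN */ c.c_first_name, ca.ca_city, cd.cd_education_status " +
        "FROM CUSTOMER_3 c " +
        "JOIN CUSTOMER_DEMOGRAPHICS_1 cd ON c.c_current_cdemo_sk = cd.cd_demo_sk " +
        "JOIN CUSTOMER_ADDRESS_1 ca ON c.c_current_addr_sk = ca.ca_address_sk " +
        "GROUP BY ca.ca_city, cd.cd_education_status, c.c_first_name";
    ResultSet rs = conn.createStatement().executeQuery(sql);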

On Tue, Sep 23, 2014 at 1:43 AM, G.S.Vijay Raajaa gsvijayraa...@gmail.com
wrote:

 Hi,

 I am trying to do a join of three tables usng the following query:

 *select c.c_first_name, ca.ca_city, cd.cd_education_status from
 CUSTOMER_3 c join CUSTOMER_DEMOGRAPHICS_1 cd on c.c_current_cdemo_sk =
 cd.cd_demo_sk join CUSTOMER_ADDRESS_1 ca on c.c_current_addr_sk =
 ca.ca_address_sk group by ca.ca_city, cd.cd_education_status,
 c.c_first_name;*

 *The size of CUSTOMER_3 is 4.1 GB with 30million records.*

 *I get the following error:*

 ./psql.py 10.10.5.55 test.sql
 java.sql.SQLException: Encountered exception in hash plan [0] execution.
 at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:146)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:211)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:204)
 at
 org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:54)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:204)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:193)
 at
 org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:147)
 at
 org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:152)
 at
 org.apache.phoenix.jdbc.PhoenixConnection.executeStatements(PhoenixConnection.java:220)
 at
 org.apache.phoenix.util.PhoenixRuntime.executeStatements(PhoenixRuntime.java:193)
 at org.apache.phoenix.util.PhoenixRuntime.main(PhoenixRuntime.java:140)
 Caused by: java.sql.SQLException: java.util.concurrent.ExecutionException:
 java.lang.reflect.UndeclaredThrowableException
 at
 org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:199)
 at
 org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.java:78)
 at org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:119)
 at org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:114)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.util.concurrent.ExecutionException:
 java.lang.reflect.UndeclaredThrowableException
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
 at java.util.concurrent.FutureTask.get(FutureTask.java:91)
 at
 org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:191)
 ... 8 more
 Caused by: java.lang.reflect.UndeclaredThrowableException
 at $Proxy10.addServerCache(Unknown Source)
 at
 org.apache.phoenix.cache.ServerCacheClient$1.call(ServerCacheClient.java:169)
 at
 org.apache.phoenix.cache.ServerCacheClient$1.call(ServerCacheClient.java:164)
 ... 5 more
 Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
 Failed after attempts=14, exceptions:
 Tue Sep 23 00:25:53 CDT 2014,
 org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@100e398,
 java.io.IOException: java.io.IOException:
 org.apache.phoenix.memory.InsufficientMemoryException: Requested memory of
 446623727 bytes is larger than global pool of 319507660 bytes.
 Tue Sep 23 00:26:02 CDT 2014,
 org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@100e398,
 java.io.IOException: java.io.IOException:
 org.apache.phoenix.memory.InsufficientMemoryException: Requested memory of
 446623727 bytes is larger than global pool of 319507660 bytes.
 Tue Sep 23 00:26:18 CDT 2014,
 org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@100e398,
 java.io.IOException: java.io.IOException:
 org.apache.phoenix.memory.InsufficientMemoryException: Requested memory of
 446623727 bytes is larger than global pool of 319507660 bytes.
 Tue Sep 23 00:26:43 CDT 2014,
 org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@100e398,
 java.io.IOException: java.io.IOException:
 org.apache.phoenix.memory.InsufficientMemoryException: Requested memory of
 446623727 bytes is larger than global pool of 319507660 bytes.
 Tue Sep 23 00:27:01 CDT 2014,
 org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@100e398,
 java.io.IOException: java.io.IOException:
 org.apache.phoenix.memory.InsufficientMemoryException: Requested memory of
 446623727 bytes is larger than