Re: Inner Join not returning any results in Phoenix

2015-02-24 Thread Abe Weinograd
Thanks!

On Tue, Feb 24, 2015 at 1:25 PM, Matthew Johnson matt.john...@algomi.com
wrote:

 Legend! Thanks :-)





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 24 February 2015 18:15

 *To:* user@phoenix.apache.org
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Thanks, James and Matt!



 I successfully repro the bug using setMaxRow(). Adding a LIMIT to the
 query itself won't cause such a bug.



 Please watch on https://issues.apache.org/jira/browse/PHOENIX-1680. I
 will post a patch right away.





 Maryann





 On Tue, Feb 24, 2015 at 12:57 PM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Thanks James.



 Maryann, with the upserts I sent, I run the following code (using
 phoenix-client-4.2.2.jar):



  // Register JDBC Driver

  Class.*forName*(
 org.apache.phoenix.jdbc.PhoenixDriver).newInstance();



  Connection conn = DriverManager.
 *getConnection*(jdbc:phoenix:box1,box2,box3, , );



  // Create a Statement class to execute the
 SQL statement

  Statement stmtLimited =
 conn.createStatement();

  stmtLimited.setMaxRows(10);



  // Execute the SQL statement and get the
 results in a *Resultset*

  ResultSet rsLimited =
 stmtLimited.executeQuery(select * from mytable1 m1 inner join mytable2
 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z');

  *int* rsLimitedCount = 0;

  *while*(rsLimited.next()) {

   rsLimitedCount++;

  }



  // Create a Statement class to execute the
 SQL statement

  Statement stmtNoLimit =
 conn.createStatement();



  // Execute the SQL statement and get the
 results in a *Resultset*

  ResultSet rsNoLimit =
 stmtNoLimit.executeQuery(select * from mytable1 m1 inner join mytable2
 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z');

  *int* rsNoLimitedCount = 0;

  *while*(rsNoLimit.next()) {

   rsNoLimitedCount++;

  }



  System.*out*.println(Results found when
 LIMIT 10 was  + rsLimitedCount +  but with NO LIMIT was  +
 rsNoLimitedCount);





 And the output is:



 Results found when LIMIT 10 was 0 but with NO LIMIT was 1





 Hope that helps!



 Cheers,

 Matt





 *From:* James Taylor [mailto:jamestay...@apache.org]
 *Sent:* 24 February 2015 17:27


 *To:* user
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Mary,

 You can just call Statement.setMaxRows(10) to emulate what SQuirrel is
 doing.

 Thanks,

 James



 On Tue, Feb 24, 2015 at 9:09 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 I have two environments, one with just some dummy data, and one with real
 data in it. The dummy data one gave me results when I queried and the real
 data didn’t, but turns out that is just because the dummy data was much
 more coherent (because it was manually created for a specific test) so the
 RHS table always matches the LHS and therefore the join gives me results.



 I have attached a script that demonstrates my problem (create 2 Phoenix
 tables, insert some rows, and run a query using a join). When I run this on
 my cluster, I consistently see the issue I am having. If I set the LIMIT in
 Squirrel to 10, I get no results, but if I set it to 26, I get a result
 (since I have 26 rows in the RHS table and I am intentionally querying for
 the last one).



 Please give it a go and see if it reproduces for you – are you using
 Squirrel? If so, what version?



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 24 February 2015 16:41


 *To:* user@phoenix.apache.org
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 I just noticed these lines in your very first message:



 *PS* Something that may or may not be of note: In the environments I am
 using:

 WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*

 FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*



 What do you mean by WORKING and FAILING?



 I still cannot reproduce the bug here. Could you please post DDLs you used
 for related tables?





 Thanks,

 Maryann







 On Tue, Feb 24, 2015 at 11:27 AM, James Taylor jamestay...@apache.org
 wrote:

 FYI, SQuirrel sets the max rows to return as 100. You can change this in
 the tool, though.



 On Tuesday, February 24, 2015, Maryann Xue maryann@gmail.com wrote:

 Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY
 PageFilter 100* does look like

RE: Inner Join not returning any results in Phoenix

2015-02-24 Thread Matthew Johnson
Hi Maryann,



Thanks for that - I will schedule an update to the latest version of
Phoenix then for later this week (and try out the merge-join hints).



In the meantime, here are my explain plans:



*JOIN WITH NO SQUIRREL LIMIT*



*PLAN*

*CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

*PARALLEL INNER-JOIN TABLE 0*

*CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2*



*JOIN WITH SQUIRREL LIMIT 100*



*PLAN*

*CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

*CLIENT 100 ROW LIMIT*

*PARALLEL INNER-JOIN TABLE 0*

*CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2*

*SERVER FILTER BY PageFilter 100*

*SERVER 100 ROW LIMIT*

*CLIENT 100 ROW LIMIT*





I’m not really sure how to read that, but it does seem to suggest that
‘mytable2’ is being limited to 100 – thoughts?



Cheers,

Matt



*From:* Maryann Xue [mailto:maryann@gmail.com]
*Sent:* 23 February 2015 18:10
*To:* user@phoenix.apache.org
*Subject:* Re: Inner Join not returning any results in Phoenix



Hi Matt,



Yes, the upgrade is as easy as that. I believe things will work fine with
existing tables.

I tried with a similar query but didn't see that it was a Phoenix bug. So
could you please try the following explain statement and see the execution
plan:



EXPLAIN *SELECT * FROM mytable1 hc*

*INNER JOIN “mytable2” bs*

*On hc.myId = bs.”myId”*





Thanks,

Maryann





On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson matt.john...@algomi.com
wrote:

Hi Maryann,



That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then
that would explain why it’s not working. Is upgrading versions of Phoenix
as simple as removing the previous jar from HBase lib folder and dropping
the new Phoenix jar in (and restarting HBase)? Will all the existing
Phoenix tables and views be backwards-compatible and work with the new
version?



Cheers,

Matt





*From:* Maryann Xue [mailto:maryann@gmail.com]
*Sent:* 20 February 2015 17:46


*To:* user@phoenix.apache.org
*Subject:* Re: Inner Join not returning any results in Phoenix



Which version of Phoenix are you using, Matt? This feature is only
available in the latest releases of 4.3/3.3.



On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com
wrote:

Hi Maryann,



Unfortunately my two tables are roughly the same size (~500k), but I have
tested a different join where one table is ~500k and the other is ~20k and
putting the larger one first is definitely far more performant. I believe
you are right about running out of memory, I can see this repeated a few
times in the region server logs followed by what appears to be a restart or
disconnect:



*[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
machine (eg GC): pause of approximately 1083ms*



I have been looking at the Phoenix page on joins (
http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge
joins for large tables by using a hint. I have tried this though with no
success:



*SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc*

*INNER JOIN “mytable2” bs*

*On hc.”myId” = bs.”myId”*



Am I putting the hint in the wrong place? Does it need to go next to the
JOIN rather than the SELECT?



I will try increasing the memory available to the Region Servers as well to
see if that helps.



Thanks!

Matt





*From:* Maryann Xue [mailto:maryann@gmail.com]
*Sent:* 20 February 2015 16:28
*To:* user@phoenix.apache.org


*Subject:* Re: Inner Join not returning any results in Phoenix



Hi Matt,



The error you got with Limit Rows off might be related to insufficient
memory on region servers for one of your tables. Which is the larger table
between table1 and table2? You might want to try putting the larger table
as the first table in your join query and see if it works.



And I will quickly check if the LIMIT problem is a Phoenix bug and will
keep you posted.





Thanks,

Maryann





On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com
wrote:

Hi Abe,



Glad to hear I’m not alone! Will try and figure out exactly what’s
happening and maybe raise a Jira :-)





@Constantin – I have tried with and without the “Limit Rows” – but without
it, and without any indexes, the query runs for a while (about 10 minutes?)
and then throws an error:



*Error: Encountered exception in sub plan [0] execution.*



Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is
that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows”
does not affect any aggregate functions like ‘count’, because the actual
number of result rows for a count is just 1 row (the count itself). But in
HBase it seems that the Row Limit, as Abe mentioned, is applied to one of
the table BEFORE it does the join, so it affects the results of the ‘count’
function.



When I try to create my indexes so I am able to do the join without Row
Limit, I get the following error:



ERROR 1029

Re: Inner Join not returning any results in Phoenix

2015-02-24 Thread Maryann Xue
Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY
PageFilter 100* does look like a but here. I will try again to reproduce
it.


Thanks,
Maryann

On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Maryann,



 Thanks for that - I will schedule an update to the latest version of
 Phoenix then for later this week (and try out the merge-join hints).



 In the meantime, here are my explain plans:



 *JOIN WITH NO SQUIRREL LIMIT*



 *PLAN*

 *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

 *PARALLEL INNER-JOIN TABLE 0*

 *CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2*



 *JOIN WITH SQUIRREL LIMIT 100*



 *PLAN*

 *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

 *CLIENT 100 ROW LIMIT*

 *PARALLEL INNER-JOIN TABLE 0*

 *CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2*

 *SERVER FILTER BY PageFilter 100*

 *SERVER 100 ROW LIMIT*

 *CLIENT 100 ROW LIMIT*





 I’m not really sure how to read that, but it does seem to suggest that
 ‘mytable2’ is being limited to 100 – thoughts?



 Cheers,

 Matt



 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 23 February 2015 18:10

 *To:* user@phoenix.apache.org
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 Yes, the upgrade is as easy as that. I believe things will work fine with
 existing tables.

 I tried with a similar query but didn't see that it was a Phoenix bug. So
 could you please try the following explain statement and see the execution
 plan:



 EXPLAIN *SELECT * FROM mytable1 hc*

 *INNER JOIN “mytable2” bs*

 *On hc.myId = bs.”myId”*





 Thanks,

 Maryann





 On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then
 that would explain why it’s not working. Is upgrading versions of Phoenix
 as simple as removing the previous jar from HBase lib folder and dropping
 the new Phoenix jar in (and restarting HBase)? Will all the existing
 Phoenix tables and views be backwards-compatible and work with the new
 version?



 Cheers,

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 20 February 2015 17:46


 *To:* user@phoenix.apache.org
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Which version of Phoenix are you using, Matt? This feature is only
 available in the latest releases of 4.3/3.3.



 On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 Unfortunately my two tables are roughly the same size (~500k), but I have
 tested a different join where one table is ~500k and the other is ~20k and
 putting the larger one first is definitely far more performant. I believe
 you are right about running out of memory, I can see this repeated a few
 times in the region server logs followed by what appears to be a restart or
 disconnect:



 *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
 machine (eg GC): pause of approximately 1083ms*



 I have been looking at the Phoenix page on joins (
 http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge
 joins for large tables by using a hint. I have tried this though with no
 success:



 *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc*

 *INNER JOIN “mytable2” bs*

 *On hc.”myId” = bs.”myId”*



 Am I putting the hint in the wrong place? Does it need to go next to the
 JOIN rather than the SELECT?



 I will try increasing the memory available to the Region Servers as well
 to see if that helps.



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 20 February 2015 16:28
 *To:* user@phoenix.apache.org


 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 The error you got with Limit Rows off might be related to insufficient
 memory on region servers for one of your tables. Which is the larger table
 between table1 and table2? You might want to try putting the larger table
 as the first table in your join query and see if it works.



 And I will quickly check if the LIMIT problem is a Phoenix bug and will
 keep you posted.





 Thanks,

 Maryann





 On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Abe,



 Glad to hear I’m not alone! Will try and figure out exactly what’s
 happening and maybe raise a Jira :-)





 @Constantin – I have tried with and without the “Limit Rows” – but without
 it, and without any indexes, the query runs for a while (about 10 minutes?)
 and then throws an error:



 *Error: Encountered exception in sub plan [0] execution.*



 Which I’m guessing is either HBase or Zookeeper timeout. The weird thing
 is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit
 Rows” does not affect any aggregate functions like ‘count’, because the
 actual number

Re: Inner Join not returning any results in Phoenix

2015-02-24 Thread James Taylor
FYI, SQuirrel sets the max rows to return as 100. You can change this in
the tool, though.

On Tuesday, February 24, 2015, Maryann Xue maryann@gmail.com wrote:

 Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY
 PageFilter 100* does look like a but here. I will try again to reproduce
 it.


 Thanks,
 Maryann

 On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com
 javascript:_e(%7B%7D,'cvml','matt.john...@algomi.com'); wrote:

 Hi Maryann,



 Thanks for that - I will schedule an update to the latest version of
 Phoenix then for later this week (and try out the merge-join hints).



 In the meantime, here are my explain plans:



 *JOIN WITH NO SQUIRREL LIMIT*



 *PLAN*

 *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

 *PARALLEL INNER-JOIN TABLE 0*

 *CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2*



 *JOIN WITH SQUIRREL LIMIT 100*



 *PLAN*

 *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

 *CLIENT 100 ROW LIMIT*

 *PARALLEL INNER-JOIN TABLE 0*

 *CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2*

 *SERVER FILTER BY PageFilter 100*

 *SERVER 100 ROW LIMIT*

 *CLIENT 100 ROW LIMIT*





 I’m not really sure how to read that, but it does seem to suggest that
 ‘mytable2’ is being limited to 100 – thoughts?



 Cheers,

 Matt



 *From:* Maryann Xue [mailto:maryann@gmail.com
 javascript:_e(%7B%7D,'cvml','maryann@gmail.com');]
 *Sent:* 23 February 2015 18:10

 *To:* user@phoenix.apache.org
 javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 Yes, the upgrade is as easy as that. I believe things will work fine with
 existing tables.

 I tried with a similar query but didn't see that it was a Phoenix bug. So
 could you please try the following explain statement and see the execution
 plan:



 EXPLAIN *SELECT * FROM mytable1 hc*

 *INNER JOIN “mytable2” bs*

 *On hc.myId = bs.”myId”*





 Thanks,

 Maryann





 On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson matt.john...@algomi.com
 javascript:_e(%7B%7D,'cvml','matt.john...@algomi.com'); wrote:

 Hi Maryann,



 That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then
 that would explain why it’s not working. Is upgrading versions of Phoenix
 as simple as removing the previous jar from HBase lib folder and dropping
 the new Phoenix jar in (and restarting HBase)? Will all the existing
 Phoenix tables and views be backwards-compatible and work with the new
 version?



 Cheers,

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com
 javascript:_e(%7B%7D,'cvml','maryann@gmail.com');]
 *Sent:* 20 February 2015 17:46


 *To:* user@phoenix.apache.org
 javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Which version of Phoenix are you using, Matt? This feature is only
 available in the latest releases of 4.3/3.3.



 On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson 
 matt.john...@algomi.com
 javascript:_e(%7B%7D,'cvml','matt.john...@algomi.com'); wrote:

 Hi Maryann,



 Unfortunately my two tables are roughly the same size (~500k), but I have
 tested a different join where one table is ~500k and the other is ~20k and
 putting the larger one first is definitely far more performant. I believe
 you are right about running out of memory, I can see this repeated a few
 times in the region server logs followed by what appears to be a restart or
 disconnect:



 *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
 machine (eg GC): pause of approximately 1083ms*



 I have been looking at the Phoenix page on joins (
 http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge
 joins for large tables by using a hint. I have tried this though with no
 success:



 *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc*

 *INNER JOIN “mytable2” bs*

 *On hc.”myId” = bs.”myId”*



 Am I putting the hint in the wrong place? Does it need to go next to the
 JOIN rather than the SELECT?



 I will try increasing the memory available to the Region Servers as well
 to see if that helps.



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com
 javascript:_e(%7B%7D,'cvml','maryann@gmail.com');]
 *Sent:* 20 February 2015 16:28
 *To:* user@phoenix.apache.org
 javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org');


 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 The error you got with Limit Rows off might be related to insufficient
 memory on region servers for one of your tables. Which is the larger table
 between table1 and table2? You might want to try putting the larger table
 as the first table in your join query and see if it works.



 And I will quickly check if the LIMIT problem is a Phoenix bug and will
 keep you posted.





 Thanks,

 Maryann





 On Fri, Feb 20

RE: Inner Join not returning any results in Phoenix

2015-02-24 Thread Matthew Johnson
Thanks James.



Maryann, with the upserts I sent, I run the following code (using
phoenix-client-4.2.2.jar):



 // Register JDBC Driver

 Class.*forName*(
org.apache.phoenix.jdbc.PhoenixDriver).newInstance();



 Connection conn = DriverManager.*getConnection*
(jdbc:phoenix:box1,box2,box3, , );



 // Create a Statement class to execute the SQL
statement

 Statement stmtLimited = conn.createStatement();

 stmtLimited.setMaxRows(10);



 // Execute the SQL statement and get the
results in a *Resultset*

 ResultSet rsLimited =
stmtLimited.executeQuery(select
* from mytable1 m1 inner join mytable2 m2 on m1.firstletter =
m2.firstletter where m1.firstletter = 'Z');

 *int* rsLimitedCount = 0;

 *while*(rsLimited.next()) {

  rsLimitedCount++;

 }



 // Create a Statement class to execute the SQL
statement

 Statement stmtNoLimit = conn.createStatement();



 // Execute the SQL statement and get the
results in a *Resultset*

 ResultSet rsNoLimit =
stmtNoLimit.executeQuery(select
* from mytable1 m1 inner join mytable2 m2 on m1.firstletter =
m2.firstletter where m1.firstletter = 'Z');

 *int* rsNoLimitedCount = 0;

 *while*(rsNoLimit.next()) {

  rsNoLimitedCount++;

 }



 System.*out*.println(Results found when LIMIT
10 was  + rsLimitedCount +  but with NO LIMIT was  + rsNoLimitedCount);





And the output is:



Results found when LIMIT 10 was 0 but with NO LIMIT was 1





Hope that helps!



Cheers,

Matt





*From:* James Taylor [mailto:jamestay...@apache.org]
*Sent:* 24 February 2015 17:27
*To:* user
*Subject:* Re: Inner Join not returning any results in Phoenix



Mary,

You can just call Statement.setMaxRows(10) to emulate what SQuirrel is
doing.

Thanks,

James



On Tue, Feb 24, 2015 at 9:09 AM, Matthew Johnson matt.john...@algomi.com
wrote:

Hi Maryann,



I have two environments, one with just some dummy data, and one with real
data in it. The dummy data one gave me results when I queried and the real
data didn’t, but turns out that is just because the dummy data was much
more coherent (because it was manually created for a specific test) so the
RHS table always matches the LHS and therefore the join gives me results.



I have attached a script that demonstrates my problem (create 2 Phoenix
tables, insert some rows, and run a query using a join). When I run this on
my cluster, I consistently see the issue I am having. If I set the LIMIT in
Squirrel to 10, I get no results, but if I set it to 26, I get a result
(since I have 26 rows in the RHS table and I am intentionally querying for
the last one).



Please give it a go and see if it reproduces for you – are you using
Squirrel? If so, what version?



Thanks!

Matt





*From:* Maryann Xue [mailto:maryann@gmail.com]
*Sent:* 24 February 2015 16:41


*To:* user@phoenix.apache.org
*Subject:* Re: Inner Join not returning any results in Phoenix



Hi Matt,



I just noticed these lines in your very first message:



*PS* Something that may or may not be of note: In the environments I am
using:

WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*

FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*



What do you mean by WORKING and FAILING?



I still cannot reproduce the bug here. Could you please post DDLs you used
for related tables?





Thanks,

Maryann







On Tue, Feb 24, 2015 at 11:27 AM, James Taylor jamestay...@apache.org
wrote:

FYI, SQuirrel sets the max rows to return as 100. You can change this in
the tool, though.



On Tuesday, February 24, 2015, Maryann Xue maryann@gmail.com wrote:

Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY
PageFilter 100* does look like a but here. I will try again to reproduce
it.





Thanks,

Maryann



On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com
wrote:

Hi Maryann,



Thanks for that - I will schedule an update to the latest version of
Phoenix then for later this week (and try out the merge-join hints).



In the meantime, here are my explain plans:



*JOIN WITH NO SQUIRREL LIMIT*



*PLAN*

*CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

*PARALLEL INNER-JOIN TABLE 0*

*CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2*



*JOIN WITH SQUIRREL LIMIT 100*



*PLAN*

*CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1*

*CLIENT 100 ROW LIMIT*

*PARALLEL INNER-JOIN TABLE 0*

*CLIENT 3-CHUNK SERIAL 1

Re: Inner Join not returning any results in Phoenix

2015-02-24 Thread Maryann Xue
Thanks, James and Matt!

I successfully repro the bug using setMaxRow(). Adding a LIMIT to the query
itself won't cause such a bug.

Please watch on https://issues.apache.org/jira/browse/PHOENIX-1680. I will
post a patch right away.


Maryann


On Tue, Feb 24, 2015 at 12:57 PM, Matthew Johnson matt.john...@algomi.com
wrote:

 Thanks James.



 Maryann, with the upserts I sent, I run the following code (using
 phoenix-client-4.2.2.jar):



  // Register JDBC Driver

  Class.*forName*(
 org.apache.phoenix.jdbc.PhoenixDriver).newInstance();



  Connection conn = DriverManager.
 *getConnection*(jdbc:phoenix:box1,box2,box3, , );



  // Create a Statement class to execute the
 SQL statement

  Statement stmtLimited =
 conn.createStatement();

  stmtLimited.setMaxRows(10);



  // Execute the SQL statement and get the
 results in a *Resultset*

  ResultSet rsLimited =
 stmtLimited.executeQuery(select * from mytable1 m1 inner join mytable2
 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z');

  *int* rsLimitedCount = 0;

  *while*(rsLimited.next()) {

   rsLimitedCount++;

  }



  // Create a Statement class to execute the
 SQL statement

  Statement stmtNoLimit =
 conn.createStatement();



  // Execute the SQL statement and get the
 results in a *Resultset*

  ResultSet rsNoLimit =
 stmtNoLimit.executeQuery(select * from mytable1 m1 inner join mytable2
 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z');

  *int* rsNoLimitedCount = 0;

  *while*(rsNoLimit.next()) {

   rsNoLimitedCount++;

  }



  System.*out*.println(Results found when
 LIMIT 10 was  + rsLimitedCount +  but with NO LIMIT was  +
 rsNoLimitedCount);





 And the output is:



 Results found when LIMIT 10 was 0 but with NO LIMIT was 1





 Hope that helps!



 Cheers,

 Matt





 *From:* James Taylor [mailto:jamestay...@apache.org]
 *Sent:* 24 February 2015 17:27

 *To:* user
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Mary,

 You can just call Statement.setMaxRows(10) to emulate what SQuirrel is
 doing.

 Thanks,

 James



 On Tue, Feb 24, 2015 at 9:09 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 I have two environments, one with just some dummy data, and one with real
 data in it. The dummy data one gave me results when I queried and the real
 data didn’t, but turns out that is just because the dummy data was much
 more coherent (because it was manually created for a specific test) so the
 RHS table always matches the LHS and therefore the join gives me results.



 I have attached a script that demonstrates my problem (create 2 Phoenix
 tables, insert some rows, and run a query using a join). When I run this on
 my cluster, I consistently see the issue I am having. If I set the LIMIT in
 Squirrel to 10, I get no results, but if I set it to 26, I get a result
 (since I have 26 rows in the RHS table and I am intentionally querying for
 the last one).



 Please give it a go and see if it reproduces for you – are you using
 Squirrel? If so, what version?



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 24 February 2015 16:41


 *To:* user@phoenix.apache.org
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 I just noticed these lines in your very first message:



 *PS* Something that may or may not be of note: In the environments I am
 using:

 WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*

 FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*



 What do you mean by WORKING and FAILING?



 I still cannot reproduce the bug here. Could you please post DDLs you used
 for related tables?





 Thanks,

 Maryann







 On Tue, Feb 24, 2015 at 11:27 AM, James Taylor jamestay...@apache.org
 wrote:

 FYI, SQuirrel sets the max rows to return as 100. You can change this in
 the tool, though.



 On Tuesday, February 24, 2015, Maryann Xue maryann@gmail.com wrote:

 Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY
 PageFilter 100* does look like a but here. I will try again to reproduce
 it.





 Thanks,

 Maryann



 On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Maryann,



 Thanks for that - I will schedule an update to the latest version of
 Phoenix then for later this week (and try out the merge-join hints

Inner Join not returning any results in Phoenix

2015-02-20 Thread Matthew Johnson
Hi guys,



I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one
environment, created tables in HBase, and then created views (rather than
tables) in Phoenix, and am able to query as expected (when I join my two
tables I see results). I’ve just promoted to another environment, with the
exact same setup, but my Inner Join returns no results!



I run the following two individual queries:



*SELECT * FROM mytable1 hc*

*where hc.myId = 'XS0'*



*SELECT * FROM mytable2 bs*

*where bs.myId = 'XS0'*



And both of these queries give results. But when I run:



*SELECT * FROM mytable1 hc*

*INNER JOIN “mytable2” bs*

*On hc.myId = bs.”myId”*



I get no results. I also get no results if I try:



*SELECT * FROM mytable1 hc*

*where hc.myId in (select distinct “myId” from “mytable2”)*



I have checked in HBase shell and can see the “myId” value is as expected
(XS0 in both tables). I am not sure if there are any logs that I can look
at to get some insight?



Many thanks in advance for any suggestions!



Cheers,

Matt



*PS* Something that may or may not be of note: In the environments I am
using:

WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*

FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*


RE: Inner Join not returning any results in Phoenix

2015-02-20 Thread Ciureanu, Constantin (GfK)
Hi Matthew,

Is it working without the quotes “ /   ?   (I see you are using 2 types of 
quotes, weird)
I guess that’s not needed, and probably causing troubles.   I don’t have to use 
quotes anyway.

Alternatively check the types of data in those 2 tables (if the field types are 
not the same in both tables, the join will not work).

Good luck,
  Constantin

From: Matthew Johnson [mailto:matt.john...@algomi.com]
Sent: Friday, February 20, 2015 12:54 PM
To: user@phoenix.apache.org
Subject: Inner Join not returning any results in Phoenix

Hi guys,

I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one 
environment, created tables in HBase, and then created views (rather than 
tables) in Phoenix, and am able to query as expected (when I join my two tables 
I see results). I’ve just promoted to another environment, with the exact same 
setup, but my Inner Join returns no results!

I run the following two individual queries:

SELECT * FROM mytable1 hc
where hc.myId = 'XS0'

SELECT * FROM mytable2 bs
where bs.myId = 'XS0'

And both of these queries give results. But when I run:

SELECT * FROM mytable1 hc
INNER JOIN “mytable2” bs
On hc.myId = bs.”myId”

I get no results. I also get no results if I try:

SELECT * FROM mytable1 hc
where hc.myId in (select distinct “myId” from “mytable2”)

I have checked in HBase shell and can see the “myId” value is as expected (XS0 
in both tables). I am not sure if there are any logs that I can look at to get 
some insight?

Many thanks in advance for any suggestions!

Cheers,
Matt

PS Something that may or may not be of note: In the environments I am using:
WORKING: hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar
FAILING: hbase-0.98.9-hadoop2 / phoenix-4.2.2-server.jar


RE: Inner Join not returning any results in Phoenix

2015-02-20 Thread Ciureanu, Constantin (GfK)
Hello Matt,

http://codingclues.eu/2008/the-squirrel-100-rows-problem/

Can you please test again after unchecking “Contents- Limit rows” and “SQL – 
Limit rows”?

[cid:image001.png@01D04D2C.FB047EE0]

P.S. Off-topic – it’s as funny as this “problem” (not possible to send an email 
for more than 500 miles away ☺  http://www.ibiblio.org/harris/500milemail.html )

Regards,
  Constantin

From: Abe Weinograd [mailto:a...@flonet.com]
Sent: Friday, February 20, 2015 4:18 PM
To: user
Subject: Re: Inner Join not returning any results in Phoenix

Matt,

I have seen this same issue.  When passing a LIMIT to a query with joins (most 
query tools do it implicitly), Phoenix seems to apply that to the table on the 
right of the join I believe.  I hadn't had a chance to play with it more and 
file a JIRA, but what you are describing is consistent with what I have seen.

Abe

On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson 
matt.john...@algomi.commailto:matt.john...@algomi.com wrote:
Hi Constantin,

Many thanks for your reply – the quotes were both of the same type (double 
quotes for table and column names, single quotes for string literals), it is 
just my email client that formatted them weirdly, sorry!

I have discovered what I believe is an important piece of the puzzle to my 
problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has a 
“Limit Rows” feature. When I try and count the number of rows in a single table:

select count(*) from “mytable1”

I get the expected number of results (eg 20,000). But when I join two tables 
together, it seems that the “Limit Rows” from Squirrel is somehow being applied 
before the join is performed, and if “Limit Rows” is set to 100 I get 100 
results or less. If the inner join is quite sparse (eg 20,000 rows in a table 
but only 100 of these will join with a second table) then I believe it tries to 
join the first 100 it finds and returns no results. In my experience of Oracle 
or MySQL, joins are done entirely on server side and then you just get back the 
number of rows you limited, rather than what appears to be happening which is 
the row limit is applied to the first table before the join is attempted with 
the second table. Is that how Phoenix works?

I have also discovered that I get different results (with “Limit Rows” turned 
on) depending on which order I join the tables:

SELECT count(*) FROM “mytable1” hc
INNER JOIN “mytable2” bs
On hc.”myId” = bs.”myId”

Gives me a very different number of results than:

SELECT count(*) FROM “mytable2” bs
INNER JOIN “mytable1” hc
On hc.”myId” = bs.”myId”


Unfortunately I cannot test whether I get the same number of results with 
“Limit Rows” turned off because my query times out! So I am now looking at 
creating secondary indexes on the “myId” column in both tables to see if I am 
able to do this join quicker. Does a join like this use a lot of memory on 
server side? Is something likely to be running out of resources?

Many thanks again for your time.

Cheers,
Matt


From: Ciureanu, Constantin (GfK) 
[mailto:constantin.ciure...@gfk.commailto:constantin.ciure...@gfk.com]
Sent: 20 February 2015 14:40
To: user@phoenix.apache.orgmailto:user@phoenix.apache.org
Subject: RE: Inner Join not returning any results in Phoenix

Hi Matthew,

Is it working without the quotes “ /   ?   (I see you are using 2 types of 
quotes, weird)
I guess that’s not needed, and probably causing troubles.   I don’t have to use 
quotes anyway.

Alternatively check the types of data in those 2 tables (if the field types are 
not the same in both tables, the join will not work).

Good luck,
  Constantin

From: Matthew Johnson [mailto:matt.john...@algomi.com]
Sent: Friday, February 20, 2015 12:54 PM
To: user@phoenix.apache.orgmailto:user@phoenix.apache.org
Subject: Inner Join not returning any results in Phoenix

Hi guys,

I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one 
environment, created tables in HBase, and then created views (rather than 
tables) in Phoenix, and am able to query as expected (when I join my two tables 
I see results). I’ve just promoted to another environment, with the exact same 
setup, but my Inner Join returns no results!

I run the following two individual queries:

SELECT * FROM mytable1 hc
where hc.myId = 'XS0'

SELECT * FROM mytable2 bs
where bs.myId = 'XS0'

And both of these queries give results. But when I run:

SELECT * FROM mytable1 hc
INNER JOIN “mytable2” bs
On hc.myId = bs.”myId”

I get no results. I also get no results if I try:

SELECT * FROM mytable1 hc
where hc.myId in (select distinct “myId” from “mytable2”)

I have checked in HBase shell and can see the “myId” value is as expected (XS0 
in both tables). I am not sure if there are any logs that I can look at to get 
some insight?

Many thanks in advance for any suggestions!

Cheers,
Matt

PS Something that may or may not be of note: In the environments I am using:
WORKING: hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar
FAILING

Re: Inner Join not returning any results in Phoenix

2015-02-20 Thread Abe Weinograd
Matt,

I have seen this same issue.  When passing a LIMIT to a query with joins
(most query tools do it implicitly), Phoenix seems to apply that to the
table on the right of the join I believe.  I hadn't had a chance to play
with it more and file a JIRA, but what you are describing is consistent
with what I have seen.

Abe

On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Constantin,



 Many thanks for your reply – the quotes were both of the same type (double
 quotes for table and column names, single quotes for string literals), it
 is just my email client that formatted them weirdly, sorry!



 I have discovered what I believe is an important piece of the puzzle to my
 problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has
 a “Limit Rows” feature. When I try and count the number of rows in a single
 table:



 *select count(*) from “mytable1”*



 I get the expected number of results (eg 20,000). But when I join two
 tables together, it seems that the “Limit Rows” from Squirrel is somehow
 being applied before the join is performed, and if “Limit Rows” is set to
 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000
 rows in a table but only 100 of these will join with a second table) then I
 believe it tries to join the first 100 it finds and returns no results. In
 my experience of Oracle or MySQL, joins are done entirely on server side
 and then you just get back the number of rows you limited, rather than what
 appears to be happening which is the row limit is applied to the first
 table before the join is attempted with the second table. Is that how
 Phoenix works?



 I have also discovered that I get different results (with “Limit Rows”
 turned on) depending on which order I join the tables:



 *SELECT count(*) FROM “mytable1” hc*

 *INNER JOIN “mytable2” bs*

 *On hc.”myId” = bs.”myId”*



 Gives me a very different number of results than:



 *SELECT count(*) FROM “mytable2” bs*

 *INNER JOIN “mytable1” hc*

 *On hc.”myId” = bs.”myId”*





 Unfortunately I cannot test whether I get the same number of results with
 “Limit Rows” turned off because my query times out! So I am now looking at
 creating secondary indexes on the “myId” column in both tables to see if I
 am able to do this join quicker. Does a join like this use a lot of memory
 on server side? Is something likely to be running out of resources?



 Many thanks again for your time.



 Cheers,

 Matt





 *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
 *Sent:* 20 February 2015 14:40
 *To:* user@phoenix.apache.org
 *Subject:* RE: Inner Join not returning any results in Phoenix



 Hi Matthew,



 Is it working without the quotes “ / **  ?   (I see you are using 2
 types of quotes, weird)

 I guess that’s not needed, and probably causing troubles.   I don’t have
 to use quotes anyway.



 Alternatively check the types of data in those 2 tables (if the field
 types are not the same in both tables, the join will not work).



 Good luck,

   Constantin



 *From:* Matthew Johnson [mailto:matt.john...@algomi.com
 matt.john...@algomi.com]
 *Sent:* Friday, February 20, 2015 12:54 PM
 *To:* user@phoenix.apache.org
 *Subject:* Inner Join not returning any results in Phoenix



 Hi guys,



 I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one
 environment, created tables in HBase, and then created views (rather than
 tables) in Phoenix, and am able to query as expected (when I join my two
 tables I see results). I’ve just promoted to another environment, with the
 exact same setup, but my Inner Join returns no results!



 I run the following two individual queries:



 *SELECT * FROM mytable1 hc*

 *where hc.myId = 'XS0'*



 *SELECT * FROM mytable2 bs*

 *where bs.myId = 'XS0'*



 And both of these queries give results. But when I run:



 *SELECT * FROM mytable1 hc*

 *INNER JOIN “mytable2” bs*

 *On hc.myId = bs.”myId”*



 I get no results. I also get no results if I try:



 *SELECT * FROM mytable1 hc*

 *where hc.myId in (select distinct “myId” from “mytable2”)*



 I have checked in HBase shell and can see the “myId” value is as expected
 (XS0 in both tables). I am not sure if there are any logs that I can look
 at to get some insight?



 Many thanks in advance for any suggestions!



 Cheers,

 Matt



 *PS* Something that may or may not be of note: In the environments I am
 using:

 WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*

 FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*



RE: Inner Join not returning any results in Phoenix

2015-02-20 Thread Matthew Johnson
Hi Abe,



Glad to hear I’m not alone! Will try and figure out exactly what’s
happening and maybe raise a Jira :-)





@Constantin – I have tried with and without the “Limit Rows” – but without
it, and without any indexes, the query runs for a while (about 10 minutes?)
and then throws an error:



*Error: Encountered exception in sub plan [0] execution.*



Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is
that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows”
does not affect any aggregate functions like ‘count’, because the actual
number of result rows for a count is just 1 row (the count itself). But in
HBase it seems that the Row Limit, as Abe mentioned, is applied to one of
the table BEFORE it does the join, so it affects the results of the ‘count’
function.



When I try to create my indexes so I am able to do the join without Row
Limit, I get the following error:



ERROR 1029 (42Y88): Mutable secondary indexes must have the
hbase.regionserver.wal.codec property set to
org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the
hbase-sites.xml of every region server



Which I am happy to do (will have to wait until outside of business hours
though), but I am curious, will this have any impact on the rest of my
cluster and could it have any unforeseen consequences?



Thanks again for the input!



Cheers,

Matt





*From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
*Sent:* 20 February 2015 15:48
*To:* user@phoenix.apache.org
*Subject:* RE: Inner Join not returning any results in Phoenix



Hello Matt,



http://codingclues.eu/2008/the-squirrel-100-rows-problem/



Can you please test again after unchecking “Contents- Limit rows” and “SQL
– Limit rows”?



[image: SQL tab]



P.S. Off-topic – it’s as funny as this “problem” (not possible to send an
email for more than 500 miles away J
http://www.ibiblio.org/harris/500milemail.html )



Regards,

  Constantin



*From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com]
*Sent:* Friday, February 20, 2015 4:18 PM
*To:* user
*Subject:* Re: Inner Join not returning any results in Phoenix



Matt,



I have seen this same issue.  When passing a LIMIT to a query with joins
(most query tools do it implicitly), Phoenix seems to apply that to the
table on the right of the join I believe.  I hadn't had a chance to play
with it more and file a JIRA, but what you are describing is consistent
with what I have seen.



Abe



On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com
wrote:

Hi Constantin,



Many thanks for your reply – the quotes were both of the same type (double
quotes for table and column names, single quotes for string literals), it
is just my email client that formatted them weirdly, sorry!



I have discovered what I believe is an important piece of the puzzle to my
problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has
a “Limit Rows” feature. When I try and count the number of rows in a single
table:



*select count(*) from “mytable1”*



I get the expected number of results (eg 20,000). But when I join two
tables together, it seems that the “Limit Rows” from Squirrel is somehow
being applied before the join is performed, and if “Limit Rows” is set to
100 I get 100 results or less. If the inner join is quite sparse (eg 20,000
rows in a table but only 100 of these will join with a second table) then I
believe it tries to join the first 100 it finds and returns no results. In
my experience of Oracle or MySQL, joins are done entirely on server side
and then you just get back the number of rows you limited, rather than what
appears to be happening which is the row limit is applied to the first
table before the join is attempted with the second table. Is that how
Phoenix works?



I have also discovered that I get different results (with “Limit Rows”
turned on) depending on which order I join the tables:



*SELECT count(*) FROM “mytable1” hc*

*INNER JOIN “mytable2” bs*

*On hc.”myId” = bs.”myId”*



Gives me a very different number of results than:



*SELECT count(*) FROM “mytable2” bs*

*INNER JOIN “mytable1” hc*

*On hc.”myId” = bs.”myId”*





Unfortunately I cannot test whether I get the same number of results with
“Limit Rows” turned off because my query times out! So I am now looking at
creating secondary indexes on the “myId” column in both tables to see if I
am able to do this join quicker. Does a join like this use a lot of memory
on server side? Is something likely to be running out of resources?



Many thanks again for your time.



Cheers,

Matt





*From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
*Sent:* 20 February 2015 14:40
*To:* user@phoenix.apache.org
*Subject:* RE: Inner Join not returning any results in Phoenix



Hi Matthew,



Is it working without the quotes “ / **  ?   (I see you are using 2 types
of quotes, weird)

I guess that’s not needed, and probably causing troubles.   I

RE: Inner Join not returning any results in Phoenix

2015-02-20 Thread Matthew Johnson
Hi Constantin,



Many thanks for your reply – the quotes were both of the same type (double
quotes for table and column names, single quotes for string literals), it
is just my email client that formatted them weirdly, sorry!



I have discovered what I believe is an important piece of the puzzle to my
problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has
a “Limit Rows” feature. When I try and count the number of rows in a single
table:



*select count(*) from “mytable1”*



I get the expected number of results (eg 20,000). But when I join two
tables together, it seems that the “Limit Rows” from Squirrel is somehow
being applied before the join is performed, and if “Limit Rows” is set to
100 I get 100 results or less. If the inner join is quite sparse (eg 20,000
rows in a table but only 100 of these will join with a second table) then I
believe it tries to join the first 100 it finds and returns no results. In
my experience of Oracle or MySQL, joins are done entirely on server side
and then you just get back the number of rows you limited, rather than what
appears to be happening which is the row limit is applied to the first
table before the join is attempted with the second table. Is that how
Phoenix works?



I have also discovered that I get different results (with “Limit Rows”
turned on) depending on which order I join the tables:



*SELECT count(*) FROM “mytable1” hc*

*INNER JOIN “mytable2” bs*

*On hc.”myId” = bs.”myId”*



Gives me a very different number of results than:



*SELECT count(*) FROM “mytable2” bs*

*INNER JOIN “mytable1” hc*

*On hc.”myId” = bs.”myId”*





Unfortunately I cannot test whether I get the same number of results with
“Limit Rows” turned off because my query times out! So I am now looking at
creating secondary indexes on the “myId” column in both tables to see if I
am able to do this join quicker. Does a join like this use a lot of memory
on server side? Is something likely to be running out of resources?



Many thanks again for your time.



Cheers,

Matt





*From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
*Sent:* 20 February 2015 14:40
*To:* user@phoenix.apache.org
*Subject:* RE: Inner Join not returning any results in Phoenix



Hi Matthew,



Is it working without the quotes “ / **  ?   (I see you are using 2 types
of quotes, weird)

I guess that’s not needed, and probably causing troubles.   I don’t have to
use quotes anyway.



Alternatively check the types of data in those 2 tables (if the field types
are not the same in both tables, the join will not work).



Good luck,

  Constantin



*From:* Matthew Johnson [mailto:matt.john...@algomi.com
matt.john...@algomi.com]
*Sent:* Friday, February 20, 2015 12:54 PM
*To:* user@phoenix.apache.org
*Subject:* Inner Join not returning any results in Phoenix



Hi guys,



I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one
environment, created tables in HBase, and then created views (rather than
tables) in Phoenix, and am able to query as expected (when I join my two
tables I see results). I’ve just promoted to another environment, with the
exact same setup, but my Inner Join returns no results!



I run the following two individual queries:



*SELECT * FROM mytable1 hc*

*where hc.myId = 'XS0'*



*SELECT * FROM mytable2 bs*

*where bs.myId = 'XS0'*



And both of these queries give results. But when I run:



*SELECT * FROM mytable1 hc*

*INNER JOIN “mytable2” bs*

*On hc.myId = bs.”myId”*



I get no results. I also get no results if I try:



*SELECT * FROM mytable1 hc*

*where hc.myId in (select distinct “myId” from “mytable2”)*



I have checked in HBase shell and can see the “myId” value is as expected
(XS0 in both tables). I am not sure if there are any logs that I can look
at to get some insight?



Many thanks in advance for any suggestions!



Cheers,

Matt



*PS* Something that may or may not be of note: In the environments I am
using:

WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar*

FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*


Re: Inner Join not returning any results in Phoenix

2015-02-20 Thread Maryann Xue
Hi Matt,

The error you got with Limit Rows off might be related to insufficient
memory on region servers for one of your tables. Which is the larger table
between table1 and table2? You might want to try putting the larger table
as the first table in your join query and see if it works.

And I will quickly check if the LIMIT problem is a Phoenix bug and will
keep you posted.


Thanks,
Maryann


On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Abe,



 Glad to hear I’m not alone! Will try and figure out exactly what’s
 happening and maybe raise a Jira :-)





 @Constantin – I have tried with and without the “Limit Rows” – but without
 it, and without any indexes, the query runs for a while (about 10 minutes?)
 and then throws an error:



 *Error: Encountered exception in sub plan [0] execution.*



 Which I’m guessing is either HBase or Zookeeper timeout. The weird thing
 is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit
 Rows” does not affect any aggregate functions like ‘count’, because the
 actual number of result rows for a count is just 1 row (the count itself).
 But in HBase it seems that the Row Limit, as Abe mentioned, is applied to
 one of the table BEFORE it does the join, so it affects the results of the
 ‘count’ function.



 When I try to create my indexes so I am able to do the join without Row
 Limit, I get the following error:



 ERROR 1029 (42Y88): Mutable secondary indexes must have the
 hbase.regionserver.wal.codec property set to
 org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the
 hbase-sites.xml of every region server



 Which I am happy to do (will have to wait until outside of business hours
 though), but I am curious, will this have any impact on the rest of my
 cluster and could it have any unforeseen consequences?



 Thanks again for the input!



 Cheers,

 Matt





 *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
 *Sent:* 20 February 2015 15:48

 *To:* user@phoenix.apache.org
 *Subject:* RE: Inner Join not returning any results in Phoenix



 Hello Matt,



 http://codingclues.eu/2008/the-squirrel-100-rows-problem/



 Can you please test again after unchecking “Contents- Limit rows” and “SQL
 – Limit rows”?



 [image: SQL tab]



 P.S. Off-topic – it’s as funny as this “problem” (not possible to send an
 email for more than 500 miles away J
 http://www.ibiblio.org/harris/500milemail.html )



 Regards,

   Constantin



 *From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com]
 *Sent:* Friday, February 20, 2015 4:18 PM
 *To:* user
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Matt,



 I have seen this same issue.  When passing a LIMIT to a query with joins
 (most query tools do it implicitly), Phoenix seems to apply that to the
 table on the right of the join I believe.  I hadn't had a chance to play
 with it more and file a JIRA, but what you are describing is consistent
 with what I have seen.



 Abe



 On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Constantin,



 Many thanks for your reply – the quotes were both of the same type (double
 quotes for table and column names, single quotes for string literals), it
 is just my email client that formatted them weirdly, sorry!



 I have discovered what I believe is an important piece of the puzzle to my
 problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has
 a “Limit Rows” feature. When I try and count the number of rows in a single
 table:



 *select count(*) from “mytable1”*



 I get the expected number of results (eg 20,000). But when I join two
 tables together, it seems that the “Limit Rows” from Squirrel is somehow
 being applied before the join is performed, and if “Limit Rows” is set to
 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000
 rows in a table but only 100 of these will join with a second table) then I
 believe it tries to join the first 100 it finds and returns no results. In
 my experience of Oracle or MySQL, joins are done entirely on server side
 and then you just get back the number of rows you limited, rather than what
 appears to be happening which is the row limit is applied to the first
 table before the join is attempted with the second table. Is that how
 Phoenix works?



 I have also discovered that I get different results (with “Limit Rows”
 turned on) depending on which order I join the tables:



 *SELECT count(*) FROM “mytable1” hc*

 *INNER JOIN “mytable2” bs*

 *On hc.”myId” = bs.”myId”*



 Gives me a very different number of results than:



 *SELECT count(*) FROM “mytable2” bs*

 *INNER JOIN “mytable1” hc*

 *On hc.”myId” = bs.”myId”*





 Unfortunately I cannot test whether I get the same number of results with
 “Limit Rows” turned off because my query times out! So I am now looking at
 creating secondary indexes on the “myId” column in both tables to see if I

Re: Inner Join not returning any results in Phoenix

2015-02-20 Thread Maryann Xue
Which version of Phoenix are you using, Matt?

On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com
wrote:

 Hi Maryann,



 Unfortunately my two tables are roughly the same size (~500k), but I have
 tested a different join where one table is ~500k and the other is ~20k and
 putting the larger one first is definitely far more performant. I believe
 you are right about running out of memory, I can see this repeated a few
 times in the region server logs followed by what appears to be a restart or
 disconnect:



 *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
 machine (eg GC): pause of approximately 1083ms*



 I have been looking at the Phoenix page on joins (
 http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge
 joins for large tables by using a hint. I have tried this though with no
 success:



 *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc*

 *INNER JOIN “mytable2” bs*

 *On hc.”myId” = bs.”myId”*



 Am I putting the hint in the wrong place? Does it need to go next to the
 JOIN rather than the SELECT?



 I will try increasing the memory available to the Region Servers as well
 to see if that helps.



 Thanks!

 Matt





 *From:* Maryann Xue [mailto:maryann@gmail.com]
 *Sent:* 20 February 2015 16:28
 *To:* user@phoenix.apache.org

 *Subject:* Re: Inner Join not returning any results in Phoenix



 Hi Matt,



 The error you got with Limit Rows off might be related to insufficient
 memory on region servers for one of your tables. Which is the larger table
 between table1 and table2? You might want to try putting the larger table
 as the first table in your join query and see if it works.



 And I will quickly check if the LIMIT problem is a Phoenix bug and will
 keep you posted.





 Thanks,

 Maryann





 On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Abe,



 Glad to hear I’m not alone! Will try and figure out exactly what’s
 happening and maybe raise a Jira :-)





 @Constantin – I have tried with and without the “Limit Rows” – but without
 it, and without any indexes, the query runs for a while (about 10 minutes?)
 and then throws an error:



 *Error: Encountered exception in sub plan [0] execution.*



 Which I’m guessing is either HBase or Zookeeper timeout. The weird thing
 is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit
 Rows” does not affect any aggregate functions like ‘count’, because the
 actual number of result rows for a count is just 1 row (the count itself).
 But in HBase it seems that the Row Limit, as Abe mentioned, is applied to
 one of the table BEFORE it does the join, so it affects the results of the
 ‘count’ function.



 When I try to create my indexes so I am able to do the join without Row
 Limit, I get the following error:



 ERROR 1029 (42Y88): Mutable secondary indexes must have the
 hbase.regionserver.wal.codec property set to
 org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the
 hbase-sites.xml of every region server



 Which I am happy to do (will have to wait until outside of business hours
 though), but I am curious, will this have any impact on the rest of my
 cluster and could it have any unforeseen consequences?



 Thanks again for the input!



 Cheers,

 Matt





 *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
 *Sent:* 20 February 2015 15:48


 *To:* user@phoenix.apache.org
 *Subject:* RE: Inner Join not returning any results in Phoenix



 Hello Matt,



 http://codingclues.eu/2008/the-squirrel-100-rows-problem/



 Can you please test again after unchecking “Contents- Limit rows” and “SQL
 – Limit rows”?



 [image: SQL tab]



 P.S. Off-topic – it’s as funny as this “problem” (not possible to send an
 email for more than 500 miles away J
 http://www.ibiblio.org/harris/500milemail.html )



 Regards,

   Constantin



 *From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com]
 *Sent:* Friday, February 20, 2015 4:18 PM
 *To:* user
 *Subject:* Re: Inner Join not returning any results in Phoenix



 Matt,



 I have seen this same issue.  When passing a LIMIT to a query with joins
 (most query tools do it implicitly), Phoenix seems to apply that to the
 table on the right of the join I believe.  I hadn't had a chance to play
 with it more and file a JIRA, but what you are describing is consistent
 with what I have seen.



 Abe



 On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com
 wrote:

 Hi Constantin,



 Many thanks for your reply – the quotes were both of the same type (double
 quotes for table and column names, single quotes for string literals), it
 is just my email client that formatted them weirdly, sorry!



 I have discovered what I believe is an important piece of the puzzle to my
 problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has
 a “Limit Rows” feature. When I try and count the number of rows

RE: Inner Join not returning any results in Phoenix

2015-02-20 Thread Matthew Johnson
Hi Maryann,



That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then
that would explain why it’s not working. Is upgrading versions of Phoenix
as simple as removing the previous jar from HBase lib folder and dropping
the new Phoenix jar in (and restarting HBase)? Will all the existing
Phoenix tables and views be backwards-compatible and work with the new
version?



Cheers,

Matt





*From:* Maryann Xue [mailto:maryann@gmail.com]
*Sent:* 20 February 2015 17:46
*To:* user@phoenix.apache.org
*Subject:* Re: Inner Join not returning any results in Phoenix



Which version of Phoenix are you using, Matt? This feature is only
available in the latest releases of 4.3/3.3.



On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com
wrote:

Hi Maryann,



Unfortunately my two tables are roughly the same size (~500k), but I have
tested a different join where one table is ~500k and the other is ~20k and
putting the larger one first is definitely far more performant. I believe
you are right about running out of memory, I can see this repeated a few
times in the region server logs followed by what appears to be a restart or
disconnect:



*[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
machine (eg GC): pause of approximately 1083ms*



I have been looking at the Phoenix page on joins (
http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge
joins for large tables by using a hint. I have tried this though with no
success:



*SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc*

*INNER JOIN “mytable2” bs*

*On hc.”myId” = bs.”myId”*



Am I putting the hint in the wrong place? Does it need to go next to the
JOIN rather than the SELECT?



I will try increasing the memory available to the Region Servers as well to
see if that helps.



Thanks!

Matt





*From:* Maryann Xue [mailto:maryann@gmail.com]
*Sent:* 20 February 2015 16:28
*To:* user@phoenix.apache.org


*Subject:* Re: Inner Join not returning any results in Phoenix



Hi Matt,



The error you got with Limit Rows off might be related to insufficient
memory on region servers for one of your tables. Which is the larger table
between table1 and table2? You might want to try putting the larger table
as the first table in your join query and see if it works.



And I will quickly check if the LIMIT problem is a Phoenix bug and will
keep you posted.





Thanks,

Maryann





On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com
wrote:

Hi Abe,



Glad to hear I’m not alone! Will try and figure out exactly what’s
happening and maybe raise a Jira :-)





@Constantin – I have tried with and without the “Limit Rows” – but without
it, and without any indexes, the query runs for a while (about 10 minutes?)
and then throws an error:



*Error: Encountered exception in sub plan [0] execution.*



Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is
that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows”
does not affect any aggregate functions like ‘count’, because the actual
number of result rows for a count is just 1 row (the count itself). But in
HBase it seems that the Row Limit, as Abe mentioned, is applied to one of
the table BEFORE it does the join, so it affects the results of the ‘count’
function.



When I try to create my indexes so I am able to do the join without Row
Limit, I get the following error:



ERROR 1029 (42Y88): Mutable secondary indexes must have the
hbase.regionserver.wal.codec property set to
org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the
hbase-sites.xml of every region server



Which I am happy to do (will have to wait until outside of business hours
though), but I am curious, will this have any impact on the rest of my
cluster and could it have any unforeseen consequences?



Thanks again for the input!



Cheers,

Matt





*From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com]
*Sent:* 20 February 2015 15:48


*To:* user@phoenix.apache.org
*Subject:* RE: Inner Join not returning any results in Phoenix



Hello Matt,



http://codingclues.eu/2008/the-squirrel-100-rows-problem/



Can you please test again after unchecking “Contents- Limit rows” and “SQL
– Limit rows”?



[image: SQL tab]



P.S. Off-topic – it’s as funny as this “problem” (not possible to send an
email for more than 500 miles away J
http://www.ibiblio.org/harris/500milemail.html )



Regards,

  Constantin



*From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com]
*Sent:* Friday, February 20, 2015 4:18 PM
*To:* user
*Subject:* Re: Inner Join not returning any results in Phoenix



Matt,



I have seen this same issue.  When passing a LIMIT to a query with joins
(most query tools do it implicitly), Phoenix seems to apply that to the
table on the right of the join I believe.  I hadn't had a chance to play
with it more and file a JIRA, but what you are describing is consistent
with what