Re: Inner Join not returning any results in Phoenix
Thanks! On Tue, Feb 24, 2015 at 1:25 PM, Matthew Johnson matt.john...@algomi.com wrote: Legend! Thanks :-) *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 24 February 2015 18:15 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Thanks, James and Matt! I successfully repro the bug using setMaxRow(). Adding a LIMIT to the query itself won't cause such a bug. Please watch on https://issues.apache.org/jira/browse/PHOENIX-1680. I will post a patch right away. Maryann On Tue, Feb 24, 2015 at 12:57 PM, Matthew Johnson matt.john...@algomi.com wrote: Thanks James. Maryann, with the upserts I sent, I run the following code (using phoenix-client-4.2.2.jar): // Register JDBC Driver Class.*forName*( org.apache.phoenix.jdbc.PhoenixDriver).newInstance(); Connection conn = DriverManager. *getConnection*(jdbc:phoenix:box1,box2,box3, , ); // Create a Statement class to execute the SQL statement Statement stmtLimited = conn.createStatement(); stmtLimited.setMaxRows(10); // Execute the SQL statement and get the results in a *Resultset* ResultSet rsLimited = stmtLimited.executeQuery(select * from mytable1 m1 inner join mytable2 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'); *int* rsLimitedCount = 0; *while*(rsLimited.next()) { rsLimitedCount++; } // Create a Statement class to execute the SQL statement Statement stmtNoLimit = conn.createStatement(); // Execute the SQL statement and get the results in a *Resultset* ResultSet rsNoLimit = stmtNoLimit.executeQuery(select * from mytable1 m1 inner join mytable2 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'); *int* rsNoLimitedCount = 0; *while*(rsNoLimit.next()) { rsNoLimitedCount++; } System.*out*.println(Results found when LIMIT 10 was + rsLimitedCount + but with NO LIMIT was + rsNoLimitedCount); And the output is: Results found when LIMIT 10 was 0 but with NO LIMIT was 1 Hope that helps! Cheers, Matt *From:* James Taylor [mailto:jamestay...@apache.org] *Sent:* 24 February 2015 17:27 *To:* user *Subject:* Re: Inner Join not returning any results in Phoenix Mary, You can just call Statement.setMaxRows(10) to emulate what SQuirrel is doing. Thanks, James On Tue, Feb 24, 2015 at 9:09 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, I have two environments, one with just some dummy data, and one with real data in it. The dummy data one gave me results when I queried and the real data didn’t, but turns out that is just because the dummy data was much more coherent (because it was manually created for a specific test) so the RHS table always matches the LHS and therefore the join gives me results. I have attached a script that demonstrates my problem (create 2 Phoenix tables, insert some rows, and run a query using a join). When I run this on my cluster, I consistently see the issue I am having. If I set the LIMIT in Squirrel to 10, I get no results, but if I set it to 26, I get a result (since I have 26 rows in the RHS table and I am intentionally querying for the last one). Please give it a go and see if it reproduces for you – are you using Squirrel? If so, what version? Thanks! Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 24 February 2015 16:41 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, I just noticed these lines in your very first message: *PS* Something that may or may not be of note: In the environments I am using: WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar* What do you mean by WORKING and FAILING? I still cannot reproduce the bug here. Could you please post DDLs you used for related tables? Thanks, Maryann On Tue, Feb 24, 2015 at 11:27 AM, James Taylor jamestay...@apache.org wrote: FYI, SQuirrel sets the max rows to return as 100. You can change this in the tool, though. On Tuesday, February 24, 2015, Maryann Xue maryann@gmail.com wrote: Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY PageFilter 100* does look like
RE: Inner Join not returning any results in Phoenix
Hi Maryann, Thanks for that - I will schedule an update to the latest version of Phoenix then for later this week (and try out the merge-join hints). In the meantime, here are my explain plans: *JOIN WITH NO SQUIRREL LIMIT* *PLAN* *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* *PARALLEL INNER-JOIN TABLE 0* *CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2* *JOIN WITH SQUIRREL LIMIT 100* *PLAN* *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* *CLIENT 100 ROW LIMIT* *PARALLEL INNER-JOIN TABLE 0* *CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2* *SERVER FILTER BY PageFilter 100* *SERVER 100 ROW LIMIT* *CLIENT 100 ROW LIMIT* I’m not really sure how to read that, but it does seem to suggest that ‘mytable2’ is being limited to 100 – thoughts? Cheers, Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 23 February 2015 18:10 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, Yes, the upgrade is as easy as that. I believe things will work fine with existing tables. I tried with a similar query but didn't see that it was a Phoenix bug. So could you please try the following explain statement and see the execution plan: EXPLAIN *SELECT * FROM mytable1 hc* *INNER JOIN “mytable2” bs* *On hc.myId = bs.”myId”* Thanks, Maryann On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then that would explain why it’s not working. Is upgrading versions of Phoenix as simple as removing the previous jar from HBase lib folder and dropping the new Phoenix jar in (and restarting HBase)? Will all the existing Phoenix tables and views be backwards-compatible and work with the new version? Cheers, Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 20 February 2015 17:46 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Which version of Phoenix are you using, Matt? This feature is only available in the latest releases of 4.3/3.3. On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, Unfortunately my two tables are roughly the same size (~500k), but I have tested a different join where one table is ~500k and the other is ~20k and putting the larger one first is definitely far more performant. I believe you are right about running out of memory, I can see this repeated a few times in the region server logs followed by what appears to be a restart or disconnect: *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1083ms* I have been looking at the Phoenix page on joins ( http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge joins for large tables by using a hint. I have tried this though with no success: *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc* *INNER JOIN “mytable2” bs* *On hc.”myId” = bs.”myId”* Am I putting the hint in the wrong place? Does it need to go next to the JOIN rather than the SELECT? I will try increasing the memory available to the Region Servers as well to see if that helps. Thanks! Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 20 February 2015 16:28 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, The error you got with Limit Rows off might be related to insufficient memory on region servers for one of your tables. Which is the larger table between table1 and table2? You might want to try putting the larger table as the first table in your join query and see if it works. And I will quickly check if the LIMIT problem is a Phoenix bug and will keep you posted. Thanks, Maryann On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Abe, Glad to hear I’m not alone! Will try and figure out exactly what’s happening and maybe raise a Jira :-) @Constantin – I have tried with and without the “Limit Rows” – but without it, and without any indexes, the query runs for a while (about 10 minutes?) and then throws an error: *Error: Encountered exception in sub plan [0] execution.* Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows” does not affect any aggregate functions like ‘count’, because the actual number of result rows for a count is just 1 row (the count itself). But in HBase it seems that the Row Limit, as Abe mentioned, is applied to one of the table BEFORE it does the join, so it affects the results of the ‘count’ function. When I try to create my indexes so I am able to do the join without Row Limit, I get the following error: ERROR 1029
Re: Inner Join not returning any results in Phoenix
Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY PageFilter 100* does look like a but here. I will try again to reproduce it. Thanks, Maryann On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, Thanks for that - I will schedule an update to the latest version of Phoenix then for later this week (and try out the merge-join hints). In the meantime, here are my explain plans: *JOIN WITH NO SQUIRREL LIMIT* *PLAN* *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* *PARALLEL INNER-JOIN TABLE 0* *CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2* *JOIN WITH SQUIRREL LIMIT 100* *PLAN* *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* *CLIENT 100 ROW LIMIT* *PARALLEL INNER-JOIN TABLE 0* *CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2* *SERVER FILTER BY PageFilter 100* *SERVER 100 ROW LIMIT* *CLIENT 100 ROW LIMIT* I’m not really sure how to read that, but it does seem to suggest that ‘mytable2’ is being limited to 100 – thoughts? Cheers, Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 23 February 2015 18:10 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, Yes, the upgrade is as easy as that. I believe things will work fine with existing tables. I tried with a similar query but didn't see that it was a Phoenix bug. So could you please try the following explain statement and see the execution plan: EXPLAIN *SELECT * FROM mytable1 hc* *INNER JOIN “mytable2” bs* *On hc.myId = bs.”myId”* Thanks, Maryann On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then that would explain why it’s not working. Is upgrading versions of Phoenix as simple as removing the previous jar from HBase lib folder and dropping the new Phoenix jar in (and restarting HBase)? Will all the existing Phoenix tables and views be backwards-compatible and work with the new version? Cheers, Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 20 February 2015 17:46 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Which version of Phoenix are you using, Matt? This feature is only available in the latest releases of 4.3/3.3. On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, Unfortunately my two tables are roughly the same size (~500k), but I have tested a different join where one table is ~500k and the other is ~20k and putting the larger one first is definitely far more performant. I believe you are right about running out of memory, I can see this repeated a few times in the region server logs followed by what appears to be a restart or disconnect: *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1083ms* I have been looking at the Phoenix page on joins ( http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge joins for large tables by using a hint. I have tried this though with no success: *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc* *INNER JOIN “mytable2” bs* *On hc.”myId” = bs.”myId”* Am I putting the hint in the wrong place? Does it need to go next to the JOIN rather than the SELECT? I will try increasing the memory available to the Region Servers as well to see if that helps. Thanks! Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 20 February 2015 16:28 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, The error you got with Limit Rows off might be related to insufficient memory on region servers for one of your tables. Which is the larger table between table1 and table2? You might want to try putting the larger table as the first table in your join query and see if it works. And I will quickly check if the LIMIT problem is a Phoenix bug and will keep you posted. Thanks, Maryann On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Abe, Glad to hear I’m not alone! Will try and figure out exactly what’s happening and maybe raise a Jira :-) @Constantin – I have tried with and without the “Limit Rows” – but without it, and without any indexes, the query runs for a while (about 10 minutes?) and then throws an error: *Error: Encountered exception in sub plan [0] execution.* Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows” does not affect any aggregate functions like ‘count’, because the actual number
Re: Inner Join not returning any results in Phoenix
FYI, SQuirrel sets the max rows to return as 100. You can change this in the tool, though. On Tuesday, February 24, 2015, Maryann Xue maryann@gmail.com wrote: Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY PageFilter 100* does look like a but here. I will try again to reproduce it. Thanks, Maryann On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com javascript:_e(%7B%7D,'cvml','matt.john...@algomi.com'); wrote: Hi Maryann, Thanks for that - I will schedule an update to the latest version of Phoenix then for later this week (and try out the merge-join hints). In the meantime, here are my explain plans: *JOIN WITH NO SQUIRREL LIMIT* *PLAN* *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* *PARALLEL INNER-JOIN TABLE 0* *CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2* *JOIN WITH SQUIRREL LIMIT 100* *PLAN* *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* *CLIENT 100 ROW LIMIT* *PARALLEL INNER-JOIN TABLE 0* *CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2* *SERVER FILTER BY PageFilter 100* *SERVER 100 ROW LIMIT* *CLIENT 100 ROW LIMIT* I’m not really sure how to read that, but it does seem to suggest that ‘mytable2’ is being limited to 100 – thoughts? Cheers, Matt *From:* Maryann Xue [mailto:maryann@gmail.com javascript:_e(%7B%7D,'cvml','maryann@gmail.com');] *Sent:* 23 February 2015 18:10 *To:* user@phoenix.apache.org javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org'); *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, Yes, the upgrade is as easy as that. I believe things will work fine with existing tables. I tried with a similar query but didn't see that it was a Phoenix bug. So could you please try the following explain statement and see the execution plan: EXPLAIN *SELECT * FROM mytable1 hc* *INNER JOIN “mytable2” bs* *On hc.myId = bs.”myId”* Thanks, Maryann On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson matt.john...@algomi.com javascript:_e(%7B%7D,'cvml','matt.john...@algomi.com'); wrote: Hi Maryann, That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then that would explain why it’s not working. Is upgrading versions of Phoenix as simple as removing the previous jar from HBase lib folder and dropping the new Phoenix jar in (and restarting HBase)? Will all the existing Phoenix tables and views be backwards-compatible and work with the new version? Cheers, Matt *From:* Maryann Xue [mailto:maryann@gmail.com javascript:_e(%7B%7D,'cvml','maryann@gmail.com');] *Sent:* 20 February 2015 17:46 *To:* user@phoenix.apache.org javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org'); *Subject:* Re: Inner Join not returning any results in Phoenix Which version of Phoenix are you using, Matt? This feature is only available in the latest releases of 4.3/3.3. On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com javascript:_e(%7B%7D,'cvml','matt.john...@algomi.com'); wrote: Hi Maryann, Unfortunately my two tables are roughly the same size (~500k), but I have tested a different join where one table is ~500k and the other is ~20k and putting the larger one first is definitely far more performant. I believe you are right about running out of memory, I can see this repeated a few times in the region server logs followed by what appears to be a restart or disconnect: *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1083ms* I have been looking at the Phoenix page on joins ( http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge joins for large tables by using a hint. I have tried this though with no success: *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc* *INNER JOIN “mytable2” bs* *On hc.”myId” = bs.”myId”* Am I putting the hint in the wrong place? Does it need to go next to the JOIN rather than the SELECT? I will try increasing the memory available to the Region Servers as well to see if that helps. Thanks! Matt *From:* Maryann Xue [mailto:maryann@gmail.com javascript:_e(%7B%7D,'cvml','maryann@gmail.com');] *Sent:* 20 February 2015 16:28 *To:* user@phoenix.apache.org javascript:_e(%7B%7D,'cvml','user@phoenix.apache.org'); *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, The error you got with Limit Rows off might be related to insufficient memory on region servers for one of your tables. Which is the larger table between table1 and table2? You might want to try putting the larger table as the first table in your join query and see if it works. And I will quickly check if the LIMIT problem is a Phoenix bug and will keep you posted. Thanks, Maryann On Fri, Feb 20
RE: Inner Join not returning any results in Phoenix
Thanks James. Maryann, with the upserts I sent, I run the following code (using phoenix-client-4.2.2.jar): // Register JDBC Driver Class.*forName*( org.apache.phoenix.jdbc.PhoenixDriver).newInstance(); Connection conn = DriverManager.*getConnection* (jdbc:phoenix:box1,box2,box3, , ); // Create a Statement class to execute the SQL statement Statement stmtLimited = conn.createStatement(); stmtLimited.setMaxRows(10); // Execute the SQL statement and get the results in a *Resultset* ResultSet rsLimited = stmtLimited.executeQuery(select * from mytable1 m1 inner join mytable2 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'); *int* rsLimitedCount = 0; *while*(rsLimited.next()) { rsLimitedCount++; } // Create a Statement class to execute the SQL statement Statement stmtNoLimit = conn.createStatement(); // Execute the SQL statement and get the results in a *Resultset* ResultSet rsNoLimit = stmtNoLimit.executeQuery(select * from mytable1 m1 inner join mytable2 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'); *int* rsNoLimitedCount = 0; *while*(rsNoLimit.next()) { rsNoLimitedCount++; } System.*out*.println(Results found when LIMIT 10 was + rsLimitedCount + but with NO LIMIT was + rsNoLimitedCount); And the output is: Results found when LIMIT 10 was 0 but with NO LIMIT was 1 Hope that helps! Cheers, Matt *From:* James Taylor [mailto:jamestay...@apache.org] *Sent:* 24 February 2015 17:27 *To:* user *Subject:* Re: Inner Join not returning any results in Phoenix Mary, You can just call Statement.setMaxRows(10) to emulate what SQuirrel is doing. Thanks, James On Tue, Feb 24, 2015 at 9:09 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, I have two environments, one with just some dummy data, and one with real data in it. The dummy data one gave me results when I queried and the real data didn’t, but turns out that is just because the dummy data was much more coherent (because it was manually created for a specific test) so the RHS table always matches the LHS and therefore the join gives me results. I have attached a script that demonstrates my problem (create 2 Phoenix tables, insert some rows, and run a query using a join). When I run this on my cluster, I consistently see the issue I am having. If I set the LIMIT in Squirrel to 10, I get no results, but if I set it to 26, I get a result (since I have 26 rows in the RHS table and I am intentionally querying for the last one). Please give it a go and see if it reproduces for you – are you using Squirrel? If so, what version? Thanks! Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 24 February 2015 16:41 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, I just noticed these lines in your very first message: *PS* Something that may or may not be of note: In the environments I am using: WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar* What do you mean by WORKING and FAILING? I still cannot reproduce the bug here. Could you please post DDLs you used for related tables? Thanks, Maryann On Tue, Feb 24, 2015 at 11:27 AM, James Taylor jamestay...@apache.org wrote: FYI, SQuirrel sets the max rows to return as 100. You can change this in the tool, though. On Tuesday, February 24, 2015, Maryann Xue maryann@gmail.com wrote: Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY PageFilter 100* does look like a but here. I will try again to reproduce it. Thanks, Maryann On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, Thanks for that - I will schedule an update to the latest version of Phoenix then for later this week (and try out the merge-join hints). In the meantime, here are my explain plans: *JOIN WITH NO SQUIRREL LIMIT* *PLAN* *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* *PARALLEL INNER-JOIN TABLE 0* *CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2* *JOIN WITH SQUIRREL LIMIT 100* *PLAN* *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* *CLIENT 100 ROW LIMIT* *PARALLEL INNER-JOIN TABLE 0* *CLIENT 3-CHUNK SERIAL 1
Re: Inner Join not returning any results in Phoenix
Thanks, James and Matt! I successfully repro the bug using setMaxRow(). Adding a LIMIT to the query itself won't cause such a bug. Please watch on https://issues.apache.org/jira/browse/PHOENIX-1680. I will post a patch right away. Maryann On Tue, Feb 24, 2015 at 12:57 PM, Matthew Johnson matt.john...@algomi.com wrote: Thanks James. Maryann, with the upserts I sent, I run the following code (using phoenix-client-4.2.2.jar): // Register JDBC Driver Class.*forName*( org.apache.phoenix.jdbc.PhoenixDriver).newInstance(); Connection conn = DriverManager. *getConnection*(jdbc:phoenix:box1,box2,box3, , ); // Create a Statement class to execute the SQL statement Statement stmtLimited = conn.createStatement(); stmtLimited.setMaxRows(10); // Execute the SQL statement and get the results in a *Resultset* ResultSet rsLimited = stmtLimited.executeQuery(select * from mytable1 m1 inner join mytable2 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'); *int* rsLimitedCount = 0; *while*(rsLimited.next()) { rsLimitedCount++; } // Create a Statement class to execute the SQL statement Statement stmtNoLimit = conn.createStatement(); // Execute the SQL statement and get the results in a *Resultset* ResultSet rsNoLimit = stmtNoLimit.executeQuery(select * from mytable1 m1 inner join mytable2 m2 on m1.firstletter = m2.firstletter where m1.firstletter = 'Z'); *int* rsNoLimitedCount = 0; *while*(rsNoLimit.next()) { rsNoLimitedCount++; } System.*out*.println(Results found when LIMIT 10 was + rsLimitedCount + but with NO LIMIT was + rsNoLimitedCount); And the output is: Results found when LIMIT 10 was 0 but with NO LIMIT was 1 Hope that helps! Cheers, Matt *From:* James Taylor [mailto:jamestay...@apache.org] *Sent:* 24 February 2015 17:27 *To:* user *Subject:* Re: Inner Join not returning any results in Phoenix Mary, You can just call Statement.setMaxRows(10) to emulate what SQuirrel is doing. Thanks, James On Tue, Feb 24, 2015 at 9:09 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, I have two environments, one with just some dummy data, and one with real data in it. The dummy data one gave me results when I queried and the real data didn’t, but turns out that is just because the dummy data was much more coherent (because it was manually created for a specific test) so the RHS table always matches the LHS and therefore the join gives me results. I have attached a script that demonstrates my problem (create 2 Phoenix tables, insert some rows, and run a query using a join). When I run this on my cluster, I consistently see the issue I am having. If I set the LIMIT in Squirrel to 10, I get no results, but if I set it to 26, I get a result (since I have 26 rows in the RHS table and I am intentionally querying for the last one). Please give it a go and see if it reproduces for you – are you using Squirrel? If so, what version? Thanks! Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 24 February 2015 16:41 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, I just noticed these lines in your very first message: *PS* Something that may or may not be of note: In the environments I am using: WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar* What do you mean by WORKING and FAILING? I still cannot reproduce the bug here. Could you please post DDLs you used for related tables? Thanks, Maryann On Tue, Feb 24, 2015 at 11:27 AM, James Taylor jamestay...@apache.org wrote: FYI, SQuirrel sets the max rows to return as 100. You can change this in the tool, though. On Tuesday, February 24, 2015, Maryann Xue maryann@gmail.com wrote: Thanks a lot, Matt, for the reply! Very helpful. *SERVER FILTER BY PageFilter 100* does look like a but here. I will try again to reproduce it. Thanks, Maryann On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, Thanks for that - I will schedule an update to the latest version of Phoenix then for later this week (and try out the merge-join hints
Inner Join not returning any results in Phoenix
Hi guys, I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one environment, created tables in HBase, and then created views (rather than tables) in Phoenix, and am able to query as expected (when I join my two tables I see results). I’ve just promoted to another environment, with the exact same setup, but my Inner Join returns no results! I run the following two individual queries: *SELECT * FROM mytable1 hc* *where hc.myId = 'XS0'* *SELECT * FROM mytable2 bs* *where bs.myId = 'XS0'* And both of these queries give results. But when I run: *SELECT * FROM mytable1 hc* *INNER JOIN “mytable2” bs* *On hc.myId = bs.”myId”* I get no results. I also get no results if I try: *SELECT * FROM mytable1 hc* *where hc.myId in (select distinct “myId” from “mytable2”)* I have checked in HBase shell and can see the “myId” value is as expected (XS0 in both tables). I am not sure if there are any logs that I can look at to get some insight? Many thanks in advance for any suggestions! Cheers, Matt *PS* Something that may or may not be of note: In the environments I am using: WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*
RE: Inner Join not returning any results in Phoenix
Hi Matthew, Is it working without the quotes “ / ? (I see you are using 2 types of quotes, weird) I guess that’s not needed, and probably causing troubles. I don’t have to use quotes anyway. Alternatively check the types of data in those 2 tables (if the field types are not the same in both tables, the join will not work). Good luck, Constantin From: Matthew Johnson [mailto:matt.john...@algomi.com] Sent: Friday, February 20, 2015 12:54 PM To: user@phoenix.apache.org Subject: Inner Join not returning any results in Phoenix Hi guys, I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one environment, created tables in HBase, and then created views (rather than tables) in Phoenix, and am able to query as expected (when I join my two tables I see results). I’ve just promoted to another environment, with the exact same setup, but my Inner Join returns no results! I run the following two individual queries: SELECT * FROM mytable1 hc where hc.myId = 'XS0' SELECT * FROM mytable2 bs where bs.myId = 'XS0' And both of these queries give results. But when I run: SELECT * FROM mytable1 hc INNER JOIN “mytable2” bs On hc.myId = bs.”myId” I get no results. I also get no results if I try: SELECT * FROM mytable1 hc where hc.myId in (select distinct “myId” from “mytable2”) I have checked in HBase shell and can see the “myId” value is as expected (XS0 in both tables). I am not sure if there are any logs that I can look at to get some insight? Many thanks in advance for any suggestions! Cheers, Matt PS Something that may or may not be of note: In the environments I am using: WORKING: hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar FAILING: hbase-0.98.9-hadoop2 / phoenix-4.2.2-server.jar
RE: Inner Join not returning any results in Phoenix
Hello Matt, http://codingclues.eu/2008/the-squirrel-100-rows-problem/ Can you please test again after unchecking “Contents- Limit rows” and “SQL – Limit rows”? [cid:image001.png@01D04D2C.FB047EE0] P.S. Off-topic – it’s as funny as this “problem” (not possible to send an email for more than 500 miles away ☺ http://www.ibiblio.org/harris/500milemail.html ) Regards, Constantin From: Abe Weinograd [mailto:a...@flonet.com] Sent: Friday, February 20, 2015 4:18 PM To: user Subject: Re: Inner Join not returning any results in Phoenix Matt, I have seen this same issue. When passing a LIMIT to a query with joins (most query tools do it implicitly), Phoenix seems to apply that to the table on the right of the join I believe. I hadn't had a chance to play with it more and file a JIRA, but what you are describing is consistent with what I have seen. Abe On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.commailto:matt.john...@algomi.com wrote: Hi Constantin, Many thanks for your reply – the quotes were both of the same type (double quotes for table and column names, single quotes for string literals), it is just my email client that formatted them weirdly, sorry! I have discovered what I believe is an important piece of the puzzle to my problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has a “Limit Rows” feature. When I try and count the number of rows in a single table: select count(*) from “mytable1” I get the expected number of results (eg 20,000). But when I join two tables together, it seems that the “Limit Rows” from Squirrel is somehow being applied before the join is performed, and if “Limit Rows” is set to 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000 rows in a table but only 100 of these will join with a second table) then I believe it tries to join the first 100 it finds and returns no results. In my experience of Oracle or MySQL, joins are done entirely on server side and then you just get back the number of rows you limited, rather than what appears to be happening which is the row limit is applied to the first table before the join is attempted with the second table. Is that how Phoenix works? I have also discovered that I get different results (with “Limit Rows” turned on) depending on which order I join the tables: SELECT count(*) FROM “mytable1” hc INNER JOIN “mytable2” bs On hc.”myId” = bs.”myId” Gives me a very different number of results than: SELECT count(*) FROM “mytable2” bs INNER JOIN “mytable1” hc On hc.”myId” = bs.”myId” Unfortunately I cannot test whether I get the same number of results with “Limit Rows” turned off because my query times out! So I am now looking at creating secondary indexes on the “myId” column in both tables to see if I am able to do this join quicker. Does a join like this use a lot of memory on server side? Is something likely to be running out of resources? Many thanks again for your time. Cheers, Matt From: Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.commailto:constantin.ciure...@gfk.com] Sent: 20 February 2015 14:40 To: user@phoenix.apache.orgmailto:user@phoenix.apache.org Subject: RE: Inner Join not returning any results in Phoenix Hi Matthew, Is it working without the quotes “ / ? (I see you are using 2 types of quotes, weird) I guess that’s not needed, and probably causing troubles. I don’t have to use quotes anyway. Alternatively check the types of data in those 2 tables (if the field types are not the same in both tables, the join will not work). Good luck, Constantin From: Matthew Johnson [mailto:matt.john...@algomi.com] Sent: Friday, February 20, 2015 12:54 PM To: user@phoenix.apache.orgmailto:user@phoenix.apache.org Subject: Inner Join not returning any results in Phoenix Hi guys, I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one environment, created tables in HBase, and then created views (rather than tables) in Phoenix, and am able to query as expected (when I join my two tables I see results). I’ve just promoted to another environment, with the exact same setup, but my Inner Join returns no results! I run the following two individual queries: SELECT * FROM mytable1 hc where hc.myId = 'XS0' SELECT * FROM mytable2 bs where bs.myId = 'XS0' And both of these queries give results. But when I run: SELECT * FROM mytable1 hc INNER JOIN “mytable2” bs On hc.myId = bs.”myId” I get no results. I also get no results if I try: SELECT * FROM mytable1 hc where hc.myId in (select distinct “myId” from “mytable2”) I have checked in HBase shell and can see the “myId” value is as expected (XS0 in both tables). I am not sure if there are any logs that I can look at to get some insight? Many thanks in advance for any suggestions! Cheers, Matt PS Something that may or may not be of note: In the environments I am using: WORKING: hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar FAILING
Re: Inner Join not returning any results in Phoenix
Matt, I have seen this same issue. When passing a LIMIT to a query with joins (most query tools do it implicitly), Phoenix seems to apply that to the table on the right of the join I believe. I hadn't had a chance to play with it more and file a JIRA, but what you are describing is consistent with what I have seen. Abe On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Constantin, Many thanks for your reply – the quotes were both of the same type (double quotes for table and column names, single quotes for string literals), it is just my email client that formatted them weirdly, sorry! I have discovered what I believe is an important piece of the puzzle to my problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has a “Limit Rows” feature. When I try and count the number of rows in a single table: *select count(*) from “mytable1”* I get the expected number of results (eg 20,000). But when I join two tables together, it seems that the “Limit Rows” from Squirrel is somehow being applied before the join is performed, and if “Limit Rows” is set to 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000 rows in a table but only 100 of these will join with a second table) then I believe it tries to join the first 100 it finds and returns no results. In my experience of Oracle or MySQL, joins are done entirely on server side and then you just get back the number of rows you limited, rather than what appears to be happening which is the row limit is applied to the first table before the join is attempted with the second table. Is that how Phoenix works? I have also discovered that I get different results (with “Limit Rows” turned on) depending on which order I join the tables: *SELECT count(*) FROM “mytable1” hc* *INNER JOIN “mytable2” bs* *On hc.”myId” = bs.”myId”* Gives me a very different number of results than: *SELECT count(*) FROM “mytable2” bs* *INNER JOIN “mytable1” hc* *On hc.”myId” = bs.”myId”* Unfortunately I cannot test whether I get the same number of results with “Limit Rows” turned off because my query times out! So I am now looking at creating secondary indexes on the “myId” column in both tables to see if I am able to do this join quicker. Does a join like this use a lot of memory on server side? Is something likely to be running out of resources? Many thanks again for your time. Cheers, Matt *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com] *Sent:* 20 February 2015 14:40 *To:* user@phoenix.apache.org *Subject:* RE: Inner Join not returning any results in Phoenix Hi Matthew, Is it working without the quotes “ / ** ? (I see you are using 2 types of quotes, weird) I guess that’s not needed, and probably causing troubles. I don’t have to use quotes anyway. Alternatively check the types of data in those 2 tables (if the field types are not the same in both tables, the join will not work). Good luck, Constantin *From:* Matthew Johnson [mailto:matt.john...@algomi.com matt.john...@algomi.com] *Sent:* Friday, February 20, 2015 12:54 PM *To:* user@phoenix.apache.org *Subject:* Inner Join not returning any results in Phoenix Hi guys, I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one environment, created tables in HBase, and then created views (rather than tables) in Phoenix, and am able to query as expected (when I join my two tables I see results). I’ve just promoted to another environment, with the exact same setup, but my Inner Join returns no results! I run the following two individual queries: *SELECT * FROM mytable1 hc* *where hc.myId = 'XS0'* *SELECT * FROM mytable2 bs* *where bs.myId = 'XS0'* And both of these queries give results. But when I run: *SELECT * FROM mytable1 hc* *INNER JOIN “mytable2” bs* *On hc.myId = bs.”myId”* I get no results. I also get no results if I try: *SELECT * FROM mytable1 hc* *where hc.myId in (select distinct “myId” from “mytable2”)* I have checked in HBase shell and can see the “myId” value is as expected (XS0 in both tables). I am not sure if there are any logs that I can look at to get some insight? Many thanks in advance for any suggestions! Cheers, Matt *PS* Something that may or may not be of note: In the environments I am using: WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*
RE: Inner Join not returning any results in Phoenix
Hi Abe, Glad to hear I’m not alone! Will try and figure out exactly what’s happening and maybe raise a Jira :-) @Constantin – I have tried with and without the “Limit Rows” – but without it, and without any indexes, the query runs for a while (about 10 minutes?) and then throws an error: *Error: Encountered exception in sub plan [0] execution.* Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows” does not affect any aggregate functions like ‘count’, because the actual number of result rows for a count is just 1 row (the count itself). But in HBase it seems that the Row Limit, as Abe mentioned, is applied to one of the table BEFORE it does the join, so it affects the results of the ‘count’ function. When I try to create my indexes so I am able to do the join without Row Limit, I get the following error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server Which I am happy to do (will have to wait until outside of business hours though), but I am curious, will this have any impact on the rest of my cluster and could it have any unforeseen consequences? Thanks again for the input! Cheers, Matt *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com] *Sent:* 20 February 2015 15:48 *To:* user@phoenix.apache.org *Subject:* RE: Inner Join not returning any results in Phoenix Hello Matt, http://codingclues.eu/2008/the-squirrel-100-rows-problem/ Can you please test again after unchecking “Contents- Limit rows” and “SQL – Limit rows”? [image: SQL tab] P.S. Off-topic – it’s as funny as this “problem” (not possible to send an email for more than 500 miles away J http://www.ibiblio.org/harris/500milemail.html ) Regards, Constantin *From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com] *Sent:* Friday, February 20, 2015 4:18 PM *To:* user *Subject:* Re: Inner Join not returning any results in Phoenix Matt, I have seen this same issue. When passing a LIMIT to a query with joins (most query tools do it implicitly), Phoenix seems to apply that to the table on the right of the join I believe. I hadn't had a chance to play with it more and file a JIRA, but what you are describing is consistent with what I have seen. Abe On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Constantin, Many thanks for your reply – the quotes were both of the same type (double quotes for table and column names, single quotes for string literals), it is just my email client that formatted them weirdly, sorry! I have discovered what I believe is an important piece of the puzzle to my problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has a “Limit Rows” feature. When I try and count the number of rows in a single table: *select count(*) from “mytable1”* I get the expected number of results (eg 20,000). But when I join two tables together, it seems that the “Limit Rows” from Squirrel is somehow being applied before the join is performed, and if “Limit Rows” is set to 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000 rows in a table but only 100 of these will join with a second table) then I believe it tries to join the first 100 it finds and returns no results. In my experience of Oracle or MySQL, joins are done entirely on server side and then you just get back the number of rows you limited, rather than what appears to be happening which is the row limit is applied to the first table before the join is attempted with the second table. Is that how Phoenix works? I have also discovered that I get different results (with “Limit Rows” turned on) depending on which order I join the tables: *SELECT count(*) FROM “mytable1” hc* *INNER JOIN “mytable2” bs* *On hc.”myId” = bs.”myId”* Gives me a very different number of results than: *SELECT count(*) FROM “mytable2” bs* *INNER JOIN “mytable1” hc* *On hc.”myId” = bs.”myId”* Unfortunately I cannot test whether I get the same number of results with “Limit Rows” turned off because my query times out! So I am now looking at creating secondary indexes on the “myId” column in both tables to see if I am able to do this join quicker. Does a join like this use a lot of memory on server side? Is something likely to be running out of resources? Many thanks again for your time. Cheers, Matt *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com] *Sent:* 20 February 2015 14:40 *To:* user@phoenix.apache.org *Subject:* RE: Inner Join not returning any results in Phoenix Hi Matthew, Is it working without the quotes “ / ** ? (I see you are using 2 types of quotes, weird) I guess that’s not needed, and probably causing troubles. I
RE: Inner Join not returning any results in Phoenix
Hi Constantin, Many thanks for your reply – the quotes were both of the same type (double quotes for table and column names, single quotes for string literals), it is just my email client that formatted them weirdly, sorry! I have discovered what I believe is an important piece of the puzzle to my problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has a “Limit Rows” feature. When I try and count the number of rows in a single table: *select count(*) from “mytable1”* I get the expected number of results (eg 20,000). But when I join two tables together, it seems that the “Limit Rows” from Squirrel is somehow being applied before the join is performed, and if “Limit Rows” is set to 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000 rows in a table but only 100 of these will join with a second table) then I believe it tries to join the first 100 it finds and returns no results. In my experience of Oracle or MySQL, joins are done entirely on server side and then you just get back the number of rows you limited, rather than what appears to be happening which is the row limit is applied to the first table before the join is attempted with the second table. Is that how Phoenix works? I have also discovered that I get different results (with “Limit Rows” turned on) depending on which order I join the tables: *SELECT count(*) FROM “mytable1” hc* *INNER JOIN “mytable2” bs* *On hc.”myId” = bs.”myId”* Gives me a very different number of results than: *SELECT count(*) FROM “mytable2” bs* *INNER JOIN “mytable1” hc* *On hc.”myId” = bs.”myId”* Unfortunately I cannot test whether I get the same number of results with “Limit Rows” turned off because my query times out! So I am now looking at creating secondary indexes on the “myId” column in both tables to see if I am able to do this join quicker. Does a join like this use a lot of memory on server side? Is something likely to be running out of resources? Many thanks again for your time. Cheers, Matt *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com] *Sent:* 20 February 2015 14:40 *To:* user@phoenix.apache.org *Subject:* RE: Inner Join not returning any results in Phoenix Hi Matthew, Is it working without the quotes “ / ** ? (I see you are using 2 types of quotes, weird) I guess that’s not needed, and probably causing troubles. I don’t have to use quotes anyway. Alternatively check the types of data in those 2 tables (if the field types are not the same in both tables, the join will not work). Good luck, Constantin *From:* Matthew Johnson [mailto:matt.john...@algomi.com matt.john...@algomi.com] *Sent:* Friday, February 20, 2015 12:54 PM *To:* user@phoenix.apache.org *Subject:* Inner Join not returning any results in Phoenix Hi guys, I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one environment, created tables in HBase, and then created views (rather than tables) in Phoenix, and am able to query as expected (when I join my two tables I see results). I’ve just promoted to another environment, with the exact same setup, but my Inner Join returns no results! I run the following two individual queries: *SELECT * FROM mytable1 hc* *where hc.myId = 'XS0'* *SELECT * FROM mytable2 bs* *where bs.myId = 'XS0'* And both of these queries give results. But when I run: *SELECT * FROM mytable1 hc* *INNER JOIN “mytable2” bs* *On hc.myId = bs.”myId”* I get no results. I also get no results if I try: *SELECT * FROM mytable1 hc* *where hc.myId in (select distinct “myId” from “mytable2”)* I have checked in HBase shell and can see the “myId” value is as expected (XS0 in both tables). I am not sure if there are any logs that I can look at to get some insight? Many thanks in advance for any suggestions! Cheers, Matt *PS* Something that may or may not be of note: In the environments I am using: WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*
Re: Inner Join not returning any results in Phoenix
Hi Matt, The error you got with Limit Rows off might be related to insufficient memory on region servers for one of your tables. Which is the larger table between table1 and table2? You might want to try putting the larger table as the first table in your join query and see if it works. And I will quickly check if the LIMIT problem is a Phoenix bug and will keep you posted. Thanks, Maryann On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Abe, Glad to hear I’m not alone! Will try and figure out exactly what’s happening and maybe raise a Jira :-) @Constantin – I have tried with and without the “Limit Rows” – but without it, and without any indexes, the query runs for a while (about 10 minutes?) and then throws an error: *Error: Encountered exception in sub plan [0] execution.* Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows” does not affect any aggregate functions like ‘count’, because the actual number of result rows for a count is just 1 row (the count itself). But in HBase it seems that the Row Limit, as Abe mentioned, is applied to one of the table BEFORE it does the join, so it affects the results of the ‘count’ function. When I try to create my indexes so I am able to do the join without Row Limit, I get the following error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server Which I am happy to do (will have to wait until outside of business hours though), but I am curious, will this have any impact on the rest of my cluster and could it have any unforeseen consequences? Thanks again for the input! Cheers, Matt *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com] *Sent:* 20 February 2015 15:48 *To:* user@phoenix.apache.org *Subject:* RE: Inner Join not returning any results in Phoenix Hello Matt, http://codingclues.eu/2008/the-squirrel-100-rows-problem/ Can you please test again after unchecking “Contents- Limit rows” and “SQL – Limit rows”? [image: SQL tab] P.S. Off-topic – it’s as funny as this “problem” (not possible to send an email for more than 500 miles away J http://www.ibiblio.org/harris/500milemail.html ) Regards, Constantin *From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com] *Sent:* Friday, February 20, 2015 4:18 PM *To:* user *Subject:* Re: Inner Join not returning any results in Phoenix Matt, I have seen this same issue. When passing a LIMIT to a query with joins (most query tools do it implicitly), Phoenix seems to apply that to the table on the right of the join I believe. I hadn't had a chance to play with it more and file a JIRA, but what you are describing is consistent with what I have seen. Abe On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Constantin, Many thanks for your reply – the quotes were both of the same type (double quotes for table and column names, single quotes for string literals), it is just my email client that formatted them weirdly, sorry! I have discovered what I believe is an important piece of the puzzle to my problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has a “Limit Rows” feature. When I try and count the number of rows in a single table: *select count(*) from “mytable1”* I get the expected number of results (eg 20,000). But when I join two tables together, it seems that the “Limit Rows” from Squirrel is somehow being applied before the join is performed, and if “Limit Rows” is set to 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000 rows in a table but only 100 of these will join with a second table) then I believe it tries to join the first 100 it finds and returns no results. In my experience of Oracle or MySQL, joins are done entirely on server side and then you just get back the number of rows you limited, rather than what appears to be happening which is the row limit is applied to the first table before the join is attempted with the second table. Is that how Phoenix works? I have also discovered that I get different results (with “Limit Rows” turned on) depending on which order I join the tables: *SELECT count(*) FROM “mytable1” hc* *INNER JOIN “mytable2” bs* *On hc.”myId” = bs.”myId”* Gives me a very different number of results than: *SELECT count(*) FROM “mytable2” bs* *INNER JOIN “mytable1” hc* *On hc.”myId” = bs.”myId”* Unfortunately I cannot test whether I get the same number of results with “Limit Rows” turned off because my query times out! So I am now looking at creating secondary indexes on the “myId” column in both tables to see if I
Re: Inner Join not returning any results in Phoenix
Which version of Phoenix are you using, Matt? On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, Unfortunately my two tables are roughly the same size (~500k), but I have tested a different join where one table is ~500k and the other is ~20k and putting the larger one first is definitely far more performant. I believe you are right about running out of memory, I can see this repeated a few times in the region server logs followed by what appears to be a restart or disconnect: *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1083ms* I have been looking at the Phoenix page on joins ( http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge joins for large tables by using a hint. I have tried this though with no success: *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc* *INNER JOIN “mytable2” bs* *On hc.”myId” = bs.”myId”* Am I putting the hint in the wrong place? Does it need to go next to the JOIN rather than the SELECT? I will try increasing the memory available to the Region Servers as well to see if that helps. Thanks! Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 20 February 2015 16:28 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, The error you got with Limit Rows off might be related to insufficient memory on region servers for one of your tables. Which is the larger table between table1 and table2? You might want to try putting the larger table as the first table in your join query and see if it works. And I will quickly check if the LIMIT problem is a Phoenix bug and will keep you posted. Thanks, Maryann On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Abe, Glad to hear I’m not alone! Will try and figure out exactly what’s happening and maybe raise a Jira :-) @Constantin – I have tried with and without the “Limit Rows” – but without it, and without any indexes, the query runs for a while (about 10 minutes?) and then throws an error: *Error: Encountered exception in sub plan [0] execution.* Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows” does not affect any aggregate functions like ‘count’, because the actual number of result rows for a count is just 1 row (the count itself). But in HBase it seems that the Row Limit, as Abe mentioned, is applied to one of the table BEFORE it does the join, so it affects the results of the ‘count’ function. When I try to create my indexes so I am able to do the join without Row Limit, I get the following error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server Which I am happy to do (will have to wait until outside of business hours though), but I am curious, will this have any impact on the rest of my cluster and could it have any unforeseen consequences? Thanks again for the input! Cheers, Matt *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com] *Sent:* 20 February 2015 15:48 *To:* user@phoenix.apache.org *Subject:* RE: Inner Join not returning any results in Phoenix Hello Matt, http://codingclues.eu/2008/the-squirrel-100-rows-problem/ Can you please test again after unchecking “Contents- Limit rows” and “SQL – Limit rows”? [image: SQL tab] P.S. Off-topic – it’s as funny as this “problem” (not possible to send an email for more than 500 miles away J http://www.ibiblio.org/harris/500milemail.html ) Regards, Constantin *From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com] *Sent:* Friday, February 20, 2015 4:18 PM *To:* user *Subject:* Re: Inner Join not returning any results in Phoenix Matt, I have seen this same issue. When passing a LIMIT to a query with joins (most query tools do it implicitly), Phoenix seems to apply that to the table on the right of the join I believe. I hadn't had a chance to play with it more and file a JIRA, but what you are describing is consistent with what I have seen. Abe On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Constantin, Many thanks for your reply – the quotes were both of the same type (double quotes for table and column names, single quotes for string literals), it is just my email client that formatted them weirdly, sorry! I have discovered what I believe is an important piece of the puzzle to my problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has a “Limit Rows” feature. When I try and count the number of rows
RE: Inner Join not returning any results in Phoenix
Hi Maryann, That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then that would explain why it’s not working. Is upgrading versions of Phoenix as simple as removing the previous jar from HBase lib folder and dropping the new Phoenix jar in (and restarting HBase)? Will all the existing Phoenix tables and views be backwards-compatible and work with the new version? Cheers, Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 20 February 2015 17:46 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Which version of Phoenix are you using, Matt? This feature is only available in the latest releases of 4.3/3.3. On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson matt.john...@algomi.com wrote: Hi Maryann, Unfortunately my two tables are roughly the same size (~500k), but I have tested a different join where one table is ~500k and the other is ~20k and putting the larger one first is definitely far more performant. I believe you are right about running out of memory, I can see this repeated a few times in the region server logs followed by what appears to be a restart or disconnect: *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1083ms* I have been looking at the Phoenix page on joins ( http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge joins for large tables by using a hint. I have tried this though with no success: *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc* *INNER JOIN “mytable2” bs* *On hc.”myId” = bs.”myId”* Am I putting the hint in the wrong place? Does it need to go next to the JOIN rather than the SELECT? I will try increasing the memory available to the Region Servers as well to see if that helps. Thanks! Matt *From:* Maryann Xue [mailto:maryann@gmail.com] *Sent:* 20 February 2015 16:28 *To:* user@phoenix.apache.org *Subject:* Re: Inner Join not returning any results in Phoenix Hi Matt, The error you got with Limit Rows off might be related to insufficient memory on region servers for one of your tables. Which is the larger table between table1 and table2? You might want to try putting the larger table as the first table in your join query and see if it works. And I will quickly check if the LIMIT problem is a Phoenix bug and will keep you posted. Thanks, Maryann On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson matt.john...@algomi.com wrote: Hi Abe, Glad to hear I’m not alone! Will try and figure out exactly what’s happening and maybe raise a Jira :-) @Constantin – I have tried with and without the “Limit Rows” – but without it, and without any indexes, the query runs for a while (about 10 minutes?) and then throws an error: *Error: Encountered exception in sub plan [0] execution.* Which I’m guessing is either HBase or Zookeeper timeout. The weird thing is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit Rows” does not affect any aggregate functions like ‘count’, because the actual number of result rows for a count is just 1 row (the count itself). But in HBase it seems that the Row Limit, as Abe mentioned, is applied to one of the table BEFORE it does the join, so it affects the results of the ‘count’ function. When I try to create my indexes so I am able to do the join without Row Limit, I get the following error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server Which I am happy to do (will have to wait until outside of business hours though), but I am curious, will this have any impact on the rest of my cluster and could it have any unforeseen consequences? Thanks again for the input! Cheers, Matt *From:* Ciureanu, Constantin (GfK) [mailto:constantin.ciure...@gfk.com] *Sent:* 20 February 2015 15:48 *To:* user@phoenix.apache.org *Subject:* RE: Inner Join not returning any results in Phoenix Hello Matt, http://codingclues.eu/2008/the-squirrel-100-rows-problem/ Can you please test again after unchecking “Contents- Limit rows” and “SQL – Limit rows”? [image: SQL tab] P.S. Off-topic – it’s as funny as this “problem” (not possible to send an email for more than 500 miles away J http://www.ibiblio.org/harris/500milemail.html ) Regards, Constantin *From:* Abe Weinograd [mailto:a...@flonet.com a...@flonet.com] *Sent:* Friday, February 20, 2015 4:18 PM *To:* user *Subject:* Re: Inner Join not returning any results in Phoenix Matt, I have seen this same issue. When passing a LIMIT to a query with joins (most query tools do it implicitly), Phoenix seems to apply that to the table on the right of the join I believe. I hadn't had a chance to play with it more and file a JIRA, but what you are describing is consistent with what