FYI, SQuirrel sets the max rows to return as 100. You can change this in the tool, though.
On Tuesday, February 24, 2015, Maryann Xue <[email protected]> wrote: > Thanks a lot, Matt, for the reply! Very helpful. "*SERVER FILTER BY > PageFilter 100*" does look like a but here. I will try again to reproduce > it. > > > Thanks, > Maryann > > On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson <[email protected] > <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: > >> Hi Maryann, >> >> >> >> Thanks for that - I will schedule an update to the latest version of >> Phoenix then for later this week (and try out the merge-join hints). >> >> >> >> In the meantime, here are my explain plans: >> >> >> >> *JOIN WITH NO SQUIRREL LIMIT* >> >> >> >> *PLAN* >> >> *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* >> >> * PARALLEL INNER-JOIN TABLE 0* >> >> * CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2* >> >> >> >> *JOIN WITH SQUIRREL LIMIT 100* >> >> >> >> *PLAN* >> >> *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* >> >> *CLIENT 100 ROW LIMIT* >> >> * PARALLEL INNER-JOIN TABLE 0* >> >> * CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2* >> >> * SERVER FILTER BY PageFilter 100* >> >> * SERVER 100 ROW LIMIT* >> >> * CLIENT 100 ROW LIMIT* >> >> >> >> >> >> I’m not really sure how to read that, but it does seem to suggest that >> ‘mytable2’ is being limited to 100 – thoughts? >> >> >> >> Cheers, >> >> Matt >> >> >> >> *From:* Maryann Xue [mailto:[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>] >> *Sent:* 23 February 2015 18:10 >> >> *To:* [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');> >> *Subject:* Re: Inner Join not returning any results in Phoenix >> >> >> >> Hi Matt, >> >> >> >> Yes, the upgrade is as easy as that. I believe things will work fine with >> existing tables. >> >> I tried with a similar query but didn't see that it was a Phoenix bug. So >> could you please try the following explain statement and see the execution >> plan: >> >> >> >> EXPLAIN *SELECT * FROM "mytable1" hc* >> >> *INNER JOIN “mytable2” bs* >> >> *On hc."myId" = bs.”myId”* >> >> >> >> >> >> Thanks, >> >> Maryann >> >> >> >> >> >> On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >> >> Hi Maryann, >> >> >> >> That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then >> that would explain why it’s not working. Is upgrading versions of Phoenix >> as simple as removing the previous jar from HBase lib folder and dropping >> the new Phoenix jar in (and restarting HBase)? Will all the existing >> Phoenix tables and views be backwards-compatible and work with the new >> version? >> >> >> >> Cheers, >> >> Matt >> >> >> >> >> >> *From:* Maryann Xue [mailto:[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>] >> *Sent:* 20 February 2015 17:46 >> >> >> *To:* [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');> >> *Subject:* Re: Inner Join not returning any results in Phoenix >> >> >> >> Which version of Phoenix are you using, Matt? This feature is only >> available in the latest releases of 4.3/3.3. >> >> >> >> On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson < >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >> >> Hi Maryann, >> >> >> >> Unfortunately my two tables are roughly the same size (~500k), but I have >> tested a different join where one table is ~500k and the other is ~20k and >> putting the larger one first is definitely far more performant. I believe >> you are right about running out of memory, I can see this repeated a few >> times in the region server logs followed by what appears to be a restart or >> disconnect: >> >> >> >> *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host >> machine (eg GC): pause of approximately 1083ms* >> >> >> >> I have been looking at the Phoenix page on joins ( >> http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge >> joins for large tables by using a hint. I have tried this though with no >> success: >> >> >> >> *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc* >> >> *INNER JOIN “mytable2” bs* >> >> *On hc.”myId” = bs.”myId”* >> >> >> >> Am I putting the hint in the wrong place? Does it need to go next to the >> JOIN rather than the SELECT? >> >> >> >> I will try increasing the memory available to the Region Servers as well >> to see if that helps. >> >> >> >> Thanks! >> >> Matt >> >> >> >> >> >> *From:* Maryann Xue [mailto:[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>] >> *Sent:* 20 February 2015 16:28 >> *To:* [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');> >> >> >> *Subject:* Re: Inner Join not returning any results in Phoenix >> >> >> >> Hi Matt, >> >> >> >> The error you got with "Limit Rows" off might be related to insufficient >> memory on region servers for one of your tables. Which is the larger table >> between table1 and table2? You might want to try putting the larger table >> as the first table in your join query and see if it works. >> >> >> >> And I will quickly check if the LIMIT problem is a Phoenix bug and will >> keep you posted. >> >> >> >> >> >> Thanks, >> >> Maryann >> >> >> >> >> >> On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson < >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >> >> Hi Abe, >> >> >> >> Glad to hear I’m not alone! Will try and figure out exactly what’s >> happening and maybe raise a Jira :-) >> >> >> >> >> >> @Constantin – I have tried with and without the “Limit Rows” – but >> without it, and without any indexes, the query runs for a while (about 10 >> minutes?) and then throws an error: >> >> >> >> *Error: Encountered exception in sub plan [0] execution.* >> >> >> >> Which I’m guessing is either HBase or Zookeeper timeout. The weird thing >> is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit >> Rows” does not affect any aggregate functions like ‘count’, because the >> actual number of result rows for a count is just 1 row (the count itself). >> But in HBase it seems that the Row Limit, as Abe mentioned, is applied to >> one of the table BEFORE it does the join, so it affects the results of the >> ‘count’ function. >> >> >> >> When I try to create my indexes so I am able to do the join without Row >> Limit, I get the following error: >> >> >> >> ERROR 1029 (42Y88): Mutable secondary indexes must have the >> hbase.regionserver.wal.codec property set to >> org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the >> hbase-sites.xml of every region server >> >> >> >> Which I am happy to do (will have to wait until outside of business hours >> though), but I am curious, will this have any impact on the rest of my >> cluster and could it have any unforeseen consequences? >> >> >> >> Thanks again for the input! >> >> >> >> Cheers, >> >> Matt >> >> >> >> >> >> *From:* Ciureanu, Constantin (GfK) [mailto:[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>] >> *Sent:* 20 February 2015 15:48 >> >> >> *To:* [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');> >> *Subject:* RE: Inner Join not returning any results in Phoenix >> >> >> >> Hello Matt, >> >> >> >> http://codingclues.eu/2008/the-squirrel-100-rows-problem/ >> >> >> >> Can you please test again after unchecking “Contents- Limit rows” and >> “SQL – Limit rows”? >> >> >> >> [image: SQL tab] >> >> >> >> P.S. Off-topic – it’s as funny as this “problem” (not possible to send an >> email for more than 500 miles away J >> http://www.ibiblio.org/harris/500milemail.html ) >> >> >> >> Regards, >> >> Constantin >> >> >> >> *From:* Abe Weinograd [mailto:[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>] >> *Sent:* Friday, February 20, 2015 4:18 PM >> *To:* user >> *Subject:* Re: Inner Join not returning any results in Phoenix >> >> >> >> Matt, >> >> >> >> I have seen this same issue. When passing a LIMIT to a query with joins >> (most query tools do it implicitly), Phoenix seems to apply that to the >> table on the right of the join I believe. I hadn't had a chance to play >> with it more and file a JIRA, but what you are describing is consistent >> with what I have seen. >> >> >> >> Abe >> >> >> >> On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson < >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >> >> Hi Constantin, >> >> >> >> Many thanks for your reply – the quotes were both of the same type >> (double quotes for table and column names, single quotes for string >> literals), it is just my email client that formatted them weirdly, sorry! >> >> >> >> I have discovered what I believe is an important piece of the puzzle to >> my problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it >> has a “Limit Rows” feature. When I try and count the number of rows in a >> single table: >> >> >> >> *select count(*) from “mytable1”* >> >> >> >> I get the expected number of results (eg 20,000). But when I join two >> tables together, it seems that the “Limit Rows” from Squirrel is somehow >> being applied before the join is performed, and if “Limit Rows” is set to >> 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000 >> rows in a table but only 100 of these will join with a second table) then I >> believe it tries to join the first 100 it finds and returns no results. In >> my experience of Oracle or MySQL, joins are done entirely on server side >> and then you just get back the number of rows you limited, rather than what >> appears to be happening which is the row limit is applied to the first >> table before the join is attempted with the second table. Is that how >> Phoenix works? >> >> >> >> I have also discovered that I get different results (with “Limit Rows” >> turned on) depending on which order I join the tables: >> >> >> >> *SELECT count(*) FROM “mytable1” hc* >> >> *INNER JOIN “mytable2” bs* >> >> *On hc.”myId” = bs.”myId”* >> >> >> >> Gives me a very different number of results than: >> >> >> >> *SELECT count(*) FROM “mytable2” bs* >> >> *INNER JOIN “mytable1” hc* >> >> *On hc.”myId” = bs.”myId”* >> >> >> >> >> >> Unfortunately I cannot test whether I get the same number of results with >> “Limit Rows” turned off because my query times out! So I am now looking at >> creating secondary indexes on the “myId” column in both tables to see if I >> am able to do this join quicker. Does a join like this use a lot of memory >> on server side? Is something likely to be running out of resources? >> >> >> >> Many thanks again for your time. >> >> >> >> Cheers, >> >> Matt >> >> >> >> >> >> *From:* Ciureanu, Constantin (GfK) [mailto:[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>] >> *Sent:* 20 February 2015 14:40 >> *To:* [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');> >> *Subject:* RE: Inner Join not returning any results in Phoenix >> >> >> >> Hi Matthew, >> >> >> >> Is it working without the quotes “ / *"* ? (I see you are using 2 >> types of quotes, weird) >> >> I guess that’s not needed, and probably causing troubles. I don’t have >> to use quotes anyway. >> >> >> >> Alternatively check the types of data in those 2 tables (if the field >> types are not the same in both tables, the join will not work). >> >> >> >> Good luck, >> >> Constantin >> >> >> >> *From:* Matthew Johnson [mailto:[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>] >> *Sent:* Friday, February 20, 2015 12:54 PM >> *To:* [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');> >> *Subject:* Inner Join not returning any results in Phoenix >> >> >> >> Hi guys, >> >> >> >> I’m a little bit stuck with doing an Inner Join with Phoenix. I set up >> one environment, created tables in HBase, and then created views (rather >> than tables) in Phoenix, and am able to query as expected (when I join my >> two tables I see results). I’ve just promoted to another environment, with >> the exact same setup, but my Inner Join returns no results! >> >> >> >> I run the following two individual queries: >> >> >> >> *SELECT * FROM "mytable1" hc* >> >> *where hc."myId" = 'XS0'* >> >> >> >> *SELECT * FROM "mytable2" bs* >> >> *where bs."myId" = 'XS0'* >> >> >> >> And both of these queries give results. But when I run: >> >> >> >> *SELECT * FROM "mytable1" hc* >> >> *INNER JOIN “mytable2” bs* >> >> *On hc."myId" = bs.”myId”* >> >> >> >> I get no results. I also get no results if I try: >> >> >> >> *SELECT * FROM "mytable1" hc* >> >> *where hc."myId" in (select distinct “myId” from “mytable2”)* >> >> >> >> I have checked in HBase shell and can see the “myId” value is as expected >> (XS0 in both tables). I am not sure if there are any logs that I can look >> at to get some insight? >> >> >> >> Many thanks in advance for any suggestions! >> >> >> >> Cheers, >> >> Matt >> >> >> >> *PS* Something that may or may not be of note: In the environments I am >> using: >> >> WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* >> >> FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar* >> >> >> >> >> >> >> >> >> > >
