Matt, I have seen this same issue. When passing a LIMIT to a query with joins (most query tools do it implicitly), Phoenix seems to apply that to the table on the right of the join I believe. I hadn't had a chance to play with it more and file a JIRA, but what you are describing is consistent with what I have seen.
Abe On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson <[email protected]> wrote: > Hi Constantin, > > > > Many thanks for your reply – the quotes were both of the same type (double > quotes for table and column names, single quotes for string literals), it > is just my email client that formatted them weirdly, sorry! > > > > I have discovered what I believe is an important piece of the puzzle to my > problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has > a “Limit Rows” feature. When I try and count the number of rows in a single > table: > > > > *select count(*) from “mytable1”* > > > > I get the expected number of results (eg 20,000). But when I join two > tables together, it seems that the “Limit Rows” from Squirrel is somehow > being applied before the join is performed, and if “Limit Rows” is set to > 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000 > rows in a table but only 100 of these will join with a second table) then I > believe it tries to join the first 100 it finds and returns no results. In > my experience of Oracle or MySQL, joins are done entirely on server side > and then you just get back the number of rows you limited, rather than what > appears to be happening which is the row limit is applied to the first > table before the join is attempted with the second table. Is that how > Phoenix works? > > > > I have also discovered that I get different results (with “Limit Rows” > turned on) depending on which order I join the tables: > > > > *SELECT count(*) FROM “mytable1” hc* > > *INNER JOIN “mytable2” bs* > > *On hc.”myId” = bs.”myId”* > > > > Gives me a very different number of results than: > > > > *SELECT count(*) FROM “mytable2” bs* > > *INNER JOIN “mytable1” hc* > > *On hc.”myId” = bs.”myId”* > > > > > > Unfortunately I cannot test whether I get the same number of results with > “Limit Rows” turned off because my query times out! So I am now looking at > creating secondary indexes on the “myId” column in both tables to see if I > am able to do this join quicker. Does a join like this use a lot of memory > on server side? Is something likely to be running out of resources? > > > > Many thanks again for your time. > > > > Cheers, > > Matt > > > > > > *From:* Ciureanu, Constantin (GfK) [mailto:[email protected]] > *Sent:* 20 February 2015 14:40 > *To:* [email protected] > *Subject:* RE: Inner Join not returning any results in Phoenix > > > > Hi Matthew, > > > > Is it working without the quotes “ / *"* ? (I see you are using 2 > types of quotes, weird) > > I guess that’s not needed, and probably causing troubles. I don’t have > to use quotes anyway. > > > > Alternatively check the types of data in those 2 tables (if the field > types are not the same in both tables, the join will not work). > > > > Good luck, > > Constantin > > > > *From:* Matthew Johnson [mailto:[email protected] > <[email protected]>] > *Sent:* Friday, February 20, 2015 12:54 PM > *To:* [email protected] > *Subject:* Inner Join not returning any results in Phoenix > > > > Hi guys, > > > > I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one > environment, created tables in HBase, and then created views (rather than > tables) in Phoenix, and am able to query as expected (when I join my two > tables I see results). I’ve just promoted to another environment, with the > exact same setup, but my Inner Join returns no results! > > > > I run the following two individual queries: > > > > *SELECT * FROM "mytable1" hc* > > *where hc."myId" = 'XS0'* > > > > *SELECT * FROM "mytable2" bs* > > *where bs."myId" = 'XS0'* > > > > And both of these queries give results. But when I run: > > > > *SELECT * FROM "mytable1" hc* > > *INNER JOIN “mytable2” bs* > > *On hc."myId" = bs.”myId”* > > > > I get no results. I also get no results if I try: > > > > *SELECT * FROM "mytable1" hc* > > *where hc."myId" in (select distinct “myId” from “mytable2”)* > > > > I have checked in HBase shell and can see the “myId” value is as expected > (XS0 in both tables). I am not sure if there are any logs that I can look > at to get some insight? > > > > Many thanks in advance for any suggestions! > > > > Cheers, > > Matt > > > > *PS* Something that may or may not be of note: In the environments I am > using: > > WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* > > FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar* >
