I did not see any issue when running with HBase 0.98.7 client bundled with Drill against HBase 1.1 servers.
I have just assigned DRILL-4199 [1] to myself to evaluate moving to HBase 1.1 in the next Drill release.

[1] https://issues.apache.org/jira/browse/DRILL-4199

On Mon, Mar 21, 2016 at 12:13 PM, Kevin Verhoeven wrote:
> Aditya,
>
> Looking into the bug, we read that the behavior will still occur if the
> hbase-client version does not include the fix (a 0.98 client talking to a
> 1.0 server). The hbase-client used by Drill under jars/3rdparty is
> hbase-client-0.98.7-hadoop2.jar, which does not include the fix. I replaced
> that jar with hbase-client-0.98.17-hadoop2.jar, but I receive a
> java.lang.NoClassDefFoundError. Are you able to test Drill with an
> updated hbase-client jar against CDH? Here is the error I received:
>
> 2016-03-21 18:58:36,874 [USER-rpc-event-queue] ERROR o.a.d.exec.server.rest.QueryWrapper - Query Failed
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Failure while loading table test6c in database hbase.
> Message: com.google.protobuf.ServiceException: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
>
>     at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) [drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) [drill-java-exec-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) [drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) [drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69) [drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400) [drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105) [drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264) [drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142) [drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298) [drill-rpc-1.4.0.jar:1.4.0]
>     at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269) [drill-rpc-1.4.0.jar:1.4.0]
>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>     at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
>
> Thanks,
>
> Kevin
>
> -----Original Message-----
> From: Kevin Verhoeven
> Sent: Monday, March 21, 2016 11:33 AM
> Cc: Kumiko Yada; Ki Kang
> Subject: RE: Drill query does not return all results from HBase
>
> Thanks, Aditya.
>
> I also see that the fix was backported in CDH 5.4.3:
> https://archive.cloudera.com/cdh5/cdh/5/hbase-1.0.0-cdh5.4.3.releasenotes.html.
> I tested Drill on CDH versions 5.4.2, 5.4.3, 5.4.7, and 5.5.2 and see the
> same behavior.
>
> Kevin
>
> From: Aditya
> Sent: Monday, March 21, 2016 10:26 AM
> To: Kevin Verhoeven
> Cc: Kumiko Yada; Ki Kang
> Subject: Re: Drill query does not return all results from HBase
>
> Since I suspected that it was a bug in HBase, I tried it with the original
> version you reported in the first post in this thread, i.e. CDH 5.4.3.
> If the fix was backported to 5.4.7, upgrading should resolve this issue.
>
> On Mon, Mar 21, 2016 at 10:18 AM, Kevin Verhoeven wrote:
> Aditya,
>
> Thank you for your help. What version of CDH are you running? I contacted
> Cloudera and they stated that the fix for HBASE-13262 is backported into CDH 5.4.7.
>
> Thanks,
>
> Kevin
>
> From: Aditya
> Sent: Sunday, March 20, 2016 10:45 PM
> To: Kumiko Yada
> Cc: Ki Kang; Kevin Verhoeven
> Subject: Re: Drill query does not return all results from HBase
>
> I finally managed to reproduce it with the CDH distribution (so far I had been
> testing with HBase 1.1 distributed with MapR, which does not have this bug).
> This is essentially an HBase bug, HBASE-13262 [1], which has been fixed in
> 1.0.1 and 1.1.0.
> Please update your HBase distribution.
>
> [1] https://issues.apache.org/jira/browse/HBASE-13262
>
> On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada wrote:
> Aditya,
>
> When we were exchanging emails, you mentioned to me that you
> discovered another issue in the case where the table is split into multiple
> regions and the first region returned to the client did not have any rows.
> I think this issue is related to the issue that I'm seeing. Have you
> opened a JIRA for this issue? Have you investigated or fixed it?
>
> Thanks
> Kumiko
>
> From: Aditya
> Sent: Thursday, March 17, 2016 3:02 PM
> To: Kumiko Yada
> Cc: Ki Kang; Kevin Verhoeven
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I have tried to reproduce this locally with the Apache HBase 1.x release but
> have failed so far.
> From my mail exchange with Kevin on another thread, it appears that the
> HBase scanner stops returning rows after a while, which seems odd.
> It is probably unique to the CDH distribution. I am planning to set up a
> single-node CDH cluster to see if I can reproduce it there.
>
> On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada wrote:
> Hello,
>
> I provided all the information that was requested; however, I haven't heard
> anything back since February 24.
>
> Is anyone taking a look at this? Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya
> Sent: Friday, February 19, 2016 12:48 PM
> To: user
> Cc: Ki Kang; Kevin Verhoeven
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it was most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada wrote:
>
> > Hello,
> >
> > Does anyone have any update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271? Is there any plan
> > to investigate or fix it?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada
> > Sent: Thursday, January 14, 2016 3:44 PM
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update; I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result? In the incorrect case, I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different from running a complete query, even with a small dataset, as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring. I will hopefully have time soon to get this fixed, but I'm
> > wrapping up some other things right now.
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse
> >
> > Jason,
> >
> > I'm sorry. My testing last night was incorrect. I'm not sure what I
> > did differently; however, your guess was correct. When I did the
> > one-column count, the row count was correct. Here are the additional
> > test results.
> >
> > My company has invested in using Drill, and it's very important
> > for us that this is fixed. Let me know if I can do anything to help get
> > this issue fixed. I really appreciate that you are looking into the
> > issue!
> >
> > HBase table (1 column family, 5 columns, 10000000 rows)
> >   COUNT(*) - row count is correct
> >   1-column count - row count is correct
> >
> > *HBase table (1 column family, 6 columns, 10000000 rows)*
> >   *COUNT(*) - row count is incorrect (returned 6724 rows)*
> >   1-column count - row count is correct
> >
> > *HBase table (2 column families, 6 columns in each column family, 10000000 rows)*
> >   *COUNT(*) - row count is incorrect (returned 3362 rows)*
> >   1-column count - row count is correct
> >
> > HBase table (2 column families, 2 columns in each column family, 10000000 rows)
> >   COUNT(*) - row count is correct
> >   1-column count - row count is correct
> >
> > *HBase table (2 column families, 4 columns in one column family and 2 columns in the other, 10000000 rows)*
> >   *COUNT(*) - row count is incorrect (returned 6723 rows)*
> >   1-column count - row count is correct
> >
> > HBase table (2 column families, 1 column in one column family and 3 columns in the other, 10000000 rows)
> >   COUNT(*) - row count is correct
> >   1-column count - row count is correct
> >
> > Thanks
> >
> > Kumiko
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:28 PM
> > *To:* 'Jason Altekruse'
> > *Cc:* Ki Kang; Kevin Verhoeven
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> > I also ran the query to display only 1 column with no limit to try to
> > force a full scan, but the result was the same: just 10000 rows
> > selected. With the same table (containing 6 columns), I ran the query
> > to display the row_key, and it displayed all records, 10,000,000 rows.
> >
> > -Kumiko
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:24 PM
> > *To:* 'Jason Altekruse'
> > *Cc:* Ki Kang; Kevin Verhoeven
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> > Jason,
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> > -Kumiko
> >
> > *From:* Jason Altekruse
> > *Sent:* Wednesday, January 13, 2016 6:39 PM
> > *To:* Kumiko Yada
> > *Cc:* Ki Kang; Kevin Verhoeven
> > *Subject:* Re: Drill query does not return all results from HBase
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> > something correctly.
> > Can you try to run a 'sum', or another aggregate query, on one of the
> > columns to see if a full scan of the data is operating correctly?
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help
> > reproduce the issue and/or test the fix.
> > > > Thanks > > Kumiko > > > > -----Original Message----- > > From: Jason Altekruse > > [mailto:[email protected]<mailto:[email protected]>] > > Sent: Wednesday, January 13, 2016 6:24 PM > > To: user <[email protected]<mailto:[email protected]>> > > > > Cc: Aditya Kishore > > <[email protected]<mailto:[email protected]>>; Kevin > > Verhoeven < > > [email protected]<mailto:[email protected]>> > > Subject: Re: Drill query does not return all results from HBase > > > > Thanks for filing the issue. I haven't worked much with HBase, but > > this is a critical wrong results issues, so I will be taking a look at > > this soon if no one else raises their hand. > > > > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada > > <[email protected]<mailto:[email protected]>> > > wrote: > > > > > I opened the bug on this. The drill is returning the correct rows > > > when the hbase contains 5 or less columns, but not 6 or more columns. > > > > > > https://issues.apache.org/jira/browse/DRILL-4271 > > > > > > Thanks > > > Kumiko > > > > > > -----Original Message----- > > > From: Kumiko Yada > > > [mailto:[email protected]<mailto:[email protected]>] > > > Sent: Wednesday, January 13, 2016 4:52 PM > > > To: [email protected]<mailto:[email protected]> > > > Cc: Aditya Kishore > > > <[email protected]<mailto:[email protected]>>; Kevin > > > Verhoeven < > > > [email protected]<mailto:[email protected]>> > > > Subject: RE: Drill query does not return all results from HBase > > > > > > We are using the HBase 1.0.0. & CDH 5.4. I found out the correct > > > row count returned when the Hbase table contains only 1 column > > > family, 1 column, but the incorrect row count is returned for the > > > Hbase table contains 1 column family, 6 columns. > > > > > > This looks like the Drill issue. Has anyone found any workaround? 
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user
> > > Cc: Aditya Kishore
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember right, hence I
> > > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > > now and don't see the issue. Also, I am on a MapR setup, which might
> > > not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish wrote:
> > > >
> > > > > I hit a very similar issue recently. Via the HBase shell, I was able
> > > > > to fetch all records, whereas I was only able to see a small
> > > > > subset of records when queried from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could reproduce the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it might
> > > > > not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse wrote:
> > > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada wrote:
> > > > > >
> > > > > > > I'm having the same issue. Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the HBase shell, we see over
> > > > > > > 100,000 rows, but the same Drill query returns only a fraction
> > > > > > > of the expected results.
> > > > > > > We have also run a count against the table, and Drill returns
> > > > > > > the same 1,030 number, which is far less than expected. What
> > > > > > > could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers, and the table
> > > > > > > has about 1.3 billion rows.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin
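
For the `java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge` that Kevin hit after swapping in hbase-client-0.98.17-hadoop2.jar, a reasonable first check is whether that class is visible on the classpath at all. The sketch below is a hypothetical stand-alone helper (`ClasspathProbe` is not part of Drill or HBase). It assumes, without confirming it here, that the class normally ships in the Yammer Metrics 2.x `metrics-core` jar; that dependency should be verified against the hbase-client POM for the version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stand-alone helper (not part of Drill or HBase): reports whether
 * each named class can be loaded from the classpath this JVM was started with.
 */
public class ClasspathProbe {

    /** Returns true if the class can be located and loaded by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize = false: load the class but do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is absent.
            // LinkageError (e.g. NoClassDefFoundError): the class was found, but a
            // class it depends on, such as its superclass, could not be loaded.
            return false;
        }
    }

    /** Probes each class name, preserving the order in which names were given. */
    static Map<String, Boolean> probe(String... classNames) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String name : classNames) {
            results.put(name, isLoadable(name));
        }
        return results;
    }

    public static void main(String[] args) {
        // Default to the class named in the NoClassDefFoundError from the thread.
        String[] names = args.length > 0
                ? args
                : new String[] {"com.yammer.metrics.core.Gauge"};
        for (Map.Entry<String, Boolean> entry : probe(names).entrySet()) {
            System.out.println((entry.getValue() ? "FOUND   " : "MISSING ") + entry.getKey());
        }
    }
}
```

Compiled with `javac ClasspathProbe.java` and run as `java -cp ".:$DRILL_HOME/jars/3rdparty/*" ClasspathProbe com.yammer.metrics.core.Gauge` (assuming `DRILL_HOME` points at the Drill installation), it prints FOUND or MISSING for each class name. This only approximates the classpath Drill itself builds, which includes other directories besides jars/3rdparty. The code avoids lambdas so that it also compiles on Java 7, the JVM shown in the stack trace.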
