Re: hbase regionserver all dead
Regards, ch huang. Which version of HBase are you using?

Please check the following things:
- zookeeper session timeout
- zookeeper tickTime
- hbase.zookeeper.property.maxClientCnxns (default 35)
- ulimit: increase the number of open files (32k or more)

2014-03-04 2:22 GMT+01:00 ch huang justlo...@gmail.com:

hi, maillist: this morning I checked my HBase cluster log and found all region servers down. I do not know why; I hope some expert can show me some clue. Here is the log I found on the node where the first death happened:

2014-03-03 17:16:11,413 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=16.78 MB, free=1.98 GB, max=2.00 GB, blocks=0, accesses=82645, hits=4, hitRatio=0.00%, , cachingAccesses=5, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=5, evictedPerRun=Infinity
2014-03-03 17:20:30,093 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1798)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1181)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:750)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:541)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:516)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
2014-03-03 17:21:11,413 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=17.28 MB, free=1.98 GB, max=2.00 GB, blocks=4, accesses=88870, hits=3112, hitRatio=3.50%, , cachingAccesses=3117, cachingHits=3108, cachingHitsRatio=99.71%, , evictions=0, evicted=5, evictedPerRun=Infinity
2014-03-03 17:21:45,112 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1043055049-192.168.11.11-1382442676609:blk_-716939259337565008_4210841
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:695)
2014-03-03 17:21:45,116 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_-716939259337565008_4210841 in pipeline 192.168.11.14:50010, 192.168.11.10:50010, 192.168.11.15:50010: bad datanode 192.168.11.14:50010
2014-03-03 17:24:58,114 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:36837,call:next(-2524485469510465096, 100), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client:192.168.11.174:39642,starttimems:1393838661274,queuetimems:0,class:HRegionServer,responsesize:6,method:next}
2014-03-03 17:24:58,117 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:36880,call:next(6510031569997476480, 100), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client:192.168.11.174:39642,starttimems:1393838661234,queuetimems:1,class:HRegionServer,responsesize:6,method:next}
2014-03-03 17:24:58,117 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:36880,call:next(-8080468273710364924, 100), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client:192.168.11.174:39642,starttimems:1393838661234,queuetimems:1,class:HRegionServer,responsesize:6,method:next}
2014-03-03 17:24:58,118 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:36882,call:next(-1838307716001367158, 100), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client:192.168.11.174:39642,starttimems:1393838661234,queuetimems:1,class:HRegionServer,responsesize:6,method:next}
2014-03-03 17:24:58,119 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 38421ms for sessionid 0x441fb1d01a1759, closing socket connection and attempting reconnect
2014-03-03 17:24:58,119 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 43040ms for sessionid
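For reference, the checklist above maps onto hbase-site.xml properties. A minimal sketch follows; the property names are the stock HBase ones, but the values are illustrative assumptions to tune against your own GC pauses and client count, not a prescription for this cluster:

<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; must fit within the quorum's 20 x tickTime ceiling -->
  <value>120000</value>
</property>
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <!-- honored only when HBase manages the ZooKeeper quorum -->
  <value>6000</value>
</property>
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <!-- per-host connection cap; raise it if many clients share one machine -->
  <value>300</value>
</property>

The ulimit change lives outside HBase, in the OS limits for the user running the daemons.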
Re: Compile and run HBase 0.98 from svn?
Did you read here? https://hbase.apache.org/book/build.html

2014-03-01 21:55 GMT-04:30 Jean-Marc Spaggiari jean-m...@spaggiari.org:

Hi all, any hint on the way to build and run 0.98?

svn co http://svn.apache.org/repos/asf/hbase/branches/0.98/ hbase-0.98
mvn clean install assembly:assembly -DskipTests -Prelease

All passed. I updated hbase-env, hbase-site, and regionservers, and distributed the whole directory to 4 servers. But... something is missing. Doing that, I am missing many libs, like zookeeper-3.4.5.jar. So I guess I'm not building it the right way. Can anyone point me to the right steps?

Thanks, JM

-- Marcos Ortiz Valmaseda http://about.me/marcosortiz
Re: Compile and run HBase 0.98 from svn?
Well, tomorrow you could read this old post from Praveen: http://praveen.kumar.in/2011/06/20/building-hadoop-and-hbase-for-hbase-maven-application-development/

2014-03-01 22:49 GMT-04:30 Jean-Marc Spaggiari jean-m...@spaggiari.org:

Yep. And the page after. I tried this too:

MAVEN_OPTS=-Xmx3g mvn -f pom.xml clean install -DskipTests javadoc:aggregate site assembly:single -Prelease

With no success... I will continue tomorrow...

2014-03-01 21:44 GMT-05:00 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com:

Did you read here? https://hbase.apache.org/book/build.html

2014-03-01 21:55 GMT-04:30 Jean-Marc Spaggiari jean-m...@spaggiari.org:

Hi all, any hint on the way to build and run 0.98?

svn co http://svn.apache.org/repos/asf/hbase/branches/0.98/ hbase-0.98
mvn clean install assembly:assembly -DskipTests -Prelease

All passed. Update hbase-env, hbase-site, and regionservers. Distribute all the directory on 4 servers. But... something is missing. Doing that, I miss many libs, like zookeeper-3.4.5.jar. So I guess I'm not building it the right way. Can anyone point me to the right steps?

Thanks, JM

-- Marcos Ortiz Valmaseda http://about.me/marcosortiz

-- Marcos Ortiz Valmaseda http://about.me/marcosortiz
Re: HTablePool is deprecated, any alternatives?
You are right, Li Li. HTablePool was deprecated in 0.94, 0.95/0.96, and removed in 0.98. See: https://issues.apache.org/jira/browse/HBASE-6580

Use HConnection instead: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.devel/38950

2014-02-12 22:13 GMT-04:30 Li Li fancye...@gmail.com:

I am using HBase 0.94.11. It says HTablePool is deprecated. Is there any alternative for it?

-- Marcos Ortiz Valmaseda http://about.me/marcosortiz
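As a reference for the migration, a minimal sketch of the HConnection-based pattern; the table and row names are placeholders, and HConnection.getTable arrived with HBASE-6580, so it is available in the 0.94.11 release mentioned above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
// One heavy connection shared by the application, instead of a pool of HTables.
HConnection connection = HConnectionManager.createConnection(conf);
try {
    // Lightweight; create one per unit of work, per thread.
    HTableInterface table = connection.getTable("mytable");
    try {
        Result r = table.get(new Get(Bytes.toBytes("row1")));
    } finally {
        table.close(); // returns resources to the shared connection
    }
} finally {
    connection.close(); // close once, at application shutdown
}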
Re: Please welcome our newest committer, Rajeshbabu Chintaguntla
Congratulations, Rajeshbabu!!

2013/9/11 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com:

Hi All, please join me in welcoming Rajeshbabu (Rajesh) as our new HBase committer. Rajesh has been here for more than a year and has been solving some very good bugs around the Assignment Manager area. He has been working on other stuff like HBase-MapReduce performance improvement, migration scripts, and of late the secondary-index-related things. Rajesh has made his first commit to the pom.xml already. Once again, congratulations and welcome to this new role (smile).

Cheers, Ram

-- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
Re: Please welcome our newest committer, Nick Dimiduk
Congratulations, Nick!!! Keep doing this great work.

2013/9/10 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com:

Congratulations Nick.!!!

On Wed, Sep 11, 2013 at 9:15 AM, rajeshbabu chintaguntla rajeshbabu.chintagun...@huawei.com wrote:

Congratulations Nick.

From: lars hofhansl [la...@apache.org] Sent: Wednesday, September 11, 2013 7:30 AM To: d...@hbase.apache.org; hbase-user Subject: Re: Please welcome our newest committer, Nick Dimiduk

Congrats Nick, great to have you on board!

- Original Message - From: Enis Söztutar e...@apache.org To: d...@hbase.apache.org; hbase-user user@hbase.apache.org Sent: Tuesday, September 10, 2013 3:54 PM Subject: Please welcome our newest committer, Nick Dimiduk

Hi, please join me in welcoming Nick as our new addition to the list of committers. Nick is exceptionally good with user-facing issues, and has made major contributions in mapreduce-related areas, Hive support, as well as 0.96 issues and the new and shiny data types API. Nick, as tradition, feel free to do your first commit to add yourself to pom.xml.

Cheers, Enis

-- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
Re: Hbase keeps dying (Zookeeper)
Regards, Trevor.

hadoop-hbase-0.90.6+84.73-1
hadoop-zookeeper-3.3.5+19.5-1
hadoop-0.20.2+923.421-1

Why not upgrade your components?
- HBase to the latest 0.94.10: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753version=12324627
- ZooKeeper to the latest 3.4.5: http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html
- Hadoop to 1.2.1: http://hadoop.apache.org/docs/r1.2.1/releasenotes.html

That's my first advice. Now, from 3.3.5 to 3.4.5 there are a lot of bug fixes and a lot of improvements: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801version=12321883

2013/8/9, Trevor Antczak tantc...@operasolutions.com:

So I've done some more research into this and it appears that my Zookeeper doesn't have /hbase/master. From zkCli:

[zk: localhost:2181(CONNECTED) 3] ls /hbase
[splitlog, unassigned, root-region-server, rs, table, shutdown]
[zk: localhost:2181(CONNECTED) 4] get /hbase/master
Node does not exist: /hbase/master
[zk: localhost:2181(CONNECTED) 5]

I have no idea how this could have happened, but is there a way to regenerate the node in zookeeper? All of the other expected nodes are there. It seems from the logs that everything was fine with hbase until 12:01 AM on August 1st, at which point it just stopped working. I can't find any reason that any of this has happened either. It's all very strange. Trevor

-Original Message- From: Trevor Antczak [mailto:tantc...@operasolutions.com] Sent: Monday, August 05, 2013 2:40 PM To: user@hbase.apache.org Subject: RE: Hbase keeps dying (Zookeeper)

hadoop-hbase-0.90.6+84.73-1
hadoop-zookeeper-3.3.5+19.5-1
hadoop-0.20.2+923.421-1

Yes, hbase is managing the Quorum.

-Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Monday, August 05, 2013 12:39 PM To: user@hbase.apache.org Subject: Re: Hbase keeps dying (Zookeeper)

bq. there wasn't a copy of hdfs-site.xml

Can you tell us the versions of hadoop, hbase, and zookeeper you're using? Did you let HBase manage your zookeeper quorum?

On Mon, Aug 5, 2013 at 9:15 AM, Trevor Antczak tantc...@operasolutions.com wrote:

Hi all, I have an hbase system that has worked fine for quite a long time, but now it is quite suddenly developing errors. First it was dying immediately on startup because there wasn't a copy of hdfs-site.xml in the hbase conf directory (which doesn't seem like it should be necessary, and I'm not sure how it got moved if it had been there in the first place). I copied the hdfs-site.xml from /etc/hadoop/conf into /etc/hbase/conf. Now hbase starts up, but it can never connect to Zookeeper and dies after a few minutes of trying. The weird thing is that according to Zookeeper the connection is happening. From the hbase logs I get a ton of messages like:

2013-08-05 11:57:19,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x4403f9ef5b20026 Creating (or updating) unassigned node for 0f3ca79375768472af70765ff231ee32 with OFFLINE state
2013-08-05 11:57:19,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=hmaster:6, region=0f3ca79375768472af70765ff231ee32

Eventually followed by:

2013-08-05 11:57:19,105 WARN org.apache.zookeeper.ClientCnxn: Session 0x4403f9ef5b20026 for server hslave14/172.20.7.124:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len4935980 is out of range!
    at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:708)
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:867)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)

And then a bunch more Java errors as the process dies. From the Zookeeper logs I see the hbase server connect:

13/08/05 11:40:27 INFO server.NIOServerCnxn: Accepted socket connection from /xxx.xxx.xxx.xxx:34879
13/08/05 11:40:27 INFO server.NIOServerCnxn: Client attempting to establish new session at /xxx.xxx.xxx.xxx:34879
13/08/05 11:40:27 INFO server.NIOServerCnxn: Established session 0x1404ee40a8d000c with negotiated timeout 4 for client /xxx.xxx.xxx.xxx:34879

Then disconnect, but only after it shuts down:

13/08/05 11:45:52 INFO server.NIOServerCnxn: Closed socket connection for client /xxx.xxx.xxx.xxx:34879 which had sessionid 0x1404ee40a8d000c

Does anyone have any clever ideas of places I can look for this error? Or why I'm suddenly having this problem when I haven't changed anything? Thanks in advance for any help provided. Trevor

-- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
Re: Region not split
Regards, Jean-Marc. What version of HBase are you using? In the newer versions of the platform (0.94), there are a lot of improvements for auto-splitting and pre-splitting regions. The Hortonworks team published an excellent post on this particular topic: http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/

2013/8/9, Jean-Marc Spaggiari jean-m...@spaggiari.org:

Hi, quick question regarding the split. Let's consider the table 'work_proposed' below:

275164921921 hdfs://node3:9000/hbase/work_proposed

This is a 256GB table. I think there are more than 1B lines in it but I have not counted them for a while. This table has a pretty default definition:

hbase(main):001:0 describe 'work_proposed'
DESCRIPTION ENABLED
'work_proposed', {NAME = '@', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'ROW', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', VERSIONS = '3', TTL = '2147483647', MIN_VERSIONS = '0', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536', ENCODE_ON_DISK = 'true', IN_MEMORY = 'false', BLOCKCACHE = 'true'}, {NAME = 'a', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'ROW', REPLICATION_SCOPE = '0', VERSIONS = '3', COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true'} true
1 row(s) in 0.7590 seconds

Those are all default parameters, which means the default MAX_FILESIZE value is 10GB. If I look into Hannibal, it's fine: I can see my table, the regions, the red line at 10GB showing the max size before the split, etc. All the regions are under this line except one!

hadoop@buldo:~/hadoop-1.0.3$ bin/hadoop fs -ls /hbase/work_proposed/46f8ea6e24982fbeb249a4516c879109/@
Found 1 items
-rw-r--r-- 3 hbase supergroup 22911054018 2013-08-03 20:57 /hbase/work_proposed/46f8ea6e24982fbeb249a4516c879109/@/404fcf681e5e4fdbac99db80345b011b

This region is 21GB, and it doesn't want to split. The first thing you will say is that it's because I have one single 21GB row in this region, but I don't think so. My rows are URLs; I would be surprised if I had a 21GB URL ;) I triggered major_compact many times, I stopped/started the cluster many times, nothing. I can most probably ask for a manual split and that will work, but I want to take this opportunity to figure out why it's not splitting, if it should be, and if there is any defect behind that. I have not found any exception in the logs. I just started another major_compaction and will grep the region name from the logs, but any idea why I'm facing this, and where in the code I should start to look? I can deploy customized code to show more logs if required. I'll still start to look at the split policies...

JM

-- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
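For the manual-split fallback mentioned in the question, a minimal sketch against the 0.94-era admin API; the table name comes from the thread, the rest is an assumption:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
// A table name asks every region of the table to split at a point the
// policy chooses; a full region name (as shown in the master UI) targets
// just the oversized region.
admin.split("work_proposed");
admin.close();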
Fwd: scan very slow in hbase
Regards, ch. Which version of HBase are you using? HBase's developers have worked very hard on scan improvements:

http://www.slideshare.net/cloudera/hbase-consistency-and-performance-final
http://www.slideshare.net/cloudera/6-real-time-analytics-with-h-base-alex-baranau-sematext-final-3-updated-last-minute
http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final

-- Forwarded message -- From: ch huang justlo...@gmail.com Date: 2013/8/6 Subject: Re: scan very slow in hbase To: user@hbase.apache.org

I found the error is because I stopped the running Java code; I have not found why the scan is so slow.

On Tue, Aug 6, 2013 at 10:38 PM, Stack st...@duboce.net wrote:

I would suggest you search the mail archives before posting first (you will usually get your answer faster if you go this route). The below has been answered in the recent past. See http://search-hadoop.com/m/5tk8QnhFqw

Thanks, St.Ack

On Tue, Aug 6, 2013 at 12:39 AM, ch huang justlo...@gmail.com wrote:

My workmate tells me HBase is very slow when scanning something. I checked the region server and found the following information; can anyone help?

13/08/06 15:30:34 WARN ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
    at sun.nio.ch.IOUtil.read(IOUtil.java:171)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1796)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1179)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
13/08/06 15:30:34 ERROR regionserver.HRegionServer: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call next(1916648340315433886, 1), rpc version=1, client version=29, methodsFingerPrint=-1368823753 from 192.168.2.209:1150 after 8504 ms, since caller disconnected
    at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3856)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3776)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3768)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2513)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)

-- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
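Beyond upgrading, the material above largely reduces to a few client-side scan settings; a minimal sketch for the 0.94-era API, with hypothetical table and column names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable");
Scan scan = new Scan();
scan.setCaching(500);       // rows fetched per next() RPC; the 0.94 default is 1
scan.setCacheBlocks(false); // avoid churning the block cache on a large scan
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q")); // fetch only what is needed
ResultScanner scanner = table.getScanner(scan);
try {
    for (Result r : scanner) {
        // process r; note that killing the client mid-scan produces the
        // CallerDisconnectedException seen in the thread above
    }
} finally {
    scanner.close(); // always release the server-side scanner lease
    table.close();
}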
Re: scan very slow in hbase
Regards, ch. I was looking for a post about this topic and I found it. It was posted on the Ericsson Labs blog, and it talks about the approach they followed to improve HBase performance in several ways, including scans. You can read it here: http://labs.ericsson.com/blog/hbase-performance-tuners

Other resources:
- Ameya's talk about HBase's use at Groupon (see his last slide): http://www.slideshare.net/cloudera/case-studies-session-3b
- Manoj and Goving's talk: http://www.slideshare.net/cloudera/hbasecon-2013-evolving-a-firstgeneration-apache-hbase-deployment-to-second-generation-and-beyond

2013/8/6 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com:

Regards, ch. Which version of HBase are you using? HBase's developers have worked very hard on scan improvements:

http://www.slideshare.net/cloudera/hbase-consistency-and-performance-final
http://www.slideshare.net/cloudera/6-real-time-analytics-with-h-base-alex-baranau-sematext-final-3-updated-last-minute
http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final

-- Forwarded message -- From: ch huang justlo...@gmail.com Date: 2013/8/6 Subject: Re: scan very slow in hbase To: user@hbase.apache.org

I found the error is because I stopped the running Java code; I have not found why the scan is so slow.

On Tue, Aug 6, 2013 at 10:38 PM, Stack st...@duboce.net wrote:

I would suggest you search the mail archives before posting first (you will usually get your answer faster if you go this route). The below has been answered in the recent past. See http://search-hadoop.com/m/5tk8QnhFqw

Thanks, St.Ack

On Tue, Aug 6, 2013 at 12:39 AM, ch huang justlo...@gmail.com wrote:

My workmate tells me HBase is very slow when scanning something. I checked the region server and found the following information; can anyone help?

13/08/06 15:30:34 WARN ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
    at sun.nio.ch.IOUtil.read(IOUtil.java:171)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1796)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1179)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
13/08/06 15:30:34 ERROR regionserver.HRegionServer: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call next(1916648340315433886, 1), rpc version=1, client version=29, methodsFingerPrint=-1368823753 from 192.168.2.209:1150 after 8504 ms, since caller disconnected
    at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3856)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3776)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3768)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2513)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseServer
Re: Region size per region on the table page
Hi, Bryan. If you file an issue for that, it would be nice to work on it.

2013/8/1 Bryan Beaudreault bbeaudrea...@hubspot.com:

Hannibal is very useful, but samar is right: it's another thing to install and maintain. I'd hope that over time the need for tools like Hannibal would be lessened as some of its features make their way into the main install. Hannibal does its work by crawling log files, whereas some (or all) of the data it provides could be provided through the HBase API, and thus the admin UI, in a less hacky way. If someone were willing to invest the time in adding such a metric to the HBase admin UI (and the HBaseAdmin API, please), it would bring us one step closer.

On Thu, Aug 1, 2013 at 2:42 PM, samar.opensource samar.opensou...@gmail.com wrote:

Hi Jean, you are right, Hannibal does that, but it is a separate process we need to install/maintain. I thought it would be good if we had a quick and easy way to see it from the master-status page. The stats are already on the regionserver page (like the total size of the store); it would just make sense to have them on the table page too (IMO) to understand the data size distribution of the regions of a particular table. Samar

On 01/08/13 5:51 PM, Jean-Marc Spaggiari wrote:

Hi Samar, Hannibal is already doing what you are looking for. Cheers, JMS

2013/8/1 samar.opensource samar.opensou...@gmail.com:

Hi Devs/Users, most of the time we want to know if our table split logic is accurate or if our current regions are well balanced for a table. I was wondering if we can expose the size of each region on table.jsp too, in the table's region listing. If people think it is useful, I can pick it up. Also let me know if it already exists. Samar

-- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
Re: HBase on EMR disk space issue
Regards, Oussama. First advice: update your HBase installation to a more recent version, like 0.94.9: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753version=12324431

The problem with EMR is that its HBase version is very old, so it would be better to run a recent HBase version on EC2 instead. See the official documentation here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase-launch.html

See these tips too: http://aws-musings.com/7-tips-for-running-hbase-in-ec2/

HBase use cases from the last HBaseCon 2013:
http://www.orzota.com/hbase-use-cases/
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/35001

My favorite is the HBase at Pinterest talk: http://www.hbasecon.com/sessions/apache-hbase-operations-at-pinterest/ http://www.slideshare.net/cloudera/operations-session-1

2013/7/22, Oussama Jilal jilal.ouss...@gmail.com:

Hello, I need some help with my issue. We have been running HBase on Amazon EMR (v 0.92) for quite some months. Today we had a serious problem: HBase was throwing connection-refused errors in our client applications. After some investigation, we found out that one of the two EC2 instances running HBase (the slave/regionserver one) had a full disk (800+ GB). We can connect to both of the instances, but when we try to run a command from the HBase shell (list, scan, disable, truncate... anything) we only get errors like:

13/07/22 14:06:29 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
13/07/22 14:06:29 INFO httpclient.HttpMethodDirector: Retrying request

We want to truncate some tables and gz-compress some column families, but we can't do any operation. Executing the command jps on the master returns:

6966 Jps
1968 JobTracker
2177 HQuorumPeer
1814 NameNode
4623 HMaster

but on the region server it only returns:

5986 Jps

Rebooting didn't help... Can anyone help? We are not very experienced with HBase, especially administration. Thank you.

-- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
Re: EC2 Elastic MapReduce HBase install recommendations
I think Andrew talked about this some years ago, and he created some scripts for it. You can find them here: https://github.com/apurtell/hbase-ec2

Then you can review some links about this topic:
http://blog.cloudera.com/blog/2012/10/set-up-a-hadoophbase-cluster-on-ec2-in-about-an-hour/
http://my.safaribooksonline.com/book/databases/storage-systems/9781849517140/1dot-setting-up-hbase-cluster/id286696951
http://whynosql.com/why-we-run-our-hbase-on-ec2/

You can also read Andrew's HBase-on-EC2 demo from HBaseCon 2012: https://github.com/apurtell/ec2-demo

2013/5/7 Pal Konyves paul.kony...@gmail.com:

Hi, has anyone got some recommendations about running HBase on EC2? I am testing it, and so far I am very disappointed with it. I did not change anything about the default 'Amazon distribution' installation. It has one master node and two slave nodes, and write performance is around 2500 small rows per second at most, but I expected it to be way better. Oh, and this is with batch put operations with autocommit turned off, where each batch contains about 500-1000 rows... When I do it with autocommit, it does not even reach 1000 rows per second. All nodes were m1.large ones. Any experiences or suggestions? Is it worth trying the RMap distribution instead of the Amazon one? Thanks, Pal

-- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
Re: EC2 Elastic MapReduce HBase install recommendations
I think that when you say RMap, you are referring to MapR's distribution. MapR's team released a very good version of its Hadoop distribution focused on HBase, called M7. You can see its overview here: http://www.mapr.com/products/mapr-editions/m7-edition

But this release was under beta testing, and I see that it's not included in the Amazon Marketplace yet: https://aws.amazon.com/marketplace/seller-profile?id=802b0a25-877e-4b57-9007-a3fd284815a5

2013/5/7 Pal Konyves paul.kony...@gmail.com:

Hi, has anyone got some recommendations about running HBase on EC2? I am testing it, and so far I am very disappointed with it. I did not change anything about the default 'Amazon distribution' installation. It has one master node and two slave nodes, and write performance is around 2500 small rows per second at most, but I expected it to be way better. Oh, and this is with batch put operations with autocommit turned off, where each batch contains about 500-1000 rows... When I do it with autocommit, it does not even reach 1000 rows per second. All nodes were m1.large ones. Any experiences or suggestions? Is it worth trying the RMap distribution instead of the Amazon one? Thanks, Pal

-- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
Re: hbase + mapreduce
Here you have several examples:
http://hbase.apache.org/book/mapreduce.example.html
http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/
http://bigdataprocessing.wordpress.com/2012/07/27/hadoop-hbase-mapreduce-examples/
http://stackoverflow.com/questions/12215313/load-data-into-hbase-table-using-hbase-map-reduce-api

2013/4/21 Adrian Acosta Mitjans amitj...@estudiantes.uci.cu:

Hello: I'm working on a project, and I'm using HBase to store the data. I have this method, which works but without the performance I'm looking for, so what I want is to do the same thing using MapReduce:

// Scans the table and returns every row whose x:y column equals z.
public ArrayList<MyObject> findZ(String z) throws IOException {
    ArrayList<MyObject> rows = new ArrayList<MyObject>();
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");
    Scan s = new Scan();
    s.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"));
    ResultScanner scanner = table.getScanner(s);
    try {
        for (Result rr : scanner) {
            if (Bytes.toString(rr.getValue(Bytes.toBytes("x"), Bytes.toBytes("y"))).equals(z)) {
                rows.add(getInformation(Bytes.toString(rr.getRow())));
            }
        }
    } finally {
        scanner.close();
    }
    return rows;
}

The getInformation method takes all the columns and converts the row into the MyObject type. I just want an example or a link to a tutorial that does something like this: I want to get a result type as the answer, not a number like the word-count examples I keep finding. My native language is Spanish, so sorry if something is not well written. Thanks

http://www.uci.cu

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186
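A minimal sketch of the MapReduce rewrite the question asks for, under stated assumptions: the class names FindZJob and FindZMapper are invented, the table "test" and the x:y column come from the question, and a SingleColumnValueFilter replaces the client-side equals() so the comparison runs on the region servers rather than in one client loop:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FindZJob {

  // Only rows whose x:y column equals z reach the mapper; it emits row keys.
  static class FindZMapper extends TableMapper<Text, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      String row = Bytes.toString(key.get(), key.getOffset(), key.getLength());
      context.write(new Text(row), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    String z = args[0]; // the value to match, as in findZ(String z)
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "findZ");
    job.setJarByClass(FindZJob.class);

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"));
    scan.setCaching(500);       // fewer next() round trips
    scan.setCacheBlocks(false); // recommended for MapReduce scans
    scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("x"), Bytes.toBytes("y"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes(z)));

    TableMapReduceUtil.initTableMapperJob("test", scan, FindZMapper.class,
        Text.class, NullWritable.class, job);
    job.setNumReduceTasks(0); // map-only: write matching row keys straight out
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The per-row conversion done by getInformation can then run inside the mapper, in parallel across regions, instead of in a single client thread.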
Re: Inconsistent performance numbers with increased nodes
Just a question, Alex: why are you using OpenJDK? The first recommendation for a Hadoop cluster is to use the Java SDK from Oracle, because with OpenJDK specifically there are some performance issues, which should be fixed in upcoming releases; I encourage you to use Java 1.6 from Oracle.
- What is the replication factor in your cluster? (default: 3)
- What is your HDFS block size? (default: 64 MB; a good value is 128 MB or 256 MB depending on your cluster load)

2013/4/19 Alex O'Ree spyhunte...@gmail.com:

Marcos,
- Java version: 1.6 OpenJDK x64, the latest version in the CentOS repo
- JVM tuning configuration: I think we just changed the max RAM to close to 4GB
- Hadoop JT, DN, NN configuration: 1 JT, 10/12 DN, 1 NN. No security, no SSL
- Network topology: star
- Network speed for the cluster: emulated 4G cellular
- Hardware properties for all nodes in the cluster: 2 cores, 2.2GHz, 4GB RAM
- Which platform are you using for the benchmark? The benchmark was the basic word-count sample app, using the Wikipedia export as the data set.

Here's the result set I'm looking at, giving bogus values to make the point:

10 DN cluster: 10 minutes, consistently
12 DN cluster: 10m, 15m, 10m, 15m, 15m, 10m, 10m

Basically, I expected the result set for the 12 DN cluster to be consistent, but it isn't. Since there's a high correlation between the lowest values in the 12 DN data and the average values in the 10 DN cluster, I'm asserting that Hadoop may have just talked to 10 DNs instead of all 12. This is for a paper that I plan on publishing shortly, containing emulated network conditions for a number of different network types.

On Fri, Apr 19, 2013 at 3:26 PM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote:

Regards, Alex. We need more information to be able to give you a good answer:
- Java version
- JVM tuning configuration
- Hadoop JT, DN, NN configuration
- Network topology
- Network speed for the cluster
- Hardware properties for all nodes in the cluster

Hadoop is a truly scalable system, where you can add more nodes and the performance should improve, but there are some configurations which can degrade its performance. Another thing: which platform are you using for the benchmark? There is an amazing platform developed by Jason Dai from Intel called HiBench, which is great for this kind of work. [1][2]

With all this information, I think we can help you find the root causes behind the performance of the cluster.

[1] https://github.com/intel-hadoop/HiBench
[2] http://hadoopsummit.org/amsterdam-blog/meet-the-presenters-jason-dai-of-intel/

2013/4/19 Alex O'Ree spyhunte...@gmail.com:

Hi, I'm running a 10 data node cluster and was experimenting with adding additional nodes to it. I've done some performance benchmarking with 10 nodes and compared it to 12 nodes, and I've found some rather interesting and inconsistent results. The behavior I'm seeing is that during some of the 12-node bench runs, I'm actually seeing two different performance levels: one set at a different level than 10 nodes, and another at exactly the performance of a 10-node cluster. I've eliminated any possibility of networking problems or problems related to a specific machine. Before switching to a 12-node cluster, the initial cluster was destroyed and rebuilt, and the dataset was added in. This should have yielded an evenly balanced cluster (confirmed through the web app). So my question is: is this expected behavior, or is something else going on here that I'm not aware of? For reference, I'm using 1.0.8 on CentOS 6.3 x64

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186
Re: RefGuide schema design examples
Wow, great work, Doug.

2013/4/19 Doug Meil doug.m...@explorysmedical.com:

Hi folks, I reorganized the Schema Design case studies 2 weeks ago and consolidated them in here, plus added several cases common on the dist-list: http://hbase.apache.org/book.html#schema.casestudies

Comments/suggestions welcome. Thanks!

Doug Meil Chief Software Architect, Explorys doug.m...@explorysmedical.com

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186
Re: should i use compression?
+1 for Ted's advice. Using compression can save a lot of space in memory and on disk, so it's a good recommendation.

2013/4/3 Ted Yu yuzhih...@gmail.com:

You should use data block encoding (in 0.94.x releases only). It is helpful for reads. You can also enable compression. Cheers

On Wed, Apr 3, 2013 at 6:42 AM, Prakash Kadel prakash.ka...@gmail.com wrote:

Hello, I have a question. I have a table where I store data in the column qualifiers (the values themselves are null). I just have 1 column family. The number of columns per row is variable (1 to a few thousand). Currently I don't use compression or data block encoding. Should I? I want to have faster reads. Please suggest. Sincerely, Prakash Kadel

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186
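To make the advice concrete, a minimal sketch with the 0.94-era admin API; the table and family names are placeholders, Snappy requires the native libraries on the region servers, and existing HFiles are only rewritten with the new settings at the next major compaction:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.io.hfile.Compression;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HColumnDescriptor cf = new HColumnDescriptor("cf");
cf.setCompressionType(Compression.Algorithm.SNAPPY);
cf.setDataBlockEncoding(DataBlockEncoding.PREFIX); // or FAST_DIFF; see the follow-ups below
admin.disableTable("mytable");     // the table must be offline for this schema change
admin.modifyColumn("mytable", cf);
admin.enableTable("mytable");
admin.close();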
Re: should i use compression?
Regards, Jean-Marc. The best resource that I found for this is a great blog post called Apache HBase I/O - HFile by Matteo Bertozzi on Cloudera's blog. Here's the link: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

2013/4/3 Jean-Marc Spaggiari jean-m...@spaggiari.org:

Is there any documentation anywhere regarding the differences between PREFIX, DIFF and FAST_DIFF?

2013/4/3 prakash kadel prakash.ka...@gmail.com:

Thank you very much. I will try snappy compression with data_block_encoding.

On Wed, Apr 3, 2013 at 11:21 PM, Kevin O'dell kevin.od...@cloudera.com wrote:

Prakash, yes, I would recommend Snappy compression.

On Wed, Apr 3, 2013 at 10:18 AM, Prakash Kadel prakash.ka...@gmail.com wrote:

Thanks, is there any specific compression that is recommended for the use case I have? Since my values are all null, will compression help? I am thinking of using the prefix data_block_encoding. Sincerely, Prakash Kadel

On Apr 3, 2013, at 10:55 PM, Ted Yu wrote:

You should use data block encoding (in 0.94.x releases only). It is helpful for reads. You can also enable compression. Cheers

On Wed, Apr 3, 2013 at 6:42 AM, Prakash Kadel prakash.ka...@gmail.com wrote:

Hello, I have a question. I have a table where I store data in the column qualifiers (the values themselves are null). I just have 1 column family. The number of columns per row is variable (1 to a few thousand). Currently I don't use compression or data block encoding. Should I? I want to have faster reads. Please suggest. Sincerely, Prakash Kadel

-- Kevin O'Dell Systems Engineer, Cloudera

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186
Re: should i use compression?
You can read this JIRA issue for this too: https://issues.apache.org/jira/browse/HBASE-4218

2013/4/3 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com:

Regards, Jean-Marc. The best resource that I found for this is a great blog post called Apache HBase I/O - HFile by Matteo Bertozzi on Cloudera's blog. Here's the link: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

2013/4/3 Jean-Marc Spaggiari jean-m...@spaggiari.org:

Is there any documentation anywhere regarding the differences between PREFIX, DIFF and FAST_DIFF?

2013/4/3 prakash kadel prakash.ka...@gmail.com:

Thank you very much. I will try snappy compression with data_block_encoding.

On Wed, Apr 3, 2013 at 11:21 PM, Kevin O'dell kevin.od...@cloudera.com wrote:

Prakash, yes, I would recommend Snappy compression.

On Wed, Apr 3, 2013 at 10:18 AM, Prakash Kadel prakash.ka...@gmail.com wrote:

Thanks, is there any specific compression that is recommended for the use case I have? Since my values are all null, will compression help? I am thinking of using the prefix data_block_encoding. Sincerely, Prakash Kadel

On Apr 3, 2013, at 10:55 PM, Ted Yu wrote:

You should use data block encoding (in 0.94.x releases only). It is helpful for reads. You can also enable compression. Cheers

On Wed, Apr 3, 2013 at 6:42 AM, Prakash Kadel prakash.ka...@gmail.com wrote:

Hello, I have a question. I have a table where I store data in the column qualifiers (the values themselves are null). I just have 1 column family. The number of columns per row is variable (1 to a few thousand). Currently I don't use compression or data block encoding. Should I? I want to have faster reads. Please suggest. Sincerely, Prakash Kadel

-- Kevin O'Dell Systems Engineer, Cloudera

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186
Re: should i use compression?
Here's the API documentation:

FAST_DIFF: http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.html

Encoder similar to DiffKeyDeltaEncoder (http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.html) but supposedly faster. Compress using:
- store the size of the common prefix
- save the column family once, in the first KeyValue
- use integer compression for key, value and prefix (7-bit encoding)
- use bits to avoid duplicating the key length, value length and type if they are the same as the previous KeyValue's
- store in 3 bits the length of the timestamp prefix shared with the previous KeyValue's timestamp
- one bit which allows omitting the value if it is the same

Format:
- 1 byte: flag
- 1-5 bytes: key length (only if FLAG_SAME_KEY_LENGTH is not set in flag)
- 1-5 bytes: value length (only if FLAG_SAME_VALUE_LENGTH is not set in flag)
- 1-5 bytes: prefix length
- ... bytes: rest of the row (if prefix length is small enough)
- ... bytes: qualifier (or suffix depending on prefix length)
- 1-8 bytes: timestamp suffix
- 1 byte: type (only if FLAG_SAME_TYPE is not set in the flag)
- ... bytes: value (only if FLAG_SAME_VALUE is not set in the flag)

DIFF: http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.html

Compress using:
- store the size of the common prefix
- save the column family once; it is the same within an HFile
- use integer compression for key, value and prefix (7-bit encoding)
- use bits to avoid duplicating the key length, value length and type if they are the same as the previous KeyValue's
- store in 3 bits the length of the timestamp field
- allow a diff in the timestamp instead of the actual value

Format:
- 1 byte: flag
- 1-5 bytes: key length (only if FLAG_SAME_KEY_LENGTH is not set in flag)
- 1-5 bytes: value length (only if FLAG_SAME_VALUE_LENGTH is not set in flag)
- 1-5 bytes: prefix length
- ... bytes: rest of the row (if prefix length is small enough)
- ... bytes: qualifier (or suffix depending on prefix length)
- 1-8 bytes: timestamp or diff
- 1 byte: type (only if FLAG_SAME_TYPE is not set in the flag)
- ... bytes: value

I was reading the FAQs and there is nothing related to this topic. It would be nice to include it in the documentation. Lars, what do you think? It would be nice if you could write a detailed blog post about this topic.

2013/4/3 Jean-Marc Spaggiari jean-m...@spaggiari.org:

I read the JIRA already but it was not clear to me. However, Cloudera's link is very clear. Thanks for that. Any idea what's the difference between DIFF and FAST_DIFF?

2013/4/3 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com:

You can read this JIRA issue for this too: https://issues.apache.org/jira/browse/HBASE-4218

2013/4/3 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com:

Regards, Jean-Marc. The best resource that I found for this is a great blog post called Apache HBase I/O - HFile by Matteo Bertozzi on Cloudera's blog. Here's the link: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

2013/4/3 Jean-Marc Spaggiari jean-m...@spaggiari.org:

Is there any documentation anywhere regarding the differences between PREFIX, DIFF and FAST_DIFF?

2013/4/3 prakash kadel prakash.ka...@gmail.com:

Thank you very much. I will try snappy compression with data_block_encoding.

On Wed, Apr 3, 2013 at 11:21 PM, Kevin O'dell kevin.od...@cloudera.com wrote:

Prakash, yes, I would recommend Snappy compression.

On Wed, Apr 3, 2013 at 10:18 AM, Prakash Kadel prakash.ka...@gmail.com wrote:

Thanks, is there any specific compression that is recommended for the use case I have? Since my values are all null, will compression help? I am thinking of using the prefix data_block_encoding. Sincerely, Prakash Kadel

On Apr 3, 2013, at 10:55 PM, Ted Yu wrote:

You should use data block encoding (in 0.94.x releases only). It is helpful for reads. You can also enable compression. Cheers

On Wed, Apr 3, 2013 at 6:42 AM, Prakash Kadel prakash.ka...@gmail.com wrote:

Hello, I have a question. I have a table where I store data in the column qualifiers (the values themselves are null). I just have 1 column family. The number of columns per row is variable (1 to a few thousand). Currently I don't use compression or data block encoding. Should I? I want to have faster reads. Please suggest. Sincerely, Prakash Kadel

-- Kevin O'Dell Systems Engineer, Cloudera

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com
Re: coprocessor is timing out in 0.94
Regards, Saurabh. I see that you are using SingleColumnValueFilter. Look at these links:
http://gbif.blogspot.com/2012/05/optimizing-hbase-mapreduce-scans-for.html
http://mapredit.blogspot.com/2012/05/using-filters-in-hbase-to-match-two.html

Take a look later at this link too, about the work to improve scans: https://issues.apache.org/jira/browse/HBASE-5416

2013/3/28 Agarwal, Saurabh saurabh.agar...@citi.com:

Ted, thanks for the response. Here is the filter we are using:

new SingleColumnValueFilter(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier), CompareFilter.CompareOp.EQUAL, new RegexStringComparator("(?i)" + keyword));

Thread dumps taken at different points show that the coprocessor is getting called, and the logs show it keeps processing. But the speed is much slower compared to 0.92. Regards, Saurabh.

-Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Thursday, March 28, 2013 6:57 PM To: user@hbase.apache.org Subject: Re: coprocessor is timing out in 0.94

bq. I checked thread dump

If there was no exception in the region server logs, a thread dump of the region server while your coprocessor was running would reveal where it got stuck. From your description below, it looks like you can utilize HBASE-5416 (Improve performance of scans with some kind of filters).

bq. to apply the filter on one of the column

Basically this column is the essential column.

Cheers

On Thu, Mar 28, 2013 at 3:22 PM, Ted Yu yuzhih...@gmail.com wrote:

bq. when I removed the filter, it ran fine in 0.94

Can you disclose more information about your filter? BTW 0.94.6 was just released, and it is fully compatible with 0.94.2. Cheers

On Thu, Mar 28, 2013 at 3:18 PM, Agarwal, Saurabh saurabh.agar...@citi.com wrote:

Hi, we are in the process of migrating from 0.92.1 to 0.94.2. A coprocessor was running fine in 0.92. After migrating to 0.94, the client is timing out (java.net.SocketTimeoutException). We are using the coprocessor to apply a filter on one of the columns and return the columns that match that filter criteria. I checked the thread dump, region server web UI, and logs. There is no error or exception. One thing I noticed is that when I removed the filter, it ran fine in 0.94 as well. Please advise if there is any specific setting we need to make in 0.94. Thanks, Saurabh.

-- Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA Blog: http://dataddict.wordpress.com/ LinkedIn: http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186
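One client-side detail worth checking alongside those links: as quoted above, SingleColumnValueFilter includes rows where the tested column is missing entirely. A minimal sketch (variable names as in the thread) of the stricter form:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier),
    CompareFilter.CompareOp.EQUAL,
    new RegexStringComparator("(?i)" + keyword));
filter.setFilterIfMissing(true); // drop rows that lack the column instead of passing them
Scan scan = new Scan();
scan.setFilter(filter);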