HBase Shell available through Apache Zeppelin
I have contributed a feature that makes the HBase shell accessible from Apache Zeppelin. The main advantages are:

- Admins get quick access to the HBase shell through the browser.
- Sessions are saved, so you can log in and the current state of a triage or experiment is still available.
- On a similar note, standard recipes or sequences of commands can be saved and run quickly during future triage.

The easiest way to get access is to install and run Apache Zeppelin on the HBase master, or on any other machine where the HBase shell works. In our experience, the HBase interpreter is not a resource hog.

JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-651
Commit: https://github.com/apache/incubator-zeppelin/commit/1940388e3422b86a322fc82a0e7868ff25126804

Looking forward to feedback and suggestions for improvements.

Rajat Venkatesh
Engg. Lead
Qubole
Add keys to column family in HBase using Python
Dear HBase experts,

I have a Hadoop cluster with Hive and HBase installed along with other Hadoop components. I am currently exploring ways to automate a data-migration process from Hive to HBase in which new columns of data are added every so often.

I was able to create an HBase table from Hive and load data into it. Along the same lines, I tried to add new columns to the HBase table (from Hive) using the ALTER TABLE syntax and got the error message: ALTER TABLE cannot be used for a non-native table temp_testing.

As an alternative, I am also trying to do this programmatically using Python. I have explored the libraries HappyBase (https://happybase.readthedocs.org/en/latest/index.html) and starbase (http://pythonhosted.org//starbase/). These libraries provide functionality for creating and deleting tables and other features, but neither seems to provide an option to add a key to a column family.

Does anybody know of a better way of achieving this with Python, through libraries or other means?

Thanks in advance,
Manoj
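One point worth noting for the question above: HBase column families are schemaless, so "adding a key" (a new column qualifier) to an existing family does not require any ALTER; writing a cell to the new qualifier creates it. A minimal sketch with happybase follows. The host and table names are placeholders, a running HBase Thrift server is assumed (happybase talks to HBase over Thrift), and `qualify` / `write_new_column` are helper names invented for this example, not part of the happybase API.

```python
# Sketch, not a definitive recipe: HBase creates a new column qualifier
# inside an existing family simply when a cell is written to it.

def qualify(family, mapping):
    """Turn {qualifier: value} into the {b"family:qualifier": b"value"}
    dict that happybase's Table.put() expects."""
    return {"{}:{}".format(family, q).encode(): str(v).encode()
            for q, v in mapping.items()}

def write_new_column(host, table_name, row, family, mapping):
    """Write cells to a table; any qualifier that does not exist yet is
    created by the write itself. Not invoked here because it needs a live
    HBase Thrift server."""
    import happybase  # third-party: pip install happybase
    conn = happybase.Connection(host)
    try:
        conn.table(table_name).put(row, qualify(family, mapping))
    finally:
        conn.close()

# Example usage (requires a cluster; names are hypothetical):
# write_new_column("thrift-host", "temp_testing", b"row-1",
#                  "cf", {"new_column": "some value"})
```

Since columns are created on write, neither Hive's ALTER TABLE nor any schema call in happybase/starbase is needed for new columns within an existing family; only adding a new column *family* requires altering the table.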
Re: mapreduce job failure
thanks J-D as always

-----Original Message-----
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Tue, May 17, 2011 8:04 pm
Subject: Re: mapreduce job failure

400 regions a day is way too much, and in 0.20.6 there's also a high risk of collision once you get near the ten-thousands of regions. But that's most probably not your current issue.

That HDFS message 99% of the time means that the region server went into GC and, when it came back, the master had already moved the regions away. It should be pretty obvious in the logs. As to why the tasks get killed, it's probably related. And since you are running such an old release you have data loss, and if that happens on the .META. table then you lose metadata about the regions.

To help with GC issues, I suggest you read the multi-part blog post from Todd:
http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/

J-D

On Mon, May 16, 2011 at 2:08 PM, Venkatesh vramanatha...@aol.com wrote:

Thanks J-D

Using hbase-0.20.6, 49 node cluster. The map reduce job involves a full table scan (region size 4 gig). The job runs great for 1 week, then starts failing after 1 week of data accumulation (about 3000 regions); about 400 regions get created per day. Can you suggest any tunables at the HBase level or HDFS level?

Also, I have one more issue when region servers die. Errors below (any suggestion here is helpful as well):

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase_data_one_110425/.../compaction.dir/249610074/4534752250560182124 File does not exist. Holder DFSClient_-398073404 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1332)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1323)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1251)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

-----Original Message-----
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Fri, May 13, 2011 12:39 am
Subject: Re: mapreduce job failure

All that means is that the task stayed in map() for 10 minutes, blocked on something. If you were scanning an hbase table and didn't get a new row after 1 minute, then the scanner would expire. That's orthogonal though. You need to figure out what you're blocking on; add logging and try to jstack your Child processes, for example.

J-D

On Thu, May 12, 2011 at 7:21 PM, Venkatesh vramanatha...@aol.com wrote:

Hi

Using hbase-0.20.6. The mapreduce job started failing in the map phase (using an hbase table as input for the mapper); it ran fine for a week or so, starting with empty tables.

task tracker log:
Task attempt_201105121141_0002_m_000452_0 failed to report status for 600 seconds. Killing

Region server log:
2011-05-12 18:27:39,919 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -7857209327501974146 lease expired
2011-05-12 18:28:29,716 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2011-05-12 18:28:29,897 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, call next(-7857209327501974146, 1) from .:35202: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
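The region arithmetic behind J-D's "400 regions a day is way too much" can be sketched from the numbers in this thread (4 GiB region size, ~400 new regions/day, ~3000 after a week). The 16 GiB alternative below is an illustration, not a recommendation from the thread; the split model is the rough steady-state approximation that region count grows by about daily volume / region size.

```python
# Back-of-the-envelope region growth math (numbers from the thread above;
# the 16 GiB figure is an illustrative assumption).

GIB = 1024 ** 3

def regions_per_day(daily_bytes, max_region_bytes):
    # A region splits once it exceeds max_region_bytes, so steady-state
    # region growth is roughly daily write volume / region size.
    return daily_bytes / max_region_bytes

# ~400 regions/day at a 4 GiB region size implies roughly 1.6 TiB/day ingest:
daily_ingest = 400 * 4 * GIB

# Quadrupling hbase.hregion.max.filesize to 16 GiB would cut region growth
# to ~100/day:
print(regions_per_day(daily_ingest, 16 * GIB))   # -> 100.0

# At 400/day, the ~10,000-region danger zone J-D mentions for 0.20.6 is
# reached in under a month:
print(10_000 / 400)                              # days to 10k regions -> 25.0
```

In other words, with the reported ingest rate the cluster crosses the region count where 0.20.6 gets risky in about 25 days, which matches the "runs great for a week, then degrades" pattern; a larger max region size delays that proportionally.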
Re: mapreduce job failure
Thanks J-D

Using hbase-0.20.6, 49 node cluster. The map reduce job involves a full table scan (region size 4 gig). The job runs great for 1 week, then starts failing after 1 week of data accumulation (about 3000 regions); about 400 regions get created per day. Can you suggest any tunables at the HBase level or HDFS level?

Also, I have one more issue when region servers die. Errors below (any suggestion here is helpful as well):

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase_data_one_110425/.../compaction.dir/249610074/4534752250560182124 File does not exist. Holder DFSClient_-398073404 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1332)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1323)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1251)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

-----Original Message-----
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Fri, May 13, 2011 12:39 am
Subject: Re: mapreduce job failure

All that means is that the task stayed in map() for 10 minutes, blocked on something. If you were scanning an hbase table and didn't get a new row after 1 minute, then the scanner would expire. That's orthogonal though. You need to figure out what you're blocking on; add logging and try to jstack your Child processes, for example.

J-D

On Thu, May 12, 2011 at 7:21 PM, Venkatesh vramanatha...@aol.com wrote:

Hi

Using hbase-0.20.6. The mapreduce job started failing in the map phase (using an hbase table as input for the mapper); it ran fine for a week or so, starting with empty tables.

task tracker log:
Task attempt_201105121141_0002_m_000452_0 failed to report status for 600 seconds. Killing

Region server log:
2011-05-12 18:27:39,919 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -7857209327501974146 lease expired
2011-05-12 18:28:29,716 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2011-05-12 18:28:29,897 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, call next(-7857209327501974146, 1) from .:35202: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

I don't see any error in datanodes. Appreciate any help.

thanks
v
mapreduce job failure
Hi

Using hbase-0.20.6. The mapreduce job started failing in the map phase (using an hbase table as input for the mapper); it ran fine for a week or so, starting with empty tables.

task tracker log:
Task attempt_201105121141_0002_m_000452_0 failed to report status for 600 seconds. Killing

Region server log:
2011-05-12 18:27:39,919 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -7857209327501974146 lease expired
2011-05-12 18:28:29,716 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2011-05-12 18:28:29,897 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, call next(-7857209327501974146, 1) from .:35202: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

I don't see any error in datanodes. Appreciate any help.

thanks
v
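The failure mode in the logs above — "Scanner ... lease expired" followed by UnknownScannerException — matches the server-side scanner-lease bookkeeping that J-D describes later in the thread: the region server renews a scanner's lease on every next() call, and if the client stalls in map() longer than the lease period (hbase.regionserver.lease.period, 60 s by default), the server drops the scanner and the next next() fails. A minimal simulation of that mechanism, with invented class names and simulated timestamps rather than real HBase code:

```python
# Illustrative sketch of scanner-lease expiry (not HBase source code):
# a client that blocks longer than the lease period between next() calls
# finds its scanner gone, which surfaces as UnknownScannerException.

class UnknownScannerError(Exception):
    """Stand-in for org.apache.hadoop.hbase.UnknownScannerException."""

class LeasedScanner:
    def __init__(self, lease_period_s=60):
        self.lease_period_s = lease_period_s
        self.last_renewed = 0.0
        self.expired = False

    def next(self, now_s):
        # The server renews the lease on every next(); too long a gap
        # means the lease was already reclaimed.
        if now_s - self.last_renewed > self.lease_period_s:
            self.expired = True
            raise UnknownScannerError("lease expired")
        self.last_renewed = now_s

scanner = LeasedScanner(lease_period_s=60)
scanner.next(now_s=30)        # fine: renewed within the lease period
try:
    scanner.next(now_s=120)   # 90 s gap: map() blocked too long
except UnknownScannerError:
    print("UnknownScannerException-style failure")
```

This is why J-D's advice is to find what the map task blocks on: the scanner expiry is a symptom of the stall, not the cause.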
Re: java.lang.IndexOutOfBoundsException
Thanks. We have the same exact code that processes 700 million puts per day in 0.20.6 from a tomcat servlet (each thread creates a new HTable, does 1 put, closes). In 0.90.2 we changed just the API whose signature changed (mainly HTable), and it crawls: each/most requests take well over 2 sec, and we can't keep up with even 1/10th of production load. Everything in the cluster is identical, a 20 node cluster.

That is impressive performance from async. Thanks for the tip; I'll give it a try (assuming it would work with 0.90.2).

-----Original Message-----
From: tsuna tsuna...@gmail.com
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 4:30 pm
Subject: Re: java.lang.IndexOutOfBoundsException

On Wed, Apr 20, 2011 at 10:04 AM, Venkatesh vramanatha...@aol.com wrote:
On 0.90.2, do you all think using HTablePool would help with performance problem?

What performance problems are you seeing?

BTW, if you want a thread-safe client that's highly scalable for high-throughput, multi-threaded applications, look at asynchbase: http://github.com/stumbleupon/asynchbase

OpenTSDB uses it and I'm able to push 20 edits per second to 3 RegionServers.

--
Benoit tsuna Sigoure
Software Engineer @ www.StumbleUpon.com
Re: hbase 0.90.2 - incredibly slow response
Thanks St. Ack. Sorry, I had to roll back to 0.20.6 as our system was down way too long, so I don't have the log right now; I'll try to recreate it on a different machine at a later time.

Yes, 700 mil puts per day. The cluster is 20 nodes (20 datanode + region server); besides that, 1 machine with HMaster, 1 name node, 3 zookeepers.

We do new HTable, put, close in a multi-threaded servlet (tomcat based); the HBase configuration object is constructed in init(). The same logic works great in 0.20.6. In 0.90.2 all I changed was retrofitting the HTable constructor, and it crawls. We rolled back to 0.20.6 and it works great again; obviously some major logic change in 0.90.2 perhaps requires a different coding practice for the client API. If you can shed some light that would be helpful.

My hbase config is pretty much default except region size (using 4 gig).

-----Original Message-----
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 2:11 pm
Subject: Re: hbase 0.90.2 - incredibly slow response

Want to paste your configuration up in pastebin? Is that 700 million puts a day? Remind us of your cluster size. Paste some of a regionserver log too. That can be informative.

St.Ack

On Wed, Apr 20, 2011 at 10:41 AM, Venkatesh vramanatha...@aol.com wrote:

shell is no problem; ones/twos. I've tried mass puts from shell. We can't handle our production load (even 1/3 of it); 700 mill per day is full load, the same load we handled with absolutely no issues in 0.20.6. There are several pauses between batches of puts as well.

-----Original Message-----
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 1:30 pm
Subject: Re: hbase 0.90.2 - incredibly slow response

On Tue, Apr 19, 2011 at 11:58 AM, Venkatesh vramanatha...@aol.com wrote:
I was hoping that too. I don't have scripts to generate # requests from shell; I will try that.

Did you try it? Above you seem to say that a simple put of 100 bytes takes 2 seconds where in 0.20.6 it took 10 milliseconds. A put from shell of 100 bytes is easy enough to do:

hbase> put 'YOUR_TABLE', 'SOME_ROW', 'SOME_COLUMN', 'SOME_STRING_OF_100_BYTES'

The shell will print out rough numbers on how long it takes to do the put (from ruby).

I didn't pre-create regions; in 0.20.6 it handled the same load fine. I'll try performance in 0.90.2 by precreating regions. Would sharing a single HBaseConfiguration object for all threads hurt performance?

I'd doubt that this is the issue. It should usually help.

St.Ack
Re: hbase 0.90.2 - incredibly slow response
Thanks St.Ack. Yes, will try the upgrade in a smaller setup with a production-like load, and will investigate/compare.

-----Original Message-----
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Thu, Apr 21, 2011 11:47 pm
Subject: Re: hbase 0.90.2 - incredibly slow response

Sorry to hear you rolled back. I think it's fair to say that going to 0.90.2 usually has things running faster and more efficiently. As to why your experience differs, I'm not sure what it could be; it sounds like something we've not come across before, since we passed you all that we could think of. What's your plan now? Are you going to try the upgrade again?

You might research how your current install is running. Do what Jack Levin did this afternoon, where he enabled rpc DEBUG for a while to get a sense of the type of requests and how long hbase takes to process them (in his case he found that upping the handlers cured a slow scan issue). You could study the 0.20.6 response times and then, when you upgrade to 0.90.2, check what it's showing. That would at least give us a clue as to where to start digging.

St.Ack

On Thu, Apr 21, 2011 at 8:21 PM, Venkatesh vramanatha...@aol.com wrote:

Thanks St. Ack. Sorry, I had to roll back to 0.20.6 as our system was down way too long, so I don't have the log right now; I'll try to recreate it on a different machine at a later time. Yes, 700 mil puts per day. The cluster is 20 nodes (20 datanode + region server); besides that, 1 machine with HMaster, 1 name node, 3 zookeepers. We do new HTable, put, close in a multi-threaded servlet (tomcat based); the HBase configuration object is constructed in init(). The same logic works great in 0.20.6. In 0.90.2 all I changed was retrofitting the HTable constructor, and it crawls. We rolled back to 0.20.6 and it works great again; obviously some major logic change in 0.90.2 perhaps requires a different coding practice for the client API. If you can shed some light that would be helpful. My hbase config is pretty much default except region size (using 4 gig).

-----Original Message-----
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 2:11 pm
Subject: Re: hbase 0.90.2 - incredibly slow response

Want to paste your configuration up in pastebin? Is that 700 million puts a day? Remind us of your cluster size. Paste some of a regionserver log too. That can be informative.

St.Ack

On Wed, Apr 20, 2011 at 10:41 AM, Venkatesh vramanatha...@aol.com wrote:

shell is no problem; ones/twos. I've tried mass puts from shell. We can't handle our production load (even 1/3 of it); 700 mill per day is full load, the same load we handled with absolutely no issues in 0.20.6. There are several pauses between batches of puts as well.

-----Original Message-----
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 1:30 pm
Subject: Re: hbase 0.90.2 - incredibly slow response

On Tue, Apr 19, 2011 at 11:58 AM, Venkatesh vramanatha...@aol.com wrote:
I was hoping that too. I don't have scripts to generate # requests from shell; I will try that.

Did you try it? Above you seem to say that a simple put of 100 bytes takes 2 seconds where in 0.20.6 it took 10 milliseconds. A put from shell of 100 bytes is easy enough to do:

hbase> put 'YOUR_TABLE', 'SOME_ROW', 'SOME_COLUMN', 'SOME_STRING_OF_100_BYTES'

The shell will print out rough numbers on how long it takes to do the put (from ruby).

I didn't pre-create regions; in 0.20.6 it handled the same load fine. I'll try performance in 0.90.2 by precreating regions. Would sharing a single HBaseConfiguration object for all threads hurt performance?

I'd doubt that this is the issue. It should usually help.

St.Ack
Re: java.lang.IndexOutOfBoundsException
Yeah, you and J-D both hit it. I knew it's bad; I was trying anything and everything to solve the incredibly long latency with hbase puts on 0.90.2. I get ok/better response with batch put; this was a quick and dirty way to accumulate puts by sharing the same HTable instance. Thanks for letting me know this exception is due to sharing of HTable. I have to go back to 0.20.6 since our system has been down too long (starting with an empty table).

On 0.90.2, do you all think using HTablePool would help with the performance problem?

thx

-----Original Message-----
From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 12:27 pm
Subject: Re: java.lang.IndexOutOfBoundsException

I think HConnectionManager can catch IndexOutOfBoundsException and translate it into a more user-friendly message, informing the user about thread-safety.

On Wed, Apr 20, 2011 at 9:11 AM, Ted Yu yuzhih...@gmail.com wrote:
I have seen this before. HTable isn't thread-safe. Please describe your usage. Thanks

On Wed, Apr 20, 2011 at 6:03 AM, Venkatesh vramanatha...@aol.com wrote:
Using hbase-0.90.2 (sigh). Any tip? thanks

java.lang.IndexOutOfBoundsException: Index: 4, Size: 3
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.remove(ArrayList.java:387)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1257)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:822)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:678)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:663)
Re: java.lang.IndexOutOfBoundsException
If I use the default I can't share/pass my HBaseConfiguration object; at least I don't see a constructor/setter. That would go against the previous suggestion.

-----Original Message-----
From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 1:08 pm
Subject: Re: java.lang.IndexOutOfBoundsException

When using HTablePool, try not to define maxSize yourself - use the default.

On Wed, Apr 20, 2011 at 10:04 AM, Venkatesh vramanatha...@aol.com wrote:

Yeah, you and J-D both hit it. I knew it's bad; I was trying anything and everything to solve the incredibly long latency with hbase puts on 0.90.2. I get ok/better response with batch put; this was a quick and dirty way to accumulate puts by sharing the same HTable instance. Thanks for letting me know this exception is due to sharing of HTable. I have to go back to 0.20.6 since our system has been down too long (starting with an empty table). On 0.90.2, do you all think using HTablePool would help with the performance problem? thx

-----Original Message-----
From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 12:27 pm
Subject: Re: java.lang.IndexOutOfBoundsException

I think HConnectionManager can catch IndexOutOfBoundsException and translate it into a more user-friendly message, informing the user about thread-safety.

On Wed, Apr 20, 2011 at 9:11 AM, Ted Yu yuzhih...@gmail.com wrote:
I have seen this before. HTable isn't thread-safe. Please describe your usage. Thanks

On Wed, Apr 20, 2011 at 6:03 AM, Venkatesh vramanatha...@aol.com wrote:
Using hbase-0.90.2 (sigh). Any tip? thanks

java.lang.IndexOutOfBoundsException: Index: 4, Size: 3
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.remove(ArrayList.java:387)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1257)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:822)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:678)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:663)
Re: java.lang.IndexOutOfBoundsException
sorry, yeah, that's dumb of me; clearly I'm not thinking, just frustrated with the upgrade. thx

-----Original Message-----
From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 1:24 pm
Subject: Re: java.lang.IndexOutOfBoundsException

I meant specifying Integer.MAX_VALUE as maxSize along with the config.

On Wed, Apr 20, 2011 at 10:17 AM, Venkatesh vramanatha...@aol.com wrote:

If I use the default I can't share/pass my HBaseConfiguration object; at least I don't see a constructor/setter. That would go against the previous suggestion.

-----Original Message-----
From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 1:08 pm
Subject: Re: java.lang.IndexOutOfBoundsException

When using HTablePool, try not to define maxSize yourself - use the default.

On Wed, Apr 20, 2011 at 10:04 AM, Venkatesh vramanatha...@aol.com wrote:

Yeah, you and J-D both hit it. I knew it's bad; I was trying anything and everything to solve the incredibly long latency with hbase puts on 0.90.2. I get ok/better response with batch put; this was a quick and dirty way to accumulate puts by sharing the same HTable instance. Thanks for letting me know this exception is due to sharing of HTable. I have to go back to 0.20.6 since our system has been down too long (starting with an empty table). On 0.90.2, do you all think using HTablePool would help with the performance problem? thx

-----Original Message-----
From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 12:27 pm
Subject: Re: java.lang.IndexOutOfBoundsException

I think HConnectionManager can catch IndexOutOfBoundsException and translate it into a more user-friendly message, informing the user about thread-safety.

On Wed, Apr 20, 2011 at 9:11 AM, Ted Yu yuzhih...@gmail.com wrote:
I have seen this before. HTable isn't thread-safe. Please describe your usage. Thanks

On Wed, Apr 20, 2011 at 6:03 AM, Venkatesh vramanatha...@aol.com wrote:
Using hbase-0.90.2 (sigh). Any tip? thanks

java.lang.IndexOutOfBoundsException: Index: 4, Size: 3
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.remove(ArrayList.java:387)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1257)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:822)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:678)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:663)
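The root cause in this thread is sharing one non-thread-safe HTable across servlet threads; the HTablePool suggestion replaces that with a check-out/check-in discipline so each thread has exclusive use of a handle while it holds it. A language-agnostic sketch of that pooling idea, where `Handle` and `HandlePool` are invented stand-ins for HTable and HTablePool, not the HBase API:

```python
# Illustrative pooling sketch: instead of mutating one shared handle from
# many threads (the cause of the IndexOutOfBoundsException above), each
# thread checks a handle out, uses it exclusively, and returns it.

import queue

class Handle:
    """Stand-in for a non-thread-safe client object such as HTable."""
    def __init__(self, name):
        self.name = name
        self.puts = 0

class HandlePool:
    def __init__(self, factory, size):
        # queue.Queue is thread-safe, so get()/put() need no extra locking.
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def get(self):
        return self._pool.get()     # blocks until a handle is free

    def put_back(self, handle):
        self._pool.put(handle)

pool = HandlePool(lambda: Handle("mytable"), size=4)

h = pool.get()      # exclusive use: no other thread can touch h now
h.puts += 1
pool.put_back(h)    # return it so another thread can reuse it
```

The design point is that the pool bounds the number of live handles while guaranteeing no two threads mutate the same one concurrently, which is exactly the invariant the shared-HTable shortcut violated.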
Re: hbase 0.90.2 - incredibly slow response
shell is no problem; ones/twos. I've tried mass puts from shell. We can't handle our production load (even 1/3 of it); 700 mill per day is full load, the same load we handled with absolutely no issues in 0.20.6. There are several pauses between batches of puts as well.

-----Original Message-----
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Wed, Apr 20, 2011 1:30 pm
Subject: Re: hbase 0.90.2 - incredibly slow response

On Tue, Apr 19, 2011 at 11:58 AM, Venkatesh vramanatha...@aol.com wrote:
I was hoping that too. I don't have scripts to generate # requests from shell; I will try that.

Did you try it? Above you seem to say that a simple put of 100 bytes takes 2 seconds where in 0.20.6 it took 10 milliseconds. A put from shell of 100 bytes is easy enough to do:

hbase> put 'YOUR_TABLE', 'SOME_ROW', 'SOME_COLUMN', 'SOME_STRING_OF_100_BYTES'

The shell will print out rough numbers on how long it takes to do the put (from ruby).

I didn't pre-create regions; in 0.20.6 it handled the same load fine. I'll try performance in 0.90.2 by precreating regions. Would sharing a single HBaseConfiguration object for all threads hurt performance?

I'd doubt that this is the issue. It should usually help.

St.Ack
hbase 0.90.2 - incredibly slow response
Just upgraded to 0.90.2 from 0.20.6. Doing a simple put to a table (100 bytes per put); the only code change was to retrofit the HTable API to work with 0.90.2. Initializing HBaseConfiguration in servlet.init() and reusing that config for the HTable constructor doing the put.

Performance is very slow: 90% of requests are well over 2 sec (with 0.20.6, 90% used to be 10 milli sec). I did run set_meta_memstore_size.rb as per the book.

Any help to debug is appreciated. I also see periodic pauses between hbase puts.

thanks
v
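One mitigation that comes up later in these threads is batching: the poster reports "ok/better response with batch put" versus one network round-trip per request. The sketch below illustrates the client-side write-buffer idea generically; in the 0.90 Java client the corresponding knobs are HTable.setAutoFlush(false) and the hbase.client.write.buffer size, but the `BufferedWriter` class here is an invented illustration, not the HBase API.

```python
# Illustrative write-buffer sketch: accumulate edits and flush them in
# batches instead of paying a round-trip per put.

class BufferedWriter:
    def __init__(self, sink, buffer_size=100):
        self.sink = sink              # callable that receives a batch (list)
        self.buffer_size = buffer_size
        self._buf = []
        self.flushes = 0

    def put(self, edit):
        self._buf.append(edit)
        if len(self._buf) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self._buf:
            self.sink(self._buf)
            self._buf = []
            self.flushes += 1

batches = []
w = BufferedWriter(batches.append, buffer_size=3)
for i in range(7):
    w.put(i)
w.flush()          # flush the partial tail batch
print(batches)     # -> [[0, 1, 2], [3, 4, 5], [6]]
```

The trade-off is the usual one: buffered edits are invisible (and lost on a client crash) until flushed, so buffering suits bulk ingest more than per-request durability.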
Re: hbase 0.90.2 - incredibly slow response
I was hoping that too. I don't have scripts to generate # requests from shell; I will try that. I didn't pre-create regions; in 0.20.6 it handled the same load fine. I'll try performance in 0.90.2 by precreating regions. Would sharing a single HBaseConfiguration object for all threads hurt performance?

frustrating. thanks for your help

-----Original Message-----
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Tue, Apr 19, 2011 1:40 pm
Subject: Re: hbase 0.90.2 - incredibly slow response

0.90.2 should be faster. Running the same query from the shell, does it give you the same lag?

St.Ack

On Tue, Apr 19, 2011 at 10:35 AM, Venkatesh vramanatha...@aol.com wrote:

Just upgraded to 0.90.2 from 0.20.6. Doing a simple put to a table (100 bytes per put); the only code change was to retrofit the HTable API to work with 0.90.2. Initializing HBaseConfiguration in servlet.init() and reusing that config for the HTable constructor doing the put. Performance is very slow: 90% of requests are well over 2 sec (with 0.20.6, 90% used to be 10 milli sec). I did run set_meta_memstore_size.rb as per the book. Any help to debug is appreciated; I also see periodic pauses between hbase puts.

thanks
v
Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
Thanks J-D I made sure to pass conf objects around in places where I wasn't.. will give it a try -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, Apr 12, 2011 6:40 pm Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job Yes there are a few places like that. Also, when you create new HTables, you should also close their connections (this is not done in htable.close). See HTable's javadoc, which says: Instances of HTable passed the same Configuration instance will share connections to servers out on the cluster and to the zookeeper ensemble as well as caches of region locations. This is usually a *good* thing. This happens because they will all share the same underlying HConnection instance. See HConnectionManager for more on how this mechanism works. and it points to HCM, which has more information: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html J-D On Tue, Apr 12, 2011 at 3:09 PM, Ruben Quintero rfq_...@yahoo.com wrote: I'm running into the same issue, but did some poking around and it seems that Zookeeper connections are being left open by an HBase internal. Basically, I'm running a mapreduce job within another program, and noticed in the logs that every time the job is run a connection is opened, but I never see it closed again. The connection is opened within the job.submit(). I looked closer and checked the jstack after running it for just under an hour, and sure enough there are a ton of Zookeeper threads just sitting there. Here's a pastebin link: http://pastebin.com/MccEuvrc I'm running 0.90.0 right now.
- Ruben From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, April 12, 2011 4:23:05 PM Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job It's more in the vein of https://issues.apache.org/jira/browse/HBASE-3755 and https://issues.apache.org/jira/browse/HBASE-3771 Basically 0.90 has a regression in the handling of zookeeper connections that means you have to be super careful not to have more than 30 per machine (each new Configuration is one new ZK connection). Upping your zookeeper max connection config should get rid of your issues since you only get it occasionally. J-D On Tue, Apr 12, 2011 at 7:59 AM, Venkatesh vramanatha...@aol.com wrote: I get this occasionally..(not all the time)..Upgrading from 0.20.6 to 0.90.2 Is this issue same as this JIRA https://issues.apache.org/jira/browse/HBASE-3578 I'm using HBaseConfiguration.create() and setting that in the job thx v
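The javadoc advice J-D quotes (share one Configuration across HTables, and release the shared connection explicitly in a long-lived JVM) can be sketched roughly like this against the 0.90 client API; the table and row names are hypothetical, and this is an illustration of the pattern, not code from the thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConnectionSketch {
    public static void main(String[] args) throws Exception {
        // One Configuration instance, reused everywhere: all HTables built
        // from it share a single HConnection (and one ZK connection).
        Configuration conf = HBaseConfiguration.create();

        HTable table = new HTable(conf, "my_table");   // hypothetical table name
        try {
            table.get(new Get(Bytes.toBytes("some_row")));
        } finally {
            table.close();  // flushes buffers, but does NOT close the HConnection
        }

        // In a long-lived JVM, release the shared connection explicitly
        // once you are completely done with this Configuration.
        HConnectionManager.deleteConnection(conf, true);
    }
}
```

The key point of the sketch is that the same `conf` object used to build the tables is the one handed to `deleteConnection`, since the connection cache is keyed by Configuration instance.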
Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
Ruben: Yes..I've the exact same issue now.. I'm also kicking off from another jvm that runs forever.. I don't have an alternate solution..either modify hbase code (or) modify my code to kick off as a standalone jvm (or) hopefully 0.90.3 releases soon :) J-D/St.Ack may have some suggestions V -Original Message- From: Ruben Quintero rfq_...@yahoo.com To: user@hbase.apache.org Sent: Wed, Apr 13, 2011 2:39 pm Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job The problem I'm having is in getting the conf that is used to init the table within TableInputFormat. That's the one that's leaving open ZK connections for me. Following the code through, TableInputFormat initializes its HTable with new Configuration(new JobConf(conf)), where conf is the config I pass in via job initiation. I don't see a way of getting the initialized TableInputFormat in order to then get its table and its config to be able to properly close that connection. Cloned configs don't appear to produce similar hashes, either. The only other option I'm left with is closing all connections, but that disrupts things across the board. For MapReduce jobs run in their own JVM, this wouldn't be much of an issue, as the connection would just be closed on completion, but in my case (our code triggers the jobs internally), they simply pile up until the ConnectionLoss hits due to too many ZK connections. Am I missing a way to get that buried table's config, or another way to kill the orphaned connections? - Ruben
Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
Will do..I'll set it to 2000 as per the JIRA.. Do we need a periodic bounce? ..because if this error comes up, the only way I can get the mapreduce to work is a bounce. -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Wed, Apr 13, 2011 3:22 pm Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job Like I said, it's a zookeeper configuration that you can change. If hbase is managing your zookeeper then set hbase.zookeeper.property.maxClientCnxns to something higher than 30 and restart the zk server (can be done while hbase is running). J-D
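For reference, the setting J-D describes is a plain property in hbase-site.xml when HBase manages ZooKeeper (2000 here matches the value Venkatesh mentions in the thread; pick whatever suits your client count), followed by a restart of the ZK server:

```xml
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>2000</value>
</property>
```

As noted in the thread, this only raises the ceiling; it does not fix a client that leaks connections, so a long-running JVM will still eventually hit the new limit.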
Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
deleteAllConnections works well for my case..I can live with this, but not with connection leaks thanks for the idea Venkatesh -Original Message- From: Ruben Quintero rfq_...@yahoo.com To: user@hbase.apache.org Sent: Wed, Apr 13, 2011 4:25 pm Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job Venkatesh, I guess the two quick and dirty solutions are: - Call deleteAllConnections(bool) at the end of your MapReduce jobs, or periodically. If you have no other tables or pools, etc. open, then no problem. If you do, they'll start throwing IOExceptions, but you can re-instantiate them with a new config and then continue as usual. (You do have to change the config or it'll simply grab the closed, cached one from the HCM). - As J-D said, subclass TIF and basically copy the old setConf, except don't clone the conf that gets sent to the table. Each has a downside and neither is ideal, but if you either don't modify the config in your job or don't have any other important hbase connections, then you can use the appropriate one. Thanks for the assistance, J-D. It's great that these forums are active and helpful. - Ruben From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Wed, April 13, 2011 3:50:42 PM Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job Yeah for a JVM running forever it won't work. If you know for a fact that the configuration passed to TIF won't be changed then you can subclass it and override setConf to not clone the conf. J-D On Wed, Apr 13, 2011 at 12:45 PM, Ruben Quintero rfq_...@yahoo.com wrote: The problem is the connections are never closed... so they just keep piling up until it hits the max. My max is at 400 right now, so after 14-15 hours of running, it gets stuck in an endless connection retry. I saw that the HConnectionManager will kick older HConnections out, but the problem is that their ZooKeeper threads continue on. Those need to be explicitly closed.
Again, this is only an issue inside JVMs set to run forever, like Venkatesh said, because that's when the orphaned ZK connections will have a chance to build up to whatever your maximum is. Setting that higher and higher is just prolonging uptime before the eventual crash. It's essentially a memory (connection) leak within TableInputFormat, since there is no way that I can see to properly access and close those spawned connections. One question for you, J-D: Inside of TableInputFormat.setConf, does the Configuration need to be cloned? (i.e. setHTable(new HTable(new Configuration(conf), tableName)); ). I'm guessing this is to prevent changes within the job from affecting the table and vice-versa...but if it weren't cloned, then you could use the job configuration (job.getConfiguration()) to close the connection. Other quick fixes that I can think of, none of which are very pretty: 1 - Just call deleteAllConnections(bool), and have any other processes using HConnections recover from that. 2 - Make the static HBASE_INSTANCES map accessible (public), then you could iterate through open connections and try to match configs. Venkatesh - unless you have other processes in your JVM accessing HBase (I have one), #1 might be the best bet. - Ruben
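J-D's suggested workaround (subclass TableInputFormat and override setConf so the job's Configuration is not cloned) might look roughly like the sketch below against the 0.90 API. This is an illustration only: scan customization is omitted, the class name is made up, and it assumes the job's Configuration is never modified after the table is created:

```java
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;

// Sketch: a minimal TableInputFormat replacement that keeps the caller's
// Configuration instead of cloning it, so the ZK connection its HTable
// opens can later be released with
// HConnectionManager.deleteConnection(jobConf, true).
public class NonCloningTableInputFormat extends TableInputFormatBase
        implements Configurable {
    public static final String INPUT_TABLE = "hbase.mapreduce.inputtable";
    private Configuration conf;

    @Override
    public Configuration getConf() { return conf; }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;   // note: no new Configuration(conf) here
        try {
            setHTable(new HTable(conf, conf.get(INPUT_TABLE)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        // Full-table scan by default; add the scan-attribute parsing from
        // TableInputFormat.setConf if you need scan customization.
        setScan(new Scan());
    }
}
```

The tradeoff Ruben describes still applies: because the table now holds the job's own Configuration, any later mutation of that conf could affect the table's connection lookup, which is presumably why stock TableInputFormat clones it.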
hbase -0.90.x upgrade - zookeeper exception in mapreduce job
I get this occasionally..(not all the time)..Upgrading from 0.20.6 to 0.90.2 Is this issue same as this JIRA https://issues.apache.org/jira/browse/HBASE-3578 I'm using HBaseConfiguration.create() and setting that in the job thx v
2011-04-12 02:13:06,870 ERROR Timer-0 org.apache.hadoop.hbase.mapreduce.TableInputFormat - org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:294)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
at org.apache.hadoop.hbase.client.HTable.init(HTable.java:167)
at org.apache.hadoop.hbase.client.HTable.init(HTable.java:145)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
Re: zookeeper warning with 0.90.1 hbase
Thanks St.Ack Yes..I see these when the map-reduce job is complete..but not always..I'll ignore them thanks..Getting close to the 0.90.1 upgrade -Original Message- From: Stack st...@duboce.net To: user@hbase.apache.org Cc: Venkatesh vramanatha...@aol.com Sent: Thu, Apr 7, 2011 11:55 pm Subject: Re: zookeeper warning with 0.90.1 hbase They happen at the end of a map task or on shutdown? If so, yes, ignore (or, if you want to have a nice clean shutdown, figure out how Session 0x0 was set up -- was it you -- and call the appropriate close in time). St.Ack
zookeeper warning with 0.90.1 hbase
I see a lot of these warnings..everything seems to be working otherwise..Is this something that can be ignored?
2011-04-07 21:29:15,032 WARN Timer-0-SendThread(..:2181) org.apache.zookeeper.ClientCnxn - Session 0x0 for server :2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:200)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:858)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)
2011-04-07 21:29:15,032 DEBUG Timer-0-SendThread(..:2181) org.apache.zookeeper.ClientCnxn - Ignoring exception during shutdown input java.net.SocketException: Transport endpoint is not connected
at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1205)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:11
Re: row_counter map reduce job 0.90.1
Sorry about this..It was indeed an environment issue..my core-site.xml was pointing to the wrong hadoop thanks for the tips
Re: HBase wiki updated
A big thank you from an HBase user (sorry for the spam..but it deserves thanks) -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Sat, Apr 2, 2011 3:51 pm Subject: Re: HBase wiki updated 2 Internets for you Doug, that's awesome! Thx J-D On Apr 2, 2011 11:59 AM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there everybody- Just thought I'd let everybody know about this... Stack and I have been working on updating the HBase book and porting portions of the very-out-of-date HBase wiki to the HBase book. These two pages... http://wiki.apache.org/hadoop/Hbase/DesignOverview http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture ... now just have a 1-liner and a link about looking in the HBase book ( http://hbase.apache.org/book.html). Doug The Documenter Meil
row_counter map reduce job 0.90.1
I'm able to run this job from the hadoop machine (where the job tracker/task tracker also runs) /hadoop jar /home/maryama/hbase-0.90.1/hbase-0.90.1.jar rowcounter usertable But I'm not able to run the same job from a) the hbase client machine (full hbase hadoop installed) b) the hbase server machines (ditto) I get File /home/.../hdfs/tmp/mapred/system/job_201103311630_0024/libjars/hadoop-0.20.2-core.jar does not exist. Any idea how this jar file gets packaged, or where it is being looked for? thanks v
Re: row_counter map reduce job 0.90.1
Yeah.. I tried that as well as what Ted suggested..It can't find the hadoop jar Hadoop map reduce jobs work fine ..it's just hbase map reduce jobs that fail with this error tx -Original Message- From: Stack st...@duboce.net To: user@hbase.apache.org Sent: Fri, Apr 1, 2011 12:39 pm Subject: Re: row_counter map reduce job 0.90.1 Does where you are running from have a build/classes dir and a hadoop-0.20.2-core.jar at top level? If so, try cleaning out the build/classes. Also, you could try something like this: HADOOP_CLASSPATH=/home/stack/hbase-0.90.2-SNAPSHOT/hbase-0.90.2-SNAPSHOT-tests.jar:/home/stack/hbase-0.90.2-SNAPSHOT/hbase-0.90.2-SNAPSHOT.jar:`/home/stack/hbase-0.90.2-SNAPSHOT/bin/hbase classpath` ./bin/hadoop jar /home/stack/hbase-0.90.2-SNAPSHOT/hbase-0.90.2-SNAPSHOT.jar rowcounter usertable ... only make sure the hadoop jar is in HADOOP_CLASSPATH. But you shouldn't have to do the latter at least. Compare where it works to where it doesn't. Something is different. St.Ack On Fri, Apr 1, 2011 at 9:26 AM, Venkatesh vramanatha...@aol.com wrote: Definitely yes..It's all referenced in the -classpath option of the jvm of tasktracker/jobtracker/datanode/namenode.. the file does exist in the cluster.. But the error I get is on the client: File /home/hdfs/tmp/mapred/system/job_201103311630_0027/libjars/hadoop-0.20.2-core.jar does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:509)
at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:629)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
So, in theory it shouldn't be expected on the client ..correct? This is the only thing stopping me from moving to 0.90.1 -Original Message- From: Stack st...@duboce.net To: user@hbase.apache.org Sent: Fri, Apr 1, 2011 12:19 pm Subject: Re: row_counter map reduce job 0.90.1 On Fri, Apr 1, 2011 at 9:06 AM, Venkatesh vramanatha...@aol.com wrote: I'm able to run this job from the hadoop machine (where the job task tracker also runs) /hadoop jar /home/maryama/hbase-0.90.1/hbase-0.90.1.jar rowcounter usertable But, I'm not able to run the same job from a) hbase client machine (full hbase hadoop installed) b) hbase server machines (ditto) Get File /home/.../hdfs/tmp/mapred/system/job_201103311630_0024/libjars/hadoop-0.20.2-core.jar does not exist. Is that jar present on the cluster? St.Ack
Re: hole in META
Yeah...excise_regions seems to work but plug_hole doesn't plug the hole..it thinks the region still exists in META Maybe the issue is with excise_regions.. it doesn't cleanly remove it.. I also tried /hbase org.apache.hadoop.hbase.util.Merge tbl_name region region That doesn't work for me in 0.20.6.. What are the region parameters? I tried the encoded name and it didn't like it..I tried a name of the form tbl_name,st_key,, That didn't work either.. thanks -Original Message- From: Stack st...@duboce.net To: user@hbase.apache.org Cc: Venkatesh vramanatha...@aol.com Sent: Thu, Mar 31, 2011 1:36 am Subject: Re: hole in META Be careful with those Venkatesh. I've not looked at them in a while. They may work for you since you are on 0.20.x but please read them carefully first before running and make sure they make sense for your context. St.Ack
Re: hole in META
Yes..st.ack..overlapping.. one of them has no data.. there are too many of them about 800 or so.. there are some with holes too.. -Original Message- From: Stack st...@duboce.net To: user@hbase.apache.org Sent: Wed, Mar 30, 2011 1:38 am Subject: Re: hole in META What is that? Overlapping regions? Can you try merging them with merge tool? Else, study whats in hdfs. One may have nothing in it (check sizes). It might just be reference files only. If so, lets go from there. And I describe how to merge. St.Ack On Tue, Mar 29, 2011 at 9:25 PM, Venkatesh vramanatha...@aol.com wrote: I've regions like this... add_table.rb is unable to fix this... Is there anything else I could do to fix holes? startkey end-key yv018381 yv018381 yv018381 . yv018381 . -Original Message- From: Stack st...@duboce.net To: user@hbase.apache.org Sent: Tue, Mar 29, 2011 12:55 pm Subject: Re: hole in META On Tue, Mar 29, 2011 at 9:09 AM, Venkatesh vramanatha...@aol.com wrote: I ran into missing jar with hadoop jar file when running a map reduce..which i could n't fix it..That is the only known issue with upgrade If I can fix that, i'll upgrade Tell us more. Whats the complaint? Missing Guava? Commons-logging? BTW, is it better to fix existing holes using add_table.rb before the upgrade? (or) upgrade takes care missing holes? Yes. Make sure all is wholesome before upgrade. Are you able to do this? Good stuff V, St.Ack
Re: hole in META
Thanks Lukas..I'll give it a try -Original Message- From: Lukas mr.bobu...@gmail.com To: user@hbase.apache.org Sent: Wed, Mar 30, 2011 4:19 am Subject: Re: hole in META Sorry for any inconvenience. This was in reply to http://mail-archives.apache.org/mod_mbox/hbase-user/201103.mbox/%3c8cdbca99c33-1c78-9...@webmail-m083.sysops.aol.com%3e On Wed, Mar 30, 2011 at 10:13 AM, Lukas mr.bobu...@gmail.com wrote: Hi there, It seems that I had the same problem. AFAIK fix_table and hbck currently won't be able to fix this, so I wrote myself two small tools. The first one detects such loops in the meta table: https://gist.github.com/894031#file_h_base_region_loops.java If you specify '--fix', the loopy/duplicated regions are moved to a directory you specify and the meta is updated. The second one (https://gist.github.com/894031#file_add_records_from_region.java) takes one of the moved regions as input and adds its content to the specified table, if there isn't already an entry with the same family:qualifier (this fitted my needs, as I only have one entry per family:qualifier). DISCLAIMER: Those tools were programmed rather quickly, so please make sure that they serve your needs. If you have fixed your table, I would migrate as quickly as possible to 0.90.x! Best, Lukas
Re: hole in META
St.Ack, I came across your script https://github.com/saintstack/hbase_bin_scripts/blob/master/README which I find very, very useful. I've been running them one at a time, 500 overlaps so far (by checking which one to remove from HDFS). Still 500 or so to go. Slow, but it works.

Lukas - I didn't run yours since your code depends on the 0.90.x lib; didn't want to risk running it on 0.20.6.

thanks v

-Original Message-
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Wed, Mar 30, 2011 12:20 pm
Subject: Re: hole in META

Can you run a rowcount against this table or does it not complete? St.Ack

On Wed, Mar 30, 2011 at 4:13 AM, Venkatesh vramanatha...@aol.com wrote:

Yes, St.Ack, overlapping. One of them has no data. There are too many of them, about 800 or so. There are some with holes too.

-Original Message-
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Wed, Mar 30, 2011 1:38 am
Subject: Re: hole in META

What is that? Overlapping regions? Can you try merging them with the merge tool? Else, study what's in HDFS. One may have nothing in it (check sizes). It might just be reference files only. If so, let's go from there and I'll describe how to merge. St.Ack

On Tue, Mar 29, 2011 at 9:25 PM, Venkatesh vramanatha...@aol.com wrote:

I've regions like this... add_table.rb is unable to fix this... Is there anything else I could do to fix holes? startkey end-key yv018381 yv018381 yv018381 . yv018381 .

-Original Message-
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Tue, Mar 29, 2011 12:55 pm
Subject: Re: hole in META

On Tue, Mar 29, 2011 at 9:09 AM, Venkatesh vramanatha...@aol.com wrote: I ran into a missing jar with the hadoop jar file when running a map reduce, which I couldn't fix. That is the only known issue with the upgrade. If I can fix that, I'll upgrade.

Tell us more. What's the complaint? Missing Guava? Commons-logging?

BTW, is it better to fix existing holes using add_table.rb before the upgrade? (or) does the upgrade take care of missing holes?

Yes. Make sure all is wholesome before the upgrade. Are you able to do this? Good stuff V, St.Ack
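The offline merge tool Stack refers to is invoked roughly like this (a sketch only: HBase must be shut down first, the table and region names below are placeholders, and the exact region names must be copied from .META.):

```
$ bin/hbase org.apache.hadoop.hbase.util.Merge \
    mytable \
    "mytable,startkeyA,1301234567890" \
    "mytable,startkeyB,1301234567891"
```

The tool rewrites .META. so the merged region spans the key range of both inputs, which is why the cluster has to be down while it runs.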
hole in META
Hi, Using hbase-0.20.6. This has happened quite often. Is this a known issue in 0.20.6 that we wouldn't see in 0.90.1, or would see less of? I attempted to fix/avoid this in earlier instances by truncating the table and running add_table.rb first. What is the best way to fix this in 0.20.6? Now it's there in more tables, and I cannot afford to lose the data. Running add_table.rb increases the # of regions (and we are already way over the limit, 25K+). thanks v
Re: hole in META
Thanks, St.Ack. Yeah, I'm eager to upgrade. I had to make one small change to the HBase client API to use the new version. I ran into a missing jar with the hadoop jar file when running a map reduce, which I couldn't fix. That is the only known issue with the upgrade. If I can fix that, I'll upgrade. BTW, is it better to fix existing holes using add_table.rb before the upgrade? (or) does the upgrade take care of missing holes?

-Original Message-
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Tue, Mar 29, 2011 11:59 am
Subject: Re: hole in META

On Tue, Mar 29, 2011 at 7:38 AM, Venkatesh vramanatha...@aol.com wrote: What is the best way to fix this in 0.20.6?

Move to 0.90.1 to avoid holes in .META. and to avoid losing data. Let us know if we can help you with the upgrade. St.Ack
Export/Import and # of regions
Hi, If I export an existing table using the Export MR job, truncate the table, increase the region size, and then do an Import, will it make use of the new region size? thanks V
Re: Export/Import and # of regions
Thanks J-D. Using 0.20.6, I don't see that method with pre-split in the 0.20.6 API spec. 1) Will the data still be accessible if I Import the data to a new table? (purely for backup reasons) I tried it on a small data set and it worked; before I do export/Import on a large table, I want to make sure. 2) Data exported using 0.20.6 - can it be imported using 0.90.1? (I could use pre-split in this case.)

-Original Message-
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Tue, Mar 29, 2011 5:38 pm
Subject: Re: Export/Import and # of regions

Pre-splitting was discussed a few times on the mailing list today, and a few times in the past weeks, for example: http://search-hadoop.com/m/XB9Vr1gQc66 Import works on a pre-existing table, so it won't recreate it. Also, it doesn't know how your key space is constructed, so it cannot guess the start/stop row keys for you. J-D

On Tue, Mar 29, 2011 at 2:33 PM, Venkatesh vramanatha...@aol.com wrote:

Thanks J-D. We have way too much data; it won't fit in 1 region. Is Import smart enough to create the required # of regions? Could you please elaborate on pre-split table creation? Steps? The reason I'm doing this exercise is to reduce the # of regions in our cluster (in the absence of additional hardware; 25K regions on 20 nodes).

-Original Message-
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Tue, Mar 29, 2011 5:29 pm
Subject: Re: Export/Import and # of regions

Yes, but you'll start with a single region; instead of truncating, you probably want to create a pre-split table. J-D

On Tue, Mar 29, 2011 at 2:27 PM, Venkatesh vramanatha...@aol.com wrote: Hi, If I export an existing table using the Export MR job, truncate the table, increase the region size, and then do an Import, will it make use of the new region size? thanks V
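As a concrete illustration of the pre-split creation J-D describes, here is a sketch against the 0.90.x Java client API. It requires a running cluster, and the table name, family, and split points are made up; real split points would have to match how your own keys are distributed:

```java
// Sketch: creating a pre-split table with the 0.90.x client API.
// Names and split points below are illustrative only.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplit {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    HTableDescriptor desc = new HTableDescriptor("events_presplit");
    desc.addFamily(new HColumnDescriptor("f"));
    // Split points partition the key space up front, so an Import
    // writes into many regions instead of one hot region that splits.
    byte[][] splits = new byte[][] {
      Bytes.toBytes("2"), Bytes.toBytes("4"),
      Bytes.toBytes("6"), Bytes.toBytes("8")
    };
    admin.createTable(desc, splits); // table starts with splits.length + 1 regions
  }
}
```

Since Import fills a pre-existing table, creating the target this way before running the Import job avoids starting from a single region.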
Re: hole in META
I've regions like this... add_table.rb is unable to fix this... Is there anything else I could do to fix holes? startkey end-key yv018381 yv018381 yv018381 . yv018381 .

-Original Message-
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Tue, Mar 29, 2011 12:55 pm
Subject: Re: hole in META

On Tue, Mar 29, 2011 at 9:09 AM, Venkatesh vramanatha...@aol.com wrote: I ran into a missing jar with the hadoop jar file when running a map reduce, which I couldn't fix. That is the only known issue with the upgrade. If I can fix that, I'll upgrade.

Tell us more. What's the complaint? Missing Guava? Commons-logging?

BTW, is it better to fix existing holes using add_table.rb before the upgrade? (or) does the upgrade take care of missing holes?

Yes. Make sure all is wholesome before the upgrade. Are you able to do this? Good stuff V, St.Ack
java.io.FileNotFoundException:
Does anyone know how to get around this? Trying to run a mapreduce job in a cluster. The one change was hbase upgraded to 0.90.1 (from 0.20.6); no code change.

java.io.FileNotFoundException: File /data/servers/datastore/mapred/mapred/system/job_201103151601_0363/libjars/zookeeper-3.2.2.jar does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:509)
at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:629)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at com.aol.mail.antispam.Profiler.UserProfileJob.run(UserProfileJob.java:1916)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java
Re: java.io.FileNotFoundException:
Thanks St.Ack, I'm blind. Got past that. Now I get it for hadoop-0.20.2-core.jar. I've removed *append*.jar all over the place and replaced it with hadoop-0.20.2-core.jar. 0.90.1 will work with hadoop-0.20.2-core, right? Regular gets/puts work, but not the mapreduce job.

java.io.FileNotFoundException: File /data/servers/datastore/mapred/mapred/system/job_201103161652_0004/libjars/hadoop-0.20.2-core.jar does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:509)
at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:633)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)

-Original Message-
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Wed, Mar 16, 2011 1:39 pm
Subject: Re: java.io.FileNotFoundException:

0.90.1 ships with zookeeper-3.3.2, not with 3.2.2. St.Ack

On Wed, Mar 16, 2011 at 8:05 AM, Venkatesh vramanatha...@aol.com wrote: Does anyone know how to get around this? Trying to run a mapreduce job in a cluster. The one change was hbase upgraded to 0.90.1 (from 0.20.6); no code change. java.io.FileNotFoundException: File /data/servers/datastore/mapred/mapred/system/job_201103151601_0363/libjars/zookeeper-3.2.2.jar does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:509) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:629) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761) at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) at com.aol.mail.antispam.Profiler.UserProfileJob.run(UserProfileJob.java:1916) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java
Re: java.io.FileNotFoundException:
yeah, that's why I feel very stupid. I'm pretty sure it exists on my cluster, but I still get the error. I'll try on a fresh day.

-Original Message-
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Wed, Mar 16, 2011 7:44 pm
Subject: Re: java.io.FileNotFoundException:

The below is a pretty basic error. Reference the jar that is actually present on your cluster. St.Ack

On Wed, Mar 16, 2011 at 3:50 PM, Venkatesh vramanatha...@aol.com wrote:

yeah, I was aware of that. I removed it and tried with hadoop-0.20.2-core.jar, as I wasn't ready to upgrade hadoop. I tried this time with the *append*.jar; now it's complaining FileNotFound for the append jar:

File /data/servers/datastore/mapred/mapred/system/job_201103161750_0030/libjars/hadoop-core-0.20-append-r1056497.jar does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:509)
at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:633)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)

-Original Message-
From: Harsh J qwertyman...@gmail.com
To: user@hbase.apache.org
Sent: Wed, Mar 16, 2011 5:32 pm
Subject: Re: java.io.FileNotFoundException:

0.90.1 ships with a hadoop-0.20-append jar (not vanilla hadoop 0.20.2). Look up its name in the lib/ directory of the distribution (comes with a rev #) :)

On Thu, Mar 17, 2011 at 2:33 AM, Venkatesh vramanatha...@aol.com wrote: Thanks St.Ack, I'm blind. Got past that. Now I get it for hadoop-0.20.2-core.jar. I've removed *append*.jar all over the place and replaced it with hadoop-0.20.2-core.jar. 0.90.1 will work with hadoop-0.20.2-core, right?
Regular gets/puts work..but not the mapreduce job java.io.FileNotFoundException: File /data/servers/datastore/mapred/mapred/system/job_201103161652_0004/libjars/hadoop-0.20.2-core.jar does not exist. at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:509) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:633) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761) at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) -Original Message- From: Stack st...@duboce.net To: user@hbase.apache.org Sent: Wed, Mar 16, 2011 1:39 pm Subject: Re: java.io.FileNotFoundException: 0.90.1 ships with zookeeper-3.3.2, not with 3.2.2. St.Ack On Wed, Mar 16, 2011 at 8:05 AM, Venkatesh vramanatha...@aol.com wrote: Does anyone how to get around this? Trying to run a mapreduce job in a cluster..The one change was hbase upgraded to 0.90.1 (from 0.20.6)..No code change java.io.FileNotFoundException: File /data/servers/datastore/mapred/mapred/system/job_201103151601_0363/libjars/zookeeper-3.2.2.jar does not exist. 
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:509) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:629) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761) at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) at com.aol.mail.antispam.Profiler.UserProfileJob.run(UserProfileJob.java:1916) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java -- Harsh J http://harshj.com
hbase 0.90.1 upgrade issue - mapreduce job
Hi, When I upgraded to 0.90.1, the mapreduce job fails with an exception: system/job_201103151601_0121/libjars/hbase-0.90.1.jar does not exist. I have the jar file in the classpath (hadoop-env.sh). Any ideas? thanks
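The libjars FileNotFoundExceptions in these threads generally mean the submitting client referenced a jar path that isn't actually present; one common remedy is to put the exact jars that ship with your HBase release on HADOOP_CLASSPATH. A sketch, with illustrative paths and versions only:

```shell
# In hadoop-env.sh, or the shell that submits the job; adjust the
# paths to wherever your HBase release actually lives.
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/lib/hbase/hbase-0.90.1.jar:/usr/lib/hbase/lib/zookeeper-3.3.2.jar"

# If your HBase version ships the 'classpath' shell helper, it avoids
# listing jars by hand:
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$(hbase classpath)"
```

Either way, the rule is the one Stack states above: reference the jar that is actually present on the cluster, matching the names (and revision suffixes) in the distribution's lib/ directory.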
Re: region servers shutdown
Thanks J-D. I was quite happy in the first 3 months; in the last month or so, lots of instabilities.

i) It's good to know that 0.90.x fixes lots of instabilities; will consider upgrading. It is not listed as a stable production release, hence the hesitation :)
ii) Our cluster is 20 nodes (20 data nodes + 20 region servers, a data/region server on every box); besides that, 1 name node, 1 hmaster, and 3 zookeepers, all on different physical machines.
iii) Hardware: Pentium-class, 36 GB memory on each node.
iv) Processing about 600 million events per day (real-time puts), 200 bytes per put. Each event is a row in an hbase table, so 600 million records, 1 column family, 6-10 columns.
v) About 50,000 regions so far.
vi) We run a map reduce job every night that takes the 600 million records and updates/creates aggregate data (1 get per record); the aggregate data translates to 25 million x 3 puts.
vii) Region splits occur quite frequently, every 5 minutes or so.

How big are the tables? - Haven't run a count on the tables lately. The events table we keep for 90 days, 600 million records per day; we process each day's data. 3 additional tables for aggregates.
How many region servers - 20 - and how many regions do they serve? - 50,000 regions; new regions get created every day (don't have that #).
Are you using lots of families per table? - No, just 1 family in all tables; # of columns 20.
Are you using LZO compression? - No.

thanks again for your help

-Original Message-
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Thu, Feb 10, 2011 2:40 pm
Subject: Re: region servers shutdown

I see you are running on a very old version of hbase, and under that you have a version of hadoop that doesn't support appends, so you are bound to have data loss on machine failure and when a region server needs to abort like it just did. I suggest you upgrade to 0.90.0, or even consider the release candidate of 0.90.1, which can be found here http://people.apache.org/~todd/hbase-0.90.1.rc0/; this is going to help solve a lot of stability problems. Also, if you were able to reach 4097 xceivers on your datanodes, it means that you are keeping a LOT of files opened. This suggests that you either have a very small cluster or way too many files. Can you tell us more about your cluster? How big are the tables? How many region servers and how many regions do they serve? Are you using lots of families per table? Are you using LZO compression? Thanks for helping us help you :) J-D

On Thu, Feb 10, 2011 at 11:32 AM, Venkatesh vramanatha...@aol.com wrote:

Thanks J-D. Can't believe I missed that. I have had it before; I did look for it (not hard/carefully enough, I guess). This time the default was in effect; that's the reason: ...xceiverCount 4097 exceeds the limit of concurrent xcievers 4096... Thinking of doubling this. I've had so many issues in the last month - holes in meta, data node hung, etc. - and this time it was en masse.

-Original Message-
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Thu, Feb 10, 2011 1:56 pm
Subject: Re: region servers shutdown

The first thing to do would be to look at the datanode logs at the time of the outage. Very often it's caused by either ulimit or xcievers that weren't properly configured; check out http://hbase.apache.org/notsoquick.html#ulimit J-D

On Thu, Feb 10, 2011 at 10:42 AM, Venkatesh vramanatha...@aol.com wrote:

Hi, I've had this before, but not on 70% of the cluster; region servers all dying. Any insight is helpful. Using hbase-0.20.6, hadoop-0.20.2. I don't see any error in the datanode or the namenode. many thanks. Here are the relevant log entries, in the master:

Got while writing region XXlog java.io.IOException: Bad connect ack with firstBadLink YYY
2011-02-10 01:31:26,052 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Waiting for hlog writers to terminate, iteration #9
2011-02-10 01:31:28,974 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
2011-02-10 01:31:28,975 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1053173551314261780_21097871 bad datanode[2] nodes == null
2011-02-10 01:31:28,975 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file /hbase_data//1560386868/oldlogfile.log - Aborting...

In a region server (one of them):
2011-02-10 01:29:41,028 WARN org.apache.hadoop.hdfs.DFSClient
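The xceiver limit hit above ("xceiverCount 4097 exceeds the limit of concurrent xcievers 4096") is a datanode-side setting. Raising it is a config change in hdfs-site.xml; 8192 below is just an example value, and the right number depends on region count and open-file load:

```xml
<!-- Sketch: raising the datanode xceiver limit in hdfs-site.xml.
     Note the property name's historical misspelling is intentional. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>
```

The datanodes must be restarted for the change to take effect, and as J-D notes below, a high xceiver count is usually a symptom of too many store files rather than a limit that is simply too low.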
Re: region servers shutdown
Keys are randomized. Requests are nearly equally distributed across region servers (we used to have sequential keys, but that created region hot spots). However, the current scheme requires our map reduce job to look for events in all regions (using the hbase timestamp), which hurts map-reduce performance, but it did help the real-time puts.

-Original Message-
From: Ted Dunning tdunn...@maprtech.com
To: user@hbase.apache.org
Sent: Thu, Feb 10, 2011 3:45 pm
Subject: Re: region servers shutdown

Are your keys sequential or randomized?

On Thu, Feb 10, 2011 at 12:35 PM, Venkatesh vramanatha...@aol.com wrote: iii) Processing about 600 million events per day (real-time put) - 200 bytes per put. Each event is a row in a hbase table. so 600 mill records, 1 column family, 6-10 columns iv) About 50,000 regions so far. v) we run map reduce job every nite that takes the 600 mil records updates/creates aggregate data (1 get per record) aggregate data translates to 25 mill..x 3 puts
Re: region servers shutdown
Thanks J-D, will increase MAX_FILESIZE as you suggested. I could truncate one of the tables, which constitutes 80% of the regions; will try compression after that.

-Original Message-
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Thu, Feb 10, 2011 4:37 pm
Subject: Re: region servers shutdown

2500 regions per region server can be a lot of files to keep opened, which is probably one of the main reasons for your instability (as your regions were growing, it started poking into those dark corners of xcievers and eventually ulimits). You need to set your regions to be bigger, and use LZO compression to lower the cost of storing those events and at the same time improve performance across the board. Check the MAX_FILESIZE config for your table in the shell; I would recommend 1GB instead of the default 256MB. Then, follow this wiki to set up LZO: http://wiki.apache.org/hadoop/UsingLzoCompression Finally, you cannot merge regions (as was said in other threads this week) to bring the count back down, so one option you might consider is copying all the content from the first table to a second, better configured table. It's probably going to be a pain to do in 0.20.6 because you cannot create a table with multiple regions, so maybe that's another reason to upgrade :) Oh, and one other thing: if your zk servers are the same class of hardware as the region servers and you're not using them for anything other than HBase, then you should only use 1 zk server and collocate it with the master and the namenode, then use those 3 machines as region servers to help spread the region load. J-D

On Thu, Feb 10, 2011 at 12:35 PM, Venkatesh vramanatha...@aol.com wrote: Thanks J-D.. I was quite happy in the 1st 3 months..Last month or so, lots of instabilities..
i) It's good to know that 0.90.x fixes lots of instabilities..will consider upgrading..It is not listed as stable production release hence the hesitation :) ii) Our cluster is 20 - node (20 data nodes + 20 region servers) (data/region server on every box)..besides that 1 name node, 1 hmaster, 3 zookeper all on diff physical machines iii) hardware pentium .., 36 gig memory on each node iii) Processing about 600 million events per day (real-time put) - 200 bytes per put. Each event is a row in a hbase table. so 600 mill records, 1 column family, 6-10 columns iv) About 50,000 regions so far. v) we run map reduce job every nite that takes the 600 mil records updates/creates aggregate data (1 get per record) aggregate data translates to 25 mill..x 3 puts vi) region splits occur quite frequently..every 5 minutes or so How big are the tables? - have n't run a count on tables lately.. - events table we keep for 90 days - 600 mill record per day..we process each days data - 3 additional tables for aggregate. How many region servers - 20 and how many regions do they serve? - 50,000 regions..x-new regions get created every day..(don't have that #) Are you using lots of families per table? - No..just 1 family in all tables...# of columns 20 Are you using LZO compression? - NO thanks again for your help -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Thu, Feb 10, 2011 2:40 pm Subject: Re: region servers shutdown I see you are running on a very old version of hbase, and under that you have a version of hadoop that doesn't support appends so you are bound to have data loss on machine failure and when a region server needs to abort like it just did. I suggest you upgrade to 0.90.0, or even consider the release candidate of 0.90.1 which can be found here http://people.apache.org/~todd/hbase-0.90.1.rc0/, this is going to help solving a lot of stability problems. 
Also if you were able to reach 4097 xceivers on your datanodes, it means that you are keeping a LOT of files opened. This suggests that you either have a very small cluster or way too many files. Can you tell us more about your cluster? How big are the tables? How many region servers and how many regions do they serve? Are you using lots of families per table? Are you using LZO compression? Thanks for helping us helping you :) J-D On Thu, Feb 10, 2011 at 11:32 AM, Venkatesh vramanatha...@aol.com wrote: Thanks J-D.. Can't believe i missed that..I have had it before ..i did look for it..(not hard/carefull enough, i guess) this time deflt that's the reason ...xceiverCount 4097 exceeds the limit of concurrent xcievers 4096... ..thinking of doubling this.. I've had had so many issues in the last month..holes in meta, data node hung,..etc..this time it was enmass -Original
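J-D's MAX_FILESIZE recommendation translates to a shell alter along these lines (a sketch: the table name is a placeholder, and in HBase versions of this era the table must be disabled before altering):

```
hbase> disable 'events'
hbase> alter 'events', {MAX_FILESIZE => '1073741824'}
hbase> enable 'events'
```

The value is bytes (1073741824 = 1 GB, versus the 256 MB default), and it only affects future splits; existing regions keep their current boundaries.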
script to delete regions with no rows
Is there a script? thanks
Re: script to delete regions with no rows
Thank you.

-Original Message-
From: Stack st...@duboce.net
To: user@hbase.apache.org
Sent: Fri, Jan 28, 2011 3:43 pm
Subject: Re: script to delete regions with no rows

The end key of one region must match the start key of the next, so you can't just remove the region from .META. and its directory -- if one -- in HDFS. You'd need to adjust the start or end key on the region previous or after to include the scope of the just-removed region. There is no script to do this that I know of. Check the content of bin/*.rb. These scripts mess around with meta, adding and removing regions. They might inspire. Also look at the Merge.java class. See how it edits .META. after merging two adjacent regions to create a new region that spans the key space of the two old adjacent regions. St.Ack

On Fri, Jan 28, 2011 at 12:29 PM, Venkatesh vramanatha...@aol.com wrote: Is there a script? thanks
Re: getting retries exhausted exception
Thanks J-D I do see this in region server log 2011-01-26 03:03:24,459 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(5800409546372591083, 1000) from 172.29.253.231:35656: output error 2011-01-26 03:03:24,462 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 256 on 60020 caught: java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1164) at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943) ... 2011-01-26 03:04:17,961 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Scanner was closed (timed out?) after we renewed it. 
Could be caused by a very slow scanner or a lengthy garbage collection at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1865) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1897) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) 2011-01-26 03:04:17,966 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5800409546372591083 lease expired 2011-01-26 03:04:17,966 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 4439572834176684295 lease expired -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Wed, Jan 26, 2011 5:26 pm Subject: Re: getting retries exhausted exception It seems to be coming from the region server side... so one thing you can check is the region server logs and see if the NPEs are there. If not, and there's nothing suspicious, then consider enabling DEBUG for hbase and re-run the job to hopefully get more information. J-D On Wed, Jan 26, 2011 at 8:44 AM, Venkatesh vramanatha...@aol.com wrote: Using 0.20.6..any solutions? Occurs during mapper phase..will increasing retry count fix this? thanks here's the stack trace org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 10 attempts. 
Exceptions: java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1045) at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:2003) at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1923) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:403) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.nextKeyValue(TableInputFormatBase.java:210) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170)
Re: hbase.client.retries.number
Hi Sean: Thanks. The size of the column family is very small, ~100 bytes. Investigating potential bottleneck spots; our cluster is small (relatively speaking), 10 nodes, but our hardware is high end (not commodity). venkatesh

-Original Message-
From: Sean Bigdatafun sean.bigdata...@gmail.com
To: user@hbase.apache.org
Sent: Fri, Oct 15, 2010 5:28 pm
Subject: Re: hbase.client.retries.number

On Thu, Oct 14, 2010 at 12:03 PM, Venkatesh vramanatha...@aol.com wrote:

Thanks J-D. Yeah, found out the hard way in prod :) Set it to zero since client requests were backing up; everything stopped working, the region server wouldn't come up, etc. (did not realize the hbase client property would be used by the server :) I reverted all retries back to default. So far everything seems good (fingers crossed) after making several tunables along the way.

- Using HBase 0.20.6
- Processing about 300 million event puts
- 85% of requests are under 10 milliseconds, while the mean is about 300 milliseconds. Trying to narrow down whether that's during our client GC or an HBase pause. Tuning region server handler count.

This is way slow too.

- mapreduce job to process 40 million records takes about an hour, the majority in the reduce phase. Trying to optimize that by varying the buffer size of writes. Going to try the in_memory option as well.

This is way slow too.

- Full table scan takes about 30 minutes. Is that reasonable for a table size of 10 million records? hbase.client.scanner.caching - if set in hbase-site.xml, Scan calls should pick that up, correct?

This is way slow for a 10 million records table. What size is your column family?

thanks venkatesh

-Original Message-
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Thu, Oct 14, 2010 2:39 pm
Subject: Re: hbase.client.retries.number

hbase.client.retries.number is used by HConnectionManager, so this means anything that uses the HBase client. I think some parts of the region server code use it, or used it at some point, I'd have to dig in.
But definitely never set this to 0, as any region move/split will kill your client, About this RetriesExhaustedException, it seems that either the region is in an unknown state or that it just took a lot of time to close and be moved. You need to correlate this with the master log (look for this region's name) since the client cannot possibly know what went on inside the cluster. Also, which version are you using? J-D On Mon, Oct 11, 2010 at 3:06 PM, Venkatesh vramanatha...@aol.com wrote: BTW..get this exception while trying a new put.. Also, get this exception on gets on some region servers org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, region=user_activity,1286789413060_atanackovics_30306_4a3e0812,1286789581757 for region user_activity,1286789413060_30306_4a3e0812,1286789581757, row '1286823659253_v6_1_df34b22f', but failed after 10 attempts. Exceptions: org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1149) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230) org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666) org.apache.hadoop.hbase.client.HTable.close(HTable.java:682) com.aol.mail.antispam.Profiler.notifyEmailSendActivity.processGetRequest(notifyEmailSendActivity.java:363) com.aol.mail.antispam.Profiler.notifyEmailSendActivity.doGet(notifyEmailSendActivity.java:450) javax.servlet.http.HttpServlet.service(HttpServlet.java:617) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) -Original Message- From: Venkatesh vramanatha...@aol.com To: user@hbase.apache.org Sent: Mon, Oct 11, 2010 2:35 pm Subject: hbase.client.retries.number HBase was seamless for first couple of weeks..now all kinds of issues in production :) fun fun.. 
Curious ..does this property have to match up on hbase client side region server side.. I've this number set to 0 on region server side default on client side.. I can't do any put (new) thanks venkatesh
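On the hbase.client.scanner.caching question above: as far as I understand, the value in the client's hbase-site.xml becomes the default for Scans that do not call setCaching() themselves. A config sketch (1000 is an example value; the old default of 1 row per RPC makes full-table scans very slow):

```xml
<!-- Sketch: client-side hbase-site.xml. Rows fetched per scanner
     RPC; larger values trade client memory for fewer round trips. -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>1000</value>
</property>
```

Per-Scan overrides via Scan.setCaching() (or HTable.setScannerCaching() in old clients) take precedence over this default.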
Re: hbase.client.retries.number
Thanks J-D Yeah..Found out the hard way in prod :) set to zero..since client requests were backing up.. everything stopped working/region server would n't come up..etc..(did not realize hbase client property would be used by server :) I reverted all retries back to default.. So far everything seems good...(fingers crossed).after making several tunables along the way.. - Using HBase 0.20.6 -Processing about 300 million event puts -85% of requests are under 10 milli.sec..while the mean is about 300 millisecs..Trying to narrow that..if it's during our client GC or Hbase pause..Tuning region server handler count -mapreduce job to process 40 million records takes about an hour..Majority in the reduce phase. Trying to optimize that..by varying buffer size of writes..Going to try the in_memory option as well. - Full table scan takes about 30 minutes..Is that reasonable for a table size of 10 mill records? hbase.client.scanner.caching - If set in hbase-site.xml, Scan calls should pick that up correct? thanks venkatesh -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Thu, Oct 14, 2010 2:39 pm Subject: Re: hbase.client.retries.number hbase.client.retries.number is used by HConnectionManager, so this means anything that uses the HBase client. I think some parts of the region server code use it, or used it at some point, I'd have to dig in. But definitely never set this to 0, as any region move/split will kill your client, About this RetriesExhaustedException, it seems that either the region is in an unknown state or that it just took a lot of time to close and be moved. You need to correlate this with the master log (look for this region's name) since the client cannot possibly know what went on inside the cluster. Also, which version are you using? J-D On Mon, Oct 11, 2010 at 3:06 PM, Venkatesh vramanatha...@aol.com wrote: BTW..get this exception while trying a new put.. 
Also, get this exception on gets on some region servers org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, region=user_activity,1286789413060_atanackovics_30306_4a3e0812,1286789581757 for region user_activity,1286789413060_30306_4a3e0812,1286789581757, row '1286823659253_v6_1_df34b22f', but failed after 10 attempts. Exceptions: org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1149) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230) org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666) org.apache.hadoop.hbase.client.HTable.close(HTable.java:682) com.aol.mail.antispam.Profiler.notifyEmailSendActivity.processGetRequest(notifyEmailSendActivity.java:363) com.aol.mail.antispam.Profiler.notifyEmailSendActivity.doGet(notifyEmailSendActivity.java:450) javax.servlet.http.HttpServlet.service(HttpServlet.java:617) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) -Original Message- From: Venkatesh vramanatha...@aol.com To: user@hbase.apache.org Sent: Mon, Oct 11, 2010 2:35 pm Subject: hbase.client.retries.number HBase was seamless for first couple of weeks..now all kinds of issues in production :) fun fun.. Curious ..does this property have to match up on hbase client side region server side.. I've this number set to 0 on region server side default on client side.. I can't do any put (new) thanks venkatesh
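For reference, the fix described in this thread amounts to restoring the client retry count to its default in hbase-site.xml. A sketch, assuming the 0.20.x property name and its shipped default (10):

```xml
<!-- hbase-site.xml: ship with client AND server configs, since some
     region server code paths read this client property too -->
<property>
  <name>hbase.client.retries.number</name>
  <value>10</value>
  <!-- Never 0: any region move or split would then fail the caller
       immediately instead of being retried. -->
</property>
```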
Re: Increase region server throughput
Thanks St.Ack. I've both of those settings (autoflush and write buffer size). I'll try new HTable(conf, ..) (I just have new HTable(table) now). Right now up to 85% under 10ms; I'm trying to bring the mean down. PS: I can tolerate some loss of data for getting better throughput. -Original Message- From: Sean Bigdatafun sean.bigdata...@gmail.com To: user@hbase.apache.org Sent: Thu, Oct 14, 2010 8:11 pm Subject: Re: Increase region server throughput Though this setup, setAutoFlush(false), increases the throughput, the data loss rate increases significantly -- there is no way for the client to know what has been lost and what has gone through. That bothers me. Sean On Tue, Oct 12, 2010 at 11:32 AM, Stack st...@duboce.net wrote: Have you played with these settings in the HTable API? http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean) http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#setWriteBufferSize(long) There is something seriously wrong if you are seeing 5 seconds per put (unless your put is gigabytes in size?). Are you doing 'new HTable(tablename)' in your client or are you doing 'new HTable(conf, tablename)' in your client code? Do the latter if not -- share the configuration with HTable instances. St.Ack On Mon, Oct 11, 2010 at 10:47 PM, Venkatesh vramanatha...@aol.com wrote: I would like to tune region server to increase throughput..On a 10 node cluster, I'm getting 5 sec per put. (this is unbatched/unbuffered). Other than region server handler count property is there anything else I can tune to increase throughput? ( this operation i can't use buffered write without code change) thx venkatesh
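As a footnote to the autoflush/write-buffer discussion above: besides calling HTable.setWriteBufferSize(long) per table instance, the buffer can be set site-wide. A sketch assuming the 0.20.x property name; the value shown is the shipped default (2 MB), not a tuned recommendation:

```xml
<property>
  <name>hbase.client.write.buffer</name>
  <value>2097152</value>
  <!-- Bytes buffered client-side before a batched flush. A bigger
       buffer means fewer RPCs, but more unflushed data at risk of
       loss when autoflush is off. -->
</property>
```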
hbase.client.retries.number
HBase was seamless for the first couple of weeks; now all kinds of issues in production :) fun fun.. Curious: does this property have to match up on the HBase client side and the region server side? I've this number set to 0 on the region server side and default on the client side, and I can't do any put (new). thanks venkatesh
Re: hbase.client.retries.number
BTW..get this exception while trying a new put.. Also, get this exception on gets on some region servers org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, region=user_activity,1286789413060_atanackovics_30306_4a3e0812,1286789581757 for region user_activity,1286789413060_30306_4a3e0812,1286789581757, row '1286823659253_v6_1_df34b22f', but failed after 10 attempts. Exceptions: org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1149) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230) org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666) org.apache.hadoop.hbase.client.HTable.close(HTable.java:682) com.aol.mail.antispam.Profiler.notifyEmailSendActivity.processGetRequest(notifyEmailSendActivity.java:363) com.aol.mail.antispam.Profiler.notifyEmailSendActivity.doGet(notifyEmailSendActivity.java:450) javax.servlet.http.HttpServlet.service(HttpServlet.java:617) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) -Original Message- From: Venkatesh vramanatha...@aol.com To: user@hbase.apache.org Sent: Mon, Oct 11, 2010 2:35 pm Subject: hbase.client.retries.number HBase was seamless for first couple of weeks..now all kinds of issues in production :) fun fun.. Curious ..does this property have to match up on hbase client side region server side.. I've this number set to 0 on region server side default on client side.. I can't do any put (new) thanks venkatesh
Increase region server throughput
I would like to tune the region server to increase throughput. On a 10 node cluster, I'm getting 5 sec per put (this is unbatched/unbuffered). Other than the region server handler count property, is there anything else I can tune to increase throughput? (For this operation I can't use buffered writes without a code change.) thx venkatesh
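The handler count mentioned above is a region server-side setting; a sketch of where it lives, assuming the 0.20.x property name (the value shown is the 0.20.x default, not a recommendation):

```xml
<!-- hbase-site.xml on each region server -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>10</value>
  <!-- Number of RPC handler threads; raising it allows more
       concurrent client requests at the cost of memory. -->
</property>
```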
Region servers suddenly disappearing
Some of the region servers are suddenly dying. I've pasted the relevant log lines; I don't see any errors in the datanodes. Any ideas? thanks venkatesh
2010-10-10 12:55:36,664 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
2010-10-10 12:55:36,664 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-8758558338582893960_95415 bad datanode[0] nodes == null
2010-10-10 12:55:36,665 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file /hbase_data/user_activity/compaction.dir/78194102/766401078063435042 - Aborting...
2010-10-10 12:55:36,666 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region user_activity,1286729575294_11655_614aa74e,1286729678877 java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.io.Text.readString(Text.java:400)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
2010-10-10 12:55:40,176 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-10 12:55:40,176 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-568910271688144725_95415
2010-10-10 12:55:53,353 DEBUG
org.apache.hadoop.hbase.regionserver.Store: closed activities
2010-10-10 12:55:53,353 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed user_activity,1286232613677_albridgew4_18363_c45677e1,1286233511007
2010-10-10 12:55:53,353 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: closing region user_activity,1286202422485_bayequip_15725_a6b7893e,1286203144881
2010-10-10 12:55:53,353 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing user_activity,1286202422485_bayequip_15725_a6b7893e,1286203144881: disabling compactions flushes
2010-10-10 12:55:53,353 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region, no outstanding scanners on user_activity,1286202422485_bayequip_15725_a6b7893e,1286203144881
2010-10-10 12:55:53,353 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks outstanding on region user_activity,1286202422485_15725_a6b7893e,1286203144881
2010-10-10 12:55:53,353 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed activities
2010-10-10 12:55:53,354 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed user_activity,1286202422485_bayequip_15725_a6b7893e,1286203144881
2010-10-10 12:55:53,354 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: 172.29.253.200:60020
2010-10-10 12:55:55,091 INFO org.apache.hadoop.hbase.Leases: regionserver/172.29.253.200:60020.leaseChecker closing leases
2010-10-10 12:55:55,091 INFO org.apache.hadoop.hbase.Leases: regionserver/172.29.253.200:60020.leaseChecker closed leases
2010-10-10 12:55:59,664 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2010-10-10 12:55:59,664 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x22b967dce5d0001
2010-10-10 12:55:59,664 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x22b967dce5d0001
2010-10-10 12:55:59,669 INFO org.apache.zookeeper.ClientCnxn: Exception while closing send thread for session 0x22b967dce5d0001 : Read error rc = -1
java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
2010-10-10 12:55:59,775 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x22b967dce5d0001
2010-10-10 12:55:59,775 INFO org.apache.zookeeper.ZooKeeper: Session: 0x22b967dce5d0001 closed
2010-10-10 12:55:59,775 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper
2010-10-10 12:55:59,775 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2010-10-10 12:55:59,776 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase_data/user_activity/78194102/activities/8044918410206348854 : java.io.EOFException
java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
Incredibly slow response to Scan
J-D et al. I've put the mapreduce issue I had on the back burner for now. I'm getting incredibly slow response to Scan. On a 10 node cluster, for a table with 1200 regions, it takes 20 minutes to scan a column for a given value. Got 100 or so records in the response. Is this normal? thanks venkatesh PS. setCaching(100) didn't make a dent in performance
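Scanner caching can be set per-Scan (as with the setCaching(100) call above) or site-wide. A sketch of the site-wide form, assuming the 0.20.x property name (the shipped default is 1, which costs one RPC per row):

```xml
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value>
  <!-- Rows fetched per scanner next() RPC; an explicit
       Scan.setCaching() call overrides this. -->
</property>
```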
Re: HBase map reduce job timing
Ahh, ok, that makes sense. I've a 10 node cluster, each node with 36 gig; I've allocated 4 gig for the HBase region servers. master.jsp reports used heap is less than half on each region server. I've close to 800 regions total. Guess it needs to kick off a JVM to see if data exists in all regions.. -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, Oct 5, 2010 11:52 pm Subject: Re: HBase map reduce job timing Regarding number of map tasks 500+, 490 of them processing nothing, do you have an explanation for that?..Wondering if its kicking off too many JVMs most doing nothing.. This would mean that throughout your regions, only a few have data in the timestamp range you're looking for. 'top' reports less free memory (couple of gig.) though box has 36 gig total.. I don't quite trust top since cached blocks don't show up under free column even if no process is running.. You only have 1 machine? BTW how much RAM did you give to HBase? J-D
Re: HBase map reduce job timing
Also, do you think that if I query using the rowkey instead of the hbase timestamp, it would not kick off that many tasks, since the region server knows the exact locations? thanks venkatesh -Original Message- From: Venkatesh vramanatha...@aol.com To: user@hbase.apache.org Sent: Wed, Oct 6, 2010 8:53 am Subject: Re: HBase map reduce job timing Ahh ..ok..That makes sense I've a 10 node cluster each with 36 gig..I've allocated 4gig for HBase Region Servers..master.jsp reports used heap is less than half on each region server. I've close to 800 regions total..Guess it needs to kick off a jvm to see if data exists in all regions.. -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, Oct 5, 2010 11:52 pm Subject: Re: HBase map reduce job timing Regarding number of map tasks 500+, 490 of them processing nothing, do you have an explanation for that?..Wondering if its kicking off too many JVMs most doing nothing.. This would mean that throughout your regions, only a few have data in the timestamp range you're looking for. 'top' reports less free memory (couple of gig.) though box has 36 gig total.. I don't quite trust top since cached blocks don't show up under free column even if no process is running.. You only have 1 machine? BTW how much RAM did you give to HBase? J-D
Re: HBase map reduce job timing
Thanks J-D. I'll hook up Ganglia (been wanting to, but kept pushing it back) and get back. V -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Wed, Oct 6, 2010 12:22 pm Subject: Re: HBase map reduce job timing Also, do you think if I query using rowkey instead of hbase time stamp..it would not kick off that many tasks.. since region server knows the exact locations? I don't see how you could do that in a scalable way, unless you really have to query a few rows (less than a million). I've a 10 node cluster each with 36 gig..I've allocated 4gig for HBase Region Servers..master.jsp reports used heap is less than half on each region server. This is Java so the reported heap doesn't mean much... the garbage collector doesn't collect aggressively since that would be awfully inefficient. I've close to 800 regions total..Guess it needs to kick off a jvm to see if data exists in all regions.. It does, and like you said the mappers take only a few minutes so optimizing that part of the job is useless until you get your reducers faster. So regarding the speed of inserts (this seems to be the real issue if what you said about the write buffer is true), I'd be interested in 1) seeing your reducer's code (strip whatever you have that's business specific) and 2) seeing some monitoring data while the job is running (if not, get ganglia in there). Inserts could be slow for many reasons apart from bad API usage, such as cluster misconfiguration, sub-optimal insertion pattern (the classic being having only 1 region), etc. J-D
HBase map reduce job timing
I've a mapreduce job that is taking too long (over an hour). Trying to see what I can tune to bring it down. One thing I noticed: the job is kicking off 500+ map tasks; 490 of them do not process any records, whereas 10 of them process all the records (200K each). Any idea why that would be? The map phase takes a couple of minutes; the reduce phase takes the rest. I'll try increasing the # of reduce tasks. Open to other suggestions for tunables. thanks for your input venkatesh
Re: HBase map reduce job timing
Sorry, yeah, I've to do some digging to provide some data. What sort of data would be helpful? Would the stats reported by jobtracker.jsp suffice? I've pasted them in this email. I can gather more jvm stats. thanks

Status: Succeeded
Started at: Tue Oct 05 21:39:58 EDT 2010
Finished at: Tue Oct 05 22:36:43 EDT 2010
Finished in: 56mins, 45sec
Job Cleanup: Successful

Kind   | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
map    | 100.00%    | 565       | 0       | 0       | 565      | 0      | 0 / 11
reduce | 100.00%    | 20        | 0       | 0       | 20       | 0      | 0 / 2

Counter                                      | Map           | Reduce      | Total
Job Counters: Launched reduce tasks          | 0             | 0           | 22
Job Counters: Rack-local map tasks           | 0             | 0           | 66
Job Counters: Launched map tasks             | 0             | 0           | 576
Job Counters: Data-local map tasks           | 0             | 0           | 510
com.JobRecords: REDUCE_PHASE_RECORDS         | 0             | 597,712     | 597,712
com.JobRecords: MAP_PHASE_RECORDS            | 2,534,807     | 0           | 2,534,807
FileSystemCounters: FILE_BYTES_READ          | 335,845,726   | 861,146,518 | 1,196,992,244
FileSystemCounters: FILE_BYTES_WRITTEN       | 1,197,031,156 | 861,146,518 | 2,058,177,674
Map-Reduce Framework: Reduce input groups    | 0             | 597,712     | 597,712
Map-Reduce Framework: Combine output records | 0             | 0           | 0
Map-Reduce Framework: Map input records      | 2,534,807     | 0           | 2,534,807
Map-Reduce Framework: Reduce shuffle bytes   | 0             | 789,145,342 | 789,145,342
Map-Reduce Framework: Reduce output records  | 0             | 0           | 0
Map-Reduce Framework: Spilled Records        | 3,522,428     | 2,534,807   | 6,057,235
Map-Reduce Framework: Map output bytes       | 851,007,170   | 0           | 851,007,170
Map-Reduce Framework: Map output records     | 2,534,807     | 0           | 2,534,807
Map-Reduce Framework: Combine input records  | 0             | 0           | 0
Map-Reduce Framework: Reduce input records   | 0             | 2,534,807   | 2,534,807

-Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, Oct 5, 2010 10:53 pm Subject: Re: HBase map reduce job timing I'd love to give you tips, but you didn't provide any data about the input and output of your job, the kind of hardware you're using, etc.
At this point any suggestion would be a stab in the dark, the best I can do is pointing to the existing documentation http://wiki.apache.org/hadoop/PerformanceTuning J-D On Tue, Oct 5, 2010 at 7:12 PM, Venkatesh vramanatha...@aol.com wrote: I've a mapreduce job that is taking too long..over an hour..Trying to see what can a tune to to bring it down..One thing I noticed, the job is kicking off - 500+ map tasks : 490 of them do not process any records..where as 10 of them process all the records (200 K each..)..Any idea why that would be?... ..map phase takes about couple of minutes.. ..reduce phase takes the rest.. ..i'll try increasing # of reduce tasks..Open to other other suggestion for tunables.. thanks for your input venkatesh
Re: HBase map reduce job timing
Sure..Both input output are HBase tables Input (mapper phase) - scanning a HBase table for all records within time range (using hbase timestamps) Output (reduce phase) - doing a Put to 3 different HBase tables -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, Oct 5, 2010 11:14 pm Subject: Re: HBase map reduce job timing It'd be more useful if we knew where that data is coming from, and where it's going. Are you scanning HBase and/or writing to it? J-D On Tue, Oct 5, 2010 at 8:05 PM, Venkatesh vramanatha...@aol.com wrote: Sorry..yeah..i've to do some digging to provide some data.. What sort of data would be helpful? Would stats reported by jobtracker.jsp suffice? I've pasted that in this email.. I can gather more jvm stats..thanks Status: Succeeded Started at: Tue Oct 05 21:39:58 EDT 2010 Finished at: Tue Oct 05 22:36:43 EDT 2010 Finished in: 56mins, 45sec Job Cleanup: Successful Kind % Complete Num Tasks Pending Running Complete Killed Failed/Killed Task Attempts map 100.00% 565 0 0 565 0 0 / 11 reduce 100.00% 20 0 0 20 0 0 / 2 Counter Map Reduce Total Job Counters Launched reduce tasks 0 0 22 Rack-local map tasks 0 0 66 Launched map tasks 0 0 576 Data-local map tasks 0 0 510 com.JobRecords REDUCE_PHASE_RECORDS 0 597,712 597,712 MAP_PHASE_RECORDS 2,534,807 0 2,534,807 FileSystemCounters FILE_BYTES_READ 335,845,726 861,146,518 1,196,992,244 FILE_BYTES_WRITTEN 1,197,031,156 861,146,518 2,058,177,674 Map-Reduce Framework Reduce input groups 0 597,712 597,712 Combine output records 0 0 0 Map input records 2,534,807 0 2,534,807 Reduce shuffle bytes 0 789,145,342 789,145,342 Reduce output records 0 0 0 Spilled Records 3,522,428 2,534,807 6,057,235 Map output bytes 851,007,170 0 851,007,170 Map output records 2,534,807 0 2,534,807 Combine input records 0 0 0 Reduce input records 0 2,534,807 2,534,807 -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Tue, Oct 5, 2010 10:53 pm 
Subject: Re: HBase map reduce job timing I'd love to give you tips, but you didn't provide any data about the input and output of your job, the kind of hardware you're using, etc. At this point any suggestion would be a stab in the dark, the best I can do is pointing to the existing documentation http://wiki.apache.org/hadoop/PerformanceTuning J-D On Tue, Oct 5, 2010 at 7:12 PM, Venkatesh vramanatha...@aol.com wrote: I've a mapreduce job that is taking too long..over an hour..Trying to see what can a tune to to bring it down..One thing I noticed, the job is kicking off - 500+ map tasks : 490 of them do not process any records..where as 10 of them process all the records (200 K each..)..Any idea why that would be?... ..map phase takes about couple of minutes.. ..reduce phase takes the rest.. ..i'll try increasing # of reduce tasks..Open to other other suggestion for tunables.. thanks for your input venkatesh
hbase/hdfs disk usage
Hi, We have a cluster up and running in production. Firstly, thanks to J-D and the group for all the initial input/help. I'm trying to get a handle on DFS disk usage. All I'm storing in hdfs is hbase records. If I do hadoop fs -du on the /hbase_data dir, I see 1 gig so far, whereas DFS usage via the web interface reports 10 gig. That seems odd. Any ideas on what could be taking up the space? I don't have permission to look at the entire hdfs yet; just thought I'd ask the group. thanks venkatesh
Re: stop-hbase.sh takes forever (never stops)
Don't know if this helps, but here are a couple of situations where I had this issue and how I resolved it: - If zookeeper is not running (or does not have quorum) in a cluster setup, hbase does not go down; bring up zookeeper. - Make sure the pid file is not under /tmp; sometimes files get cleaned out of /tmp. Change *-env.sh to point to a different dir. -Original Message- From: Jian Lu j...@local.com To: user@hbase.apache.org user@hbase.apache.org Sent: Tue, Sep 7, 2010 5:44 pm Subject: stop-hbase.sh takes forever (never stops) Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and is still running? I was able to start / stop hbase in the past two months. Now it suddenly stopped working. I am running hbase-0.20.4 with Linux 64-bit CPU / 64-bit operating system. I downloaded hbase-0.20.4 and run in standalone mode (http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description) Thanks! Jack.
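The /tmp pid-file fix mentioned above goes in hbase-env.sh; a sketch (the directory shown is illustrative, not a recommendation):

```sh
# hbase-env.sh: keep pid files out of /tmp so periodic tmp cleanup
# cannot delete them, which would leave stop-hbase.sh with no pid
# to signal.
export HBASE_PID_DIR=/var/hbase/pids
```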
Dependencies for running mapreduce
The mapreduce job code I have (a Java app) depends on other libraries. It runs fine when the job is run locally, but when I'm running on a true distributed setup it fails on dependencies. Do I have to put all the libraries and property files my application depends on in HADOOP_CLASSPATH for the mapreduce to run in a cluster? thanks venkatesh
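One way to do this, per the HBase mapreduce package docs referenced elsewhere in this archive, is a hadoop-env.sh fragment on each tasktracker node; the paths and versions here are hypothetical and must match your install:

```sh
# hadoop-env.sh: make HBase, its conf dir, zookeeper, and any app
# dependency jars visible to MapReduce tasks.
export HADOOP_CLASSPATH="$HBASE_HOME/hbase-0.20.6.jar:$HBASE_HOME/conf:$HBASE_HOME/lib/zookeeper-3.2.2.jar:$HADOOP_CLASSPATH"
```

The alternative that avoids touching every node is to bundle dependencies in a lib/ directory inside the job jar, or to pass them with the -libjars generic option.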
jobtracker.jsp
I'm running map/reduce jobs from a java app (table mapper and reducer) in true distributed mode. I don't see anything in the jobtracker page, though the map/reduce job runs fine. Am I missing some config? thanks venkatesh
Re: jobtracker.jsp
Thanks J-D. I figured out I didn't have mapred-site.xml in my WEB-INF/classes directory (classpath). I copied that from the cluster, and that fixed part of it. Now I don't have zookeeper in hadoop-env.sh:HADOOP_CLASSPATH. I distinctly looked at this link a while ago and it didn't have zookeeper listed (I've everything else, i.e. hbase-*); perhaps I had an old link. Can all the config in mapred-site.xml be added to hbase-site.xml? It kind of works with them being separate, just wondering. Have one more question: I also have trouble stopping the namenode/datanode/jobtracker to make this classpath effective. Is there a force shutdown option (other than kill -9)? venkatesh -Original Message- From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Sent: Fri, Aug 27, 2010 12:10 am Subject: Re: jobtracker.jsp HBase needs to know about the job tracker, it could be on the same machine or distant, and that's taken care of by giving HBase mapred's configurations. Here's the relevant documentation : http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath J-D 2010/8/26 xiujin yang xiujiny...@hotmail.com: Hi When I run Hbase performance, I met the same problem. When jobs are run locally, they don't show up on the job list. Best Xiujin Yang. To: user@hbase.apache.org Subject: Re: jobtracker.jsp Date: Thu, 26 Aug 2010 22:30:09 -0400 From: vramanatha...@aol.com yeah..log says it's running Locally..i've to figure out why.. 2010-08-26 08:49:01,491 INFO Thread-16 org.apache.hadoop.mapred.MapTask - Starting flush of map output 2010-08-26 08:49:01,578 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_00_0 is done. And is in the process of commiting 2010-08-26 08:49:01,586 INFO Thread-16 org.apache.hadoop.mapred.LocalJobRunner - 2010-08-26 08:49:01,587 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_00_0' done.
2010-08-26 08:49:01,613 INFO Thread-16 org.apache.hadoop.mapred.LocalJobRunner - 2010-08-26 08:49:01,630 INFO Thread-16 org.apache.hadoop.mapred.Merger - Merging 1 sorted segments 2010-08-26 08:49:01,640 INFO Thread-16 org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes 2010-08-26 08:49:01,640 INFO Thread-16 org.apache.hadoop.mapred.LocalJobRunner - 2010-08-26 08:49:01,658 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_r_00_0 is done. And is in the process of commiting 2010-08-26 08:49:01,659 INFO Thread-16 org.apache.hadoop.mapred.LocalJobRunner - reduce reduce 2010-08-26 08:49:01,660 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_r_00_0' done. -Original Message- From: Jeff Zhang zjf...@gmail.com To: user@hbase.apache.org Sent: Thu, Aug 26, 2010 9:42 pm Subject: Re: jobtracker.jsp So what's the log in your client side ? On Thu, Aug 26, 2010 at 6:23 PM, Venkatesh vramanatha...@aol.com wrote: I'm running map/reduce jobs from java app (table mapper reducer) in true distributed mode..I don't see anything in jobtracker page..Map/reduce job runs fine..Am I missing some config? thanks venkatesh -- Best Regards Jeff Zhang
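The LocalJobRunner entries in the logs above are the giveaway: without a mapred-site.xml on the submitting client's classpath, mapred.job.tracker defaults to "local" and the whole job runs in-process, so it never appears on the cluster's jobtracker page. A minimal sketch (the host and port are placeholders):

```xml
<!-- mapred-site.xml on the submitting client's classpath -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker.example.com:9001</value>
  <!-- Default is "local", which runs jobs via LocalJobRunner. -->
</property>
```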
Re: How to delete rows in a FIFO manner
I wrestled with the idea of time-bounded tables. Would it make it harder to write code/run map reduce on multiple tables? Also, how do you decide when to do the cutover (start of a new day, week/month, ...), and if you do, how do you process data that crosses those time boundaries efficiently? Guess that is not your requirement. If it is a fixed-time cutover, isn't it enough to set the TTL timestamp? Interesting thread, thanks -Original Message- From: Thomas Downing tdown...@proteus-technologies.com To: user@hbase.apache.org user@hbase.apache.org Sent: Fri, Aug 6, 2010 11:39 am Subject: Re: How to delete rows in a FIFO manner Thanks for the suggestions. The problem isn't generating the Delete objects, or the delete operation itself - both are fast enough. The problem is generating the list of row keys from which the Delete objects are created. For now, the obvious work-around is to create and drop tables on the fly, using HBaseAdmin, with the tables being time-bounded. When the high end of a table passes the expiry time, just drop the table. When a table is written with the first record greater than the low bound, create a new table for the next time interval. As I am having other problems related to high ingest rates, the fact may be that I am just using the wrong tool for the job. Thanks td On 8/6/2010 10:24 AM, Jean-Daniel Cryans wrote: If the inserts are coming from more than 1 client, and you are trying to delete from only 1 client, then likely it won't work. You could try using a pool of deleters (multiple threads that delete rows) that you feed from the scanner. Or you could run a MapReduce that would parallelize that for you, that takes your table as an input and that outputs Delete objects. J-D On Fri, Aug 6, 2010 at 5:50 AM, Thomas Downing tdown...@proteus-technologies.com wrote: Hi, Continuing with testing HBase suitability in a high ingest rate environment, I've come up with a new stumbling block, likely due to my inexperience with HBase.
We want to keep and purge records on a time basis: i.e., when a record is older than, say, 24 hours, we want to purge it from the database. The problem I am encountering is that the only way I've found to delete records using an arbitrary but strongly time-ordered row id is to scan for rows from lower bound to upper bound, then build an array of Delete: for each Result in the ResultScanner, add new Delete(Result.getRow()) to the Delete array. This method is far too slow to keep up with our ingest rate; the iteration over the Results in the ResultScanner is the bottleneck, even though the Scan is limited to a single small column in the column family. The obvious but naive solution is to use a sequential row id where the lower and upper bound can be known. This would allow building the array of Delete objects without a scan step. The problem with this approach is how do you guarantee a sequential and non-colliding row id across more than one Put'ing process, and do it efficiently. As it happens, I can do this, but given the details of my operational requirements, it's not a simple thing to do. So I was hoping that I had just missed something. The ideal would be a Delete object that would take row id bounds in the same way that Scan does, allowing the work to be done all on the server side. Does this exist somewhere? Or is there some other way to skin this cat? Thanks Thomas Downing
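The time-bounded-table workaround described in this thread needs a deterministic mapping from record timestamp to table name, so that all writers and the table-dropper agree on the boundaries. A minimal sketch in Python (stdlib only; the naming scheme and one-day granularity are illustrative assumptions, not from the thread):

```python
from datetime import datetime, timedelta, timezone

def table_for(epoch_seconds, prefix="events"):
    """Name of the day-bounded table that should hold a record."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return "%s_%s" % (prefix, day.strftime("%Y%m%d"))

def expired_tables(now_seconds, retention_days=1, horizon_days=7, prefix="events"):
    """Tables whose whole day range is older than the retention window,
    looking back a fixed horizon; these are safe to drop in one shot."""
    now = datetime.fromtimestamp(now_seconds, tz=timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    out = []
    # Walk from oldest candidate day toward the cutoff.
    for back in range(horizon_days, retention_days, -1):
        day = now - timedelta(days=back)
        if day.date() < cutoff.date():
            out.append("%s_%s" % (prefix, day.strftime("%Y%m%d")))
    return out
```

A writer calls table_for(record_ts) to pick the target table; a periodic job drops every table returned by expired_tables(time.time()), replacing the scan-then-Delete loop with whole-table drops.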
HTable object - how long it is valid
Hi, If I construct a new HTable() object when my app initializes, is it valid until my app is shut down? I read in earlier postings that it is better to construct HTable once for performance reasons. I wonder if the underlying connection and other resources are kept around forever for put/scan/etc. Also, when do I call close()? Upon every operation (put/get/...)? To avoid memory leaks. thanks venkatesh