Region size per region on the table page
Hi Devs/Users,

Most of the time we want to know if our table split logic is accurate, or if the current regions of a table are well balanced. I was wondering if we could expose the size of each region on table.jsp too, in the table's region table. If people think it is useful, I can pick it up. Also let me know if it already exists.

Samar
Re: Region size per region on the table page
Hi Samar,

Hannibal is already doing what you are looking for.

Cheers,

JMS

2013/8/1 samar.opensource samar.opensou...@gmail.com:
AssignmentManager looping?
My master keeps logging this:

2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split

... (the same pair of WARN messages repeats every 100-400 ms) ...

hbase@node3:~/hbase-0.94.3$ cat logs/hbase-hbase-master-node3.log* | grep "Region 270a9c371fcbe9cd9a04986e0b77d16b not found" | wc
5042 65546 927728

Then it crashed:

2013-07-31 22:22:46,072 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-07-31 22:22:46,073 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : work_proposed,\x02\xE8\x92'\x00\x00\x00\x00 http://video.inportnews.ca/search/all/source/sun-news-network/harry-potter-in-translation/68463493001/page/1526,1375307272709.d95bb27cc026511c2a8c8ad155e79bf6. state=OPENING, ts=1375323766008, server=node7,60020,1375319044055 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : work_proposed,\x02\xE8\x92'\x00\x00\x00\x00 http://video.inportnews.ca/search/all/source/sun-news-network/harry-potter-in-translation/68463493001/page/1526,1375307272709.d95bb27cc026511c2a8c8ad155e79bf6. state=OPENING, ts=1375323766008, server=node7,60020,1375319044055 .. Cannot transit it to OFFLINE.
    at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
2013-07-31 22:22:46,075 INFO
Re: AssignmentManager looping?
Does it exist in meta or hdfs?

On Aug 1, 2013 8:24 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
slow operation in postPut
Hello,

I have a class extending BaseRegionObserver and I use the postPut method to run a slow procedure. I'd like to run these procedures in multiple threads. Is it possible to run multiple HTable.put(put) calls concurrently? I tried, but I get this error in each thread:

Exception in thread Thread-3 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:604)
    at java.util.ArrayList.remove(ArrayList.java:445)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:966)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:811)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:786)
    at img.PutFilesThread.run(PutFilesThread.java:74)
    at java.lang.Thread.run(Thread.java:724)

Does anybody have an idea?

Thanks,
Pavel Hančar
Re: slow operation in postPut
HTable is not thread safe.

On Aug 1, 2013, at 5:58 AM, Pavel Hančar pavel.han...@gmail.com wrote:
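[Editor's note] The IndexOutOfBoundsException inside flushCommits is the classic symptom of several threads mutating HTable's unsynchronized client-side write buffer (a plain ArrayList); the usual fix is to give each thread its own instance. A minimal sketch of that per-thread-instance pattern in plain Java, where BufferingClient is a hypothetical stand-in imitating HTable's buffer-and-flush behavior so the example runs without a cluster:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for HTable: buffers writes in an unsynchronized list, as
// HTable's client-side write buffer does. Safe only if never shared.
class BufferingClient {
    private final List<String> buffer = new ArrayList<>();
    private final AtomicInteger flushed;
    BufferingClient(AtomicInteger flushed) { this.flushed = flushed; }
    void put(String row) {
        buffer.add(row);
        if (buffer.size() >= 10) flush();   // auto-flush, like doPut
    }
    void flush() {                          // like HTable.flushCommits()
        flushed.addAndGet(buffer.size());
        buffer.clear();
    }
}

public class PerThreadInstanceDemo {
    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger flushed = new AtomicInteger();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                // One instance per thread: no shared mutable buffer.
                BufferingClient client = new BufferingClient(flushed);
                for (int n = 0; n < 1000; n++) client.put("row-" + n);
                client.flush();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(flushed.get()); // 4000: no lost or corrupted writes
    }
}
```

Sharing one BufferingClient across the four threads would reproduce the same kind of ArrayList corruption seen in the stack trace above.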
Re: AssignmentManager looping?
Yes you can. If HBase is down, first I would copy .META. out of HDFS to local disk, and then you can search it for split issues. Deleting those znodes should clear this up, though.

On Aug 1, 2013 8:52 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

I can't check the meta since HBase is down. Regarding HDFS, I took a few random lines like:

2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 28328fdb7181cbd9cc4d6814775e8895 not found on server node4,60020,1375319042033; failed processing
2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 28328fdb7181cbd9cc4d6814775e8895 from server node4,60020,1375319042033 but it doesn't exist anymore, probably already processed its split

And each time, there is nothing like that:

hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -lsr / | grep 28328fdb7181cbd9cc4d6814775e8895

On the ZK side:

[zk: localhost:2181(CONNECTED) 3] ls /hbase/splitlog
[zk: localhost:2181(CONNECTED) 10] ls /hbase/unassigned
[28328fdb7181cbd9cc4d6814775e8895, a8781a598c46f19723a2405345b58470, b7ebfeb63b10997736fd12920fde2bb8, d95bb27cc026511c2a8c8ad155e79bf6, 270a9c371fcbe9cd9a04986e0b77d16b, aff4d1d8bf470458bb19525e8aef0759]

Can I just delete those znodes? Worst case, hbck will find them back from HDFS if required?

JM

2013/8/1 Kevin O'dell kevin.od...@cloudera.com:

Does it exist in meta or hdfs?
Re: slow operation in postPut
If I want to use multiple threads safely, which class should I use?

On Thu, Aug 1, 2013 at 3:08 PM, Ted Yu yuzhih...@gmail.com wrote:

HTable is not thread safe.
Import HBase snapshots possible?
Hi there,

I am testing out the newly added snapshot capability, ExportSnapshot in particular. It's working fine for me; I am able to run ExportSnapshot properly. But the biggest (noob) issue is: once exported, is there any way to import those snapshots back into HBase? I don't see any ImportSnapshot util there.

Thanks,
Siddharth
Re: Import HBase snapshots possible?
Btw, I am running on hbase-0.95.1-hadoop1.

On Thu, Aug 1, 2013 at 7:05 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote:
Re: slow operation in postPut
Use HTablePool instead. For more info, see http://hbase.apache.org/book/client.html.

On Thu, Aug 1, 2013 at 3:32 PM, yonghu yongyong...@gmail.com wrote:

If I want to use multiple threads safely, which class should I use?
Re: slow operation in postPut
See 9.3.1.1. Connection Pooling in http://hbase.apache.org/book.html

On Thu, Aug 1, 2013 at 6:32 AM, yonghu yongyong...@gmail.com wrote:

If I want to use multiple threads safely, which class should I use?
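[Editor's note] The pooling approach referenced above works as a checkout/checkin queue: a worker borrows a table handle, has exclusive use of it while borrowed, and returns it when done. A generic sketch of that pattern in plain Java (the Pool class here is a hypothetical stand-in, not HTablePool's actual API, which hands out tables per request):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Generic checkout/checkin pool: the same pattern a table pool uses so
// that no handle is ever touched by two threads at once.
class Pool<T> {
    private final BlockingQueue<T> idle;
    Pool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(factory.get());
    }
    T borrow() throws InterruptedException { return idle.take(); } // blocks if exhausted
    void giveBack(T handle) { idle.add(handle); }
}

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        // Two handles shared safely among four workers.
        final Pool<StringBuilder> pool = new Pool<>(2, StringBuilder::new);
        final int[] completed = new int[1];
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    StringBuilder handle = pool.borrow(); // like checking a table out
                    handle.append('x');                   // exclusive use while borrowed
                    pool.giveBack(handle);                // like returning it to the pool
                    synchronized (completed) { completed[0]++; }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(completed[0]); // 4: every worker got exclusive access
    }
}
```

The BlockingQueue does the heavy lifting: take() parks a worker until a handle is free, which is why a small pool can serve many threads without any handle being shared concurrently.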
Re: Import HBase snapshots possible?
ExportSnapshot will export the snapshot data + metadata, in theory to another HBase cluster. So on the second cluster you'll now be able to do list_snapshots from the shell and see the exported snapshot. Now you can simply do clone_snapshot 'snapshot_name', 'new_table_name' and you're restoring the snapshot on the second cluster.

Assuming that you have removed the snapshot from cluster 1 and you want to export your snapshot back, you just use ExportSnapshot again to move the snapshot from cluster 2 to cluster 1, and same as before, you do a clone_snapshot to restore it.

Matteo

On Thu, Aug 1, 2013 at 2:35 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote:
Re: Import HBase snapshots possible?
Can't I export it to plain HDFS? I think that would be very useful.

On Thu, Aug 1, 2013 at 7:08 PM, Matteo Bertozzi theo.berto...@gmail.com wrote:
Re: Import HBase snapshots possible?
Yes, the export target can be an HDFS path:

$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase

So you can export to some /my-backup-dir on your HDFS, and then you have to export it back to an HBase cluster when you want to restore it.

Matteo

On Thu, Aug 1, 2013 at 2:45 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote:
Re: Import HBase snapshots possible?
Yeah, that's right. But the issue is, the HDFS I am exporting to is not under HBase. Can you please provide some example command to do this?

Thanks,
Siddharth

On Thu, Aug 1, 2013 at 7:17 PM, Matteo Bertozzi theo.berto...@gmail.com wrote:
Re: Import HBase snapshots possible?
Ok, so to export a snapshot from your HBase cluster, you can do:

$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/my-backup-dir

Now on cluster 2, hdfs://srv2:8082, you've got your my-backup-dir containing the exported snapshot (note that the snapshot is under the hidden dirs .snapshots and .archive).

Now if you want to restore the snapshot, you have to export it back to an HBase cluster. So on cluster 2, you can do:

$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -D hbase.rootdir=hdfs://srv2:8082/my-backup-dir -snapshot MySnapshot -copy-to hdfs://hbaseSrv:8082/hbase

So, to recap:
- You take a snapshot.
- You export the snapshot from HBase Cluster-1 to a simple HDFS dir in Cluster-2.
- Then, when you want to restore, you export the snapshot from the HDFS dir in Cluster-2 to an HBase cluster (it can be a different one from the original).
- From the hbase shell you can just do: clone_snapshot 'snapshotName', 'newTableName' if the table does not exist, or use restore_snapshot 'snapshotName' if there's a table with the same name.

Matteo

On Thu, Aug 1, 2013 at 2:54 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote:
Re: Import HBase snapshots possible?
Tried what you suggested. Here is what I get:

ssk01:~/siddharth/tools/hbase-0.95.1-hadoop1 # ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dhbase.rootdir=hdfs://10.209.17.88:9000/hbase -snapshot s1 -copy-to /root/siddharth/tools/hbase-0.95.1-hadoop1/data/

Exception in thread main java.lang.IllegalArgumentException: Wrong FS: hdfs://10.209.17.88:9000/hbase/.hbase-snapshot/s1/.snapshotinfo, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:393)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
    at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:296)
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:371)
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:618)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:690)
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:694)

Am I missing something?
Thanks, Siddharth On Thu, Aug 1, 2013 at 7:31 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: Ok, so to export a snapshot from your HBase cluster, you can do $ bin/hbase class org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot -snapshot MySnapshot -copy-to hdfs:///srv2:8082/my-backup-dir Now on cluster2, hdfs:///srv2:8082 you've your my-backup-dir that contains the exported snapshot (note that the snapshot is under hidden dirs .snapshots, and .archive) Now if you want to restore the snapshot, you have to export it back an HBase cluster. So on cluster2, you can do: $ bin/hbase class org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot -D hbase.rootdir=hdfs:///srv2:8082/my-backup-dir -snapshot MySnapshot -copy-to hdfs:///hbaseSrv:8082/hbase so, to recap - You take a snapshot - You Export the snapshot from HBase Cluster-1 - to a simple HDFS dir in Cluster-2 - Then you want to restore - You Export the snapshot from HDFS dir in Cluster-2 to HBase Cluster (it can be a different one from the original) - From the hbase shell you can just: clone_snapshot 'snapshotName', 'newTableName' if the table does not exists or use restore_snapshot 'snapshotName', if there's a table with the same name Matteo On Thu, Aug 1, 2013 at 2:54 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Yeah, thats right. But the issue is, hdfs that I am exporting to is not under HBase. Can you please provide some example command to do this... Thanks, Siddharth On Thu, Aug 1, 2013 at 7:17 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: Yes, the export an HDFS path. $ bin/hbase class org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot -snapshot MySnapshot -copy-to hdfs:///srv2:8082/hbase so you can export to some /my-backup-dir on your HDFS and then you've to export back to an hbase cluster, when you want to restore it Matteo On Thu, Aug 1, 2013 at 2:45 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Can't I export it to plain HDFS? I think that would be very useful. 
On Thu, Aug 1, 2013 at 7:08 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: The ExportSnapshot will export the snapshot data+metadata, in theory, to another HBase cluster. So on the second cluster you'll now be able to do list_snapshots from the shell and see the exported snapshot. Now you can simply do clone_snapshot 'snapshot_name', 'new_table_name' and you're restoring a snapshot on the second cluster. Assuming that you have removed the snapshot from cluster1 and you want to export your snapshot back... you just use ExportSnapshot again to move the snapshot from cluster2 to cluster1, and same as before you do a clone_snapshot to restore it Matteo On Thu, Aug 1, 2013 at 2:35 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Hi there, I am testing out the newly added snapshot capability, ExportSnapshot in particular. It's working fine for me. I am able to run ExportSnapshot properly. But the biggest (noob) issue is, once exported, is there any way to import those snapshots back into HBase? I don't see any ImportSnapshot util there. Thanks, Siddharth
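The recap in Matteo's reply can be condensed into a shell sketch. This is a hedged sketch, not the thread's exact commands: the host names, ports, snapshot name and backup directory below are placeholders, and the tool's class name should be checked against your HBase version (0.95 ships it as org.apache.hadoop.hbase.snapshot.ExportSnapshot).

```shell
# Step 1 (on cluster-1): export the snapshot to a plain HDFS dir on cluster-2
bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot MySnapshot \
    -copy-to hdfs://cluster2-nn:8020/my-backup-dir

# Step 2 (to restore): export it back, pointing hbase.rootdir at the backup dir
bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -D hbase.rootdir=hdfs://cluster2-nn:8020/my-backup-dir \
    -snapshot MySnapshot \
    -copy-to hdfs://hbase-nn:8020/hbase

# Step 3 (in the hbase shell): materialise the snapshot as a table
#   clone_snapshot 'MySnapshot', 'newTableName'   # table does not exist yet
#   restore_snapshot 'MySnapshot'                 # table with same name exists
```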
Re: AssignmentManager looping?
I tried to remove the znodes but got the same result. So I shut down all the RS and restarted HBase, and now I have 0 regions for this table. Running HBCK. Seems that it has a lot to do... 2013/8/1 Kevin O'dell kevin.od...@cloudera.com Yes you can if HBase is down, first I would copy .META. out of HDFS to local disk and then you can search it for split issues. Deleting those znodes should clear this up though. On Aug 1, 2013 8:52 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: I can't check the meta since HBase is down. Regarding HDFS, I took a few random lines like: 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 28328fdb7181cbd9cc4d6814775e8895 not found on server node4,60020,1375319042033; failed processing 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 28328fdb7181cbd9cc4d6814775e8895 from server node4,60020,1375319042033 but it doesn't exist anymore, probably already processed its split And each time, there is nothing like that. hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -lsr / | grep 28328fdb7181cbd9cc4d6814775e8895 On ZK side: [zk: localhost:2181(CONNECTED) 3] ls /hbase/splitlog [zk: localhost:2181(CONNECTED) 10] ls /hbase/unassigned [28328fdb7181cbd9cc4d6814775e8895, a8781a598c46f19723a2405345b58470, b7ebfeb63b10997736fd12920fde2bb8, d95bb27cc026511c2a8c8ad155e79bf6, 270a9c371fcbe9cd9a04986e0b77d16b, aff4d1d8bf470458bb19525e8aef0759] Can I just delete those znodes? Worst case hbck will find them back from HDFS if required? JM 2013/8/1 Kevin O'dell kevin.od...@cloudera.com Does it exist in meta or hdfs? 
On Aug 1, 2013 8:24 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: My master keep logging that: 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:52:59,636 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,636 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 
2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:53:00,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:53:00,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:53:00,417 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:53:00,417 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server
Re: Import HBase snapshots possible?
you have to use 3 slashes otherwise is interpreted as local file-system path -Dhbase.rootdir=hdfs:///10.209.17.88:9000/hbase Matteo On Thu, Aug 1, 2013 at 3:09 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Tried what you suggested. Here is what I get - ssk01:~/siddharth/tools/hbase-0.95.1-hadoop1 # ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dhbase.rootdir=hdfs://10.209.17.88:9000/hbase -snapshot s1 -copy-to /root/siddharth/tools/hbase-0.95.1-hadoop1/data/ Exception in thread main java.lang.IllegalArgumentException: Wrong FS: hdfs://10.209.17.88:9000/hbase/.hbase-snapshot/s1/.snapshotinfo, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:393) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:125) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427) at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:296) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:371) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:618) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:690) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:694) Am I missing something? 
Thanks, Siddharth On Thu, Aug 1, 2013 at 7:31 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: Ok, so to export a snapshot from your HBase cluster, you can do $ bin/hbase class org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot -snapshot MySnapshot -copy-to hdfs:///srv2:8082/my-backup-dir Now on cluster2, hdfs:///srv2:8082 you've your my-backup-dir that contains the exported snapshot (note that the snapshot is under hidden dirs .snapshots, and .archive) Now if you want to restore the snapshot, you have to export it back an HBase cluster. So on cluster2, you can do: $ bin/hbase class org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot -D hbase.rootdir=hdfs:///srv2:8082/my-backup-dir -snapshot MySnapshot -copy-to hdfs:///hbaseSrv:8082/hbase so, to recap - You take a snapshot - You Export the snapshot from HBase Cluster-1 - to a simple HDFS dir in Cluster-2 - Then you want to restore - You Export the snapshot from HDFS dir in Cluster-2 to HBase Cluster (it can be a different one from the original) - From the hbase shell you can just: clone_snapshot 'snapshotName', 'newTableName' if the table does not exists or use restore_snapshot 'snapshotName', if there's a table with the same name Matteo On Thu, Aug 1, 2013 at 2:54 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Yeah, thats right. But the issue is, hdfs that I am exporting to is not under HBase. Can you please provide some example command to do this... Thanks, Siddharth On Thu, Aug 1, 2013 at 7:17 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: Yes, the export an HDFS path. $ bin/hbase class org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot -snapshot MySnapshot -copy-to hdfs:///srv2:8082/hbase so you can export to some /my-backup-dir on your HDFS and then you've to export back to an hbase cluster, when you want to restore it Matteo On Thu, Aug 1, 2013 at 2:45 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Can't I export it to plain HDFS? I think that would be very useful. 
On Thu, Aug 1, 2013 at 7:08 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: The ExportSnapshot will export the snapshot data+metadata, in theory, to another hbase cluster. so on the second cluster you'll now be able to do list_snapshots from shell and see the exported snapshot. now you can simply do clone_snapshot snapshot_name, new_table_name and you're restoring a snapshot on the second cluster assuming that you have removed the snapshot from cluster1 and you want to export back your snapshot.. you just use ExportSnapshot again to move the snapshot from cluster2 to cluster1 and same as before you do a clone_snapshot to restore it Matteo On Thu, Aug 1, 2013 at 2:35 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Hi there, I am testing out newly added snapshot capability, ExportSnapshot in
Re: Import HBase snapshots possible?
Its failing with '//' as well as '///'. Error suggests that it needs local fs. 3 /// ssk01:~/siddharth/tools/hbase-0.95.1-hadoop1 # ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dhbase.rootdir=hdfs:///10.209.17.88:9000/hbase/s2 -snapshot s2 -copy-to /root/siddharth/tools/hbase-0.95.1-hadoop1/data/ Exception in thread main java.io.IOException: Incomplete HDFS URI, no host: hdfs:///10.209.17.88:9000/hbase/s2 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:85) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:860) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:594) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:690) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:694) 2 // ssk01:~/siddharth/tools/hbase-0.95.1-hadoop1 # ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dhbase.rootdir=hdfs://10.209.17.88:9000/hbase/s2 -snapshot s2 -copy-to /root/siddharth/tools/hbase-0.95.1-hadoop1/data/ Exception in thread main java.lang.IllegalArgumentException: Wrong FS: hdfs://10.209.17.88:9000/hbase/s2/.hbase-snapshot/s2/.snapshotinfo, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:393) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251) at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:125) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427) at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:296) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:371) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:618) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:690) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:694) Btw, I tried one more thing. From my HDFS location, I just did a copy like - ssk01:~/siddharth/tools/hadoop-1.1.2 # ./bin/hadoop fs -copyToLocal hdfs://10.209.17.88:9000/hbase/s1/.hbase-snapshot/s1 /root/siddharth/tools/hbase-0.95.1-hadoop1/data/.hbase-snapshot/ After doing this, I am able to see s1 in 'list_snapshots'. But it is failing at 'clone_snapshot'. hbase(main):014:0 clone_snapshot 's1', 'ts1' ERROR: java.io.IOException: Table 'ts1' not yet enabled, after 199617ms. Here is some help for this command: Create a new table by cloning the snapshot content. There're no copies of data involved. And writing on the newly created table will not influence the snapshot data. Examples: hbase clone_snapshot 'snapshotName', 'tableName' On Thu, Aug 1, 2013 at 7:44 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: you have to use 3 slashes otherwise is interpreted as local file-system path -Dhbase.rootdir=hdfs:///10.209.17.88:9000/hbase Matteo On Thu, Aug 1, 2013 at 3:09 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Tried what you suggested. 
Here is what I get - ssk01:~/siddharth/tools/hbase-0.95.1-hadoop1 # ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dhbase.rootdir=hdfs://10.209.17.88:9000/hbase -snapshot s1 -copy-to /root/siddharth/tools/hbase-0.95.1-hadoop1/data/ Exception in thread main java.lang.IllegalArgumentException: Wrong FS: hdfs://10.209.17.88:9000/hbase/.hbase-snapshot/s1/.snapshotinfo, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:393) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:125) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283) at
Re: Import HBase snapshots possible?
You can't just copy the .snapshot folder... so now you have the RSs failing, since the files for the cloned table are not available. When you specify hbase.rootdir, you have to specify the one in /etc/hbase-site.xml, which doesn't contain the name of the snapshot/table that you want to export (e.g. hdfs://10.209.17.88:9000/hbase, not hdfs://10.209.17.88:9000/hbase/s2) Matteo On Thu, Aug 1, 2013 at 3:25 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: It's failing with '//' as well as '///'. Error suggests that it needs local fs. 3 /// ssk01:~/siddharth/tools/hbase-0.95.1-hadoop1 # ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dhbase.rootdir=hdfs:///10.209.17.88:9000/hbase/s2 -snapshot s2 -copy-to /root/siddharth/tools/hbase-0.95.1-hadoop1/data/ Exception in thread main java.io.IOException: Incomplete HDFS URI, no host: hdfs:///10.209.17.88:9000/hbase/s2 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:85) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:860) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:594) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:690) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:694) 2 // ssk01:~/siddharth/tools/hbase-0.95.1-hadoop1 # ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dhbase.rootdir=hdfs://10.209.17.88:9000/hbase/s2 -snapshot s2 -copy-to /root/siddharth/tools/hbase-0.95.1-hadoop1/data/ Exception in thread main 
java.lang.IllegalArgumentException: Wrong FS: hdfs://10.209.17.88:9000/hbase/s2/.hbase-snapshot/s2/.snapshotinfo, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:393) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:125) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427) at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:296) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:371) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:618) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:690) at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:694) Btw, I tried one more thing. From my HDFS location, I just did a copy like - ssk01:~/siddharth/tools/hadoop-1.1.2 # ./bin/hadoop fs -copyToLocal hdfs://10.209.17.88:9000/hbase/s1/.hbase-snapshot/s1 /root/siddharth/tools/hbase-0.95.1-hadoop1/data/.hbase-snapshot/ After doing this, I am able to see s1 in 'list_snapshots'. But it is failing at 'clone_snapshot'. hbase(main):014:0 clone_snapshot 's1', 'ts1' ERROR: java.io.IOException: Table 'ts1' not yet enabled, after 199617ms. Here is some help for this command: Create a new table by cloning the snapshot content. There're no copies of data involved. And writing on the newly created table will not influence the snapshot data. 
Examples: hbase clone_snapshot 'snapshotName', 'tableName' On Thu, Aug 1, 2013 at 7:44 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: you have to use 3 slashes otherwise is interpreted as local file-system path -Dhbase.rootdir=hdfs:///10.209.17.88:9000/hbase Matteo On Thu, Aug 1, 2013 at 3:09 PM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Tried what you suggested. Here is what I get - ssk01:~/siddharth/tools/hbase-0.95.1-hadoop1 # ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dhbase.rootdir=hdfs://10.209.17.88:9000/hbase -snapshot s1 -copy-to /root/siddharth/tools/hbase-0.95.1-hadoop1/data/ Exception in thread main java.lang.IllegalArgumentException: Wrong FS: hdfs://10.209.17.88:9000/hbase/.hbase-snapshot/s1/.snapshotinfo, expected: file:/// at
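Putting Matteo's correction together with the errors in this thread: hbase.rootdir must be the cluster root from hbase-site.xml (no snapshot/table suffix), and it helps to spell out the destination as a full filesystem URI. A hedged sketch — the IP, port and snapshot name come from the thread, but whether this ExportSnapshot build accepts a file:// destination is an assumption worth verifying:

```shell
# rootdir is the cluster root (no /s2 suffix). The "Wrong FS ... expected:
# file:///" error suggests the client's default filesystem is file:///, so
# using explicit URIs on both sides avoids any ambiguity in path resolution.
bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -D hbase.rootdir=hdfs://10.209.17.88:9000/hbase \
    -snapshot s1 \
    -copy-to file:///root/siddharth/tools/hbase-0.95.1-hadoop1/data
```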
Why HBase integation with Hive makes Hive slow
Hi, I have a cluster (1 master + 3 slaves) running Hive, HBase, and Hadoop. In order to do a daily row-level update routine, we need to integrate HBase with Hive, but the performance is not good. E.g. there are 2 tables in Hive: hbase_table, an HBase table created via Hive, and hive_table, a native Hive table; both hold the same data set. When running: select count(*) from hbase_table; === takes 500 s select count(*) from hive_table; === takes 6 s I have tried a lot of queries on the two tables, but hbase_table is always very slow. To be clear, I created the hbase_table as below: CREATE TABLE hbase_table ( idvisite string, client_list array<string>, nb_client int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,clients:id_list,clients:nb") TBLPROPERTIES ("hbase.table.name" = "table_test"); And my HBase is in pseudo-distributed mode. I guess that, at the beginning of a Hive query execution, Hive will load data from HBase, where the SerDe takes a long time. Could someone tell me how to improve this poor performance? Is it caused by a wrongly configured integration? Is a fully-distributed mode needed here? Thank you in advance for your time. Hao. -- Hao Ren ClaraVista www.claravista.fr
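A scan through HBaseStorageHandler fetches rows from the RegionServers over RPC, while a native Hive table is read directly from files in HDFS, so a large gap on full-table queries is expected rather than necessarily a misconfiguration. One common workaround (a sketch; hive_copy is an illustrative name, not from the thread) is to materialise the HBase-backed table into a native Hive table once per batch cycle and run the heavy queries on the copy:

```shell
# Refresh a native-format copy of the HBase-backed table, then query the copy.
hive -e "
  DROP TABLE IF EXISTS hive_copy;
  CREATE TABLE hive_copy AS SELECT * FROM hbase_table;
  SELECT COUNT(*) FROM hive_copy;
"
```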
ETL like merge databases to HBase
Hi All, I have a use case: a few applications running independently, let's say applications A, B, C. Each has a DB associated. I want an aggregated view on all the databases so that I don't have to jump into different DBs to find the info I need. Is there a tool out there that allows me to move data from A, B, C to a single/centralised HBase cluster? It would be even nicer if, when DB A, B or C gets updated by the apps, the updates were synchronised to HBase too. -- All the best, Shengjie Min
Re: ETL like merge databases to HBase
bq. Each has a DB associated They're RDBMSs, I assume? Have you looked at Sqoop? On Thu, Aug 1, 2013 at 8:04 AM, Shengjie Min kelvin@gmail.com wrote: Hi All, I have a use case: a few applications running independently, let's say applications A, B, C. Each has a DB associated. I want an aggregated view on all the databases so that I don't have to jump into different DBs to find the info I need. Is there a tool out there that allows me to move data from A, B, C to a single/centralised HBase cluster? It would be even nicer if, when DB A, B or C gets updated by the apps, the updates were synchronised to HBase too. -- All the best, Shengjie Min
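If the three DBs are JDBC-accessible RDBMSs, a Sqoop 1 import can land each of them in the same HBase table under its own column family. A hedged sketch — the connection string, table, key column and HBase table names are placeholders, and the flags should be checked against your Sqoop version:

```shell
# One import per source database; repeat with db-b / db-c and their own
# column families to build the aggregated view in a single HBase table.
sqoop import \
    --connect jdbc:mysql://db-a-host/app_a \
    --username reader -P \
    --table customers \
    --hbase-table aggregated_view \
    --column-family app_a \
    --hbase-row-key id \
    --hbase-create-table

# Periodic re-runs with --incremental lastmodified (plus --check-column and
# --last-value) approximate the "keep HBase synchronised" requirement as
# batch updates; true real-time sync would need CDC/trigger-based tooling.
```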
Re: Recursive delete upon cleanup
On Tue, Jul 30, 2013 at 6:19 PM, Ted Yu yuzhih...@gmail.com wrote: I searched the HBase 0.94 code base, and the hadoop 1 and hadoop 2 code bases. I didn't find where 'Try with recursive flag' was logged. Mind giving us a bit more information on the Hadoop / HBase releases you were using? You're right, that error message is coming from our filesystem client. I think that the HBase version is 0.94.5, but I'll need to look into that later (more pressing things have arisen, of course). Thank you for the help. rone
Re: AssignmentManager looping?
Something went wrong with split. It should be easy to fix your cluster. However, it will be more interesting to find out how it happened. Do you remember what has happened since it was good previously? Do you have all the logs? On Thu, Aug 1, 2013 at 7:08 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: I tried to remove the znodes but got the same result. So I shutted down all the RS and restarted HBase, and now I have 0 regions for this table. Running HBCK. Seems that it has a lot to do... 2013/8/1 Kevin O'dell kevin.od...@cloudera.com Yes you can if HBase is down, first I would copy .META out of HDFS local and then you can search it for split issues. Deleting those znodes should clear this up though. On Aug 1, 2013 8:52 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: I can't check the meta since HBase is down. Regarding HDFS, I took few random lines like: 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 28328fdb7181cbd9cc4d6814775e8895 not found on server node4,60020,1375319042033; failed processing 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 28328fdb7181cbd9cc4d6814775e8895 from server node4,60020,1375319042033 but it doesn't exist anymore, probably already processed its split And each time, there is nothing like that. hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -lsr / | grep 28328fdb7181cbd9cc4d6814775e8895 On ZK side: [zk: localhost:2181(CONNECTED) 3] ls /hbase/splitlog [zk: localhost:2181(CONNECTED) 10] ls /hbase/unassigned [28328fdb7181cbd9cc4d6814775e8895, a8781a598c46f19723a2405345b58470, b7ebfeb63b10997736fd12920fde2bb8, d95bb27cc026511c2a8c8ad155e79bf6, 270a9c371fcbe9cd9a04986e0b77d16b, aff4d1d8bf470458bb19525e8aef0759] Can I just delete those zknodes? Worst case hbck will find them back from HDFS if required? JM 2013/8/1 Kevin O'dell kevin.od...@cloudera.com Does it exist in meta or hdfs? 
On Aug 1, 2013 8:24 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: My master keep logging that: 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:52:59,636 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,636 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 
2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:53:00,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:53:00,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but
Pagination with HBase
Hi! Is there a way to scan an HBase table getting, for example, the first 100 results, then later get the next 100 and so on... just like in SQL we do with LIMIT and OFFSET? Jonathan Cardoso, Universidade Federal de Goias
Re: AssignmentManager looping?
Hi Jimmy, I should still have all the logs. What I did is pretty simple. I tried to turn the cluster off while a single-region 250GB table was under major_compaction to get split. I will tar-gz all the logs for the last few days and make that available. On the other side, I'm still not able to bring it back up... JM 2013/8/1 Jimmy Xiang jxi...@cloudera.com Something went wrong with split. It should be easy to fix your cluster. However, it will be more interesting to find out how it happened. Do you remember what has happened since it was good previously? Do you have all the logs? On Thu, Aug 1, 2013 at 7:08 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: I tried to remove the znodes but got the same result. So I shutted down all the RS and restarted HBase, and now I have 0 regions for this table. Running HBCK. Seems that it has a lot to do... 2013/8/1 Kevin O'dell kevin.od...@cloudera.com Yes you can if HBase is down, first I would copy .META out of HDFS local and then you can search it for split issues. Deleting those znodes should clear this up though. On Aug 1, 2013 8:52 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: I can't check the meta since HBase is down. Regarding HDFS, I took few random lines like: 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 28328fdb7181cbd9cc4d6814775e8895 not found on server node4,60020,1375319042033; failed processing 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 28328fdb7181cbd9cc4d6814775e8895 from server node4,60020,1375319042033 but it doesn't exist anymore, probably already processed its split And each time, there is nothing like that. 
hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -lsr / | grep 28328fdb7181cbd9cc4d6814775e8895 On ZK side: [zk: localhost:2181(CONNECTED) 3] ls /hbase/splitlog [zk: localhost:2181(CONNECTED) 10] ls /hbase/unassigned [28328fdb7181cbd9cc4d6814775e8895, a8781a598c46f19723a2405345b58470, b7ebfeb63b10997736fd12920fde2bb8, d95bb27cc026511c2a8c8ad155e79bf6, 270a9c371fcbe9cd9a04986e0b77d16b, aff4d1d8bf470458bb19525e8aef0759] Can I just delete those zknodes? Worst case hbck will find them back from HDFS if required? JM 2013/8/1 Kevin O'dell kevin.od...@cloudera.com Does it exist in meta or hdfs? On Aug 1, 2013 8:24 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: My master keep logging that: 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:52:59,636 
WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:52:59,636 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split 2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing 2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server
Re: Pagination with HBase
Use http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#getStartRow() and http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#getStopRow()

int i = 1;   // start row
int j = 100; // end row
while (someCondition) {
    // i denotes the start row and j the end row; convert them to
    // (zero-padded) strings to keep the row keys human-readable
    Scan scan = new Scan(Bytes.toBytes(String.valueOf(i)), Bytes.toBytes(String.valueOf(j)));
    scan.addColumn(columnFamily, columnQualifier);
    // do something with the scan results
    i += 100;
    j += 100;
}

On Thu, Aug 1, 2013 at 10:31 PM, Jonathan Cardoso jonathancar...@gmail.com wrote: Hi! Is there a way to scan a HBase table getting, for example, the first 100 results, then later get the next 100 and so on... Just like in SQL we do with LIMIT and OFFSET? *Jonathan Cardoso* *Universidade Federal de Goias* -- Regards- Pavan
Re: Pagination with HBase
Take a look at ColumnPaginationFilter.java and its unit test. Cheers On Thu, Aug 1, 2013 at 10:01 AM, Jonathan Cardoso jonathancar...@gmail.com wrote: Hi! Is there a way to scan a HBase table getting, for example, the first 100 results, then later get the next 100 and so on... Just like in SQL we do with LIMIT and OFFSET? *Jonathan Cardoso* *Universidade Federal de Goias*
Re: Pagination with HBase
@Jonathan Ted Yu is right! Ignore my mail :) On Thu, Aug 1, 2013 at 10:46 PM, Ted Yu yuzhih...@gmail.com wrote: Take a look at ColumnPaginationFilter.java and its unit test. Cheers On Thu, Aug 1, 2013 at 10:01 AM, Jonathan Cardoso jonathancar...@gmail.com wrote: Hi! Is there a way to scan a HBase table getting, for example, the first 100 results, then later get the next 100 and so on... Just like in SQL we do with LIMIT and OFFSET? *Jonathan Cardoso* *Universidade Federal de Goias* -- Regards- Pavan
Re: AssignmentManager looping?
JM, Stop HBase; rmr /hbase from zkcli; Sideline META; Run offline meta repair; Start HBase. On Aug 1, 2013 1:01 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Jimmy, I should still have all the logs. What I did is pretty simple. I tried to turn the cluster off while a single-region 250GB table was under major_compaction to get split. I will tar.gz all the logs for the last few days and make them available. On the other side, I'm still not able to bring it back up... JM 2013/8/1 Jimmy Xiang jxi...@cloudera.com Something went wrong with the split. It should be easy to fix your cluster. However, it will be more interesting to find out how it happened. Do you remember what happened since it was good previously? Do you have all the logs? On Thu, Aug 1, 2013 at 7:08 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: I tried to remove the znodes but got the same result. So I shut down all the RS and restarted HBase, and now I have 0 regions for this table. Running HBCK. Seems that it has a lot to do... 2013/8/1 Kevin O'dell kevin.od...@cloudera.com Yes you can if HBase is down; first I would copy .META. out of HDFS locally, and then you can search it for split issues. Deleting those znodes should clear this up though. On Aug 1, 2013 8:52 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: I can't check the meta since HBase is down. Regarding HDFS, I took a few random lines like: 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 28328fdb7181cbd9cc4d6814775e8895 not found on server node4,60020,1375319042033; failed processing 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 28328fdb7181cbd9cc4d6814775e8895 from server node4,60020,1375319042033 but it doesn't exist anymore, probably already processed its split And each time, there is nothing like that.
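Kevin's step list above can be sketched roughly as the following command sequence. This is a hedged sketch for an HBase 0.94-era deployment only: the script locations, the sideline directory name, and the exact way of issuing rmr from zkcli are assumptions to adapt to your layout, not a tested procedure.

```shell
# Hedged sketch of the recovery sequence above (HBase 0.94-era tooling).
# Paths and the sideline directory are assumptions -- adjust for your layout.

bin/stop-hbase.sh                      # 1. Stop HBase

# 2. From the ZooKeeper shell, wipe HBase's transient state:
bin/hbase zkcli                        #    then, at the zkcli prompt: rmr /hbase

# 3. Sideline META so the offline repair starts from a clean slate:
bin/hadoop fs -mv /hbase/.META. /hbase.META.sidelined

# 4. Rebuild META offline from the region directories in HDFS:
bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair

bin/start-hbase.sh                     # 5. Start HBase, then verify with hbck
```

As discussed later in the thread, invalid reference files can still block log splitting after this, so checking the region server logs afterwards is prudent.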
Re: AssignmentManager looping?
If that doesn't work, you probably have an invalid reference file, and you will find that in the RS logs for the HLog split that never finishes. On Aug 1, 2013 1:38 PM, Kevin O'dell kevin.od...@cloudera.com wrote: JM, Stop HBase; rmr /hbase from zkcli; Sideline META; Run offline meta repair; Start HBase.
Re: Pagination with HBase
If you need more insight into HBase pagination, these links might help you: http://search-hadoop.com/m/feqnAUeLR1 http://search-hadoop.com/m/m5zM2rTSkb On Thu, Aug 1, 2013 at 10:18 AM, Pavan Sudheendra pavan0...@gmail.com wrote: @Jonathan Ted Yu is right! Ignore my mail :) On Thu, Aug 1, 2013 at 10:46 PM, Ted Yu yuzhih...@gmail.com wrote: Take a look at ColumnPaginationFilter.java and its unit test. Cheers On Thu, Aug 1, 2013 at 10:01 AM, Jonathan Cardoso jonathancar...@gmail.com wrote: Hi! Is there a way to scan a HBase table getting, for example, the first 100 results, then later get the next 100 and so on... Just like in SQL we do with LIMIT and OFFSET? *Jonathan Cardoso* *Universidade Federal de Goias* -- Regards- Pavan -- Thanks Regards, Anil Gupta
Re: AssignmentManager looping?
So I had to remove a few reference files and run hbck a few times to get everything back online. Summary: don't stop your cluster while it's major compacting huge tables ;) Thanks all! JM 2013/8/1 Kevin O'dell kevin.od...@cloudera.com If that doesn't work, you probably have an invalid reference file, and you will find that in the RS logs for the HLog split that never finishes. On Aug 1, 2013 1:38 PM, Kevin O'dell kevin.od...@cloudera.com wrote: JM, Stop HBase; rmr /hbase from zkcli; Sideline META; Run offline meta repair; Start HBase.
Re: AssignmentManager looping?
Jimmy, Sounds like our dreaded reference file issue again. I spoke with JM and he is going to try to reproduce this. My gut tells me our point of no return may be in the wrong place due to some code change along the way, but hbck could also just be doing something wonky. JM, this cluster is not CM managed, correct? On Aug 1, 2013 1:49 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: So I had to remove a few reference files and run hbck a few times to get everything back online. Summary: don't stop your cluster while it's major compacting huge tables ;) Thanks all! JM
Re: Pagination with HBase
Thanks! I've tested ColumnPaginationFilter but it's not exactly what I need. Correct me if I'm wrong, please, but ColumnPaginationFilter filters the columns of the result: how many of them will be retrieved is based on the settings of the 'limit' and 'offset' properties. But I need to make a Scan and get only the first X rows, not the first X columns. *Jonathan Cardoso* *Universidade Federal de Goias* 2013/8/1 anil gupta anilgupt...@gmail.com If you need more insight into HBase pagination, these links might help you: http://search-hadoop.com/m/feqnAUeLR1 http://search-hadoop.com/m/m5zM2rTSkb On Thu, Aug 1, 2013 at 10:18 AM, Pavan Sudheendra pavan0...@gmail.com wrote: @Jonathan Ted Yu is right! Ignore my mail :) On Thu, Aug 1, 2013 at 10:46 PM, Ted Yu yuzhih...@gmail.com wrote: Take a look at ColumnPaginationFilter.java and its unit test. Cheers On Thu, Aug 1, 2013 at 10:01 AM, Jonathan Cardoso jonathancar...@gmail.com wrote: Hi! Is there a way to scan a HBase table getting, for example, the first 100 results, then later get the next 100 and so on... Just like in SQL we do with LIMIT and OFFSET? *Jonathan Cardoso* *Universidade Federal de Goias* -- Regards- Pavan -- Thanks Regards, Anil Gupta
Re: Pagination with HBase
From anil's links, I guess what I should use is PageFilter instead of ColumnPaginationFilter. *Jonathan Cardoso* *Universidade Federal de Goias* 2013/8/1 Jonathan Cardoso jonathancar...@gmail.com Thanks! I've tested ColumnPaginationFilter but it's not exactly what I need. Correct me if I'm wrong, please, but ColumnPaginationFilter filters the columns of the result: how many of them will be retrieved is based on the settings of the 'limit' and 'offset' properties. But I need to make a Scan and get only the first X rows, not the first X columns. *Jonathan Cardoso* *Universidade Federal de Goias*
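A minimal sketch of that PageFilter approach, assuming an HBase 0.94-style client API (the class name, table handle, and page size are illustrative, not from this thread). Note that PageFilter caps rows per region server, not per scan, so the client must still enforce the limit itself and remember the last row key to resume the next page:

```java
import java.util.Arrays;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;

public class RowPagination {

    // Scans one "page" of rows and returns the last row key seen
    // (null when the table is exhausted). Pass the returned key back
    // in as startRow to fetch the next page.
    static byte[] scanPage(HTable table, byte[] startRow, int pageSize) throws Exception {
        Scan scan = new Scan();
        if (startRow != null) {
            // Resume just AFTER the previous page's last row: appending a
            // zero byte gives the smallest key strictly greater than it.
            scan.setStartRow(Arrays.copyOf(startRow, startRow.length + 1));
        }
        scan.setFilter(new PageFilter(pageSize));
        byte[] lastRow = null;
        int count = 0;
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // ... process r ...
                lastRow = r.getRow();
                if (++count >= pageSize) break; // enforce the limit client-side too
            }
        } finally {
            scanner.close();
        }
        return lastRow;
    }
}
```

Unlike SQL's OFFSET, there is no way to skip rows server-side; resuming from the last seen key is the usual substitute.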
Re: AssignmentManager looping?
No, it's an HBase 0.94.10 cluster with Hadoop 1.0.3, everything installed manually from JARs ;) It's a mess to monitor and I would have loved to have it under CM, but I have to deal with that ;) I'm building a 2nd cluster at home so I will be able to replicate this one to the other, which might allow me to play even further with it... I will try to reproduce the issue; give me just a couple of hours... JM 2013/8/1 Kevin O'dell kevin.od...@cloudera.com Jimmy, Sounds like our dreaded reference file issue again. I spoke with JM and he is going to try to reproduce this. My gut tells me our point of no return may be in the wrong place due to some code change along the way, but hbck could also just be doing something wonky. JM, this cluster is not CM managed, correct?
Re: AssignmentManager looping?
It will be great if you can reproduce this issue. One thing to keep in mind is not to run hbck (repair) in this case, since hbck may have some problem handling the split parent properly. By the way, in trunk, region split uses a multi-row mutation to update meta, which is more reliable, so I think the issue should have been fixed in trunk. On Thu, Aug 1, 2013 at 11:07 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: No, it's an HBase 0.94.10 cluster with Hadoop 1.0.3, everything installed manually from JARs ;) It's a mess to monitor and I would have loved to have it under CM, but I have to deal with that ;) I'm building a 2nd cluster at home so I will be able to replicate this one to the other, which might allow me to play even further with it... I will try to reproduce the issue; give me just a couple of hours... JM
Re: Region size per region on the table page
Hi Jean-Marc, You are right, Hannibal does that, but it's a separate process we need to install/maintain. I thought it would be nice to have a quick and easy way to see it from the master-status page. The stats are already on the regionserver page (like the total size of the store); it would just make sense to have them on the table page too (IMO) to understand the data size distribution of the regions of a particular table. Samar On 01/08/13 5:51 PM, Jean-Marc Spaggiari wrote: Hi Samar, Hannibal is already doing what you are looking for. Cheers, JMS 2013/8/1 samar.opensource samar.opensou...@gmail.com Hi Devs/Users, Most of the time we want to know if our table split logic is accurate or if our current regions are well balanced for a table. I was wondering if we can expose the size of each region on table.jsp too, in the table region table. If people think it is useful I can pick it up. Also let me know if it already exists. Samar
Re: Why HBase integration with Hive makes Hive slow
Need to set scanner caching, otherwise each call to next() will be a network RTT. From: Hao Ren h@claravista.fr To: user@hbase.apache.org Sent: Thursday, August 1, 2013 7:45 AM Subject: Why HBase integration with Hive makes Hive slow Hi, I have a cluster (1 master + 3 slaves) on which there are Hive, HBase, and Hadoop. In order to do a daily row-level update routine, we need to integrate HBase with Hive, but the performance is not good. E.g. there are 2 tables in Hive: hbase_table, an HBase table created via Hive, and hive_table, a native Hive table; both hold the same data set. When running: select count(*) from hbase_table; === takes 500 s select count(*) from hive_table; === takes 6 s I have tried a lot of queries on the two tables, but hbase_table is always very slow. To be clear, I created the hbase_table as below: CREATE TABLE hbase_table ( idvisite string, client_list array<string>, nb_client int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,clients:id_list,clients:nb") TBLPROPERTIES("hbase.table.name" = "table_test"); And my HBase is in pseudo-distributed mode. I guess, at the beginning of a Hive query execution, Hive will load data from HBase, where the serde takes a long time. Could someone tell me how to improve this poor performance? Is this caused by a wrongly configured integration? Is a fully-distributed mode needed here? Thank you in advance for your time. Hao. -- Hao Ren ClaraVista www.claravista.fr
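A hedged sketch of the scanner-caching advice above, applied in a Hive session. The assumption here is that the HBase storage handler picks up `hbase.client.scanner.caching` from the session configuration; with the 0.94-era default of 1, every `next()` is a round trip, which matches the 500 s vs 6 s gap reported.

```sql
-- Fetch many rows per RegionServer round trip instead of one.
-- (Property name assumed to be forwarded to the HBase scan by the handler.)
SET hbase.client.scanner.caching=1000;

SELECT count(*) FROM hbase_table;
```

Tune the value to your row size: very large rows with a high caching value can pressure client and server memory.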
HDFS Restart with Replication
I'm running: CDH4.1.2 HBase 0.92.1 Hadoop 2.0.0 Is there an issue with restarting a standby cluster with replication running? I am doing the following on the standby cluster: - stop hmaster - stop name_node - start name_node - start hmaster When the name node comes back up, it's reliably missing blocks. I started with 0 missing blocks, and have run through this scenario a few times, and am up to 46 missing blocks, all from the table that is the standby for our production table (in a different datacenter). The missing blocks all are from the same table, and look like: blk_-2036986832155369224 /hbase/splitlog/data01.sea01.staging.tdb.com ,60020,1372703317824_hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com %3A8020%2Fhbase%2F.logs%2Fdata05.sea01.staging.tdb.com %2C60020%2C1373557074890-splitting%2Fdata05.sea01.staging.tdb.com %252C60020%252C1373557074890.1374960698485/tempodb-data/c9cdd64af0bfed70da154c219c69d62d/recovered.edits/01366319450.temp Do I have to stop replication before restarting the standby? Thanks, Patrick
Reload configs
Is there a way to reload the HBase configs without restarting the whole system (in other words, without an interruption of service)? I'm on: CDH4.1.2 HBase 0.92.1 Hadoop 2.0.0 Thanks, Patrick
Re: Region size per region on the table page
Hi, Bryan. If you file an issue for that, it would be nice to work on it. 2013/8/1 Bryan Beaudreault bbeaudrea...@hubspot.com Hannibal is very useful, but samar is right. It's another thing to install and maintain. I'd hope that over time the need for tools like Hannibal would be lessened as some of the features make their way into the main install. Hannibal does its work through crawling log files, whereas some (or all) of the data it provides could be provided through the HBase API, and thus the admin UI, in a less hacky way. If someone were willing to invest the time in adding such a metric to the HBase admin UI (and HBaseAdmin API please) it would bring us one step closer. -- Marcos Ortiz Valmaseda Product Manager at PDVSA http://about.me/marcosortiz
Re: Reload configs
Hi Patrick, I'd say it depends on the configuration you want to change. You can do a rolling restart so there is no service interruption. JM
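A minimal sketch of the rolling restart JM suggests, using the `graceful_stop.sh` script shipped with HBase. It drains regions off a RegionServer, restarts it, and moves the regions back, so clients keep being served throughout. Hostnames are placeholders; run from the HBase install directory on a node that can SSH to the others.

```shell
# Push the edited configs out first (mechanism is site-specific), then
# restart each RegionServer in turn with its regions gracefully moved away.
for rs in data01 data02 data03; do
  bin/graceful_stop.sh --restart --reload "$rs"
done
```

Note that some settings (e.g. master-side ones) still require a master restart, and a handful of configs are only read at full cluster startup, so check each property before assuming a rolling restart suffices.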
Re: HDFS Restart with Replication
I can't think of a way your missing blocks would be related to HBase replication; there's something else going on. Are all the datanodes checking back in? J-D
Re: HDFS Restart with Replication
Yup, 14 datanodes, all check back in. However, all of the corrupt files seem to be split logs from data05. This is true even though I've done several restarts (each restart adding a few missing blocks). There's nothing special about data05, and it seems to be in the cluster the same as anyone else.
Re: HDFS Restart with Replication
Can you follow the life of one of those blocks through the NameNode and DataNode logs? I'd suggest you start by doing an fsck on one of those files with the option that gives the block locations first. By the way, why do you have split logs? Are region servers dying every time you try out something?
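A sketch of the fsck-then-grep workflow J-D describes. The path and block id come from the report earlier in this thread; the log directory is an assumption (CDH commonly uses `/var/log/hadoop*`), so adjust to your installation.

```shell
# List the files under the split-log directory with their blocks and
# the DataNodes each block lives on
hadoop fsck /hbase/splitlog -files -blocks -locations

# Then follow one missing block through the NameNode/DataNode logs
# (log path is an assumption; block id taken from the report above)
grep -r "blk_-2036986832155369224" /var/log/hadoop/
```

The NameNode log entries for the block (allocation, replication, deletion) usually show whether it was ever fully written or was removed as part of log-splitting cleanup.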
Re: Region size per region on the table page
Created https://issues.apache.org/jira/browse/HBASE-9113
Re: Excessive .META scans
Just patched 6870 and it immediately fixed the problem! On Tue, Jul 30, 2013 at 12:57 PM, Stack st...@duboce.net wrote: Try turning off http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#setRegionCachePrefetch(byte[], boolean) St.Ack On Tue, Jul 30, 2013 at 11:27 AM, Varun Sharma va...@pinterest.com wrote: JD, it's a big problem. The region server holding .META. has 2X the network traffic and 2X the CPU load; I can easily spot the region server holding .META. by just looking at the Ganglia graphs of the region servers side by side - I don't need to go to the master console. So we can't scale up the cluster or add more load since it's bottlenecked on this one region server. Thanks Nicolas for the pointer, it seems quite probable that this is the issue - it was fixed in 0.94.8 so we don't have it. I will give it a shot. On Mon, Jul 29, 2013 at 10:43 AM, Nicolas Liochon nkey...@gmail.com wrote: It could be HBASE-6870? On Mon, Jul 29, 2013 at 7:37 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Can you tell who's doing it? You could enable IPC debug for a few secs to see who's coming in with scans. You could also try to disable pre-fetching: set hbase.client.prefetch.limit to 0. Also, is it even causing a problem, or are you just worried it might since it doesn't look normal? J-D On Mon, Jul 29, 2013 at 10:32 AM, Varun Sharma va...@pinterest.com wrote: Hi folks, We are seeing an issue with hbase 0.94.3 on CDH 4.2.0 with excessive .META. reads... In the steady state, where there are no client crashes and there are no region server crashes/region movement, the server holding .META. is serving an incredibly large # of read requests on the .META. table. From my understanding, in the steady state, region locations should be indefinitely cached in the client. The client is running a workload of multiput(s), puts, gets and coprocessor calls. Thanks Varun
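For reference, a hedged sketch of the prefetch workaround Stack points at, against the 0.94-era client API. The table name is made up; on affected versions (pre-HBASE-6870 fix), disabling region cache prefetch stops clients from scanning large swaths of .META. on every new table handle.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class DisableMetaPrefetch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        byte[] tableName = Bytes.toBytes("mytable"); // placeholder name

        // Static setter: turns off .META. prefetch for this table for
        // clients built from this Configuration
        HTable.setRegionCachePrefetch(conf, tableName, false);

        HTable table = new HTable(conf, tableName);
        // ... normal puts/gets; region locations are now looked up lazily
        // and cached per region instead of prefetched in bulk from .META.
        table.close();
    }
}
```

The same knob exists as a config-only setting (`hbase.client.prefetch.limit` = 0, mentioned by J-D above), which avoids a code change.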
Hitting HBASE-7693 with hbase-0.94.9
Hello list, Although the issue https://issues.apache.org/jira/browse/HBASE-7693 has been fixed, it looks like i'm hitting it. *Environment :* hadoop-1.1.2 hbase-0.94.9 OS X 10.8.4 (12E55) I'd really appreciate if somebody could throw some light. Here is the trace I see when I run my MR job against HBase : 2013-08-02 05:45:20.636 java[37884:1203] Unable to load realm mapping info from SCDynamicStore 13/08/02 05:45:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 13/08/02 05:45:21 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 13/08/02 05:45:22 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 13/08/02 05:45:22 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT 13/08/02 05:45:22 INFO zookeeper.ZooKeeper: Client environment:host.name =192.168.0.100 13/08/02 05:45:22 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_51 13/08/02 05:45:22 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Apple Inc. 
13/08/02 05:45:22 INFO zookeeper.ZooKeeper: Client environment:java.home=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home 13/08/02 05:45:22 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/workspace/hbasemr2/bin:/Users/miqbal1/hadoop-eco/hbase-0.94.9/lib/zookeeper-3.4.5.jar:/Users/miqbal1/hadoop-eco/hbase-0.94.9/lib/guava-11.0.2.jar:/Users/miqbal1/hadoop-eco/hbase-0.94.9/hbase-0.94.9.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/hadoop-core-1.1.2.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/asm-3.2.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/aspectjrt-1.6.11.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/aspectjtools-1.6.11.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-beanutils-1.7.0.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-beanutils-core-1.8.0.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-cli-1.2.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-codec-1.4.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-collections-3.2.1.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-configuration-1.6.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-daemon-1.0.1.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-digester-1.8.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-el-1.0.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-httpclient-3.0.1.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-io-2.1.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-lang-2.4.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-logging-1.1.1.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-logging-api-1.0.4.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-math-2.1.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/commons-net-3.1.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/core-3.1.1.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/hadoop-capacity-scheduler-1.1.2.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/hadoop-fairscheduler-1.1.2.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1
.2/lib/hadoop-thriftfs-1.1.2.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/hsqldb-1.8.0.10.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jackson-core-asl-1.8.8.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jackson-mapper-asl-1.8.8.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jasper-compiler-5.5.12.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jasper-runtime-5.5.12.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jdeb-0.8.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jersey-core-1.8.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jersey-json-1.8.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jersey-server-1.8.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jets3t-0.6.1.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jetty-6.1.26.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jetty-util-6.1.26.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/jsch-0.1.42.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/junit-4.5.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/kfs-0.2.2.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/log4j-1.2.15.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/mockito-all-1.8.5.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/oro-2.0.8.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/servlet-api-2.5-20081211.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/slf4j-api-1.4.3.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/slf4j-log4j12-1.4.3.jar:/Users/miqbal1/hadoop-eco/hadoop-1.1.2/lib/xmlenc-0.52.jar:/Users/miqbal1/ZIPS-N-JARS/protobuf-java-2.4.1.jar 13/08/02 05:45:22 INFO zookeeper.ZooKeeper: Client environment:java.library.path=.:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java 13/08/02 05:45:22 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/var/folders/n3/d0ghj1ln2zl0kpd8zkz4zf04mdm1y2/T/
importtsv issue
Hi all, I use the importtsv tool to load data into HBase, but I only loaded about 5GB of data, and HDFS looks like this:

Node       Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks  Block Pool Used (GB)  Block Pool Used (%)  Failed Volumes
hydra0003  1             In Service   92.23                     72.22      3.98               16.03           78.30     17.39          1213    72.22                 78.30                0
hydra0004  0             In Service   92.23                     69.34      3.92               18.97           75.18     20.57          1183    69.34                 75.18                0
hydra0005  2             In Service   176.62                    59.95      57.61              59.07           33.94     33.44          987     59.95                 33.94                0
hydra0006  1             In Service   92.23                     67.85      4.26               20.13           73.56     21.82          1153    67.85                 73.56                0
hydra0007  1             In Service   176.62                    55.71      8.21               112.71          31.54     63.81          959     55.71                 31.54                0

Nodes 3, 4 and 7 are the region servers. Why does HBase use so much space? My HDFS should only be storing 5GB of data. My HBase table has one column family and 151 columns... Who can help me? Thanks Yan
Re: Pagination with HBase
Hello Jonathan, You might have to pay special attention to the OFFSET part though. The OFFSET of the nth row will not remain n forever: as you insert new rows the ordering changes, since rows are not arranged in insertion order in HBase. Warm Regards, Tariq cloudfront.blogspot.com On Thu, Aug 1, 2013 at 11:27 PM, Jonathan Cardoso jonathancar...@gmail.com wrote: By anil's links I guess what I should use is PageFilter instead of ColumnPaginationFilter. *Jonathan Cardoso** ** Universidade Federal de Goias* 2013/8/1 Jonathan Cardoso jonathancar...@gmail.com Thanks! I've tested ColumnPaginationFilter but it's not exactly what I need. Correct me if I'm wrong please, but ColumnPaginationFilter filters the columns of the result; how many of them will be retrieved is based on the settings of the 'limit' and 'offset' properties. But I need to make a Scan and get only the first X rows, not the first X columns. *Jonathan Cardoso** ** Universidade Federal de Goias* 2013/8/1 anil gupta anilgupt...@gmail.com If you need more insight into HBase pagination, these links might help you: http://search-hadoop.com/m/feqnAUeLR1 http://search-hadoop.com/m/m5zM2rTSkb On Thu, Aug 1, 2013 at 10:18 AM, Pavan Sudheendra pavan0...@gmail.com wrote: @Jonathan Ted Yu is right! Ignore my mail :) On Thu, Aug 1, 2013 at 10:46 PM, Ted Yu yuzhih...@gmail.com wrote: Take a look at ColumnPaginationFilter.java and its unit test. Cheers On Thu, Aug 1, 2013 at 10:01 AM, Jonathan Cardoso jonathancar...@gmail.com wrote: Hi! Is there a way to scan an HBase table getting, for example, the first 100 results, then later the next 100 and so on... just like in SQL we do with LIMIT and OFFSET? *Jonathan Cardoso** ** Universidade Federal de Goias* -- Regards- Pavan -- Thanks Regards, Anil Gupta
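A hedged sketch of the PageFilter approach the thread converges on, against the 0.94-era client API. Since HBase has no OFFSET, each page restarts the scan just past the last row of the previous page. Table name and page size are made up; note PageFilter limits rows per RegionServer, so the client must also cap the count itself.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PagedScan {
    static final int PAGE_SIZE = 100;

    /** Scan one page starting at startRow; return the key to resume from. */
    static byte[] scanPage(HTable table, byte[] startRow) throws Exception {
        Scan scan = new Scan();
        scan.setFilter(new PageFilter(PAGE_SIZE)); // per-RegionServer limit
        if (startRow != null) {
            scan.setStartRow(startRow);
        }
        byte[] lastRow = null;
        int count = 0;
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                lastRow = r.getRow();
                if (++count >= PAGE_SIZE) break; // client-side cap
            }
        } finally {
            scanner.close();
        }
        // Resume key: last row plus a trailing zero byte, so the next
        // page starts strictly after lastRow
        return lastRow == null ? null : Bytes.add(lastRow, new byte[] {0});
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // placeholder name
        byte[] next = scanPage(table, null); // page 1
        next = scanPage(table, next);        // page 2
        table.close();
    }
}
```

This also sidesteps the OFFSET-drift problem Tariq raises: resuming from a row key is stable under concurrent inserts, whereas a numeric offset is not.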
Re: importtsv issue
The following snapshot was taken after the loading, right? Did you happen to take a snapshot before the loading? Was the table empty before loading? Cheers
Re: Hitting HBASE-7693 with hbase-0.94.9
Looking at ./src/core/org/apache/hadoop/net/DNS.java in hadoop branch-1, here is line 79: String hostname = attribute.get("PTR").get().toString(); It is not clear which part was null. Reading HBASE-7693 once more, it says: PTR records contain a trailing period, which then shows up in the input split location, causing the JobTracker to incorrectly match map jobs to data-local map slots. So it seems that the problem you encountered was different. Cheers
Re: importtsv issue
My HBase is 0.94. I am not sure about using snapshots; my hbase-site.xml does not configure snapshots. This was my first time loading data into HBase: the first load was 2 million, the second 2 million, and the third 10 million, all into the same HBase table 'data_rk'.
Re: Hitting HBASE-7693 with hbase-0.94.9
Hello Ted, Thank you so much for the quick response. I need to dig a bit more in that case. Will get back here once I get something. Warm Regards, Tariq cloudfront.blogspot.com
Re: ETL like merge databases to HBase
@Ted Yu Yes, they are pretty much RDBMS. I've looked at Sqoop, but it looks like Sqoop only does one-time migration? How about continuous updates? Shengjie On 2 August 2013 00:13, Ted Yu yuzhih...@gmail.com wrote: bq. Each has a DB associated They're RDBMS, I assume? Have you looked at Sqoop? On Thu, Aug 1, 2013 at 8:04 AM, Shengjie Min kelvin@gmail.com wrote: Hi All, I have a use case: I have a few applications running independently, let's say applications A, B, C. Each has a DB associated. I want an aggregated view on all the databases so that I don't have to jump into different DBs to find the info I need. Is there a tool out there that allows me to move data from A, B, C to a single/centralised HBase cluster? It would be even nicer if, when DB A, B, C get updated by the apps, the updates were synchronised to HBase too. -- All the best, Shengjie Min
Re: ETL like merge databases to HBase
Mind asking the question on the Sqoop mailing list? http://sqoop.apache.org/mail-lists.html
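On the "continuous updates" question: Sqoop is not strictly one-shot; it supports incremental imports that can be re-run (or saved as a job) to pull only the delta since the last run. A hedged sketch, where the JDBC URL, table, and column names are all placeholders:

```shell
# Re-importable delta load: pull only rows whose updated_at column has
# advanced past the last recorded value, straight into an HBase table.
sqoop import \
  --connect jdbc:mysql://dbhost/app_a \
  --username app --password-file /user/app/.dbpass \
  --table customers \
  --hbase-table merged_view \
  --column-family d \
  --incremental lastmodified \
  --check-column updated_at \
  --last-value "2013-08-01 00:00:00"
```

Wrapping this as a Sqoop saved job lets Sqoop track `--last-value` automatically between scheduled runs, which approximates the ongoing synchronisation asked about above (deletes, however, are not propagated).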
Re: importtsv issue
By snapshot I meant the status of HDFS shown in your first email. Which HBase 0.94 release were you using? Do the 2 millions you mention refer to the number of rows or the amount of data?
Re: importtsv issue
I use hbase-0.94.6-cdh4.3.0. In total, about 16 million rows, and the data size is about 4-5 GB. I am sorry, my English is poor... Thanks, Yu 2013/8/2 Ted Yu yuzhih...@gmail.com By snapshot I meant the status of HDFS, shown in your first email. Which HBase 0.94 release were you using? Do the 2 million figures below refer to number of rows or amount of data? [...]
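A common reason the raw DFS numbers dwarf the logical data size is HDFS block replication (default factor 3), plus store files that linger until compactions and old WALs are cleaned up. A minimal back-of-the-envelope sketch of that arithmetic (the replication factor and the overhead multiplier here are assumptions for illustration, not values read from this cluster):

```python
# Rough estimate of raw HDFS usage for a given logical data size.
# Assumptions (not from the thread): dfs.replication = 3, and importtsv
# wrote HFiles roughly the size of the input data.

def estimated_raw_hdfs_gb(logical_gb, replication=3, overhead=1.0):
    """Raw bytes consumed across all datanodes.

    overhead > 1.0 models extra copies left behind by compactions or
    not-yet-cleaned WALs; 1.0 assumes none.
    """
    return logical_gb * replication * overhead

# 5 GB of loaded data at replication 3 already accounts for ~15 GB of
# raw DFS usage; loading the same table three times (2M + 2M + 10M rows)
# before a major compaction can temporarily hold several copies.
print(estimated_raw_hdfs_gb(5))        # -> 15.0
print(estimated_raw_hdfs_gb(5, 3, 2))  # -> 30.0 if one extra copy lingers
```

Comparing that estimate against `hdfs dfs -du -h /hbase` is a quick way to see whether the usage is replication plus leftover store files, or something else entirely.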
Re: ETL like merge databases to HBase
HBase doesn't have dynamic views of data outside of itself, but you can easily re-run your Sqoop flow to dump information into HBase. Actually, it might be easier to go with a pure RDBMS solution here, since nowadays the slave/master architectures in Postgres and MySQL are mature enough to handle this sort of thing even for hundreds of thousands of rows.
Re: ETL like merge databases to HBase
Though, as Ted suggested, it is better to discuss this on the Sqoop mailing list (Sqoop 2 is supposed to be more feature-rich), but just to get this out: Sqoop does support incremental imports if you can come up with a suitable and compatible strategy. That might help you if you configure your imports on some periodic schedule. Regards, Shahab On Thu, Aug 1, 2013 at 10:17 PM, Jay Vyas jayunit...@gmail.com wrote: HBase doesn't have dynamic views of data outside of itself. But you can easily re-run your Sqoop flow to dump information into HBase. Actually, it might be easier to go with a pure RDBMS solution here, since nowadays the slave/master architectures in Postgres and MySQL are mature enough to handle this sort of thing even for hundreds of thousands of rows.
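The core of the incremental-append strategy mentioned above is just bookkeeping around a monotonically increasing check column: remember the highest value imported so far, and on each scheduled run fetch only rows beyond it. A minimal sketch of that idea (the table and column names are hypothetical; Sqoop itself persists the last value in its saved-job metastore rather than in application code):

```python
# Sketch of incremental-append import bookkeeping, using an in-memory
# SQLite table as a stand-in for the source RDBMS.
import sqlite3

def incremental_import(conn, last_value):
    """Fetch only rows whose check column ('id') exceeds last_value."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_value,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# First run picks up everything past the stored watermark (here, id 1).
rows, last = incremental_import(conn, last_value=1)
```

Each run only touches rows the previous run hasn't seen, which is what makes a periodic schedule cheap against a large source table.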
issue about DNS error in running hbase mapreduce
I used hadoop-dns-checker to check for DNS problems, and everything seems OK, but when I run an MR task against HBase it reports a problem. Does anyone have a good idea?

# ./run-on-cluster.sh hosts1
CH22
The authenticity of host 'ch22 (192.168.10.22)' can't be established.
RSA key fingerprint is f3:4a:ca:a3:17:08:98:c2:0a:bd:27:99:a3:65:bc:89.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ch22,192.168.10.22' (RSA) to the list of known hosts.
root@ch22's password:
sending incremental file list
created directory hadoop-dns
a.jar  hosts1  run.sh
sent 2394 bytes  received 69 bytes  547.33 bytes/sec
total size is 2618  speedup is 1.06
root@ch22's password:
# self check...
-- host : CH22  host lookup : success (192.168.10.22)  reverse lookup : success (CH22)  is reachable : yes
# end self check
Running on : CH22/192.168.10.22
=
-- host : CH22  host lookup : success (192.168.10.22)  reverse lookup : success (CH22)  is reachable : yes
-- host : CH34  host lookup : success (192.168.10.34)  reverse lookup : success (CH34)  is reachable : yes
-- host : CH35  host lookup : success (192.168.10.35)  reverse lookup : success (CH35)  is reachable : yes
-- host : CH36  host lookup : success (192.168.10.36)  reverse lookup : success (CH36)  is reachable : yes

CH34
root@ch34's password:
sending incremental file list
created directory hadoop-dns
a.jar  hosts1  run.sh
sent 2394 bytes  received 69 bytes  703.71 bytes/sec
total size is 2618  speedup is 1.06
root@ch34's password:
# self check...
-- host : CH34  host lookup : success (192.168.10.34)  reverse lookup : success (CH34)  is reachable : yes
# end self check
Running on : CH34/192.168.10.34
=
-- host : CH22  host lookup : success (192.168.10.22)  reverse lookup : success (CH22)  is reachable : yes
-- host : CH34  host lookup : success (192.168.10.34)  reverse lookup : success (CH34)  is reachable : yes
-- host : CH35  host lookup : success (192.168.10.35)  reverse lookup : success (CH35)  is reachable : yes
-- host : CH36  host lookup : success (192.168.10.36)  reverse lookup : success (CH36)  is reachable : yes

CH35
root@ch35's password:
sending incremental file list
created directory hadoop-dns
a.jar  hosts1  run.sh
sent 2394 bytes  received 69 bytes  703.71 bytes/sec
total size is 2618  speedup is 1.06
root@ch35's password:
# self check...
-- host : CH35  host lookup : success (192.168.10.35)  reverse lookup : success (CH35)  is reachable : yes
# end self check
Running on : CH35/192.168.10.35
=
-- host : CH22  host lookup : success (192.168.10.22)  reverse lookup : success (CH22)  is reachable : yes
-- host : CH34  host lookup : success (192.168.10.34)  reverse lookup : success (CH34)  is reachable : yes
-- host : CH35  host lookup : success (192.168.10.35)  reverse lookup : success (CH35)  is reachable : yes
-- host : CH36  host lookup : success (192.168.10.36)  reverse lookup : success (CH36)  is reachable : yes

CH36
root@ch36's password:
sending incremental file list
created directory hadoop-dns
a.jar  hosts1  run.sh
sent 2394 bytes  received 69 bytes  703.71 bytes/sec
total size is 2618  speedup is 1.06
root@ch36's password:
# self check...
-- host : CH36  host lookup : success (192.168.10.36)  reverse lookup : success (CH36)  is reachable : yes
# end self check
Running on : CH36/192.168.10.36
=
-- host : CH22  host lookup : success (192.168.10.22)  reverse lookup : success (CH22)  is reachable : yes
-- host : CH34  host lookup : success (192.168.10.34)  reverse lookup : success (CH34)  is reachable : yes
-- host : CH35  host lookup : success (192.168.10.35)  reverse lookup : success (CH35)  is reachable : yes
-- host : CH36  host lookup : success (192.168.10.36)  reverse lookup : success (CH36)  is reachable : yes

# yarn jar mapreducehbaseTest.jar com.mediaadx.hbase.hadoop.test.TxtHbase '' ''
13/08/02 13:10:17 WARN conf.Configuration: dfs.df.interval is deprecated. Instead, use fs.df.interval
13/08/02 13:10:17 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/08/02 13:10:17 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
13/08/02 13:10:17 WARN conf.Configuration: topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
13/08/02 13:10:17 WARN conf.Configuration: dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
13/08/02 13:10:17 WARN conf.Configuration: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
13/08/02 13:10:17 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
13/08/02 13:10:17 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
13/08/02 13:10:17 WARN conf.Configuration: slave.host.name is deprecated. Instead, use
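The per-host checks above boil down to a forward lookup (name to IP), a reverse lookup (IP back to name), and confirming the two agree; Hadoop and HBase need both directions to resolve consistently for every cluster node, and a stale or missing reverse mapping is a frequent cause of MR-over-HBase failures even when forward lookups all succeed. A minimal sketch of the same checks in Python (this is an illustration of what the tool verifies, not hadoop-dns-checker's actual code):

```python
# Forward and reverse DNS checks, mirroring what hadoop-dns-checker
# reports per host ("host lookup" / "reverse lookup").
import socket

def check_host(name):
    try:
        ip = socket.gethostbyname(name)        # forward lookup: name -> IP
    except socket.gaierror:
        return {"host": name, "forward": None, "reverse": None}
    try:
        reverse = socket.gethostbyaddr(ip)[0]  # reverse lookup: IP -> name
    except (socket.herror, socket.gaierror):
        reverse = None
    return {"host": name, "forward": ip, "reverse": reverse}

print(check_host("localhost"))
```

Running this from each node against every other node (including itself, with both short and fully qualified names) approximates the matrix of checks in the output above; a host where "reverse" comes back None or as a different name is the one to investigate in /etc/hosts or the DNS zone.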