Re: hbase regionserver all dead

2014-03-03 Thread Marcos Luis Ortiz Valmaseda
Regards, ch huang.
Which version of HBase are you using?
Please, check the following things:
- zookeeper session timeout
- zookeeper ticktime
- hbase.zookeeper.property.maxClientCnxns (default 35)
- ulimit
- increase the number of open files allowed (32k or more)
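
For reference, a minimal, untested sketch (0.94-era client API; the class name is made up for illustration) that prints the first three of those settings as a client sees them from the hbase-site.xml on its classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ZkSettingsCheck {
    public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();
        // Session timeout the client asks ZooKeeper for (ms)
        System.out.println("zookeeper.session.timeout = "
            + conf.get("zookeeper.session.timeout"));
        // Tick time used when HBase manages the ZooKeeper quorum
        System.out.println("hbase.zookeeper.property.tickTime = "
            + conf.get("hbase.zookeeper.property.tickTime"));
        // Maximum concurrent connections allowed per client host
        System.out.println("hbase.zookeeper.property.maxClientCnxns = "
            + conf.get("hbase.zookeeper.property.maxClientCnxns"));
    }
}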







2014-03-04 2:22 GMT+01:00 ch huang justlo...@gmail.com:

 hi, mailing list:
this morning I checked my HBase cluster logs and found all the region
 servers down. I do not know why; I hope some expert can give me a clue. Here
 is the log I found on the node where the first death happened:

 2014-03-03 17:16:11,413 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=16.78 MB, free=1.98 GB, max=2.00 GB, blocks=0, accesses=82645, hits=4, hitRatio=0.00%, cachingAccesses=5, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=5, evictedPerRun=Infinity
 2014-03-03 17:20:30,093 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:197)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
 at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1798)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1181)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:750)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:541)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:516)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 2014-03-03 17:21:11,413 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=17.28 MB, free=1.98 GB, max=2.00 GB, blocks=4, accesses=88870, hits=3112, hitRatio=3.50%, cachingAccesses=3117, cachingHits=3108, cachingHitsRatio=99.71%, evictions=0, evicted=5, evictedPerRun=Infinity
 2014-03-03 17:21:45,112 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1043055049-192.168.11.11-1382442676609:blk_-716939259337565008_4210841
 java.io.EOFException: Premature EOF: no length prefix available
 at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
 at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:695)
 2014-03-03 17:21:45,116 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_-716939259337565008_4210841 in pipeline 192.168.11.14:50010, 192.168.11.10:50010, 192.168.11.15:50010: bad datanode 192.168.11.14:50010
 2014-03-03 17:24:58,114 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:36837,call:next(-2524485469510465096, 100), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client:192.168.11.174:39642,starttimems:1393838661274,queuetimems:0,class:HRegionServer,responsesize:6,method:next}
 2014-03-03 17:24:58,117 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:36880,call:next(6510031569997476480, 100), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client:192.168.11.174:39642,starttimems:1393838661234,queuetimems:1,class:HRegionServer,responsesize:6,method:next}
 2014-03-03 17:24:58,117 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:36880,call:next(-8080468273710364924, 100), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client:192.168.11.174:39642,starttimems:1393838661234,queuetimems:1,class:HRegionServer,responsesize:6,method:next}
 2014-03-03 17:24:58,118 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:36882,call:next(-1838307716001367158, 100), rpc version=1, client version=29, methodsFingerPrint=-1368823753,client:192.168.11.174:39642,starttimems:1393838661234,queuetimems:1,class:HRegionServer,responsesize:6,method:next}
 2014-03-03 17:24:58,119 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 38421ms for sessionid 0x441fb1d01a1759, closing socket connection and attempting reconnect
 2014-03-03 17:24:58,119 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 43040ms for sessionid

Re: Compile and run HBase 0.98 from svn?

2014-03-01 Thread Marcos Luis Ortiz Valmaseda
Did you read here?
https://hbase.apache.org/book/build.html


2014-03-01 21:55 GMT-04:30 Jean-Marc Spaggiari jean-m...@spaggiari.org:

 Hi all,

 Any hint on the way to build and run 0.98?

 svn co  http://svn.apache.org/repos/asf/hbase/branches/0.98/ hbase-0.98
 mvn clean install assembly:assembly -DskipTests -Prelease

 All passed.

 Update hbase-env, hbase-site and regionservers.

 Distribute the whole directory to the 4 servers.

 But... something is missing. Doing that, I'm missing many libs, like
 zookeeper-3.4.5.jar.

 So I guess I'm not building it the right way.

 Can anyone point me to the right steps?

 Thanks,

 JM




-- 
Marcos Ortiz Valmaseda
http://about.me/marcosortiz


Re: Compile and run HBase 0.98 from svn?

2014-03-01 Thread Marcos Luis Ortiz Valmaseda
Well, tomorrow, you could read this old post from Praveen:
http://praveen.kumar.in/2011/06/20/building-hadoop-and-hbase-for-hbase-maven-application-development/


2014-03-01 22:49 GMT-04:30 Jean-Marc Spaggiari jean-m...@spaggiari.org:

 Yep. And the page after.

 I tried this too: MAVEN_OPTS=-Xmx3g mvn -f pom.xml clean install
 -DskipTests  javadoc:aggregate site assembly:single -Prelease

 With no success... I will continue tomorrow...


 2014-03-01 21:44 GMT-05:00 Marcos Luis Ortiz Valmaseda 
 marcosluis2...@gmail.com:

  Did you read here?
  https://hbase.apache.org/book/build.html
 
 
  2014-03-01 21:55 GMT-04:30 Jean-Marc Spaggiari jean-m...@spaggiari.org
 :
 
   Hi all,
  
   Any hint on the way to build and run 0.98?
  
   svn co  http://svn.apache.org/repos/asf/hbase/branches/0.98/ hbase-0.98
   mvn clean install assembly:assembly -DskipTests -Prelease
  
   All passed.
  
   Update hbase-env, hbase-site and regionservers.
  
   Distribute the whole directory to the 4 servers.
  
   But... something is missing. Doing that, I'm missing many libs, like
   zookeeper-3.4.5.jar.
  
   So I guess I'm not building it the right way.
  
   Can anyone point me to the right steps?
  
   Thanks,
  
   JM
  
 
 
 
  --
  Marcos Ortiz Valmaseda
  http://about.me/marcosortiz
 




-- 
Marcos Ortiz Valmaseda
http://about.me/marcosortiz


Re: HTablePool is deprecated, any alternatives?

2014-02-12 Thread Marcos Luis Ortiz Valmaseda
You are right, Li Li. HTablePool was deprecated in 0.94 and 0.95/0.96, and
removed in 0.98:
See: https://issues.apache.org/jira/browse/HBASE-6580
Use HConnection instead:
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.devel/38950
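
A minimal sketch of the replacement pattern (written against the 0.96/0.98-era client API; late 0.94 releases expose the same methods, but check your exact version; the table, family and qualifier names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ConnectionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // One heavyweight, thread-safe connection shared by the whole application
        HConnection connection = HConnectionManager.createConnection(conf);
        try {
            // Lightweight, non-thread-safe handle; replaces HTablePool.getTable()
            HTableInterface table = connection.getTable("test");
            try {
                Put put = new Put(Bytes.toBytes("row1"));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
                table.put(put);
            } finally {
                table.close();      // cheap; just returns resources to the connection
            }
        } finally {
            connection.close();     // do this once, at application shutdown
        }
    }
}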


2014-02-12 22:13 GMT-04:30 Li Li fancye...@gmail.com:

 I am using HBase 0.94.11. It says HTablePool is deprecated. Is there
 any alternative to it?




-- 
Marcos Ortiz Valmaseda
http://about.me/marcosortiz


Re: Please welcome our newest committer, Rajeshbabu Chintaguntla

2013-09-11 Thread Marcos Luis Ortiz Valmaseda
Congratulations, Rajeshbabu !!


2013/9/11 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com

 Hi All,

 Please join me in welcoming Rajeshbabu (Rajesh) as our new HBase committer.
 Rajesh has been around for more than a year and has been solving some very
 good bugs around the Assignment Manager area.  He has been working on other
 stuff like HBase-MapReduce performance improvements, migration scripts and,
 of late, on Secondary Index related things.

 Rajesh has made his first commit to the pom.xml already.
 Once again, congratulations and welcome to this new role (smile).

 Cheers
 Ram




-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz


Re: Please welcome our newest committer, Nick Dimiduk

2013-09-10 Thread Marcos Luis Ortiz Valmaseda
Congratulations, Nick !!! Keep doing this great work


2013/9/10 ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com

 Congratulations Nick.!!!


 On Wed, Sep 11, 2013 at 9:15 AM, rajeshbabu chintaguntla 
 rajeshbabu.chintagun...@huawei.com wrote:

  Congratulations Nick.
  
  From: lars hofhansl [la...@apache.org]
  Sent: Wednesday, September 11, 2013 7:30 AM
  To: d...@hbase.apache.org; hbase-user
  Subject: Re: Please welcome our newest committer, Nick Dimiduk
 
  Congrats Nick, great to have you on board!
 
 
 
 
  - Original Message -
  From: Enis Söztutar e...@apache.org
  To: d...@hbase.apache.org d...@hbase.apache.org; hbase-user 
  user@hbase.apache.org
  Cc:
  Sent: Tuesday, September 10, 2013 3:54 PM
  Subject: Please welcome our newest committer, Nick Dimiduk
 
  Hi,
 
  Please join me in welcoming Nick as our new addition to the list of
  committers. Nick is exceptionally good with user-facing issues, and has
  done major contributions in mapreduce related areas, hive support, as
 well
  as 0.96 issues and the new and shiny data types API.
 
  Nick, as tradition, feel free to do your first commit to add yourself to
  pom.xml.
 
  Cheers,
  Enis
 
 




-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz


Re: Hbase keeps dying (Zookeeper)

2013-08-09 Thread Marcos Luis Ortiz Valmaseda
Regards, Trevor.
 hadoop-hbase-0.90.6+84.73-1
 hadoop-zookeeper-3.3.5+19.5-1
 hadoop-0.20.2+923.421-1


Why not upgrade your components?
HBase to the latest 0.94.10:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12324627

ZooKeeper to the latest 3.4.5:
http://zookeeper.apache.org/doc/r3.4.5/releasenotes.html

Hadoop to 1.2.1:
http://hadoop.apache.org/docs/r1.2.1/releasenotes.html
That's my first advice.

Now, from 3.3.5 to 3.4.5 there are a lot of bug fixes and improvements:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12321883

2013/8/9, Trevor Antczak tantc...@operasolutions.com:
 So I've done some more research into this and it appears that my Zookeeper
 doesn't have /hbase/master.  From zkCli:

 [zk: localhost:2181(CONNECTED) 3] ls /hbase
 [splitlog, unassigned, root-region-server, rs, table, shutdown]
 [zk: localhost:2181(CONNECTED) 4] get /hbase/master
 Node does not exist: /hbase/master
 [zk: localhost:2181(CONNECTED) 5]

 I have no idea how this could have happened, but is there a way to
 regenerate the node in zookeeper?  All of the other expected nodes are
 there.  It seems from the logs that everything was fine with hbase until
 12:01 AM on August 1st, at which point it just stopped working. I can't find
 any reason that any of this has happened either.  It's all very strange.

 Trevor

 -Original Message-
 From: Trevor Antczak [mailto:tantc...@operasolutions.com]
 Sent: Monday, August 05, 2013 2:40 PM
 To: user@hbase.apache.org
 Subject: RE: Hbase keeps dying (Zookeeper)

 hadoop-hbase-0.90.6+84.73-1
 hadoop-zookeeper-3.3.5+19.5-1
 hadoop-0.20.2+923.421-1

 Yes, hbase is managing the Quorum.

 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Monday, August 05, 2013 12:39 PM
 To: user@hbase.apache.org
 Subject: Re: Hbase keeps dying (Zookeeper)

 bq. there wasn't a copy of hdfs-site.xml

 Can you tell us the versions of:
  hadoop
  hbase
  zookeeper
 you're using ?

 Did you let HBase manage your zookeeper quorum ?

 On Mon, Aug 5, 2013 at 9:15 AM, Trevor Antczak
 tantc...@operasolutions.comwrote:

 Hi all,

 I have an hbase system that has worked fine for quite a long time, but
 now it is quite suddenly developing errors.  First it was dying
 immediately on startup because there wasn't a copy of hdfs-site.xml in
 the hbase conf directory (which doesn't seem like it should be
 necessary, and I'm not sure how it got moved if it had been there in
 the first place).  I copied the hdfs-site.xml from /etc/hadoops/conf
 into /etc/hbase/conf.  Now hbase starts up, but it can never connect
 to Zookeeper and dies after a few minutes of trying.  The weird thing,
 is that according to Zookeeper the connection is happening.  From the
 hbase logs I get a ton of messages like:

 2013-08-05 11:57:19,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x4403f9ef5b20026 Creating (or updating) unassigned node for 0f3ca79375768472af70765ff231ee32 with OFFLINE state
 2013-08-05 11:57:19,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=hmaster:6, region=0f3ca79375768472af70765ff231ee32

 Eventually followed by:

 2013-08-05 11:57:19,105 WARN org.apache.zookeeper.ClientCnxn: Session 0x4403f9ef5b20026 for server hslave14/172.20.7.124:2181, unexpected error, closing socket connection and attempting reconnect
 java.io.IOException: Packet len4935980 is out of range!
 at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:708)
 at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:867)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)

 And then a bunch more Java errors as the process dies.  From the
 Zookeeper logs I see the hbase server connect:

 13/08/05 11:40:27 INFO server.NIOServerCnxn: Accepted socket connection from /xxx.xxx.xxx.xxx:34879
 13/08/05 11:40:27 INFO server.NIOServerCnxn: Client attempting to establish new session at /xxx.xxx.xxx.xxx:34879
 13/08/05 11:40:27 INFO server.NIOServerCnxn: Established session 0x1404ee40a8d000c with negotiated timeout 4 for client /xxx.xxx.xxx.xxx:34879

 Then disconnect, but only after it shuts down:

 13/08/05 11:45:52 INFO server.NIOServerCnxn: Closed socket connection for client /xxx.xxx.xxx.xxx:34879 which had sessionid 0x1404ee40a8d000c

 Does anyone have any clever ideas of places I can look for this error?
 Or why I'm suddenly having this problem when I haven't changed anything?
  Thanks in advance for any help provided.

 Trevor




-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz


Re: Region not splitted

2013-08-09 Thread Marcos Luis Ortiz Valmaseda
Regards, Jean-Marc.
What version of HBase are you using?
In the newer versions of the platform (0.94), there are a lot of improvements
for auto-splitting and pre-splitting regions.
The great Hortonworks team published an amazing post on this
particular topic:
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
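
If the region really is over hbase.hregion.max.filesize and still refuses to split on its own, a split can also be forced from the client side. A rough, untested sketch (0.94-era HBaseAdmin API; the table name is taken from the message quoted below):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ForceSplit {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Asks the master to split every region of the table at its midpoint;
            // an encoded region name can be passed instead to split a single region.
            admin.split("work_proposed");
            // Optionally compact afterwards so the daughter regions get their own files
            admin.majorCompact("work_proposed");
        } finally {
            admin.close();
        }
    }
}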

2013/8/9, Jean-Marc Spaggiari jean-m...@spaggiari.org:
 Hi,

 Quick question regarding the split.

 Let's consider the table 'work_proposed' below:

 275164921921  hdfs://node3:9000/hbase/work_proposed

 This is a 256GB table. I think there are more than 1B rows in it but I
 have not counted them for a while.

 This table has a pretty default definition:


 hbase(main):001:0 describe 'work_proposed'
 DESCRIPTION
 ENABLED

  'work_proposed', {NAME = '@', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER
 = 'ROW', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', VERSIONS = '3',
 TTL = '2147483647', MIN
 true

  _VERSIONS = '0', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536',
 ENCODE_ON_DISK = 'true', IN_MEMORY = 'false', BLOCKCACHE = 'true'},
 {NAME = 'a',
 DATA_BLOCK_ENCODIN

  G = 'NONE', BLOOMFILTER = 'ROW', REPLICATION_SCOPE = '0', VERSIONS =
 '3', COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647',
 KEEP_DELETED_CELLS =
 'false',

  BLOCKSIZE = '65536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true',
 BLOCKCACHE =
 'true'}

 1 row(s) in 0.7590 seconds

 Those are all default parameters, which means the default FILE_SIZE value
 is 10GB.

 If I look into Hannibal, it's fine. I can see my table, the regions, the
 red line at 10GB showing the max size before the split, etc. All the
 regions are under this line except one!

 hadoop@buldo:~/hadoop-1.0.3$ bin/hadoop fs -ls
 /hbase/work_proposed/46f8ea6e24982fbeb249a4516c879109/@
 Found 1 items
 -rw-r--r--   3 hbase supergroup 22911054018 2013-08-03 20:57
 /hbase/work_proposed/46f8ea6e24982fbeb249a4516c879109/@/404fcf681e5e4fdbac99db80345b011b

 This region is 21GB and it doesn't want to split. The first thing you will
 say is that it's because I have one single 21GB row in this region, but I don't
 think so. My rows are URLs. I would be surprised if I have a 21GB URL ;)

 I triggered major_compact many times, I stopped/started the cluster many
 times, nothing. I can most probably ask for a manual split and that will
 work, but I want to take this opportunity to figure out why it's not splitting,
 if it should be, and whether there is any defect behind that.

 I have not found any exception in the logs. I just started another
 major_compaction and will grep for the region name in the logs, but any idea
 why I'm facing that, and where in the code I should start to look? I can
 deploy customized code to show more logs if required. I will also start to look
 at the split policies...

 JM



-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz


Fwd: scan very slow in hbase

2013-08-06 Thread Marcos Luis Ortiz Valmaseda
Regards, ch.
Which version of HBase are you using?
HBase's development team has worked very hard on scan improvements:

http://www.slideshare.net/cloudera/hbase-consistency-and-performance-final
http://www.slideshare.net/cloudera/6-real-time-analytics-with-h-base-alex-baranau-sematext-final-3-updated-last-minute
http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final


-- Forwarded message --
From: ch huang justlo...@gmail.com
Date: 2013/8/6
Subject: Re: scan very slow in hbase
To: user@hbase.apache.org


I found the error was because I stopped the running Java code; I have not found
why the scan is so slow.

On Tue, Aug 6, 2013 at 10:38 PM, Stack st...@duboce.net wrote:

 I would suggest you search the mail archives before posting first (you
will
 usually get your answer faster if you go this route).

 The below has been answered in the recent past.  See
 http://search-hadoop.com/m/5tk8QnhFqw

 Thanks,
 St.Ack


 On Tue, Aug 6, 2013 at 12:39 AM, ch huang justlo...@gmail.com wrote:

  my workmate tells me HBase is very slow when scanning something; I checked the
  region server and found the following information. Can anyone help?
 
 
  13/08/06 15:30:34 WARN ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
  java.io.IOException: Connection reset by peer
  at sun.nio.ch.FileDispatcher.read0(Native Method)
  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
  at sun.nio.ch.IOUtil.read(IOUtil.java:171)
  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
  at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1796)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1179)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
  13/08/06 15:30:34 ERROR regionserver.HRegionServer: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call next(1916648340315433886, 1), rpc version=1, client version=29, methodsFingerPrint=-1368823753 from 192.168.2.209:1150 after 8504 ms, since caller disconnected
  at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3856)
  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3776)
  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3768)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2513)
  at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
 




-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz


Re: scan very slow in hbase

2013-08-06 Thread Marcos Luis Ortiz Valmaseda
Regards, ch.
I was looking for a post about this topic and I found it. It was posted on the
Ericsson Labs blog, and it talks about the approach they followed to
improve HBase performance in several ways, including scans. You can read it
here:
http://labs.ericsson.com/blog/hbase-performance-tuners

Other resources:
Ameya's talk about HBase's use at Groupon (see his last slide):
http://www.slideshare.net/cloudera/case-studies-session-3b

Manoj and Goving's talk:
http://www.slideshare.net/cloudera/hbasecon-2013-evolving-a-firstgeneration-apache-hbase-deployment-to-second-generation-and-beyond
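
Beyond the cluster-side tuning in those posts, the first client-side knobs to check for slow scans are scanner caching and batching. A minimal sketch (0.94-era API; the table and column family names are placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TunedScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test");
        Scan scan = new Scan();
        scan.setCaching(1000);               // rows fetched per RPC (the old default is 1)
        scan.setBatch(100);                  // cap cells per Result for very wide rows
        scan.addFamily(Bytes.toBytes("cf")); // only read the families you need
        ResultScanner scanner = table.getScanner(scan);
        try {
            long rows = 0;
            for (Result r : scanner) {
                rows++;                      // real code would process r here
            }
            System.out.println("scanned " + rows + " rows");
        } finally {
            scanner.close();
            table.close();
        }
    }
}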



2013/8/6 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com

 Regards, ch.
 Which version of HBase are you using?
 HBase's development team has worked very hard on scan improvements:

 http://www.slideshare.net/cloudera/hbase-consistency-and-performance-final

 http://www.slideshare.net/cloudera/6-real-time-analytics-with-h-base-alex-baranau-sematext-final-3-updated-last-minute

 http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final


 -- Forwarded message --
 From: ch huang justlo...@gmail.com
 Date: 2013/8/6
 Subject: Re: scan very slow in hbase
 To: user@hbase.apache.org


 I found the error was because I stopped the running Java code; I have not found
 why the scan is so slow.

 On Tue, Aug 6, 2013 at 10:38 PM, Stack st...@duboce.net wrote:

  I would suggest you search the mail archives before posting first (you
 will
  usually get your answer faster if you go this route).
 
  The below has been answered in the recent past.  See
  http://search-hadoop.com/m/5tk8QnhFqw
 
  Thanks,
  St.Ack
 
 
  On Tue, Aug 6, 2013 at 12:39 AM, ch huang justlo...@gmail.com wrote:
 
   my workmate tells me HBase is very slow when scanning something; I checked the
   region server and found the following information. Can anyone help?
  
  
   13/08/06 15:30:34 WARN ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
   java.io.IOException: Connection reset by peer
   at sun.nio.ch.FileDispatcher.read0(Native Method)
   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
   at sun.nio.ch.IOUtil.read(IOUtil.java:171)
   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
   at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1796)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1179)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
   13/08/06 15:30:34 ERROR regionserver.HRegionServer: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call next(1916648340315433886, 1), rpc version=1, client version=29, methodsFingerPrint=-1368823753 from 192.168.2.209:1150 after 8504 ms, since caller disconnected
   at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3856)
   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3776)
   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3768)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2513)
   at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
   at org.apache.hadoop.hbase.ipc.HBaseServer

Re: Region size per region on the table page

2013-08-01 Thread Marcos Luis Ortiz Valmaseda
Hi, Bryan. If you file an issue for that, it would be nice to work on it.
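
In the meantime, per-region store file sizes can already be pulled from the client API; a rough, untested sketch against the 0.94-era ClusterStatus/RegionLoad classes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RegionSizes {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            ClusterStatus status = admin.getClusterStatus();
            for (ServerName server : status.getServers()) {
                HServerLoad load = status.getLoad(server);
                for (HServerLoad.RegionLoad region : load.getRegionsLoad().values()) {
                    // Store file size per region, in MB, as reported by the region server
                    System.out.println(server.getHostname() + " "
                        + region.getNameAsString() + " "
                        + region.getStorefileSizeMB() + " MB");
                }
            }
        } finally {
            admin.close();
        }
    }
}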



2013/8/1 Bryan Beaudreault bbeaudrea...@hubspot.com

 Hannibal is very useful, but samar is right. It's another thing to install
 and maintain.  I'd hope that over time the need for tools like Hannibal
 would be lessened as some of the features make their way into the main
 install.  Hannibal does its work by crawling log files, whereas some
 (or all) of the data it provides could be provided through the HBase API,
 and thus the admin UI, in a less hacky way.

 If someone were willing to invest the time in adding such a metric to the
 hbase admin ui (and HBaseAdmin API please) it would bring us one step
 closer.


 On Thu, Aug 1, 2013 at 2:42 PM, samar.opensource 
 samar.opensou...@gmail.com
  wrote:

  Hi Jean,
   You are right, Hannibal does that, but it is a separate process we need to
   install/maintain. I thought it would be nice if we had a quick and easy way to see it
   from the master-status page. The stats are already on the regionserver page (like
   the total size of the store); it would just make sense to have it on the
   table page too (IMO) to understand the data size distribution of the regions
   of a particular table.
 
  Samar
 
  On 01/08/13 5:51 PM, Jean-Marc Spaggiari wrote:
 
  Hi Samar
 
  Hannibal is already doing what you are looking for.
 
  Cheers,
 
  JMS
 
  2013/8/1 samar.opensource samar.opensou...@gmail.com
 
    Hi Devs/Users,
   Most of the time we want to know if our table split logic is accurate
   or if our current regions are well balanced for a table. I was wondering
   if we can expose the size of each region on table.jsp too, in the table
   region table. If people think it is useful I can pick it up. Also let me know
   if it already exists.
 
  Samar
 
 
 




-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz


Re: HBase on EMR disk space issue

2013-07-22 Thread Marcos Luis Ortiz Valmaseda
Regards, Oussama.
First advice: update your HBase installation to a more recent version,
like 0.94.9 for example:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12324431

The problem with EMR is that its HBase version is very old, so it
would be better to run HBase on EC2 instead, with a recent version.
See the official documentation here:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase-launch.html

See these tips too:
http://aws-musings.com/7-tips-for-running-hbase-in-ec2/

HBase use cases from the last HBaseCon (2013):
http://www.orzota.com/hbase-use-cases/
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/35001


My favorite is the HBase at Pinterest talk:
http://www.hbasecon.com/sessions/apache-hbase-operations-at-pinterest/
http://www.slideshare.net/cloudera/operations-session-1

2013/7/22, Oussama Jilal jilal.ouss...@gmail.com:
 Hello, I need some help with my issue,

 We have been running HBase on Amazon EMR (v 0.92) for quite some months.
 Today we had a serious problem: HBase was throwing some connection
 refused errors in our client applications. After some investigation, we
 found out that one of the two EC2 instances running HBase (the
 slave/regionserver one) had a full disk (800+ GB). We can connect to both
 of the instances, but when we try to run a command from the HBase Shell
 (list, scan, disable, truncate ... anything) we only get errors like:

 13/07/22 14:06:29 INFO httpclient.HttpMethodDirector: I/O exception
 (java.net.ConnectException) caught when processing request: Connection
 refused
 13/07/22 14:06:29 INFO httpclient.HttpMethodDirector: Retrying request

 We want to truncate some tables and gz compress some column families but
 we can't do any operation.

 Executing the command jps on the master returns :

 6966 Jps
 1968 JobTracker
 2177 HQuorumPeer
 1814 NameNode
 4623 HMaster

 but on the region server it only returns:

 5986 Jps

 Rebooting didn't help ...

 Can anyone help? We are not very experienced with HBase, especially
 administration.

 Thank you.



-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz


Re: EC2 Elastic MapReduce HBase install recommendations

2013-05-07 Thread Marcos Luis Ortiz Valmaseda
I think that Andrew talked about this some years ago and he created some
scripts for that. You can find them here:
https://github.com/apurtell/hbase-ec2

Then, you can review some links about this topic:
http://blog.cloudera.com/blog/2012/10/set-up-a-hadoophbase-cluster-on-ec2-in-about-an-hour/
http://my.safaribooksonline.com/book/databases/storage-systems/9781849517140/1dot-setting-up-hbase-cluster/id286696951

http://whynosql.com/why-we-run-our-hbase-on-ec2/

You can read the HBase on EC2 demo from Andrew in the HBaseCon 2012:
https://github.com/apurtell/ec2-demo
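
On the client side, the batched-put pattern described in the message below usually relies on the write buffer; a minimal sketch (0.92/0.94-era HTable API; the table, family and row-key layout are just assumptions):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test");
        table.setAutoFlush(false);                  // "autocommit off": buffer client side
        table.setWriteBufferSize(8 * 1024 * 1024);  // 8 MB before an automatic flush
        List<Put> batch = new ArrayList<Put>(1000);
        for (int i = 0; i < 100000; i++) {
            Put put = new Put(Bytes.toBytes(String.format("row-%09d", i)));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
            batch.add(put);
            if (batch.size() == 1000) {             // ~1000 rows per round trip
                table.put(batch);
                batch.clear();
            }
        }
        table.put(batch);
        table.flushCommits();                       // push any remaining buffered edits
        table.close();
    }
}

Note that sequential row keys like these hit a single region at a time, which by itself can cap write throughput regardless of the instance type.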




2013/5/7 Pal Konyves paul.kony...@gmail.com

 Hi,

 Has anyone got some recommendations about running HBase on EC2? I am
 testing it, and so far I am very disappointed with it. I did not change
 anything about the default 'Amazon distribution' installation. It has one
 MasterNode and two slave nodes, and write performance is around 2500 small
 rows per sec at most, but I expected it to be way  better. Oh, and this is
 with batch put operations with autocommit turned off, where each batch
 containes about 500-1000 rows... When I do it with autocommit, it does not
 even reach the 1000 rows per sec.

 Every nodes were m1.Large ones.

 Any experiences, suggestions? Is it worth to try the RMap distribution
 instead of the amazon one?

 Thanks,
 Pal




-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz


Re: EC2 Elastic MapReduce HBase install recommendations

2013-05-07 Thread Marcos Luis Ortiz Valmaseda
I think that when you are talking about RMap, you are referring to
MapR's distribution.
I think that MapR's team released a very good version of its Hadoop
distribution focused on HBase, called M7. You can see its overview here:
http://www.mapr.com/products/mapr-editions/m7-edition

But this release was under beta testing, and I see that it's not included
in the Amazon Marketplace yet:
https://aws.amazon.com/marketplace/seller-profile?id=802b0a25-877e-4b57-9007-a3fd284815a5




2013/5/7 Pal Konyves paul.kony...@gmail.com

 Hi,

 Has anyone got some recommendations about running HBase on EC2? I am
 testing it, and so far I am very disappointed with it. I did not change
 anything about the default 'Amazon distribution' installation. It has one
 MasterNode and two slave nodes, and write performance is around 2500 small
 rows per sec at most, but I expected it to be way  better. Oh, and this is
 with batch put operations with autocommit turned off, where each batch
 containes about 500-1000 rows... When I do it with autocommit, it does not
 even reach the 1000 rows per sec.

 Every nodes were m1.Large ones.

 Any experiences, suggestions? Is it worth to try the RMap distribution
 instead of the amazon one?

 Thanks,
 Pal




-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz


Re: hbase + mapreduce

2013-04-21 Thread Marcos Luis Ortiz Valmaseda
Here you have several examples:
http://hbase.apache.org/book/mapreduce.example.html
http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/
http://bigdataprocessing.wordpress.com/2012/07/27/hadoop-hbase-mapreduce-examples/
http://stackoverflow.com/questions/12215313/load-data-into-hbase-table-using-hbase-map-reduce-api
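
A minimal, untested sketch of the same lookup as a map-only job (assuming the table name "test" and the x:y column from the findZ() method quoted below; Hadoop 1.x / HBase 0.94-era API):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class FindZJob {

    static class FindZMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            // Only rows whose x:y value equals z reach the mapper, because the
            // filter below is evaluated server side in every region.
            ctx.write(new Text(Bytes.toString(row.get())), new Text("match"));
        }
    }

    public static void main(String[] args) throws Exception {
        String z = args[0];
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "findZ");
        job.setJarByClass(FindZJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // more rows per RPC
        scan.setCacheBlocks(false);  // recommended for MapReduce scans
        scan.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"));
        scan.setFilter(new SingleColumnValueFilter(
            Bytes.toBytes("x"), Bytes.toBytes("y"),
            CompareFilter.CompareOp.EQUAL, Bytes.toBytes(z)));

        TableMapReduceUtil.initTableMapperJob(
            "test", scan, FindZMapper.class, Text.class, Text.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Building the real MyObject instances would then happen either in the mapper itself or in whatever output format replaces NullOutputFormat.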




2013/4/21 Adrian Acosta Mitjans amitj...@estudiantes.uci.cu

 Hello:

 I'm working on a project, and I'm using HBase to store
 the data. I have this method that works great but without the performance
 I'm looking for, so what I want is to do the same thing using MapReduce.


 public ArrayList<MyObject> findZ(String z) throws IOException {

     ArrayList<MyObject> rows = new ArrayList<MyObject>();
     Configuration conf = HBaseConfiguration.create();
     HTable table = new HTable(conf, "test");
     Scan s = new Scan();
     s.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"));
     ResultScanner scanner = table.getScanner(s);
     try {
         for (Result rr : scanner) {
             if (Bytes.toString(rr.getValue(Bytes.toBytes("x"),
                     Bytes.toBytes("y"))).equals(z)) {
                 rows.add(getInformation(Bytes.toString(rr.getRow())));
             }
         }
     } finally {
         scanner.close();
     }
     return rows;
 }

 the getInformation method takes all the columns and converts the row into a
 MyObject instance.

 I just want an example or a link to a tutorial that does something like
 this; I want to get a result type as the answer and not a word-count number,
 like in many examples I found.
 My native language is Spanish, so sorry if something is not well written.

 Thanks
 http://www.uci.cu




-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186


Re: Inconsistent performance numbers with increased nodes

2013-04-19 Thread Marcos Luis Ortiz Valmaseda
Just a question, Alex. Why are you using OpenJDK? The first recommendation
for a Hadoop cluster is to use the Java SDK from Oracle, because with
OpenJDK there are some performance issues which should be fixed in
the next releases, so I encourage you to use Java 1.6 from Oracle.

- What is the replication factor in your cluster? (default: 3)
- What is your HDFS block size? (default: 64 MB; a good value is
128 MB or 256 MB depending on your cluster load)



2013/4/19 Alex O'Ree spyhunte...@gmail.com

 Marcos

 - Java version - 1.6 OpenJDK x64, latest version in the CentOS repo
 - JVM tuning configuration, I think that we just changed the max ram
 to close to 4GB
 - Hadoop JT, DN, NN configuration, 1 JT, 10/12 DN, 1 NN. No security, no
 ssl
 - Network topology, star
 - Network speed for the cluster, emulated 4G celluar
 - Hardware properties for all nodes in the cluster - 2 core, 2.2Ghz, 4GB
 ram
 - Which platform are you using for the benchmark? The benchmark was
 the basic word count sample app, using the wikipedia export as the
 data set.

 Here's the result set I'm looking at and i'm just giving bogus values
 to make the point
 10 DN cluster,
 10 minutes, consistently

 12 DN cluster,
 10m, 15m, 10m, 15m, 15m, 10m, 10m

  Basically, I expected the result set for the 12 DN cluster to be
  consistent; however, the data set isn't. Since there's a high
  correlation between the lowest values in the 12 DN data and the
  average values in the 10 DN cluster, I'm asserting that Hadoop may
  have just talked to 10 DNs instead of all 12.

 This is for a paper that I plan on publishing shortly containing
 emulated network conditions for a number of different network types.

 On Fri, Apr 19, 2013 at 3:26 PM, Marcos Luis Ortiz Valmaseda
 marcosluis2...@gmail.com wrote:
  Regards, Alex.
  We need more information to be able to get you a good answer:
  - Java version
  - JVM tuning configuration
  - Hadoop JT, DN, NN configuration
  - Network topology
  - Network speed for the cluster
  - Hardware properties for all nodes in the cluster
 
   Hadoop is a truly scalable system, where you can add more nodes and the
   performance should get better, but there are some configurations which can
   degrade its performance.

   Another thing is:
   Which platform are you using for the benchmark?
   There is an amazing platform developed by Jason Dai from Intel called
   HiBench, which is great for this kind of work.[1][2]
 
  With all this information, I think that we can help you to find the root
  causes behind the performance of the cluster.
 
  [1] https://github.com/intel-hadoop/HiBench
  [2]
 
 http://hadoopsummit.org/amsterdam-blog/meet-the-presenters-jason-dai-of-intel/
 
 
 
  2013/4/19 Alex O'Ree spyhunte...@gmail.com
 
  Hi I'm running a 10 data node cluster and was experimenting with
  adding additional nodes to it. I've done some performance bench
  marking with 10 nodes and have compared them to 12 nodes and I've
  found some rather interesting and inconsistent results. The behavior
  I'm seeing is that during some of the 12 node bench runs, I'm actually
  seeing two different performance levels, one set at a different level
  than 10 nodes, and another at exactly the performance of a 10 node
  cluster. I've eliminated any possibility of networking problems or
  problems related to a specific machine. Before switching to a 12 node
  cluster, the initial cluster was destroyed, rebuilt and the dataset
  was added in. This should have yielded an evenly balanced cluster
  (confirmed through the web app)
 
  So my question is, is this an expected behavior or is something else
  going on here that I'm not aware of. For reference, I'm using 1.0.8 on
  CentOS 6.3 x64
 
 
 
 
  --
  Marcos Ortiz Valmaseda,
  Data-Driven Product Manager at PDVSA
  Blog: http://dataddict.wordpress.com/
  LinkedIn: http://www.linkedin.com/in/marcosluis2186
  Twitter: @marcosluis2186




-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186


Re: RefGuide schema design examples

2013-04-19 Thread Marcos Luis Ortiz Valmaseda
Wow, great work, Doug.


2013/4/19 Doug Meil doug.m...@explorysmedical.com

 Hi folks,

 I reorganized the Schema Design case studies 2 weeks ago and consolidated
 them into here, plus added several cases common on the dist-list.

 http://hbase.apache.org/book.html#schema.casestudies

 Comments/suggestions welcome.  Thanks!


 Doug Meil
 Chief Software Architect, Explorys
 doug.m...@explorysmedical.com





-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186


Re: should i use compression?

2013-04-03 Thread Marcos Luis Ortiz Valmaseda
+1 for Ted's advice.
Using compression can save a lot of space in memory and on disk, so it's a
good recommendation.
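
For a concrete starting point, compression and data block encoding are both set per column family; a rough, untested sketch (0.94-era API; the table and family names are placeholders, and Snappy must be installed natively on every region server):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableCompression {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Reuse the existing descriptor so other attributes are kept as-is
            HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
            HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
            cf.setCompressionType(Compression.Algorithm.SNAPPY);   // on-disk compression
            cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);  // also helps the block cache
            admin.disableTable("mytable");
            admin.modifyColumn("mytable", cf);
            admin.enableTable("mytable");
            // A major compaction rewrites existing HFiles with the new settings
            admin.majorCompact("mytable");
        } finally {
            admin.close();
        }
    }
}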



2013/4/3 Ted Yu yuzhih...@gmail.com

 You should use data block encoding (in 0.94.x releases only). It is helpful
 for reads.

 You can also enable compression.

 Cheers


 On Wed, Apr 3, 2013 at 6:42 AM, Prakash Kadel prakash.ka...@gmail.com
 wrote:

  Hello,
  I have a question.
   I have a table where I store data in the column qualifiers (the values
   themselves are null).
   I just have 1 column family.
  The number of columns per row is variable (1 ~ a few thousand).

   Currently I don't use compression or the data_block_encoding.

   Should I?
   I want to have faster reads.
 
  Please suggest.
 
 
  Sincerely,
  Prakash Kadel




-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186


Re: should i use compression?

2013-04-03 Thread Marcos Luis Ortiz Valmaseda
Regards, Jean-Marc.
The best resource that I found for this is a great blog post called "Apache
HBase I/O - HFile" from Matteo Bertozzi on Cloudera's blog. Here's the link:
http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/




2013/4/3 Jean-Marc Spaggiari jean-m...@spaggiari.org

 Is there any documentation anywhere regarding the differences between
 PREFIX, DIFF and FAST_DIFF?

 2013/4/3 prakash kadel prakash.ka...@gmail.com:
  thank you very much.
  i will try with snappy compression with data_block_encoding
 
 
 
 
  On Wed, Apr 3, 2013 at 11:21 PM, Kevin O'dell kevin.od...@cloudera.com
 wrote:
 
  Prakash,
 
Yes, I would recommend Snappy Compression.
 
  On Wed, Apr 3, 2013 at 10:18 AM, Prakash Kadel prakash.ka...@gmail.com
 
  wrote:
   Thanks,
   is there any specific compression that is recommended of the use
  case i have?
  Since my values are all null will compression help?
  
I am thinking of using prefix data_block_encoding..
   Sincerely,
   Prakash Kadel
  
  
   On Apr 3, 2013, at 10:55 PM, Ted Yu wrote:
  
   You should use data block encoding (in 0.94.x releases only). It is
  helpful
   for reads.
  
   You can also enable compression.
  
   Cheers
  
  
   On Wed, Apr 3, 2013 at 6:42 AM, Prakash Kadel 
 prakash.ka...@gmail.com
  wrote:
  
   Hello,
  I have a question.
  I have a table where i store data in the column qualifiers(the
  values
   itself are null).
  I just have 1 column family.
 The number of columns per row is variable (1~ few thousands)
  
   Currently i don't use compression or the data_block_encoding.
  
   Should i?
   I want to have faster reads.
  
   Please suggest.
  
  
   Sincerely,
   Prakash Kadel
  
 
 
 
  --
  Kevin O'Dell
  Systems Engineer, Cloudera
 




-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186


Re: should i use compression?

2013-04-03 Thread Marcos Luis Ortiz Valmaseda
You can read this JIRA issue for this too:
https://issues.apache.org/jira/browse/HBASE-4218



2013/4/3 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com

 Regards, Jean-Marc.
 The best resource that I found for this is a great blog post called Apache
 HBase I/O - HFile  from Matteo Bertozzi in Cloudera´s blog. Here´s the link:
 http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/




 2013/4/3 Jean-Marc Spaggiari jean-m...@spaggiari.org

 Is there any documentation anywhere regarding the differences between
 PREFIX, DIFF and FAST_DIFF?

 2013/4/3 prakash kadel prakash.ka...@gmail.com:
  thank you very much.
  i will try with snappy compression with data_block_encoding
 
 
 
 
  On Wed, Apr 3, 2013 at 11:21 PM, Kevin O'dell kevin.od...@cloudera.com
 wrote:
 
  Prakash,
 
Yes, I would recommend Snappy Compression.
 
  On Wed, Apr 3, 2013 at 10:18 AM, Prakash Kadel 
 prakash.ka...@gmail.com
  wrote:
   Thanks,
   is there any specific compression that is recommended of the use
  case i have?
  Since my values are all null will compression help?
  
I am thinking of using prefix data_block_encoding..
   Sincerely,
   Prakash Kadel
  
  
   On Apr 3, 2013, at 10:55 PM, Ted Yu wrote:
  
   You should use data block encoding (in 0.94.x releases only). It is
  helpful
   for reads.
  
   You can also enable compression.
  
   Cheers
  
  
   On Wed, Apr 3, 2013 at 6:42 AM, Prakash Kadel 
 prakash.ka...@gmail.com
  wrote:
  
   Hello,
  I have a question.
  I have a table where i store data in the column qualifiers(the
  values
   itself are null).
  I just have 1 column family.
 The number of columns per row is variable (1~ few thousands)
  
   Currently i don't use compression or the data_block_encoding.
  
   Should i?
   I want to have faster reads.
  
   Please suggest.
  
  
   Sincerely,
   Prakash Kadel
  
 
 
 
  --
  Kevin O'Dell
  Systems Engineer, Cloudera
 




 --
 Marcos Ortiz Valmaseda,
 *Data-Driven Product Manager* at PDVSA
 *Blog*: http://dataddict.wordpress.com/
 *LinkedIn: *http://www.linkedin.com/in/marcosluis2186
 *Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186




-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186


Re: should i use compression?

2013-04-03 Thread Marcos Luis Ortiz Valmaseda
Here's the API documentation:

*FAST_DIFF*:
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.html

Encoder similar to DiffKeyDeltaEncoder
(http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.html)
but supposedly faster.
Compress using:
- store size of common prefix
- save column family once in the first KeyValue
- use integer compression for key, value and prefix (7-bit encoding)
- use bits to avoid duplication of key length, value length and type if it is
the same as previous
- store in 3 bits the length of the prefix of the timestamp with the previous
KeyValue's timestamp
- one bit which allows to omit the value if it is the same
Format:
- 1 byte: flag
- 1-5 bytes: key length (only if FLAG_SAME_KEY_LENGTH is not set in flag)
- 1-5 bytes: value length (only if FLAG_SAME_VALUE_LENGTH is not set in flag)
- 1-5 bytes: prefix length
- ... bytes: rest of the row (if prefix length is small enough)
- ... bytes: qualifier (or suffix depending on prefix length)
- 1-8 bytes: timestamp suffix
- 1 byte: type (only if FLAG_SAME_TYPE is not set in the flag)
- ... bytes: value (only if FLAG_SAME_VALUE is not set in the flag)

*DIFF*:
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.html

Compress using:
- store size of common prefix
- save column family once, it is the same within an HFile
- use integer compression for key, value and prefix (7-bit encoding)
- use bits to avoid duplication of key length, value length and type if it is
the same as previous
- store in 3 bits the length of the timestamp field
- allow a diff in the timestamp instead of the actual value
Format:
- 1 byte: flag
- 1-5 bytes: key length (only if FLAG_SAME_KEY_LENGTH is not set in flag)
- 1-5 bytes: value length (only if FLAG_SAME_VALUE_LENGTH is not set in flag)
- 1-5 bytes: prefix length
- ... bytes: rest of the row (if prefix length is small enough)
- ... bytes: qualifier (or suffix depending on prefix length)
- 1-8 bytes: timestamp or diff
- 1 byte: type (only if FLAG_SAME_TYPE is not set in the flag)
- ... bytes: value

I was reading the FAQs and there is nothing related to this topic. It
would be nice to include it in the documentation.

Lars, what do you think? It would be nice if you could write a detailed
blog post about this topic.





2013/4/3 Jean-Marc Spaggiari jean-m...@spaggiari.org

 I read the JIRA already but it was not clear to me. However Cloudera's
 link is very clear. Thanks for that. Any idea what's the difference
 between DIFF and FAST_DIFF?

 2013/4/3 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com:
  You can read this JIra issue for this too:
  https://issues.apache.org/jira/browse/HBASE-4218
 
 
 
  2013/4/3 Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com
 
  Regards, Jean-Marc.
  The best resource that I found for this is a great blog post called
 Apache
  HBase I/O - HFile  from Matteo Bertozzi in Cloudera´s blog. Here´s the
 link:
  http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
 
 
 
 
  2013/4/3 Jean-Marc Spaggiari jean-m...@spaggiari.org
 
  Is there any documentation anywhere regarding the differences between
  PREFIX, DIFF and FAST_DIFF?
 
  2013/4/3 prakash kadel prakash.ka...@gmail.com:
   thank you very much.
   i will try with snappy compression with data_block_encoding
  
  
  
  
   On Wed, Apr 3, 2013 at 11:21 PM, Kevin O'dell
   kevin.od...@cloudera.comwrote:
  
   Prakash,
  
 Yes, I would recommend Snappy Compression.
  
   On Wed, Apr 3, 2013 at 10:18 AM, Prakash Kadel
   prakash.ka...@gmail.com
   wrote:
Thanks,
is there any specific compression that is recommended of the
 use
   case i have?
   Since my values are all null will compression help?
   
 I am thinking of using prefix data_block_encoding..
Sincerely,
Prakash Kadel
   
   
On Apr 3, 2013, at 10:55 PM, Ted Yu wrote:
   
You should use data block encoding (in 0.94.x releases only). It
 is
   helpful
for reads.
   
You can also enable compression.
   
Cheers
   
   
On Wed, Apr 3, 2013 at 6:42 AM, Prakash Kadel
prakash.ka...@gmail.com
   wrote:
   
Hello,
   I have a question.
   I have a table where i store data in the column
 qualifiers(the
   values
itself are null).
   I just have 1 column family.
  The number of columns per row is variable (1~ few thousands)
   
Currently i don't use compression or the data_block_encoding.
   
Should i?
I want to have faster reads.
   
Please suggest.
   
   
Sincerely,
Prakash Kadel
   
  
  
  
   --
   Kevin O'Dell
   Systems Engineer, Cloudera
  
 
 
 
 
  --
  Marcos Ortiz Valmaseda,
  Data-Driven Product Manager at PDVSA
  Blog: http://dataddict.wordpress.com/
  LinkedIn: http://www.linkedin.com/in/marcosluis2186
  Twitter: @marcosluis2186
 
 
 
 
  --
  Marcos Ortiz Valmaseda,
  Data-Driven Product Manager at PDVSA
  Blog: http://dataddict.wordpress.com/
  LinkedIn: http://www.linkedin.com

Re: coprocessor is timing out in 0.94

2013-03-28 Thread Marcos Luis Ortiz Valmaseda
Regards, Saurabh.
I see that you are using SingleColumnValueFilter. Look at these links:
http://gbif.blogspot.com/2012/05/optimizing-hbase-mapreduce-scans-for.html
http://mapredit.blogspot.com/2012/05/using-filters-in-hbase-to-match-two.html

Take a look later at this link about the ongoing work to improve scans:
https://issues.apache.org/jira/browse/HBASE-5416
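
For reference, the plain (non-coprocessor) client-side version of that filter, with the option that usually matters most, looks roughly like this; a minimal sketch (0.94-era API; the table, family, qualifier and keyword are placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilteredScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
            Bytes.toBytes("cf"), Bytes.toBytes("q"),
            CompareFilter.CompareOp.EQUAL,
            new RegexStringComparator("(?i)keyword"));
        // Drop rows that do not have the column at all instead of emitting them
        filter.setFilterIfMissing(true);
        Scan scan = new Scan();
        scan.setCaching(500);     // fewer RPC round trips per scanner
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}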



2013/3/28 Agarwal, Saurabh saurabh.agar...@citi.com

 Ted,

 Thanks for response.

 Here is the filter we are using -
 SingleColumnValueFilter(Bytes.toBytes(columnFamily),
     Bytes.toBytes(columnQualifier), CompareFilter.CompareOp.EQUAL,
     new RegexStringComparator("(?i)" + keyword));

 The thread dumps at different points show that the coprocessor is getting
 called. The logs also showed it kept processing, but the speed is much slower
 compared to 0.92.

 Regards,
 Saurabh.

 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Thursday, March 28, 2013 6:57 PM
 To: user@hbase.apache.org
 Subject: Re: coprocessor is timing out in 0.94

 bq. I checked thread dump

 If there was no exception in region server logs, thread dump of region
 server when your coprocessor was running would reveal where it got stuck.

 From your description below, looks like you can utilize HBASE-5416 Improve
 performance of scans with some kind of filters.

 bq. to apply the filter on one of the column

 Basically this column is the essential column.

 Cheers

 On Thu, Mar 28, 2013 at 3:22 PM, Ted Yu yuzhih...@gmail.com wrote:

  bq. when I removed the filter, it ran fine in 0.94
 
  Can you disclose more information about your filter ?
 
  BTW 0.94.6 was just released which is fully compatible with 0.94.2
 
  Cheers
 
  On Thu, Mar 28, 2013 at 3:18 PM, Agarwal, Saurabh 
  saurabh.agar...@citi.com wrote:
 
  Hi,
 
  We are in process of migrating from 0.92.1 to 0.94.2. A coprocessor
  was running fine in 0.92. After migrating to 0.94, the client is
  timing out (java.net.SocketTimeoutException).  We are using
  coprocessor to apply the filter on one of the column and return the
  columns that match with that filter criteria. I checked thread dump,
  region server, web UI, logs. There is no error or exception.  One
  thing I noticed that when I removed the filter, it ran fine in 0.94 as
 well.
 
  Please advise if there is any specific setting we need to make in 0.94.
 
  Thanks,
  Saurabh.
 
 
 




-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186