major compaction best practice

2011-05-16 Thread Oleg Ruchovets
Hi, we are running a production cluster (10 machines): 1) HBase version 0.90.2 2) 2 tables 3) we create ~15 regions per day (region size 250 MB). I want to ask about major compaction best practices: 1) Should we run it automatically or manually? 2) How often should it run? 3) Where

Re: HTable.put hangs on bulk loading

2011-05-16 Thread Stan Barton
stack-3 wrote: On Fri, May 13, 2011 at 7:44 AM, Stan Barton bartx...@gmail.com wrote: stack-3 wrote: On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton bartx...@gmail.com wrote: Are you swapping, Stan? You are close to the edge with your RAM allocations. What do you have swappiness set to?

RE: major compaction best practice

2011-05-16 Thread Doug Meil
For starters, take a look at this... http://hbase.apache.org/book.html#perf.configurations -Original Message- From: Oleg Ruchovets [mailto:oruchov...@gmail.com] Sent: Monday, May 16, 2011 6:42 AM To: user@hbase.apache.org Subject: major compaction best practice Hi , We running

Re: Zookeeper Configuration Challenges (I think)

2011-05-16 Thread Barney Frank
OK, I must be doing something wrong. This will be the death of me if I don't pass my scalability testing on Wednesday for my project to get approved. Running on version 0.90.1-cdh3u0 using the pseudo-distributed mode for Hadoop and Hbase. ZK mode is standalone. How can I tell if Hbase is

Re: Zookeeper Configuration Challenges (I think)

2011-05-16 Thread Ted Yu
From hbase-default.xml: If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on. Normally I would let the client use the same hbase-site.xml as the server uses. After increasing maxClientCnxns, do you observe the same problem? Cheers On

Re: Zookeeper Configuration Challenges (I think)

2011-05-16 Thread Barney Frank
It was not set in hbase-env.sh. The errors now seem to be gone. Thanks for your prompt attention after my cry for help. On Mon, May 16, 2011 at 9:01 AM, Ted Yu yuzhih...@gmail.com wrote: From hbase-default.xml: If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which

HBase Error - assignment of -ROOT- failure

2011-05-16 Thread Nightie Wolfi
Hi everyone, I've just installed Hadoop and HBase from Cloudera (3) but when I try to go to http://localhost:60010 it just sits there continually loading. I can get to the regionserver fine - http://localhost:60030... Looking at the master HBase server logs I can see the following log output

RE: number of column families

2011-05-16 Thread Doug Meil
It's currently bad in general. -Original Message- From: Lars Egarots [mailto:lars.egar...@yahoo.com] Sent: Monday, May 16, 2011 12:36 PM To: user@hbase.apache.org Subject: number of column families The user documentation, in the Apache HBase book, states: HBase currently does not do

Re: HBase Error - assignment of -ROOT- failure

2011-05-16 Thread Stack
On Sun, May 15, 2011 at 6:10 AM, Nightie Wolfi nightwolf...@gmail.com wrote: org.apache.hadoop.hbase.Chore.run(Chore.java:66) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at

Re: Performance degrades on moving from desktop to blade environment

2011-05-16 Thread Stack
On Mon, May 16, 2011 at 10:33 AM, Himanish Kushary himan...@gmail.com wrote: Hi, We are in the process of moving a small HBase/Hadoop cluster from our development to production environment. Our development environment was a few Intel desktops (8 cores CPU/8 Gigs RAM/7200 rpm disks) running

Re: wrong region exception

2011-05-16 Thread Stack
See the rest of my email. St.Ack On Mon, May 16, 2011 at 8:18 AM, Robert Gonzalez robert.gonza...@maxpointinteractive.com wrote: 0.90.0 -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Friday, May 13, 2011 2:21 PM To:

Re: Storing data in column qualifier

2011-05-16 Thread Stack
2011/5/16 Frédéric Fondement frederic.fondem...@uha.fr: Hi all, Simple question: is it correct practice to save data in a column family qualifier ? Others use the qualifier to carry data. There is no rule against it. St.Ack

Re: major compaction best practice

2011-05-16 Thread Stack
On Mon, May 16, 2011 at 3:42 AM, Oleg Ruchovets oruchov...@gmail.com wrote: I want to ask about major compaction best practices: 1) Have we to run it automatically or manually Major compaction runs once a day by default. It has a tendency whereby it will start just when you do not want it to

HDFS Balancer and HBase

2011-05-16 Thread Erik Onnen
Is there any reason why running an HDFS balancer on the filesystem used for HBase would be considered bad practice? Doesn't seem so to me at face value but I wanted to be sure it seemed sane before enabling it. Thanks, -erik

Re: HDFS Balancer and HBase

2011-05-16 Thread Jean-Daniel Cryans
It would move blocks that are used by the local region servers, messing up your block locality. That's the first reason I can think of. J-D On Mon, May 16, 2011 at 11:14 AM, Erik Onnen eon...@gmail.com wrote: Is there any reason why running an HDFS balancer on the filesystem used for HBase would

Re: HDFS Balancer and HBase

2011-05-16 Thread Stack
Should be fine. Don't run it at a high rate or the network traffic will drag on your hbase serving. St.Ack On Mon, May 16, 2011 at 11:14 AM, Erik Onnen eon...@gmail.com wrote: Is there any reason why running an HDFS balancer on the filesystem used for HBase would be considered bad practice?

Re: HDFS Balancer and HBase

2011-05-16 Thread Erik Onnen
We're only at about .4 network capacity during peak load so I don't think we'll cause network issues. Disk I/O may be another story but network will be fine I suspect. On Mon, May 16, 2011 at 11:16 AM, Stack st...@duboce.net wrote: Should be fine.  Don't run it at a high rate or the network

Re: Performance degrades on moving from desktop to blade environment

2011-05-16 Thread Jean-Daniel Cryans
You are giving us the mile high overview of the problem, pointing to a specific culprit could be very time consuming. Instead, can you run some system tests and make sure things work the way they should? Are the disks strangely slow? Any switches acting up? Regarding your CPUs, counting is mostly

Region locality and .META. region

2011-05-16 Thread Ophir Cohen
Hi, I have two questions: 1. Does HBase know how to handle blocks moving? E.g. can HBase recognize that some local block was deleted from a machine and move that region to a machine with that block? 2. What happens if the region server of the .META. region fails? Does HBase have a duplicate region for that?

Re: Region locality and .META. region

2011-05-16 Thread Jean-Daniel Cryans
Hi, I have two questions: 1. Does HBase know how to handle blocks moving? E.g. can HBase recognize that some local block was deleted from a machine and move that region to a machine with that block? No, transparent. 2. What happens if the region server of the .META. region fails? Does HBase have

Re: HTable.put hangs on bulk loading

2011-05-16 Thread Stack
On Mon, May 16, 2011 at 4:55 AM, Stan Barton bartx...@gmail.com wrote: Sorry. How do you enable overcommitment of memory, or do you mean to say that your processes add up to more than the RAM you have? The memory overcommitment is needed because in order to let java still allocate the

Re: GC and High CPU

2011-05-16 Thread Jean-Daniel Cryans
If you have a high insert rate then maybe log rolling (which blocks inserts a little) makes it that the calls get queued enough (occupying heap) to make you enter a GC loop of death? Can you enable RPC logging and see if you can confirm that? Thx, J-D On Sun, May 15, 2011 at 5:37 PM, Jack Levin

Re: Hbase Master Failover Issue

2011-05-16 Thread Jean-Daniel Cryans
Hey Dmitriy, Awesome you could figure it out. I wonder if there's something that could be done in HBase to help debugging such problems... Suggestions? Also, just to make sure, this thread was started by Sean and it seems you stepped up for him... you are working together right? At least that's

Re: GC and High CPU

2011-05-16 Thread Stack
On Sun, May 15, 2011 at 5:37 PM, Jack Levin magn...@gmail.com wrote: I've added occupancy: export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log" Does

Re: inconsistencies reported by hbck...

2011-05-16 Thread Jean-Daniel Cryans
Would you be able to patch in https://issues.apache.org/jira/browse/HBASE-3695 and see what hbck tells you now? Else you could try using the 0.90.3 rc0 which has it too: http://people.apache.org/~stack/hbase-0.90.3-candidate-0/ J-D On Sun, May 15, 2011 at 9:11 AM, Andy Sautins

Re: Performance degrades on moving from desktop to blade environment

2011-05-16 Thread Himanish Kushary
Thanks for the reply. We ran the TestDFSIO benchmark on both development and production and found production to be better. The statistics are shown below. But once we bring HBase into the picture things get reversed :-( The count operation, map-reduce jobs, etc. become less performant on the

Re: Pagination through families / columns?

2011-05-16 Thread Jean-Daniel Cryans
It doesn't look like you are doing something wrong; I also looked at the unit tests and they seem to cover the basic usage of ColumnPaginationFilter. Can you try removing the addFamily and setMaxVersions to see if it has any effect? Thx, J-D On Fri, May 13, 2011 at 6:27 PM, Matthew Ward

Re: Performance degrades on moving from desktop to blade environment

2011-05-16 Thread Jean-Daniel Cryans
Ok I see... so the only thing that changed is the HW right? No upgrades to a new version? Also could it be possible that you changed some configs (or missed them)? BTW counting has a parameter for scanner caching, like you would write: count 'myTable', CACHE => 1000 and it should stream through your

Re: Pagination through families / columns?

2011-05-16 Thread Jack Levin
When we change versions to 1 from 3 in the HBase table schema, things appear to work right. -Jack On Mon, May 16, 2011 at 12:14 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: It doesn't look like you are doing something wrong; I also looked at the unit tests and they seem to cover the basic usage

Re: Performance degrades on moving from desktop to blade environment

2011-05-16 Thread Jack Levin
We also had issues moving to a 32-core AMD box. The issue revolved around the datanode getting slow after about 12 hours. What you need to do is check the fsreadlatency_ave_time graph; if it appears spiky then you have a problem with IO. Next, get a graph of Runnable Threads; they should be

Re: Hbase Master Failover Issue

2011-05-16 Thread sean barden
Dima and I work together. He's got a good amount of open-source experience on me and I got pulled away to work on something else (MS-SQL issues, no less). He gets all the fun. :) Seriously, the issue wouldn't have been solved without him stepping up. Thx Dima! sean On Mon, May 16, 2011 at

Re: Performance degrades on moving from desktop to blade environment

2011-05-16 Thread Himanish Kushary
Yes, it is only the HW that was changed. All the configurations are kept at the defaults from the Cloudera installer. The regionserver logs seem ok. On Mon, May 16, 2011 at 3:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Ok I see... so the only thing that changed is the HW right? No

Re: Performance degrades on moving from desktop to blade environment

2011-05-16 Thread Jack Levin
What is the clock rate of your CPUs (desktop vs blade)? -Jack On Mon, May 16, 2011 at 1:24 PM, Himanish Kushary himan...@gmail.com wrote: Yes, it is only the HW that was changed. All the configurations are kept at the defaults from the Cloudera installer. The regionserver logs seem ok. On

Re: GC and High CPU

2011-05-16 Thread Jack Levin
How do you tell? These are the log entries from when we had 100% CPU: 2011-05-14T15:48:58.240-0700: 5128.407: [GC 5128.407: [ParNew: 17723K->780K(19136K), 0.0199350 secs] 4309804K->4292973K(5777060K), 0.0200660 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 2011-05-14T15:48:58.349-0700: 5128.515: [GC
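A log entry like the one quoted here can be picked apart mechanically. The following is a minimal Python sketch and not something posted in the thread. It assumes the ParNew entry format written by -XX:+PrintGCDetails, with `->` between the before and after sizes (the archive appears to have dropped the `>`). The names PARNEW, SAMPLE and parse_parnew are made up for illustration.

```python
import re

# Assumed shape of a ParNew entry written with -XX:+PrintGCDetails:
#   [ParNew: <young before>K-><young after>K(<young capacity>K), <secs> secs]
#   <heap before>K-><heap after>K(<heap capacity>K), <secs> secs]
PARNEW = re.compile(
    r"\[ParNew: (?P<young_before>\d+)K->(?P<young_after>\d+)K"
    r"\((?P<young_cap>\d+)K\), (?P<young_secs>[\d.]+) secs\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K"
    r"\((?P<heap_cap>\d+)K\), (?P<pause_secs>[\d.]+) secs\]"
)

# The entry quoted in the message, with the arrows restored.
SAMPLE = (
    "2011-05-14T15:48:58.240-0700: 5128.407: [GC 5128.407: "
    "[ParNew: 17723K->780K(19136K), 0.0199350 secs] "
    "4309804K->4292973K(5777060K), 0.0200660 secs] "
    "[Times: user=0.07 sys=0.00, real=0.02 secs]"
)


def parse_parnew(line):
    """Return the sizes (in KB) and pause times of one ParNew entry, or None."""
    match = PARNEW.search(line)
    if match is None:
        return None
    fields = {
        key: float(value) if key.endswith("secs") else int(value)
        for key, value in match.groupdict().items()
    }
    # How much the young collection freed, and how full the whole heap stays.
    fields["young_freed"] = fields["young_before"] - fields["young_after"]
    fields["heap_used_fraction"] = fields["heap_after"] / fields["heap_cap"]
    return fields


if __name__ == "__main__":
    info = parse_parnew(SAMPLE)
    print(info["young_freed"], round(info["heap_used_fraction"], 3))
```

For the quoted entry this reports the whole heap at roughly 74% occupied after the young collection (4292973K of 5777060K), even though the young generation itself was almost emptied.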

Re: Performance degrades on moving from desktop to blade environment

2011-05-16 Thread Himanish Kushary
*PRODUCTION SERVER CPU INFO* processor : 0 vendor_id : AuthenticAMD cpu family : 16 model : 9 model name : AMD Opteron(tm) Processor 6174 stepping : 1 cpu MHz : 2200.022 cache size : 512 KB physical id : 1 siblings : 12 core id : 0 cpu cores : 12 apicid : 16 fpu : yes fpu_exception : yes cpuid

Re: mapreduce job failure

2011-05-16 Thread Venkatesh
Thanks J-D. Using hbase-0.20.6 on a 49-node cluster. The map reduce job involves a full table scan (region size 4 gig). The job runs great for 1 week, then starts failing after 1 week of data accumulation (about 3000 regions). About 400 regions get created per day. Can you suggest any tunables at the

Re: GC and High CPU

2011-05-16 Thread Jack Levin
I think this will resolve my issue, here is the output: 14 2011-05-16T15:58 13 2011-05-16T15:59 12 2011-05-16T16:00 14 2011-05-16T16:01 14 2011-05-16T16:02 13 2011-05-16T16:03 11 2011-05-16T16:04 12 2011-05-16T16:05 11 2011-05-16T16:06 16:06:55
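The message does not say how the per-minute figures were produced. Assuming they are counts of GC entries per minute taken from a log written with -XX:+PrintGCDateStamps, a tally of that kind could be sketched in Python as follows. The function name gc_events_per_minute is hypothetical.

```python
from collections import Counter


def gc_events_per_minute(lines):
    """Count GC entries per minute.

    Assumes each entry starts with an ISO-style date stamp such as
    2011-05-16T15:58:01.123-0700, as written by -XX:+PrintGCDateStamps.
    """
    counts = Counter()
    for line in lines:
        minute = line[:16]  # "YYYY-MM-DDTHH:MM"
        # Skip lines that are not GC entries or lack a leading date stamp.
        if "[GC" in line and len(minute) == 16 and minute[10] == "T":
            counts[minute] += 1
    return sorted(counts.items())


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python gc_per_minute.py < gc-hbase.log
    for minute, count in gc_events_per_minute(sys.stdin):
        print(count, minute)
```

Printing the count before the minute mirrors the layout of the figures in the message; a steady, low count per minute is what one would hope to see after the change.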

Re: GC and High CPU

2011-05-16 Thread Stack
So, the change is that you started using CMS? You were using the default GC previously? ParNew is much bigger now. St.Ack On Mon, May 16, 2011 at 4:11 PM, Jack Levin magn...@gmail.com wrote: I think this will resolve my issue, here is the output: 14 2011-05-16T15:58 13

Re: GC and High CPU

2011-05-16 Thread Jack Levin
Those are the lines I added: -XX:+CMSIncrementalMode \ -XX:+CMSIncrementalPacing \ --- -XX:-TraceClassUnloading -- -Jack (used CMS before) On Mon, May 16, 2011 at 4:19 PM, Stack st...@duboce.net wrote: So, the change is that you started using CMS?  You were using the default

Re: GC and High CPU

2011-05-16 Thread Andrew Purtell
This is interesting because our conventional wisdom is those settings should increase the chance of stop-the-world GC and should be avoided. - Andy (who always gets nervous when we start talking about GC black magic) From: Jack Levin magn...@gmail.com Subject: Re: GC and High CPU To:

Re: GC and High CPU

2011-05-16 Thread Jack Levin
I think in our case we have a deadlock of not cleaning garbage in large enough chunks; being stuck in high cpu is as good as being dead Jack On May 16, 2011 4:41 PM, Andrew Purtell apurt...@apache.org wrote: This is interesting because our conventional wisdom is those settings should increase

Re: GC and High CPU

2011-05-16 Thread Stack
I don't understand what of the below made a difference though the difference is plain from the GC logs you show. See below: On Mon, May 16, 2011 at 5:06 PM, Jack Levin magn...@gmail.com wrote: Those are the lines I added: -XX:+CMSIncrementalMode \   From the doc., it says about

Re: GC and High CPU

2011-05-16 Thread Jack Levin
This is the way I read it. Low processors == high CPU tasks, e.g. high load. So, Incremental takes GC down a number of notches when it comes to competing with CPU for APP threads. That being the case the deadlock is less likely. It would be useful to add code to the RS that will start blocking