Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Stack
On Thu, Apr 21, 2011 at 10:49 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Anyway. For a million requests shot at a region server at various speeds between 300 and 500 qps, the picture is not pretty. RPC metrics are actually good -- no more than 1ms average per next() and 0 per get(). So

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Ted Dunning
This actually sounds like there is a problem with concurrency either on the client or the server side. TCP is plenty fast for this and having a dedicated TCP connection over which multiple requests can be multiplexed is probably much better than UDP because you would have to adapt your own window

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Ted Dunning
Dmitriy, Did I hear you say that you are instantiating a new HTable for each request? Or was that somebody else? On Thu, Apr 21, 2011 at 11:04 PM, Stack st...@duboce.net wrote: On Thu, Apr 21, 2011 at 10:49 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Anyway. For a million requests shot
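
For context, a minimal sketch of reusing handles through HTablePool instead of constructing a new HTable per request; the table and row names are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PooledGets {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // One pool for the whole client JVM: getTable() hands back a cached
        // handle instead of constructing a fresh HTable each time.
        HTablePool pool = new HTablePool(conf, 10);
        HTableInterface table = pool.getTable("mytable");
        try {
          Result r = table.get(new Get(Bytes.toBytes("row-0001")));
          System.out.println("cells returned: " + r.size());
        } finally {
          pool.putTable(table);  // return the handle for reuse, don't discard it
        }
      }
    }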

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread tsuna
On Thu, Apr 21, 2011 at 10:49 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: What doesn't seem so fast is RPC. As i reported before, i was getting 25ms TTLB under the circumstances. In this case all the traffic to the node goes thru same client (but in reality of course the node's portion per

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
yes this is for 500 QPS of scans returning back approx. 15k worth of data total. You saw HBASE-2939 Allow Client-Side Connection Pooling? Would that help? Interesting. let me take a look. i was kind of thinking maybe there's some sense in allowing a pool of more than one tcp connection from the same

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
in this case i pool them as well, which doesn't seem to make any difference (compared to when i just reuse them; i am not writing in this test, but outside of the test i do, so i do pool them using techniques similar to those in HTablePool, CAS-based queues etc.) On Thu, Apr 21, 2011 at 11:09 PM, Ted

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
yes that was closer to my expectations, too. i am scratching my head as well but i don't have time to figure this out any longer. in reality i won't have a 500 QPS stream between a single client and a single region so i don't care much. On Thu, Apr 21, 2011 at 11:08 PM, Ted Dunning tdunn...@maprtech.com

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Ted Dunning
Yeah... but with UDP you have to do packet reassembly yourself. And do source quench and all kinds of things. Been there. Done that. Don't recommend it unless it is your day job. We built the Veoh peer-to-peer system on UDP. It had compelling advantages for us as we moved a terabit of data

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
I doubt that TCP doesn't perform well.  If you really believe so, can you provide a packet capture collected with: sudo tcpdump -nvi eth0 -s0 -w /tmp/pcap port 60020 Thanks, i will certainly try. However, same class of machine, same data, same test, locally vs. remote on the same subnet is de facto 100%

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
Exactly. that's why i said 'for short scans and gets' and perhaps a combo. As soon as it exceeds a frame, we'd rather not mess with reassembly. But I agree it is most likely not worth it. The most likely reason for my latencies is not this. On Thu, Apr 21, 2011 at 11:22 PM, Ted Dunning

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
You saw HBASE-2939 Allow Client-Side Connection Pooling? Would that help? Ok, just read thru the issue. That's exactly what i thought upon reading the code in the HBaseClient class. Although in my cluster it did not seem to have more than about a 20% effect, and it more or less evaporated after 3

Re: hbase 0.90.2 - incredibly slow response

2011-04-22 Thread Gaojinchao
It seems like my case. My test data: Puts: 75090 ops/s, average latency: 2.7 ms. Scans: 494 ops/s, average latency: 1356 ms. (1 HMaster/name node, 3 ZooKeeper, 7 region server/data node.) About my test: some schemas may be slower in version 0.90.2. How do you design your schema? If there is any
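
A scan latency over a second per operation often just means the scanner fetches one row per RPC. A minimal sketch of raising the scanner caching on the client (table name hypothetical; 100 is only an example value):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanCachingExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        Scan scan = new Scan();
        scan.setCaching(100);  // ship 100 rows per next() RPC instead of 1
        ResultScanner scanner = table.getScanner(scan);
        try {
          int rows = 0;
          for (Result row : scanner) {
            rows++;  // process each row here
          }
          System.out.println("scanned " + rows + " rows");
        } finally {
          scanner.close();  // releases the server-side scanner lease
          table.close();
        }
      }
    }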

HDFS and HBase heap

2011-04-22 Thread Iulia Zidaru
Hi all, Supposing we have to constantly hit all data stored, what is a good ratio between the HDFS space used and the HBase heap size allocated per node? Do you calculate it somehow? Also, is there a ratio between the Hadoop heap size and the HBase heap size that we should take into

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Bakhru, Raj
- Original Message - From: Dmitriy Lyubimov [mailto:dlie...@gmail.com] Sent: Friday, April 22, 2011 02:50 AM To: user@hbase.apache.org Subject: Re: 0.90 latency performance, cdh3b4 You saw HBASE-2939 Allow Client-Side Connection Pooling? Would that help? Ok

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread tsuna
On Thu, Apr 21, 2011 at 11:25 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: I certainly would. Even more, i already read the code there just a bit, although not enough to understand where the efficiency comes from. Do you actually implement another version of RPC on non-blocking sockets there?

Exception after upgrading to 90.1

2011-04-22 Thread Pete Tyler
Seeing this error in the client. I can create new HTable instances fine until I get to this one unit test, then I can't open HTable instances that I could open earlier. As far as I can tell the error starts happening immediately after my client process has run a map reduce job locally. Running

Re: Exception after upgrading to 90.1

2011-04-22 Thread Jean-Daniel Cryans
Probably the same ConnectionLossException that others have been describing on this list? I don't see it in your stack trace (in fact I can't really see anything), but it sounds like what you describe. J-D On Fri, Apr 22, 2011 at 10:32 AM, Pete Tyler peteralanty...@gmail.com wrote: Seeing this

Row Key Question

2011-04-22 Thread Peter Haidinyak
I have a question on how HBase decides to save rows based on Row Keys. Say I have a million rows to insert into a new table in a ten node cluster. Each row's key is some random 32 byte value and there are two columns per row, each column contains some random 32 byte value. My question is how

importtsv

2011-04-22 Thread Eric Ross
Hi all, I'm having some trouble running the importtsv tool on CDH3B4 configured in pseudo-distributed mode. The tool works fine unless I add the option importtsv.bulk.output. Does importtsv with the option importtsv.bulk.output work in pseudo-distributed mode or do I maybe have something
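
For reference, a sketch of the intended invocation, with hypothetical paths, table, and column names (the jar path follows the CDH3B4 layout but may differ on your install):

    # Write HFiles to importtsv.bulk.output instead of doing live puts
    hadoop jar /usr/lib/hbase/hbase-*.jar importtsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1 \
      -Dimporttsv.bulk.output=/user/eric/hfiles \
      mytable /user/eric/input.tsv

    # Then hand the generated HFiles to the table
    hadoop jar /usr/lib/hbase/hbase-*.jar completebulkload \
      /user/eric/hfiles mytable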

Re: Exception after upgrading to 90.1

2011-04-22 Thread Pete Tyler
Is it possible my use of map reduce has been rendered invalid / outdated by the upgrade? It appears to create the expected result but causes follow-on logic in the client to fail as described above. CLIENT: HBaseConfiguration conf = new HBaseConfiguration(); Job job = new
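
For comparison, a minimal sketch of a 0.90-style table-to-table job using TableMapReduceUtil; the table names and job name are hypothetical, and this is not necessarily what Pete's job looks like:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;

    public class CopyTableJob {
      // Re-emits every cell of each row as a Put keyed on the same row.
      static class PassThroughMapper
          extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns,
            Context context) throws IOException, InterruptedException {
          Put put = new Put(rowKey.get());
          for (KeyValue kv : columns.raw()) {
            put.add(kv);
          }
          context.write(rowKey, put);
        }
      }

      public static void main(String[] args) throws Exception {
        // HBaseConfiguration.create() is the 0.90 replacement for the
        // deprecated new HBaseConfiguration() constructor.
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "copy-table");
        job.setJarByClass(CopyTableJob.class);
        TableMapReduceUtil.initTableMapperJob("source", new Scan(),
            PassThroughMapper.class, ImmutableBytesWritable.class, Put.class, job);
        TableMapReduceUtil.initTableReducerJob("sink", IdentityTableReducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }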

Re: HDFS and HBase heap

2011-04-22 Thread Jean-Daniel Cryans
The datanodes don't consume much memory; we run ours with 1GB and give the rest to the region servers. BTW if you want to serve the whole dataset, depending on your SLA, you might want to try HDFS-347 since concurrent HDFS access is rather slow. The other choice would be to make sure you can hold
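
One way to express that split in the configs, assuming a machine with roughly 12GB of RAM (both numbers are examples, not recommendations):

    # hadoop-env.sh: cap just the datanode JVM (example value)
    export HADOOP_DATANODE_OPTS="-Xmx1g $HADOOP_DATANODE_OPTS"

    # hbase-env.sh: give most of the remaining RAM to the region server,
    # e.g. 8GB out of a 12GB box (example value, in MB)
    export HBASE_HEAPSIZE=8000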

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
I doubt that TCP doesn't perform well.  If you really believe so, can you provide a packet capture collected with: sudo tcpdump -nvi eth0 -s0 -w /tmp/pcap port 60020 Hm. What i discovered there is that I assumed my hack at RS connection pooling was working but it doesn't seem to be. Even

Re: Row Key Question

2011-04-22 Thread Jean-Daniel Cryans
The splitting is based on when a region reaches a configured size (default is 256MB). A table starts with 1 region, and splits as needed when you insert. For a bit more info see: http://hbase.apache.org/book.html#regions.arch J-D On Fri, Apr 22, 2011 at 10:40 AM, Peter Haidinyak

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
Got it. So that's why: HBaseRPC: protected final static ClientCache CLIENTS = new ClientCache(); The client cache is static regardless of HConnection instances, and the connection id is pretty much the server address. So i guess no external hack is possible to overcome that then. On Fri, Apr 22, 2011 at

RE: Row Key Question

2011-04-22 Thread Buttler, David
Regions split when they grow larger than the configured maximum region size. Your data is small enough to fit in a single region. Keys are sorted within a region. When a region splits, the new regions are about half the size of the original region and contain half the key space each. Dave
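
The parameter in question is hbase.hregion.max.filesize. A sketch of raising it in hbase-site.xml, assuming you want fewer, larger regions (the 1GB value is only an example):

    <!-- hbase-site.xml: raise the split threshold from the 0.90 default
         of 256MB (268435456 bytes); 1GB here is only an example -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>1073741824</value>
    </property>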

Re: Creating table with regions failed when zk crashed.

2011-04-22 Thread Jean-Daniel Cryans
What exactly happened here? As much as I enjoy reading logs, I also enjoy short descriptions of the context of what I'm looking at. J-D On Thu, Apr 21, 2011 at 8:36 PM, Gaojinchao gaojinc...@huawei.com wrote: Is there any issue about this ? 2011-04-21 14:48:24,676 INFO

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
Benoit, thank you. Is it possible to configure this client to open more than one socket connection from the same client to the same region server? In other words, is HBASE-2939 already a non-issue there? asynchbase implements the HBase RPC protocol in a different way, it's written from scratch.  It

Re: Exception after upgrading to 90.1

2011-04-22 Thread Pete Tyler
One job, then a scan. Both from the same JVM. I do want to run multiple jobs from the same client JVM and those tests are failing too. I'm currently trying to figure out why the job is closing the connection and how I can stop it from doing so. From my iPhone On Apr 22, 2011, at 12:05 PM,

Re: Exception after upgrading to 90.1

2011-04-22 Thread Jean-Daniel Cryans
I'm pretty sure, like I mentioned before, that the issue isn't that a connection is closed but that it's in fact not closed. Threads like these talk about it: http://search-hadoop.com/m/JFj52oETZn http://search-hadoop.com/m/Wxcn42PBN9g2 J-D On Fri, Apr 22, 2011 at 12:16 PM, Pete Tyler

Re: LocalJobRunner and HBASE-2669 woes

2011-04-22 Thread Ted Yu
For HBASE-3777, Karthick and I finally nailed down issues related to the finalizer that made TestTableMapReduce fail. A final patch will be put up for review :-). In the end, we expect users to use the (better-tuned) API wisely. We will add more javadoc for HTable and the new HConnectionKey class. Take

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread tsuna
On Fri, Apr 22, 2011 at 12:15 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: is it possible to configure this client to open more than one socket connection from same client to same region server? In other words, is HBASE-2939 already non-issue there? No, asynchbase doesn't have HBASE-2939, but
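
For readers unfamiliar with asynchbase, a minimal sketch of a single asynchronous get; the quorum spec, table, and row names are hypothetical:

    import java.util.ArrayList;
    import com.stumbleupon.async.Callback;
    import org.hbase.async.GetRequest;
    import org.hbase.async.HBaseClient;
    import org.hbase.async.KeyValue;

    public class AsyncGetExample {
      public static void main(String[] args) throws Exception {
        // One HBaseClient per application; it handles all traffic to the
        // region servers over its own connection management.
        final HBaseClient client = new HBaseClient("zkhost:2181");
        GetRequest get = new GetRequest("mytable", "row-0001");
        client.get(get).addCallback(new Callback<Object, ArrayList<KeyValue>>() {
          public Object call(ArrayList<KeyValue> row) {
            System.out.println("got " + row.size() + " cells");
            return null;
          }
        }).join();  // joining only to keep the demo alive; normally stay async
        client.shutdown().join();
      }
    }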

Re: 0.90 latency performance, cdh3b4

2011-04-22 Thread Dmitriy Lyubimov
Thank you, sir. On Fri, Apr 22, 2011 at 12:31 PM, tsuna tsuna...@gmail.com wrote: On Fri, Apr 22, 2011 at 12:15 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: is it possible to configure this client to open more than one socket connection from same client to same region server? In other words,

Re: LocalJobRunner and HBASE-2669 woes

2011-04-22 Thread Stack
On Wed, Apr 20, 2011 at 7:20 PM, Robert Mahfoud robert.mahf...@gmail.com wrote: I think that this wasn't a wise design choice since one wouldn't expect using an incidental class (TOF) to have such a pervasive side effect. Agreed. Better testing -- coverage and exercise of candidate release

RE: Row Key Question

2011-04-22 Thread Peter Haidinyak
Thanks, that's the way I visualized it happening. Then the assumption is this process would continue until every server in the cluster has one region of data (more or less). My underlying question is that I need to store my data with the key starting with the date (YYYY-MM-DD). I know this means

Re: Row Key Question

2011-04-22 Thread Jean-Daniel Cryans
That's almost exactly what Mozilla is doing with Socorro (google for their presentations). Also you seem to assume things about the region balancer that are, at least at the moment, untrue: Then the assumption is this process would continue until every server in the cluster has one region of

RE: Row Key Question

2011-04-22 Thread Peter Haidinyak
Thanks for the link, nice doodles :-) He kind of validates my thoughts: sequential keys = BAD, but if you must do it, use a prefix. I'm hoping that over time the keys will end up having a better distribution and I can still do a scan using a start and end row. I'll see how it distributes on my
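
A sketch of the prefix (salting) technique being discussed, with a hypothetical bucket count and key layout; the trade-off is that each range scan fans out into one scan per bucket:

    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedKeys {
      static final int BUCKETS = 16;  // example bucket count

      // Prefix the date-leading key with a stable hash bucket so a day's
      // writes spread over BUCKETS regions instead of hammering one.
      static byte[] saltedKey(String date, String suffix) {
        int bucket = (suffix.hashCode() & 0x7fffffff) % BUCKETS;
        return Bytes.toBytes(String.format("%02d-%s-%s", bucket, date, suffix));
      }

      public static void main(String[] args) {
        System.out.println(Bytes.toString(saltedKey("2011-04-22", "query-term")));
        // Range scans then fan out: one Scan per bucket, e.g. start row
        // "03-2011-04-22" and stop row "03-2011-04-23" for bucket 3.
      }
    }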

Re: RPC metrics coming up as 0

2011-04-22 Thread Dmitriy Lyubimov
thanks i already did that :) On Thu, Apr 21, 2011 at 10:50 PM, Stack st...@duboce.net wrote: On Thu, Apr 21, 2011 at 12:52 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: On a completely other issue i was reporting, I still have no ideas why remote client latencies would hover around 25ms in

Splitlog() executed while the namenode was in safemode may cause data-loss

2011-04-22 Thread bijieshan
Hi, I found this problem when the namenode went into safemode for some unclear reason. There's one patch about this problem:

    try {
      HLogSplitter splitter = HLogSplitter.createLogSplitter(
          conf, rootdir, logDir, oldLogDir, this.fs);
      try {
        splitter.splitLog();