On Thu, Apr 21, 2011 at 10:49 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
Anyway. For a million requests shot at a region server at various
speeds between 300 and 500 qps the picture is not pretty. RPC metrics
are actually good -- no more than 1ms average per next() and 0 per
get(). So
This actually sounds like there is a problem with concurrency either on the
client or the server side. TCP is plenty fast for this and having a
dedicated TCP connection over which multiple requests can be multiplexed is
probably much better than UDP because you would have to adapt your own
window
Dmitriy,
Did I hear you say that you are instantiating a new HTable for each request?
Or was that somebody else?
On Thu, Apr 21, 2011 at 11:04 PM, Stack st...@duboce.net wrote:
On Thu, Apr 21, 2011 at 10:49 PM, Dmitriy Lyubimov dlie...@gmail.com
wrote:
Anyway. For a million requests shot
On Thu, Apr 21, 2011 at 10:49 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
What doesn't seem so fast is RPC. As i reported before, i was getting
25ms TTLB under the circumstances. In this case all the traffic to the
node goes through the same client (but in reality of course the node's
portion per
yes this is for 500 QPS of scans returning approx. 15k worth of data total.
You saw HBASE-2939 Allow Client-Side Connection Pooling? Would that help?
Interesting. let me take a look. i kind of was thinking maybe there's
some sense in allowing more than one pooled tcp connection from the same
in this case i pool them as well, which doesn't seem to make any
difference (compared to when i just reuse them -- but i am not writing
but outside of the test i do, so i do pool them using techniques
similar to those in HTablePool, CAS-based queues, etc.)
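For reference, a minimal sketch of the kind of pooling described above, using the 0.90-era HTablePool API; the table name and pool size are placeholders, not anything taken from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PooledGetExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // One pool per JVM; HTable is not thread-safe and is relatively
    // expensive to construct per request.
    HTablePool pool = new HTablePool(conf, 10);
    HTableInterface table = pool.getTable("mytable");   // placeholder table name
    try {
      Result r = table.get(new Get(Bytes.toBytes("row-1")));
      System.out.println(r);
    } finally {
      pool.putTable(table);   // return the handle to the pool instead of discarding it
    }
  }
}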
On Thu, Apr 21, 2011 at 11:09 PM, Ted
yes that was closer to my expectations, too. i am scratching my head
as well but i don't have time to figure this out any longer. in
reality i won't have a 500 QPS stream between a single client and a
single region so i don't care much.
On Thu, Apr 21, 2011 at 11:08 PM, Ted Dunning tdunn...@maprtech.com
Yeah... but with UDP you have to do packet reassembly yourself.
And do source quench and all kinds of things.
Been there. Done that. Don't recommend it unless it is your day job.
We built the Veoh peer to peer system on UDP. It had compelling advantages
for us as we moved a terabit of data
I doubt that TCP doesn't perform well. If you really believe so, can
you provide a packet capture collected with:
sudo tcpdump -nvi eth0 -s0 -w /tmp/pcap port 60020
Thanks, i will certainly try. However, same class of machine, same data,
same test, locally vs. remote on the same subnet is de facto 100%
Exactly. that's why i said 'for short scans and gets' and perhaps a
combo. As soon as it exceeds a frame, we'd rather not mess with
reassembly. But I agree it is most likely not worth it. Most likely
reason for my latencies is not this.
On Thu, Apr 21, 2011 at 11:22 PM, Ted Dunning
You saw HBASE-2939 Allow Client-Side Connection Pooling? Would that help?
Ok just read through the issue. That's exactly what i thought upon
reading the code in the HBaseClient class. Although in my cluster it did
not seem to have more than about a 20% effect and it more or less
evaporated after 3
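For readers following along: if the HBASE-2939 patch is applied, the pooling is driven by two client-side properties. The property names and values below are my reading of that issue and should be treated as assumptions to verify against the exact patch revision actually in use:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class PooledIpcConfig {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Assumed keys/values from the HBASE-2939 patch; verify before relying on them.
    conf.set("hbase.client.ipc.pool.type", "RoundRobin"); // which pool implementation to use
    conf.setInt("hbase.client.ipc.pool.size", 5);         // sockets per region server
    return conf;
  }
}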
It seems like my case.
My test data:
Puts: 75090 ops/s, average latency: 2.7 ms.
Scans: 494 ops/s, average latency: 1356 ms.
(1 HMaster/name node, 3 ZooKeeper nodes, 7 region server/data nodes)
About my test: some schemas may be slower in version 0.90.2.
How do you design your schema?
If there is any
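One thing worth ruling out for scans averaging over a second: in 0.90 the client fetches a single row per next() call by default, so a scan over many rows pays one round trip per row. A minimal sketch of bumping the scanner caching; the table name and row range are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanCachingExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // placeholder table name
    Scan scan = new Scan(Bytes.toBytes("row-000"), Bytes.toBytes("row-999"));
    scan.setCaching(100);   // ship 100 rows per next() RPC instead of the default 1
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process each row here
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}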
Hi all,
Supposing we have to constantly hit all of the stored data, what is a good
ratio between the HDFS space used and the HBase heap size allocated per
node? Do you calculate it somehow?
Also, is there a ratio between the Hadoop heap size and the HBase heap
size that we should take into
W
- Original Message -
From: Dmitriy Lyubimov [mailto:dlie...@gmail.com]
Sent: Friday, April 22, 2011 02:50 AM
To: user@hbase.apache.org
Subject: Re: 0.90 latency performance, cdh3b4
You saw HBASE-2939 Allow Client-Side Connection Pooling? Would that help?
Ok
On Thu, Apr 21, 2011 at 11:25 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
I certainly would. Even more, i already read the code there, just a
bit, although not enough to understand where the efficiency comes from.
Do you actually implement another version of RPC on non-blocking
sockets there?
Seeing this error in the client. I can create new HTable instances fine until
I get to this one unit test, then I can't open HTable instances that I could
open earlier. As far as I can tell the error starts happening immediately
after my client process has run a map reduce job locally.
Running
Probably the same ConnectionLossException that others have been
describing on this list? I don't see it in your stack trace (in fact I
can't really see anything), but it sounds like what you describe.
J-D
On Fri, Apr 22, 2011 at 10:32 AM, Pete Tyler peteralanty...@gmail.com wrote:
Seeing this
I have a question on how HBase decides to save rows based on Row Keys. Say I
have a million rows to insert into a new table in a ten node cluster. Each
row's key is some random 32 byte value and there are two columns per row, each
column contains some random 32 byte value.
My question is how
Hi all,
I'm having some trouble running the importtsv tool on CDH3B4 configured in
pseudo distributed mode.
The tool works fine unless I add the option importtsv.bulk.output.
Does importtsv with the option importtsv.bulk.output work in pseudo distributed
mode or do I maybe have something
Is it possible my use of map reduce has been rendered invalid / outdated by
the upgrade? It appears to create the expected result but causes follow on
logic in the client to fail as described above.
CLIENT:
HBaseConfiguration conf = new HBaseConfiguration()
Job job = new
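The quoted client snippet is cut off above; for context, a typical 0.90-style table-scanning job is wired up roughly as below. This is a generic sketch (the table name, scan and mapper are placeholders), not a reconstruction of the poster's code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ExampleScanJob {
  static class ExampleMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context ctx) {
      // inspect each row here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();  // preferred over the deprecated new HBaseConfiguration()
    Job job = new Job(conf, "example-scan-job");
    job.setJarByClass(ExampleScanJob.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // fewer next() round trips per map task
    scan.setCacheBlocks(false);  // don't churn the region server block cache from MR
    TableMapReduceUtil.initTableMapperJob("mytable", scan, ExampleMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}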
The datanodes don't consume much memory, we run ours with 1GB and give
the rest to the region servers.
BTW if you want to serve the whole dataset, depending on your SLA, you
might want to try HDFS-347 since concurrent HDFS access is rather
slow. The other choice would be to make sure you can hold
I doubt that TCP doesn't perform well. If you really believe so, can
you provide a packet capture collected with:
sudo tcpdump -nvi eth0 -s0 -w /tmp/pcap port 60020
Hm. What i discovered there is that I assumed my hack at RS connection
pooling was working but it doesn't seem to be.
Even
The splitting is based on when a region reaches a configured size
(default is 256MB). A table starts with 1 region, and splits as needed
when you insert. For a bit more info see:
http://hbase.apache.org/book.html#regions.arch
J-D
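To make the knob concrete: the threshold referred to above is hbase.hregion.max.filesize (256 MB by default in 0.90), and it can also be set per table when the table is created. A minimal sketch with placeholder table and family names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor htd = new HTableDescriptor("mytable");   // placeholder table name
    htd.addFamily(new HColumnDescriptor("f"));                // placeholder column family
    // Regions of this table split once they pass 512 MB instead of the 256 MB default.
    htd.setMaxFileSize(512L * 1024 * 1024);
    admin.createTable(htd);
  }
}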
On Fri, Apr 22, 2011 at 10:40 AM, Peter Haidinyak
Got it. So that's why:
HBaseRPC:
protected final static ClientCache CLIENTS = new ClientCache();
The client cache is static regardless of HConnection instances, and the
connection id is pretty much the server address.
So i guess no external hack is possible to overcome that, then.
On Fri, Apr 22, 2011 at
Regions split when they grow larger than the configured maximum region
size. Your data is small enough to fit in a single region.
Keys are sorted in a region. When a region splits the new regions are about
half the size of the original region, and contain half the key space each.
Dave
What exactly happened here? As much as I enjoy reading logs, I also
enjoy short descriptions of the context of what I'm looking at.
J-D
On Thu, Apr 21, 2011 at 8:36 PM, Gaojinchao gaojinc...@huawei.com wrote:
Is there any issue about this?
2011-04-21 14:48:24,676 INFO
Benoit,
Thank you.
is it possible to configure this client to open more than one socket
connection from the same client to the same region server?
In other words, is HBASE-2939 already non-issue there?
asynchbase implements the HBase RPC protocol in a different way; it's
written from scratch. It
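For readers who haven't used it, a minimal asynchbase call looks roughly like this (the ZooKeeper quorum, table and row are placeholders); every call returns a Deferred, and one client instance is meant to be shared across the whole JVM:

import java.util.ArrayList;
import org.hbase.async.GetRequest;
import org.hbase.async.HBaseClient;
import org.hbase.async.KeyValue;

public class AsyncGetExample {
  public static void main(String[] args) throws Exception {
    HBaseClient client = new HBaseClient("zkhost:2181");   // placeholder ZK quorum
    try {
      // get() is asynchronous; joinUninterruptibly() blocks only for this demo.
      ArrayList<KeyValue> row =
          client.get(new GetRequest("mytable", "row-1")).joinUninterruptibly();
      System.out.println(row);
    } finally {
      client.shutdown().joinUninterruptibly();
    }
  }
}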
One job, then a scan. Both from the same JVM. I do want to run multiple jobs
from the same client JVM and those tests are failing too.
I'm currently trying to figure out why the job is closing the connection and
how I can stop it from doing so.
From my iPhone
On Apr 22, 2011, at 12:05 PM,
I'm pretty sure, like I mentioned before, that the issue isn't that a
connection is closed but that it's in fact not closed. Threads like those
ones talk about it:
http://search-hadoop.com/m/JFj52oETZn
http://search-hadoop.com/m/Wxcn42PBN9g2
J-D
On Fri, Apr 22, 2011 at 12:16 PM, Pete Tyler
For HBASE-3777, Karthick and I finally nailed down the finalizer-related
issues that made TestTableMapReduce fail.
A final patch will be put up for review :-).
In the end, we expect users to use the (better tuned) API wisely.
We will add more javadoc for HTable and the new HConnectionKey class.
Take
On Fri, Apr 22, 2011 at 12:15 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
is it possible to configure this client to open more than one socket
connection from the same client to the same region server?
In other words, is HBASE-2939 already non-issue there?
No, asynchbase doesn't have HBASE-2939, but
Thank you, sir.
On Fri, Apr 22, 2011 at 12:31 PM, tsuna tsuna...@gmail.com wrote:
On Fri, Apr 22, 2011 at 12:15 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
is it possible to configure this client to open more than one socket
connection from the same client to the same region server?
In other words,
On Wed, Apr 20, 2011 at 7:20 PM, Robert Mahfoud
robert.mahf...@gmail.com wrote:
I think that this wasn't a wise design choice, since one wouldn't expect
that using an incidental class (TOF) would have such a pervasive side effect.
Agreed.
Better testing -- coverage and exercise of candidate release
Thanks, that's the way I visualized it happening. Then the assumption is this
process would continue until every server in the cluster has one region of data
(more or less). My underlying question is that I need to store my data with the
key starting with the date (YYYY-MM-DD). I know this means
That's almost exactly what Mozilla is doing with Socorro (google for
their presentations).
Also you seem to assume things about the region balancer that are, at
least at the moment, untrue:
Then the assumption is this process would continue until every server in the
cluster has one region of
Thanks for the link, nice doodles :-) He kind of validates my thoughts:
sequential key = BAD, but if you must do it, use a prefix. I'm hoping that over
time the keys will end up having a better distribution and I can still do a
scan using a start and end row. I'll see how it distributes on my
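A minimal sketch of the prefixing idea being discussed, assuming a fixed bucket count (the bucket count and key layout are placeholders): each row key gets a short hash-derived prefix in front of the date so inserts spread across the key space, and a date-range query becomes one short scan per bucket.

import org.apache.hadoop.hbase.util.Bytes;

public class PrefixedDateKeys {
  static final int BUCKETS = 16;   // placeholder bucket count

  // Builds a row key like "07|2011-04-22|<id>" so that sequentially dated
  // inserts land in BUCKETS distinct key ranges instead of one hot region.
  static byte[] rowKey(String date, String id) {
    int bucket = (id.hashCode() & 0x7fffffff) % BUCKETS;
    return Bytes.toBytes(String.format("%02d|%s|%s", bucket, date, id));
  }

  // A date-range query then becomes BUCKETS scans, one per prefix, e.g.:
  //   new Scan(Bytes.toBytes(String.format("%02d|%s", b, startDate)),
  //            Bytes.toBytes(String.format("%02d|%s", b, stopDate)));
}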
thanks i already did that :)
On Thu, Apr 21, 2011 at 10:50 PM, Stack st...@duboce.net wrote:
On Thu, Apr 21, 2011 at 12:52 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On a completely different issue i was reporting, I still have no idea why
remote client latencies would hover around 25ms in
Hi,
I found this problem when the namenode went into safe mode for some unclear
reason.
There's one patch about this problem:
try {
  HLogSplitter splitter = HLogSplitter.createLogSplitter(
      conf, rootdir, logDir, oldLogDir, this.fs);
  try {
    splitter.splitLog();