https://issues.apache.org/jira/browse/HBASE-3857
There is some design doc there.
Also, I know Mikhail Bautin has done some presentations on HFile v2,
as have some other folks from FB. Some digging on google might turn
up some decks/videos from HUGS, etc.
On Sat, Nov 3, 2012 at 3:38 PM, Doug
+1 for 10/29 user meetup!
On Thu, Sep 13, 2012 at 10:44 AM, Stack st...@duboce.net wrote:
The folks at wizecommerce have kindly offered to host a meetup down in
San Mateo on the evening of 10/29.
Are you all up for a user meetup at the end of October after Hadoop World?
If so, I'll stick it
Not sure what kind of integration you're talking about, but if you just want
to create a project with the HBase source, grab an SVN checkout of an HBase
repo and do:
mvn eclipse:eclipse
This creates all the necessary project files. Then just add a new project
from existing source.
Yes, you can use the Result once you give back the HTable reference. Result is
self-contained.
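Rough sketch of the pattern (hedged against the 0.90-era pool API; table and
column names are made up):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

HTablePool pool = new HTablePool(HBaseConfiguration.create(), 10);
HTableInterface table = pool.getTable("mytable");
Result result;
try {
  result = table.get(new Get(Bytes.toBytes("row1")));
} finally {
  pool.putTable(table);  // give the table back to the pool
}
// Safe: Result carries its own copy of the data, independent of the pool.
byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));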
-Original Message-
From: Joel Halbert [mailto:j...@su3analytics.com]
Sent: Wednesday, September 28, 2011 6:27 AM
To: user@hbase.apache.org
Subject: Re: Correct use of HTablePool
Sure,
The DFS errors are after the server aborts. What is in the log before the
server abort? Doesn't seem to show any reason here, which is unusual.
Anything in the master? Did it time out this RS? You're running with
replication = 1?
-Original Message-
From: Bradford Stephens
Just to chime in with my usual take on this (seems like the tall vs. wide
discussion happens every few weeks...)
For 'get all children of a parent', doing a get() on the wide table vs. doing
a scan() on the tall table (as long as you set scanner caching appropriately)
will be almost identical.
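Something like this, as a hedged sketch (the schema and key separator are
made up):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Wide table: all children live in one row, fetched with a single get().
Result childrenWide(HTable wide, String parent) throws IOException {
  return wide.get(new Get(Bytes.toBytes(parent)));
}

// Tall table: one row per child, keyed parent + "/" + child; scan the prefix
// range ("0" is the byte right after "/", so it bounds the prefix).
void childrenTall(HTable tall, String parent) throws IOException {
  Scan scan = new Scan(Bytes.toBytes(parent + "/"), Bytes.toBytes(parent + "0"));
  scan.setCaching(100);  // batch rows per RPC so the scan isn't one trip per row
  ResultScanner scanner = tall.getScanner(scan);
  try {
    for (Result child : scanner) {
      // process one child row
    }
  } finally {
    scanner.close();
  }
}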
There is not really a limit on that. Underneath, the client will deal with
following regions around if they move, a new master if there is a failover,
etc...
The only thing that cannot be left open indefinitely is a scanner (they have a
server-side lease and expire if left idle).
JG
There is only one active HBase master at any given time, but there can be any
number of backup masters. The failover is automated and coordinated via
ZooKeeper. Regionservers and clients use ZooKeeper to determine who is the
current active master. You can run with as many as you want.
On
There are others who have had far more experience than I have with HBase + EC2,
so I'll let them chime in. But I personally recommend against this direction
if you expect to have a consistent cluster size and/or a significant amount of
load.
EC2 is great at quickly scaling up/down, but is
@hbase.apache.org
Subject: RE: How long can I have a table open?
Thanks, I have been having some real stability problems with my cluster and
I'm trying to narrow down the possible problems.
-Pete
-Original Message-
From: Jonathan Gray [mailto:jg...@fb.com]
Sent: Monday, February 07
of the three in the slaves as well, I take it, then execute your start
commands from one master only, for example MasterA.
On 2/7/11 1:29 PM, Jonathan Gray jg...@fb.com wrote:
There is only one active HBase master at any given time, but there can
be any number of backup masters. The failover
Result is just the client-side class which wraps whatever the server returns.
The ability to do this query is not really about whether Result has the methods
to get at this data, but rather whether Scan supports this type of query (it
does).
Scan.addFamily(family) will make it so that every
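For reference, a minimal sketch of narrowing a Scan (family and qualifier
names are made up):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Scan byFamily = new Scan();
byFamily.addFamily(Bytes.toBytes("cf"));  // every column in 'cf' for each row

Scan byColumn = new Scan();
byColumn.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));  // just one column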
How much heap are you running on your RegionServers?
6GB of total RAM is on the low end. For high throughput applications, I would
recommend at least 6-8GB of heap (so 8+ GB of RAM).
-Original Message-
From: charan kumar [mailto:charan.ku...@gmail.com]
Sent: Thursday, February 03,
The best way to do this is as Friso describes, using the existing stopRow
parameter in Scan.
There is another way to do it with startRow + a filter. There is a
PrefixFilter which could be used here. Looking at the code, it seems as though
the PrefixFilter does an early out and stops the scan.
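Both approaches, sketched (untested; the key prefix is made up):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

byte[] prefix = Bytes.toBytes("user123|");

// Preferred: bound the scan with an explicit stop row ('}' is the byte
// right after '|', so the range covers exactly the prefix).
Scan bounded = new Scan(prefix, Bytes.toBytes("user123}"));

// Alternative: start row plus PrefixFilter, which early-outs once rows
// stop matching the prefix.
Scan filtered = new Scan(prefix);
filtered.setFilter(new PrefixFilter(prefix));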
Thanks
-Pete
-Original Message-
From: Jonathan Gray [mailto:jg...@fb.com]
Sent: Thursday, January 20, 2011 8:09 AM
To: user@hbase.apache.org
Subject: RE: Scan (Start Row, End Row) vs Scan (Row)
The best way to do this is as Friso describes, using the existing stopRow
parameter
It's strongly recommended that you upgrade to HBase 0.20.6 (at least) if not
HBase 0.90.0. There are several critical bug fixes in the releases between
0.20.2 and 0.20.6 besides stargate.
-Original Message-
From: mike anderson [mailto:saidthero...@gmail.com]
Sent: Thursday, January
the bullet and just update to 0.90.0.
Cheers,
Mike
On Thu, Jan 20, 2011 at 1:07 PM, Jonathan Gray jg...@fb.com wrote:
It's strongly recommended that you upgrade to HBase 0.20.6 (at least)
if not HBase 0.90.0. There are several critical bug fixes in the
releases between 0.20.2
suite?
I really hope not.
Thanks,
Mark
-Original Message-
From: Jonathan Gray [mailto:jg...@fb.com]
Sent: Tuesday, January 18, 2011 7:56 PM
To: user@hbase.apache.org
Subject: RE: hbase 0.20.6 - HBaseClusterTestCase, DU, cygwin, IntelliJ - arg!
Hey Mark. Sorry to hear about
In HBase 0.90.0 there is a new retain assignment configuration parameter that
makes it so your cluster keeps the same region assignment between full cluster
restarts. It is ON by default.
JG
-Original Message-
From: Tao Xie [mailto:xietao.mail...@gmail.com]
Sent: Thursday, January
The API shows one row per next() call but the number of rows fetched per RPC
can be configured much higher with Scan.setCaching().
Filters are basically just server-side predicates that will dictate which
rows/columns/values will be returned to the client. This does not relate to
the number
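For example (assuming an open HTable named table; the caching value is
arbitrary):

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
scan.setCaching(500);  // each RPC prefetches up to 500 rows into the client
ResultScanner scanner = table.getScanner(scan);
try {
  Result row;
  while ((row = scanner.next()) != null) {  // still one row per next() call
    // process row
  }
} finally {
  scanner.close();
}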
Hey Mark. Sorry to hear about your troubles.
There is a new testing facility that has replaced HBaseClusterTestCase. Check
out HBaseTestingUtility. It's JUnit4 based. One sample usage of it is the
test TestFromClientSide.
This new one includes support for multiple DataNodes, RegionServers,
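A minimal JUnit4 sketch in the style of TestFromClientSide (untested; names
are made up):

import static org.junit.Assert.assertNotNull;

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestExample {
  private static final HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();

  @BeforeClass
  public static void setUpCluster() throws Exception {
    TEST_UTIL.startMiniCluster();  // in-process ZK, HDFS, Master, RegionServer
  }

  @AfterClass
  public static void tearDownCluster() throws Exception {
    TEST_UTIL.shutdownMiniCluster();
  }

  @Test
  public void putThenGet() throws Exception {
    HTable table = TEST_UTIL.createTable(Bytes.toBytes("t"), Bytes.toBytes("cf"));
    Put put = new Put(Bytes.toBytes("row"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(put);
    Get get = new Get(Bytes.toBytes("row"));
    assertNotNull(table.get(get).getValue(Bytes.toBytes("cf"), Bytes.toBytes("q")));
  }
}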
These are a different kind of pause (those caused by blockingStoreFiles).
This is HBase stepping in and actually blocking updates to a region because
compactions have not been able to keep up with the write load. It could
manifest itself in the same way but this is different than shorter
One of the most important factors to look at is how the number of regions
relates to how much heap is available for your RegionServers, and then how that
will impact your expected MemStore flush sizes. More than the total number of
regions, this is about the number of actively written-to regions.
BatchUpdate is the old, deprecated version of Put. You are using the best APIs.
-Original Message-
From: Weishung Chung [mailto:weish...@gmail.com]
Sent: Monday, January 10, 2011 10:10 AM
To: user@hbase.apache.org
Subject: Re: HTable.put(List<Put> puts) perform batch insert?
Thank
It's not really a bug.
I think the assumption is that if you are at the level of doing your own bulk
loads, you should also manage when you want to compact and split. I know in
cases where I've done this, I would usually know at certain points I would want
to trigger major compactions.
At
The first step to debugging HBase is usually going through the Master and
RegionServer logs. Sometimes it can be more art than science but a majority of
our debugging is done with log analysis.
If you can find specific offending regions, you can parse through the logs
looking for mentions of
So there was an existing hbase directory, right? I thought you had said that
was not the case.
If you were attempting to upgrade with existing data, it could be an
incompatibility between 0.20.6 and 0.89. Might be fixed already, not sure.
-Original Message-
From: Pete Haidinyak
Seems like hooking into replication would be a good approach.
There's also a JIRA open about a changes API.
https://issues.apache.org/jira/browse/HBASE-3247
Or you could use Coprocessors which are committed in 0.92 / trunk. The
pre/post hooks can be used as a per-operation trigger mechanism.
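A skeletal trigger, hedged against the 0.92-era RegionObserver hooks as I
recall them (class name and body are made up):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

// Runs server-side after every successful Put on tables it's attached to.
public class PutTrigger extends BaseRegionObserver {
  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> e,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // e.g. hand put.getRow() to whatever downstream notification you need
  }
}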
You have existing data? Try clearing out your hbase directory in hdfs. Looks
like some weird problem reading the hbase.version file out of HDFS.
-Original Message-
From: Pete Haidinyak [mailto:javam...@cox.net]
Sent: Tuesday, December 21, 2010 3:32 PM
To: HBase Group
Subject: I
1. It's a column-based sparse table so nulls take up no space (i.e.
more room when we need to duplicate)
Correct. Nulls take up no space.
2. Indexes take up space in an RDBMS already and are essentially
duplication in your old RDBMS anyway
Secondary indexes in an RDBMS use
?
Thanks
-Pete
On Tue, 21 Dec 2010 17:24:36 -0800, Jonathan Gray jg...@fb.com wrote:
You have existing data? Try clearing out your hbase directory in hdfs.
Looks like some weird problem reading the hbase.version file out of HDFS.
-Original Message-
From: Pete Haidinyak
HBase doesn't hashcode anything. It does strict lexicographical ordering of
the row keys themselves. So yes, keys with similar prefixes may be in the same
partition / next to each other.
Rather than using a hashcode modulo some number, we use the META table to
determine which partition
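A quick illustration of the raw-byte ordering (keys made up):

import org.apache.hadoop.hbase.util.Bytes;

// HBase compares row keys byte-by-byte, so shared prefixes cluster together.
byte[] a = Bytes.toBytes("user1000");
byte[] b = Bytes.toBytes("user1001");
byte[] c = Bytes.toBytes("video999");
System.out.println(Bytes.compareTo(a, b));  // negative: user1000 sorts just before user1001
System.out.println(Bytes.compareTo(b, c));  // negative: all 'user...' rows precede 'video...'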
Hey Peter,
That System.exit line is nothing important, just the main thread waiting for
the tasks to finish before closing.
You're interested in having the MR job return a single result? To do that, you
would need to roll-up the processing done in each of your Map tasks into a
single Reduce
All of my experience doing something like this was with straight Java.
There are MultiGet and MultiPut capabilities in the Java client that will help
you out significantly.
I played with Jython and HBase a couple years ago and back then the performance
was horrible. I never looked back but I
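Sketch of the batched calls (assuming an open HTable named table and the 0.90
client's Result[] get(List<Get>); row keys and columns are made up):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Batch many gets into far fewer RPCs than issuing them one at a time.
List<Get> gets = new ArrayList<Get>();
for (int i = 0; i < 1000; i++) {
  gets.add(new Get(Bytes.toBytes("row" + i)));
}
Result[] results = table.get(gets);

// Puts batch the same way; the client groups them by RegionServer.
List<Put> puts = new ArrayList<Put>();
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
puts.add(put);
table.put(puts);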
it mean though we would incur Java startup cost? Or do you
propose we write some sort of java server that has the JVM running and is
able to get multi-get queries?
Thanks.
-Jack
On Fri, Dec 17, 2010 at 11:15 AM, Jonathan Gray jg...@fb.com wrote:
All of my experience doing something
much
on coprocessors, can you point me to some examples of their use?
Thanks
-Pete
-Original Message-
From: Jonathan Gray [mailto:jg...@fb.com]
Sent: Friday, December 17, 2010 11:13 AM
To: user@hbase.apache.org
Subject: RE: Results from a Map/Reduce
Hey Peter
available to
send back as a web page.
This seems like such a basic operation that I am hoping there are 'Best
Practices' or examples on how to accomplish this. I would like a pony, too.
:-)
Thanks
-Pete
-Original Message-
From: Jonathan Gray [mailto:jg...@fb.com]
Sent
You absolutely need to do some testing and benchmarking.
This sounds like the kind of application that will require lots of tuning to
get right. It also sounds like the kind of thing HDFS is typically not very
good at.
There is an increasing amount of activity in this area (optimizing HDFS
it takes a
couple of minutes to do a scan that brings back several million rows. My boss
wants the query to be in the 'less than five second' range.
Thanks
-Pete
-Original Message-
From: Jonathan Gray [mailto:jg...@fb.com]
Sent: Friday, December 17, 2010 1:19 PM
To: user
possible
-Jack
On Dec 17, 2010, at 1:32 PM, Jonathan Gray jg...@fb.com wrote:
I'm not sure I understand.
Are you trying to build a client? Or you want something that behaves like
the mysql client?
-Original Message-
From: Jack Levin [mailto:magn...@gmail.com]
Sent
Hey Imran,
This looks reasonable but it's hard to say without knowing what the read/write
workload is like. You say all searches are done using Solr... will that also
be hosted on these servers?
One thing. It looks like you have two servers for ZK? ZK should always be run
in odd numbers
Hey Adam,
Do you need to scan all of the entries in order to know which ones you need to
change the expiration of? Or do you have that information as an input?
As for why you can't insert an older version, it is because HBase sorts all
columns in descending version order regardless of
entries
On 12/14/10 12:57 AM, Jonathan Gray wrote:
Hey Adam,
Do you need to scan all of the entries in order to know which ones you
need to change the expiration of? Or do you have that information as an
input?
I don't have to scan everything, but I also can't pinpoint all the entries
There might be a little confusion.
Specifying start/stop rows vs. scanning all rows with a filter... yes, clearly
the start/stop is far more efficient.
What Ryan is talking about is specifying the start row and then using a filter
to determine when you're done with the rows you want. In this
The 5 RS will be connecting to all 10 DNs. However, when writing to HDFS the
first replica always goes to the local node. Because of this, the 5 DNs that
are hosting the 5 RS could potentially have more data than the other 5 DNs.
In almost all installations I've been a part of the #RS == #DN
That sounds right. One node would have NN, HMaster, and ZK. Others would have
DN and RS. You could put the SNN on any of the slave nodes I suppose.
-Original Message-
From: Nanheng Wu [mailto:nanhen...@gmail.com]
Sent: Tuesday, December 14, 2010 10:12 PM
To: user@hbase.apache.org
HBase is not designed or well tested for production or stability on 2 nodes.
It will work on 2 nodes, but do not expect good performance or stability.
What is the hardware configuration and daemon setup on this cluster of 2 nodes?
How many cores, spindles, RAM, heap sizes etc... And you have
Hey,
Need some more info.
Can you paste logs from the MR tasks that fail? What's going on in the cluster
while the MR job is running (cpu, io-wait, memory, etc)?
And what is the setup of your cluster... how many nodes, specs of nodes (cores,
memory, RS heap), and then how many concurrent map
Jiajun,
Hard to say whether you've lost data or not. Something looks wrong with HDFS.
What versions of HBase and HDFS are you running?
What's going on in the logs of the DataNodes and the NameNode when this is
happening? What about the dfs web ui?
Try running Hadoop fsck to see what's up
What? HBase Hackathon: Coprocessor Edition
When? December 13, 2010 @ 11AM
Where? Facebook, Palo Alto
Sign up here: http://www.meetup.com/hackathon/calendar/15597555/
Lunch, dinner, and beers will be provided.
From meetup announcement...
With HBase 0.90 near release, it's time to shift
The recommended setup if you want to put RS and ZK on the same node, is to
ensure ZK has its own dedicated disk.
-Original Message-
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Tuesday, November 30, 2010 5:35 AM
To: user@hbase.apache.org
Subject: RE: Scalability on
Hey Bryan,
All of these approaches could work and seem sane.
My preference these days would be the wide-table approach (#2, 3, 4) rather
than the tall table. Previously #1 was more efficient but in 0.90 and beyond
the same optimizations exist for both tall and wide tables.
For #2, I would
are storing 1 Petabyte of data of images into hbase).
-Jack
On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray jg...@fb.com wrote:
It is possible that it could be a bottleneck but usually is not.
Generally production HBase installations have long-lived clients, so
the client-side caching
long tail hits that will be uncached, which may stress out the META region.
That being said, is it possible to create affinity and nail the META region
onto a beefy server or set of beefy servers?
-Jack
On Tue, Nov 23, 2010 at 10:58 AM, Jonathan Gray jg...@fb.com wrote:
Are you going to have long
http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
hari
On Mon, Nov 15, 2010 at 4:20 AM, Jonathan Gray jg...@facebook.com
wrote:
HBase is well-suited for a high-write workload.
Hari, I'm not sure what would be different in a database like
Cassandra
with respect to updates
Hari,
When you issue a shutdown to the master process, it performs a full cluster
shutdown. You don't have to issue regionserver stops from the shell, the
Master takes care of it over RPC.
You can stop an individual regionserver (bin/hbase-daemon.sh stop regionserver)
but if you're doing a
NSRE is normal, this happens when regions move around and your client needs to
update the location.
That seems like an awful lot of mappers/reducers on a 5 server / dual core
setup... You have only 2 cores per server but you have a DataNode,
RegionServer, and 4 map tasks and 3 reduce tasks?
I'd recommend HBase over HDFS with file sizes in that range.
It will be faster and far more scalable while inheriting the same durability
guarantees you get from HDFS.
-Original Message-
From: Barney Frank [mailto:barneyfran...@gmail.com]
Sent: Monday, November 08, 2010 11:04 AM
To:
Just avoid the Dell hard drives, they are a super rip-off. Which btw
means you'll have to avoid Dells, because the _only_ way to get the
Dell disk trays which are required is to buy Dell hard drives (3-4x
markup btw).
+1 on crazy markup. But there actually are some online retailers out
This is fixed in trunk. There was a bug that was resetting other options to
defaults.
-Original Message-
From: Buttler, David [mailto:buttl...@llnl.gov]
Sent: Friday, November 05, 2010 3:57 PM
To: user@hbase.apache.org
Subject: Unexpected shell behavior -- changing one column
Hi Wojciech,
HBase can easily be used as a versioned key/value store. I'd say that's one of
the easiest ways to use it.
To help you get more throughput, you'll have to provide more details.
What version are you running, what kind of hardware / configuration, and what
does your client look
[mailto:wlangiew...@gmail.com]
Sent: Wednesday, November 03, 2010 7:15 AM
To: user@hbase.apache.org
Subject: Re: HBase as a versioned key/value store
Hello,
2010/11/3 Jonathan Gray jg...@facebook.com
Hi Wojciech,
HBase can easily be used as a versioned key/value store. I'd say that's one
wouldn't mind tackling this problem. How much of a skew do we want to allow
between the RS and the rest of the cluster?
~Jeff
On 10/28/2010 12:08 PM, Jonathan Gray wrote:
I was discussing this exact issue this morning. Ran into a
problem
where
master was timing out
There is no such atomicity provided by HBase. Recent TableIndexed may help,
but I have not personally tried it.
Uhm actually there is. :-)
Like I said in the other post, when you insert the rows, you can fetch
the local time on the node and use it when you insert the row as the
a TableIndexed fits, would not an RDBMS be a better
choice?
Sean
On Fri, Oct 29, 2010 at 7:01 PM, Jonathan Gray jg...@facebook.com
wrote:
There is no such atomicity provided by HBase. Recent TableIndexed may help,
but I have not personally tried it.
Uhm actually
I was discussing this exact issue this morning. Ran into a problem where
master was timing out a region in transition because the RS was 5 minutes
behind the master.
I like the idea of the RS sending its timestamp on startup and if it is
outside a certain threshold, the master throws it a
One option is to use EC2 to spin up a cluster for a short period of time and
test on it, but that brings along its own set of complications.
What kind of things are you hoping to contribute? I would say the best way to
do things if you don't have large clusters to test on is write lots of good
You may have had some duplicate assignment issues, so there were some regions
being double counted.
The latest version of HBCK has some fixup stuff and I'm working on adding more
repair functionality to it. Should get into 0.90/trunk this week.
If you're on an 0.89 release, you might be able
Hey Jack,
Seems like you're getting a lot of strange ZooKeeper behavior.
How many nodes are you running with in your quorum? Do you have any weird
networking issues?
Check out the ZK server logs as well and see if there's anything suspicious
going on in there.
Also, if you enable ZK debug
By using the block cache, read blocks are referenced within the block cache
data structures and held for a longer amount of time than if they were not put
into the block cache.
This will definitely add additional stress to the GC.
If you expect a very low hit ratio, it can be advantageous to not
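For example, a one-off bulk read that keeps its blocks out of the cache:

import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
// Low expected hit ratio (e.g. a full-table MR job): don't let this scan
// churn the block cache and add GC pressure.
scan.setCacheBlocks(false);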
Hi William. Answers inline.
-Original Message-
From: William Kang [mailto:weliam.cl...@gmail.com]
Sent: Monday, October 18, 2010 7:48 PM
To: hbase-user
Subject: HBase random access in HDFS and block indices
Hi,
Recently I have spent some efforts to try to understand the
HFiles are generally 256MB and the default block size is 64K, so that's ~4000
blocks (1/16th what you said). That would make for a more reasonable block
index of 200K.
But the block index is kept in-memory so you only read it once when the file is
first opened. So even if you do lower the block size
Definitely file a new JIRA and put the test case up on it. This is probably an
independent issue from most of the other TS/delete issues.
You guys are good at finding these ;) Keep it up!
JG
From: Evert Arckens [mailto:ev...@outerthought.org]
Sent: Thursday, October 07, 2010 2:13 AM
To:
Currently HBase cannot ride over an HDFS restart.
Might be feasible in the future but not currently planned. Some of the
NameNode HA solutions might indirectly address this.
Why is it that you need to restart your namenode?
-Original Message-
From: Jack Levin
of namenode for any reason.
Generally, I guess I should be stopping regionservers before namenode
restart, at least I won't generate unflushed data.
-Jack
On Tue, Oct 5, 2010 at 11:25 PM, Jonathan Gray jg...@facebook.com
wrote:
Currently HBase cannot ride over an HDFS restart.
Might
HBASE-917 looks relevant too.
-Original Message-
From: Andrew Purtell [mailto:apurt...@apache.org]
Sent: Tuesday, October 05, 2010 11:50 AM
To: user@hbase.apache.org
Subject: Re: Paid OSS task for performing manual major compactions
From: Daniel Einspanjer
Mozilla recently
The layer hiding your cluster is a firewall. You would grant only an explicit
set of IP addresses permission to access HBase.
Your client(s) would be coming from a given set of servers and you would know
the IPs of those servers. Exceptions would be added to your firewall to allow
those IPs
Sorry, forgot to respond to this earlier.
The configuration parameter you need to change is
'hbase.client.keyvalue.maxsize'
Setting it to 0 will remove the limit. Let me know if this does not help.
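You can also set it in client code instead of hbase-site.xml, e.g.:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// 0 removes the client-side cap on KeyValue size.
conf.setInt("hbase.client.keyvalue.maxsize", 0);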
-Original Message-
From: Taylor, Ronald C [mailto:ronald.tay...@pnl.gov]
Sent:
to be the only one mentioned. Am I missing something?
Thanks,
Naresh.
On 10/01/2010 09:40 PM, Jonathan Gray wrote:
Yes. RegionServers will throw a NotServingRegionException. This, in
turn, will cause the client to grab the location from META again.
-Original Message-
From
Yes. RegionServers will throw a NotServingRegionException. This, in turn,
will cause the client to grab the location from META again.
-Original Message-
From: Naresh Rapolu [mailto:nrap...@purdue.edu]
Sent: Friday, October 01, 2010 5:35 PM
To: user@hbase.apache.org
Subject:
Hello HBasers,
A bit of a late announcement to the list, but there is a HUG meetup in NYC on
October 11, the night before Hadoop World.
More information here:
http://www.meetup.com/hbaseusergroup/calendar/14606174/
The meetup is being hosted by StumbleUpon at their NYC offices. Snacks and
This sounds reasonable.
We are tracking min/max timestamps in storefiles too, so it's possible that we
could expire some files of a region as well, even if the region was not
completely expired.
Jinsong, mind filing a jira?
JG
-Original Message-
From: Jinsong Hu
there is some work going on to do concurrent priority compaction
(Jonathan Gray has been working on it) but I haven't seen anything yet in
hbase and don't know the time line. My personal opinion is that we should
integrate the patch into trunk and use it until the more advanced compactions
Lots of reasons. Given the way deletes currently work, it would be extremely
expensive to process multi-row deletes.
At this point there are already people questioning if we should have row/family
deletes because they are expensive to process.
If we move towards a new delete mechanism or
I ran into something like this as well but we were in a rush to get the import
done so didn't look into it. I forgot about it, so didn't follow up.
We ended up ensuring regions would not be split during the job (configuring the
split size way up) and reran the MR job.
JG
-Original
. And in this case, the system goes into a mode of
repeatedly splitting this HFile...
Shall I report a bug and follow up on it?
Vidhya
On 9/10/10 1:42 PM, Jonathan Gray jg...@facebook.com wrote:
I ran into something like this as well but we were in a rush to get the
import done so didn't
Hi Doug,
Out of order insertion of timestamps is supported in 0.89/0.90/trunk but not
fully supported in the 0.20.x series. Primarily, you can see some weird stuff
using Gets in 0.20 if you do out of order timestamp insertion. Scans are
mostly okay.
JG
-Original Message-
From:
no matter the size of the heap.
When/if HBase RPC can send large objects in smaller chunks, this will be less
of an issue.
Best regards,
- Andy
Why is this email five sentences or less?
http://five.sentenc.es/
--- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote
I'm not sure what you mean by optimized cell size or whether you're just
asking about practical limits?
HBase is generally used with cells in the range of tens of bytes to hundreds of
kilobytes. However, I have used it with cells that are several megabytes, up
to about 50MB. Up at that
) {
break;
}
}
return nRegions;
}
2010/9/7 Jonathan Gray jg...@facebook.com
That code does actually exist in the latest 0.89 release.
It was a protection put in place to guard against a weird behavior
that we
had seen during load balancing.
As Ryan suggests
But your boss seems rather to be criticizing the fact that our system
is made of components. In software engineering, this is usually
considered a strength. As to 'roles', one of the Bigtable authors
argues that a cluster of master and slaves makes for simpler systems
[1].
I
of
the total number of regions that can be supported and I don't run into
this
IO issue.
Can anybody show us an actual example of the hbase data size and cluster size?
Jimmy.
--
From: Jonathan Gray jg...@facebook.com
Sent: Friday, August 27
Been doing lots of importing recently. There are two easy ways to get big
performance boosts.
The first is HFileOutputFormat. It now works for loading into existing tables.
Consistently see 10X+ performance this way versus the API.
If you must use the API, pre-create a bunch of regions for your table. You
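A hedged sketch of pre-creating regions with the 0.90-era HBaseAdmin API
(table name, key range, and region count are all made up):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
HTableDescriptor desc = new HTableDescriptor("import_target");
desc.addFamily(new HColumnDescriptor("cf"));
// 32 regions spread across the key range, so the import parallelizes
// immediately instead of hammering one region and waiting for splits.
admin.createTable(desc, Bytes.toBytes("00000000"), Bytes.toBytes("ffffffff"), 32);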
There is no fixed limit, it has much more to do with the read/write load than
the actual dataset size.
HBase is usually fine having very densely packed RegionServers, if much of the
data is rarely accessed. If you have extremely high numbers of regions per
server and you are writing to all of
Vidhya,
Could you post a snippet of an RS log during this time? You should be able to
see what's happening between when the OPEN message gets there and the OPEN
completes.
Like Stack said, it's probably that it's single-threaded in the version you're
using and with all the file opening, your
Yes, something like:
List<Result> multiGet(List<Get> gets, int maxThreads)
In general, you should assume that HTable instances are not thread-safe.
Behind the scenes, HTables are sharing TCP connections to RS, but from client
POV you should have one HTable per thread per table.
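A rough sketch of such a helper (untested; a real version would also close the
per-thread HTables on shutdown):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class MultiGetter {
  public static List<Result> multiGet(final Configuration conf,
      final String tableName, List<Get> gets, int maxThreads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
    // One HTable per worker thread; HTable instances share connections
    // under the hood but must not be shared across threads.
    final ThreadLocal<HTable> tables = new ThreadLocal<HTable>();
    try {
      List<Future<Result>> futures = new ArrayList<Future<Result>>();
      for (final Get get : gets) {
        futures.add(pool.submit(new Callable<Result>() {
          public Result call() throws Exception {
            HTable table = tables.get();
            if (table == null) {
              table = new HTable(conf, tableName);
              tables.set(table);
            }
            return table.get(get);
          }
        }));
      }
      List<Result> results = new ArrayList<Result>(futures.size());
      for (Future<Result> f : futures) {
        results.add(f.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}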
-Original
Himanshu,
Seems like you might have an interest in using Coprocessors to do stuff like
low-latency aggregates. This is a big area of interest for some of us but not
a lot of concerted effort in this direction yet. There is plenty to do here
for a research project.
Check out:
Also, seek/reseek hooks in the filters will allow skipping of blocks; for some
queries (returning a high % of the total data) it won't matter, but for sparser
filters that want to jump ahead this can be significant.
These are being worked on by an intern here and should have some patches up in
a
Can you provide more links to comments in jira mentioning loss of zero copy
reads?
Basically what this is referring to are changes made in the 0.20 release of
HBase related to the block-based HFile format, the KeyValue data pointer, and
other stuff like the Result client return type and the
scanning, it would not necessarily change
the implementation of these client (to RS) calls.
Hard to say what would make this flaky besides some of the older bugs around
lots of META StoreFiles.
What version of HBase are you running?
JG
-Original Message-
From: Jonathan Gray [mailto:jg