My setup seems to have a lot of regions with no data that just keep
accumulating over time. Here are some details:
I have time-series data (created by OpenTSDB) being inserted into HBase
every minute. Since the data has little value after, say, 15 days, I go
ahead and delete all the old data.
When I
It occurred on a RegionServer for an unknown reason. I have checked this
RegionServer's logs; there is no prior abort, and no other info shows that the
RegionServer aborted. Then, all of a sudden, I saw the following message.
[logs]
2011-05-25 09:15:44,232 INFO
You can use the merge tool to combine adjacent regions. It requires a
bit of manual work because you need to specify the regions by hand. The
cluster also needs to be offline (I recommend keeping ZooKeeper running,
though). Check whether merging succeeded with the hbck tool.
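For reference, an offline merge invocation looks roughly like this (0.90-era
syntax from memory; the table and region names below are placeholders, so copy
the real encoded region names out of .META. or the web UI):

$ ./bin/hbase org.apache.hadoop.hbase.util.Merge mytable \
    'mytable,keyA,1306800000000.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.' \
    'mytable,keyB,1306800000001.bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb.'
$ ./bin/hbase hbck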
There are some jira
MultipleInputs would be ideal, but that seems pretty complicated.
MultiTableInputFormat seems like a simple change in the getSplits() method
of TableInputFormat, plus support for a collection of tables and their
matching scanners instead of a single table and scanner; it doesn't sound
too complicated.
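A hypothetical sketch of that shape (no such class ships with HBase; a real
version would also need to route each split back to the delegate that
produced it in createRecordReader()):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical MultiTableInputFormat: one TableInputFormat per table+scan,
// with getSplits() simply concatenating the delegates' splits.
public class MultiTableInputFormat
    extends InputFormat<ImmutableBytesWritable, Result> {

  private final List<TableInputFormat> delegates =
      new ArrayList<TableInputFormat>();

  public void add(TableInputFormat perTable) {
    delegates.add(perTable);
  }

  @Override
  public List<InputSplit> getSplits(JobContext ctx)
      throws IOException, InterruptedException {
    List<InputSplit> all = new ArrayList<InputSplit>();
    for (TableInputFormat tif : delegates) {
      all.addAll(tif.getSplits(ctx)); // each delegate knows its own table + scan
    }
    return all;
  }

  @Override
  public RecordReader<ImmutableBytesWritable, Result> createRecordReader(
      InputSplit split, TaskAttemptContext ctx)
      throws IOException, InterruptedException {
    // A TableSplit carries its table name, so a real implementation would
    // pick the delegate whose table matches; delegating to the first one
    // here just keeps the sketch short.
    return delegates.get(0).createRecordReader(split, ctx);
  }
}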
Any
Eran,
You want to join two tables? The short answer is to use a relational database
to solve that problem.
Longer answer:
You're using HBase so you don't need to think in terms of a reducer.
You can create a temp table for your query.
You can then run one map job to scan and filter table A,
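That first scan-and-filter step might look roughly like this (a sketch under
assumptions: table names "A" and "tmp_join" are placeholders, and the actual
filtering logic is elided):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

// Map-only job: scan table A, keep the rows the join needs, and write them
// into a temp table. No reducer involved.
public class FilterIntoTempTable {

  static class FilterMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // ... apply whatever filtering the query needs here ...
      Put put = new Put(row.get());
      for (KeyValue kv : value.raw()) {
        put.add(kv); // copy the surviving cells as-is
      }
      ctx.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "filter A into tmp_join");
    job.setJarByClass(FilterIntoTempTable.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPC round trips while scanning
    scan.setCacheBlocks(false);  // don't pollute the block cache from MR
    TableMapReduceUtil.initTableMapperJob("A", scan, FilterMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob("tmp_join", null, job);
    job.setNumReduceTasks(0);    // map-only; mappers write Puts directly
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}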
Re: The problem is that the few references to that question I found recommend
pulling one table to the mapper and then do a lookup for the referred row in
the second table.
With multi-get in 0.90.x you could perform some reasonably clever processing and
not do the lookups one-by-one but in
Eran's observation was that a join is solvable in a Mapper via lookups on a 2nd
HBase table, but it might not be that efficient if the lookups are 1 by 1. I
agree with that.
My suggestion was to use multi-Get for the lookups instead. So you'd hold onto
a batch of records in the Mapper and
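A rough sketch of that batching idea (a sketch only: the table name "table_b",
the batch size, and the key extraction are assumptions; HTable.get(List<Get>)
is the 0.90 multi-get entry point):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

// Scan table A with a TableMapper; buffer rows and resolve the matching
// rows of table B with one multi-get per batch instead of one RPC each.
public class BatchedJoinMapper
    extends TableMapper<ImmutableBytesWritable, Result> {

  private static final int BATCH = 100;   // tuning knob, not a magic value
  private HTable tableB;
  private final List<Get> gets = new ArrayList<Get>(BATCH);
  private final List<Result> buffered = new ArrayList<Result>(BATCH);

  @Override
  protected void setup(Context ctx) throws IOException {
    tableB = new HTable(ctx.getConfiguration(), "table_b"); // placeholder name
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context ctx)
      throws IOException, InterruptedException {
    // In real code the Get key would be the foreign key pulled out of 'value'.
    gets.add(new Get(row.get()));
    buffered.add(value);
    if (gets.size() >= BATCH) {
      flush(ctx);
    }
  }

  private void flush(Context ctx) throws IOException, InterruptedException {
    Result[] lookups = tableB.get(gets); // one batched RPC round (0.90+)
    for (int i = 0; i < lookups.length; i++) {
      // join buffered.get(i) with lookups[i] and emit the combined record
    }
    gets.clear();
    buffered.clear();
  }

  @Override
  protected void cleanup(Context ctx) throws IOException, InterruptedException {
    if (!gets.isEmpty()) {
      flush(ctx);
    }
    tableB.close();
  }
}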
Hello,
Are there scripts available to create an HBase cluster on Rackspace - like
there are for Amazon EC2? A quick Google search didn't turn up
anything useful.
Any help in this regard would be greatly appreciated. Thanks.
- Ajay
On May 31, Ferdy Galema wrote:
You can use the merge tool to combine adjacent regions. It requires a
bit of manual work because you need to specify the regions by hand. The
cluster also needs to be offline (I recommend keeping ZooKeeper running,
though). Check whether merging succeeded with the hbck
HBase noob question: do compactions (major/minor) always work within the
scope of a single region, i.e., they don't merge regions?
That's what HBASE-1621 is about: merges can't be done while the
cluster is running, and compactions only happen while HBase is running.
J-D
Rackspace doesn't have an API, so no. This is one of the primary
disadvantages of Rackspace; it's all hands-on/manual.
Just boot up your instances and use the standard management tools.
On Tue, May 31, 2011 at 10:23 AM, Something Something
mailinglist...@gmail.com wrote:
Hello,
Are there
Hi all,
I'm doing some work to read records directly from the HFiles of a damaged
table. When I scan through the records in the HFile using
org.apache.hadoop.hbase.io.hfile.HFileScanner, will I get only the latest
version of the record as with a default HBase Scan? Or do I need to do some
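For what it's worth, an HFileScanner iterates the raw KeyValues in the file,
so you get every stored version (and delete markers), not the collapsed view a
default Scan gives you; version and tombstone handling happen a layer up, at
scan time. A rough read-everything sketch (0.90-era API from memory, so the
HFile.Reader constructor arguments may differ in your exact version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

public class DumpHFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    // Path to one store file, e.g. /hbase/<table>/<region>/<family>/<file>
    Path path = new Path(args[0]);
    // 0.90-era constructor; argument list may differ in your point release.
    HFile.Reader reader = new HFile.Reader(fs, path, null, false);
    try {
      reader.loadFileInfo();
      HFileScanner scanner = reader.getScanner(false, false); // no cache, no pread
      if (scanner.seekTo()) {
        do {
          KeyValue kv = scanner.getKeyValue();
          // Every KeyValue appears here: all versions plus delete markers.
          System.out.println(kv);
        } while (scanner.next());
      }
    } finally {
      reader.close();
    }
  }
}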
Now I'm getting the wrong region exception on the new table that I'm copying
the old table to. Running hbck reveals an inconsistency in the new table. The
frustration is unbelievable. Like I said before, it doesn't appear that HBase
is ready for prime time. I don't know how companies are
Can you post the full log somewhere? You talk about several Exceptions
but we can't see them.
J-D
On Tue, May 31, 2011 at 4:41 AM, bijieshan bijies...@huawei.com wrote:
It occurred on a RegionServer for an unknown reason. I have checked this
RegionServer's logs; there is no prior abort, and
Thanks everyone for the great feedback. I'll try to address all the
suggestions.
My data sets range between large and very large. One is on the order of many
billions of rows (although the input for a typical MR job will be in the
hundreds of millions); the second table is in the tens of millions. I
Doesn't Hive for HBase enable joins?
On Tue, May 31, 2011 at 5:06 AM, Eran Kutner e...@gigya.com wrote:
Hi,
I need to join two HBase tables. The obvious way is to use a M/R job for
that. The problem is that the few references to that question I found
recommend pulling one table to the mapper
Doug,
I read the OP's post as the following:
Hi,
I need to join two HBase tables.
The obvious way is to use a M/R job for that.
The problem is that the few references to that question I found recommend
pulling one table to the mapper
and then do a lookup for the referred row in the second
For my need I don't really need the general case, but even if I did I think
it can probably be done simpler.
The main problem is getting the data from both tables into the same MR job,
without resorting to lookups. So without the theoretical
MultiTableInputFormat, I could just copy all the data
Try adding this change:
Index: bin/check_meta.rb
===================================================================
--- bin/check_meta.rb (revision 1129468)
+++ bin/check_meta.rb (working copy)
@@ -127,11 +127,13 @@
scan = Scan.new()
scanner = metatable.getScanner(scan)
oldHRI = nil
-bad
On Tue, May 31, 2011 at 10:42 AM, Robert Gonzalez
robert.gonza...@maxpointinteractive.com wrote:
Now I'm getting the wrong region exception on the new table that I'm copying
the old table to. Running hbck reveals an inconsistency in the new table.
The frustration is unbelievable. Like I
On Tue, May 31, 2011 at 11:05 AM, Sandy Pratt prat...@adobe.com wrote:
Hi all,
I'm doing some work to read records directly from the HFiles of a damaged
table. When I scan through the records in the HFile using
org.apache.hadoop.hbase.io.hfile.HFileScanner, will I get only the latest
Your mapper can tell which file is being read and add source tags to the
data records.
The reducer can do the cartesian product (if you really need that).
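To make the tagging concrete, here is a minimal reducer-side sketch; the
"A"/"B" tag prefixes and tab-delimited record strings are assumptions for
illustration:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Each mapper emits (joinKey, "A\t<record>") or (joinKey, "B\t<record>")
// depending on which source it read; the reducer splits the values by tag
// and emits the cartesian product of the two sides.
public class TaggedJoinReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context ctx)
      throws IOException, InterruptedException {
    List<String> left = new ArrayList<String>();
    List<String> right = new ArrayList<String>();
    for (Text v : values) {
      String s = v.toString();
      if (s.startsWith("A\t")) {
        left.add(s.substring(2));
      } else {
        right.add(s.substring(2));
      }
    }
    for (String a : left) {
      for (String b : right) {
        ctx.write(key, new Text(a + "\t" + b)); // one output row per pair
      }
    }
  }
}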
On Tue, May 31, 2011 at 12:19 PM, Eran Kutner e...@gigya.com wrote:
For my need I don't really need the general case, but even if I did I
Sorry Gao, what is your question?
St.Ack
2011/5/31 Gaojinchao gaojinc...@huawei.com:
For one of our applications, there are 3 nodes.
The process layout and machine configuration are as below.
Who has experience with this?
The CPU usage rate is about 70%~80%. Does it make HBase or ZooKeeper
From: doug.m...@explorysmedical.com
To: user@hbase.apache.org
Date: Tue, 31 May 2011 15:39:14 -0400
Subject: RE: How to efficiently join HBase tables?
Re: Didn't see a multi-get...
This is what I'm talking about...
On Tue, May 31, 2011 at 3:19 PM, Eran Kutner e...@gigya.com wrote:
For my need I don't really need the general case, but even if I did I think
it can probably be done simpler.
The main problem is getting the data from both tables into the same MR job,
without resorting to lookups. So without
Thanks for the pointers.
The damage manifested as scanners skipping over a range in our time series
data. We knew from other systems that there should be some records in that
region that weren't returned. When we looked closely we saw an extremely
improbable jump in rowkeys that should by
Yeah, we learned the hard way early last year to follow the guidelines
religiously. I've gone over the requirements and checked off everything. We
even re-did our tables to only have 4 column families, down from 4x that
amount. We are at a loss to find out why we seemed to be cursed when it
The script ran without hitting the previous error, but it did not fix the problem.
When I ran hbck or check_meta.rb again they indicated that the problem was
still there. Do I need to do something else in preparation before running
check_meta?
Thanks,
Robert
-Original Message-
From:
Hello, is there a git repo URL I could use to check out that code version?
-Jack
On Thu, May 19, 2011 at 2:35 PM, Stack st...@duboce.net wrote:
The Apache HBase team is happy to announce that HBase 0.90.3 is
available from the Apache mirror of choice:
From: Jack Levin magn...@gmail.com
Hello, is there a git repo URL I could use to check out that
code version?
git://git.apache.org/hbase.git
or
git://github.com/apache/hbase.git
or
https://github.com/apache/hbase.git
Then checkout tag '0.90.3'
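For example (assuming the release tag is literally named '0.90.3'):

$ git clone git://git.apache.org/hbase.git
$ cd hbase
$ git checkout 0.90.3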
The script doesn't work because it attempts to fix the hole by finding a region
in the hdfs filesystem that fills the hole. But in this case there is no such
file. The hole is just there.
-Original Message-
From: Robert Gonzalez [mailto:robert.gonza...@maxpointinteractive.com]
Sent:
On Tue, May 31, 2011 at 3:34 PM, Robert Gonzalez
robert.gonza...@maxpointinteractive.com wrote:
The script doesn't work because it attempts to fix the hole by finding a
region in the hdfs filesystem that fills the hole. But in this case there is
no such file. The hole is just there.
OK.
So, what about this new WrongRegionException in the new cluster. Can
you figure how it came about? In the new cluster, is there also a
hole? Did you start the new cluster fresh or copy from old cluster?
St.Ack
On Tue, May 31, 2011 at 1:55 PM, Robert Gonzalez
Hello,
I am trying to autogenerate some code off of 0.90.3. I made some custom additions to
our Thrift server; however, the code that gets generated uses ByteBuffers as
opposed to byte[]. How can I get around having to manually edit the autogenerated
code to match?
Is there a thrift flag or different
The Hive-HBase integration allows you to create Hive tables that are backed
by HBase
In addition, HBase can be made to go faster for MapReduce jobs if the
HFiles could be used directly in HDFS, rather than proxying through
the RegionServer.
I'd imagine that join operations do not require
This may help:
http://download.oracle.com/javase/1.5.0/docs/api/java/nio/ByteBuffer.html#array()
What is it you are actually trying to do?
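If the goal is plain byte[] in the handler code, a small copy helper is usually
enough. A minimal sketch (note that array() exposes the entire backing array,
so copying the remaining() bytes rather than using array() directly is safer):

import java.nio.ByteBuffer;

public final class BufferUtil {
  private BufferUtil() {}

  /** Copy the readable bytes of a ByteBuffer into a fresh byte[]. */
  public static byte[] toBytes(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.duplicate().get(out); // duplicate() leaves the caller's position untouched
    return out;
  }
}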
On Tue, May 31, 2011 at 5:14 PM, Matthew Ward m...@imageshack.net wrote:
The issue I am encountering is that the code generated by doing 'thrift --gen java
Hbase.thrift' outputs code utilizing the 'ByteBuffer' type instead of
'byte[]'. All the code in org.apache.hadoop.hbase.thrift utilizes byte[]. So
basically the code generated via thrift is incompatible with the
We use Pig to join HBase tables using HBaseStorage, which has worked well. If
you're using HBase >= 0.89 you'll need to build from the trunk or the Pig
0.8 branch.
On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
The Hive-HBase integration allows you to
Which versions of thrift are involved here? This sounds like a Thrift
version mismatch.
What does [thrift -version] say? What is the hbase dependency?
On Tue, May 31, 2011 at 5:32 PM, Matthew Ward m...@imageshack.net wrote:
The issue I am encountering is that the code generated doing 'thrift
$ thrift -version
Thrift version 0.6.0
Not sure about the Hbase Dependency.
On May 31, 2011, at 5:45 PM, Ted Dunning wrote:
Which versions of thrift are involved here? This sounds like a Thrift
version mismatch.
What does [thrift -version] say? What is the hbase dependency?
On Tue,
Yes. You have a version problem with Thrift.
From the 0.6.0 release notes for Thrift:
THRIFT-830 Java Switch binary field implementation from
byte[] to ByteBuffer (Bryan Duxbury)
If you look at THRIFT-830
https://issues.apache.org/jira/browse/THRIFT-830 you
will see the trenchant
<thrift.version>0.5.0</thrift.version>  <!-- newer version available -->
On Tue, May 31, 2011 at 5:54 PM, Matthew Ward m...@imageshack.net wrote:
$ thrift -version
Thrift version 0.6.0
Not sure about the Hbase Dependency.
On May 31, 2011, at 5:45 PM, Ted Dunning wrote:
Which versions of
Good catch! Thanks.
On May 31, 2011, at 5:55 PM, Ted Dunning wrote:
<thrift.version>0.5.0</thrift.version>  <!-- newer version available -->
On Tue, May 31, 2011 at 5:54 PM, Matthew Ward m...@imageshack.net wrote:
$ thrift -version
Thrift version 0.6.0
Not sure about the Hbase
As far as I know:
1. ZooKeeper is sensitive to resources (memory, disk, CPU, network).
If the server is underprovisioned, then:
a) The server may not respond to client requests in time.
b) The client assumes the server is down, closes the socket, and connects to
another server.
2. HBase is sensitive to
Sorry for the long break in the discussion of this problem.
So far, I have found one possible cause of this problem.
The main reason for this problem is that a split region could come online again.
The following is my analysis:
(The cluster has two HMasters, one active and one standby.)
Woof.
Of course.
Harold,
You appear to be running on about 10 disks total. Each disk should be
capable of about 100 ops per second but they appear to be doing about 70.
This is plausible overhead.
Try attaching 5 or 10 small EBS partitions to each of your nodes and use
them in HDFS. That