Hi,
I have a use case for my data stored in HBase where I need to
query 20K-30K keys at once. I know that the HBase client API
supports a get operation that takes a list of Gets, so a naive
implementation would probably just make one or more batch get calls.
First of all I am wondering if I
it's pretty efficient.
I think it processes the RS-groups serially in 0.90.x, and I thought I saw a
ticket about multi-threaded processing, but you'll have to check the code.
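For what it's worth, a rough sketch of that batched call against the 0.90.x
client API (table and variable names are made up):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;

  // keys holds the 20K-30K row keys to look up
  HTable table = new HTable(conf, "myTable");
  List<Get> gets = new ArrayList<Get>(keys.size());
  for (byte[] key : keys) {
    gets.add(new Get(key));
  }
  // One client call; internally the Gets are grouped per region server.
  Result[] results = table.get(gets);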
On 7/22/11 9:46 AM, Nanheng Wu nanhen...@gmail.com wrote:
Hi,
I have a use case for my data stored in HBase where
Hi,
I am bulk loading data into HBase using a MR job with HFileOutput
format, the data is read-only once it's loaded. Is it possible to
still enable Bloom filters? I am guessing no, since they need to be
written as part of the HFile, and at least for HBase 0.20.6 I don't see
such an option. Is my
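A rough sketch of how per-family bloom filters are enabled in later releases
(the setBloomFilterType call is from the 0.92-era HColumnDescriptor, so treat
this as an assumption; it does not apply to 0.20.6):

  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.regionserver.StoreFile;

  HColumnDescriptor cf = new HColumnDescriptor("d");        // hypothetical family name
  cf.setBloomFilterType(StoreFile.BloomType.ROW);           // ROW or ROWCOL
  HTableDescriptor desc = new HTableDescriptor("myTable");  // hypothetical table name
  desc.addFamily(cf);
  new HBaseAdmin(conf).createTable(desc);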
Hi,
When a user does not explicitly set the max versions of a Get, does
HBase try to retrieve just the latest version or the CF's max
versions? Thanks!
Best,
Alex
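A quick sketch of the two behaviours in question, using the standard client
API (row and family names are made up); by default a Get returns one version:

  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  // Default: only the newest version of each cell comes back (maxVersions = 1).
  Get latest = new Get(Bytes.toBytes("row1"));
  Result newest = table.get(latest);

  // Ask for more versions explicitly, bounded by the CF's configured max versions.
  Get versioned = new Get(Bytes.toBytes("row1"));
  versioned.setMaxVersions(3);
  Result upToThree = table.get(versioned);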
, one thing I'd like
to see is the result of this command:
scan '.META.', {STARTROW => 'myTable,,', LIMIT => 261}
It's going to be big. Then grep in the result for the string SPLIT,
and please post back here the lines that match.
J-D
On Mon, Feb 28, 2011 at 5:04 PM, Nanheng Wu nanhen
after running it).
Lastly, upgrading to HBase 0.90.1 and a hadoop that supports append
should be a priority.
J-D
On Tue, Mar 1, 2011 at 9:30 AM, Nanheng Wu nanhen...@gmail.com wrote:
Hi J-D:
I did the scan like you suggested but no splits came up. This kind
of makes sense to me, since we
to query .META. first to get the location of the
region that hosts the row.
J-D
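A small sketch of that lookup from the client side, assuming the 0.90.x HTable
API (table and row names are made up):

  import org.apache.hadoop.hbase.HRegionLocation;
  import org.apache.hadoop.hbase.client.HTable;

  HTable table = new HTable(conf, "myTable");
  // This goes through the same .META. lookup the client does for a Get/Scan.
  HRegionLocation loc = table.getRegionLocation("someRow");
  System.out.println(loc);   // region name plus the server hosting it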
On Tue, Mar 1, 2011 at 10:45 AM, Nanheng Wu nanhen...@gmail.com wrote:
Man I appreciate so much all the help you provided so far. I guess
I'll keep digging. Would this meta scan cause Get or Scan on user
tables
My cluster (10 nodes, hbase-0.20.6 + hadoop 0.20.2) is very very slow
for any operation like disable table or delete. Master's thread dump
says they are blocked by the metaScanner thread. When I looked at the
log file on the .META. RS there are no outputs at all! (INFO log
level). J-D has been
stack traces with HRegionServer doing stuff like get, next, put, etc
You should also try scanning '.META.' from the shell and if it's slow,
do the jstack'ing at the same time.
J-D
On Tue, Mar 1, 2011 at 5:07 PM, Nanheng Wu nanhen...@gmail.com wrote:
My cluster (10 nodes, hbase-0.20.6
)
org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
org.apache.hadoop.hbase.Chore.run(Chore.java:68)
On Tue, Mar 1, 2011 at 5:22 PM, Nanheng Wu nanhen...@gmail.com wrote:
Thanks man I'll try that and post back when I find something. BTW, I
ran the script to set the memstore flush size on .META., now I am
seeing
...@apache.org wrote:
Yes, and on the other side (which is the region server that hosts
.META.) you should be able to see that call. Well, not that specific
one, but one of them :)
J-D
On Tue, Mar 1, 2011 at 5:30 PM, Nanheng Wu nanhen...@gmail.com wrote:
You said next, I don't know
And what's next? and what's next?
On Tue, Mar 1, 2011 at 5:41 PM, Nanheng Wu nanhen...@gmail.com wrote:
I just took the stack trace of both the master and the meta RS. The
master's still waiting for that thread which called next, but no IPC
Server handler on the RS has that call
in 0.20.6 that
almost prevents disabling a table (or re-enabling it) if any region
recently split and the parent wasn't cleaned yet from .META. and that
is fixed in 0.90.1
J-D
On Thu, Feb 24, 2011 at 11:37 PM, Nanheng Wu nanhen...@gmail.com wrote:
I think you are right, maybe in the long run I need
the region server that hosts .META. and see
where it's blocked.
if the latter, then it means your .META. region is slow? Again, what's
going on on the RS that hosts .META.?
Finally, what's the master's log like during that time?
J-D
On Mon, Feb 28, 2011 at 2:41 PM, Nanheng Wu nanhen...@gmail.com
or a completely separate file.
J-D
On Mon, Feb 28, 2011 at 2:54 PM, Nanheng Wu nanhen...@gmail.com wrote:
I see, so I should jstack the .META. region. I'll do that.
The master log pretty much looks like this: should I grep for
something specific?
11/02/28 22:52:56 INFO master.BaseScanner
destructive feature so some people
might disagree with having it in the codebase :)
J-D
On Wed, Feb 16, 2011 at 4:26 PM, Nanheng Wu nanhen...@gmail.com wrote:
Actually I wanted to disable the table so I can drop it. It would be
nice to be able to disable the table without flushing
:
Exactly.
J-D
On Thu, Feb 24, 2011 at 2:45 PM, Nanheng Wu nanhen...@gmail.com wrote:
Sorry for trying to bring this topic back again guys, so currently in
0.20.6 is there no way to drop a table without a large amount of
flushing?
On Tue, Feb 22, 2011 at 3:04 PM, Jean-Daniel Cryans jdcry
:
I haven't tried, but it seems incredibly hacky and bound to generate
more problems than it solves. Instead you could consider using
different table names.
J-D
On Thu, Feb 24, 2011 at 3:21 PM, Nanheng Wu nanhen...@gmail.com wrote:
What would happen if I try to remove the region files from
What are some of the trade-offs of using larger region files and fewer
regions vs the other way round? Currently each of my hosts has ~700
regions with the default hfile size, is this an acceptable number?
(hosts have 16GB of RAM). Another totally unrelated question: I have
Gzip enabled on the hfile
From time to time I run into issues where disabling a table pretty
much hangs. I am simply calling the disableTable method of HBaseAdmin.
The table has ~ 500 regions with default region file size. I couldn't
tell anything abnormal from the master's log. When I click on the
region from Master's web
a
flush on the table from the shell first and then some time later doing
the disable. How much later you ask? Well there's currently no easy
way to tell, I usually just tail any region server log file until I
see they're done.
J-D
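Roughly, in client code (table name made up; note HBaseAdmin.flush() is
asynchronous, which is why the wait matters):

  import org.apache.hadoop.hbase.client.HBaseAdmin;

  HBaseAdmin admin = new HBaseAdmin(conf);
  admin.flush("myTable");          // async: kicks off flushes on the region servers
  // ... watch the region server logs until the flushes are done ...
  admin.disableTable("myTable");
  admin.deleteTable("myTable");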
On Wed, Feb 16, 2011 at 2:21 PM, Nanheng Wu nanhen...@gmail.com
if I can figure something out by comparing the two
versions' HFiles. Thanks again!
On Fri, Jan 28, 2011 at 9:14 AM, Stack st...@duboce.net wrote:
On Thu, Jan 27, 2011 at 9:35 PM, Nanheng Wu nanhen...@gmail.com wrote:
In the compressed case, there are 8 regions and the region start/end
keys do
you w/
your explorations.
St.Ack
On Fri, Jan 28, 2011 at 9:38 AM, Nanheng Wu nanhen...@gmail.com wrote:
Hi Stack,
Get doesn't work either. It was a fresh table created by
loadtable.rb. Finally, the uncompressed version had the same number of
regions (8 total). I totally understand you guys
metadata is.
St.Ack
On Fri, Jan 28, 2011 at 9:58 AM, Nanheng Wu nanhen...@gmail.com wrote:
Awesome. I ran it on one of the hfiles and got this:
11/01/28 09:57:15 INFO compress.CodecPool: Got brand-new decompressor
java.io.IOException: Not in GZIP format
Ah, sorry, I should've read the usage. I ran it just now and the metadata
dump threw the same error: "Not in GZIP format".
On Fri, Jan 28, 2011 at 10:51 AM, Stack st...@duboce.net wrote:
hfile metadata, the -m option?
St.Ack
On Fri, Jan 28, 2011 at 10:41 AM, Nanheng Wu nanhen...@gmail.com wrote
Hi,
I am using hbase 0.20.6. Is it possible for the loadtable.rb script
to create the table from compressed output? I have a MR job where the
reducer outputs Gzip compressed HFiles. When I ran loadtable.rb it
didn't have any complaints and seemed to update the meta data table
correctly. But when
27, 2011 at 8:54 PM, Nanheng Wu nanhen...@gmail.com wrote:
Hi,
I am using hbase 0.20.6. Is it possible for the loadtable.rb script
to create the table from compressed output? I have a MR job where the
reducer outputs Gzip compressed HFiles. When I ran loadtable.rb it
didn't have any complaints
are the same) in both the compressed and uncompressed
versions. So what else should I look into to fix this? Thanks again!
On Thu, Jan 27, 2011 at 9:24 PM, Stack st...@duboce.net wrote:
On Thu, Jan 27, 2011 at 9:08 PM, Nanheng Wu nanhen...@gmail.com wrote:
Hi Stack, thanks for the answers! I am
Hi,
I am doing some tests on an HBase cluster and after a while (when the
cluster reached capacity limit) I wanted to just remove all the data
in it. Instead of dropping each table one by one I just removed /hbase
directory from HDFS altogether. When I tried to restart the cluster I
got errors
I am sorry if this has been asked before: To bulk load into HBase I am
using a mapper-only job to generate the HFiles and then running
loadtable.rb. Everything seems fine now but I want to turn on GZIP
compression on the table. I did
HFileOutputFormat.setCompressOutput(job, true); in the MR job and
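If I remember right, the 0.20-era HFileOutputFormat ignores the generic
FileOutputFormat compression flags and reads its codec from an
hfile.compression property instead; treat the property name as an assumption
and check the HFileOutputFormat source for your version. Something like:

  import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
  import org.apache.hadoop.mapreduce.Job;

  Job job = new Job(conf, "hfile-writer");                // hypothetical job name
  job.getConfiguration().set("hfile.compression", "gz");  // assumed property name
  job.setOutputFormatClass(HFileOutputFormat.class);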
Hi,
Sorry for the stupid question. I want to execute some hbase shell
commands like list or create table from the command line directly,
instead of through the interactive hbase shell. How can this be done?
Thanks!
, Jan 6, 2011 at 3:12 PM, Nanheng Wu nanhen...@gmail.com wrote:
Yes, it's only seconds. Just for several seconds I can see the table in
the HBase UI, but when I clicked through it I got an error that no
entries were found in the .META. table. I guess it's not too bad since
it's only a few seconds
, Jan 5, 2011 at 3:54 PM, Nanheng Wu nanhen...@gmail.com wrote:
Hi,
I am new to HBase and Hadoop and I am trying to find the best way to
bulk load a table from HDFS to HBase. I don't mind creating a new
table for each batch, and from what I understand, using HFileOutputFormat
directly in an MR job
in .META. would be very helpful.
On Thu, Jan 6, 2011 at 2:42 PM, Stack st...@duboce.net wrote:
On Thu, Jan 6, 2011 at 10:17 AM, Nanheng Wu nanhen...@gmail.com wrote:
Thanks for the answer Todd. I realized that I was making my life
harder by using the low-level record writer directly. Instead I
Hi,
I am new to HBase and Hadoop and I am trying to find the best way to
bulk load a table from HDFS to HBase. I don't mind creating a new
table for each batch, and from what I understand, using HFileOutputFormat
directly in an MR job is the most efficient method. My input data set
is already in sorted
certainly easier using TOF.
Unless you have special needs, I'd stick w/ TOF.
Good luck,
St.Ack
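A bare-bones TOF setup, as I understand the org.apache.hadoop.hbase.mapreduce
API (class and table names are made up):

  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.mapreduce.Job;

  Job job = new Job(conf, "load-into-hbase");
  job.setJarByClass(MyLoadMapper.class);   // hypothetical mapper emitting (ImmutableBytesWritable, Put)
  job.setMapperClass(MyLoadMapper.class);
  // Wires up TableOutputFormat and the target table; null reducer plus zero reduces = map-only writes.
  TableMapReduceUtil.initTableReducerJob("myTable", null, job);
  job.setNumReduceTasks(0);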
On Mon, Dec 27, 2010 at 1:03 PM, Nanheng Wu nanhen...@gmail.com wrote:
Thanks for the answers. I will use these as my basis for
investigation. I am using a mapper-only job; is it better to use
Hi group,
Which knob controls how many hregions each server should handle, or
how I can control when a newly split region will move to another region
server? I want to set a smaller hbase.hregion.max.filesize than the
default, so that there will be more regions and they can quickly
distribute
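For reference, a sketch of the two places that knob can be set, assuming the
usual client classes (the value and table name are made up):

  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  // Cluster-wide default: hbase.hregion.max.filesize in hbase-site.xml on the region servers.
  // Per-table override when creating the table:
  HTableDescriptor desc = new HTableDescriptor("myTable");
  desc.addFamily(new HColumnDescriptor("d"));
  desc.setMaxFileSize(128L * 1024 * 1024);   // split regions at ~128 MB instead of the default
  new HBaseAdmin(conf).createTable(desc);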
I am running some tests to load data from HDFS into HBase in an MR job.
I am pretty new to HBase and I have some questions regarding bulk load
performance: I have a small cluster with 4 nodes, I set up one node to
run NameNode/JobTracker/ZK, and the other three nodes all run
Thanks for the answers. I will use these as my basis for
investigation. I am using a mapper-only job; is it better to use the
HBase client to write to HBase or TableOutputFormat?
On Mon, Dec 27, 2010 at 8:38 AM, Stack st...@duboce.net wrote:
On Mon, Dec 27, 2010 at 1:54 AM, Nanheng Wu nanhen
Hi, I am planning to set up HDFS and HBase on 3 or 4 hosts. What's the
recommended strategy to use these hosts? I guess one node should be
the NameNode and the rest be DataNodes; then is it advisable that I
run HBase master and Zookeeper on the same host as the name node? If
not, how should I
.
There also seems to be a thrift interface for HBase. You could use the
java
thrift client to access HBase.
These are the methods I am aware of. There could be better methods too.
I would be interested in knowing them too :)
Thanks
Vijay
On Sat, Dec 4, 2010 at 12:59 PM, Nanheng Wu nanhen
interface for HBase. You could use the java
thrift client to access HBase.
These are the methods I am aware of. There could be better methods too.
I would be interested in knowing them too :)
Thanks
Vijay
On Sat, Dec 4, 2010 at 12:59 PM, Nanheng Wu nanhen...@gmail.com wrote:
Hi,
I set up
Hi,
I set up a small test hbase cluster on ec2. If I want to now store
some data in the cluster from outside ec2 using the java client, what
should I do? I am very new to hbase and ec2 so any help would be
appreciated!
Best,
Alex
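In case it helps to see it concretely, the client mainly needs to reach
ZooKeeper (and from there the region servers), so a sketch would look like
this, with made-up hostnames and table/family names:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  Configuration conf = HBaseConfiguration.create();
  conf.set("hbase.zookeeper.quorum", "ec2-1-2-3-4.compute-1.amazonaws.com"); // public DNS of the ZK host
  conf.set("hbase.zookeeper.property.clientPort", "2181");
  HTable table = new HTable(conf, "testTable");
  Put put = new Put(Bytes.toBytes("row1"));
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
  table.put(put);

(The ZooKeeper and region server ports also have to be open in the EC2
security group, and the hostnames the cluster advertises must resolve from
outside.)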
.
Lars
On Nov 25, 2010, at 17:31, Nanheng Wu nanhen...@gmail.com wrote:
Hello,
I am very new to HBase and I hope to get some feedback from the
community on this: I want to use HBase to store some data with a pretty
simple structure: each key has ~50 attributes. These data are computed
daily
reading multiple versions in one go?
Lars
On Nov 25, 2010, at 21:22, Nanheng Wu nanhen...@gmail.com wrote:
Hi Lars,
Thank you so much for the response. So if I understand correctly, if
I want to use columns for my use-case I would keep adding columns to
the row during each load where
way I need to know how you
access your data. How often do you access older versions and are you
accessing them separately or are you reading multiple versions in one go?
Lars
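To make the two layouts concrete, a sketch of what each daily load might
write (family and qualifier names are made up, loadTimestamp is an assumed
long holding the day's timestamp):

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // Option A: one qualifier per attribute; each daily load adds a new cell version
  // (the CF's VERSIONS setting has to be large enough to retain them).
  Put byVersion = new Put(Bytes.toBytes("key1"));
  byVersion.add(Bytes.toBytes("a"), Bytes.toBytes("attr1"), loadTimestamp, Bytes.toBytes("value"));

  // Option B: encode the load date in the qualifier, so each day becomes its own column.
  Put byColumn = new Put(Bytes.toBytes("key1"));
  byColumn.add(Bytes.toBytes("a"), Bytes.toBytes("attr1-20101125"), Bytes.toBytes("value"));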
On Nov 25, 2010, at 21:22, Nanheng Wu nanhen...@gmail.com wrote:
Hi Lars,
Thank you so much for the response