How to implement efficient bulk query

2011-07-22 Thread Nanheng Wu
Hi, I have a use case for my data stored in HBase where I need to make a query for 20K-30K keys at once. I know that the HBase client API supports a get operation with a list of Gets, so a naive implementation would probably just make one or more batch get calls. First of all I am wondering if I
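A minimal sketch of the batching half of that naive approach, assuming the client call is HTable.get(List&lt;Get&gt;) (available from 0.90): split the key list into fixed-size chunks so no single RPC round-trip carries all 30K gets at once. The batch size of 1,000 and the helper names here are illustrative, not tuned values.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: partition a 20K-30K key list into fixed-size batches before each
// HTable.get(List<Get>) call. The 1000-key batch size is an assumption.
public class BatchGets {
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            batches.add(keys.subList(i, Math.min(i + batchSize, keys.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 25000; i++) keys.add("row-" + i);
        List<List<String>> batches = partition(keys, 1000);
        System.out.println(batches.size());  // 25
        // For each batch, real client code would build a List<Get> and call
        // table.get(batch); the client library groups the Gets by region server.
    }
}
```

As noted later in the thread, the client itself groups a multi-get by region server, so batching mostly bounds per-RPC payload and lets the caller parallelize batches if needed.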

Re: How to implement efficient bulk query

2011-07-22 Thread Nanheng Wu
it's pretty efficient. I think it processes the RS-groups serially in 0.90.x, and I thought I saw a ticket about multi-threaded processing, but you'll have to check the code. On 7/22/11 9:46 AM, Nanheng Wu nanhen...@gmail.com wrote: Hi, I have a use case for my data stored in HBase where

Enable Bloomfilter on HFile

2011-03-15 Thread Nanheng Wu
Hi, I am bulk loading data into HBase using an MR job with HFileOutputFormat; the data is read-only once it's loaded. Is it possible to still enable Bloom filters? I am guessing no, since they need to be written as part of the HFile, and at least in HBase 0.20.6 I don't see such an option. Is my

How many version does Get retrieve

2011-03-14 Thread Nanheng Wu
Hi, When a user does not explicitly set the max versions of a Get, does HBase try to retrieve just the latest version or the CF's max versions? Thanks! Best, Alex

Re: Disabling a table taking very long time

2011-03-01 Thread Nanheng Wu
, one thing I'd like to see is the result of this command: scan '.META.', {STARTROW => 'myTable,,', LIMIT => 261} It's going to be big. Then grep in the result for the string SPLIT, and please post back here the lines that match. J-D On Mon, Feb 28, 2011 at 5:04 PM, Nanheng Wu nanhen

Re: Disabling a table taking very long time

2011-03-01 Thread Nanheng Wu
after running it). Lastly, upgrading to HBase 0.90.1 and a hadoop that supports append should be a priority. J-D On Tue, Mar 1, 2011 at 9:30 AM, Nanheng Wu nanhen...@gmail.com wrote: Hi J-D:  I did the scan like you suggested but no splits came up. This kind of makes sense to me, since we

Re: Disabling a table taking very long time

2011-03-01 Thread Nanheng Wu
to query .META. first to get the location of the region that hosts the row. J-D On Tue, Mar 1, 2011 at 10:45 AM, Nanheng Wu nanhen...@gmail.com wrote: Man I appreciate so much all the help you provided so far. I guess I'll keep digging. Would this meta scan cause Get or Scan on user tables

What's the region server doing?

2011-03-01 Thread Nanheng Wu
My cluster (10 nodes, hbase-0.20.6 + hadoop 0.20.2) is very, very slow for any operation like disable table or delete. Master's thread dump says they are blocked by the metaScanner thread. When I looked at the log file on the .META. RS there are no outputs at all! (INFO debug level). J-D has been

Re: What's the region server doing?

2011-03-01 Thread Nanheng Wu
stack traces with HRegionServer doing stuff like get, next, put, etc You should also try scanning '.META.' from the shell and if it's slow, do the jstack'ing at the same time. J-D On Tue, Mar 1, 2011 at 5:07 PM, Nanheng Wu nanhen...@gmail.com wrote: My cluster (10 nodes, hbase-0.20.6

Re: What's the region server doing?

2011-03-01 Thread Nanheng Wu
) org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153) org.apache.hadoop.hbase.Chore.run(Chore.java:68) On Tue, Mar 1, 2011 at 5:22 PM, Nanheng Wu nanhen...@gmail.com wrote: Thanks man I'll try that and post back when I find something. BTW, I ran the script to set the memstore flush size on .META., now I am seeing

Re: What's the region server doing?

2011-03-01 Thread Nanheng Wu
...@apache.org wrote: Yes, and on the other side (which is the region server that hosts .META.) you should be able to see that call. Well, not that specific one, but one of them :) J-D On Tue, Mar 1, 2011 at 5:30 PM, Nanheng Wu nanhen...@gmail.com wrote: You said next, I don't know

Re: What's the region server doing?

2011-03-01 Thread Nanheng Wu
And what's next? and what's next? On Tue, Mar 1, 2011 at 5:41 PM, Nanheng Wu nanhen...@gmail.com wrote: I just took the stack track of both master and the meta RS. the master's still waiting for that thread which called next, but no IPC Server handler on the RS has that call

Re: Disabling a table taking very long time

2011-02-28 Thread Nanheng Wu
in 0.20.6 that almost prevents disabling a table (or re-enabling) if any region recently split and the parent wasn't cleaned yet from .META., and that is fixed in 0.90.1 J-D On Thu, Feb 24, 2011 at 11:37 PM, Nanheng Wu nanhen...@gmail.com wrote: I think you are right, maybe in the long run I need

Re: Disabling a table taking very long time

2011-02-28 Thread Nanheng Wu
the region server that hosts .META. and see where it's blocked. If the latter, then it means your .META. region is slow? Again, what's going on on the RS that hosts .META.? Finally, what's the master's log like during that time? J-D On Mon, Feb 28, 2011 at 2:41 PM, Nanheng Wu nanhen...@gmail.com

Re: Disabling a table taking very long time

2011-02-28 Thread Nanheng Wu
or a completely separate file. J-D On Mon, Feb 28, 2011 at 2:54 PM, Nanheng Wu nanhen...@gmail.com wrote: I see, so I should jstack the .META. region. I'll do that. The master log pretty much looks like this: should I grep for something specific? 11/02/28 22:52:56 INFO master.BaseScanner

Re: Disabling a table taking very long time

2011-02-24 Thread Nanheng Wu
destructive feature so some people might disagree with having it in the codebase :) J-D On Wed, Feb 16, 2011 at 4:26 PM, Nanheng Wu nanhen...@gmail.com wrote: Actually I wanted to disable the table so I can drop it. It would be nice to be able to disable the table without flushing

Re: Disabling a table taking very long time

2011-02-24 Thread Nanheng Wu
: Exactly. J-D On Thu, Feb 24, 2011 at 2:45 PM, Nanheng Wu nanhen...@gmail.com wrote: Sorry for trying to bring this topic back again guys, so currently in 0.20.6 is there no way to drop a table without a large amount of flushing? On Tue, Feb 22, 2011 at 3:04 PM, Jean-Daniel Cryans jdcry

Re: Disabling a table taking very long time

2011-02-24 Thread Nanheng Wu
: I haven't tried, but it seems incredibly hacky and bound to generate more problems than it solves. Instead you could consider using different table names. J-D On Thu, Feb 24, 2011 at 3:21 PM, Nanheng Wu nanhen...@gmail.com wrote: What would happen if I try to remove the region files from

Number of regions

2011-02-23 Thread Nanheng Wu
What are some of the trade-offs of using larger region files and fewer regions vs the other way round? Currently each of my hosts has ~700 regions with the default hfile size; is this an acceptable number? (Hosts have 16GB of RAM.) Another totally unrelated question: I have Gzip enabled on the hfile
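Some back-of-envelope arithmetic behind that question, under 0.20-era defaults (256 MB for hbase.hregion.max.filesize and a 64 MB memstore flush size, both assumptions to check against the actual cluster config):

```java
// Rough sizing math for ~700 regions per server. All constants are assumed
// 0.20-era defaults, not values taken from the poster's cluster.
public class RegionMath {
    public static void main(String[] args) {
        long regions = 700;
        long maxFileMb = 256;  // assumed default hbase.hregion.max.filesize
        long flushMb = 64;     // assumed default memstore flush size
        System.out.println(regions * maxFileMb / 1024);  // 175 (GB max data/server)
        System.out.println(regions * flushMb / 1024);    // 43 (GB if all memstores filled)
        // ~43 GB of potential memstore against a 16 GB heap is why many small
        // regions per server force early flushes and extra compaction churn.
    }
}
```

The usual trade-off: fewer, larger regions mean less per-region overhead (memstores, META entries, open store files) but coarser units for splits, compactions, and load balancing.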

Disabling a table taking very long time

2011-02-16 Thread Nanheng Wu
From time to time I run into issues where disabling a table pretty much hangs. I am simply calling the disableTable method of HBaseAdmin. The table has ~500 regions with the default region file size. I couldn't tell anything abnormal from the master's log. When I click on the region from Master's web

Re: Disabling a table taking very long time

2011-02-16 Thread Nanheng Wu
a flush on the table from the shell first and then some time later doing the disable. How much later you ask? Well there's currently no easy way to tell, I usually just tail any region server log file until I see they're done. J-D On Wed, Feb 16, 2011 at 2:21 PM, Nanheng Wu nanhen...@gmail.com

Re: Use loadtable.rb with compressed data?

2011-01-28 Thread Nanheng Wu
if I can figure something out by comparing the two versions' HFiles. Thanks again! On Fri, Jan 28, 2011 at 9:14 AM, Stack st...@duboce.net wrote: On Thu, Jan 27, 2011 at 9:35 PM, Nanheng Wu nanhen...@gmail.com wrote: In the compressed case, there are 8 regions and the region start/end keys do

Re: Use loadtable.rb with compressed data?

2011-01-28 Thread Nanheng Wu
you w/ your explorations. St.Ack On Fri, Jan 28, 2011 at 9:38 AM, Nanheng Wu nanhen...@gmail.com wrote: Hi Stack,  Get doesn't work either. It was a fresh table created by loadtable.rb. Finally, the uncompressed version had the same number of regions (8 total). I totally understand you guys

Re: Use loadtable.rb with compressed data?

2011-01-28 Thread Nanheng Wu
metadata is. St.Ack On Fri, Jan 28, 2011 at 9:58 AM, Nanheng Wu nanhen...@gmail.com wrote: Awesome. I ran it on one of the hfiles and got this: 11/01/28 09:57:15 INFO compress.CodecPool: Got brand-new decompressor java.io.IOException: Not in GZIP format

Re: Use loadtable.rb with compressed data?

2011-01-28 Thread Nanheng Wu
Ah, sorry, I should've read the usage. I ran it just now and the metadata dump threw the same error: Not in GZIP format On Fri, Jan 28, 2011 at 10:51 AM, Stack st...@duboce.net wrote: hfile metadata, the -m option? St.Ack On Fri, Jan 28, 2011 at 10:41 AM, Nanheng Wu nanhen...@gmail.com wrote

Use loadtable.rb with compressed data?

2011-01-27 Thread Nanheng Wu
Hi, I am using hbase 0.20.6. Is it possible for the loadtable.rb script to create the table from compressed output? I have a MR job where the reducer outputs Gzip compressed HFiles. When I ran loadtable.rb it didn't have any complaints and seemed to update the meta data table correctly. But when

Re: Use loadtable.rb with compressed data?

2011-01-27 Thread Nanheng Wu
27, 2011 at 8:54 PM, Nanheng Wu nanhen...@gmail.com wrote: Hi, I am using hbase 0.20.6.  Is it possible for the loadtable.rb script to create the table from compressed output? I have a MR job where the reducer outputs Gzip compressed HFiles. When I ran loadtable.rb it didn't have any complaints

Re: Use loadtable.rb with compressed data?

2011-01-27 Thread Nanheng Wu
are same) in both compressed and uncompressed version. So what else should I look into to fix this? Thanks again! On Thu, Jan 27, 2011 at 9:24 PM, Stack st...@duboce.net wrote: On Thu, Jan 27, 2011 at 9:08 PM, Nanheng Wu nanhen...@gmail.com wrote: Hi Stack, thanks for the answers! I am

Removed /hbase and cluster won't start

2011-01-26 Thread Nanheng Wu
Hi, I am doing some tests on an HBase cluster and after a while (when the cluster reached capacity limit) I wanted to just remove all the data in it. Instead of dropping each table one by one, I just removed the /hbase directory from HDFS altogether. When I tried to restart the cluster I got errors

Compress output using HFileOutputFormat

2011-01-26 Thread Nanheng Wu
I am sorry if this has been asked before: to bulk load into HBase I am using a mapper-only job to generate the HFiles and then run loadtable.rb. Everything seems fine now, but I want to turn on GZIP compression on the table. I did HFileOutputFormat.setCompressOutput(job, true); in the MR job and
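One likely explanation, based on a reading of the 0.20.x source (verify against your version): FileOutputFormat.setCompressOutput controls the generic MapReduce output codec, while HFileOutputFormat of that era took its block compression from a separate hfile.compression job property. A sketch of setting it, assuming that key is honored:

```xml
<!-- Job-configuration sketch; the hfile.compression key is an assumption
     drawn from the 0.20.x HFileOutputFormat source. The code equivalent
     would be conf.set("hfile.compression", "gz"). -->
<property>
  <name>hfile.compression</name>
  <value>gz</value>
</property>
```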

How to pass command to HBase shell?

2011-01-19 Thread Nanheng Wu
Hi, Sorry for the stupid question. I want to execute some hbase shell commands like list or create table from the command line directly, instead of through the interactive hbase shell. How can this be done? Thanks!

Re: Bulk load using HFileOutputFormat.RecordWriter

2011-01-07 Thread Nanheng Wu
, Jan 6, 2011 at 3:12 PM, Nanheng Wu nanhen...@gmail.com wrote: Yes, it only seconds. Just for several seconds I can see the table in the HBase UI but when I clicked through it I got an error about no entries were found in the .META. table. I guess it's not too bad since it's only a few seconds

Re: Bulk load using HFileOutputFormat.RecordWriter

2011-01-06 Thread Nanheng Wu
, Jan 5, 2011 at 3:54 PM, Nanheng Wu nanhen...@gmail.com wrote: Hi,  I am new to HBase and Hadoop and I am trying to find the best way to bulk load a table from HDFS to HBase. I don't mind creating a new table for each batch and what I understand using HFileOutputFormat directly in a MR job

Re: Bulk load using HFileOutputFormat.RecordWriter

2011-01-06 Thread Nanheng Wu
in .META. would be very helpful. On Thu, Jan 6, 2011 at 2:42 PM, Stack st...@duboce.net wrote: On Thu, Jan 6, 2011 at 10:17 AM, Nanheng Wu nanhen...@gmail.com wrote: Thanks for the answer Todd. I realized that I was making my life harder by using the low level record writer directly. Instead I

Bulk load using HFileOutputFormat.RecordWriter

2011-01-05 Thread Nanheng Wu
Hi, I am new to HBase and Hadoop and I am trying to find the best way to bulk load a table from HDFS to HBase. I don't mind creating a new table for each batch, and from what I understand, using HFileOutputFormat directly in an MR job is the most efficient method. My input data set is already in sorted

Re: Bulk load questions

2010-12-29 Thread Nanheng Wu
certainly easier using TOF.  Unless you have special needs, I'd stick w/ TOF. Good luck, St.Ack On Mon, Dec 27, 2010 at 1:03 PM, Nanheng Wu nanhen...@gmail.com wrote: Thanks for the answers. I will use these as my basis for investigation. I am using a mapper only job, is it better to use

Number of regions per server

2010-12-28 Thread Nanheng Wu
Hi group, Which knob controls how many regions each server should handle, and how can I control when a newly split region will go to another region server? I want to set a smaller hbase.hregion.max.filesize than the default, so that there will be more regions and they can quickly distribute
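For the split-threshold half of the question, the property named in the message goes in hbase-site.xml; 128 MB below is an illustrative value, not a recommendation:

```xml
<!-- hbase-site.xml sketch: a region splits once a store file exceeds this
     size, so smaller values yield more, smaller regions.
     Value is in bytes; 134217728 = 128 MB. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>134217728</value>
</property>
```

Note that where a newly split daughter region ends up is decided by the master's assignment and balancing, not by this setting.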

Bulk load questions

2010-12-27 Thread Nanheng Wu
I am running some tests to load data from HDFS into HBase in a MR job. I am pretty new to HBase and I have some questions regarding bulk load performance: I have a small cluster with 4 nodes, I set up one node to run Namenode/JobTracker/ZK, and the other three nodes all run

Re: Bulk load questions

2010-12-27 Thread Nanheng Wu
Thanks for the answers. I will use these as my basis for investigation. I am using a mapper-only job; is it better to use the HBase client to write to HBase or TableOutputFormat? On Mon, Dec 27, 2010 at 8:38 AM, Stack st...@duboce.net wrote: On Mon, Dec 27, 2010 at 1:54 AM, Nanheng Wu nanhen

Recommended setup for a small cluster

2010-12-14 Thread Nanheng Wu
Hi, I am planning to set up HDFS and HBase on 3 or 4 hosts. What's the recommended strategy to use these hosts? I guess one node should be the NameNode and the rest DataNodes; then is it advisable that I run the HBase master and ZooKeeper on the same host as the NameNode? If not, how should I

Re: Writing to HBase cluster on EC2 with Java client

2010-12-05 Thread Nanheng Wu
. There also seems to be a Thrift interface for HBase. You could use the Java Thrift client to access HBase. These are the methods I am aware of. There could be better methods too. I would be interested in knowing them too :) Thanks Vijay On Sat, Dec 4, 2010 at 12:59 PM, Nanheng Wu nanhen

Re: Writing to HBase cluster on EC2 with Java client

2010-12-04 Thread Nanheng Wu
interface for HBase. You could use the Java Thrift client to access HBase. These are the methods I am aware of. There could be better methods too. I would be interested in knowing them too :) Thanks Vijay On Sat, Dec 4, 2010 at 12:59 PM, Nanheng Wu nanhen...@gmail.com wrote: Hi, I set up

How to write to hbase cluster on ec2

2010-12-03 Thread Nanheng Wu
Hi, I set up a small test HBase cluster on EC2. If I want to now store some data in the cluster from outside EC2 using the Java client, what should I do? I am very new to HBase and EC2, so any help would be appreciated! Best, Alex
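A minimal client-side configuration sketch: the Java client only needs to reach the cluster's ZooKeeper quorum, from which it discovers the master and region servers. The hostname below is a placeholder; on EC2 the usual stumbling block is that region servers register their internal DNS names, which don't resolve from outside the cloud.

```xml
<!-- Client-side hbase-site.xml sketch; the hostname is a placeholder,
     not a real endpoint. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ec2-xx-xx-xx-xx.compute-1.amazonaws.com</value>
</property>
```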

Re: (Newbie) Use column family for versioning?

2010-11-25 Thread Nanheng Wu
. Lars On Nov 25, 2010, at 17:31, Nanheng Wu nanhen...@gmail.com wrote: Hello,  I am very new to HBase and I hope to get some feedback from the community on this: I want to use HBase to store some data with pretty simple structure: each key has ~50 attributes. These data are computed daily

Re: (Newbie) Use column family for versioning?

2010-11-25 Thread Nanheng Wu
reading multiple versions in one go? Lars On Nov 25, 2010, at 21:22, Nanheng Wu nanhen...@gmail.com wrote: Hi Lars, Thank you so much for the response. So if I understand correctly, if I want to use columns for my use-case I would keep adding columns to the row during each load where

Re: (Newbie) Use column family for versioning?

2010-11-25 Thread Nanheng Wu
way I need to know how you access your data. How often do you access older versions and are you accessing them separately or are you reading multiple versions in one go? Lars On Nov 25, 2010, at 21:22, Nanheng Wu nanhen...@gmail.com wrote: Hi Lars, Thank you so much for the response
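The "columns instead of versions" idea discussed in this thread can be sketched with plain sorted-map logic (the qualifier naming and date format are illustrative, not taken from the thread): store each daily load under a date-stamped qualifier, and read the newest value by taking the last qualifier in the attribute's sorted range, which mirrors what a column-range read would do server-side.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: one date-stamped column qualifier per attribute per daily load.
// "Latest version" = the last qualifier in lexicographic order for that attribute.
public class DatedQualifiers {
    static String latestQualifier(SortedMap<String, byte[]> row, String attr) {
        // All qualifiers "attr:YYYYMMDD" sort between attr + ":" and attr + ":\uffff".
        return row.subMap(attr + ":", attr + ":\uffff").lastKey();
    }

    public static void main(String[] args) {
        SortedMap<String, byte[]> row = new TreeMap<>();
        row.put("attr1:20101123", "v1".getBytes());
        row.put("attr1:20101124", "v2".getBytes());
        row.put("attr1:20101125", "v3".getBytes());
        System.out.println(latestQualifier(row, "attr1"));  // attr1:20101125
    }
}
```

The trade-off the thread circles around: cell versions capped with a max-versions setting get pruned automatically at compaction, while date-stamped columns accumulate until explicitly deleted, but they make reading several days' values in one Get straightforward.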