Re: MapReduce Program for the bulkUpload into the hbase-0.89

2011-02-02 Thread Mark Kerzner
Jason, attached is RowCounter.java from 0.90 test code - I think it has what you need. Cheers, Mark On Wed, Feb 2, 2011 at 12:33 AM, Jason Bourne prabas...@gmail.com wrote: Hi, I need to upload the bulk load of data into the Hbase-0.89 using the Map Reduce program, I tried with the

Re: MapReduce Program for the bulkUpload into the hbase-0.89

2011-02-02 Thread praba karan
Thanks Mark, but this is not what I need. I am trying to bulk-upload data from the HDFS file system into HBase-0.89, and I need a MapReduce program for that. Regards Jason On Wed, Feb 2, 2011 at 7:46 PM, Mark Kerzner markkerz...@gmail.com wrote: Jason, attached is RowCounter.java

Re: MapReduce Program for the bulkUpload into the hbase-0.89

2011-02-02 Thread Stack
See http://hbase.apache.org/bulk-loads.html St.Ack On Wed, Feb 2, 2011 at 3:26 PM, praba karan prabas...@gmail.com wrote: Thanks Mark, but this is not what I need. I am trying to bulk-upload data from the HDFS file system into HBase-0.89, and I need a MapReduce program for that.

Re: MapReduce Program for the bulkUpload into the hbase-0.89

2011-02-02 Thread praba karan
Yeah, I had seen this. I developed the MapReduce program based on the source code of ImportTsv. I am getting a NoServerForRegionException, which I am also trying to resolve. So I am requesting that users post a complete MapReduce program, so that everyone can understand it

Re: MapReduce Program for the bulkUpload into the hbase-0.89

2011-02-02 Thread Mark Kerzner
Well Praba, we are in the same boat. I also feel that there are not enough published examples, and I have started accumulating mine here: http://hadoopinpractice.com/code.html Cheers, Mark On Wed, Feb 2, 2011 at 9:56 AM, praba karan prabas...@gmail.com wrote:

Re: MapReduce Program for the bulkUpload into the hbase-0.89

2011-02-02 Thread praba karan
Yeah Mark! I made some progress and then got well and truly stuck. I need some more input from the experienced folks. Well, we'll wait with hope! Regards Prabkaran On Wed, Feb 2, 2011 at 9:31 PM, Mark Kerzner markkerz...@gmail.com wrote: Well Praba, we are in the same boat. I also feel that there is

Re: Upgrading from HBase 0.20 to 0.89 code question

2011-02-02 Thread praba karan
Hey Ryan, I just uploaded a small sample of data into HBase-0.89. I will post the MapReduce code after completing the test. I need to get rid of the exception I am facing now. When I run the MapReduce program on my machine I get the following error.

Region Balancing

2011-02-02 Thread Wayne
I know there were some changes in 0.90 in how region balancing occurs. Is there a resource somewhere that describes the configuration options? Per Jonathan Gray's recommendation we are trying to keep our region count down to 100 per region server (we are up to 5 GB region size).

Re: Tables rows disappear

2011-02-02 Thread Something Something
Stack - Any thoughts on this? On Mon, Jan 31, 2011 at 6:27 PM, Something Something mailinglist...@gmail.com wrote: 1) Version numbers: hadoop-0.20.2 hbase-0.20.6 2) autoFlush to 'true' works, but wouldn't that slow down the insertion process? 3) Here's how I had set it up: In my

Re: Tables rows disappear

2011-02-02 Thread Ryan Rawson
I'm guessing that you aren't getting as clean a shutdown as you might think if you are seeing tables disappear. Here is a quick way to tell: if you think table 'x' should exist, but it doesn't seem to, do this: bin/hadoop fs -ls /hbase/x If that directory exists, I think you might be running

Re: Region Balancing

2011-02-02 Thread Stack
The shell has a move command. You can also force the balancer to run (or even turn it off). As for how the balancer works, in short: it runs every 5 minutes by default (configurable), and when it runs, using its in-memory notion of how the cluster is balanced, it creates move plans that are immediately

Re: Region Balancing

2011-02-02 Thread Wayne
The region counts are the same per region server, which is good. My problem is that I have 5 tables and several region servers serve only 1 table's regions. I would like to round-robin and scatter all tables across all region servers. Basically the distribution is not round-robin enough. Manually

Re: Region Balancing

2011-02-02 Thread Stack
On Wed, Feb 2, 2011 at 8:41 PM, Wayne wav...@gmail.com wrote: The region counts are the same per region server, which is good. My problem is that I have 5 tables and several region servers serve only 1 table's regions. I wonder if this is an effect of our deploying splits to same server as split

Keyword schema

2011-02-02 Thread Peter Haidinyak
Hi all, I was just tasked to take the keywords used for a search and put them in HBase so we can slice and dice them. They are interested in standard stuff like highest frequency word, word pairs, etc. I know I'm not the first to do this so does anyone have a recommendation on
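The statistics Peter mentions (word frequency and word pairs) can be sketched independently of where the counts end up being stored. The following is a minimal single-process illustration in plain Java, not an HBase schema recommendation; the class name KeywordStats and its methods are hypothetical, and "word pair" is assumed here to mean two adjacent words in one query.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of HBase): counts single keywords and
// adjacent keyword pairs across a stream of search queries.
class KeywordStats {
    final Map<String, Integer> wordCounts = new HashMap<>();
    final Map<String, Integer> pairCounts = new HashMap<>();

    // Splits a query on whitespace and updates both counters.
    void addQuery(String query) {
        String trimmed = query.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            wordCounts.merge(words[i], 1, Integer::sum);
            if (i + 1 < words.length) {
                // Assumption: a "pair" is two neighbouring words, order preserved.
                pairCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        KeywordStats stats = new KeywordStats();
        for (String q : Arrays.asList("hbase bulk load", "hbase region balancing", "bulk load hbase")) {
            stats.addQuery(q);
        }
        System.out.println(stats.wordCounts.get("hbase"));     // prints 3
        System.out.println(stats.pairCounts.get("bulk load")); // prints 2
    }
}
```

Whether these counters live in memory, in files, or in HBase rows is a separate decision, and it is the one the replies in this thread discuss.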

Re: Region Balancing

2011-02-02 Thread Sebastian Bauer
I have a small question: is there any method to get a region's numberOfRequests? The master server has server.getLoad().getNumberOfRequests(), but I cannot find anything similar for a region. On 02.02.2011 22:06, Stack wrote: On Wed, Feb 2, 2011 at 8:41 PM, Wayne wav...@gmail.com wrote: The region counts

HBase client side configuration example?

2011-02-02 Thread Jérôme Verstrynge
Hi, I have managed to successfully install HBase on a remote Linux node (using Cloudera's CDH3). The next step is to implement a small Java application to connect to it. I have tried to find some documentation to configure HBaseConfiguration, but I could not find proper examples for

Re: Region Balancing

2011-02-02 Thread Wayne
hbase.master.startup.retainassign=false works like a charm. After a restart all tables are scattered across all region servers. Thanks! On Wed, Feb 2, 2011 at 4:06 PM, Stack st...@duboce.net wrote: On Wed, Feb 2, 2011 at 8:41 PM, Wayne wav...@gmail.com wrote: The region counts are the same
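What "scattered across all region servers" means in this thread can be illustrated with a toy round-robin assignment. This is only a sketch of the idea Wayne asks for. It is not HBase's balancer or its startup assignment code, and the class, method, region, and server names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: deals regions out to servers like cards.
class RoundRobinAssign {
    // Region i goes to server (i mod number of servers).
    static Map<String, List<String>> assign(List<String> regions, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            plan.get(servers.get(i % servers.size())).add(regions.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Regions listed table by table; names are made up for the example.
        List<String> regions = Arrays.asList("t1-r0", "t1-r1", "t1-r2", "t2-r0", "t2-r1", "t2-r2");
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
        System.out.println(assign(regions, servers));
        // prints {rs1=[t1-r0, t2-r0], rs2=[t1-r1, t2-r1], rs3=[t1-r2, t2-r2]}
    }
}
```

In this toy case every server ends up with one region from each table, which is the distribution Wayne describes wanting; equal region counts alone do not guarantee it.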

Re: HBase client side configuration example?

2011-02-02 Thread Ted Yu
You should be able to find them under the remote Linux node's $HBASE_HOME/conf On Wed, Feb 2, 2011 at 1:57 PM, Jérôme Verstrynge jvers...@gmail.com wrote: Hi, I have managed to successfully install HBase on a remote Linux node (using Cloudera's CDH3). The next step is to implement a small Java

RE: HBase client side configuration example?

2011-02-02 Thread Peter Haidinyak
I connect to HBase from Java using the following: final Configuration configuration = HBaseConfiguration.create(); configuration.clear(); configuration.set("hbase.zookeeper.quorum", "caiss01a,caiss01b"); // Our two ZooKeeper machines

Re: Keyword schema

2011-02-02 Thread Jean-Daniel Cryans
I don't think HBase is really needed here, unless you somehow need random read/write to those search queries. J-D On Wed, Feb 2, 2011 at 1:27 PM, Peter Haidinyak phaidin...@local.com wrote: Hi all, I was just tasked to take the keywords used for a search and put them in HBase so we

Fastest way to read only the keys of a HTable?

2011-02-02 Thread Something Something
I want to read only the keys in a table. I tried this... try { HTable table = new HTable("myTable"); Scan scan = new Scan(); scan.addFamily(Bytes.toBytes("Info")); ResultScanner scanner = table.getScanner(scan); Result result = scanner.next(); while (result != null) { so on...

Re: Fastest way to read only the keys of a HTable?

2011-02-02 Thread Stack
See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html St.Ack On Thu, Feb 3, 2011 at 6:01 AM, Something Something mailinglist...@gmail.com wrote: I want to read only the keys in a table. I tried this... try { HTable table = new HTable("myTable");

Re: Fastest way to read only the keys of a HTable?

2011-02-02 Thread Something Something
Thanks. So I will add this... scan.setFilter(new FirstKeyOnlyFilter()); But after I do this... Result result = scanner.next(); There's no... result.getKey() - so what method would give me the Key value? On Wed, Feb 2, 2011 at 10:20 PM, Stack st...@duboce.net wrote: See

Re: Fastest way to read only the keys of a HTable?

2011-02-02 Thread Stack
I don't see a getKey on Result. Use http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow(). Here is how it's used in the shell table.rb class: # Count rows in a table def count(interval = 1000, caching_rows = 10) # We can safely set scanner caching with

Re: Keyword schema

2011-02-02 Thread Pete Haidinyak
I will be updating the keywords and their frequency every X minutes, so I don't believe M/R would work well, but I could be wrong. I've been using this approach with other data and have been getting sub-second responses on my queries with two overtaxed servers. I figured this problem has been solved

Re: Keyword schema

2011-02-02 Thread Ted Dunning
A small map-reduce program could do updates to HBase, or, if your incremental data is relatively small, you can do the updates one by one. This can work fine, but it doesn't really solve the top-100 term problem. For that, it may be nice to have an occasional MR program that over-produces a list of
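The "top-100 term" selection step, taken on its own, can be sketched with a bounded min-heap. This is a single-process illustration in plain Java under the assumption that term counts are already available in a map; it is not the MapReduce program Ted describes, and the names TopTerms and topN are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: picks the n most frequent terms from a count map.
class TopTerms {
    // Returns up to n terms, most frequent first. Order among ties is unspecified.
    static List<String> topN(Map<String, Long> counts, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap ordered by count, so the smallest kept entry is always on top.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // evict the current smallest count
            }
        }
        // Polling yields ascending counts; reverse to get most frequent first.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("hbase", 5L);
        counts.put("hadoop", 3L);
        counts.put("zookeeper", 1L);
        counts.put("region", 4L);
        System.out.println(topN(counts, 2)); // prints [hbase, region]
    }
}
```

The heap never holds more than n + 1 entries, so memory stays proportional to n rather than to the number of distinct terms.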