HBase MapReduce Zookeeper

2011-07-19 Thread Andre Reiter
Hi folks, I'm running into an interesting issue: we have a zookeeper cluster running on 3 servers. We run mapreduce jobs using org.apache.hadoop.conf.Configuration to pass parameters to our mappers. The string-based (key/value) approach is IMHO not the most elegant way; I would prefer to however

Re: how to restart a hbase cluster

2011-07-19 Thread Weihua JIANG
Thanks a lot, Stack. Now I have a much clearer understanding. I think I made a mistake in my previous experimentation. Since I use CDH3 for testing, I shut down the master using the command 'service hadoop-hbase-master stop'. It turns out this shuts down the master via hbase-daemon.sh, which just sends

Re: Hash indexing of HFiles

2011-07-19 Thread Claudio Martella
This looks great. Actually, more than BDZ, the intriguing part is CHM, as it's order-preserving. I wonder how it behaves for unseen keys. Do you know about it? What did you find more intriguing on this topic? :) On 7/19/11 3:02 AM, Casey Stella wrote: I looked into MPH a while ago and came

Re: HBase MapReduce Zookeeper

2011-07-19 Thread Doug Meil
Hi there - re: 'that we have to reuse the Configuration object': you are probably referring to this... http://hbase.apache.org/book.html#client.connections ... Yes, that is general guidance on client connections. re: do I have to create a pool of Configuration objects, to share them

Re: Hash indexing of HFiles

2011-07-19 Thread Casey Stella
I didn't get a chance to investigate thoroughly or get any benchmarks. We were looking for an alternate indexing strategy to B+ trees (with JDBM2) since we knew the keys a priori, but looking at the source I was a bit daunted by porting it, and the license wasn't something that we could use. I

Re: HBase MapReduce Zookeeper

2011-07-19 Thread Andre Reiter
Hi Doug, thanks a lot for the reply. It's clear that there is a parameter, maxClientCnxns, which is 10 by default. Of course I could increase it to something big, but like I said, the old connections are still there, and I cannot imagine that it is correct behaviour to leave them open

Re: HBase MapReduce Zookeeper

2011-07-19 Thread Stack
Configuration is not Comparable. It's instance identity that is used when comparing Configurations down in the guts of HConnectionManager in 0.90.x HBase, so even if you reuse a Configuration and tweak it per job, as far as HCM is concerned it's the 'same'. Are you seeing otherwise? St.Ack On Tue, Jul
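Stack's point about instance identity can be illustrated in plain Java. This is a minimal sketch, not HConnectionManager's actual code: it assumes only that a cache is keyed by objects whose class does not override equals()/hashCode(), so HashMap falls back to Object identity. `Conf` and `ConnectionCache` are hypothetical stand-ins for Configuration and the connection map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Configuration-like class. It does NOT override
// equals()/hashCode(), so a HashMap compares instances by identity.
class Conf {
    final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
}

// Hypothetical connection cache keyed by Conf instances.
class ConnectionCache {
    private final Map<Conf, Integer> connections = new HashMap<>();
    private int nextId = 0;

    // Returns the cached connection id for this Conf instance, creating one if absent.
    int getConnection(Conf conf) {
        return connections.computeIfAbsent(conf, c -> nextId++);
    }

    int size() { return connections.size(); }

    public static void main(String[] args) {
        ConnectionCache cache = new ConnectionCache();
        Conf shared = new Conf();
        int first = cache.getConnection(shared);
        shared.set("job.param", "tweaked");        // tweak the same instance per job
        int second = cache.getConnection(shared);  // same identity, same cache entry
        Conf fresh = new Conf();                   // new instance, identical contents
        fresh.set("job.param", "tweaked");
        int third = cache.getConnection(fresh);    // different identity, new entry
        System.out.println(first + " " + second + " " + third + " size=" + cache.size());
    }
}
```

Running main prints `0 0 1 size=2`: the mutated shared instance still maps to its original entry, while a fresh instance with identical contents gets a new one. Under this assumption, creating a new Configuration per job would also create a new cached connection per job.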

loadtable.rb reads from LocalFS than DFS

2011-07-19 Thread Dhaval Makawana
Hi, I am running the loadtable.rb script to bulk upload files created by HFileOutputFormat. Following is my command-line call to the script: /usr/lib/hbase/bin/hbase org.jruby.Main /usr/lib/hbase/bin/loadtable.rb MyTable /output_dir where /output_dir is the path where the mapreduce output is stored in

Re: loadtable.rb reads from LocalFS than DFS

2011-07-19 Thread Ted Yu
In TRUNK, loadtable.rb does: puts 'DISABLED Use completebulkload instead. See tail of http://hbase.apache.org/bulk-loads.html' Which version of HBase are you using? On Tue, Jul 19, 2011 at 6:50 AM, Dhaval Makawana dhaval.makaw...@gmail.com wrote: Hi, I am running loadtable.rb script to bulk upload file

Re: hbase table as a queue.

2011-07-19 Thread Daniel Einspanjer
We use a queue table like this too and ran into the same problem. How did you configure it such that it never splits? -Daniel On 7/16/11 4:24 PM, Stack wrote: I learned friday that our fellas on the frontend are using an hbase table to do simple queuing. They insert stuff to be processed by

RE: hbase table as a queue.

2011-07-19 Thread Michael Segel
I'm not sure how they are doing this, but just a quick thought... You can increase the file size (to 1-2GB, for example) and then run compactions on a regular basis to clean up rows deleted from the queue. This will stop the table from splitting. The assumption is that your MAX_FILESIZE is much
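Michael's reasoning can be sketched with a deliberately simplified model, assuming that deleted queue rows keep occupying store-file space until a major compaction purges them, and that a region splits once its stored bytes exceed MAX_FILESIZE. It ignores memstore flushes, minor compactions, and tombstone size; `QueueRegionModel`, `everSplits`, and the 64 MB-per-tick workload are hypothetical.

```java
// Simplified, hypothetical model of a queue table's region: deletes only mark
// data, so bytes stay on disk until a major compaction rewrites the files.
class QueueRegionModel {
    private final long maxFileSize;
    private long liveBytes = 0;    // rows still waiting in the queue
    private long deletedBytes = 0; // deleted rows not yet purged

    QueueRegionModel(long maxFileSize) { this.maxFileSize = maxFileSize; }

    void enqueue(long bytes) { liveBytes += bytes; }

    // Dequeuing deletes rows, but their bytes remain until compaction.
    void dequeue(long bytes) {
        long removed = Math.min(bytes, liveBytes);
        liveBytes -= removed;
        deletedBytes += removed;
    }

    // Major compaction drops the deleted rows from the store files.
    void majorCompact() { deletedBytes = 0; }

    long storedBytes() { return liveBytes + deletedBytes; }

    boolean wouldSplit() { return storedBytes() > maxFileSize; }

    // Runs `ticks` rounds of enqueue+dequeue, major-compacting every
    // `compactEvery` ticks (0 = never); true if the split threshold is crossed.
    static boolean everSplits(long maxFileSize, int ticks, long bytesPerTick, int compactEvery) {
        QueueRegionModel region = new QueueRegionModel(maxFileSize);
        for (int t = 1; t <= ticks; t++) {
            region.enqueue(bytesPerTick);
            region.dequeue(bytesPerTick);
            if (compactEvery > 0 && t % compactEvery == 0) {
                region.majorCompact();
            }
            if (region.wouldSplit()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long maxFileSize = 1L << 30; // 1 GB, the low end of the 1-2GB example
        long perTick = 64L << 20;    // hypothetical 64 MB of queue traffic per tick
        System.out.println(everSplits(maxFileSize, 100, perTick, 0));  // never compact
        System.out.println(everSplits(maxFileSize, 100, perTick, 10)); // compact every 10 ticks
    }
}
```

With these numbers the first run crosses 1 GB after 17 ticks and prints `true`, while compacting every 10 ticks keeps the region below 640 MB and prints `false`: the queue's live data stays tiny, so only the uncompacted deletes push it toward a split.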

Re: hbase table as a queue.

2011-07-19 Thread Stack
Set region size very large (In trunk you can actually disable splitting). St.Ack On Tue, Jul 19, 2011 at 8:26 AM, Daniel Einspanjer deinspan...@mozilla.com wrote: We use a queue table like this too and ran into the same problem.  How did you configure it such that it never splits? -Daniel

hbase + lucene?

2011-07-19 Thread Geoff Hendrey
Hi - At Hadoop Summit it was mentioned that there was a planning meeting for a project regarding HBase and Lucene. I believe the meeting was scheduled for the day after the summit. I wasn't able to attend, but I would like to keep abreast of what's going on in this regard. Anyone know anything

Re: Hadoop/HBase Upgrade from 0.20.3 to 0.90.2

2011-07-19 Thread Joey Echeverria
Hey Andy, This looks like the log from the region server. Any chance you can post the log for the HMaster? -Joey On Fri, Jul 15, 2011 at 12:43 AM, Zhong, Andy sheng.zh...@searshc.com wrote: St.Ack, It's weird. I did Hbase upgrade from Hbase 0.20.3 to Hbase 0.90.2 by following replace the

Re: hbase table as a queue.

2011-07-19 Thread Daniel Einspanjer
Cool. Filed a task for us to work on that. https://bugzilla.mozilla.org/show_bug.cgi?id=672527 On 7/19/11 12:05 PM, Stack wrote: Set region size very large (In trunk you can actually disable splitting). St.Ack On Tue, Jul 19, 2011 at 8:26 AM, Daniel Einspanjer deinspan...@mozilla.com wrote:

RE: how to restart a hbase cluster

2011-07-19 Thread Buttler, David
Hi Stack, As a further data point, I always use the hbase-daemon.sh scripts to start/stop HBase. I modified the start/stop-hbase.sh scripts so that they don't start/stop zookeeper, and I have a modified version that I call start/stop-zookeeper.sh. This allows me to use HBase to manage

Re: how to restart a hbase cluster

2011-07-19 Thread highpointe
Dave, Would you be willing to post your custom scripts? Your setup sounds useful for what we are doing. Thanks. Sent from my iPhone On Jul 19, 2011, at 10:49 AM, Buttler, David buttl...@llnl.gov wrote: Hi Stack, As a further data point, I always use the hbase-daemon.sh scripts to

Re: hbase + lucene?

2011-07-19 Thread Stack
Here is the issue: https://issues.apache.org/jira/browse/HBASE-3529 And let me chase Jason to post his slides. St.Ack On Tue, Jul 19, 2011 at 9:12 AM, Geoff Hendrey ghend...@decarta.com wrote: Hi - At hadoop summit it was mentioned that there was a planning meeting for a project regarding

RE: HBase Read and Write Issues in Multithreaded Environments

2011-07-19 Thread Srikanth P. Shreenivas
Doug, St.ack, We changed our production setup to CDH3 to resolve the issue mentioned below. I noticed that even though the servers were running JDK 1.6 u25 (as per JAVA_HOME in hbase-env.sh), I still ran into the issue of reads taking more than a minute. So, I have added -XX:+UseMembar and it seems to be

Re: hbase + lucene?

2011-07-19 Thread Gary Helmling
I wasn't at the day-after presentation, but I believe these are the slides? https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B2c-FWyLSJBCN2E5MTdmOGMtY2U5NS00NmEwLWE2NmItZTYxOTI0MTJmMzU5&hl=en_US On Tue, Jul 19, 2011 at 10:29 AM, Stack st...@duboce.net wrote: Here is the issue:

Re: hbase table as a queue.

2011-07-19 Thread Gary Helmling
All excellent points here in terms of tuning! For the higher-level question about using a table as a queue, I just wanted to add in a link to the Lily guys' rowlog library, since it does exactly that: http://www.lilyproject.org/lily/about/playground/hbaserowlog.html On Tue, Jul 19, 2011 at

RE: how to restart a hbase cluster

2011-07-19 Thread Buttler, David
They are not really worth posting: ${HBASE_HOME}/bin/hbase-daemon.sh start master ssh node1 ${HBASE_HOME}/bin/hbase-daemon.sh start regionserver ... --- Start-zookeeper.sh: ${HBASE_HOME}/bin/hbase-daemon.sh start zookeeper I need to set up pdsh on all of my clusters so I have a more consistent

Hbase-Indexing

2011-07-19 Thread Chaitali Shah
Can anyone please guide me on indexing in HBase? Thanks.

Re: Hbase-Indexing

2011-07-19 Thread Blake Lemoine
Here's a page that might be of use to you. It points to two methods of indexing that are equivalent. I'm currently building indexes in the manner described by them. http://nosql.mypopescu.com/post/410963261/hbase-secondary-indexes Blake Lemoine On Tue, Jul 19, 2011 at 4:19 PM, Chaitali Shah

Re: Hbase-Indexing

2011-07-19 Thread Ted Yu
The second solution in the blog below is no longer supported by stock HBase (0.90.x and beyond). On Tue, Jul 19, 2011 at 2:25 PM, Blake Lemoine bal2...@gmail.com wrote: Here's a page that might be of use to you. It points to two methods of indexing that are equivalent. I'm currently building
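The approach that still works on stock HBase is to maintain the index yourself in a second table. The sketch below shows that pattern, assuming two sorted maps as stand-ins for the data table and the index table (HBase keeps rows sorted by row key); it is not the HBase client API. The row-key layout `<value>\u0000<primaryKey>` and the class `SecondaryIndexSketch` are hypothetical, and values are assumed never to contain `\u0000`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Manually maintained secondary index. Two sorted maps stand in for two
// tables; this is an illustration, not the HBase client API.
class SecondaryIndexSketch {
    private static final char SEP = '\u0000'; // assumed absent from values
    // "data table": primary row key -> indexed column value
    private final NavigableMap<String, String> mainTable = new TreeMap<>();
    // "index table": value + SEP + primary row key -> primary row key
    private final NavigableMap<String, String> indexTable = new TreeMap<>();

    // Writes the row and its index entry, removing any stale index entry first.
    void put(String rowKey, String value) {
        String old = mainTable.put(rowKey, value);
        if (old != null) {
            indexTable.remove(old + SEP + rowKey);
        }
        indexTable.put(value + SEP + rowKey, rowKey);
    }

    // Looks up primary keys by value with a prefix range scan over the index.
    List<String> findByValue(String value) {
        String start = value + SEP;
        String stop = value + (char) (SEP + 1); // smallest key past the prefix
        return new ArrayList<>(indexTable.subMap(start, true, stop, false).values());
    }

    public static void main(String[] args) {
        SecondaryIndexSketch idx = new SecondaryIndexSketch();
        idx.put("user1", "berlin");
        idx.put("user2", "paris");
        idx.put("user3", "berlin");
        System.out.println(idx.findByValue("berlin")); // [user1, user3]
        idx.put("user1", "paris");                     // index entry moves with the value
        System.out.println(idx.findByValue("berlin")); // [user3]
        System.out.println(idx.findByValue("paris"));  // [user1, user2]
    }
}
```

In real HBase the data write and the index write go to different rows (and tables), and HBase only guarantees atomicity within a single row, so a failure between the two writes can leave the index out of step with the data; that is the main cost of doing it by hand.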

Re: how to restart a hbase cluster

2011-07-19 Thread Stack
On Tue, Jul 19, 2011 at 9:49 AM, Buttler, David buttl...@llnl.gov wrote: Sometimes the region servers don't die when I want them to, so I have another script that calls the hbase-daemon.sh stop regionserver script in parallel on all of the machines.  Only rarely do I have to kill -9 one.  

Re: Multiple column families or multiple tables?

2011-07-19 Thread Sheng Chen
Very helpful, thank you. Sheng 2011/7/19 Zhoushuaifeng zhoushuaif...@huawei.com Hi, As I understand it, it depends on the data size and access pattern. If the data sizes in the two families are significantly different, it's better to use different tables. Multiple families with few data

Hbase - limit connection per IP?

2011-07-19 Thread King JKing
Dear all, My application runs well with HBase. But when I deploy my application to 10 instances, the eleventh one does not run. Maybe HBase limits connections per IP? How can I fix my problem? I used the zookeeper dump to see the connections from my server to the HBase server. It shows 10 connections. My
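The symptom matches the per-IP limit Andre mentioned earlier in the thread: ZooKeeper's maxClientCnxns caps concurrent connections from a single client IP, and he notes it is 10 by default. The sketch below only illustrates that kind of cap. It is not ZooKeeper's implementation; `PerIpConnectionLimiter` and the address `10.0.0.5` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-client-IP connection cap, illustrating the kind of limit
// ZooKeeper's maxClientCnxns imposes. Not ZooKeeper's actual code.
class PerIpConnectionLimiter {
    private final int maxClientCnxns;
    private final Map<String, Integer> openByIp = new HashMap<>();

    PerIpConnectionLimiter(int maxClientCnxns) { this.maxClientCnxns = maxClientCnxns; }

    // Accepts the connection unless this IP already holds the maximum.
    boolean tryConnect(String clientIp) {
        int open = openByIp.getOrDefault(clientIp, 0);
        if (open >= maxClientCnxns) {
            return false; // refused: too many connections from this IP
        }
        openByIp.put(clientIp, open + 1);
        return true;
    }

    // Frees one slot; connections that are never closed keep their slot.
    void disconnect(String clientIp) {
        openByIp.computeIfPresent(clientIp, (ip, n) -> n > 1 ? n - 1 : null);
    }

    int openConnections(String clientIp) { return openByIp.getOrDefault(clientIp, 0); }

    public static void main(String[] args) {
        PerIpConnectionLimiter zk = new PerIpConnectionLimiter(10);
        int accepted = 0;
        for (int instance = 1; instance <= 11; instance++) {
            if (zk.tryConnect("10.0.0.5")) {  // 11 app instances on one host IP
                accepted++;
            }
        }
        System.out.println("accepted " + accepted + " of 11");
        zk.disconnect("10.0.0.5");            // one instance closes its connection
        System.out.println("retry: " + zk.tryConnect("10.0.0.5"));
    }
}
```

Running main prints `accepted 10 of 11` and then `retry: true`. In this model the two levers are the ones raised earlier in the thread: close connections that are no longer needed, or raise maxClientCnxns in the ZooKeeper configuration.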