Re: importtsv bulk load error

2011-11-17 Thread Bill Graham
Make sure guava.jar is in your classpath. On Thu, Nov 17, 2011 at 12:23 PM, Denis Kreis de.kr...@gmail.com wrote: Hi, I'm getting this error when trying to use the importtsv tool with hadoop-0.20.205.0 and hbase-0.92.0 hadoop jar ../../hbase-0.92.0-SNAPSHOT/hbase-0.92.0-SNAPSHOT.jar

Re: Development

2011-08-26 Thread Bill Graham
Before you close your laptop and take it home, do you gracefully stop your local HBase instance? When I do this I'm able to start up from home without a problem. But when I forget, all goes to crap. On Fri, Aug 26, 2011 at 9:15 AM, Mark static.void@gmail.com wrote: Ok so I'm not the only

Re: Versioning

2011-08-26 Thread Bill Graham
This issue is a common pitfall for those new to HBase and I think it could be a good thing to have in the HBase book. Once someone realizes that you can store multiple values for the same cell, each with a timestamp, there can be a natural tendency to think: hey, I can store a one-to-many using
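A minimal sketch of the tendency being described, using the 0.90-era Java client; the table, family, and qualifier names are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class VersionPitfall {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(HBaseConfiguration.create(), "users");

        // Tempting but fragile: storing a one-to-many relationship as
        // multiple timestamped versions of the same cell.
        Put p = new Put(Bytes.toBytes("user1"));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("email"), 1L, Bytes.toBytes("a@example.com"));
        table.put(p);
        p = new Put(Bytes.toBytes("user1"));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("email"), 2L, Bytes.toBytes("b@example.com"));
        table.put(p);

        // Reading "all" versions returns at most the column family's
        // VERSIONS setting (3 by default); anything older is silently
        // pruned at compaction, which is why this is a pitfall.
        Get g = new Get(Bytes.toBytes("user1"));
        g.setMaxVersions();
        Result r = table.get(g);
        System.out.println(r.list());  // only the surviving versions
        table.close();
      }
    }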

Re: mini-hbase configuration for tests

2011-08-15 Thread Bill Graham
Hey Garrett, I'm not sure about a config setting, but in Pig we changed TestHBaseStorage to delete all rows of tables instead of truncating them. This was faster since the tables are typically small in tests. See Dmitriy's note in the deleteAllRows method here:
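A minimal sketch of that approach, assuming the 0.90-era Java client API; scanning and issuing batched Deletes avoids the disable/drop/recreate cycle behind truncate:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class DeleteAllRows {
      public static void deleteAllRows(String tableName) throws IOException {
        HTable table = new HTable(HBaseConfiguration.create(), tableName);
        ResultScanner scanner = table.getScanner(new Scan());
        List<Delete> deletes = new ArrayList<Delete>();
        for (Result r : scanner) {
          deletes.add(new Delete(r.getRow()));  // tombstone the whole row
        }
        scanner.close();
        table.delete(deletes);  // one batched round of deletes
        table.close();
      }
    }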

Re: Reg. support for using HBase as a source and sink for a Map-Reduce streaming job

2011-08-07 Thread Bill Graham
Yes, you can do this via the Thrift API: http://yannramin.com/2008/07/19/using-facebook-thrift-with-python-and-hbase/ Alternatively you can use Pig's HBaseStorage (r/w), or HBase's ImportTsv (w). On Sun, Aug 7, 2011 at 5:35 AM, Varadharajan Mukundan srinath...@gmail.com wrote: Greetings,

Re: export data from HBase to mysql

2011-06-23 Thread Bill Graham
AFAIK, Sqoop can only write to HBase at this point. We use Pig's HBaseStorage class to read from HBase and transform data for import into other systems, which has worked well for us. On Thu, Jun 23, 2011 at 11:38 AM, Vishal Kapoor vishal.kapoor...@gmail.com wrote: thought it was only

Re: any multitenancy suggestions for HBase?

2011-06-20 Thread Bill Graham
with! Will do, thanks! Gary On Mon, Jun 20, 2011 at 10:06 AM, Bill Graham billgra...@gmail.com wrote: Thanks Dean, that sounds similar to the approach we're considering. Andy, I can see value in having ACLs on a per-column-pattern (or maybe just per-prefix to make multiple pattern conflict

Re: Difficulty using importtsv tool

2011-06-15 Thread Bill Graham
Try removing the spaces in the column list, i.e., commas only. On Tue, Jun 14, 2011 at 11:29 PM, James Ram hbas...@gmail.com wrote: Hi, I'm having trouble with using the importtsv tool. I ran the following command: hadoop jar hadoop_sws/hbase-0.90.0/hbase-0.90.0.jar importtsv

Re: Best way to Import data from Cassandra to HBase

2011-06-14 Thread Bill Graham
Also, you might want to look at HBASE-3880, which is committed but not released yet. It allows you to specify a custom Mapper class when running ImportTsv. It seems like a similar patch to make the input format pluggable would be needed in your case, though. On Tue, Jun 14, 2011 at 9:53 AM, Todd
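A sketch of the kind of mapper such a hook plugs in: the generic shape of an HBase bulk-import mapper emitting Puts keyed by row, not the committed HBASE-3880 interface. The class name, delimiter, and column names are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CustomImportMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
      @Override
      protected void map(LongWritable key, Text line, Context ctx)
          throws IOException, InterruptedException {
        // Hypothetical pipe-delimited input: rowkey|value
        String[] fields = line.toString().split("\\|");
        byte[] row = Bytes.toBytes(fields[0]);
        Put put = new Put(row);
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(fields[1]));
        ctx.write(new ImmutableBytesWritable(row), put);
      }
    }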

Re: exporting from hbase as text (tsv)

2011-06-06 Thread Bill Graham
You can do this in a few lines of Pig; check out the HBaseStorage class. You'll need to know the names of your column families, but besides that it could be done fairly generically. On Mon, Jun 6, 2011 at 3:57 PM, Jack Levin magn...@gmail.com wrote: Hello, does anyone have any tools you could

Re: Reading a Hdfs file using HBase

2011-06-06 Thread Bill Graham
You can load the HDFS files into HBase. Check out importtsv to generate HFiles and completebulkload to load them into a table: http://hbase.apache.org/bulk-loads.html On Mon, Jun 6, 2011 at 9:38 PM, James Ram hbas...@gmail.com wrote: Hi, I too have the same situation. The data in HDFS

Re: feature request (count)

2011-06-03 Thread Bill Graham
One alternative option is to calculate some stats during compactions and store them somewhere for retrieval. The metrics wouldn't be up to date of course, since they'd be stats from the last compaction time. I think that would still be useful info to have, but it's different than what's being

Re: How to efficiently join HBase tables?

2011-05-31 Thread Bill Graham
We use Pig to join HBase tables using HBaseStorage, which has worked well. If you're using HBase >= 0.89 you'll need to build from the trunk or the Pig 0.8 branch. On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: The Hive-HBase integration allows you to

Re: Any trigger like facility for HBase tables

2011-05-26 Thread Bill Graham
don't think so. On Tue, May 24, 2011 at 1:45 PM, Himanish Kushary himan...@gmail.com wrote: Thanks Ted and Bill. Will take a look into both of these. I am using CDH3, does it have co-processors ? On Tue, May 24, 2011 at 3:24 PM, Bill Graham billgra...@gmail.com wrote

Re: HBase Not Starting after improper shutdown

2011-05-23 Thread Bill Graham
Is there anything meaningful in the RS logs? I've seen situations like this where an RS is failing to start due to issues reading the WAL. If this is the case, the logs list which WAL is problematic; in my experience it's a zero-length one, so I delete it from HDFS and things start up. On Mon, May

Re: IO Error when using multiple HBaseStorage in PIG

2011-05-21 Thread Bill Graham
the HConnectionManager.deleteAllConnections() call so that more than one STORE command can be used at the same time. But I am not sure how HConnectionManager.deleteAllConnections() itself can be triggered by a Pig command. Cheers On Fri, May 20, 2011 at 11:07 PM, Bill Graham billgra...@gmail.com wrote

Re: any static column name behavior in hbase? (ie. not storing column name per row)

2011-05-11 Thread Bill Graham
HBase will always need to store the column name in each cell that uses it. The only way to reduce the size taken by storing repeated column names (besides using compression) is to instead store a small pointer to a lookup table that holds the column name. Check out OpenTSDB, which does something
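A rough sketch of the lookup-table idea, with hypothetical table, family, and value names: keep a side table mapping short fixed-width ids to the verbose names, and store only the short id as the qualifier in each data cell. This is roughly the trade OpenTSDB makes, one extra (cacheable) lookup in exchange for much smaller cells:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ShortQualifiers {
      public static void main(String[] args) throws IOException {
        HTable lookup = new HTable(HBaseConfiguration.create(), "name-lookup");
        HTable data = new HTable(HBaseConfiguration.create(), "data");

        // One-time mapping: 2-byte id -> verbose column name.
        byte[] shortId = Bytes.toBytes((short) 42);
        Put mapping = new Put(shortId);
        mapping.add(Bytes.toBytes("m"), Bytes.toBytes("name"),
            Bytes.toBytes("a_very_long_descriptive_column_name"));
        lookup.put(mapping);

        // Data cells carry only the 2-byte id as their qualifier, so the
        // verbose name is stored once, not once per cell.
        Put cell = new Put(Bytes.toBytes("row1"));
        cell.add(Bytes.toBytes("d"), shortId, Bytes.toBytes("value"));
        data.put(cell);

        lookup.close();
        data.close();
      }
    }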

Re: What is the recommended way to get pig 0.8 to talk with CDH3u0 HBase

2011-04-24 Thread Bill Graham
I had this issue and had to add the HBase conf dir to HADOOP_CLASSPATH in conf/hadoop-env.sh on each of the nodes in the cluster so they could find ZooKeeper. On Sun, Apr 24, 2011 at 1:04 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: I suspect the problem here is that you don't have your hbase

Re: HBase - Map Reduce - Client Question

2011-04-19 Thread Bill Graham
We've been using Pig to read bulk data from HDFS, transform it and load it into HBase using the HBaseStorage class, which has worked well for us. If you try it out you'll want to build from the 0.9.0 branch (being cut as we speak, I believe) or the trunk. There's an open Pig JIRA with a patch to

Re: HBase is not ready for Primetime

2011-04-12 Thread Bill Graham
Agreed. I've seen similar issues upon startup where, for whatever reason, an hlog (often empty) can't be read, which hangs the startup process. Manually deleting it from HDFS clears the issue. On Tue, Apr 12, 2011 at 10:01 AM, Jinsong Hu jinsong...@hotmail.com wrote: You probably should stop

Tips on pre-splitting

2011-03-29 Thread Bill Graham
I've been thinking about this topic lately so I'll fork from another discussion to ask if anyone has a good approach to determining keys for pre-splitting from a known dataset. We have a key scenario similar to what Ted describes below. We periodically run MR jobs to transform and bulk load data
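Once the split keys are plucked from the sorted output, creating the pre-split table is the easy part; a sketch against the 0.90-era admin API, with hypothetical keys:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
      public static void main(String[] args) throws IOException {
        // Hypothetical split keys, e.g. sampled from the sorted reducer
        // output at evenly spaced offsets.
        byte[][] splitKeys = {
            Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t") };
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("cf"));
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        // Yields 4 regions: (start,g) [g,n) [n,t) [t,end)
        admin.createTable(desc, splitKeys);
      }
    }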

region in a bad state - how to manually fix

2011-03-29 Thread Bill Graham
Hi, We have an empty table that is somehow in a bad state that I'm unable to disable or drop. We're running 0.90.0 on CDH3b2. Is there a way that I can manually remove this table from HBase without making a mess of things? The table has 2 CFs and it's empty. When I do a scan I get this:

Re: Tips on pre-splitting

2011-03-29 Thread Bill Graham
with Pig. Assuming reducer output file is SequenceFile, steps 2 and 3 can be automated. On Tue, Mar 29, 2011 at 2:15 PM, Bill Graham billgra...@gmail.com wrote: I've been thinking about this topic lately so I'll fork from another discussion to ask if anyone has a good approach to determining keys

Re: Tips on pre-splitting

2011-03-29 Thread Bill Graham
want are the keys. On Tue, Mar 29, 2011 at 2:15 PM, Bill Graham billgra...@gmail.com wrote: 1. use Pig to read in our datasets, join/filter/transform/etc before writing the output back to HDFS with N reducers ordered by key, where N is the number of splits we'll create. 2. Manually plucking

Re: Row Counters

2011-03-16 Thread Bill Graham
Back to the issue of keeping a count, I've often wondered if this would be easy to do without much cost at compaction time. It of course wouldn't be a true real-time total, but something like a compactedRowCount. It could be a useful metric to expose via JMX to get a feel for growth over time. On

Re: Data is always written to one node

2011-03-15 Thread Bill Graham
On Mon, Mar 14, 2011 at 8:54 PM, Stack st...@duboce.net wrote: On Mon, Mar 14, 2011 at 4:09 PM, Bill Graham billgra...@gmail.com wrote: Anyway, it's been about a week and all regions for the table are still on 1 node. I see messages like this in the logs every 5 minutes: 2011-03-14 15:59

Re: Data is always written to one node

2011-03-14 Thread Bill Graham
I hope I'm not hijacking the thread but I'm seeing what I think is a similar issue. About a week ago I loaded a bunch of data into a newly created table. It took about an hour and resulted in 12 regions being created on a single node. (Afterwards I remembered a conversation with JD where he

Re: intersection of row ids

2011-03-11 Thread Bill Graham
You could also do this with MR easily using Pig's HBaseStorage and either an inner join or an outer join with a filter on null, depending on whether you want matches or misses, respectively. On Fri, Mar 11, 2011 at 4:25 PM, Usman Waheed usm...@opera.com wrote: I suggest it to be ROWCOL because you

Re: stop-hbase.sh bug or feature?

2011-03-04 Thread Bill Graham
? For example:

    rs=$(cat ${HBASE_CONF_DIR}/regionservers | xargs)
    if (whiptail --yesno "Do you want to shutdown the cluster with the following regionservers: $rs\n[y/n]" 10 40); then
        # proceed with the shutdown
    else
        # exit
    fi

On 03/02/2011 05:23 PM, Bill Graham wrote: Hi, We had

Re: stop-hbase.sh bug or feature?

2011-03-03 Thread Bill Graham
at 5:23 PM, Bill Graham billgra...@gmail.com wrote: So the question is, is this a bug or a feature? If it's a feature it seems like an incredibly dangerous one. Once our live cluster is running, those configs will also be needed on the client so really bad things could happen by mistake. One

Re: min, max

2011-03-03 Thread Bill Graham
The first region starts with an empty byte[] and the last region ends with one. Those in between have non-empty byte[]s to specify their boundaries. On Thu, Mar 3, 2011 at 7:18 AM, Weishung Chung weish...@gmail.com wrote: Thanks, Stack! Got a few more questions. Does every region start with
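You can see those boundaries from the client; a sketch using the 0.90-era HTable.getStartEndKeys(), with a hypothetical table name. The first start key and the last end key print as empty:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Pair;

    public class RegionBoundaries {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable");
        // Parallel arrays of region start and end keys, in order.
        Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
        for (int i = 0; i < keys.getFirst().length; i++) {
          System.out.println("region " + i
              + " start=" + Bytes.toStringBinary(keys.getFirst()[i])
              + " end=" + Bytes.toStringBinary(keys.getSecond()[i]));
        }
        table.close();
      }
    }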

stop-hbase.sh bug or feature?

2011-03-02 Thread Bill Graham
Hi, We had a troubling experience today that I wanted to share. Our dev cluster got completely shut down by a developer by mistake, without said developer even realizing it. Here's how... We have multiple sets of HBase configs checked into SVN that developers can checkout and point their

Re: FilterList not working as expected

2011-02-18 Thread Bill Graham
Just to follow up, this appears to be a bug. I've created a JIRA. https://issues.apache.org/jira/browse/HBASE-3550 On Fri, Feb 18, 2011 at 10:57 AM, Bill Graham billgra...@gmail.com wrote: Hi, I'm unable to get ColumnPrefixFilter working when I use it in a FilterList and I'm wondering
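For reference, the shape of the construction that misbehaved, with a hypothetical prefix (see HBASE-3550 for the actual test case). Wrapping the filter in a FilterList should behave the same as setting it directly, which is what the JIRA reported it did not:

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PrefixInList {
      public static Scan buildScan() {
        // Direct use works: scan.setFilter(new ColumnPrefixFilter(...)).
        // The same filter inside a FilterList returned different results.
        Scan scan = new Scan();
        FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        list.addFilter(new ColumnPrefixFilter(Bytes.toBytes("myPrefix")));
        scan.setFilter(list);
        return scan;
      }
    }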

Re: multiple masters

2011-01-28 Thread Bill Graham
I also don't have a solid understanding of the responsibilities of the master, but it seems like its job is really about managing regions (i.e., coordinating splits and compactions, etc.) and updating ROOT and META. Is that correct? On Fri, Jan 28, 2011 at 9:31 AM, Weishung Chung weish...@gmail.com

HBaseStorage feature review

2011-01-27 Thread Bill Graham
Hello all, I'm working on a patch to HBaseStorage to support additional functionality with respect to how columns are specified and how HBase data is converted into Pig data structures. If you use Pig to read HBase data, please take a look at this JIRA and provide feedback if you have it:

Re: LZO Codec not found

2011-01-25 Thread Bill Graham
This wiki shows how to build the LZO jar: http://wiki.apache.org/hadoop/UsingLzoCompression You'll get that exception if the jar is not found in lib/. On Tue, Jan 25, 2011 at 10:38 AM, Peter Haidinyak phaidin...@local.com wrote: Hi, I am using HBase version .89.20100924+28 and Hadoop

Region is not online: -ROOT-,,0

2011-01-25 Thread Bill Graham
Hi, A developer on our team created a table today and something failed, and we fell back into the dire scenario we were in earlier this week. When I got on the scene, 2 of our 4 region servers had crashed. When I brought them back up, the regions wouldn't come online and the master was scrolling messages like

Re: Region is not online: -ROOT-,,0

2011-01-25 Thread Bill Graham
, 2011 at 3:27 PM, Bill Graham billgra...@gmail.com wrote: Hi, A developer on our team created a table today and something failed, and we fell back into the dire scenario we were in earlier this week. When I got on the scene, 2 of our 4 region servers had crashed. When I brought them back up

Re: Region is not online: -ROOT-,,0

2011-01-25 Thread Bill Graham
Thanks for the comments. Attached is the log file from the master after the restart. The last error message was repeated every second. See comments below. On Tue, Jan 25, 2011 at 7:20 PM, Stack st...@duboce.net wrote: On Tue, Jan 25, 2011 at 3:27 PM, Bill Graham billgra...@gmail.com wrote: Hi

Re: delete using server's timestamp

2011-01-21 Thread Bill Graham
If you use some combination of delete requests and leave a row without any column data, will the row/rowkey still exist? I'm thinking of the use case where you want to prune all old data, including row keys, from a table. On Fri, Jan 21, 2011 at 2:04 PM, Ryan Rawson ryano...@gmail.com wrote:
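For context, a row exists only through its cells, so a Delete with no family or column specified tombstones everything in the row, and the row key itself is gone once the tombstones are compacted away. A minimal sketch (0.90-era API, names hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WholeRowDelete {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable");
        // No deleteColumn/deleteFamily calls: the whole row is tombstoned.
        table.delete(new Delete(Bytes.toBytes("old-row-key")));
        table.close();
      }
    }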

Re: delete using server's timestamp

2011-01-21 Thread Bill Graham
, -ryan On Fri, Jan 21, 2011 at 3:26 PM, Bill Graham billgra...@gmail.com wrote: I follow the tombstone/compact/delete cycle of the column values, but I'm still unclear of the row key life cycle. Is it that the bytes that represent the actual row key are associated with and removed with each

HBase fails to start - DataXceiver Version Mismatch

2011-01-10 Thread Bill Graham
Hi, Today I upgraded from Hadoop 0.20.1 to CDH3b2 0.20.2 to get the append functionality that HBase requires, and now I can't start HBase. Hadoop and HDFS seem to be working just fine, but when I start up the HBase master, I get this error in the NNs: 2011-01-10 21:20:36,134 ERROR

Re: HBase fails to start - DataXceiver Version Mismatch

2011-01-10 Thread Bill Graham
with the same version from your HDFS install. -Todd On Mon, Jan 10, 2011 at 9:45 PM, Bill Graham billgra...@gmail.com wrote: Hi, Today I upgraded from Hadoop 0.20.1 to CHD3b2 0.20.2 to get the append functionality that HBase requires and now I can't start HBase. Hadoop and HDFS seem

Re: HBase fails to start - DataXceiver Version Mismatch

2011-01-10 Thread Bill Graham
, Bill Graham billgra...@gmail.com wrote: Thanks for the quick reply Todd. I did that before I first tried starting HBase, but I'm still seeing the issues. Any other suggestions? On Mon, Jan 10, 2011 at 10:00 PM, Todd Lipcon t...@cloudera.com wrote: Hi Bill, You simply need to replace

Re: Scheduling map/reduce jobs

2011-01-05 Thread Bill Graham
Take a look at Oozie or Azkaban: http://www.quora.com/What-are-the-differences-advantages-disadvantages-of-Azkaban-vs-Oozie On Wed, Jan 5, 2011 at 9:35 AM, Peter Veentjer alarmnum...@gmail.com wrote: Hey guys, although it isn't completely related to HBase: is there support for scheduling map

Re: provide a 0.20-append tarball?

2010-12-21 Thread Bill Graham
Hi Andrew, Just to make sure I'm clear, are you saying that HBase 0.90.0 is incompatible with CDH3b3 due to the security changes? We're just getting going with HBase and have been running 0.90.0rc1 on an unpatched version of Hadoop in dev. We were planning on upgrading to CDH3b3 to get the sync