Re: Call for Presentations - HBase User group meeting

2014-11-12 Thread Ryan Rawson
even present a few of your own! -ryan On Mon, Nov 10, 2014 at 2:58 PM, Ryan Rawson ryano...@gmail.com wrote: Hi all, The next HBase user group meeting is on November the 20th. We need a few more presenters still! Please send me your proposals - summary and outline of your talk! Thanks

Call for Presentations - HBase User group meeting

2014-11-10 Thread Ryan Rawson
Hi all, The next HBase user group meeting is on November the 20th. We need a few more presenters still! Please send me your proposals - summary and outline of your talk! Thanks! -ryan

Re: Silly question... Coprocessor writes to ZK?

2014-09-05 Thread Ryan Rawson
I guess my thought is that it'd be nice to minimize dependency on ZK, and eventually remove it altogether. It just adds too much deployment complexity, and code complexity -- about 1 lines of code. I do like the notion of HBase self-hosting its own performance data, it's what Oracle and

HBase won't run on OSX 10.8

2012-07-31 Thread Ryan Rawson
Hi all, Something has changed in how OSX and java handles IPv6, and now you will get a log like: 2012-07-31 18:21:39,824 INFO org.apache.hadoop.hbase.master.HMaster: Server active/primary master; 0:0:0:0:0:0:0:0%0, 59736,1343784093521, sessionid=0x138dfc60416, cluster-up flag was=false

Re: HBase won't run on OSX 10.8

2012-07-31 Thread Ryan Rawson
I shall try that. I submitted a patch too that quashes the extra % where it is causing problems. On Tue, Jul 31, 2012 at 6:28 PM, Andrew Purtell apurt...@apache.org wrote: -Djava.net.preferIPv4Stack=true ? Does that still work? On Tue, Jul 31, 2012 at 6:24 PM, Ryan Rawson ryano

Re: speeding up rowcount

2011-10-09 Thread Ryan Rawson
Are you sure the job is running on the cluster and not running in single node mode? This happens a lot... On Oct 9, 2011 7:50 AM, Rita rmorgan...@gmail.com wrote: Hi, I have been doing a rowcount via mapreduce and its taking about 4-5 hours to count a 500million rows in a table. I was

Re: [announce] Accord: A high-performance coordination service for write-intensive workloads

2011-09-23 Thread Ryan Rawson
Did you guys run HBase with accord and see improved performance? What other hooks can you tell us that would be worth the immense task of learning the ins and outs of a new distributed system? Performance is great, but you can hack around that, and HBase is not a heavy user of ZK. -ryan On Fri,

Re: HBase with Castle - faster HBase?

2011-09-05 Thread Ryan Rawson
I saw the Acunu guy at OSCON Data, and from what I could tell, they completely rewrote Cassandra to get out of java land... On Sep 5, 2011 1:50 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, Has anyone done any work towards making HBase work with Castle? (looks like negative:

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Ryan Rawson
The HDFS write pipeline is synchronous, so there is no window. On Aug 30, 2011 4:35 AM, Sam Seigal selek...@yahoo.com wrote: A question inline: On Tue, Aug 30, 2011 at 2:47 AM, Andrew Purtell apurt...@apache.org wrote: Hi Chris, Appreciate your answer on the post. Personally speaking

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Ryan Rawson
I really like the theory of operation stuff. People say that centralized operation is a flaw, but I say it's a strength. In a single datacenter, you have extremely fast ping (.1 ms or less); there is no need for a fully decentralized architecture - it can be really hard to debug. -ryan On Tue, Aug

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Ryan Rawson
While data is not fsynced to disk immediately, it is acked by 3 different nodes (Assuming r=3) before HBase acks the client. -ryan On Tue, Aug 30, 2011 at 1:04 PM, Joseph Boyd joseph.b...@cbsinteractive.com wrote: On Tue, Aug 30, 2011 at 12:22 PM, Sam Seigal selek...@yahoo.com wrote: Will the

Re: HBase and Cassandra on StackOverflow

2011-08-30 Thread Ryan Rawson
On Tue, Aug 30, 2011 at 10:42 AM, Joe Pallas joseph.pal...@oracle.com wrote: On Aug 30, 2011, at 2:47 AM, Andrew Purtell wrote: Better to focus on improving HBase than play whack a mole. Absolutely.  So let's talk about improving HBase.  I'm speaking here as someone who has been learning

Re: Avatar namenode?

2011-08-18 Thread Ryan Rawson
There are a few problems for Avatar node which would prevent me from ever using it: - assumption of highly available NFS, this would typically mean specialized hardware - failover time is potentially lengthy (article says 60 seconds), and HBase regionservers might fail It's an interesting hack,

Re: About puppet and fabric (WAS: operational overhead for HBase)

2011-08-17 Thread Ryan Rawson
I think my assessment would be that everyone has their pre chosen toolset and goes with it. You can make any of them work (with enough effort). Personally, we are using chef. They are building service orchestration, which few toolsets support. On Aug 17, 2011 1:42 PM, Alex Holmes

Re: Finding the trace of a query

2011-08-11 Thread Ryan Rawson
Why not just read the source code? It isn't that many LOC, and it doesn't really use anything that obscures the call chain, few interfaces, etc. A solid IDE with code inspection will make short work of it, just go at it! Start at HRegionServer - it has the top level RPC calls that are made.

Re: Mongo vs HBase

2011-08-10 Thread Ryan Rawson
Mongodb does an excellent job at single node scalability - they use mmap and many smart things and really kick ass ... ON A SINGLE NODE. That single node must have RAID (RAID is going out of fashion btw), and you won't be able to scale without resorting to: - replication (complex setup!) -

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread Ryan Rawson
The IO fencing was an accidental byproduct of how HDFS-200 was implemented, so in fact, HBase won't run correctly on HDFS-265 which does NOT have that IO fencing, right? On Fri, Aug 5, 2011 at 9:42 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Fri, Aug 5, 2011 at 8:52 AM, M. C. Srivas

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Ryan Rawson
Another possibility is the logs were not replayed correctly during the region startup. We put in a lot of tests to cover this case, so it should not be so. Essentially the WAL replay looks at the current HFiles state, then decides which log entries to replay or skip. This is because a log might

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Ryan Rawson
? The tables are very small and inactive (probably only 50-100 rows changing per day). Thanks, Jacques On Thu, Aug 4, 2011 at 9:09 AM, Ryan Rawson ryano...@gmail.com wrote: Another possibility is the logs were not replayed correctly during the region startup.  We put in a lot of tests

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Ryan Rawson
Yes, that is what JD is referring to, the so-called IO fence. It works like so: - regionserver is appending to an HLog, continues to do so, hasn't gotten the ZK kill-yourself signal yet - hmaster splits the logs - the hmaster yanks the writer from under the regionserver, and the RS then starts to

Re: So Bad Random Read Performance

2011-07-30 Thread Ryan Rawson
What is the level of concurrency? I find that HDFS performs worse with more concurrent read threads. -ryan 2011/7/30 seven garfee garfee.se...@gmail.com: hi,all I set up a cluster on 4 machines (1 HMaster, 4 RegionServers). Each machine has 16G mem, one 2T SATA disk, CentOS 5.3, XFS.

Re: Monitoring

2011-07-25 Thread Ryan Rawson
But surely for logical consistency, we should not favor one vendor (as we have been for a year now), over another. So would it be correct to continue to suggest to users they use CDH? After all, even though it is ASF2.0 and free, it is still giving one vendor a leg up over others (including

Re: JSR-347

2011-07-13 Thread Ryan Rawson
How do you intend on addressing gc scalability? On Jul 13, 2011 9:20 AM, Pete Muir pm...@redhat.com wrote: Hi, I am looking to round out the EG membership of JSR-347 so that we can get going with discussions. It would be great if someone from the HBase community could join to represent the

Re: client-side caching

2011-07-05 Thread Ryan Rawson
Caching sounds easy until you need to worry about invalidation. It's hard to build efficient and correct invalidation. On Jul 5, 2011 2:13 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: I've seen that. But that's about caching on regionserver-side through memcache. You still have the

Re: Random Reads throughput/performance

2011-06-24 Thread Ryan Rawson
If you are defeating caching you will want to patch in HDFS-347. Good luck! On Fri, Jun 24, 2011 at 3:25 PM, Sateesh Lakkarsu lakka...@gmail.com wrote: block cache was at default 0.2%, the id's being looked up don't repeat and each one has a lot of versions, so not expecting cache hits - also

Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment?

2011-06-17 Thread Ryan Rawson
Watch out - increment is not idempotent, so you will have to somehow ensure that a map runs exactly 1x and never more or less than that. Also job failures will ruin the data as well. -ryan On Fri, Jun 17, 2011 at 1:57 PM, Stack st...@duboce.net wrote: Go for it! St.Ack On Fri, Jun 17, 2011
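The danger Ryan warns about can be shown with a tiny sketch (Python purely for illustration; the store is a stand-in dict, and the helper name is hypothetical): replaying the same increments, as a retried or speculative map task would, double-counts.

```python
def apply_increments(store, increments):
    """Apply (row, delta) counter increments to a dict-backed store."""
    for row, delta in increments:
        store[row] = store.get(row, 0) + delta
    return store

task = [("user1", 1), ("user2", 1)]
store = {}
apply_increments(store, task)
# A retried/speculative task replays the exact same increments:
apply_increments(store, task)
print(store)  # {'user1': 2, 'user2': 2} -- double-counted, not idempotent
```

A Put of an absolute value, by contrast, can be replayed safely, which is why idempotent writes are the usual recommendation for MapReduce outputs.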

Re: feature request (count)

2011-06-03 Thread Ryan Rawson
This is a commonly requested feature, and it remains unimplemented because it is actually quite hard. Each HFile knows how many KV entries there are in it, but this does not map in a general way to the number of rows, or the number of rows with a specific column. Keeping track of the row count as

Re: feature request (count)

2011-06-03 Thread Ryan Rawson
that require counts. -Jack On Fri, Jun 3, 2011 at 3:24 PM, Ryan Rawson ryano...@gmail.com wrote: This is a commonly requested feature, and it remains unimplemented because it is actually quite hard.  Each HFile knows how many KV entries there are in it, but this does not map in a general way

Re: Starting Hadoop/HBase cluster on Rackspace

2011-05-31 Thread Ryan Rawson
Rackspace doesn't have an API, so no. This is one of the primary disadvantages of rackspace, it's all hands-on/manual. Just boot up your instances and use the standard management tools. On Tue, May 31, 2011 at 10:23 AM, Something Something mailinglist...@gmail.com wrote: Hello, Are there

Re: Very slow Scan performance using Filters

2011-05-12 Thread Ryan Rawson
Gets' from the UI instead of Scans'. Thanks Himanish On Thu, May 12, 2011 at 2:21 AM, Ryan Rawson ryano...@gmail.com wrote: Scans are in serial. To use DB parlance, consider a Scan + filter the moral equivalent of a SELECT * FROM WHERE col='val' with no index, and a full table scan

Re: Connecting JPA with HBase

2011-04-14 Thread Ryan Rawson
sorry, this doesn't look like an actual HBase issue. You should also be using 0.90.2 -ryan On Wed, Apr 13, 2011 at 11:11 PM, James Ram hbas...@gmail.com wrote: Hi, I decided to go ahead with the JPA - HBase route. I tried to install the hbase jar using maven but it is throwing the following

Re: Row key stored many times?

2011-04-14 Thread Ryan Rawson
Yes, the row key is stored with every column. Avoid ridiculously long row keys :-) Use compression. On Thu, Apr 14, 2011 at 1:54 PM, Yves Langisch y...@langisch.ch wrote: Hi, On the opentsdb website [1] you can read the following: --- The problem with HBase's implementation is that every
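Since each cell physically repeats the full row key, the per-row key overhead grows with both key length and column count. A back-of-the-envelope sketch (illustrative helper, not HBase code):

```python
def row_key_overhead(key_len: int, num_columns: int) -> int:
    """Bytes spent storing one row's key across all of its cells,
    since each KeyValue repeats the full row key."""
    return key_len * num_columns

# A 100-byte row key with 50 columns stores the key 50 times:
print(row_key_overhead(100, 50))  # 5000 bytes of repeated key material
```

This is why short row keys and block compression (which squeezes out exactly this kind of repetition) are both recommended in the reply.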

Re: Row key stored many times?

2011-04-14 Thread Ryan Rawson
Good question, I'd try to keep most row keys under 30 bytes, and definitely avoid 1000 bytes. On Thu, Apr 14, 2011 at 2:22 PM, David Schnepper dave...@yahoo-inc.com wrote: On 14/Apr/2011 13:55, Ryan Rawson wrote: Yes, the row key is stored with every column. Avoid ridiculously long row keys

Re: HBase is not ready for Primetime

2011-04-13 Thread Ryan Rawson
To bring it back to the original point and a high level view, the fact is that HBase is not Oracle, nor MySQL. It doesn't have multiple decades, and furthermore distributed systems are inherently more difficult (more failure cases) than single node DBs. Having said that, the grass is certainly not

Re: cpu profiling

2011-04-10 Thread Ryan Rawson
I enjoy yourkit, it's pretty tight, and easier to set up (imho) than jprofiler. you can of course do http://poormansprofiler.org/ -ryan On Sun, Apr 10, 2011 at 9:08 PM, Jack Levin magn...@gmail.com wrote: Hi all, what is the best way to profile CPU on Region Server JVM? Does anyone have any

Re: Is there a setting to cap row size?

2011-04-07 Thread Ryan Rawson
Sounds like you are having a HDFS related problem. Check those datanode logs for errors. As for a setting for max row size, this might not be so easy to do, since during the Put time we don't actually know anything about the existing row data. To find that out we'd have to go and read the row

Re: File formats in Hadoop

2011-03-22 Thread Ryan Rawson
Curious, why do you mention SequenceFile and TFile? Neither of those is in hbase.io, and TFile is not used anywhere in HBase. -ryan On Sat, Mar 19, 2011 at 9:01 AM, Weishung Chung weish...@gmail.com wrote: I am browsing through the hadoop.io package and was wondering what other

Re: problem bringing Hbase back up after power outage and removal of nodes

2011-03-17 Thread Ryan Rawson
If you are in safe mode it's because not all datanodes have reported in. So actually NO your hadoop did NOT come up properly. Check your nn pages, look for any missing nodes. It won't help you any more than telling you what is online or not. Good luck! -ryan On Thu, Mar 17, 2011 at 11:12 AM,

Re: Error after upgrading to 0.90.1, java.net.URISyntaxException

2011-03-17 Thread Ryan Rawson
If you know you had a clean shutdown, just nuke all directories in /hbase/.logs. We hit this @ SU as well; it's older logfile formats messing us up. Remember, only if you had a CLEAN shutdown, or else you lose data. On Thu, Mar 17, 2011 at 4:20 PM, Chris Tarnas c...@email.com wrote: I just

Re: which hadoop and zookeeper version should I use with hbase 0.90.1

2011-03-16 Thread Ryan Rawson
Thats the correct branch, so you should be good! On Wed, Mar 16, 2011 at 1:17 PM, Oleg Ruchovets oruchov...@gmail.com wrote: I get the src from here. http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append/ On Wed, Mar 16, 2011 at 7:40 PM, Stack st...@duboce.net wrote:

Re: Data is always written to one node

2011-03-14 Thread Ryan Rawson
What version of HBase are you testing? Is it literally 0 vs N assignments? On Mon, Mar 14, 2011 at 1:18 PM, Weiwei Xiong xion...@gmail.com wrote: Thanks! I checked the master log and found some info like this: timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster: balance hri=***,

Re: Data is always written to one node

2011-03-14 Thread Ryan Rawson
data rebalancing? I guess HBase should also support data rebalancing otherwise every time I restart HBase the regions will have to be rebalanced again. Will someone tell me how to configure or program HBase to do data rebalancing? Thanks, -- Weiwei On Mon, Mar 14, 2011 at 2:43 PM, Ryan Rawson

Re: Data is always written to one node

2011-03-14 Thread Ryan Rawson
this? If it's automatic, how frequently is it performed? I am running 1 replication. Thanks, -- Weiwei On Mon, Mar 14, 2011 at 3:18 PM, Ryan Rawson ryano...@gmail.com wrote: HDFS does the data rebalancing, over time as major compactions and new data comes in, files are written first

Re: major hdfs issues

2011-03-10 Thread Ryan Rawson
Looks like a datanode went down. InterruptedException is how Java interrupts IO in threads; it's similar to the EINTR errno. That means the actual source of the abort is higher up... So back to how InterruptedException works... at some point a thread in the JVM decides that the VM should

Re: Get Question

2011-03-10 Thread Ryan Rawson
Depends on how well cached you are. Remember, random gets require disk seeks. 239 gets/sec is 239 * 1-3 seeks/sec (1-3 store files per get appx). So that seems reasonable yes, sorry. -ryan On Thu, Mar 10, 2011 at 3:55 PM, Peter Haidinyak phaidin...@local.com wrote: For the first time I am
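The arithmetic in the reply: an uncached random get costs roughly one disk seek per store file touched, so get throughput is bounded by the disk's seek capacity. A sketch of that estimate (illustrative helper name):

```python
def seeks_per_second(gets_per_second: int, store_files_per_get: int) -> int:
    """Random gets that miss the cache cost ~1 disk seek per store file."""
    return gets_per_second * store_files_per_get

# 239 gets/sec against 1 to 3 store files per get:
print(seeks_per_second(239, 1), seeks_per_second(239, 3))  # 239 717
```

A single spinning disk manages on the order of 100-200 seeks/sec, so a small cluster landing in this range is plausible, which is the point of the reply.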

Re: creating multiple columns inside a column

2011-03-09 Thread Ryan Rawson
Better not to use 100 column families...perf might be strange and not optimal. Also you can encode complex data structures inside a column, using for example json, thrift, etc. sup yes basically. But hbase won't help you much there. On Mar 9, 2011 10:06 PM, James Ram hbas...@gmail.com wrote: Hi,

Re: Blob storage

2011-03-08 Thread Ryan Rawson
Probably the soft limit flushes, eh? On Mar 8, 2011 11:15 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Tue, Mar 8, 2011 at 11:04 AM, Chris Tarnas c...@email.com wrote: Just as a point of reference, in one of our systems we have 500+million rows that have a cell in its own column family

Re: Ryan Rawson wants to chat

2011-03-08 Thread Ryan Rawson
Ignore this all, gtalk/gmail has taken a turn for the dumb. -ryan On Tue, Mar 8, 2011 at 12:35 PM, Ryan Rawson ryano...@gmail.com wrote: --- Ryan Rawson wants to stay in better touch using some of Google's coolest new

Re: HFile output - hfile.AVG_KEY_LEN always zero?

2011-03-08 Thread Ryan Rawson
Ascii table tells me bang = 33 http://www.asciitable.com/ so the average key len is 33. :-) -ryan On Tue, Mar 8, 2011 at 1:00 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, I thought I could use the handy HFile from command line to give me some stats about a given region or

Re: filters performance

2011-03-08 Thread Ryan Rawson
Filters only reduce the amount of data returned to the client over the wire; they do NOT reduce how much data we must read from disk. But the former savings can be substantial depending on the amount pruned out. On Tue, Mar 8, 2011 at 5:38 PM, large data lrgd...@gmail.com wrote: If I use
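The distinction (filters prune what crosses the wire, not what is read from disk) can be sketched with an in-memory stand-in for a region scan (Python for illustration; not the actual HBase code path):

```python
def scan_with_filter(rows, predicate):
    """Simulate a server-side filtered scan: every row is still read
    (disk cost), but only matching rows are returned (wire cost)."""
    rows_read = 0
    returned = []
    for row in rows:
        rows_read += 1            # the server reads each row regardless
        if predicate(row):
            returned.append(row)  # only these travel back to the client
    return rows_read, returned

table = [{"col": "val"}, {"col": "other"}, {"col": "val"}]
rows_read, result = scan_with_filter(table, lambda r: r["col"] == "val")
print(rows_read, len(result))  # 3 rows read from disk, 2 sent over the wire
```

Without an index, the read cost stays proportional to the table size no matter how selective the filter is.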

Re: will HBase detect NN failure?

2011-03-07 Thread Ryan Rawson
There are a series of patches that address this, check the recent commit history of append branch. On Mar 7, 2011 1:52 AM, Gokulakannan M gok...@huawei.com wrote: Hi All, In HBase 0.90 I have seen that it has a fault tolerant behavior of triggering lease recovery and closing the file when

Re: LZO Compression

2011-03-07 Thread Ryan Rawson
Just copy in the hadoop-gpl-compression*.jar and the native/Linux*/libgplcompression* to every node and you can use 'LZO' compression type in your tables without doing anything else. -ryan On Mon, Mar 7, 2011 at 3:18 PM, Peter Haidinyak phaidin...@local.com wrote: Hi,        When you are

Re: 0.90.1 hbase-default.xml

2011-03-07 Thread Ryan Rawson
This message is reliable, you should worry. The docs talk about this: http://hbase.apache.org/book/notsoquick.html Basically you need to do exactly what that message says. At SU we personally run CDH3b2. I know CDH3 is at a higher beta now, you can give CDH3b4 a shot, it also contains the same

Re: LZO Compression

2011-03-07 Thread Ryan Rawson
been told that I should build the jars and native libraries and then deploy these to the cluster. -Pete -Original Message- From: Ryan Rawson [mailto:ryano...@gmail.com] Sent: Monday, March 07, 2011 3:23 PM To: user@hbase.apache.org Cc: Peter Haidinyak Subject: Re: LZO Compression

Re: stop-hbase.sh bug or feature?

2011-03-03 Thread Ryan Rawson
with multiple masters, shutting down a master should NOT cause a cluster death! I ran into this once; it sucked. I have previously commented, I thought we had removed the 'master exit = cluster death' but I'm not sure. -ryan On Thu, Mar 3, 2011 at 4:14 PM, Ted Dunning tdunn...@maprtech.com

Re: stop-hbase.sh bug or feature?

2011-03-02 Thread Ryan Rawson
Mis-feature: basically a master will tell the regionservers to 'shutdown and flush gracefully' via RPC. Since we don't ship with any cluster management tools - to make your life easier we have a 'master tells RS to shutdown' path. I wouldn't be against removing it and relying on regular process

Re: HBase Prompt missing

2011-03-02 Thread Ryan Rawson
Also there will not be a 0.20.7, so you'll never get bug fixes. 0.90.1 is the way to go. On Mar 2, 2011 9:40 PM, Ted Dunning tdunn...@maprtech.com wrote: 0.20.6 is stable, but I warrant that 0.90 is the better choice by a good margin. On Wed, Mar 2, 2011 at 9:36 PM, James Ram

Re: Increment fails on simple test

2011-02-25 Thread Ryan Rawson
Increment expects a long as per returned by Bytes.toBytes(long), ie: 8 bytes, big endian. you put '1', array length 1. when increment finds no value, it assumes '0'. if you want 0-based counting don't put an initial value. On Fri, Feb 25, 2011 at 12:23 AM, Sandesh Devaraju
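The byte layout being described can be illustrated with a small sketch (Python here purely to show the bytes; HBase's Bytes.toBytes(long) is Java, and the helper name is hypothetical):

```python
import struct

def encode_counter(value: int) -> bytes:
    """Encode a counter the way HBase's Bytes.toBytes(long) does:
    8 bytes, big-endian, signed."""
    return struct.pack(">q", value)

correct = encode_counter(1)  # the 8-byte value Increment expects to find
wrong = b"1"                 # a one-byte ASCII '1', as in the failing test

print(len(correct))  # 8 bytes, big-endian
print(len(wrong))    # 1 byte -- Increment cannot interpret this as a long
```

Seeding the cell with the one-byte string '1' is what makes the test fail; leaving the cell absent lets Increment start from 0 as intended.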

Re: How does compression really work in hbase?

2011-02-24 Thread Ryan Rawson
HFile is our format, and compression is done at a block-by-block level, each block around 64k pre-compression. The file is pretty clear and easy to read, check it out! -ryan On Thu, Feb 24, 2011 at 10:13 PM, Hari Sreekumar hsreeku...@clickable.com wrote: Does it compress only the key and

Re: Trying to contact region Some region

2011-02-23 Thread Ryan Rawson
We fixed a lot of the exception handling in 0.90. The exception text is much better. Check it out! -ryan On Wed, Feb 23, 2011 at 11:18 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: It could be due to slow splits, heavy GC, etc. Make sure your machines don't swap at all, that HBase has

Re: table creation is failing now and then (CDH3b3)

2011-02-23 Thread Ryan Rawson
You should consider upgrading to hbase 0.90.1, a lot of these kinds of issues were fixed. -ryan On Wed, Feb 23, 2011 at 12:02 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Hi all, from time to time we come to a sitation where .META. table seems to be stuck in some corrupted state. In

Re: when does put return to the caller?

2011-02-23 Thread Ryan Rawson
There is a batch put call, should be trivial to use some kind of background thread to invoke callbacks when it returns. Check out the HTable API, javadoc, etc. All available via http://hbase.org ! -ryan On Wed, Feb 23, 2011 at 1:25 PM, Hiller, Dean (Contractor) dean.hil...@broadridge.com

Re: async table updates?

2011-02-23 Thread Ryan Rawson
In thrift there is a 'oneway' or 'async' or 'fire and forget' call type. I can't recommend those kinds of approaches, since once your system runs into problems you have no feedback. So if you are asking for a one shot, no reply, assume it worked call, we don't have one (nor would I wish that hell

Re: I can't get many versions of the specified column,but only get the latest version of the specified column

2011-02-23 Thread Ryan Rawson
There are test cases for this, the functionality DOES work, something is up... Without full code and full descriptions of your tables, debugging is harder than it needs to be. It's probably a simple typo or something, check your code and table descriptions again. Many people rely on the multi

Re: Number of regions

2011-02-23 Thread Ryan Rawson
There have been threads about this lately, check out the search box on hbase.org which searches the list archives. On Feb 23, 2011 6:56 PM, Nanheng Wu nanhen...@gmail.com wrote: What are some of the trade-offs of using larger region files and less regions vs the other way round? Currently each

Re: I can't get many versions of the specified column,but only get the latest version of the specified column

2011-02-23 Thread Ryan Rawson
(?) codes when necessary. You might not want to specify the timestamp by yourself but want to let HBase to store appropriate ones. -- Tatsuya Kawano (Mr.) Tokyo, Japan On Feb 24, 2011, at 11:30 AM, Ryan Rawson ryano...@gmail.com wrote: There are test cases for this, the functionality

Re: I can't get many versions of the specified column,but only get the latest version of the specified column

2011-02-23 Thread Ryan Rawson
, final List&lt;KeyValue&gt; list = r.list(); r is null! 2011/2/24 Ryan Rawson ryano...@gmail.com Which line is line 89? Also it's preferable to do: assertEquals(3, versionMap.size()); vs: assertTrue(versionMap.size() == 3); since the error messages from the former are more descriptive: expected 3

Re: Scanning over key values timestamp?

2011-02-18 Thread Ryan Rawson
There is minimal/no underlying efficiency. It's basically a full table/region scan with a filter to discard the uninteresting values. We have various timestamp filtering techniques to avoid reading from files, eg: if you specify a time range [100,200) and a hfile only contains [0,50) we'll not
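The pruning decision described (skip an HFile whose timestamp range cannot intersect the scan's half-open [start, end) range) is an interval-overlap test. A sketch, not the actual HBase code:

```python
def hfile_may_contain(scan_start: int, scan_end: int,
                      file_min_ts: int, file_max_ts: int) -> bool:
    """True if an HFile whose cells span [file_min_ts, file_max_ts]
    could hold any cell in the half-open scan range [scan_start, scan_end)."""
    return file_min_ts < scan_end and file_max_ts >= scan_start

# The email's example: scan [100, 200), file holding only [0, 50)
# (i.e. max timestamp 49) -- the file is skipped entirely:
print(hfile_may_contain(100, 200, 0, 49))    # False -> file not read
print(hfile_may_contain(100, 200, 150, 300)) # True  -> file must be read
```

This is file-level pruning only; within files that do overlap, the scan still walks the rows and discards cells outside the range.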

Re: using VERSIONS vs unique row key

2011-02-16 Thread Ryan Rawson
Please check the archives, there have been some threads about this recently. On Feb 16, 2011 9:51 AM, Venkatesh vramanatha...@aol.com wrote: If I have to store multiple events (time-based) for multiple users, - either I could create a unique row key for every event (or) - use user id as the

Re: Major compactions and OS cache

2011-02-16 Thread Ryan Rawson
, 2011 at 11:48 AM, Ryan Rawson ryano...@gmail.com wrote: That would be cool, I think we should probably also push for HDFS-347 while we are at it as well. The situation for HDFS improvements has not been good, but might improve in the mid-future. Thanks for the pointer! -ryan On Wed, Feb 16

Re: Major compactions and OS cache

2011-02-16 Thread Ryan Rawson
is a huge clear win but still no plans to include it in any hadoop version. Why's that?  It seems to be fairly logical.  Does it affect the 'over-the-wire' protocol? On Wed, Feb 16, 2011 at 6:23 PM, Ryan Rawson ryano...@gmail.com wrote: There is a patch that causes us to evict the block

Re: Put errors via thrift

2011-02-15 Thread Ryan Rawson
If you were using 0.90, that unhelpful error message would be much more helpful! On Tue, Feb 15, 2011 at 9:56 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: Compactions are done in the background, they won't block writes. Regarding splitting time, it could be that it had to retry a bunch of

Re: Put errors via thrift

2011-02-15 Thread Ryan Rawson
: We are running cdh3b3 - so next week when they go to b4 we'll be up to 0.90 - I'm looking forward to it. -chris On Feb 15, 2011, at 11:05 AM, Ryan Rawson wrote: If you were using 0.90, that unhelpful error message would be much more helpful! On Tue, Feb 15, 2011 at 9:56 AM, Jean-Daniel

Re: Using the Hadoop bundled in the lib directory of HBase

2011-02-13 Thread Ryan Rawson
-append-r1056497.jar contains org/apache/hadoop/hdfs/server/datanode/BlockChannel.class but I am having trouble figuring out why.  From where in SVN does that come? Is it not in the append-20-branch ? Thanks, Mike Spreitzer From:   Ryan Rawson ryano...@gmail.com To:     user@hbase.apache.org

Re: Designing table with auto increment key

2011-02-13 Thread Ryan Rawson
you can also stripe, eg: c_1 starts at 1, skip=100 c_2 starts at 2, skip=100 c_$i starts at $i, skip=100 for 3..99 now you have 100x speed/parallelism. If single regionserver assignment becomes a problem, use multiple tables. On Sun, Feb 13, 2011 at 10:12 PM, Lars George lars.geo...@gmail.com
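The striping scheme above can be sketched as follows (hypothetical helper name; each of the 100 counters hands out a disjoint arithmetic progression, so clients spread their increments across 100 rows instead of contending on one):

```python
def striped_ids(stripe: int, num_stripes: int = 100):
    """Generate the ID sequence owned by one stripe: stripe, stripe + 100,
    stripe + 200, ... -- disjoint from every other stripe's sequence."""
    value = stripe
    while True:
        yield value
        value += num_stripes

# Stripe 1 hands out 1, 101, 201, ...; stripe 2 hands out 2, 102, 202, ...
gen = striped_ids(1)
first_three = [next(gen) for _ in range(3)]
print(first_three)  # [1, 101, 201]
```

The union of all stripes covers every ID exactly once; the trade-off is that IDs are no longer globally ordered by allocation time.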

Re: Using the Hadoop bundled in the lib directory of HBase

2011-02-12 Thread Ryan Rawson
If you are taking the jar that we ship and slamming it in a hadoop 0.20.2 based distro that might work. I'm not sure if there are any differences than pure code (which would then be expressed in the jar only), so this approach might work. You could also check out to the revision that we built

Re: hbase version

2011-02-11 Thread Ryan Rawson
You should run hadoop-20-append or cdh3 and run hbase 0.90.1 which is set to be released next week. -ryan On Fri, Feb 11, 2011 at 8:12 AM, Joseph Coleman joe.cole...@infinitecampus.com wrote: Hello if I am going to run hadoop 0.20.2 what version should I use for HBase that is compatible?

Re: LZO Compression

2011-02-11 Thread Ryan Rawson
the .so has to be the same machine arch as your java binary. meaning if you are using 64bit java your lib should also be 64 bit. -ryan On Fri, Feb 11, 2011 at 11:00 AM, Peter Haidinyak phaidin...@local.com wrote: HBase version: 0.89.20100924+28 Hadoop version: 0.20.2+737 Howdy,   My boss

Re: Need to have hbase-site.xml in hadoop conf dir?

2011-02-11 Thread Ryan Rawson
we include $HBASE_HOME/conf on the HADOOP_CLASSPATH in hadoop-env.sh. It goes like this: export HBASE_HOME=/home/hadoop/hbase JAR=`ls $HBASE_HOME/*.jar` export HBASE_JAR=$JAR # Extra Java CLASSPATH elements. Optional. export

Re: RE: getSplits question

2011-02-10 Thread Ryan Rawson
Message- From: Ryan Rawson [mailto:ryano...@gmail.com] Sent: Wednesday, February 09, 2011 11:43 PM To: user@hbase.apache.org Cc: hbase-u...@hadoop.apache.org Subject: Re: getSplits question You shouldn't need to write your own getSplits() method to run a map reduce, I never did

Re: question about org.apache.hadoop.hbase.util.Merge

2011-02-10 Thread Ryan Rawson
Since the Merge tool works on an offline cluster, it goes straight to the META HFiles, thus cannot be run in parallel. It shouldn't be too hard to hack up Merge to work on an online cluster, offline table. On Thu, Feb 10, 2011 at 10:09 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: I think

Re: Using the Hadoop bundled in the lib directory of HBase

2011-02-10 Thread Ryan Rawson
Hey guys, If you are running on hadoop 0.20.2 you are going to lose data when you crash. So don't do it :-) You will need to either use a cdh3 beta (we use b2), or build the hadoop-20-append branch. We have built the hadoop-20-append tip and included the JAR with the default distribution. It

Re: Parent/child relation - go vertical, horizontal, or many tables?

2011-02-10 Thread Ryan Rawson
You want to choose the schema that minimizes the # of RPCs you are doing. -ryan On Thu, Feb 10, 2011 at 4:55 PM, Jason urg...@gmail.com wrote: Hi all, Let's say I have two entities Parent and Child. There could be many children in one parent (from hundreds to tens of millions) A child can

Re: working jgit+hbase and reasonable test result

2011-02-09 Thread Ryan Rawson
Well done! Perhaps you can sell this to Google and they can finally kill the svn googlecode feature! Or maybe hit up github :-) -ryan On Wed, Feb 9, 2011 at 11:06 AM, Andrew Purtell apurt...@apache.org wrote: See https://github.com/trendmicro/jgit-hbase Use branch 'jgit.storage.hbase.v4'

Re: getSplits question

2011-02-09 Thread Ryan Rawson
You shouldn't need to write your own getSplits() method to run a map reduce, I never did at least... -ryan On Wed, Feb 9, 2011 at 11:36 PM, Geoff Hendrey ghend...@decarta.com wrote: Are endrows inclusive or exclusive? The docs say exclusive, but then the question arises as to how to form the

Re: getSplits question

2011-02-09 Thread Ryan Rawson
) and the startrow and endrow, then I thought I had to write my own getSplits(). Is there another way to accomplish this, because I do need the combination of controlled splitsize and start/endrow. -geoff -Original Message- From: Ryan Rawson [mailto:ryano...@gmail.com] Sent: Wednesday

Re: Amazon EC2

2011-02-07 Thread Ryan Rawson
There are other virtualizing environments that offer better perf/$, such as softlayer, rackspace cloud, and more. EC2 is popular... and hence oversubscribed. People complain about IO perf, and while it's not as bad as some people claim, you have to be aware that EC2 isn't some magical land where

Re: Region Servers Crashing during Random Reads

2011-02-04 Thread Ryan Rawson
Under our load at SU, the new gen would grow to max size and take 800+ ms. I would consider setting the ms goal to 20-40ms (what we get in prod now). At 1gb par new I would expect large pauses. Plus in my previous tests the promotion was like 75% even with a huge par new. This is all based on my

Re: is there any tool that facilitate the import of data to hbase

2011-02-03 Thread Ryan Rawson
ImportTSV? http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/ImportTsv.html Also writing a job to read from JDBC and write to hbase isn't too bad if your schema isn't too insanely complex. -ryan On Thu, Feb 3, 2011 at 1:23 PM, Buttler, David buttl...@llnl.gov wrote: Sqoop?
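For reference, a typical ImportTsv invocation looks like the following (the table name, column family, and HDFS path are hypothetical; the first TSV field becomes the row key):

```shell
# Load tab-separated files from HDFS into table 'mytable': field 1 is the
# row key, fields 2 and 3 land in family 'd' as columns col1 and col2.
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,d:col1,d:col2 \
  mytable /user/hadoop/input
```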

Re: HBase as a backend for GUI app?

2011-02-03 Thread Ryan Rawson
I think the answer is 'it depends'. What exactly is a GUI app anyways these days? The wording is a little vague to me, does that include things like amazon.com and google reader? Or is it limited to things like Firefox, and desktop applications? I think ultimately the only thing that is a must

Re: HBase as a backend for GUI app?

2011-02-03 Thread Ryan Rawson
? Thanks again for your time. On Thu, Feb 3, 2011 at 2:53 PM, Ryan Rawson ryano...@gmail.com wrote: I think the answer is 'it depends'. What exactly is a GUI app anyways these days? The wording is a little vague to me, does that include things like amazon.com and google reader

Re: Why Random Reads are much slower than the Writes

2011-02-03 Thread Ryan Rawson
Sequential writes are always faster than random reads on disk. You want caching. Lots of it :) On Feb 3, 2011 10:24 PM, charan kumar charan.ku...@gmail.com wrote: Hello, I am using HBase 0.90.0 with hadoop-append. on a 30 m/c cluster (1950, 2 CPU, 6 G). Writes peak at 5000 per second. But

Re: Tables rows disappear

2011-02-02 Thread Ryan Rawson
I'm guessing that you aren't having as clean a shutdown as you might think if you are seeing tables disappear. Here is a quick way to tell: if you think table 'x' should exist, but it doesn't seem to, do this: bin/hadoop fs -ls /hbase/x if that directory exists, I think you might be running

Re: Accessing column information via Thrift/PHP/Scanner

2011-02-01 Thread Ryan Rawson
when you scan using the shell what do you see? Note that qualifier names are just byte[] and thus case-sensitive. -ryan On Tue, Feb 1, 2011 at 6:38 AM, Stuart Scott stuart.sc...@e-mis.com wrote: Hi, Wonder if anyone could offer any advice please? I've been working on this for a few hours
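Because qualifiers are raw bytes, "Name" and "name" address two distinct columns. A minimal pure-Java sketch of that byte-for-byte comparison (plain `getBytes()` standing in for HBase's `Bytes.toBytes`; the qualifier names are hypothetical):

```java
import java.util.Arrays;

public class QualifierCase {
    // Qualifiers are compared byte-for-byte, so the match is case-sensitive.
    static boolean sameQualifier(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] upper = "Name".getBytes();
        byte[] lower = "name".getBytes();
        System.out.println(sameQualifier(upper, lower)); // false: distinct columns
    }
}
```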

Re: Hadoop setup question.

2011-02-01 Thread Ryan Rawson
We have Dell 1950s, I didn't do the setup, but from what I recall... basically you have no choice but to use the RAID controller. Think of it as a super advanced SATA controller instead. But the Dell 1950 RAID card did NOT support JBOD from what I recall. You can RAID-0 it (stripe only), and

Re: How does checkAndPut guarantee atomicity?

2011-01-31 Thread Ryan Rawson
Hi, Good catch. While the API does let you specify two different row keys, one in the 'put' and one in the call, doing so would be ... not advised. Right now there is no check for this, and if you were to pass two different rows, things would not be so good. Here is an issue:
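The reason both row keys must match is that checkAndPut is a compare-and-set: the comparison of the current cell value and the write happen as one atomic step on the region server holding that row. A minimal in-memory sketch of that contract (not HBase code, just the semantics, with hypothetical values):

```java
public class CheckAndSet {
    private String cell; // in-memory stand-in for a single cell's value

    CheckAndSet(String initial) {
        cell = initial;
    }

    // Mirrors checkAndPut's contract: within one atomic step, compare the
    // current value to `expected` and apply `update` only on a match,
    // returning whether the write was applied.
    synchronized boolean checkAndSet(String expected, String update) {
        if (!cell.equals(expected)) return false;
        cell = update;
        return true;
    }

    synchronized String get() {
        return cell;
    }

    public static void main(String[] args) {
        CheckAndSet c = new CheckAndSet("v1");
        System.out.println(c.checkAndSet("v1", "v2")); // true: expectation held
        System.out.println(c.checkAndSet("v1", "v3")); // false: value is now v2
    }
}
```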

Re: Open Scanner Latency

2011-01-31 Thread Ryan Rawson
Hey, The region location cache is held by a soft reference, so as long as you don't have memory pressure, it will never get invalidated just because of time. Another thing to consider: in HBase, the open scanner code also seeks and reads the first block of the scan. This may incur a read to disk
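A soft reference means the cached entry survives indefinitely under normal conditions and is only reclaimed when the JVM is under memory pressure, never merely because time passed. A minimal pure-Java sketch of that mechanism (the cached address is hypothetical):

```java
import java.lang.ref.SoftReference;

public class LocationCache {
    // A soft reference keeps its referent reachable until the JVM needs the
    // memory back; it is never cleared just because of elapsed time.
    static final SoftReference<String> cachedLocation =
        new SoftReference<>("regionserver-42:60020"); // hypothetical address

    public static void main(String[] args) {
        // Absent memory pressure, the cached value is still present.
        System.out.println(cachedLocation.get());
    }
}
```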

Re: Open Scanner Latency

2011-01-31 Thread Ryan Rawson
between read #1 and read #2 that I can only explain by region location search. Our writes are so heavy I assume this region location information is always flushed in 30-60 minutes. On Mon, Jan 31, 2011 at 4:44 PM, Ryan Rawson ryano...@gmail.com wrote: Hey, The region location cache is held by a soft

Re: Open Scanner Latency

2011-01-31 Thread Ryan Rawson
on all tables. On Mon, Jan 31, 2011 at 4:54 PM, Ryan Rawson ryano...@gmail.com wrote: The Regionserver caches blocks, so a second read would benefit from the caching of the first read. Over time blocks get evicted in an LRU manner, and things would get slow again. Does this make sense to you
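The eviction behavior being described is plain least-recently-used caching. A minimal pure-Java sketch of it (not the block cache implementation, just the policy) using `LinkedHashMap` in access-order mode with hypothetical block names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruSketch {
    // Minimal LRU cache capped at `capacity` entries: access-order mode moves
    // a touched entry to the tail, and removeEldestEntry drops the head (the
    // least recently used entry) when the cap is exceeded.
    static <K, V> Map<K, V> lru(final int capacity) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = lru(2);
        cache.put("block-a", "...");
        cache.put("block-b", "...");
        cache.get("block-a");        // touch a: b becomes least recently used
        cache.put("block-c", "..."); // exceeds capacity, evicts block-b
        System.out.println(cache.keySet()); // [block-a, block-c]
    }
}
```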
