even present a few of
your own!
-ryan
On Mon, Nov 10, 2014 at 2:58 PM, Ryan Rawson ryano...@gmail.com wrote:
Hi all,
The next HBase user group meeting is on November the 20th. We need a
few more presenters still!
Please send me your proposals - summary and outline of your talk!
Thanks
I guess my thought is that it'd be nice to minimize dependency on ZK,
and eventually remove it altogether. It just adds too much
deployment complexity, and code complexity.
I do like the notion of HBase self-hosting its own performance data,
it's what Oracle and
Hi all,
Something has changed in how OS X and Java handle IPv6, and now you
will get a log like:
2012-07-31 18:21:39,824 INFO org.apache.hadoop.hbase.master.HMaster:
Server active/primary master; 0:0:0:0:0:0:0:0%0,
59736,1343784093521, sessionid=0x138dfc60416, cluster-up flag was=false
I shall try that. I submitted a patch too that quashes the extra %
where it is causing problems.
On Tue, Jul 31, 2012 at 6:28 PM, Andrew Purtell apurt...@apache.org wrote:
-Djava.net.preferIPv4Stack=true ?
Does that still work?
On Tue, Jul 31, 2012 at 6:24 PM, Ryan Rawson ryano
Are you sure the job is running on the cluster and not running in single
node mode? This happens a lot...
On Oct 9, 2011 7:50 AM, Rita rmorgan...@gmail.com wrote:
Hi,
I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
to
count 500 million rows in a table. I was
Did you guys run HBase with accord and see improved performance?
What other hooks can you tell us that would be worth the immense
task of learning the ins and outs of a new distributed system?
Performance is great, but you can hack around that, and HBase is not a
heavy user of ZK.
-ryan
On Fri,
I saw the Acunu guy at OSCON Data, and from what I could tell they
completely rewrote Cassandra to get out of Java land...
On Sep 5, 2011 1:50 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
wrote:
Hello,
Has anyone done any work towards making HBase work with Castle? (looks
like negative:
The HDFS write pipeline is synchronous, so there is no window.
On Aug 30, 2011 4:35 AM, Sam Seigal selek...@yahoo.com wrote:
A question inline:
On Tue, Aug 30, 2011 at 2:47 AM, Andrew Purtell apurt...@apache.org
wrote:
Hi Chris,
Appreciate your answer on the post.
Personally speaking
I really like the theory of operation stuff. People say that
centralized operation is a flaw, but I say it's a strength. In a
single datacenter, you have extremely fast pings (0.1 ms or less), so there
is no need for a fully decentralized architecture - which can be really
hard to debug.
-ryan
On Tue, Aug
While data is not fsynced to disk immediately, it is acked by 3
different nodes (assuming r=3) before HBase acks the client.
-ryan
On Tue, Aug 30, 2011 at 1:04 PM, Joseph Boyd
joseph.b...@cbsinteractive.com wrote:
On Tue, Aug 30, 2011 at 12:22 PM, Sam Seigal selek...@yahoo.com wrote:
Will the
On Tue, Aug 30, 2011 at 10:42 AM, Joe Pallas joseph.pal...@oracle.com wrote:
On Aug 30, 2011, at 2:47 AM, Andrew Purtell wrote:
Better to focus on improving HBase than play whack-a-mole.
Absolutely. So let's talk about improving HBase. I'm speaking here as
someone who has been learning
There are a few problems with AvatarNode which would prevent me from
ever using it:
- assumption of highly available NFS, this would typically mean
specialized hardware
- failover time is potentially lengthy (article says 60 seconds), and
HBase regionservers might fail
It's an interesting hack,
I think my assessment would be that everyone has their pre-chosen toolset
and goes with it. You can make any of them work (with enough effort).
Personally, we are using chef. They are building service orchestration,
which few toolsets support.
On Aug 17, 2011 1:42 PM, Alex Holmes
Why not just read the source code? It isn't that many LOC, and it
doesn't really use anything that obscures the call chain: few
interfaces, etc. A solid IDE with code inspection will make short
work of it, just go at it!
Start at HRegionServer - it has the top level RPC calls that are
made.
Mongodb does an excellent job at single node scalability - they use
mmap and many smart things and really kick ass ... ON A SINGLE NODE.
That single node must have RAID (RAID is going out of fashion, btw),
and you won't be able to scale without resorting to:
- replication (complex setup!)
-
The IO fencing was an accidental byproduct of how HDFS-200 was
implemented, so in fact, HBase won't run correctly on HDFS-265 which
does NOT have that IO fencing, right?
On Fri, Aug 5, 2011 at 9:42 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:
On Fri, Aug 5, 2011 at 8:52 AM, M. C. Srivas
Another possibility is the logs were not replayed correctly during the
region startup. We put in a lot of tests to cover this case, so it
should not be so.
Essentially the WAL replay looks at the current HFiles state, then
decides which log entries to replay or skip. This is because a log
might
?
The tables are very small and inactive (probably only 50-100 rows changing
per day).
Thanks,
Jacques
On Thu, Aug 4, 2011 at 9:09 AM, Ryan Rawson ryano...@gmail.com wrote:
Another possibility is the logs were not replayed correctly during the
region startup. We put in a lot of tests
Yes, that is what JD is referring to, the so-called IO fence.
It works like so:
- regionserver is appending to an HLog, continues to do so, hasn't
gotten the ZK 'kill yourself' signal yet
- hmaster splits the logs
- the hmaster yanks the writer from under the regionserver, and the RS
then starts to
What is the level of concurrency? I find that HDFS performs worse
with more concurrent read threads.
-ryan
2011/7/30 seven garfee garfee.se...@gmail.com:
hi,all
I set up a cluster on 4 machines (1 HMaster, 4 RegionServers).
Each machine has 16G mem, one 2T SATA disk, CentOS 5.3, XFS.
But surely for logical consistency, we should not favor one vendor (as
we have been for a year now) over another. So would it be correct to
continue to suggest to users they use CDH? After all, even though it
is ASF2.0 and free, it is still giving one vendor a leg up over others
(including
How do you intend to address GC scalability?
On Jul 13, 2011 9:20 AM, Pete Muir pm...@redhat.com wrote:
Hi,
I am looking to round out the EG membership of JSR-347 so that we can
get going with discussions. It would be great if someone from the HBase
community could join to represent the
Caching sounds easy until you need to worry about invalidation. It's hard to
build efficient and correct invalidation.
On Jul 5, 2011 2:13 AM, Claudio Martella claudio.marte...@tis.bz.it
wrote:
I've seen that. But that's about caching on regionserver-side through
memcache.
You still have the
If you are defeating caching you will want to patch in HDFS-347.
Good luck!
On Fri, Jun 24, 2011 at 3:25 PM, Sateesh Lakkarsu lakka...@gmail.com wrote:
block cache was at the default 0.2 (20% of heap), the IDs being looked up don't repeat and
each one has a lot of versions, so not expecting cache hits - also
Watch out - increment is not idempotent, so you will have to somehow
ensure that a map runs exactly once, never more or less than that.
Also job failures will ruin the data as well.
-ryan
On Fri, Jun 17, 2011 at 1:57 PM, Stack st...@duboce.net wrote:
Go for it!
St.Ack
On Fri, Jun 17, 2011
This is a commonly requested feature, and it remains unimplemented
because it is actually quite hard. Each HFile knows how many KV
entries there are in it, but this does not map in a general way to the
number of rows, or the number of rows with a specific column. Keeping
track of the row count as
that require counts.
-Jack
On Fri, Jun 3, 2011 at 3:24 PM, Ryan Rawson ryano...@gmail.com wrote:
This is a commonly requested feature, and it remains unimplemented
because it is actually quite hard. Each HFile knows how many KV
entries there are in it, but this does not map in a general way
Rackspace doesn't have an API, so no. This is one of the primary
disadvantages of Rackspace: it's all hands-on/manual.
Just boot up your instances and use the standard management tools.
On Tue, May 31, 2011 at 10:23 AM, Something Something
mailinglist...@gmail.com wrote:
Hello,
Are there
'Gets' from the UI instead of 'Scans'.
Thanks
Himanish
On Thu, May 12, 2011 at 2:21 AM, Ryan Rawson ryano...@gmail.com wrote:
Scans are in serial.
To use DB parlance, consider a Scan + filter the moral equivalent of a
SELECT * FROM WHERE col='val' with no index, and a full table
scan
sorry, this doesn't look like an actual HBase issue. You should also
be using 0.90.2
-ryan
On Wed, Apr 13, 2011 at 11:11 PM, James Ram hbas...@gmail.com wrote:
Hi,
I decided to go ahead with the JPA - HBase route. I tried to install the
hbase jar using maven but it is throwing the following
Yes, the row key is stored with every column.
Avoid ridiculously long row keys :-) Use compression.
On Thu, Apr 14, 2011 at 1:54 PM, Yves Langisch y...@langisch.ch wrote:
Hi,
On the opentsdb website [1] you can read the following:
---
The problem with HBase's implementation is that every
Good question, I'd try to keep most row keys under 30 bytes, and
definitely avoid 1000-byte keys.
On Thu, Apr 14, 2011 at 2:22 PM, David Schnepper dave...@yahoo-inc.com wrote:
On 14/Apr/2011 13:55, Ryan Rawson wrote:
Yes, the row key is stored with every column.
Avoid ridiculously long row keys
To bring it back to the original point and a high-level view, the fact
is that HBase is not Oracle, nor MySQL. It doesn't have multiple
decades of maturity behind it, and furthermore distributed systems are
inherently more difficult (more failure cases) than single-node DBs. Having said
that, the grass is certainly not
I enjoy yourkit, it's pretty tight, and easier to set up (imho) than jprofiler.
you can of course do http://poormansprofiler.org/
-ryan
On Sun, Apr 10, 2011 at 9:08 PM, Jack Levin magn...@gmail.com wrote:
Hi all, what is the best way to profile CPU on Region Server JVM?
Does anyone have any
Sounds like you are having an HDFS-related problem. Check those
datanode logs for errors.
As for a setting for max row size, this might not be so easy to do,
since during the Put time we don't actually know anything about the
existing row data. To find that out we'd have to go and read the row
Curious, why do you mention SequenceFile and TFile? Neither of
those is in hbase.io, and TFile is not used anywhere in
HBase.
-ryan
On Sat, Mar 19, 2011 at 9:01 AM, Weishung Chung weish...@gmail.com wrote:
I am browsing through the hadoop.io package and was wondering what other
If you are in safe mode it's because not all datanodes have reported
in. So actually NO your hadoop did NOT come up properly.
Check your NN pages, look for any missing nodes. They won't tell you
much beyond what is online or not.
Good luck!
-ryan
On Thu, Mar 17, 2011 at 11:12 AM,
If you know you had a clean shutdown just nuke all directories in /hbase/.logs
we hit this @ SU as well, it's older logfile formats messing us up.
remember, only if you had a CLEAN shutdown, or else you lose data
On Thu, Mar 17, 2011 at 4:20 PM, Chris Tarnas c...@email.com wrote:
I just
That's the correct branch, so you should be good!
On Wed, Mar 16, 2011 at 1:17 PM, Oleg Ruchovets oruchov...@gmail.com wrote:
I get the src from here.
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append/
On Wed, Mar 16, 2011 at 7:40 PM, Stack st...@duboce.net wrote:
What version of HBase are you testing?
Is it literally 0 vs N assignments?
On Mon, Mar 14, 2011 at 1:18 PM, Weiwei Xiong xion...@gmail.com wrote:
Thanks!
I checked the master log and found some info like this:
timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster: balance
hri=***,
data rebalancing? I guess HBase should also
support data rebalancing otherwise every time I restart HBase the regions
will have to be rebalanced again. Will someone tell me how to configure or
program HBase to do data rebalancing?
Thanks,
-- Weiwei
On Mon, Mar 14, 2011 at 2:43 PM, Ryan Rawson
this?
If it's automatic, how frequently is it performed?
I am running 1 replication.
Thanks,
-- Weiwei
On Mon, Mar 14, 2011 at 3:18 PM, Ryan Rawson ryano...@gmail.com wrote:
HDFS does the data rebalancing over time: as major compactions run and new
data comes in, files are written first
Looks like a datanode went down. InterruptedException is how Java
interrupts IO in threads; it's similar to the EINTR errno. That
means the actual source of the abort is higher up...
So back to how InterruptedException works... at some point a thread in
the JVM decides that the VM should
Depends on how well cached you are.
Remember, random gets require disk seeks. 239 gets/sec is 239 * 1-3
seeks/sec (approx 1-3 store files per get). So that seems reasonable
yes, sorry.
-ryan
On Thu, Mar 10, 2011 at 3:55 PM, Peter Haidinyak phaidin...@local.com wrote:
For the first time I am
Better not to use 100 column families... perf might be strange and not
optimal. Also you can encode complex data structures inside a column, using
for example JSON, Thrift, etc.
Yes, basically. But HBase won't help you much there.
On Mar 9, 2011 10:06 PM, James Ram hbas...@gmail.com wrote:
Hi,
Probably the soft limit flushes, eh?
On Mar 8, 2011 11:15 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:
On Tue, Mar 8, 2011 at 11:04 AM, Chris Tarnas c...@email.com wrote:
Just as a point of reference, in one of our systems we have 500+ million
rows that have a cell in its own column family
---
Ryan Rawson wants to stay in better touch using some of Google's coolest new
products.
If you already have Gmail or Google Talk, visit:
http://mail.google.com/mail/b-38842cc238-75b5c419e8-B2MruStMZTyLyPWgVT55H4l3Mfs
You'll
Ignore this all, gtalk/gmail has taken a turn for the dumb.
-ryan
On Tue, Mar 8, 2011 at 12:35 PM, Ryan Rawson ryano...@gmail.com wrote:
The ASCII table tells me bang ('!') = 33
http://www.asciitable.com/
so the average key len is 33.
:-)
-ryan
On Tue, Mar 8, 2011 at 1:00 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Hi,
I thought I could use the handy HFile from command line to give me some stats
about a given region or
Filters only reduce the amount of data returned to the client over the
wire; they do NOT reduce how much data we must read from disk. But
the former savings can be substantial depending on the amount pruned
out.
On Tue, Mar 8, 2011 at 5:38 PM, large data lrgd...@gmail.com wrote:
If I use
There are a series of patches that address this; check the recent commit
history of the append branch.
On Mar 7, 2011 1:52 AM, Gokulakannan M gok...@huawei.com wrote:
Hi All,
In HBase 0.90 I have seen that it has a fault-tolerant behavior
of triggering lease recovery and closing the file when
Just copy in the hadoop-gpl-compression*.jar and the
native/Linux*/libgplcompression* to every node and you can use 'LZO'
compression type in your tables without doing anything else.
-ryan
On Mon, Mar 7, 2011 at 3:18 PM, Peter Haidinyak phaidin...@local.com wrote:
Hi,
When you are
This message is reliable - you should worry. The docs talk about this:
http://hbase.apache.org/book/notsoquick.html
Basically you need to do exactly what that message says. At SU we
personally run CDH3b2. I know CDH3 is at a higher beta now, you can
give CDH3b4 a shot, it also contains the same
been told that I should build the jars and native libraries and then
deploy these to the cluster.
-Pete
-Original Message-
From: Ryan Rawson [mailto:ryano...@gmail.com]
Sent: Monday, March 07, 2011 3:23 PM
To: user@hbase.apache.org
Cc: Peter Haidinyak
Subject: Re: LZO Compression
with multiple masters, shutting down a master should NOT cause a
cluster death!
I ran into this once - it sucked.
As I have previously commented, I thought we had removed the 'master exit
= cluster death' path, but I'm not sure.
-ryan
On Thu, Mar 3, 2011 at 4:14 PM, Ted Dunning tdunn...@maprtech.com
Misfeature, basically: a master will tell the regionservers to
'shutdown and flush gracefully' via RPC.
Since we don't ship with any cluster management tools - to make your
life easier we have a 'master tells RS to shutdown' path. I wouldn't
be against removing it and relying on regular process
Also there will not be a 0.20.7, so you'll never get bug fixes.
0.90.1 is the way to go.
On Mar 2, 2011 9:40 PM, Ted Dunning tdunn...@maprtech.com wrote:
0.20.6 is stable, but I warrant that 0.90 is the better choice by a good
margin.
On Wed, Mar 2, 2011 at 9:36 PM, James Ram
Increment expects a long as returned by Bytes.toBytes(long), i.e.
8 bytes, big endian.
You put '1' as an array of length 1.
When increment finds no value, it assumes 0. If you want 0-based
counting, don't put an initial value.
On Fri, Feb 25, 2011 at 12:23 AM, Sandesh Devaraju
HFile is our format, and compression is done at a block-by-block
level, each block around 64k pre-compression.
The file is pretty clear and easy to read, check it out!
-ryan
On Thu, Feb 24, 2011 at 10:13 PM, Hari Sreekumar
hsreeku...@clickable.com wrote:
Does it compress only the key and
We fixed a lot of the exception handling in 0.90. The exception text
is much better. Check it out!
-ryan
On Wed, Feb 23, 2011 at 11:18 AM, Jean-Daniel Cryans
jdcry...@apache.org wrote:
It could be due to slow splits, heavy GC, etc. Make sure your machines
don't swap at all, that HBase has
You should consider upgrading to hbase 0.90.1, a lot of these kinds of
issues were fixed.
-ryan
On Wed, Feb 23, 2011 at 12:02 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
Hi all,
from time to time we come to a situation where the .META. table seems to be stuck
in some corrupted state.
In
There is a batch put call, should be trivial to use some kind of
background thread to invoke callbacks when it returns.
Check out the HTable API, javadoc, etc. All available via http://hbase.org !
-ryan
On Wed, Feb 23, 2011 at 1:25 PM, Hiller, Dean (Contractor)
dean.hil...@broadridge.com
In thrift there is a 'oneway' or 'async' or 'fire and forget' call
type. I can't recommend those kinds of approaches, since once your
system runs into problems you have no feedback. So if you are asking
for a one-shot, no-reply, assume-it-worked call, we don't have one
(nor would I wish that hell
There are test cases for this, the functionality DOES work, something is up...
Without full code and full descriptions of your tables, debugging is
harder than it needs to be. It's probably a simple typo or something,
check your code and table descriptions again. Many people rely on the
multi
There have been threads about this lately, check out the search box on
hbase.org which searches the list archives.
On Feb 23, 2011 6:56 PM, Nanheng Wu nanhen...@gmail.com wrote:
What are some of the trade-offs of using larger region files and fewer
regions vs the other way round? Currently each
(?) codes when necessary.
You might not want to specify the timestamp yourself but instead let
HBase store appropriate ones.
--
Tatsuya Kawano (Mr.)
Tokyo, Japan
On Feb 24, 2011, at 11:30 AM, Ryan Rawson ryano...@gmail.com wrote:
There are test cases for this, the functionality
,
final List<KeyValue> list = r.list();
r is null!
2011/2/24 Ryan Rawson ryano...@gmail.com
Which line is line 89?
Also it's preferable to do:
assertEquals(3, versionMap.size());
vs:
assertTrue(versionMap.size() == 3);
since the error messages from the former are more descriptive:
expected 3
There is minimal/no underlying efficiency gain. It's basically a full
table/region scan with a filter to discard the uninteresting values.
We have various timestamp filtering techniques to avoid reading from
files, eg: if you specify a time range [100,200) and a hfile only
contains [0,50) we'll not
Please check the archives, there have been some threads about this recently.
On Feb 16, 2011 9:51 AM, Venkatesh vramanatha...@aol.com wrote:
If I have to store multiple events (time-based) for multiple users,
- either I could create a unique row key for every event (or)
- use user id as the
, 2011 at 11:48 AM, Ryan Rawson ryano...@gmail.com wrote:
That would be cool, I think we should probably also push for HDFS-347
while we are at it as well. The situation for HDFS improvements has
not been good, but might improve in the mid-future.
Thanks for the pointer!
-ryan
On Wed, Feb 16
is a huge
clear win but still no plans to include it in any hadoop version.
Why's that? It seems to be fairly logical. Does it affect the
'over-the-wire' protocol?
On Wed, Feb 16, 2011 at 6:23 PM, Ryan Rawson ryano...@gmail.com wrote:
There is a patch that causes us to evict the block
If you were using 0.90, that unhelpful error message would be much more helpful!
On Tue, Feb 15, 2011 at 9:56 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:
Compactions are done in the background, they won't block writes.
Regarding splitting time, it could be that it had to retry a bunch of
:
We are running cdh3b3 - so next week when they go to b4 we'll be up to 0.90 -
I'm looking forward to it.
-chris
On Feb 15, 2011, at 11:05 AM, Ryan Rawson wrote:
If you were using 0.90, that unhelpful error message would be much more
helpful!
On Tue, Feb 15, 2011 at 9:56 AM, Jean-Daniel
-append-r1056497.jar contains
org/apache/hadoop/hdfs/server/datanode/BlockChannel.class but I am having
trouble figuring out why. From where in SVN does that come?
Is it not in the append-20-branch ?
Thanks,
Mike Spreitzer
From: Ryan Rawson ryano...@gmail.com
To: user@hbase.apache.org
you can also stripe, eg:
c_1 starts at 1, skip=100
c_2 starts at 2, skip=100
c_$i starts at $i, skip=100 for 3..99
now you have 100x speed/parallelism. If single regionserver
assignment becomes a problem, use multiple tables.
On Sun, Feb 13, 2011 at 10:12 PM, Lars George lars.geo...@gmail.com
If you are taking the jar that we ship and slamming it in a hadoop
0.20.2-based distro, that might work. I'm not sure if there are any
differences other than pure code (which would then be expressed in the jar
only), so this approach might work.
You could also check out the revision that we built
You should run hadoop-20-append or cdh3 and run hbase 0.90.1 which is
set to be released next week.
-ryan
On Fri, Feb 11, 2011 at 8:12 AM, Joseph Coleman
joe.cole...@infinitecampus.com wrote:
Hello, if I am going to run hadoop 0.20.2, what version of HBase should I
use that is compatible?
the .so has to be the same machine arch as your java binary, meaning
if you are using 64-bit java your lib should also be 64-bit.
-ryan
On Fri, Feb 11, 2011 at 11:00 AM, Peter Haidinyak phaidin...@local.com wrote:
HBase version: 0.89.20100924+28
Hadoop version: 0.20.2+737
Howdy,
My boss
we include $HBASE_HOME/conf on the HADOOP_CLASSPATH in hadoop-env.sh.
It goes like this:
export HBASE_HOME=/home/hadoop/hbase
JAR=`ls $HBASE_HOME/*.jar`
export HBASE_JAR=$JAR
# Extra Java CLASSPATH elements. Optional.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/conf:$HBASE_JAR
Message-
From: Ryan Rawson [mailto:ryano...@gmail.com]
Sent: Wednesday, February 09, 2011 11:43 PM
To: user@hbase.apache.org
Cc: hbase-u...@hadoop.apache.org
Subject: Re: getSplits question
You shouldn't need to write your own getSplits() method to run a map
reduce, I never did
Since the Merge tool works on an offline cluster, it goes straight to
the META HFiles, thus cannot be run in parallel.
It shouldn't be too hard to hack up Merge to work on an online
cluster, offline table.
On Thu, Feb 10, 2011 at 10:09 AM, Jean-Daniel Cryans
jdcry...@apache.org wrote:
I think
Hey guys,
If you are running on hadoop 0.20.2 you are going to lose data when
you crash. So don't do it :-)
You will need to either use a cdh3 beta (we use b2), or build the
hadoop-20-append branch. We have built the hadoop-20-append tip and
included the JAR with the default distribution. It
You want to choose the schema that minimizes the # of RPCs you are doing.
-ryan
On Thu, Feb 10, 2011 at 4:55 PM, Jason urg...@gmail.com wrote:
Hi all,
Let's say I have two entities, Parent and Child. There could be many children
in one parent (from hundreds to tens of millions).
A child can
Well done! Perhaps you can sell this to Google and they can finally
kill the svn googlecode feature!
Or maybe hit up github :-)
-ryan
On Wed, Feb 9, 2011 at 11:06 AM, Andrew Purtell apurt...@apache.org wrote:
See https://github.com/trendmicro/jgit-hbase
Use branch 'jgit.storage.hbase.v4'
You shouldn't need to write your own getSplits() method to run a map
reduce, I never did at least...
-ryan
On Wed, Feb 9, 2011 at 11:36 PM, Geoff Hendrey ghend...@decarta.com wrote:
Are endrows inclusive or exclusive? The docs say exclusive, but then the
question arises as to how to form the
) and the
startrow and endrow, then I thought I had to write my own getSplits(). Is
there another way to accomplish this, because I do need the combination of
controlled splitsize and start/endrow.
-geoff
-Original Message-
From: Ryan Rawson [mailto:ryano...@gmail.com]
Sent: Wednesday
There are other virtualization environments that offer better perf/$,
such as SoftLayer, Rackspace Cloud, and more.
EC2 is popular... and hence oversubscribed. People complain about IO
perf, and while it's not as bad as some people claim, you have to be
aware that EC2 isn't some magical land where
Under our load at SU, the new gen would grow to max size and take 800+ ms. I
would consider setting the pause goal to 20-40 ms (what we get in prod now). At
1 GB ParNew I would expect large pauses. Plus in my previous tests the
promotion rate was like 75% even with a huge ParNew.
This is all based on my
ImportTSV?
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/ImportTsv.html
Also writing a job to read from JDBC and write to hbase isn't too bad
if your schema isn't too insanely complex.
-ryan
On Thu, Feb 3, 2011 at 1:23 PM, Buttler, David buttl...@llnl.gov wrote:
Sqoop?
I think the answer is 'it depends'. What exactly is a GUI app
anyways these days? The wording is a little vague to me, does that
include things like amazon.com and google reader? Or is it limited to
things like Firefox, and desktop applications?
I think ultimately the only thing that is a must
?
Thanks again for your time.
On Thu, Feb 3, 2011 at 2:53 PM, Ryan Rawson ryano...@gmail.com wrote:
I think the answer is 'it depends'. What exactly is a GUI app
anyways these days? The wording is a little vague to me, does that
include things like amazon.com and google reader
Sequential writes are always faster than random reads on disk. You want
caching. Lots of it :)
On Feb 3, 2011 10:24 PM, charan kumar charan.ku...@gmail.com wrote:
Hello,
I am using Hbase 0.90.0 with hadoop-append. on a 30 m/c cluster (1950, 2
CPU, 6 G).
Writes peak at 5000 per second. But
I'm guessing that you aren't having as clean a shutdown as you might
think if you are seeing tables disappear. Here is a quick way to
tell: if you think table 'x' should exist, but it doesn't seem to, do
this:
bin/hadoop fs -ls /hbase/x
if that directory exists, I think you might be running
when you scan using the shell what do you see? Note that qualifier
names are just byte[] and thus case sensitive.
-ryan
On Tue, Feb 1, 2011 at 6:38 AM, Stuart Scott stuart.sc...@e-mis.com wrote:
Hi,
Wonder if anyone could offer any advice please? I've been working on
this for a few hours
We have dell 1950s, I didn't do the setup, but from what I recall...
basically you have no choice but to use the RAID controller. Think of
it as a super-advanced SATA controller instead. But the Dell 1950
RAID card did NOT support JBOD from what I recall. You can RAID0 it
(stripe only), and
Hi,
Good catch, while the API does let you specify 2 different row keys,
one in the 'put' and one in the call, doing so would be ... not
advised. Right now there is no check for this, and if you were to
pass 2 different rows, things would not be so good.
Here is an issue:
Hey,
The region location cache is held by a soft reference, so as long as
you don't have memory pressure, it will never get invalidated just
because of time.
Another thing to consider, in HBase, the open scanner code also seeks
and reads the first block of the scan. This may incur a read to disk
between read #1 and read #2 that I can only explain by
region location search. Our writes are so heavy I assume this region
location information is always flushed within 30-60 minutes.
On Mon, Jan 31, 2011 at 4:44 PM, Ryan Rawson ryano...@gmail.com wrote:
Hey,
The region location cache is held by a soft
on all tables.
On Mon, Jan 31, 2011 at 4:54 PM, Ryan Rawson ryano...@gmail.com wrote:
The Regionserver caches blocks, so a second read would benefit from
the caching of the first read. Over time blocks get evicted in an LRU
manner, and things would get slow again.
Does this make sense to you