, it's (supposed to be) production. Please
elaborate on your inferred sigh of despair
On 12 June 2012 15:48, Michael Segel michael_se...@hotmail.com wrote:
Ok...
Please tell me that this isn't a production system.
Is this on EC2?
On Jun 12, 2012, at 6:55 AM, Simon Kelly wrote
It depends... There are some reasons to do this, however in general you don't
need to do this...
The course is wrong to suggest this as a best practice.
Sent from my iPhone
On Jun 5, 2012, at 5:00 PM, Atif Khan atif_ijaz_k...@hotmail.com wrote:
During a recent Cloudera course we were told
Just out of curiosity, describe the data?
Sorted?
The more we know, the easier it is to help... Also, can you recheck your math?
Sent from my iPhone
On Jun 6, 2012, at 6:17 PM, NNever nnever...@gmail.com wrote:
It comes again. I truncated the table and put about 10 million rows into it
last
, 2012, at 11:25 AM, Michael Segel wrote:
Hi,
Jumping in on this late...
To cut a long story short, is region size the only current HBase
technique to balance load, esp. w.r.t. query load? Or perhaps there are
some more advanced techniques to do that?
So maybe I'm missing something but I
Hi,
Seems we just had someone talk about this just the other day...
1) 8GB of memory isn't enough to run both M/R and HBase.
Ok, yes you can run it, however don't expect it to perform well.
2) You never want a user to run their own code from the cluster itself. Use an
*edge* node.
There's
On Mon, May 21, 2012 at 4:30 PM, Michael Segel
michael_se...@hotmail.com wrote:
Hi,
Seems we just had someone talk about this just the other day...
1) 8GB of memory isn't enough to run both M/R and HBase.
Ok, yes you can run it, however don't expect it to perform well.
2) You never want
What did you see when you ran the HBase shell's status?
Did you run status w higher details?
(see status help)
On May 20, 2012, at 2:12 AM, Ben Cuthbert wrote:
All
We ran a load test and after about 3 hours our application stopped. Checking the
logs, I see this in the hbase-master log
The number of regions per RS has always been a good point of debate.
There's a max number of 1500 (hardcoded); however, you'll see performance
degrade before that limit.
I've tried to set a goal of keeping the number of regions per RS down around
500-600 because I didn't have time to monitor
diminishing returns. Regions aren't a
logical notion; they correspond to physical files and buffers. Consider
setting N to something like 500; that's my ballpark for reasonable, totally
unscientific of course.
- Andy
On May 19, 2012, at 6:03 AM, Michael Segel michael_se...@hotmail.com
which when solved will help with the
evolution of HBase more as a Database than as a persistent object store.
On May 17, 2012, at 7:38 PM, fding hbase wrote:
Hi Michael,
On Fri, May 18, 2012 at 1:39 AM, Michael Segel
michael_se...@hotmail.comwrote:
You should not let just any user run
Head over to Cloudera's site and look at a couple of blog posts from Todd
Lipcon.
Also look at MSLABs .
On a side note... you don't have a lot of memory to play with...
On May 18, 2012, at 6:54 AM, Simon Kelly wrote:
Hi
Firstly, let me compliment the HBase team on a great piece of
it help to
allocate more memory for jobs like that?
Yes, I have 12 cores also. Are there any HDFS/MR/HBase tuning tips for
this many processors?
btw, 64GB is a lot for us :-)
On Fri, May 11, 2012 at 7:29 AM, Michael Segel
michael_se...@hotmail.comwrote:
Funny, but this is part
You should not let just any user run coprocessors on the server. That's
madness.
Best regards,
- Andy
Fei Ding,
I'm a little confused.
Are you trying to solve the problem of querying data efficiently from a table,
or are you trying to find an example of where and when to use
Ok...
I think you need to step away from your solution and take a look at the problem
from a different perspective.
From my limited understanding of Co-processors, this doesn't fit well with what
you want to do.
I don't believe that you want to run a M/R query within a Co-processor.
In short,
I think we need to look at the base problem that is trying to be solved.
I mean the discussion on the RPC mechanism, but the problem that the OP is
trying to solve is how to use multiple indexes in a 'query'.
Note: I put quotes around query because it's a m/r job or a single thread where the
?
Cheers,
Dave
On Wed, May 16, 2012 at 12:07 PM, Michael Segel
michael_se...@hotmail.comwrote:
I think we need to look at the base problem that is trying to be solved.
I mean the discussion on the RPC mechanism, but the problem that the OP is
trying to solve is how to use multiple indexes
Funny, but this is part of a talk that I submitted to Strata
64GB and HBase isn't necessarily a 'large machine'.
If you're running w 12 cores, you're talking about a minimum of 48GB just for
M/R.
(4GB a core is a good rule of thumb)
Depending on what you want to do, you could set aside
a table is splitting?
On May 11, 2012, at 12:12 AM, Stack wrote:
On Thu, May 10, 2012 at 6:26 AM, Michael Segel
michael_se...@hotmail.com wrote:
4) Google dfs.balance.bandwidthPerSec. I believe it's also used by HBase when
they need to move regions.
Nah. This is an hdfs setting. HBase
Ok...
So the issue is that you have a lot of regions on a region server, where the
max file size is the default.
On your input to HBase, you have a couple of issues.
1) Your data is most likely sorted. (Not good on inserts)
2) You will want to increase your region size from default (256MB) to
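An illustrative sketch (not from the thread) of the region-size change being suggested, using the pre-0.92 HBaseAdmin API; the table name and the 1GB target are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseMaxFileSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] table = Bytes.toBytes("my_table");        // illustrative table name

    admin.disableTable(table);                        // older releases need the table offline for schema changes
    HTableDescriptor desc = admin.getTableDescriptor(table);
    desc.setMaxFileSize(1024L * 1024L * 1024L);       // 1GB instead of the 256MB default
    admin.modifyTable(table, desc);
    admin.enableTable(table);
  }
}

Larger regions mean fewer splits while loading sorted data, at the cost of fewer units over which to spread load later.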
problem and is false.
Many mapreduce algorithms require a reduce phase (e.g. sorting). The fact
that the output is written to HBase or somewhere else is irrelevant.
-Dave
On Thu, May 10, 2012 at 6:26 AM, Michael Segel
michael_se...@hotmail.comwrote:
[SNIP]
/event/osdi04/tech/full_papers/dean/dean.pdf
On Thu, May 10, 2012 at 11:30 AM, Michael Segel
michael_se...@hotmail.comwrote:
Dave, do you really want to go there?
OP has a couple of issues and he was going down a rabbit hole.
(You can choose if that's a reference to 'the Matrix, Jefferson
performance... Not sure. But it's always
something to check and think about.
BTW, I did a quick read on your problem. You didn't say which release/version
of HBase you were running
-eran
On Thu, May 10, 2012 at 9:59 PM, Michael Segel
michael_se...@hotmail.comwrote:
Sigh.
Dave,
I
the M/R was a 'distraction'
to the issue at hand.
Not to mention his flip response w the Google paper?
On May 10, 2012, at 4:57 PM, Stack wrote:
On Thu, May 10, 2012 at 11:59 AM, Michael Segel
michael_se...@hotmail.com wrote:
Sigh.
Dave,
I really think you need to think more about
Stack,
Since you brought it up...
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink.
Writing, it may make sense to avoid the reduce step and write yourself back
into HBase from inside your map. You'd do this when your job does not need the
sort and
looked at, you can refactor it
to not use a reducer.
I also think you may read a bit more into my posts than I intend. ;-)
-Mike
On May 10, 2012, at 10:28 PM, Stack wrote:
On Thu, May 10, 2012 at 6:28 PM, Michael Segel
michael_se...@hotmail.com wrote:
That section was written by Doug after
Too small a machine.
Better question: why do you want to get 2 nodes on one machine?
On May 9, 2012, at 1:49 PM, Marcos Ortiz wrote:
What is your current hardware configuration?
1- One way is to use two VMs in VMware Workstation, VMware Server, or
VirtualBox, with a similar configuration
Have you thought about Garbage Collection?
-Grover
Sent from my iPhone
On Apr 24, 2012, at 12:41 PM, Skchaudhary schoudh...@ivp.in wrote:
I have an HBase cluster set up. In it I have 3 Region Servers. There is a
table which has 27 Regions equally distributed among the 3 Region servers--9
Narendra,
I think you are still missing the point.
130 seconds to scan the table per iteration.
Even if you have only 10K rows,
that's 130 * 10^4 = 1.3 * 10^6 seconds, or ~361 hours.
Compare that to 10K rows where you then select a single row in your sub select
that has a list of all of the associated rows.
the index. Thanks a lot for the insights.
Narendra
On Thu, Apr 19, 2012 at 9:56 PM, Michael Segel
michael_se...@hotmail.comwrote:
Narendra,
I think you are still missing the point.
130 seconds to scan the table per iteration.
Even if you have only 10K rows,
that's 130 * 10^4 = 1.3 * 10^6 seconds
-1. It's a boring topic.
And it's one of those things that you either get it right or you end up hiring
a voodoo witch doctor to curse the author of the chapter...
I agree w Jack, it's not difficult, it just takes some planning and forethought.
Also reading lots of blogs... And some practice...
In theory, you could go as large as a region size minus the key and overhead.
(rows can't span regions)
Realistically you'd want to go much smaller.
Sent from my iPhone
On Apr 17, 2012, at 1:49 PM, Wei Shung Chung weish...@gmail.com wrote:
What would be the max affordable size one could
Uhm,
Let's take a look back at the original post:
I'm confused by the read latency I got, compared to what the YCSB team achieved
and showed in their YCSB paper. They achieved throughput of up to 7000
ops/sec with a latency of 15 ms (page 10, read latency chart). I can't get
throughput higher than
anyway. /eot
Best regards,
- Andy
On Apr 11, 2012, at 11:04 PM, Michael Segel michael_se...@hotmail.com wrote:
Uhm,
Let's take a look back at the original post:
I'm confused by the read latency I got, compared to what the YCSB team achieved
and showed in their YCSB paper
Was the YCSB test also run on Amazon?
Sent from my iPhone
On Apr 6, 2012, at 10:18 AM, ijanitran taz...@yahoo.com wrote:
I have a 4-node HBase v0.90.4-cdh3u3 cluster deployed on Amazon XLarge
instances (16GB RAM, 4 CPU cores) with an 8GB -Xmx heap allocated for HRegion
servers, 2GB for
Do you really need to go to 2048 or 4096 xcievers?
Lars George just wrote a blog on it...
it's on the Cloudera site.
This formula is used to calculate the number of xcievers for HBase.
Since this number is usually calculated when building a system, you're going to
have to estimate this and
Yes,
Currently if one of the column families causes a split, then all of the column
families get split. So if you are dealing with a large blob, you're going to
shoot yourself in the foot.
Are you filtering on any of the values in the 'info' family?
If not, you could try creating a serialized
Why not make your properties a map object?
On Mar 20, 2012, at 4:32 AM, Qian Ye wrote:
I think the average number of properties users would add to a specific page
should be estimated. I guess, about 99.9% pages would not be associated
with too many properties. The others can be handled with
What happens if you apply a row lock to a row in the .META. table?
It's 5:00 am my local time and I was thinking about solving a problem.
(Again, thinking this early in the morning without the aid of caffeine is not a
good idea.) :-)
Does RLL lock just updates or does it stop all access to
Sounds like your CPU is blocked waiting on your disks.
What does your cluster look like?
How many cores per node? How many spindles?
On Mar 14, 2012, at 1:08 AM, raghavendhra rahul wrote:
Hi,
I'm running a coprocessor aggregation over some millions of rows. During
execution the CPU is waiting for
Hey do that, things go boom. :-)
Before you do that I would suggest running top and seeing if there is any
swapping occurring.
Sent from my iPhone
On Mar 8, 2012, at 4:29 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:
When real CPU is bigger than user CPU it very often points to
Just a couple of things...
MapR doesn't have the NN limitations.
So if your design requires lots of small files, look at MapR...
You could store your large blobs in a sequence file or series of sequence files
using HBase to store the index.
Sort of a hybrid approach.
Sent from my iPhone
On Mar
It depends on your use case. You can store binary data in HBase...
Sent from my iPhone
On Mar 4, 2012, at 2:15 AM, Konrad Tendera kon...@tendera.eu wrote:
Hello,
I'm wondering whether it's worth storing my binary data in HBase.
I've read lots of articles and presentations which say that it
The better question is why would you need a reducer?
That's a bit cryptic, I understand, but you have to ask yourself when do you
need to use a reducer when you are writing to a database... ;-)
Sent from my iPhone
On Feb 28, 2012, at 10:14 AM, T Vinod Gupta tvi...@readypulse.com wrote:
LOL...
You don't have to live on the West Coast. ;-)
But to JDC's point... it depends on how good you are.
Although I don't know if I'd base my career on a product or a specific niche
like that.
While there is a premium for good, talented developers, architects, etc...
Over time the horde
No.
What tuning did you do?
Why such a small cluster?
Sorry, but when you start off with a bad hardware configuration, you can get
Hadoop/HBase to work, but performance will always be sub-optimal.
Sent from my iPhone
On Feb 1, 2012, at 6:52 AM, Tim Robertson timrobertson...@gmail.com wrote:
I'm confused...
You mention that you are hashing your key, and you want to do a scan w a start
and stop value?
Could you elaborate?
With respect to hashing, if you use a SHA-1 hash, your values will be unique.
(you talked about rehashing ...)
Sent from my iPhone
On Jan 25, 2012, at 7:56 AM,
You don't do joins.
Sorry, but you need to put this in perspective...
You need to get really drunk and with the next morning's hang over you need to
look at HBASE as HBASE and do not think in terms of a relational schema.
Having said that, you can do joins, however they are tricky to do
Subject: Re: Question about HBase for OLTP
From: mdcal...@gmail.com
To: user@hbase.apache.org
On Mon, Jan 9, 2012 at 4:37 PM, Michael Segel michael_se...@hotmail.com
wrote:
Ok..
Look, here's the thing... HBase has no transactional support.
OLTP systems like PoS systems, Hotel
All,
Just my $0.02 worth of 'expertise'...
1) Just because you can do something doesn't mean you should.
2) One should always try to use the right tool for the job regardless of your
'fashion sense'.
3) Just because someone says Facebook or Yahoo! does X, doesn't mean it's a
good idea, or
Uhmmm. Well... It depends on your data and what you want to do...
Can you fit all of the data into a single row?
Does it make sense to use a sequence file for the raw data and then use HBase
to maintain indexes?
Just some food for thought.
From: t...@cloudera.com
Date: Mon, 9 Jan 2012
are currently working on HBase Snapshots to allow disaster
recovery with HBase alone, but you shouldn't hedge bets on it being
completed within your timeframe.
On 1/9/12 2:31 PM, Michael Segel michael_se...@hotmail.com wrote:
All,
Just my $0.02 worth of 'expertise'...
1) Just
with that and for the record
that's pointed out here...
http://hbase.apache.org/book.html#arch.overview
... with the section When Should I Use HBase. I'll add something about
(lack of) transactional support in there as well.
On 1/9/12 7:37 PM, Michael Segel michael_se...@hotmail.com
Uhmm...
You're copying data from Table A back to Table A?
Ok... you really want to disable your caching altogether and make sure each row
is committed to the table as you write it.
Try that... it will hurt your performance, but it may keep you afloat.
HTH
-Mike
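A rough sketch of the kind of copy loop being described, with scanner caching dialed down, block caching off, and autoflush on so every Put is committed as it is written; the table name and the cell-for-cell copy are illustrative assumptions:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class SelfCopy {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "table_a");
    table.setAutoFlush(true);             // no client-side write buffer: each Put is committed immediately

    Scan scan = new Scan();
    scan.setCaching(1);                   // minimal scanner caching, per the advice above
    scan.setCacheBlocks(false);           // don't churn the block cache with a full-table scan

    ResultScanner scanner = table.getScanner(scan);
    for (Result row : scanner) {
      Put put = new Put(row.getRow());
      for (KeyValue kv : row.raw()) {
        put.add(kv);                      // rewrite the cells under the same row key
      }
      table.put(put);
    }
    scanner.close();
    table.close();
  }
}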
You've got a scanner and
Hierarchical data doesn't necessarily have anything to do w column families. You
can do a hierarchical model in a single column family.
It's pretty straightforward.
Sent from my iPhone
On Dec 30, 2011, at 6:34 PM, Imran M Yousuf imyou...@gmail.com wrote:
Hi,
Rather than addressing the
Hi Mohammad,
It sounds like you want to implement a hierarchical data model within HBase. You
can do this, although there are some drawbacks...
In terms of drawbacks...
The best example that I can think of is implementing a point of sale solution
in Dick Pick's Revelation system.
Here you store
Mohammad,
I'm tight on time... Short answer...
Strip the XML out into some object and then consider using Avro to write the
object to HBase.
This could probably shrink your footprint per record/row.
Note: I don't know anything about your data so you really have to take what I
say with a
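A sketch, not the poster's actual code, of the Avro suggestion: parse the XML into a record, serialize it with a binary encoder, and store the bytes in one cell. The schema, table, and column names here are made up:

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AvroPut {
  public static void main(String[] args) throws Exception {
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[" +
        "{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"payload\",\"type\":\"string\"}]}");

    GenericRecord rec = new GenericData.Record(schema);
    rec.put("id", "42");
    rec.put("payload", "fields stripped out of the original XML");

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(schema).write(rec, enc);
    enc.flush();                                          // compact binary form, no XML tag overhead

    HTable table = new HTable(HBaseConfiguration.create(), "events");
    Put put = new Put(Bytes.toBytes("42"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("avro"), out.toByteArray());
    table.put(put);
    table.close();
  }
}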
First, don't use AIX.
It's nothing against IBM, except that you will find you are going to run
unsupported unless you run the Apache release; and if IBM is selling you support,
you will end up with a derivative.
Second, convince a bunch of open source contributors to switch to ksh.
(good
Not sure if you'd consider this a 'big data' problem.
First, IMHO you're better off serving this out of a relational model.
Having said that
'Hot Row' as in reads isn't a bad thing since it's in cache.
'Hot Row' as in updates... not really a good thing since you have to lock the
row to
Matthew,
What did you set your max region size to be for this table?
14K files totalling 650GB means you have a lot of small files...
On average ~45MB (rough calc).
How many regions? Do you have mslabs set up?
(GC tuning?)
Sorry for jumping in on the end of this conversation.
-Mike
From:
One other option...
Your map() method has a null writable and you handle the put() to the table(s)
yourself within the map() method.
You can also set the autoflush within your job too.
Date: Tue, 4 Oct 2011 16:20:25 +0200
From: christopher.dor...@gmail.com
To: user@hbase.apache.org
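A hedged sketch of the pattern being described: a TableMapper that emits nothing to the framework and issues the Puts itself; the class, table, and column names are illustrative:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

public class DirectPutMapper extends TableMapper<NullWritable, NullWritable> {
  private HTable out;

  @Override
  protected void setup(Context context) throws IOException {
    out = new HTable(HBaseConfiguration.create(), "target_table");
    out.setAutoFlush(false);               // buffer client-side; flushed in cleanup()
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException {
    Put put = new Put(row.get());
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("copy"), value.value());
    out.put(put);                          // written directly; nothing is emitted to a reducer
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    out.flushCommits();
    out.close();
  }
}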
First, I'd suggest switching to DNS.
From: stutiawas...@hcl.com
To: user@hbase.apache.org
Date: Tue, 20 Sep 2011 20:42:46 +0530
Subject: RE: Dynamic addition of RegionServer
Hi,
I was able to add a region server dynamically to a running cluster. But this
happens only when the hostname of
Sonal,
Just because you have a m/r job doesn't mean that you need to reduce anything.
You can have a job that contains only a mapper.
Or your job runner can have a series of map jobs in serial.
Most, if not all, of the map/reduce jobs where we pull data from HBase don't
require a reducer.
To
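To make the "no reducer" point concrete, a sketch of a mapper-only driver, reusing the illustrative DirectPutMapper above; all names remain assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MapOnlyDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "map-only-hbase-read");
    job.setJarByClass(MapOnlyDriver.class);

    Scan scan = new Scan();
    scan.setCaching(500);                  // bigger batches per RPC for a full scan
    scan.setCacheBlocks(false);            // don't churn the region server block cache

    TableMapReduceUtil.initTableMapperJob(
        "source_table", scan, DirectPutMapper.class,
        NullWritable.class, NullWritable.class, job);

    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);              // mapper-only: no sort, no shuffle, no reducer
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}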
with HBase.
-chris
On Sep 16, 2011, at 11:43 AM, Michael Segel wrote:
Sonal,
You do realize that HBase is a database, right? ;-)
So again, why do you need a reducer? ;-)
Using your example...
Again, there will be many cases where one may want a reducer, say trying
to count
load the
results back in. Definitely specialized usage, but I could see other
perfectly valid uses for reducers with HBase.
-chris
On Sep 16, 2011, at 11:43 AM, Michael Segel wrote:
Sonal,
You do realize that HBase is a database, right? ;-)
So again, why do you need a reducer
What happens after a compaction is run on the table?
From: vidhy...@yahoo-inc.com
To: hbase-u...@hadoop.apache.org
Date: Fri, 16 Sep 2011 13:35:35 -0700
Subject: Unassigned holes in tables
This may well be a common occurrence, but we noticed through hbck some
regions (5 out of
I realize that this is an HBase group, however nothing in the stated problem
would suggest that an RDBMS couldn't handle the problem.
Inserting 10K rows every 5 minutes poses a challenge to the database?
I guess it would be a challenge based on the size and type of data along with
the
not
(say, if you didn't really need that power and a higher level, more abstract
tool set like a relational database would suffice).
Ian
On Sep 14, 2011, at 1:17 PM, Michael Segel wrote:
I realize that this is an HBase group, however nothing in the stated
problem would suggest
From: doug.m...@explorysmedical.com
To: user@hbase.apache.org
Date: Tue, 6 Sep 2011 09:42:07 -0400
Subject: Re: HBase Meetup during Hadoop World NYC '11
Explorys is sending a few, so +3 or so.
Oh! So you are now forcing your employees to drink copious amounts of beer and
eat lots
on StackOverflow
To: user@hbase.apache.org
From: Michael Segel michael_se...@hotmail.com
Can't we just all get along? :-)
My personal introduction to Cassandra came maybe in the 2009 timeframe. We
evaluated it and HBase at the time and chose HBase. No point to discuss why,
the world has
Date: Thu, 1 Sep 2011 15:13:13 -0700
Subject: Re: HBase and Cassandra on StackOverflow
From: timelessn...@gmail.com
To: user@hbase.apache.org
[BIG SNIP]
While you guys are going back and forth... a simple reminder.
Not everyone has the same base level of experience so their ability to
I don't understand why you're having trouble with this.
You have a simple geo location search based on zip and then a product and
inventory count.
I mean it's not really geo-spatial because you're searching based on zip code.
So you don't need to worry about any sort of geospatial or geodetic
You still need to organize your vendors by delivery zip.
Which gets very ugly when you try product code grocerCode.
Even doing something like zip+product+vendor as your key gets you a lot of
rows.
This will work, where you have columns for price, qty on hand , sku, etc...
The problem gets
Sean,
You wrote the following:
But sometimes, we do need to save multiple versions of values, such as
logging events, or messages of Facebook. In these cases, what is the trade
off between saving them in different rows, and in different versions of one
row?
You're not updating logging
I won't say you're crazy, but 0.5 GB per mapper?
I would say tune conservatively like you are suggesting (1GB for the OS), but also
I'd suggest tuning to 80% utilization instead of 100% utilization.
From: buttl...@llnl.gov
To: user@hbase.apache.org
Date: Tue, 23 Aug 2011 16:35:22 -0700
Subject:
Sujee,
You are correct in creating a separate HTable instance in each thread. (HTable
isn't thread safe, but since the scope is within the thread it works.)
You could use the HTablePool class, but I don't think it's a better solution for
what you are doing.
In your example it sounds like
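A minimal sketch of the per-thread HTable pattern being endorsed: each worker thread constructs, uses, and closes its own instance. The table name, column names, and thread count are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PerThreadWriters {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      final int id = i;
      workers[i] = new Thread(new Runnable() {
        public void run() {
          try {
            HTable table = new HTable(conf, "events");   // one HTable per thread; HTable is not thread safe
            Put put = new Put(Bytes.toBytes("row-" + id));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + id));
            table.put(put);
            table.close();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) t.join();
  }
}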
It could be that it's the results from the reducer.
My guess is that he's got an issue where he's overextending his system.
Sounds like a tuning issue.
How much memory on the system?
What's being used by HBase?
How many reducers, How many mappers?
How large is the cache on DN, and how much
Uhm, silly question...
Why would you ever need a reduce step when you're writing to an HBase table?
Now I'm sure that there may be some fringe case, but in the past two years,
I've never come across a case where you would need to do a reducer when you're
writing to HBase.
So what am I
Tomas,
If I understand you correctly you have a row key of A,B,C and you want to fetch
only the rows on A and C
You can do a start row of A
And then do the end row of A1
So that you get the first row for the given vehicle_id, and then stop when the
vehicle_id changes.
You would then have to
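A sketch of that start/stop-row scan, following the post's "A" to "A1" suggestion; the table name and key layout are illustrative assumptions:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixScan {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "vehicle_events");

    String vehicleId = "A";
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes(vehicleId));       // first possible key for this vehicle
    scan.setStopRow(Bytes.toBytes(vehicleId + "1"));  // exclusive stop, as suggested: scanning ends when the prefix changes

    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(Bytes.toString(r.getRow()));
    }
    scanner.close();
    table.close();
  }
}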
On Tue, Jul 26, 2011 at 7:39 AM, Mark static.void@gmail.com wrote:
So my first question is, would HBase fit our use case? If not
can anyone offer some advice on what would/should be used?
You mean HBase as the sink for your log emitters?
The pattern I usually see is that there
I'm not sure how they are doing this, but just a quick thought...
You can increase the file size to 1-2GB as an example and then run compactions on
a regular basis to clean up rows deleted from the queue.
This will stop the table from splitting.
The assumption is that your MAX_FILESIZE is much
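A minimal sketch of the "run compactions on a regular basis" part, assuming it is driven from a cron-style job; the table name is illustrative:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CompactQueueTable {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    admin.majorCompact("queue_table");    // asynchronous request; deleted queue rows get cleaned up server side
  }
}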
Are you doing scans or are you doing get() with a known key?
There's a big difference and scans are very expensive.
You also don't talk about your hardware. How much memory, how many cores per
node, how you have your m/r configured (even if you're not running a m/r job,
you still have to
for the backup question.
On 7/14/11 12:14 PM, Michael Segel michael_se...@hotmail.com wrote:
Not sure what you read in Otis' blog but pretty sure it's out of date.
Check out MapR stuff.
Sent from my Palm Pre on AT&T
On Jul 14, 2011 6:57 AM, Steinmaurer Thomas <thomas.steinmau...@scch.at>
Claudio,
I'm not sure on how to answer this...
Yes, we've got a prototype of a Lucene on HBase w Spatial that we're starting
to test.
With respect to hashing...
In one project we just hashed the key using the SHA-1 hash already in Java.
This gave us the randomness without having to try to
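A minimal sketch of that SHA-1 key hashing using only the JDK's MessageDigest; the natural key shown is an illustrative assumption:

import java.security.MessageDigest;
import org.apache.hadoop.hbase.util.Bytes;

public class HashedKey {
  public static byte[] rowKey(String naturalKey) throws Exception {
    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
    return sha1.digest(Bytes.toBytes(naturalKey));    // 20 bytes, effectively randomly distributed
  }

  public static void main(String[] args) throws Exception {
    System.out.println(Bytes.toStringBinary(rowKey("vehicle-12345")));
  }
}

The trade-off is the usual one: hashed keys spread writes across regions but give up meaningful range scans.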
Not sure what you read in Otis' blog but pretty sure it's out of date.
Check out MapR stuff.
Sent from my Palm Pre on AT&T
On Jul 14, 2011 6:57 AM, Steinmaurer Thomas <thomas.steinmau...@scch.at>
wrote:
Hello,
we are currently evaluating HBase for a project. In respect to
Yeah, but you don't want to drop all of the machines at the same time. When you
decommission a node, you need to give the cluster time to rebalance before
dropping a second node.
That is, of course, unless you don't mind losing any data.
:-)
Date: Tue, 28 Jun 2011 10:33:39 -0700
Subject: Re:
Yes, it's stable enough.
Date: Sat, 25 Jun 2011 02:09:06 +
From: jack.zhangj...@huawei.com
Subject: Does anybody enable MSLAB in production system? I am not sure if
it's stable enough for production system?
To: hbase-u...@hadoop.apache.org
Unfortunately no.
Columns may or may not exist and it's on a per-row basis.
You can write a simple map job (No reduce) to use dynamic counters to determine
the unique names of the columns and in how many rows these columns exist.
You could also keep track of the number of columns per row, the
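A hedged sketch of that counter-based survey: a mapper-only job that bumps one dynamic counter per column qualifier it sees, plus a rough per-row column count. Counter group names and the table are illustrative, and counters only work for a modest number of distinct qualifiers:

import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

public class ColumnSurveyMapper extends TableMapper<NullWritable, NullWritable> {
  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException {
    int columns = 0;
    for (KeyValue kv : value.raw()) {
      String qualifier = Bytes.toString(kv.getQualifier());
      context.getCounter("columns", qualifier).increment(1);       // dynamic counter per distinct column name
      columns++;
    }
    context.getCounter("rows", "columns=" + columns).increment(1); // crude histogram of columns per row
  }
}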
And even that recommendation isn't right. ;-)
I think Sandy Bridge and SolarFlare are changing some of the design
considerations.
Date: Tue, 7 Jun 2011 10:32:58 +0200
Subject: Re: Hbase Hardware requirement
From: timrobertson...@gmail.com
To: user@hbase.apache.org
the cartesian product. This allows
you to inject whatever cleverness you need at this point.
Common kinds of cleverness include down-sampling of problematically
large
sets of candidates.
On Tue, May 31, 2011 at 11:56 AM, Michael Segel
michael_se...@hotmail.comwrote
Eran,
You want to join two tables? The short answer is to use a relational database
to solve that problem.
Longer answer:
You're using HBase so you don't need to think in terms of a reducer.
You can create a temp table for your query.
You can then run one map job to scan and filter table A,
for the lookups instead. So you'd hold
onto a batch of records in the Mapper, and when the batch size is filled, then
you do the lookups (and then any required emitting, etc.).
-Original Message-
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Tuesday, May 31, 2011 10:56
From: doug.m...@explorysmedical.com
To: user@hbase.apache.org
Date: Tue, 31 May 2011 15:39:14 -0400
Subject: RE: How to efficiently join HBase tables?
Re: Didn't see a multi-get...
This is what I'm talking about...
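A sketch of the batched lookup / multi-get idea from this exchange: buffer keys in the mapper and issue one HTable.get(List<Get>) when the batch fills. Table names, the join key, and the batch size are illustrative:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class BatchedLookupMapper extends TableMapper<NullWritable, NullWritable> {
  private static final int BATCH = 100;
  private HTable lookupTable;
  private List<Get> batch = new ArrayList<Get>();

  @Override
  protected void setup(Context context) throws IOException {
    lookupTable = new HTable(HBaseConfiguration.create(), "table_b");
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException {
    batch.add(new Get(row.get()));        // assume table_b is keyed the same way as the scanned table
    if (batch.size() >= BATCH) {
      flushBatch();
    }
  }

  private void flushBatch() throws IOException {
    Result[] joined = lookupTable.get(batch);   // one multi-get RPC round for the whole batch
    // ... use 'joined' (aligned with the buffered keys) to emit or write the combined records here ...
    batch.clear();
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    if (!batch.isEmpty()) flushBatch();
    lookupTable.close();
  }
}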
Himanish,
Are we talking about an African or European Swallow?
(Sorry, it's a reference to the Monty Python movie scene where they cross the
bridge after being asked 3 questions which they must answer correctly. [What's
the forward air speed velocity of an unladen swallow?])
The point is that
Sorry,
It's been one of those days.
From: michael_se...@hotmail.com
To: user@hbase.apache.org
Subject: RE: HBase Transaction per second in Map-Reduce
Date: Tue, 24 May 2011 14:18:28 -0500
Himanish,
Are we talking about an African or European
Besides this...
JRE version: 6.0_17-b17
Just a silly question...
What happens if you double the ZooKeeper timeout to 120 seconds?
Also I'm going to assume that you're not running your ZK on the same nodes as
your data nodes, but you know what they say about assumptions...
From:
Sorry to jump in on the tail end.
What do you mean when you say that the key is generated randomly?
I mean are you using a key and then applying a SHA-1 hash?
Which node is serving your -ROOT- and META tables?
Have you applied the GC hints recommended by Todd L in his blog?
Also you said:
'
And
Hi,
That's actually a really good question.
Unfortunately, the answer isn't really simple.
You're going to need to estimate your growth and you're going to need to
estimate your configuration.
Suppose I know that within 2 years, the amount of data that I want to retain is
going to be 1PB,
a 50TB
machine take: a day, a week, longer?
/Ian
Architect / Mgr - Novell Vibe
On 05/02/2011 09:57 AM, Michael Segel wrote:
Hi,
That's actually a really good question.
Unfortunately, the answer isn't really simple.
You're going to need to estimate your growth and you're going
What's your xceivers set to?
What's the ulimit -n set to for the hdfs/hadoop user... (You didn't say which
release/version you were using.)
Date: Sun, 1 May 2011 17:47:18 -0700
Subject: one of our datanodes stops working after few hours
From: magn...@gmail.com
To: user@hbase.apache.org
I
Felix,
You're going to want to upgrade to CDH3u0 for two main reasons:
1) There was a bug in the HBase Load Balancer that was fixed in 0.90.2, but Todd
said that they were going to back-port it to 0.90.1
2) There's another bug in the WAL that is fixed in 0.90 (CDH3B4)
0.89 is much better than