Re: Pre-split table using shell

2012-06-12 Thread Michael Segel
, it's (supposed to be) production. Please elaborate on your inferred sigh of despair. On 12 June 2012 15:48, Michael Segel michael_se...@hotmail.com wrote: Ok... Please tell me that this isn't a production system. Is this on EC2? On Jun 12, 2012, at 6:55 AM, Simon Kelly wrote

Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Michael Segel
It depends... There are some reasons to do this; however, in general you don't need to... The course is wrong to suggest this as a best practice. Sent from my iPhone On Jun 5, 2012, at 5:00 PM, Atif Khan atif_ijaz_k...@hotmail.com wrote: During a recent Cloudera course we were told

Re: Region autoSplit when not reach 'hbase.hregion.max.filesize' ?

2012-06-06 Thread Michael Segel
Just out of curiosity, describe the data? Sorted? The more we know, the easier it is to help... Also, can you recheck your math? Sent from my iPhone On Jun 6, 2012, at 6:17 PM, NNever nnever...@gmail.com wrote: It comes again. I truncate the table, and put about 10 million records into it last

Re: Of hbase key distribution and query scalability, again.

2012-05-26 Thread Michael Segel
, 2012, at 11:25 AM, Michael Segel wrote: Hi, Jumping in on this late... To cut a long story short, is the region size the only current HBase technique to balance load, esp. w.r.t. query load? Or perhaps there are some more advanced techniques to do that? So maybe I'm missing something but I

Re: hbase insert performance test (from hbasemaster and regionservers)

2012-05-21 Thread Michael Segel
Hi, Seems we just had someone talk about this just the other day... 1) 8GB of memory isn't enough to run both M/R and HBase. Ok, yes you can run it, however don't expect it to perform well. 2) You never want a user to run their own code from the cluster itself. Use an *edge* node. There's

Re: hbase insert performance test (from hbasemaster and regionservers)

2012-05-21 Thread Michael Segel
On Mon, May 21, 2012 at 4:30 PM, Michael Segel michael_se...@hotmail.com wrote: Hi, Seems we just had someone talk about this just the other day... 1) 8GB of memory isn't enough to run both M/R and HBase. Ok, yes you can run it, however don't expect it to perform well. 2) You never want

Re: forcing offline

2012-05-20 Thread Michael Segel
What did you see when you ran the HBase shell's status? Did you run status with higher detail? (see status help) On May 20, 2012, at 2:12 AM, Ben Cuthbert wrote: All We ran a load test and after about 3 hours our application stopped. Checking the logs I see this in the hbase-master log
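
For reference, the detail levels the 0.90-era shell offers:

  hbase> status             # summary: live servers, dead servers, average load
  hbase> status 'simple'    # per-server load
  hbase> status 'detailed'  # per-region detail; useful for spotting regions stuck in transition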

Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)

2012-05-19 Thread Michael Segel
The number of regions per RS has always been a good point of debate. There's a max number of 1500 (hardcoded); however, you'll see performance degrade before that limit. I've tried to set a goal of keeping the number of regions per RS down around 500-600 because I didn't have time to monitor

Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)

2012-05-19 Thread Michael Segel
diminishing returns. Regions aren't a logical notion; they correspond with physical files and buffers. Consider setting N to something like 500, that's my ballpark for reasonable, totally unscientific of course. - Andy On May 19, 2012, at 6:03 AM, Michael Segel michael_se...@hotmail.com

Re: EndPoint Coprocessor could be dealocked?

2012-05-18 Thread Michael Segel
which when solved will help with the evolution of HBase more as a Database than as a persistent object store. On May 17, 2012, at 7:38 PM, fding hbase wrote: Hi Michel, On Fri, May 18, 2012 at 1:39 AM, Michael Segel michael_se...@hotmail.comwrote: You should not let just any user run

Re: Garbage collection issues

2012-05-18 Thread Michael Segel
Head over to Cloudera's site and look at a couple of blog posts from Todd Lipcon. Also look at MSLABs. On a side note... you don't have a lot of memory to play with... On May 18, 2012, at 6:54 AM, Simon Kelly wrote: Hi Firstly, let me compliment the HBase team on a great piece of

Re: large machine configuration

2012-05-18 Thread Michael Segel
it help to allocate more memory for jobs like that? Yes, I have 12 cores also. Are there any HDFS/MR/Hbase tuning tips for this many processors? btw, 64GB is a lot for us :-) On Fri, May 11, 2012 at 7:29 AM, Michael Segel michael_se...@hotmail.comwrote: Funny, but this is part

Re: EndPoint Coprocessor could be dealocked?

2012-05-17 Thread Michael Segel
You should not let just any user run coprocessors on the server. That's madness. Best regards, - Andy Fei Ding, I'm a little confused. Are you trying to solve the problem of querying data efficiently from a table, or are you trying to find an example of where and when to use

Re: EndPoint Coprocessor could be dealocked?

2012-05-16 Thread Michael Segel
Ok... I think you need to step away from your solution and take a look at the problem from a different perspective. From my limited understanding of Co-processors, this doesn't fit well in what you want to do. I don't believe that you want to run a M/R query within a Co-processor. In short,

Re: EndPoint Coprocessor could be dealocked?

2012-05-16 Thread Michael Segel
I think we need to look at the base problem that is trying to be solved. I mean the discussion on the RPC mechanism, but the problem that the OP is trying to solve is how to use multiple indexes in a 'query'. Note: I put ' ' around query because it's a m/r job or a single thread where the

Re: EndPoint Coprocessor could be dealocked?

2012-05-16 Thread Michael Segel
? Cheers, Dave On Wed, May 16, 2012 at 12:07 PM, Michael Segel michael_se...@hotmail.com wrote: I think we need to look at the base problem that is trying to be solved. I mean the discussion on the RPC mechanism, but the problem that the OP is trying to solve is how to use multiple indexes

Re: large machine configuration

2012-05-11 Thread Michael Segel
Funny, but this is part of a talk that I submitted to Strata. 64GB and HBase isn't necessarily a 'large machine'. If you're running with 12 cores, you're talking about a minimum of 48GB just for M/R. (4GB a core is a good rule of thumb) Depending on what you want to do, you could set aside
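
Working the rule of thumb through with the numbers above (the OS/daemon split is illustrative):

  12 cores x 4GB/core = 48GB reserved for M/R task slots
  64GB - 48GB         = 16GB left for the OS, DataNode/TaskTracker daemons, and the RegionServer heap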

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-11 Thread Michael Segel
a table is splitting? On May 11, 2012, at 12:12 AM, Stack wrote: On Thu, May 10, 2012 at 6:26 AM, Michael Segel michael_se...@hotmail.com wrote: 4) google dfs.balance.bandwidthPerSec I believe it's also used by HBase when they need to move regions. Nah. This is an hdfs setting. HBase

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Michael Segel
Ok... So the issue is that you have a lot of regions on a region server, where the max file size is the default. On your input to HBase, you have a couple of issues. 1) Your data is most likely sorted. (Not good on inserts) 2) You will want to increase your region size from default (256MB) to
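
Raising the region size is an hbase-site.xml (or per-table) setting; a sketch with an illustrative 1GB value against the 256MB default of that era:

  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- 1GB; the 0.90 default was 268435456 (256MB) -->
    <value>1073741824</value>
  </property>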

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Michael Segel
problem and is false. Many mapreduce algorithms require a reduce phase (e.g. sorting). The fact that the output is written to HBase or somewhere else is irrelevant. -Dave On Thu, May 10, 2012 at 6:26 AM, Michael Segel michael_se...@hotmail.comwrote: [SNIP]

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Michael Segel
/event/osdi04/tech/full_papers/dean/dean.pdf On Thu, May 10, 2012 at 11:30 AM, Michael Segel michael_se...@hotmail.com wrote: Dave, do you really want to go there? OP has a couple of issues and he was going down a rabbit hole. (You can choose if that's a reference to 'the Matrix', Jefferson

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Michael Segel
performance... Not sure. But it's always something to check and think about. BTW, I did a quick read on your problem. You didn't say which release/version of HBase you were running -eran On Thu, May 10, 2012 at 9:59 PM, Michael Segel michael_se...@hotmail.com wrote: Sigh. Dave, I

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Michael Segel
the M/R was a 'distraction' to the issue at hand. Not to mention his flip response with the Google paper? On May 10, 2012, at 4:57 PM, Stack wrote: On Thu, May 10, 2012 at 11:59 AM, Michael Segel michael_se...@hotmail.com wrote: Sigh. Dave, I really think you need to think more about

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Michael Segel
Stack, Since you brought it up... http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink. Writing, it may make sense to avoid the reduce step and write yourself back into HBase from inside your map. You'd do this when your job does not need the sort and

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Michael Segel
looked at, you can refactor it to not use a reducer. I also think you may read a bit more into my posts than I intend. ;-) -Mike On May 10, 2012, at 10:28 PM, Stack wrote: On Thu, May 10, 2012 at 6:28 PM, Michael Segel michael_se...@hotmail.com wrote: That section was written by Doug after

Re: How to run two data nodes on one pc?

2012-05-09 Thread Michael Segel
Too small a machine. Better question. Why do you want to get 2 nodes on one machine? On May 9, 2012, at 1:49 PM, Marcos Ortiz wrote: Which is your current hardware configuration? 1- One way is to use two VMs in Vmware Workstation, Vmware Server, or Virtualbox, with similar configuration

Re: Hbase Quality Of Service: large standard deviation in insert time while inserting same type of rows in Hbase

2012-04-24 Thread Michael Segel
Have you thought about Garbage Collection? -Grover Sent from my iPhone On Apr 24, 2012, at 12:41 PM, Skchaudhary schoudh...@ivp.in wrote: I have a cluster Hbase set-up. In that I have 3 Region Servers. There is a table which has 27 Regions equally distributed among 3 Region servers--9

Re: HBase parallel scanner performance

2012-04-19 Thread Michael Segel
Narendra, I think you are still missing the point. 130 seconds to scan the table per iteration. Even if you have 10K rows, that's 130 * 10^4 = 1.3*10^6 seconds, or ~361 hours. Compare that to 10K rows where you then select a single row in your sub select that has a list of all of the associated rows.

Re: HBase parallel scanner performance

2012-04-19 Thread Michael Segel
the index. Thanks a lot for the insights. Narendra On Thu, Apr 19, 2012 at 9:56 PM, Michael Segel michael_se...@hotmail.com wrote: Narendra, I think you are still missing the point. 130 seconds to scan the table per iteration. Even if you have 10K rows 130 * 10^4 or 1.3*10^6 seconds

Re: Storing extremely large size file

2012-04-17 Thread Michael Segel
-1. It's a boring topic. And it's one of those things that you either get it right or you end up hiring a voodoo witch doctor to curse the author of the chapter... I agree with Jack; it's not difficult, it just takes some planning and forethought. Also reading lots of blogs... And some practice...

Re: Storing extremely large size file

2012-04-17 Thread Michael Segel
In theory, you could go as large as a region size minus the key and overhead. (rows can't span regions) Realistically you'd want to go much smaller. Sent from my iPhone On Apr 17, 2012, at 1:49 PM, Wei Shung Chung weish...@gmail.com wrote: What would be the max affordable size one could

Re: Speeding up HBase read response

2012-04-12 Thread Michael Segel
Uhm, Lets take a look back at the original post : I'm confused with a read latency I got, comparing to what YCSB team achieved and showed in their YCSB paper. They achieved throughput of up to 7000 ops/sec with a latency of 15 ms (page 10, read latency chart). I can't get throughput higher than

Re: Speeding up HBase read response

2012-04-12 Thread Michael Segel
anyway. /eot Best regards, - Andy On Apr 11, 2012, at 11:04 PM, Michael Segel michael_se...@hotmail.com wrote: Uhm, Lets take a look back at the original post : I'm confused with a read latency I got, comparing to what YCSB team achieved and showed in their YCSB paper

Re: Speeding up HBase read response

2012-04-06 Thread Michael Segel
Was the YCSB test also run on Amazon? Sent from my iPhone On Apr 6, 2012, at 10:18 AM, ijanitran taz...@yahoo.com wrote: I have 4 nodes HBase v0.90.4-cdh3u3 cluster deployed on Amazon XLarge instances (16Gb RAM, 4 cores CPU) with 8Gb heap -Xmx allocated for HRegion servers, 2Gb for

Re: gc pause killing regionserver

2012-03-21 Thread Michael Segel
Do you really need to go to 2048 or 4096 xcievers? Lars George just wrote a blog on it... it's on the Cloudera site. This formula is used to calculate the number of xcievers for HBase. Since this number is usually calculated when building a system, you're going to have to estimate this and
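
The setting in question lives in hdfs-site.xml (and yes, the property name really is spelled 'xcievers'); a sketch using the 4096 figure mentioned above:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <!-- each open file/block consumes a DataNode thread; a restart is needed to pick this up -->
    <value>4096</value>
  </property>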

Re: Rows vs. Columns

2012-03-20 Thread Michael Segel
Yes, Currently if one of the column families causes a split, then all of the column families get split. So if you are dealing with a large blob, you're going to shoot yourself in the foot. Are you filtering on any of the values in the 'info' family? If not, you could try creating a serialized

Re: Rows vs. Columns

2012-03-20 Thread Michael Segel
Why not make your properties a map object? On Mar 20, 2012, at 4:32 AM, Qian Ye wrote: I think the average number of properties users would add to a specific page should be estimated. I guess, about 99.9% pages would not be associated with too many properties. The others can be handled with

Silly question...

2012-03-20 Thread Michael Segel
What happens if you apply a row lock to a row in the .META. table? It's 5:00 am my local time and I was thinking about solving a problem. (Again, thinking this early in the morning without the aid of caffeine is not a good idea.) :-) Does RLL lock just updates or does it stop all access to

Re: IO problem

2012-03-14 Thread Michael Segel
Sounds like your CPU is blocked waiting on your disks. What does your cluster look like? How many cores per node? How many spindles? On Mar 14, 2012, at 1:08 AM, raghavendhra rahul wrote: Hi, I'm running coprocessor aggregation for some million rows. During execution the CPU is waiting for

Re: gc pause killing regionserver

2012-03-08 Thread Michael Segel
Hey do that, things go boom. :-) Before you do that I would suggest running top and seeing if there is any swapping occurring. Sent from my iPhone On Mar 8, 2012, at 4:29 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: When real cpu is bigger than user cpu it very often points to

Re: What about storing binary data(e.g. images) in HBase?

2012-03-05 Thread Michael Segel
Just a couple of things... MapR doesn't have the NN limitations. So if your design requires lots of small files, look at MapR... You could store your large blobs in a sequence file or series of sequence files using HBase to store the index. Sort of a hybrid approach. Sent from my iPhone On Mar

Re: What about storing binary data(e.g. images) in HBase?

2012-03-04 Thread Michael Segel
It depends on your use case. You can store binary data in HBase... Sent from my iPhone On Mar 4, 2012, at 2:15 AM, Konrad Tendera kon...@tendera.eu wrote: Hello, I'm wondering whether it's worth storing my binary data in HBase? I've read lots of articles and presentations which say that it

Re: multiple puts in reducer?

2012-02-28 Thread Michael Segel
The better question is why would you need a reducer? That's a bit cryptic, I understand, but you have to ask yourself when do you need to use a reducer when you are writing to a database... ;-) Sent from my iPhone On Feb 28, 2012, at 10:14 AM, T Vinod Gupta tvi...@readypulse.com wrote:

RE: hbase and consulting

2012-02-24 Thread Michael Segel
LOL... You don't have to live on the West Coast. ;-) But to JDC's point... it depends on how good you are. Although I don't know if I'd base my career on a product or a specific niche like that. While there is a premium for good talented developers, architects, etc... Over time the horde

Re: PerformanceEvaluation results

2012-02-01 Thread Michael Segel
No. What tuning did you do? Why such a small cluster? Sorry, but when you start off with a bad hardware configuration, you can get Hadoop/HBase to work, but performance will always be sub-optimal. Sent from my iPhone On Feb 1, 2012, at 6:52 AM, Tim Robertson timrobertson...@gmail.com wrote:

Re: Speeding up Scans

2012-01-25 Thread Michael Segel
I'm confused... You mention that you are hashing your key, and you want to do a scan with a start and stop value? Could you elaborate? With respect to hashing, if you use a SHA-1 hash, your values will be unique. (you talked about rehashing...) Sent from my iPhone On Jan 25, 2012, at 7:56 AM,

Re: HBase schema question

2012-01-21 Thread Michael Segel
You don't do joins. Sorry, but you need to put this in perspective... You need to get really drunk and with the next morning's hangover you need to look at HBASE as HBASE and do not think in terms of a relational schema. Having said that, you can do joins, however they are tricky to do

RE: Question about HBase for OLTP

2012-01-12 Thread Michael Segel
Subject: Re: Question about HBase for OLTP From: mdcal...@gmail.com To: user@hbase.apache.org On Mon, Jan 9, 2012 at 4:37 PM, Michael Segel michael_se...@hotmail.com wrote: Ok.. Look, here's the thing... HBase has no transactional support. OLTP systems like PoS systems, Hotel

RE: Question about HBase for OLTP

2012-01-09 Thread Michael Segel
All, Just my $0.02 worth of 'expertise'... 1) Just because you can do something doesn't mean you should. 2) One should always try to use the right tool for the job regardless of your 'fashion sense'. 3) Just because someone says Facebook or Yahoo! does X, doesn't mean it's a good idea, or

RE: Question about HBase for OLTP

2012-01-09 Thread Michael Segel
Uhmmm. Well... It depends on your data and what you want to do... Can you fit all of the data into a single row? Does it make sense to use a sequence file for the raw data and then use HBase to maintain indexes? Just some food for thought. From: t...@cloudera.com Date: Mon, 9 Jan 2012

RE: Question about HBase for OLTP

2012-01-09 Thread Michael Segel
are currently working on HBase Snapshots to allow disaster recovery with HBase alone, but you shouldn't hedge bets on it being completed within your timeframe. On 1/9/12 2:31 PM, Michael Segel michael_se...@hotmail.com wrote: All, Just my $0.02 worth of 'expertise'... 1) Just

RE: Question about HBase for OLTP

2012-01-09 Thread Michael Segel
with that and for the record that's pointed out here... http://hbase.apache.org/book.html#arch.overview ... with the section When Should I Use HBase. I'll add something about (lack of) transactional support in there as well. On 1/9/12 7:37 PM, Michael Segel michael_se...@hotmail.com

RE: copy job for mapreduce failing due to large rows

2012-01-09 Thread Michael Segel
Uhmm... You're copying data from Table A back to Table A? Ok... you really want to disable your caching altogether and make sure each row as you write it is committed to the table. Try that... it will hurt your performance, but it may keep you afloat. HTH -Mike You've got a scanner and
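
A sketch of those two knobs with the 0.90-era client API (the table name is a placeholder):

  HTable table = new HTable(conf, "A");
  table.setAutoFlush(true);   // commit each Put immediately instead of buffering in the client

  Scan scan = new Scan();
  scan.setCaching(1);         // fetch one row per RPC so a single oversized row can't exhaust the heap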

Re: No. of families

2011-12-30 Thread Michael Segel
Hierarchical data doesn't necessarily have anything to do with column families. You can do a hierarchical model in a single column family. It's pretty straightforward. Sent from my iPhone On Dec 30, 2011, at 6:34 PM, Imran M Yousuf imyou...@gmail.com wrote: Hi, Rather than addressing the

RE: Creating columns within columns

2011-12-15 Thread Michael Segel
Hi Mohammad, It sounds like you want to implement a hierarchical data model within HBase. You can do this, albeit there are some drawbacks... In terms of drawbacks... The best example that I can think of is implementing a point of sale solution in Dick Pick's Revelation system. Here you store

RE: Creating columns within columns

2011-12-15 Thread Michael Segel
Mohammad, I'm tight on time... Short answer... Strip out the XML into some object and then consider using Avro to write the object to HBase. This could probably shrink your footprint per record/row. Note: I don't know anything about your data so you really have to take what I say with a
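
A sketch of the Avro idea, assuming a generated Avro class (Record here is hypothetical) and the Avro 1.5-era encoder API:

  // imports: java.io.ByteArrayOutputStream, org.apache.avro.io.*,
  //          org.apache.avro.specific.SpecificDatumWriter,
  //          org.apache.hadoop.hbase.client.Put, org.apache.hadoop.hbase.util.Bytes
  ByteArrayOutputStream out = new ByteArrayOutputStream();
  DatumWriter<Record> writer = new SpecificDatumWriter<Record>(Record.class);
  BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
  writer.write(record, encoder);   // record holds the fields stripped out of the XML
  encoder.flush();

  Put put = new Put(rowKey);
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("doc"), out.toByteArray());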

Re: Installing HBase

2011-12-07 Thread Michael Segel
First don't use AIX. It's nothing against IBM, except that you will find that you are going to run unsupported unless you run the apache release and IBM is selling you support and you will end up with a derivative. Second, convince a bunch of open source contributors to switch to ksh. (good

RE: Schema design question - Hot Key concerns

2011-11-18 Thread Michael Segel
Not sure if you'd consider this a 'big data' problem. First, IMHO you're better off serving this out of a relational model. Having said that, 'Hot Row' as in reads isn't a bad thing since it's in cache. 'Hot Row' as in updates... not really a good thing since you have to lock the row to

RE: Region has been OPENING for too long

2011-10-31 Thread Michael Segel
Matthew, What did you set your max region size to be for this table? 14K files totalling 650GB means you have a lot of small files... On average ~45MB (rough calc). How many regions? Do you have mslabs set up? (GC tuning?) Sorry for jumping in on the end of this conversation. -Mike From:

RE: Best way to write to multiple tables in one map-only job

2011-10-04 Thread Michael Segel
One other option... Your map() method has a null writable and you handle the put() to the table(s) yourself within the map() method. You can also set the autoflush within your job too. Date: Tue, 4 Oct 2011 16:20:25 +0200 From: christopher.dor...@gmail.com To: user@hbase.apache.org
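
A sketch of that pattern against the 0.90-era API: the mapper emits nothing and writes each table directly (table and column names are placeholders):

  // imports: java.io.IOException, org.apache.hadoop.hbase.client.*,
  //          org.apache.hadoop.hbase.io.ImmutableBytesWritable,
  //          org.apache.hadoop.hbase.mapreduce.TableMapper,
  //          org.apache.hadoop.hbase.util.Bytes, org.apache.hadoop.io.NullWritable
  public class TwoTableMapper extends TableMapper<NullWritable, NullWritable> {
    private HTable tableA, tableB;

    protected void setup(Context context) throws IOException {
      tableA = new HTable(context.getConfiguration(), "tableA");
      tableB = new HTable(context.getConfiguration(), "tableB");
      tableA.setAutoFlush(false);   // buffer puts client-side; flushed when the table is closed
      tableB.setAutoFlush(false);
    }

    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException {
      Put put = new Put(row.get());
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), value.value());
      tableA.put(put);   // route each Put to whichever table(s) it belongs in
      tableB.put(put);
    }

    protected void cleanup(Context context) throws IOException {
      tableA.close();    // close() flushes the write buffer
      tableB.close();
    }
  }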

RE: Dynamic addition of RegionServer

2011-09-20 Thread Michael Segel
First, I'd suggest switching to DNS. From: stutiawas...@hcl.com To: user@hbase.apache.org Date: Tue, 20 Sep 2011 20:42:46 +0530 Subject: RE: Dynamic addition of RegionServer Hi , I was able to add region server dynamically in running cluster . But this happens only when hostname of

RE: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

2011-09-16 Thread Michael Segel
Sonal, Just because you have a m/r job doesn't mean that you need to reduce anything. You can have a job that contains only a mapper. Or your job runner can have a series of map jobs in serial. Most if not all of the map/reduce jobs where we pull data from HBase don't require a reducer. To
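
The job wiring for a mapper-only HBase job looks like this with the 0.90-era TableMapReduceUtil (table and class names are placeholders):

  Job job = new Job(conf, "map only example");
  Scan scan = new Scan();
  scan.setCaching(500);
  scan.setCacheBlocks(false);   // don't churn the block cache with a full scan
  TableMapReduceUtil.initTableMapperJob("source", scan, MyMapper.class,
      ImmutableBytesWritable.class, Put.class, job);
  TableMapReduceUtil.initTableReducerJob("sink", null, job);   // null reducer: just TableOutputFormat
  job.setNumReduceTasks(0);     // no sort/shuffle; each mapper's Puts go straight to 'sink'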

RE: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

2011-09-16 Thread Michael Segel
with HBase. -chris On Sep 16, 2011, at 11:43 AM, Michael Segel wrote: Sonal, You do realize that HBase is a database, right? ;-) So again, why do you need a reducer? ;-) Using your example... Again, there will be many cases where one may want a reducer, say trying to count

RE: Writing MR-Job: Something like OracleReducer, JDBCReducer ...

2011-09-16 Thread Michael Segel
load the results back in. Definitely specialized usage, but I could see other perfectly valid uses for reducers with HBase. -chris On Sep 16, 2011, at 11:43 AM, Michael Segel wrote: Sonal, You do realize that HBase is a database, right? ;-) So again, why do you need a reducer

RE: Unassigned holes in tables

2011-09-16 Thread Michael Segel
What happens after a compaction is run on the table? From: vidhy...@yahoo-inc.com To: hbase-u...@hadoop.apache.org Date: Fri, 16 Sep 2011 13:35:35 -0700 Subject: Unassigned holes in tables This must possibly be a common occurrence but we noticed through hbck some regions (5 out of

RE: Should I use HBASE?

2011-09-14 Thread Michael Segel
I realize that this is an HBase group, however nothing in the stated problem would suggest that an RDBMs couldn't handle the problem. Inserting 10K rows every 5 minutes poses a challenge to the database? I guess it would be a challenge based on the size and type of data along with the

RE: Should I use HBASE?

2011-09-14 Thread Michael Segel
not (say, if you didn't really need that power and a higher level, more abstract tool set like a relational database would suffice). Ian On Sep 14, 2011, at 1:17 PM, Michael Segel wrote: I realize that this is an HBase group, however nothing in the stated problem would suggest

RE: HBase Meetup during Hadoop World NYC '11

2011-09-06 Thread Michael Segel
From: doug.m...@explorysmedical.com To: user@hbase.apache.org Date: Tue, 6 Sep 2011 09:42:07 -0400 Subject: Re: HBase Meetup during Hadoop World NYC '11 Explorys is sending a few, so +3 or so. Oh! So you are now forcing your employees to drink copious amounts of beer and eat lots

RE: HBase and Cassandra on StackOverflow

2011-09-05 Thread Michael Segel
on StackOverflow To: user@hbase.apache.org From: Michael Segel michael_se...@hotmail.com Can't we just all get along? :-) My personal introduction to Cassandra came maybe in the 2009 timeframe. We evaluated it and HBase at the time and chose HBase. No point to discuss why, the world has

RE: HBase and Cassandra on StackOverflow

2011-09-01 Thread Michael Segel
Date: Thu, 1 Sep 2011 15:13:13 -0700 Subject: Re: HBase and Cassandra on StackOverflow From: timelessn...@gmail.com To: user@hbase.apache.org [BIG SNIP] While you guys are going back and forth... a simple reminder. Not everyone has the same base level of experience so their ability to

RE: Real time dynamic data and hbase

2011-08-30 Thread Michael Segel
I don't understand why you're having trouble with this. You have a simple geo location search based on zip and then a product and inventory count. I mean it's not really geo-spatial because you're searching based on zip code. So you don't need to worry about any sort of geospatial or geodetic

RE: Real time dynamic data and hbase

2011-08-30 Thread Michael Segel
You still need to organize your vendors by delivery zip. Which gets very ugly when you try product code + grocerCode. Even doing something like zip+product+vendor as your key gets you a lot of rows. This will work, where you have columns for price, qty on hand, sku, etc... The problem gets

RE: Versioning

2011-08-26 Thread Michael Segel
Sean, You wrote the following: But sometimes, we do need to save multiple versions of values, such as logging events, or messages of Facebook. In these cases, what is the trade off between saving them in different rows, and in different versions of one row? You're not updating logging

RE: how to make tuning for hbase (every couple of days hbase region sever/s crashe)

2011-08-23 Thread Michael Segel
I won't say you're crazy but .5 GB per mapper? I would say tune conservatively like you are suggesting 1GB for OS, but also I'd suggest tuning to 80% utilization instead of 100% utilization. From: buttl...@llnl.gov To: user@hbase.apache.org Date: Tue, 23 Aug 2011 16:35:22 -0700 Subject:

RE: question on HTablePool and threads

2011-08-23 Thread Michael Segel
Sujee, You are correct in creating a separate HTable instance in each thread. (HTable isn't thread safe, but since the scope is within the thread it works.) You could use the HTablePool class, but I don't think it's a better solution for what you are doing. In your example it sounds like

RE: M/R vs hbase problem in production

2011-08-15 Thread Michael Segel
It could be that it's the results from the reducer. My guess is that he's got an issue where he's overextending his system. Sounds like a tuning issue. How much memory on the system? What's being used by HBase? How many reducers, how many mappers? How large is the cache on DN, and how much

RE: loading data in HBase table using APIs

2011-08-04 Thread Michael Segel
Uhm, silly question... Why would you ever need a reduce step when you're writing to an HBase table? Now I'm sure that there may be some fringe case, but in the past two years, I've never come across a case where you would need to do a reducer when you're writing to HBase. So what am I

RE: Something like Execution Plan as in the RDBMS world?

2011-08-04 Thread Michael Segel
Tomas, If I understand you correctly you have a row key of A,B,C and you want to fetch only the rows on A and C. You can use a start row of A and an end row of A1, so that you get the first row for the given vehicle_id, and then stop when the vehicle_id changes. You would then have to
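
In code, with the 0.90-era client API (the literal A/A1 keys follow the example above):

  Scan scan = new Scan();
  scan.setStartRow(Bytes.toBytes("A"));   // first possible key for this vehicle_id
  scan.setStopRow(Bytes.toBytes("A1"));   // stop row is exclusive: the scan ends once the id changes
  ResultScanner scanner = table.getScanner(scan);
  try {
    for (Result r : scanner) {
      // only rows whose key starts with A come back; pick out the C values client-side
    }
  } finally {
    scanner.close();
  }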

RE: Design/Schema questions

2011-07-26 Thread Michael Segel
On Tue, Jul 26, 2011 at 7:39 AM, Mark static.void@gmail.com wrote: So my first question is, would HBase fit our use case? If not can anyone offer some advice on what would/should be used? You mean HBase as the sink for your log emitters? The pattern I usually see is that there

RE: hbase table as a queue.

2011-07-19 Thread Michael Segel
I'm not sure how they are doing this, but just a quick thought... You can increase the file size to 1-2GB, as an example, and then run compactions on a regular basis to clean up rows deleted from the queue. This will stop the table from splitting. The assumption is that your MAX_FILESIZE is much
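
One way to apply both pieces of that from the 0.90-era admin API (the table name is a placeholder; 2GB is from the 1-2GB range above):

  HBaseAdmin admin = new HBaseAdmin(conf);
  admin.disableTable("queue");
  HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("queue"));
  desc.setMaxFileSize(2L * 1024 * 1024 * 1024);    // 2GB, well above what the queue should ever hold
  admin.modifyTable(Bytes.toBytes("queue"), desc);
  admin.enableTable("queue");

  // run on a schedule (e.g. from cron) to physically drop rows deleted from the queue:
  admin.majorCompact("queue");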

RE: HBase reading performance

2011-07-18 Thread Michael Segel
Are you doing scans or are you doing get() with a known key? There's a big difference and scans are very expensive. You also don't talk about your hardware. How much memory, how many cores per node, how you have your m/r configured (even if you're not running a m/r job, you still have to

RE: HBase backup and outage scenarios in practice?

2011-07-15 Thread Michael Segel
for the backup question. On 7/14/11 12:14 PM, Michael Segel michael_se...@hotmail.com wrote: Not sure what you read in Otis' blog but pretty sure it's out of date. Check out MapR stuff. Sent from my Palm Pre on AT&T On Jul 14, 2011 6:57 AM, Steinmaurer Thomas <

RE: Hash indexing of HFiles

2011-07-15 Thread Michael Segel
Claudio, I'm not sure on how to answer this... Yes, we've got a prototype of Lucene on HBase with spatial support that we're starting to test. With respect to hashing... In one project we just hashed the key using the SHA-1 hash already in Java. This gave us the randomness without having to try to
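
The JDK-only version of that hashing, as a sketch:

  import java.security.MessageDigest;
  import java.security.NoSuchAlgorithmException;

  byte[] hashKey(byte[] naturalKey) throws NoSuchAlgorithmException {
    // 20-byte SHA-1 digest: spreads writes evenly across regions,
    // at the cost of meaningful range scans on the original key
    return MessageDigest.getInstance("SHA-1").digest(naturalKey);
  }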

Re: HBase backup and outage scenarios in practice?

2011-07-14 Thread Michael Segel
Not sure what you read in Otis' blog but pretty sure it's out of date. Check out MapR stuff. Sent from my Palm Pre on AT&T On Jul 14, 2011 6:57 AM, Steinmaurer Thomas <thomas.steinmau...@scch.at> wrote: Hello, we are currently evaluating HBase for a project. In respect to

RE: descaling hbase

2011-06-28 Thread Michael Segel
Yeah, but you don't want to drop all of the machines at the same time. When you decommission a node, you need to give the cluster time to rebalance before dropping a second node. That is, of course, unless you don't mind losing data. :-) Date: Tue, 28 Jun 2011 10:33:39 -0700 Subject: Re:

RE: Does anybody enable MSLAB in production system? I am not sure if it's stable enough for production system?

2011-06-25 Thread Michael Segel
Yes, it's stable enough. Date: Sat, 25 Jun 2011 02:09:06 + From: jack.zhangj...@huawei.com Subject: Does anybody enable MSLAB in production system? I am not sure if it's stable enough for production system? To: hbase-u...@hadoop.apache.org

RE: How to count the number of columns in a row

2011-06-22 Thread Michael Segel
Unfortunately no. Columns may or may not exist, and it's on a per-row basis. You can write a simple map job (no reduce) to use dynamic counters to determine the unique names of the columns and in how many rows these columns exist. You could also keep track of the number of columns per row, the
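
A sketch of that counter-based map job (0.90-era API; the counter group names are made up). Note that MapReduce caps the number of distinct counters per job, so this only works while the set of column names stays modest:

  // imports: org.apache.hadoop.hbase.KeyValue, org.apache.hadoop.hbase.client.Result,
  //          org.apache.hadoop.hbase.io.ImmutableBytesWritable,
  //          org.apache.hadoop.hbase.mapreduce.TableMapper,
  //          org.apache.hadoop.hbase.util.Bytes, org.apache.hadoop.io.NullWritable
  public class ColumnCounter extends TableMapper<NullWritable, NullWritable> {
    protected void map(ImmutableBytesWritable row, Result value, Context context) {
      for (KeyValue kv : value.raw()) {
        String column = Bytes.toString(kv.getFamily()) + ":" + Bytes.toString(kv.getQualifier());
        context.getCounter("columns", column).increment(1);   // one dynamic counter per distinct column
      }
      context.getCounter("rows", "total").increment(1);
    }
  }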

RE: Hbase Hardware requirement

2011-06-07 Thread Michael Segel
And even that recommendation isn't right. ;-) I think Sandy Bridge and SolarFlare are changing some of the design considerations. Date: Tue, 7 Jun 2011 10:32:58 +0200 Subject: Re: Hbase Hardware requirement From: timrobertson...@gmail.com To: user@hbase.apache.org

RE: How to efficiently join HBase tables?

2011-06-02 Thread Michael Segel
the cartesian product. This allows you to inject whatever cleverness you need at this point. Common kinds of cleverness include down-sampling of problematically large sets of candidates. On Tue, May 31, 2011 at 11:56 AM, Michael Segel michael_se...@hotmail.com wrote

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
Eran, You want to join two tables? The short answer is to use a relational database to solve that problem. Longer answer: You're using HBase so you don't need to think in terms of a reducer. You can create a temp table for your query. You can then run one map job to scan and filter table A,

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
for the lookups instead. So you'd hold onto a batch of records in the Mapper and then the batch size is filled, then you do the lookups (and then any required emitting, etc.). -Original Message- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Tuesday, May 31, 2011 10:56

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
From: doug.m...@explorysmedical.com To: user@hbase.apache.org Date: Tue, 31 May 2011 15:39:14 -0400 Subject: RE: How to efficiently join HBase tables? Re: Didn't see a multi-get... This is what I'm talking about...

RE: HBase Transaction per second in Map-Reduce

2011-05-24 Thread Michael Segel
Himanish, Are we talking about an African or European Swallow? (Sorry, it's a reference to the Monty Python movie scene where they cross the bridge after being asked 3 questions which they must answer correctly. [What's the forward air speed velocity of an unladen swallow?]) The point is that

RE: HBase Transaction per second in Map-Reduce

2011-05-24 Thread Michael Segel
Sorry, it's been one of those days. From: michael_se...@hotmail.com To: user@hbase.apache.org Subject: RE: HBase Transaction per second in Map-Reduce Date: Tue, 24 May 2011 14:18:28 -0500 Himanish, Are we talking about an African or European

RE: mslab enabled jvm crash

2011-05-23 Thread Michael Segel
Besides this... JRE version: 6.0_17-b17 Just a silly question ... What happens if you double the zookeeper time out to 120 seconds? Also I'm going to assume that you're not running your ZK on the same nodes as your data nodes, but you know what they say about assumptions... From:
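
The 120-second experiment translates to this hbase-site.xml property (value in milliseconds):

  <property>
    <name>zookeeper.session.timeout</name>
    <!-- 120s; gives a GC-pausing RegionServer longer before ZK declares it dead -->
    <value>120000</value>
  </property>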

RE: How to speedup Hbase query throughput

2011-05-17 Thread Michael Segel
Sorry to jump in on the tail end. What do you mean when you say the key is generated randomly? I mean, are you using a key and then applying a SHA-1 hash? Which node is serving your -ROOT- and META tables? Have you applied the GC hints recommended by Todd L in his blog? Also you said: ' And

RE: Hardware configuration

2011-05-02 Thread Michael Segel
Hi, That's actually a really good question. Unfortunately, the answer isn't really simple. You're going to need to estimate your growth and you're going to need to estimate your configuration. Suppose I know that within 2 years, the amount of data that I want to retain is going to be 1PB,

RE: Hardware configuration

2011-05-02 Thread Michael Segel
a 50TB machine take, a day, a week, longer? /Ian Architect / Mgr - Novell Vibe On 05/02/2011 09:57 AM, Michael Segel wrote: Hi, That's actually a really good question. Unfortunately, the answer isn't really simple. You're going to need to estimate your growth and you're going

RE: one of our datanodes stops working after few hours

2011-05-01 Thread Michael Segel
What's your xceivers set to? What's the ulimit -n set for hdfs/hadoop user... (You didn't say which release/version you were using.) Date: Sun, 1 May 2011 17:47:18 -0700 Subject: one of our datanodes stops working after few hours From: magn...@gmail.com To: user@hbase.apache.org I

RE: Impacting the Hbase load balancer

2011-04-21 Thread Michael Segel
Felix, You're going to want to upgrade to CDH3u0 for two main reasons: 1) There was a bug in the HBase Load Balancer that was fixed in 90.2 but Todd said that they were going to back port it to 90.1 2) There's another bug in the WAL that is fixed in 0.90 (CDH3B4) 0.89 is much better than
