Re: 2 bucket caches?

2015-06-29 Thread Michael Segel
are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: How to make the client fast fail

2015-06-22 Thread Michael Segel
to be thrown from the HBase client itself. On Thu, Jun 11, 2015 at 5:16 AM, Michael Segel wrote: threads? So that regardless of your hadoop settings, if you want something faster, you can use one thread for a timer and then the request is in another. So if you hit your timeout before you

Re: Fix Number of Regions per Node ?

2015-06-22 Thread Michael Segel
etc for any table created ? Thanks, Rahul -- Thanks Regards, Anil Gupta The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: How to make the client fast fail

2015-06-16 Thread Michael Segel
time out instead of doing this I want the timeout or some exception to be thrown from the HBase client itself. On Thu, Jun 11, 2015 at 5:16 AM, Michael Segel michael_se...@hotmail.com wrote: threads? So that regardless of your hadoop settings, if you want something faster, you can use one

Re: How to make the client fast fail

2015-06-16 Thread Michael Segel
. On Thu, Jun 11, 2015 at 5:16 AM, Michael Segel wrote: threads? So that regardless of your hadoop settings, if you want something faster, you can use one thread for a timer and then the request is in another. So if you hit your timeout before you get a response, you can stop your thread

Re: Hbase: TransactionManager: Create table

2015-06-12 Thread Michael Segel
TM == Trade Mark On Jun 12, 2015, at 11:55 AM, hariharan_sethura...@dell.com hariharan_sethura...@dell.com wrote: The article starts with Apache HBase (TM)) - does it stand for Transaction Manager? Apache HBase (TM) is not an ACID compliant database ... -Original Message-

Re: Iterate hbase resultscanner

2015-06-10 Thread Michael Segel
When in doubt, printf() can be your friend. Yeah, it’s primitive (old school) but effective. Then you will know what you’re adding to your list for sure. On Jun 10, 2015, at 12:39 PM, beeshma r beeshm...@gmail.com wrote: HI Devaraj Thanks for your suggestion. Yes i coded like this as

Re: How to make the client fast fail

2015-06-10 Thread Michael Segel
threads? So that regardless of your hadoop settings, if you want something faster, you can use one thread for a timer and then the request is in another. So if you hit your timeout before you get a response, you can stop your thread. (YMMV depending on side effects… ) On Jun 10, 2015, at
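For reference, a minimal sketch of the timer-in-one-thread, request-in-another idea described in this thread, assuming the HBase 1.x client API; the wrapper class and method names are hypothetical, not an HBase feature:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class FastFailGet {
    // Run the blocking RPC in a worker thread; the calling thread acts as the timer.
    public static Result getWithDeadline(final Table table, final Get get, long timeoutMs)
            throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<Result> pending = worker.submit(new Callable<Result>() {
            @Override
            public Result call() throws Exception {
                return table.get(get);       // blocks according to the HBase client settings
            }
        });
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException te) {
            pending.cancel(true);            // stop waiting; side effects may vary (YMMV)
            throw te;
        } finally {
            worker.shutdownNow();
        }
    }
}
```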

Re: Hbase vs Cassandra

2015-06-01 Thread Michael Segel
Well since you brought up coprocessors… let’s talk about a lack of security and stability that’s been introduced by coprocessors. ;-) I’m not saying that you don’t want server side extensibility, but you need to recognize the risks introduced by coprocessors. On May 31, 2015, at 3:32 PM,

Re: Hbase vs Cassandra

2015-06-01 Thread Michael Segel
Saying Ambari rules is like saying that you like to drink MD 20/20 and calling it a fine wine. Sorry to all the Hortonworks guys but Ambari has a long way to go…. very immature. What that has to do with Cassandra vs HBase? I haven’t a clue. The key issue is that unless you need or want to

Re: Hbase vs Cassandra

2015-06-01 Thread Michael Segel
not a stand alone product or system. Hello, what is the use case of a big data application w/o Hadoop? -Vlad On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel michael_se...@hotmail.com wrote: Saying Ambari rules is like saying that you like to drink MD 20/20 and calling it a fine wine. Sorry

Re: avoiding hot spot for timestamp prefix key

2015-05-22 Thread Michael Segel
This is why I created HBASE-12853. So you don’t have to specify a custom split policy. Of course the simple solutions are often passed over because of NIH. ;-) To be blunt… You encapsulate the bucketing code so that you have a single API into HBase regardless of the type of storage
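A rough illustration of encapsulating the bucketing behind one read API, assuming writes were salted with a single-byte bucket prefix and the HBase 1.x client; the BucketedScanner class and its bucket count are hypothetical:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketedScanner {
    private final Table table;
    private final int numBuckets;    // must match the bucket count used on write

    public BucketedScanner(Table table, int numBuckets) {
        this.table = table;
        this.numBuckets = numBuckets;
    }

    // Fan one logical range scan out to one physical scan per bucket prefix,
    // so callers never see the salting scheme. start/stop are the unsalted bounds.
    public List<Result> scanRange(byte[] start, byte[] stop) throws IOException {
        List<Result> merged = new ArrayList<Result>();
        for (int b = 0; b < numBuckets; b++) {
            byte[] prefix = new byte[] { (byte) b };
            Scan scan = new Scan(Bytes.add(prefix, start), Bytes.add(prefix, stop));
            try (ResultScanner rs = table.getScanner(scan)) {
                for (Result r : rs) {
                    merged.add(r);
                }
            }
        }
        return merged;   // callers may still want to re-sort on the unsalted key
    }
}
```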

Re: Optimizing compactions on super-low-cost HW

2015-05-22 Thread Michael Segel
Look, to be blunt, you’re screwed. If I read your cluster spec… it sounds like you have a single i7 (quad core) CPU. That’s 4 cores or 8 threads. Mirroring the OS is common practice. Using the same drives for Hadoop… not so good, but once the server boots up… not so much I/O. It’s not good,

Re: Getting intermittent errors while insertind data into HBase

2015-05-21 Thread Michael Segel
Why Spring? Why a DAO? I’m not suggesting that using Spring or a DAO is wrong, however, you really should justify it. Since it looks like you’re trying to insert sensor data (based on the naming convention), what’s the velocity of the inserts? Are you manually flushing commits or are you

Re: Scan vs Get

2015-05-19 Thread Michael Segel
C’mon, really? Do they really return the same results? Let me put it this way… are you walking through the same code path? On May 19, 2015, at 10:34 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Are not Scan and Gets supposed to be almost as fast? I have a pretty small

Re: MR against snapshot causes High CPU usage on Datanodes

2015-05-13 Thread Michael Segel
Without knowing your exact configuration… The High CPU may be WAIT IOs, which would mean that your CPU is waiting for reads from the local disks. What’s the ratio of cores (physical) to disks? What type of disks are you using? That’s going to be the most likely culprit. On May 13,

Re: MR against snapshot causes High CPU usage on Datanodes

2015-05-13 Thread Michael Segel
version if I can find something. cores / disks == 24 / 12 or 40 / 12. We are using 10K sata drives on our datanodes. Rahul On Wed, May 13, 2015 at 10:00 AM, Michael Segel michael_se...@hotmail.com wrote: Without knowing your exact configuration… The High CPU may be WAIT IOs

Re: Regions and Rowkeys

2015-05-12 Thread Michael Segel
Yeah, it’s about time. What a slacker! :-P On May 11, 2015, at 6:56 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: This? http://shop.oreilly.com/product/0636920033943.do 2015-05-11 18:55 GMT-04:00 Michael Segel michael_se...@hotmail.com: Why would you expect to have a region

Re: Mapping Over Cells

2015-05-11 Thread Michael Segel
at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Regions and Rowkeys

2015-05-11 Thread Michael Segel
, Arun The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: How to Restore the block locality of a RegionServer ?

2015-05-09 Thread Michael Segel
thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: MapReduce on Sanpshots

2015-05-08 Thread Michael Segel
files. I think that says it all. Do you really want to open up your HBase snapshots to anyone? The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: RowKey hashing in HBase 1.0

2015-05-06 Thread Michael Segel
working from different assumptions? On Tue, May 5, 2015 at 4:46 PM, Michael Segel michael_se...@hotmail.com wrote: Yes, what you described mod(hash(rowkey),n) where n is the number of regions will remove the hotspotting issue. However, if your key is sequential you will only have regions
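For reference, a sketch of the mod(hash(rowkey), n) prefix mentioned above, assuming n fits in a single salt byte (n ≤ 256); the helper class and method names are illustrative only:

```java
import java.util.Arrays;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeys {
    // Prefix the natural key with mod(hash(rowkey), n) so sequential keys
    // spread across n key ranges instead of hammering a single region.
    public static byte[] salt(byte[] rowKey, int n) {
        int bucket = (Arrays.hashCode(rowKey) & 0x7fffffff) % n;
        return Bytes.add(new byte[] { (byte) bucket }, rowKey);
    }

    public static Put saltedPut(byte[] rowKey, int n) {
        return new Put(salt(rowKey, n));   // note: range scans now need one scan per bucket
    }
}
```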

Re: RowKey hashing in HBase 1.0

2015-05-05 Thread Michael Segel
situation where I don't need range scans. For example, let's say my key value is a person's last name. That will naturally cluster around certain letters, giving me an uneven distribution. --Jeremy On Sun, May 3, 2015 at 11:46 AM, Michael Segel michael_se...@hotmail.com wrote: Yes

Re: Right value for hbase.rpc.timeout

2015-05-05 Thread Michael Segel
situation to retest it. On Thu, Apr 30, 2015 at 3:56 PM Michael Segel michael_se...@hotmail.com wrote: There is no single ‘right’ value. As you pointed out… some of your Mapper.map() iterations are taking longer than 60 seconds. The first thing is to determine why that happens

Re: HBase Questions

2015-05-03 Thread Michael Segel
with this approach in future? First of all, Is this approach correct? Thanks, Arun The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: HBase Filesystem Adapter

2015-05-03 Thread Michael Segel
reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: RowKey hashing in HBase 1.0

2015-05-03 Thread Michael Segel
they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Hbase row ingestion ..

2015-04-30 Thread Michael Segel
I wouldn’t call storing attributes in separate columns a ‘rigid schema’. You are correct that you could write your data as a CLOB/BLOB and store it in a single cell. The upside is that it’s more efficient. The downside is that it’s really an all-or-nothing fetch and then you need to write the
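The two layouts being compared, sketched against the HBase 1.x client API; the family and qualifier names ('d', 'sensor', 'reading', 'rec') are hypothetical:

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnsVsBlob {
    // One qualifier per attribute: each field stays individually readable and updatable.
    public static Put perAttribute(byte[] row, String sensor, double reading) {
        Put put = new Put(row);
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("sensor"), Bytes.toBytes(sensor));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("reading"), Bytes.toBytes(reading));
        return put;
    }

    // Whole record serialized into a single cell: fewer KeyValues, but reads are
    // all-or-nothing and any change means rewriting the entire blob.
    public static Put asBlob(byte[] row, byte[] serializedRecord) {
        Put put = new Put(row);
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("rec"), serializedRecord);
        return put;
    }
}
```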

Re: HBase Filesystem Adapter

2015-04-30 Thread Michael Segel
I would look at a different solution than HBase. HBase works well because it’s tied closely to the HDFS and Hadoop ecosystem. Going outside of this… too many headaches and you’d be better off with a NoSQL engine like Cassandra or Riak, or something else. On Apr 30, 2015, at 8:35 AM, Buğra

Re: Hbase row ingestion ..

2015-04-30 Thread Michael Segel
than ours was, but it may be helpful to hear our experience with row key design http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-real-performance-gains-with-real-time-data.html James On Apr 30, 2015, at 7:51 AM, Michael Segel michael_se

Re: Hbase row ingestion ..

2015-04-30 Thread Michael Segel
cleaner (we currently create a scan with our predicate for each bucket, and then push all of those to MultiTableInputFormat). Best, Andrew On 4/30/15 12:36 PM, Michael Segel wrote: The downside here is that you will lose your ability to perform range scans The opinions expressed

Re: Is it safe to set hbase.coprocessor.abortonerror to false on produce environment?

2015-04-30 Thread Michael Segel
at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Right value for hbase.rpc.timeout

2015-04-30 Thread Michael Segel
There is no single ‘right’ value. As you pointed out… some of your Mapper.map() iterations are taking longer than 60 seconds. The first thing is to determine why that happens. (It could be normal, or it could be bad code on your developer’s part. We don’t know.) The other thing is that if
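If, after diagnosing the slow map() iterations, a larger budget is genuinely needed, the client-side settings look roughly like this; the values are illustrative assumptions, not recommendations:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ClientTimeouts {
    public static Connection connect() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.rpc.timeout", 120000);                    // ms per RPC
        conf.setInt("hbase.client.scanner.timeout.period", 120000);  // ms between scanner next() calls
        return ConnectionFactory.createConnection(conf);
    }
}
```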

Re: Predictive Caching

2015-04-23 Thread Michael Segel
if anyone knows of any related work in this area. Thoughts and suggestions welcome. Thanks, Ayya The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Rowkey design question

2015-04-17 Thread Michael Segel
and thanks for the tip! :-) On Wed, Apr 8, 2015 at 1:45 PM, Michael Segel michael_se...@hotmail.com wrote: Ok… First, I’d suggest you rethink your schema by adding an additional dimension. You’ll end up with more rows, but a narrower table. In terms of compaction… if the data

Re: Rowkey design question

2015-04-11 Thread Michael Segel
apurt...@apache.org To: user@hbase.apache.org user@hbase.apache.org Sent: Thursday, April 9, 2015 4:53 PM Subject: Re: Rowkey design question On Thu, Apr 9, 2015 at 2:26 PM, Michael Segel michael_se...@hotmail.com wrote: Hint: You could have sandboxed the end user code which makes it a lot

Re: Rowkey design question

2015-04-09 Thread Michael Segel
that would do the same calculation on its own. On Thu, Apr 9, 2015 at 4:43 AM, Michael Segel michael_se...@hotmail.com wrote: When you say coprocessor, do you mean HBase coprocessors or do you mean a physical hardware coprocessor? In terms of queries… HBase can perform a single get

Re: Rowkey design question

2015-04-09 Thread Michael Segel
. On Thu, Apr 9, 2015 at 5:05 AM, Michael Segel michael_se...@hotmail.com wrote: Ok… Coprocessors are poorly implemented in HBase. If you work in a secure environment, outside of the system coprocessors… (ones that you load from hbase-site.xml), you don’t want to use them. (The coprocessor code

Re: Rowkey design question

2015-04-08 Thread Michael Segel
unaware of? On Wed, Apr 8, 2015 at 7:43 PM, Michael Segel michael_se...@hotmail.com wrote: I think you misunderstood. The suggestion was to put the data into HDFS sequence files and to use HBase to store an index into the file. (URL to the file, then offset into the file

Re: HBase region assignment by range?

2015-04-08 Thread Michael Segel
at 4:41 AM, Michael Segel michael_se...@hotmail.com wrote: Is your table static? If you know your data and your ranges, you can do it. However, as you add data to the table, those regions will eventually split. The other issue that you brought up is that you want to do ‘local’ joins

Re: HBase region assignment by range?

2015-04-08 Thread Michael Segel
for the problem you are trying to solve is HBASE-10576 by tweaking it a little. cheers, esteban. -- Cloudera, Inc. On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel michael_se...@hotmail.com wrote: Is your table static? If you know your data and your ranges, you can

Re: Rowkey design question

2015-04-08 Thread Michael Segel
column qualifier. Yes, this is not possible if HBase loads the whole 500MB each time i want to perform this custom query on a row. Hence my question :-) On Tue, Apr 7, 2015 at 11:03 PM, Michael Segel michael_se...@hotmail.com wrote: Sorry, but your initial problem statement

Re: HBase region assignment by range?

2015-04-08 Thread Michael Segel
The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Rowkey design question

2015-04-08 Thread Michael Segel
actual values (bigger qualifiers) outside HBase. Keeping them in Hadoop why not? Pulling hot ones out on SSD caches would be an interesting solution. And quite a bit simpler. Good call and thanks for the tip! :-) On Wed, Apr 8, 2015 at 1:45 PM, Michael Segel michael_se...@hotmail.com wrote

Re: write availability

2015-04-07 Thread Michael Segel
for a while? The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Rowkey design question

2015-04-07 Thread Michael Segel
into a direct ByteBuffer) ? Cheers, -Kristoffer The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: How to Manage Data Architecture Modeling for HBase

2015-04-06 Thread Michael Segel
, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: How to Manage Data Architecture Modeling for HBase

2015-04-06 Thread Michael Segel
, including some inserts and some updates; is this scenario appropriate for using HBase to build a data warehouse? 2) Is there any case study about Enterprise BI solutions with HBase? thanks. Regards, Ben Liang On Apr 6, 2015, at 20:27, Michael Segel michael_se...@hotmail.com wrote: Yeah

Re: How to Manage Data Architecture Modeling for HBase

2015-04-06 Thread Michael Segel
loading every day, including some inserts and some updates; is this scenario appropriate for using HBase to build a data warehouse? 2) Is there any case study about Enterprise BI solutions with HBase? thanks. Regards, Ben Liang On Apr 6, 2015, at 20:27, Michael Segel michael_se

Re: introducing nodes w/ more storage

2015-04-03 Thread Michael Segel
balancer needs to be run, especially in multi-tenant clusters with archive data. It is best to immediately run a major compaction to restore HBase locality if the HDFS balancer is used. On Mon, Mar 23, 2015 at 10:50 AM, Michael Segel michael_se...@hotmail.com wrote: @lars, How does

Re: introducing nodes w/ more storage

2015-04-02 Thread Michael Segel
used to represent a best practice. In many cases the HDFS balancer needs to be run, especially in multi-tenant clusters with archive data. It is best to immediately run a major compaction to restore HBase locality if the HDFS balancer is used. On Mon, Mar 23, 2015 at 10:50 AM, Michael Segel

Re: Recovering from corrupt blocks in HFile

2015-03-23 Thread Michael Segel
the other blocks except the final one had a size of 67108864 as well. HDFS considered both versions of the block to be corrupt, but at one point I did replace the truncated data on the one node with the full-length data (to no avail). -md On Thu, Mar 19, 2015 at 6:49 PM, Michael Segel

Re: introducing nodes w/ more storage

2015-03-23 Thread Michael Segel
the volumes will be filled. This even though legacy nodes have 5 volumes and total storage of 5X TB. Fact or fantasy? Thanks, Ted The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel

Re: manual merge

2015-03-23 Thread Michael Segel
are currently on HBase 0.98.6 (CDH 5.3.0) Thanks, Abe The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: manual merge

2015-03-23 Thread Michael Segel
, Michael Segel michael_se...@hotmail.com wrote: Hi, I’m trying to understand your problem. You pre-split your regions to help with some load balancing on the load. Ok. So how did you calculate the number of regions to pre-split? You said that the number of regions has grown. How were

Re: How to remove a Column Family Property

2015-03-19 Thread Michael Segel
Copy the table, drop original, rename copy. On Mar 19, 2015, at 3:46 AM, Pankaj kr pankaj...@huawei.com wrote: Thanks for the reply Ashish. I can set EMPTY or NONE value using alter command. alter 't1', {NAME => 'cf1', ENCRYPTION => ''} alter 't1', {NAME => 'cf1', ENCRYPTION =>

Re: introducing nodes w/ more storage

2015-03-19 Thread Michael Segel
, Ted The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Recovering from corrupt blocks in HFile

2015-03-19 Thread Michael Segel
back. - Piet Hein (via Tom White) The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Splitting up an HBase Table into partitions

2015-03-18 Thread Michael Segel
- a means of creating splits based on regions, without having to iterate over all rows in the table through the client API. Do you have any idea how I might achieve this? Thanks, On Tuesday, March 17, 2015, Michael Segel michael_se...@hotmail.com wrote: Hbase doesn't have partitions. It has

Re: Standalone == Dev Only?

2015-03-16 Thread Michael Segel
, 2015, at 3:44 PM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 2:41 PM, Michael Segel michael_se...@hotmail.com wrote: In standalone, you’re writing to local disk. You lose the disk, you lose the data, unless of course you’ve RAIDed your drives. Then when you lose

Re: HBase Question

2015-03-13 Thread Michael Segel
-- Abraham Tom Email: work2m...@gmail.com Phone: 415-515-3621 The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Standalone == Dev Only?

2015-03-13 Thread Michael Segel
On 3/13/15, 1:46 PM, Michael Segel michael_se...@hotmail.com wrote: Guys, More than just needing some love. No HDFS… means data at risk. No HDFS… means that standalone will have security issues. Patient Data? HINT: HIPAA. Please think your design

Re: Standalone == Dev Only?

2015-03-08 Thread Michael Segel
reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-08 Thread Michael Segel
install the thrift server locally on every C++ client machine? I'd imagine performance should be similar to native java performance at that point. -Mike On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel michael_se...@hotmail.com wrote: Or you could try a java connection wrapped by JNI so you

Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-07 Thread Michael Segel
don't expect any significant difference between Thrift(C++) and Java. Any ideas? Many thanks Demai The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: Dealing with data locality in the HBase Java API

2015-03-05 Thread Michael Segel
The better answer is that you don’t worry about data locality. It’s becoming a moot point. On Mar 4, 2015, at 12:32 PM, Andrew Purtell apurt...@apache.org wrote: Spark supports creating RDDs using Hadoop input and output formats (

Re: Dealing with data locality in the HBase Java API

2015-03-05 Thread Michael Segel
) The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: HBase scan time range, inconsistency

2015-02-26 Thread Michael Segel
, it dumps them to a directory in hdfs. -- Sean The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com

Re: HBase Region always in transition + corrupt HDFS

2015-02-23 Thread Michael Segel
On Feb 23, 2015, at 1:47 AM, Arinto Murdopo ari...@gmail.com wrote: We're running HBase (0.94.15-cdh4.6.0) on top of HDFS (Hadoop 2.0.0-cdh4.6.0). For all of our tables, we set the replication factor to 1 (dfs.replication = 1 in hbase-site.xml). We set it to 1 because we want to minimize the

Re: data partitioning and data model

2015-02-23 Thread Michael Segel
Hi, Yes you would want to start your key by user_id. But you don’t need the timestamp. The user_id + alert_id should be enough on the key. If you want to get fancy… If your alert_id is not a number, you could use the EPOCH - Timestamp as a way to invert the order of the alerts so that the
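A sketch of the "fancy" key layout described here, using the common Long.MAX_VALUE - timestamp variant of the inversion trick so the newest alerts sort first within each user; the builder class is hypothetical:

```java
import org.apache.hadoop.hbase.util.Bytes;

public class AlertKeys {
    // user_id + inverted timestamp + alert_id: newest alerts sort first per user.
    public static byte[] rowKey(String userId, long epochMillis, String alertId) {
        byte[] user = Bytes.toBytes(userId);
        byte[] inverted = Bytes.toBytes(Long.MAX_VALUE - epochMillis);
        return Bytes.add(user, inverted, Bytes.toBytes(alertId));
    }
}
```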

Re: HBase Region always in transition + corrupt HDFS

2015-02-23 Thread Michael Segel
, Feb 23, 2015 at 12:25 PM, Michael Segel mse...@segel.com wrote: On Feb 23, 2015, at 1:47 AM, Arinto Murdopo ari...@gmail.com wrote: We're running HBase (0.94.15-cdh4.6.0) on top of HDFS (Hadoop 2.0.0-cdh4.6.0). For all of our tables, we set the replication factor to 1 (dfs.replication = 1

Re: data partitioning and data model

2015-02-23 Thread Michael Segel
Yes and no. It’s a bit more complicated: it’s data dependent and depends on how you’re using the data. I wouldn’t go too thin and I wouldn’t go too fat. On Feb 20, 2015, at 2:19 PM, Alok Singh aloksi...@gmail.com wrote: You don't want a lot of columns in a write heavy table. HBase stores

Re: Low CPU usage and slow reads in pseudo-distributed mode - how to fix?

2015-01-11 Thread Michael Segel
@Ted, Pseudo cluster on a machine that has 4GB of memory. If you give HBase 1.5GB for the region server… you are left with 2.5 GB of memory for everything else. You will swap. In short, nothing he can do will help. He’s screwed if he is looking at improving performance. On Jan 11,

Re: Store aggregates in HBase

2015-01-11 Thread Michael Segel
Storing aggregates on their own? No. Storing aggregates of a data set that is the primary target? Sure. Why not? On Jan 9, 2015, at 9:00 PM, Buntu Dev buntu...@gmail.com wrote: I got a CDH cluster with data being ingested via Flume to store in HDFS as Avro. Currently, I query the dataset using

Re: 1 vs. N CFs, dense vs. sparse CFs, flushing

2015-01-08 Thread Michael Segel
Guys, You have two issues. 1) Physical structure and organization. 2) Logical organization and data usage. This goes to the question of your data access pattern and use case. The best example of how to use Column Families that I can think of is an order entry system. Here you would have
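A hedged sketch of what the order-entry example might look like at table-creation time; the preview is cut off, so the family split below (frequently read header fields vs. bulkier detail data) and the names 'hdr'/'detail' are my assumptions, shown with the HBase 1.x admin API:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class OrderEntryTable {
    // Two families grouping data with different access patterns and lifecycles.
    public static void create(Admin admin) throws Exception {
        HTableDescriptor orders = new HTableDescriptor(TableName.valueOf("orders"));
        orders.addFamily(new HColumnDescriptor("hdr"));     // order header, read on every lookup
        orders.addFamily(new HColumnDescriptor("detail"));  // line items, read far less often
        admin.createTable(orders);
    }
}
```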

Re: Newbie Question about 37TB binary storage on HBase

2014-12-01 Thread Michael Segel
… It was a mess so we never looked back. And of course the client was/is a java shop. So Java is the first choice. Just my $0.02 cents On Dec 1, 2014, at 2:41 PM, Aleks Laz al-userhb...@none.at wrote: Dear Michael. Am 29-11-2014 23:49, schrieb Michael Segel: Guys, KISS. You can use

Re: Replacing a full Row content in HBase

2014-11-20 Thread Michael Segel
Hi, Let’s take a step back… OP’s initial goal is to replace all of the fields/cells on a row at the same time. Thought about doing a delete prior to the put(). Is now a good time to remind people about what happens during a delete and how things can happen out of order? And should we talk

Re: I'm studying hbase with php, and I wonder getRow guarantee sequential order.

2014-11-11 Thread Michael Segel
Not sure of the question. A scan will return multiple rows in sequential order. Note that it’s sequential byte-stream order. The columns will be in sequential order as well… So if you have a set of columns named ‘foo’+timestamp, then for each column in the set of foo, it will be in
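A small sketch of the ‘foo’+timestamp qualifier naming mentioned here; with a fixed-width big-endian long, byte-wise sorting keeps the columns in chronological order (for non-negative timestamps). The helper is illustrative:

```java
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampedQualifiers {
    // 'foo' + big-endian epoch millis: scans return these qualifiers oldest-first.
    public static byte[] qualifier(long epochMillis) {
        return Bytes.add(Bytes.toBytes("foo"), Bytes.toBytes(epochMillis));
    }
}
```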

Re: OOM when fetching all versions of single row

2014-11-03 Thread Michael Segel
St.Ack, I think you're sidestepping the issue concerning schema design. Since HBase isn't my core focus, I also have to ask: since when have heap sizes over 16GB been the norm? (Really, 8GB seems to be quite a large heap size... ) On Oct 31, 2014, at 11:15 AM, Stack st...@duboce.net wrote:

Re: OOM when fetching all versions of single row

2014-11-03 Thread Michael Segel
rows with less versions each, instead of these fat rows. While not exactly the same, you might be able to use TTL or your own purge job to keep the number of rows limited. On Mon, Nov 3, 2014 at 2:02 PM, Michael Segel mse...@segel.com wrote: St.Ack, I think you're side stepping the issue

Re: OOM when fetching all versions of single row

2014-10-31 Thread Michael Segel
Here’s the simple answer. Don’t do it. The way you are abusing versioning is a bad design. Redesign your schema. On Oct 30, 2014, at 10:20 AM, Andrejs Dubovskis dubis...@gmail.com wrote: Hi! We have a bunch of rows on HBase which store varying sizes of data (1-50MB). We use HBase

Re: Upgrading a coprocessor

2014-10-30 Thread Michael Segel
is upgrade the coprocessor in the Standby and then swap the clusters. But since you would have to stand up a second HBase cluster, this may be a non-starter for you. Just another option thrown into the mix. :) On Wed Oct 29 2014 at 12:07:02 PM Michael Segel mse...@segel.com wrote: Well you

Re: Upgrading a coprocessor

2014-10-29 Thread Michael Segel
Well you could redesign your cp. There is a way to work around the issue by creating a cp that's really a framework and then managing the cps in a separate JVM (or JVMs), using messaging between the two. So if you want to reload or restart your cp, you can do it outside of the RS. It's a bit more

Re: A use case for ttl deletion?

2014-09-30 Thread Michael Segel
OP wants to know good use cases where to use ttl setting. Answer: Any situation where the cost of retaining the data exceeds the value to be gained from the data. Using ttl allows for automatic purging of data. Answer2: Any situation where you have to enforce specific retention policies

Re: Adding 64-bit nodes to 32-bit cluster?

2014-09-19 Thread Michael Segel
You need to create two sets of Hadoop configurations and deploy them to the correct nodes. YARN was supposed to be the way to support heterogeneous clusters. But this begs the question: why on earth did you have a 32-bit cluster to begin with? On Sep 16, 2014, at 1:13 AM, Esteban Gutierrez

Re: Nested data structures examples for HBase

2014-09-12 Thread Michael Segel
. and this would again be a different discussion.) HTH -Mike On Sep 10, 2014, at 10:25 PM, Wilm Schumacher wilm.schumac...@cawoom.com wrote: Am 10.09.2014 um 22:25 schrieb Michael Segel: Ok, but here’s the thing… you extrapolate the design out… each column with a subordinate record

Re: Scan vs Parallel scan.

2014-09-12 Thread Michael Segel
Let's take a step back… Your parallel scan has the client create N threads where, in each thread, you're doing a partial scan of the table, and each partial scan takes the first and last row of a region? Is that correct? On Sep 12, 2014, at 7:36 AM, Guillermo Ortiz
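For context, a sketch of the approach being discussed (one Callable per region, each doing a partial scan bounded by that region's start/end keys), assuming the HBase 1.x client API; class and method names are illustrative, not the poster's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Pair;

public class ParallelScan {
    // One Callable per region: scan [regionStart, regionEnd) and collect its rows.
    static Callable<List<Result>> partialScan(final Connection conn, final TableName name,
                                              final byte[] start, final byte[] stop) {
        return new Callable<List<Result>>() {
            @Override
            public List<Result> call() throws Exception {
                List<Result> rows = new ArrayList<Result>();
                try (Table table = conn.getTable(name);
                     ResultScanner rs = table.getScanner(new Scan(start, stop))) {
                    for (Result r : rs) {
                        rows.add(r);
                    }
                }
                return rows;
            }
        };
    }

    public static List<Result> scanAll(Connection conn, TableName name) throws Exception {
        RegionLocator locator = conn.getRegionLocator(name);
        Pair<byte[][], byte[][]> keys = locator.getStartEndKeys();
        int regions = keys.getFirst().length;
        // One thread per region for simplicity; a real client would cap the pool size.
        ExecutorService pool = Executors.newFixedThreadPool(regions);
        List<Future<List<Result>>> futures = new ArrayList<Future<List<Result>>>();
        for (int i = 0; i < regions; i++) {
            futures.add(pool.submit(partialScan(conn, name, keys.getFirst()[i], keys.getSecond()[i])));
        }
        List<Result> all = new ArrayList<Result>();
        for (Future<List<Result>> f : futures) {
            all.addAll(f.get());
        }
        pool.shutdown();
        return all;
    }
}
```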

Re: Scan vs Parallel scan.

2014-09-12 Thread Michael Segel
) { results.add(result); } connection.close(); table.close(); return results; } They implement Callable. 2014-09-12 9:26 GMT+02:00 Michael Segel michael_se...@hotmail.com: Lets take a step back…. Your parallel scan is having the client create N

Re: Scan vs Parallel scan.

2014-09-12 Thread Michael Segel
and they could compete for resources (network, etc.) on this node. It'd be better to have one thread per RS. But, that doesn't answer your questions. I keep thinking... 2014-09-12 9:40 GMT+02:00 Michael Segel michael_se...@hotmail.com: Hi, I wanted to take a step back from the actual code

Re: Scan vs Parallel scan.

2014-09-12 Thread Michael Segel
14:48 GMT+02:00 Michael Segel michael_se...@hotmail.com: Ok, lets again take a step back… So you are comparing your partial scan(s) against a full table scan? If I understood your question, you launch 3 partial scans where you set the start row and then end row of each scan, right

Re: Nested data structures examples for HBase

2014-09-10 Thread Michael Segel
Because you really don’t want to do that since you need to keep the number of CFs low. Again, you can store the data within the structure and index it. On Sep 10, 2014, at 7:17 AM, Wilm Schumacher wilm.schumac...@cawoom.com wrote: as stated above you can use JSON or something similar, which

Re: Nested data structures examples for HBase

2014-09-10 Thread Michael Segel
wrote: Am 10.09.2014 um 17:33 schrieb Michael Segel: Because you really don’t want to do that since you need to keep the number of CFs low. in my example the number of CFs is 1. So this is not a problem. Best wishes, Wilm

Re: Nested data structures examples for HBase

2014-09-09 Thread Michael Segel
You do realize that everything you store in HBase is a byte array, right? That is, each cell is a blob. So you have the ability to create nested structures like… JSON records? ;-) So to your point. You can have a column A which represents a set of values. This is one reason why you shouldn’t
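Since every cell is just a byte[], the nested-structure point reduces to serializing a document (JSON here) and storing it under one qualifier; the family/qualifier names below are hypothetical, shown with the HBase 1.x client:

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NestedValue {
    // Store a serialized nested record (JSON here) in a single cell; parse it on read.
    public static Put jsonPut(byte[] rowKey, String json) {
        Put put = new Put(rowKey);
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("record"), Bytes.toBytes(json));
        return put;
    }
}
```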

Re: HBase - Performance issue

2014-09-09 Thread Michael Segel
So you have large RS and you have large regions. Your regions are huge relative to your RS memory heap. (Not ideal.) You have slow drives (5400rpm) and you have a 1GbE network. You didn’t say how many drives per server. Under load, you will saturate your network with just 4 drives. (Give or

Re: One-table w/ multi-CF or multi-table w/ one-CF?

2014-09-09 Thread Michael Segel
that. With the setLoadColumnFamiliesOnDemand I learned from Ted, looks like the performance should be similar. Am I missing something? Please enlighten me. Jianshi On Mon, Sep 8, 2014 at 3:41 AM, Michael Segel michael_se...@hotmail.com wrote: I would suggest rethinking column families

Re: Nested data structures examples for HBase

2014-09-09 Thread Michael Segel
that determination after having carefully considered the extent of the mismatch. 2014-09-09 13:37 GMT-07:00 Michael Segel michael_se...@hotmail.com: You do realize that everything you store in Hbase are byte arrays, right? That is each cell is a blob. So you have the ability to create

Re: One-table w/ multi-CF or multi-table w/ one-CF?

2014-09-07 Thread Michael Segel
is mostly in mapreduce jobs. Jianshi On Sun, Sep 7, 2014 at 4:52 AM, Michael Segel michael_se...@hotmail.com wrote: Again, a silly question. Why are you using column families? Just to play devil’s advocate in terms of design, why are you not treating your row as a record

Re: One-table w/ multi-CF or multi-table w/ one-CF?

2014-09-06 Thread Michael Segel
Again, a silly question. Why are you using column families? Just to play devil’s advocate in terms of design, why are you not treating your row as a record? Think hierarchical, not relational. This really gets into some design theory. Think of a Column Family as a way to group data that has the

Re: HBase - Performance issue

2014-09-06 Thread Michael Segel
What type of drives, controllers, and network bandwidth do you have? Just curious. On Sep 6, 2014, at 7:37 PM, kiran kiran.sarvabho...@gmail.com wrote: Also the HBase version is 0.94.1 On Sun, Sep 7, 2014 at 12:00 AM, kiran kiran.sarvabho...@gmail.com wrote: Lars, We are facing a
