hbase uniformsplit for non hex keys

2016-05-31 Thread Shushant Arora
1. Can I use UniformSplit for non-hex keys? 2. If yes, how do I specify the key range for the split? 3. If not, what is the difference between HexStringSplit and UniformSplit? Thanks!
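For readers comparing the two algorithms named in this question: both are SplitAlgorithm implementations in HBase's RegionSplitter utility. HexStringSplit assumes row keys are ASCII hex strings, while UniformSplit treats keys as raw bytes. The sketch below is a simplified model of that difference and not HBase's own code. The class name, the method names, and the fixed 8-character and 8-byte key widths are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class SplitPointSketch {

    // Evenly spaced split points over the hex-string key space
    // "00000000".."FFFFFFFF" (assumed: keys are 8 ASCII hex characters).
    static List<String> hexSplits(int numRegions) {
        BigInteger range = BigInteger.ONE.shiftLeft(32); // 16^8 possible keys
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = range.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(numRegions));
            splits.add(String.format("%08X", point.longValue()));
        }
        return splits;
    }

    // Evenly spaced split points over raw byte keys
    // (assumed: keys are 8 bytes, 0x00.. through 0xFF..).
    static List<byte[]> uniformSplits(int numRegions) {
        BigInteger range = BigInteger.ONE.shiftLeft(64); // 256^8 possible keys
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = range.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(numRegions));
            byte[] raw = point.toByteArray(); // big-endian, may carry a leading sign byte
            byte[] key = new byte[8];
            int n = Math.min(raw.length, 8);
            // Right-align the low-order bytes into a fixed 8-byte key.
            System.arraycopy(raw, raw.length - n, key, 8 - n, n);
            splits.add(key);
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(hexSplits(4));
        for (byte[] key : uniformSplits(4)) {
            System.out.printf("first byte: 0x%02X%n", key[0] & 0xFF);
        }
    }
}
```

In this model, keys that are not hex strings still fall somewhere in the raw byte space, so byte-wise split points apply to them, but the regions only fill evenly if the keys themselves are spread evenly over that byte space.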

Re: hbase get and mvcc

2016-05-16 Thread Shushant Arora
Thanks! Are puts that fall inside the read point of an ongoing scan/get preserved in the HFile as well, or only in the memstore, with the memstore flush blocked until all ongoing scans have completed? On Tue, May 17, 2016 at 5:31 AM, Stack <st...@duboce.net> wrote: > On Mon, May 16, 2016 at 4:55 PM,

hbase get and mvcc

2016-05-16 Thread Shushant Arora
Hi, HBase uses MVCC to return consistent results for Get operations. To achieve MVCC it has to maintain multiple versions of the same row/cells. What is the maximum number of versions of a row/cell that HBase keeps at any time to support MVCC? Since, say, multiple gets started one after the other and has not

hfile v2 and bloomfilter

2016-05-15 Thread Shushant Arora
In HFile v2, block-level Bloom filters are stored in the scanned section along with data blocks and leaf index blocks. The load-on-open section contains Bloom filter data. What is this Bloom filter data? 1. Does it contain the index of the Bloom chunks stored in the scanned section? 2. What do the meta blocks of the non-scanned

hbase block and columnfamily of a row

2016-05-14 Thread Shushant Arora
Can an HBase table with a single column family have a row that spans multiple blocks in the same HFile? Suppose there is only one HFile: is it possible that a column family with 5-6 columns spans multiple blocks? Or is a block always closed at max( 64k default or when all columns

hbase zookeeper lag

2016-05-14 Thread Shushant Arora
Hi, HBase uses ZooKeeper for various purposes, e.g. for region splits. The RegionServer creates a znode in ZooKeeper with the splitting state and the master gets a notification for this znode. Since ZooKeeper is not fully consistent, there may be a lag between the actual znode creation and the notification till

Re: hbase architecture doubts

2016-05-09 Thread Shushant Arora
4. Can the same row be in 2 blocks in an HFile, with one cell in block 1 and another in block 2? On Mon, May 9, 2016 at 4:57 PM, Shushant Arora <shushantaror...@gmail.com> wrote: > Thanks! > > 1.Will write take lock on all the column families or just the column > family being affected b

Re: hbase architecture doubts

2016-05-09 Thread Shushant Arora
. Now if a new in-memory row comes, will it evict from the in-memory area or the single-access area? 3. Why is there a single block cache per RegionServer? Why not one per region? On Sun, May 8, 2016 at 11:43 PM, Stack <st...@duboce.net> wrote: > On Sun, May 8, 2016 at 6:12 AM, Shushant Arora <

Re: hbase architecture doubts

2016-05-08 Thread Shushant Arora
previous to put took lock. The memstore is implemented as a CSLM, so how does it return the row state from before the put lock when a get is fired before the put has finished? On Tue, May 3, 2016 at 7:41 AM, Stack <st...@duboce.net> wrote: > On Mon, May 2, 2016 at 5:34 PM, Shushant Arora <shushantaror.

hbase doubts

2016-05-05 Thread Shushant Arora
1. Why is it better for read performance to have a single file per region than multiple files? Why can't multiple threads read multiple files and give better performance? 2. Does an HBase RegionServer have a single thread for compactions and splits for all the regions it is holding? Why can't single thread per

Re: hbase architecture doubts

2016-05-02 Thread Shushant Arora
, 2016 at 12:05 AM, Stack <st...@duboce.net> wrote: > On Mon, May 2, 2016 at 10:06 AM, Shushant Arora <shushantaror...@gmail.com > > > wrote: > > > Thanks Stack > > > > for point 2 : > > I am concerned with downtime of Hbase for read and write. > >

Re: hbase architecture doubts

2016-05-02 Thread Shushant Arora
to HFile won't we lose the update? On Mon, May 2, 2016 at 9:06 PM, Stack <st...@duboce.net> wrote: > On Mon, May 2, 2016 at 1:25 AM, Shushant Arora <shushantaror...@gmail.com> > wrote: > > > Thanks! > > > > Few doubts; > > > > 1.LSM tree compris

Re: hbase architecture doubts

2016-05-02 Thread Shushant Arora
d read is allowed using snapshot. Thanks! On Mon, May 2, 2016 at 11:39 AM, Stack <st...@duboce.net> wrote: > On Sun, May 1, 2016 at 3:36 AM, Shushant Arora <shushantaror...@gmail.com> > wrote: > > > 1.Does Hbase uses ConcurrentskipListMap(CSLM) to store data in mem

hbase architecture doubts

2016-05-01 Thread Shushant Arora
1. Does HBase use ConcurrentSkipListMap (CSLM) to store data in the memstore? 2. When the memstore is flushed to HDFS, does it dump the memstore ConcurrentSkipListMap as an HFile v2? Then how does it calculate blocks out of the CSLM and dump them to HDFS? 3. After dumping the in-memory CSLM of the memstore to an HFile does

Re: hbase custom scan

2016-04-04 Thread Shushant Arora
there a chance that the top N rows come from distinct > regions ? > > On Mon, Apr 4, 2016 at 8:27 PM, Shushant Arora <shushantaror...@gmail.com> > wrote: > > > Hi > > > > I have a requirement to scan a hbase table based on insertion timestamp. > > I nee

hbase custom scan

2016-04-04 Thread Shushant Arora
Hi, I have a requirement to scan an HBase table based on insertion timestamp. I need to fetch the keys sorted by insertion timestamp, not by key. I can't make the timestamp the prefix of the key, because I want to avoid hotspotting. Is there any efficient way to meet this requirement? Thanks!

does hbase scan doubts

2016-03-13 Thread Shushant Arora
Is an HBase scan or get single-threaded? Say I have an HBase table spread over 100 RegionServers. When I scan a key range, say a-z (distributed across all RegionServers), will the client make calls to the RegionServers in parallel all at once, or one by one? First it will get all keys from one RegionServer then

Re: use of hbase client in application server

2016-03-13 Thread Shushant Arora
); > > if (maxThreads == 0) { > > maxThreads = 1; // is there a better default? > > } > > int corePoolSize = conf.getInt("hbase.htable.threads.coresize", 1); > > long keepAliveTime = conf.getLong("hbase.htable.threads.keepalivetime", >

use of hbase client in application server

2016-03-13 Thread Shushant Arora
I have a requirement to use a long-running HBase client in an application server. 1. Do I need to create multiple HConnections, or will a single HConnection work? 2. Do I need to check whether the HConnection is still active before using it to create an HTable instance? 3. Do I need to handle region split and

Re: disable major compaction per table

2016-02-18 Thread Shushant Arora
> working in minor compaction? > > > > No, they are not. > > > > -Vlad > > > > On Tue, Feb 16, 2016 at 4:51 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > > > > For #2, see http://hbase.apache.org/book.html#managed.compactions > > > >

disable major compaction per table

2016-02-16 Thread Shushant Arora
Hi, 1. Does major compaction in HBase run on a per-table basis? 2. By default every 24 hours? 3. Can I disable automatic major compaction for a few tables while keeping it enabled for the rest of the tables? 4. Are HBase put, get and delete blocked during major compaction, and do they keep working during minor compaction?

timestamp/ttl of a cell

2015-11-25 Thread Shushant Arora
Hi, can the TTL be set/updated for individual rows instead of for the complete column family? Or can the timestamp version of a cell be decreased? The aim is to delete some rows by setting their timestamp to an old value so that they fall outside the TTL of the column family, if a TTL cannot be specified per row/cell.
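The arithmetic behind the workaround described in this question can be sketched as follows. This is a minimal model and not HBase code: it assumes cell timestamps in milliseconds and a column-family TTL in seconds, and treats a cell as expired once its age exceeds the TTL. The class and both helper names are hypothetical.

```java
class TtlSketch {

    // A cell is treated as expired once its age exceeds the TTL
    // (assumption: timestamps in milliseconds, TTL in seconds).
    static boolean isExpired(long cellTimestampMs, long ttlSeconds, long nowMs) {
        return nowMs - cellTimestampMs > ttlSeconds * 1000L;
    }

    // Hypothetical helper: a timestamp old enough that a cell written
    // with it is already past the TTL at nowMs.
    static long expiredTimestamp(long ttlSeconds, long nowMs) {
        return nowMs - ttlSeconds * 1000L - 1;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long oneDay = 86_400L; // example TTL of one day, in seconds
        long ts = expiredTimestamp(oneDay, now);
        System.out.println("expired: " + isExpired(ts, oneDay, now));
    }
}
```

Under these assumptions, any explicit timestamp at or below the value returned by expiredTimestamp places a cell outside the TTL window from the moment it is written.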

Re: timestamp/ttl of a cell

2015-11-25 Thread Shushant Arora
Thanks! What's the syntax to set it in the shell and in Java? On Wed, Nov 25, 2015 at 6:05 PM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > This? HBASE-10560 > > 2015-11-25 6:45 GMT-05:00 Shushant Arora <shushantaror...@gmail.com>: > > > Hi > > >

hbase timerange scan

2015-11-04 Thread Shushant Arora
Is an HBase time range scan a full table scan when no start and stop keys are given? Or does it make use of the HFile metadata about the min and max timestamps in each HFile? And how is this metadata optimised after compaction of multiple files?
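The file-level pruning this question asks about can be modelled as an interval overlap test. The sketch below is an illustration under stated assumptions rather than HBase's implementation: each store file is assumed to record the minimum and maximum cell timestamp it contains, the scan range is taken as [min, max) in the way Scan.setTimeRange defines it, and a compacted file is assumed to carry the union of its inputs' ranges. FileRange, merge and filesToRead are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TimeRangeSketch {

    // Min/max cell timestamps recorded for one store file (hypothetical model).
    static final class FileRange {
        final String name;
        final long minTs;
        final long maxTs;

        FileRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }

        // True if the file may hold cells inside [scanMin, scanMaxExclusive).
        boolean overlaps(long scanMin, long scanMaxExclusive) {
            return minTs < scanMaxExclusive && maxTs >= scanMin;
        }

        // Assumed behaviour after compaction: the output file covers the
        // union of the input ranges.
        static FileRange merge(String name, FileRange a, FileRange b) {
            return new FileRange(name,
                    Math.min(a.minTs, b.minTs),
                    Math.max(a.maxTs, b.maxTs));
        }
    }

    // Names of the files a time range scan cannot skip.
    static List<String> filesToRead(List<FileRange> files, long scanMin, long scanMaxExclusive) {
        List<String> result = new ArrayList<>();
        for (FileRange f : files) {
            if (f.overlaps(scanMin, scanMaxExclusive)) {
                result.add(f.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FileRange f1 = new FileRange("f1", 100, 200);
        FileRange f2 = new FileRange("f2", 300, 400);
        System.out.println(filesToRead(Arrays.asList(f1, f2), 250, 350));
        FileRange merged = FileRange.merge("f3", f1, f2);
        System.out.println(merged.overlaps(250, 260));
    }
}
```

In this model the merged file must be read for the range [250, 260) even though neither input held a cell there, so file-level pruning gets coarser as files are compacted together, and within a file that cannot be skipped the rows are still visited in key order.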

Re: hbase doubts

2015-08-18 Thread Shushant Arora
And will using KeyPrefixRegionSplitPolicy instead of the default IncreasingToUpperBoundRegionSplitPolicy help here? On Wed, Aug 19, 2015 at 10:23 AM, Shushant Arora shushantaror...@gmail.com wrote: When the last region gets new data and splits in two, what is the split point? Say the last region

Re: hbase doubts

2015-08-18 Thread Shushant Arora
in this scenario (instead of just the day of month), the last region would get new data and be split. Is this effect desirable for your app ? Cheers On Tue, Aug 18, 2015 at 12:55 PM, Shushant Arora shushantaror...@gmail.com wrote: for hbase key containing time as prefix say(-mm-dd#other fields

Re: hbase doubts

2015-08-18 Thread Shushant Arora
, Phoenix provides better integration with hbase. A third possibility is Spark on HBase. If you want to explore these alternatives, I suggest asking on respective mailing lists where you can get expert opinions. Cheers On Tue, Aug 18, 2015 at 9:03 AM, Shushant Arora shushantaror

Re: hbase doubts

2015-08-18 Thread Shushant Arora
17, 2015 at 10:08 PM, Shushant Arora shushantaror...@gmail.com wrote: Thanks! A few more doubts: 1. Say the requirement is to count distinct values of F1. If the field is part of the key, can't HBase just scan the keys, skip value deserialization, and return the result to the client which

hbase doubts

2015-08-17 Thread Shushant Arora
1. Is there any max limit on the key size of an HBase table? 2. Multiple small tables vs one large table: which one is preferred? 3. For bulk load, when LoadIncrementalHFiles is run it recalculates the region splits based on the region boundaries. Does this division happen on the client side or the server side

Re: hbase doubts

2015-08-17 Thread Shushant Arora
access patterns in your app. For #3, adjustment according to current region boundaries is done client side. Take a look at the javadoc for LoadQueueItem in LoadIncrementalHFiles.java Cheers On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora shushantaror...@gmail.com wrote: 1.Is there any

Re: hbase doubts

2015-08-17 Thread Shushant Arora
row / stop row to narrow the key range being scanned. I am leaning toward using second approach. Cheers On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora shushantaror...@gmail.com wrote: ~8-10 fields of size (5 of 20 bytes each )and 3 fields of size 200 bytes each. On Mon, Aug 17

Re: hbase doubts

2015-08-17 Thread Shushant Arora
http://hbase.apache.org/book.html#client.filter.kvm (see ColumnPrefixFilter) Cheers On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora shushantaror...@gmail.com wrote: 1.so size limit is per cell's identifier + value ? What is more optimise - to have field in key or in column family's

Re: hbase doubts

2015-08-17 Thread Shushant Arora
, 2015 at 7:36 AM, Shushant Arora shushantaror...@gmail.com wrote: 1. Is hbase.client.keyvalue.maxsize the max size of the row or of the key only? Is there any limit on the key size alone? 2. The access pattern is mostly key-based only. Are memstores and regions on a RegionServer per table

bulk load doubts

2015-07-21 Thread Shushant Arora
1. Are bulk-loaded HFiles not replicated? Does that mean that if a RegionServer goes down, all HFiles that were bulk loaded to this server are lost, even with HDFS replication set to 3? If yes, why are bulk-loaded HFiles not replicated? 2. Is there any issue with using a timestamp prefix as the key of a table-

hbase doubts

2015-07-16 Thread Shushant Arora
Is bulk put supported in HBase? And in an MR job, when we put into a table using TableOutputFormat, how is it more efficient than normal puts by individual reducers? Does TableOutputFormat not do puts one by one? And in a bulk load Hadoop job, when we specify HFileOutputFormat, does the job create HFiles

Hbase master selection doubt

2015-06-27 Thread Shushant Arora
How does HBase use ZooKeeper for master selection and RegionServer failure detection when ZooKeeper is not strictly consistent? Say, in the HBase master selection process, how can a node be 100% sure that a master has been created? Does it have to create the /master node and that node already exists will

Re: Hbase master selection doubt

2015-06-27 Thread Shushant Arora
://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees Cheers On Sat, Jun 27, 2015 at 7:20 AM, Shushant Arora shushantaror...@gmail.com wrote: How Hbase uses Zookeeper for Master selection and region server failure detection when Zookeeper is not strictly consistent

Re: Hbase master selection doubt

2015-06-27 Thread Shushant Arora
dir, say (/master), is visible to C2 but not to C1 until F1 comes in sync with the leader. On Sat, Jun 27, 2015 at 8:23 PM, Shushant Arora shushantaror...@gmail.com wrote: ZooKeeper guarantees sequential consistency: updates from a client will be applied in the order that they were sent. On Sat, Jun 27, 2015

Re: avoiding hot spot for timestamp prefix key

2015-05-22 Thread Shushant Arora
is '1432104178817#321'. After split, the first row in first daughter region would still be '1432104178817#321'. Right ? Cheers On Thu, May 21, 2015 at 9:57 PM, Shushant Arora shushantaror...@gmail.com wrote: Can I avoid hotspot of region with custom region split policy in hbase 0.96

Re: avoiding hot spot for timestamp prefix key

2015-05-22 Thread Shushant Arora
is the leading part of the rowkey. This would avoid the overlap you mentioned. Cheers On May 21, 2015, at 11:55 PM, Shushant Arora shushantaror...@gmail.com wrote: The guid changes with every key; the pattern is 2015-05-22 00:02:01#AB12EC945 2015-05-22 00:02:02#CD9870001234AB457

avoiding hot spot for timestamp prefix key

2015-05-21 Thread Shushant Arora
Can I avoid region hotspotting with a custom region split policy in HBase 0.96? The key is of the form timestamp#guid. Can I have a custom region split policy that uses the second part of the key (i.e. the guid) as the region split criterion and so avoid the hot spot?
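Regions hold contiguous ranges of sorted row keys, so with a timestamp#guid key the newest rows sort last however the split points are chosen. A commonly used alternative, which is not something proposed in this message, is to change the key itself by prefixing a salt bucket derived from the guid. The sketch below is a minimal illustration of that idea; the class name, the bucket count of 8 and the two-digit bucket#timestamp#guid layout are all assumptions.

```java
class SaltedKeySketch {

    static final int BUCKETS = 8; // hypothetical number of salt buckets

    // Builds "bucket#timestamp#guid". The bucket is derived from the guid,
    // so rows written at the same time spread over up to BUCKETS key ranges.
    static String saltedKey(String timestamp, String guid) {
        int bucket = Math.floorMod(guid.hashCode(), BUCKETS);
        return String.format("%02d#%s#%s", bucket, timestamp, guid);
    }

    public static void main(String[] args) {
        System.out.println(saltedKey("2015-05-22 00:02:01", "AB12EC945"));
        System.out.println(saltedKey("2015-05-22 00:02:02", "CD9870001234AB457"));
    }
}
```

The trade-off in this layout is on the read side: rows are time-ordered only within a bucket, so a time-ordered read needs one scan per bucket with the results merged by the client.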

default no of reducers

2015-04-28 Thread Shushant Arora
In a normal MR job, can I configure a cluster-wide default number of reducers, to be used when I don't specify the number of reducers in my job?

Re: pre split region server

2014-07-16 Thread Shushant Arora
://hbase.apache.org/book/rowkey.design.html#rowkey.regionsplits http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/ On Tue, Jul 15, 2014 at 6:40 PM, Shushant Arora shushantaror...@gmail.com wrote: 1.How to split region servers at table definition time? 2

Re: pre split region server

2014-07-16 Thread Shushant Arora
is suboptimal. For #3, you can split the key space evenly. Using number of region servers as number of splits is Okay. Cheers On Jul 16, 2014, at 12:25 AM, Shushant Arora shushantaror...@gmail.com wrote: Thanks! Few more doubts 1.When I don't supply SPLITS at table creation , all put operation

pre split region server

2014-07-15 Thread Shushant Arora
1. How do I split region servers at table definition time? 2. Will HBase write to only one region server when no splits are defined, even if the key is not monotonically increasing? 3. When does a region split occur? 4. Will the number of regions be fixed when an HBase table is pre-split at table creation

overriding slaves for particular job

2014-06-21 Thread Shushant Arora
Hi, can I override the slave nodes for just one of my jobs? Let's say I want the current job to be executed on node1 and node2 only. If both are busy, let the job wait. Thanks, Shushant

Re: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Shushant Arora
/secondary_indexing.html -John On Sat, May 17, 2014 at 8:34 AM, Shushant Arora shushantaror...@gmail.com wrote: Hi, I have a requirement to query my data based on date and user category. User category can be Supreme, Normal, or Medium. I want to query how many new users there are in my table

Re: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Shushant Arora
off using a scan with a start and stop row, then doing the counts on the client side. So as you get back your result set… you process the data. (Either in a M/R job or single client thread.) HTH On May 19, 2014, at 8:48 AM, Shushant Arora shushantaror...@gmail.com wrote: I cannot apply

Re: hbase key design to efficient query on base of 2 or more column

2014-05-19 Thread Shushant Arora
other, API - compatible product? Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Shushant Arora [shushantaror...@gmail.com] Sent: Monday, May 19, 2014 12:48 AM To: user

hbase key design to efficient query on base of 2 or more column

2014-05-17 Thread Shushant Arora
Hi, I have a requirement to query my data based on date and user category. User category can be Supreme, Normal, or Medium. I want to query how many new users there are in my table for the date range (2014-01-01) to (2014-05-16), category-wise. Another requirement is to query how many users of Supreme

when to use hive vs hbase

2014-04-30 Thread Shushant Arora
I have a requirement to process huge weblogs on a daily basis. 1. Data will arrive incrementally in the datastore on a daily basis, and I need cumulative and daily distinct user counts from the logs; after that the aggregated data will be loaded into an RDBMS like MySQL. 2. Data will be loaded into an HDFS data warehouse

Re: when to use hive vs hbase

2014-04-30 Thread Shushant Arora
. It's a bit over simplified, but that should give you some starting points. 2014-04-30 4:34 GMT-04:00 Shushant Arora shushantaror...@gmail.com: I have a requirement of processing huge weblogs on daily basis. 1. data will come incremental to datastore on daily basis and I need

Re: when to use hive vs hbase

2014-04-30 Thread Shushant Arora
of full scans and random read/random writes, then yes, go with it! Last, some full table scans can be good fits for HBase if you use some of its specific features like TTL on certain column families when using more than 1, etc. HTH 2014-04-30 8:13 GMT-04:00 Shushant Arora shushantaror

hive hbase integration

2014-04-17 Thread Shushant Arora
I want to know why Hive-HBase integration is required. Is it because HBase cannot provide all SQL-like functionality, and if so, why? What is a storage handler, and what are the best practices for Hive-HBase integration?