1. Can I use UniformSplit for non-hex keys?
2. If yes, how do I specify the key range for the split?
3. If no, what is the difference between HexStringSplit and UniformSplit?
Thanks!
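For intuition: UniformSplit treats keys as raw bytes and divides the byte-key space into equal-sized ranges, whereas HexStringSplit assumes keys are hex strings. A self-contained sketch of the split-point arithmetic (this is an illustration using an assumed fixed key width, not the actual RegionSplitter code):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of UniformSplit-style split points: divide the raw byte-key
// space [low, high] into numRegions equal ranges and return the
// numRegions - 1 boundary keys. Assumes fixed-width keys for illustration.
public class UniformSplitSketch {
    static List<byte[]> splitPoints(byte[] low, byte[] high, int numRegions) {
        BigInteger lo = new BigInteger(1, low);
        BigInteger hi = new BigInteger(1, high);
        BigInteger range = hi.subtract(lo);
        List<byte[]> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger p = lo.add(range.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(numRegions)));
            byte[] raw = p.toByteArray();
            // left-pad (or trim a leading sign byte) to the key width so
            // the boundaries sort correctly as raw bytes
            byte[] key = new byte[low.length];
            int len = Math.min(raw.length, key.length);
            System.arraycopy(raw, raw.length - len, key, key.length - len, len);
            points.add(key);
        }
        return points;
    }

    public static void main(String[] args) {
        byte[] low = new byte[8];                 // 0x00...00
        byte[] high = new byte[8];
        Arrays.fill(high, (byte) 0xFF);           // 0xFF...FF
        List<byte[]> pts = splitPoints(low, high, 4); // 3 boundary keys
        System.out.println(pts.size());
    }
}
```

Since the arithmetic is over raw bytes, any key encoding works; there is no hex assumption.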
Are puts that fall inside the read point of an ongoing scan/get preserved in
the HFile as well, or only in the memstore? And does this block memstore
flushes until all ongoing scans are completed?
On Tue, May 17, 2016 at 5:31 AM, Stack <st...@duboce.net> wrote:
> On Mon, May 16, 2016 at 4:55 PM,
Hi
HBase uses MVCC to achieve consistent results for Get operations.
To achieve MVCC it has to maintain multiple versions of the same row/cells.
What is the maximum number of versions of a row/cell that HBase keeps at any
time to support MVCC?
Say multiple gets started one after the other and have not
In HFile v2, block-level Bloom filters are stored in the scanned section along
with the data blocks and the leaf index.
The load-on-open section contains Bloom filter data. What is this Bloom filter
data?
1. Does it contain an index of the Bloom chunks stored in the scanned section?
2. What do the meta blocks of the non-scanned
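For background on what "Bloom filter data" is in general (this is a generic, self-contained sketch, not the HFile v2 chunk layout, and the hashing below is an illustrative choice, not HBase's):

```java
import java.util.Arrays;
import java.util.BitSet;

// Minimal Bloom filter sketch: k hash probes over an m-bit array.
// Can return false positives, but never false negatives.
public class BloomSketch {
    private final BitSet bits;
    private final int m;   // number of bits
    private final int k;   // number of probes per key

    public BloomSketch(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Derive k probe positions from two base hashes (double hashing).
    private int probe(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = 0x9E3779B9 * h1;  // illustrative second hash
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(byte[] key) {
        for (int i = 0; i < k; i++) bits.set(probe(key, i));
    }

    public boolean mightContain(byte[] key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(probe(key, i))) return false; // definitely absent
        }
        return true; // possibly present
    }
}
```

A get can consult such a structure per block/file to skip reads that cannot match; the bit array and its metadata together are the "Bloom filter data" that has to live somewhere in the file.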
Can an HBase table with a single column family have its row span multiple
blocks in the same HFile?
Suppose there is only one HFile; in that case, is it possible that a column
family having 5-6 columns spans multiple blocks? Or is the block always
closed at max (64k default, or when all columns
Hi
HBase uses ZooKeeper for various purposes, e.g. for region splits.
The regionserver creates a znode in ZooKeeper with the SPLITTING state and the
master gets a notification of this directory. Since ZooKeeper is not fully
consistent, there may be a lag between the actual directory creation and the
notification, until
4. Can the same row be in 2 blocks in an HFile? One cell in block 1 and
another in block 2?
On Mon, May 9, 2016 at 4:57 PM, Shushant Arora <shushantaror...@gmail.com>
wrote:
> Thanks!
>
> 1.Will write take lock on all the column families or just the column
> family being affected b
. Now if a new in-memory row
comes, will it evict from the in-memory area or the single-access area?
3. Why is the block cache one per regionserver? Why not one per region?
On Sun, May 8, 2016 at 11:43 PM, Stack <st...@duboce.net> wrote:
> On Sun, May 8, 2016 at 6:12 AM, Shushant Arora <
previous to the put took a lock.
The memstore is implemented as a CSLM, so how does it return the row state
previous to the put when a get is fired before the put has finished?
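The usual explanation is the MVCC read point: each put is stamped with a sequence number, a get/scan fixes its read point when it starts, and cells with a newer sequence number are simply skipped, so the CSLM can already hold the new cell while readers still see the old state. A toy model of that idea (the class and method names here are mine, not HBase's MultiVersionConcurrencyControl):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy MVCC over a ConcurrentSkipListMap: each row keeps its values per
// write sequence number; a reader only sees writes <= its read point.
public class MvccSketch {
    // row -> (seqId -> value); both maps sorted
    private final ConcurrentSkipListMap<String, ConcurrentSkipListMap<Long, String>> store =
        new ConcurrentSkipListMap<>();
    private final AtomicLong writeSeq = new AtomicLong(0);
    private volatile long readPoint = 0; // highest fully completed write

    public void put(String row, String value) {
        long seq = writeSeq.incrementAndGet();
        store.computeIfAbsent(row, r -> new ConcurrentSkipListMap<>()).put(seq, value);
        readPoint = seq; // simplification: single writer, completes in order
    }

    // Begin an operation: capture the read point once, like a scanner does.
    public long beginRead() { return readPoint; }

    public String get(String row, long readPt) {
        ConcurrentSkipListMap<Long, String> versions = store.get(row);
        if (versions == null) return null;
        Map.Entry<Long, String> e = versions.floorEntry(readPt); // newest <= readPt
        return e == null ? null : e.getValue();
    }
}
```

A get that starts before a put's sequence number is "committed" keeps returning the previous value, even though the new cell is already sitting in the skip list.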
On Tue, May 3, 2016 at 7:41 AM, Stack <st...@duboce.net> wrote:
> On Mon, May 2, 2016 at 5:34 PM, Shushant Arora <shushantaror.
1. Why is it better for read performance to have a single file per region
than multiple files? Why can't multiple threads read multiple files and give
better performance?
2. Does the HBase regionserver have a single thread for compactions and
splits for all the regions it is holding? Why can't a single thread per
, 2016 at 12:05 AM, Stack <st...@duboce.net> wrote:
> On Mon, May 2, 2016 at 10:06 AM, Shushant Arora <shushantaror...@gmail.com
> >
> wrote:
>
> > Thanks Stack
> >
> > for point 2 :
> > I am concerned with downtime of Hbase for read and write.
> >
to Hfile
won't we lose the update?
On Mon, May 2, 2016 at 9:06 PM, Stack <st...@duboce.net> wrote:
> On Mon, May 2, 2016 at 1:25 AM, Shushant Arora <shushantaror...@gmail.com>
> wrote:
>
> > Thanks!
> >
> > Few doubts;
> >
> > 1.LSM tree compris
d read is allowed using snapshot.
Thanks!
On Mon, May 2, 2016 at 11:39 AM, Stack <st...@duboce.net> wrote:
> On Sun, May 1, 2016 at 3:36 AM, Shushant Arora <shushantaror...@gmail.com>
> wrote:
>
> > 1.Does Hbase uses ConcurrentskipListMap(CSLM) to store data in mem
1. Does HBase use a ConcurrentSkipListMap (CSLM) to store data in the memstore?
2. When the memstore is flushed to HDFS, does it dump the memstore's
ConcurrentSkipListMap as an HFile v2? Then how does it calculate blocks out of
the CSLM and dump them to HDFS?
3. After dumping the in-memory CSLM of the memstore to HFile, does
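On question 2, the idea (a simplified, self-contained sketch, not the actual HFile writer) is that the flusher just iterates the CSLM in its sorted order and streams entries into the file, closing the current block whenever it reaches the target block size (64 KB by default):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch: flushing a sorted in-memory map into fixed-target-size "blocks",
// the way a memstore flush streams sorted KeyValues into HFile blocks.
public class FlushSketch {
    // Returns a list of blocks; each block is a list of "row=value" entries.
    static List<List<String>> flush(ConcurrentSkipListMap<String, String> memstore,
                                    int blockSizeBytes) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (Map.Entry<String, String> e : memstore.entrySet()) { // sorted order
            String kv = e.getKey() + "=" + e.getValue();
            current.add(kv);
            currentBytes += kv.length();
            if (currentBytes >= blockSizeBytes) { // block is "full": close it
                blocks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) blocks.add(current); // final partial block
        return blocks;
    }
}
```

No block boundaries exist inside the CSLM itself; they fall out of the sequential write, which is why the threshold is a soft limit crossed by whole cells rather than an exact size.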
there a chance that the top N rows come from distinct
> regions ?
>
> On Mon, Apr 4, 2016 at 8:27 PM, Shushant Arora <shushantaror...@gmail.com>
> wrote:
>
> > Hi
> >
> > I have a requirement to scan a hbase table based on insertion timestamp.
> > I nee
Hi
I have a requirement to scan an HBase table based on insertion timestamp.
I need to fetch the keys sorted by insertion timestamp, not by key.
I can't make the timestamp a prefix of the key, to avoid hotspotting.
Is there any efficient way to meet this requirement?
Thanks!
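One approach commonly suggested for this (a sketch of the salting pattern, not something prescribed in this thread): prefix the timestamp with a small bucket id derived from a hash, so writes spread over N key ranges, then read back in timestamp order by scanning all N buckets and merging. The key layout below is an illustrative choice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of salted, time-ordered keys: bucket#timestamp#id spreads writes
// over NUM_BUCKETS key ranges; a time-ordered read merges the buckets.
public class SaltSketch {
    static final int NUM_BUCKETS = 4;

    static String saltedKey(long timestamp, String id) {
        int bucket = Math.floorMod(id.hashCode(), NUM_BUCKETS);
        // zero-pad the timestamp so lexicographic order == numeric order
        return String.format("%d#%013d#%s", bucket, timestamp, id);
    }

    // Merge per-bucket (already key-sorted) lists back into timestamp order
    // by ordering on the part after the bucket prefix.
    static List<String> mergeByTimestamp(List<List<String>> perBucket) {
        TreeMap<String, String> ordered = new TreeMap<>();
        for (List<String> bucket : perBucket) {
            for (String key : bucket) {
                String tsAndId = key.substring(key.indexOf('#') + 1);
                ordered.put(tsAndId, key);
            }
        }
        return new ArrayList<>(ordered.values());
    }
}
```

The cost is N scans instead of one, but each bucket scan is a contiguous range, so no single region takes all the write load.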
Is an HBase scan or get single-threaded?
Say I have an HBase table with 100 regionservers.
When I scan a key range, say a-z (distributed across all regionservers), will
the client make calls to the regionservers in parallel all at once, or one by
one? Will it first get all keys from one regionserver, then
> );
> if (maxThreads == 0) {
>   maxThreads = 1; // is there a better default?
> }
> int corePoolSize = conf.getInt("hbase.htable.threads.coresize", 1);
> long keepAliveTime = conf.getLong("hbase.htable.threads.keepalivetime",
I have a requirement to use a long-running HBase client in an application server.
1. Do I need to create multiple HConnections, or will a single HConnection work?
2. Do I need to check whether the HConnection is still active before using it
to create an HTable instance?
3. Do I need to handle region splits and
> working in minor compaction?
> >
> > No, they are not.
> >
> > -Vlad
> >
> > On Tue, Feb 16, 2016 at 4:51 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > For #2, see http://hbase.apache.org/book.html#managed.compactions
> > >
>
Hi
1. Does major compaction in HBase run on a per-table basis?
2. By default every 24 hours?
3. Can I disable automatic major compaction for a few tables while keeping it
enabled for the rest of the tables?
4. Are HBase put, get and delete blocked during major compaction, and do they
keep working during minor compaction?
Hi
Can the TTL of rows be set/updated instead of the complete column family's TTL?
Or
Can the timestamp version of a cell be decreased? The aim is to delete some
rows by setting their timestamps to old values so that they match the TTL of
the column family, if a TTL for a row/cell cannot be specified.
Thanks!
What's the syntax to set it in the shell and in Java?
On Wed, Nov 25, 2015 at 6:05 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:
> This? HBASE-10560
>
> 2015-11-25 6:45 GMT-05:00 Shushant Arora <shushantaror...@gmail.com>:
>
> > Hi
> >
> &g
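HBASE-10560 is exactly per-cell TTLs: as I understand it, once it landed a Put can carry its own TTL (Mutation#setTTL(long millis) in the Java API, and a TTL attribute on the shell's put). The semantics are just "the cell expires once now - timestamp > ttl", independently of the family TTL; a self-contained model of that check (the class and field names here are illustrative, not the HBase API):

```java
// Toy model of per-cell TTL semantics (HBASE-10560): a cell carries its own
// ttl and is treated as expired, regardless of the column family TTL,
// once now - timestamp > ttl.
public class CellTtlSketch {
    static final long NO_TTL = Long.MAX_VALUE;

    static class Cell {
        final long timestamp;  // write time, millis
        final long ttlMillis;  // per-cell TTL, NO_TTL if unset
        Cell(long timestamp, long ttlMillis) {
            this.timestamp = timestamp;
            this.ttlMillis = ttlMillis;
        }
    }

    // An expired cell is filtered out of reads and reclaimed by compaction.
    static boolean isExpired(Cell c, long nowMillis) {
        return c.ttlMillis != NO_TTL && nowMillis - c.timestamp > c.ttlMillis;
    }
}
```

This also shows why back-dating timestamps "works" as a poor man's per-row TTL: it shifts the same expiry inequality, but per-cell TTLs express it directly.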
Is an HBase timerange scan a full table scan when there are no start and stop keys?
Or does it take care of the HFile metadata about the min and max timerange in
each HFile?
And how does it optimise this metadata after compaction of multiple files?
And will using KeyPrefixRegionSplitPolicy instead of the default
IncreasingToUpperBoundRegionSplitPolicy help here?
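On the metadata point: store files record the min/max timestamps they contain, so a timerange scan can skip whole files whose range doesn't overlap the query. A self-contained sketch of that overlap check, with made-up file descriptors:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: pruning store files by their recorded [minTs, maxTs] metadata
// before a timerange scan touches their blocks.
public class TimeRangePruneSketch {
    static class StoreFile {
        final String name;
        final long minTs, maxTs; // timestamp range recorded in file metadata
        StoreFile(String name, long minTs, long maxTs) {
            this.name = name; this.minTs = minTs; this.maxTs = maxTs;
        }
    }

    // Keep only files whose timestamp range overlaps [queryMin, queryMax].
    static List<StoreFile> prune(List<StoreFile> files, long queryMin, long queryMax) {
        List<StoreFile> keep = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.maxTs >= queryMin && f.minTs <= queryMax) keep.add(f);
        }
        return keep;
    }
}
```

This also hints at the compaction question: a merged file's range covers the union of its inputs, so after a major compaction the pruning necessarily gets coarser.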
On Wed, Aug 19, 2015 at 10:23 AM, Shushant Arora shushantaror...@gmail.com
wrote:
When the last region gets new data and splits in two, what is the split point
- say the last region
in this scenario (instead of
just the day of month), the last region would get new data and be split.
Is this effect desirable for your app ?
Cheers
On Tue, Aug 18, 2015 at 12:55 PM, Shushant Arora
shushantaror...@gmail.com
wrote:
for an HBase key containing time as a prefix, say (yyyy-mm-dd#other fields
, Phoenix provides better integration with hbase.
A third possibility is Spark on HBase.
If you want to explore these alternatives, I suggest asking on respective
mailing lists where you can get expert opinions.
Cheers
On Tue, Aug 18, 2015 at 9:03 AM, Shushant Arora
shushantaror
17, 2015 at 10:08 PM, Shushant Arora
shushantaror...@gmail.com
wrote:
Thanks !
few more doubts :
1. Say the requirement is to count distinct values of F1.
If the field is part of the key, can HBase not just scan the keys, skip value
deserialisation, and return the result to the client, which
1. Is there any max limit on the key size of an HBase table?
2. Multiple small tables vs one large table: which one is preferred?
3. For bulk load, when LoadIncrementalHFiles is run, it recalculates the
region splits based on the region boundaries. Does this division happen on
the client side or the server side?
access patterns in
your app.
For #3, adjustment according to current region boundaries is done client
side. Take a look at the javadoc for LoadQueueItem
in LoadIncrementalHFiles.java
Cheers
On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora shushantaror...@gmail.com
wrote:
1.Is there any
row / stop row to narrow the
key range being scanned.
I am leaning toward using second approach.
Cheers
On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora shushantaror...@gmail.com
wrote:
~8-10 fields of size (5 of 20 bytes each )and 3 fields of size 200 bytes
each.
On Mon, Aug 17
http://hbase.apache.org/book.html#client.filter.kvm (see
ColumnPrefixFilter)
Cheers
On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora shushantaror...@gmail.com
wrote:
1. So the size limit is per cell's identifier + value?
What is more optimal: to have a field in the key or in the column family's
, 2015 at 7:36 AM, Shushant Arora shushantaror...@gmail.com
wrote:
1. Is hbase.client.keyvalue.maxsize the max size of the row or the key only?
Is there any limit on key size alone?
2. The access pattern is mostly key-based only. Are the memstores and regions
on a regionserver on a per-table basis
1. Do bulk-loaded HFiles not get replicated? Does that mean that if a
regionserver goes down, all HFiles which were bulk loaded to this server are
lost, irrespective of HDFS replication being set to 3? If yes, why are
bulk-loaded HFiles not replicated?
2. Is there any issue with a timestamp prefix as the key of a table-
Are bulk puts supported in HBase?
And in an MR job, when we put into a table using TableOutputFormat, how is it
more efficient than normal puts by individual reducers? Does TableOutputFormat
not do puts one by one?
And in a bulkload Hadoop job, when we specify HFileOutputFormat, does the job
create HFiles
How does HBase use ZooKeeper for master election and regionserver failure
detection when ZooKeeper is not strictly consistent?
Say in the HBase master election process, how is a node 100% sure that a
master has been elected? Does it have to create the /master znode, and if that
node already exists, will
://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees
Cheers
On Sat, Jun 27, 2015 at 7:20 AM, Shushant Arora shushantaror...@gmail.com
wrote:
How Hbase uses Zookeeper for Master selection and region server failure
detection when Zookeeper is not strictly consistent
dir, say (/master), is visible
to C2 but not to C1, till F1 comes in sync with the leader.
On Sat, Jun 27, 2015 at 8:23 PM, Shushant Arora shushantaror...@gmail.com
wrote:
ZooKeeper provides sequential consistency:
updates from a client will be applied in the order that they were sent.
On Sat, Jun 27, 2015
is
'1432104178817#321'. After split, the first row in first daughter region
would still be '1432104178817#321'. Right ?
Cheers
On Thu, May 21, 2015 at 9:57 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Can I avoid hotspot of region with custom region split policy in hbase
0.96
is the
leading part of the rowkey.
This would avoid the overlap you mentioned.
Cheers
On May 21, 2015, at 11:55 PM, Shushant Arora shushantaror...@gmail.com
wrote:
guid change with every key, patterns is
2015-05-22 00:02:01#AB12EC945
2015-05-22 00:02:02#CD9870001234AB457
Can I avoid region hotspotting with a custom region split policy in HBase
0.96?
The key is of the form timestamp#guid.
So can I have a custom region split policy and use the second part of the key
(i.e. the guid) as the region split criterion and avoid the hotspot?
In a normal MR job, can I configure (cluster-wide) a default number of
reducers to be used if I don't specify any reducers in my job?
://hbase.apache.org/book/rowkey.design.html#rowkey.regionsplits
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
On Tue, Jul 15, 2014 at 6:40 PM, Shushant Arora
shushantaror...@gmail.com
wrote:
1.How to split region servers at table definition time?
2
is suboptimal.
For #3, you can split the key space evenly. Using number of region servers
as number of splits is Okay.
Cheers
On Jul 16, 2014, at 12:25 AM, Shushant Arora shushantaror...@gmail.com
wrote:
Thanks!
Few more doubts
1.When I don't supply SPLITS at table creation , all put operation
1. How do I split regions across region servers at table definition time?
2. Will HBase write onto only one region server when no splits are defined,
even if the key is not monotonically increasing?
3. When does a region split occur?
4. Will the number of regions be fixed when an HBase table is pre-split at
table creation
Hi
Can I override the slave nodes for one of my jobs only?
Let's say I want the current job to be executed on node1 and node2 only.
If both are busy, let the job wait.
Thanks
Shushant
/secondary_indexing.html
-John
On Sat, May 17, 2014 at 8:34 AM, Shushant Arora
shushantaror...@gmail.comwrote:
Hi
I have a requirement to query my data based on date and user category.
User category can be Supreme, Normal or Medium.
I want to query how many new users are there in my table
off using a scan with a
start and stop row, then doing the counts on the client side.
So as you get back your result set… you process the data. (Either in a M/R
job or single client thread.)
HTH
On May 19, 2014, at 8:48 AM, Shushant Arora shushantaror...@gmail.com
wrote:
I cannot apply
other, API - compatible
product?
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
From: Shushant Arora [shushantaror...@gmail.com]
Sent: Monday, May 19, 2014 12:48 AM
To: user
Hi
I have a requirement to query my data based on date and user category.
User category can be Supreme, Normal or Medium.
I want to query how many new users are there in my table from the date range
(2014-01-01) to (2014-05-16), category-wise.
Another requirement is to query how many users of Supreme
I have a requirement to process huge weblogs on a daily basis.
1. Data will come incrementally to the datastore on a daily basis, and I need
cumulative and daily
distinct user counts from the logs; after that, the aggregated data will be
loaded into an RDBMS like MySQL.
2. Data will be loaded into the HDFS data warehouse
.
It's a bit over simplified, but that should give you some starting points.
2014-04-30 4:34 GMT-04:00 Shushant Arora shushantaror...@gmail.com:
I have a requirement of processing huge weblogs on daily basis.
1. data will come incremental to datastore on daily basis and I need
of full scans and random read/random writes, then
yes, go with it!
Last, some full table scans can be a good fit for HBase if you use some of
its specific features, like TTL on certain column families when using more
than 1, etc.
HTH
2014-04-30 8:13 GMT-04:00 Shushant Arora shushantaror
I want to know why Hive-HBase integration is required.
Is it because HBase cannot provide all the functionality of SQL, and if
yes, then why?
What is a storage handler, and what are best practices for Hive-HBase
integration?