Error when starting hbase shell

2016-01-08 Thread Sreeram
Hi,

I built HBase with Cygwin on my local machine (the master branch) and I
get the error below when starting the HBase shell.

NameError: cannot initialize Java class
org.apache.hadoop.hbase.HColumnDescriptor
  get_proxy_or_package_under_package at
org/jruby/javasupport/JavaUtilities.java:54
  method_missing at
file:/C:/Users//.m2/repository/org/jruby/jruby-complete/1.6.8/jruby-com
plete-1.6.8.jar!/builtin/javasupport/java.rb:51
  HBaseConstants at
d:/hbase-master/hbase-master/bin/../hbase-shell/src/main/ruby/hbase.rb:93
  (root) at
d:/hbase-master/hbase-master/bin/../hbase-shell/src/main/ruby/hbase.rb:34
 require at org/jruby/RubyKernel.java:1062
  (root) at
d:/hbase-master/hbase-master/bin/../bin/hirb.rb:118

The JDK/JRE version I have is 1.8.0_51.
Any thoughts on what is going on here?

Sreeram


Re: Error when starting hbase shell

2016-01-08 Thread Sreeram
Just to add, I verified that hbase-client-2.0.0-SNAPSHOT.jar (which
contains org.apache.hadoop.hbase.HColumnDescriptor) is on the HBase classpath.

On Fri, Jan 8, 2016 at 2:44 PM, Sreeram  wrote:

> Hi,
>
> I built HBase using cygwin in my local machine (the master branch) and I
> get below error when starting up hbase shell.
>
> NameError: cannot initialize Java class
> org.apache.hadoop.hbase.HColumnDescriptor
>   get_proxy_or_package_under_package at
> org/jruby/javasupport/JavaUtilities.java:54
>   method_missing at
> file:/C:/Users//.m2/repository/org/jruby/jruby-complete/1.6.8/jruby-com
> plete-1.6.8.jar!/builtin/javasupport/java.rb:51
>   HBaseConstants at
> d:/hbase-master/hbase-master/bin/../hbase-shell/src/main/ruby/hbase.rb:93
>   (root) at
> d:/hbase-master/hbase-master/bin/../hbase-shell/src/main/ruby/hbase.rb:34
>  require at org/jruby/RubyKernel.java:1062
>   (root) at
> d:/hbase-master/hbase-master/bin/../bin/hirb.rb:118
>
> The JDK/JRE version I have is 1.8.0_51.
> Any thoughts on what is going on here ?
>
> Sreeram
>


Re: How to implement increment in an idempotent manner

2016-03-18 Thread Sreeram
The incremented field is more like an amount field that stores an
aggregate amount. Since the field is incremented concurrently by multiple
bolts running in parallel, storing the value before the increment and then
doing a put in case of replay will not help.

The reason for this field is to pre-compute a certain aggregate amount and
materialize it in the HBase table.

On Fri, Mar 18, 2016 at 3:30 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> At the beginning of your Storm bolt process, can you not do a put of "0"? So
> it starts back from scratch? Otherwise you will need to query the value, and
> keep the value to put it back if you need to replay your bolt.
>
> The other option is: you increment a specific difference column, and at the
> end, if your bolt is successful, you increment the initial column with
> the new total counter?
>
> JMS
>


How to implement increment in an idempotent manner

2016-03-19 Thread Sreeram
Hi,

 I am looking for suggestions from the community on implementing HBase
increment in an idempotent manner.

 My use case is a Storm HBase bolt that atomically increments an HBase
counter. A replay of the Storm bolt results in a double increment.

 Any suggestion on the approach to be taken is welcome.

 Thank you.

 Regards,
 Sreeram


Re: How to implement increment in an idempotent manner

2016-03-19 Thread Sreeram
All my HBase processing is going to be inside a single bolt, so I cannot
use the second option.
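One way to make the replayed increments idempotent (a sketch of an assumed approach, not something proposed in this thread) is to record the event-id of each applied increment alongside the counter and skip event-ids that have already been seen; in HBase the dedup check could be a checkAndMutate on a marker column. A plain in-memory map stands in for the table here:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch (assumption, not from the thread): make increments idempotent by
// remembering the event-id of every applied increment and turning replays
// into no-ops. In HBase this check could be a checkAndMutate on a dedup
// column; an in-memory map stands in for the table here.
public class IdempotentCounter {
    private final Map<String, Long> counters = new HashMap<>();
    private final Set<String> appliedEvents = new HashSet<>();

    // Applies the increment only if this (row, event-id) pair is new;
    // returns the current counter value either way.
    public synchronized long increment(String rowKey, String eventId, long delta) {
        String dedupKey = rowKey + "/" + eventId;
        if (appliedEvents.add(dedupKey)) {   // first delivery: apply
            counters.merge(rowKey, delta, Long::sum);
        }                                    // replay: skip
        return counters.getOrDefault(rowKey, 0L);
    }

    public static void main(String[] args) {
        IdempotentCounter c = new IdempotentCounter();
        c.increment("row1", "evt-1", 5);
        c.increment("row1", "evt-1", 5);                     // replayed tuple: ignored
        System.out.println(c.increment("row1", "evt-2", 3)); // prints 8
    }
}
```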

On Fri, Mar 18, 2016 at 3:46 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> What about the other option, where each bolt increments its own column and
> at the end you aggregate those few columns together?
>


Re: Can not connect local java client to a remote Hbase

2016-04-22 Thread Sreeram
Hi Soufiani,

Can you try changing your configuration so that the region server listens on
0.0.0.0:16020 and the master listens on 0.0.0.0:16000?

Since 127.0.0.1 is the local loopback address, it will not be accessible from
outside the machine.
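For illustration, the bind addresses would be set in hbase-site.xml roughly as below (the property names are an assumption for HBase 1.x; verify them against your release's hbase-default.xml before use):

```xml
<!-- Assumed property names; check hbase-default.xml for your release. -->
<property>
  <name>hbase.master.ipc.address</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>0.0.0.0</value>
</property>
```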

Regards,
Sreeram


On Fri, Apr 22, 2016 at 9:00 PM, SOUFIANI Mustapha | السفياني مصطفى <
s.mustaph...@gmail.com> wrote:

> Thanks, Sachin, for your help. I already checked this issue on the official
> Pentaho users forum and it seems to be OK for them too.
> I really don't know what the problem could be,
> but anyway, thanks again for your help.
> Regards.
>
> 2016-04-22 16:17 GMT+01:00 Sachin Mittal :
>
> > Your ports are open. Your settings are fine. The issue seems to be
> > elsewhere, but I am not sure where.
> > Check with Pentaho maybe.
> >
> > On Fri, Apr 22, 2016 at 8:44 PM, SOUFIANI Mustapha | السفياني مصطفى <
> > s.mustaph...@gmail.com> wrote:
> >
> > > Maybe those ports are not open:
> > > hduser@big-services:~$ telnet localhost 16020
> > > Trying ::1...
> > > Trying 127.0.0.1...
> > > Connected to localhost.
> > > Escape character is '^]'.
> > >
> >
>


Maximum limit on HBase cluster size

2016-09-07 Thread Sreeram
Dear All,



Looking forward to your views on the maximum limit of HBase cluster size.



We are currently designing an HBase cluster, and one of the tables (designed
in wide format) is expected to have roughly 6 billion rows in production by
year 3 (with an additional 200 million rows added each month). In
addition, we expect roughly 250 columns per row.  Expected table
data volume is around 250 TB (at the end of 3 years, without considering HDFS
replication), growing by 7 TB per month.



While we are provisioning the number of nodes based on the expected data
volume, we wanted to check whether there are any limits on the number of rows
per cluster.



Will it be advisable, in such a situation, to split the cluster into two or
more independent clusters?  Will there be any impact on read/write
throughput/latency as the table grows over time?



Please advise.



Regards,

Sreeram
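As a back-of-envelope check of the numbers above (the region size and regions-per-server figures are assumptions for illustration, not from the thread), the data volume translates into a region count and a rough node count like this:

```java
// Back-of-envelope sizing (assumed figures: 10 GB regions, 100 regions per
// server -- tune for your own deployment). Shows how 250 TB of table data
// maps to region and server counts.
public class RegionMath {
    // Ceiling division: how many units of `unitBytes` cover `totalBytes`.
    public static long unitsFor(long totalBytes, long unitBytes) {
        return (totalBytes + unitBytes - 1) / unitBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long tb = 1024L * gb;
        long regions = unitsFor(250 * tb, 10 * gb);       // 250 TB at 10 GB/region
        System.out.println("regions ~ " + regions);        // prints regions ~ 25600
        long servers = unitsFor(regions, 100);             // at 100 regions/server
        System.out.println("servers ~ " + servers);        // prints servers ~ 256
    }
}
```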


Re: Maximum limit on HBase cluster size

2016-09-07 Thread Sreeram
Hi Ted,

From the link:
"Around 50-100 regions is a good number for a table with 1 or 2 column
families. Remember that a region is a contiguous segment of a column
family."

Is this number of 50-100 regions per table at the level of an individual
region server, or for the entire cluster?

Thanks,
Sreeram





On Wed, Sep 7, 2016 at 4:18 PM, Ted Yu  wrote:

> With properly designed schema, you don't need to split the cluster.
>
> Please see:
> http://hbase.apache.org/book.html#schema
>


Viable approaches to fail over HBase cluster across data centers

2016-09-26 Thread Sreeram
Dear All,

 Please let me know your thoughts on viable approaches to failing over an
HBase cluster across data centers in case of a primary data center outage. The
deployment has zero data loss as one of its primary design goals.
The deployment scenario is Active-Passive: if the active cluster goes
down, there must be a zero-data-loss failover to the passive cluster.

I understand that the built-in table-level replication using 'add_peer'
might still lead to data loss, since it is asynchronous.

As a related note, is there a way to specify the location (e.g. a network
drive) where the HBase WAL files in HDFS are written? The network
drive has synchronous replication across data centers. If the WAL files can
be written to the replicated network drive, can we recover in-flight data
in the passive cluster and resume operations from there?

Regards,
Sreeram


Maximum size of HBase row

2016-10-17 Thread Sreeram
Hi All,

Please let me know whether the maximum size of an HBase row (in terms of
storage space) is equal to the configured size of a region.

I understand the parameter hbase.table.max.rowsize to be the maximum number
of bytes that can be transferred in a single get/scan operation, not a limit
on the actual size of a row in HBase.

Is my understanding correct? Kindly let me know.

Regards,
Sreeram


Question on WALEdit

2017-01-28 Thread Sreeram
Hi,

TL;DR: In my use case I am setting attributes on Puts and Deletes using
setAttribute(). I would like to know whether it is possible to get the
attributes that I set back from the WALEdit.

Here is my use case in detail: I have a replicated cluster A which gets
replicated to cluster B. From cluster B, I would like to track the events
as they get written to B. I set the event-id as an attribute to the
mutation in cluster A.

I will have a coprocessor in cluster B that gets invoked on postWALWrite.
If I can retrieve the event-id from the WALEdit, I will be able to track
the events that were replicated successfully to cluster B.

I went through the WALEdit API, and it is not obvious to me whether it is
possible to retrieve the attributes set on the row mutation.

Kindly let me know your suggestions.

Regards,
Sreeram


Re: Question on WALEdit

2017-01-29 Thread Sreeram
Thank you very much, Ted. I understand that fetching the tags will fetch the
associated attributes for a mutation. I will try it out.

Regards,
Sreeram

On 29 Jan 2017 00:37, "Ted Yu"  wrote:

In CellUtil, there is the following method:

  public static Tag getTag(Cell cell, byte type){


In MobUtils, you can find sample usage:

  public static Tag getTableNameTag(Cell cell) {
    if (cell.getTagsLength() > 0) {
      return CellUtil.getTag(cell, TagType.MOB_TABLE_NAME_TAG_TYPE);
    }
    return null;
  }

FYI

On Sat, Jan 28, 2017 at 8:29 AM, Ted Yu  wrote:

> I haven't found the API you were looking for.
>
> Which release of hbase are you using ?
> I assume it supports tags.
>
> If you use tag to pass event-id, you can retrieve thru this method of
> WALEdit:
>
>   public ArrayList<Cell> getCells() {
>
> From Cell, there're 3 methods for retrieving tag starting with:
>
>   byte[] getTagsArray();
>
> Cheers
>


Unable to get coprocessor debug logs in regionserver.

2017-03-20 Thread Sreeram
Hi,

  I am writing a coprocessor for the postWALWrite event.

I do not see the coprocessor's debug logs in the RS log.

I can see that the coprocessor was loaded by the region server - the line
below is from the RS log.

2017-03-20 18:59:17,132 INFO
org.apache.hadoop.hbase.coprocessor.CoprocessorHost: System coprocessor
Test.TestWALEditCP was loaded successfully with priority (536870912).

Any thoughts on what could be going wrong?

Thanks,
Sreeram

PS: My code is below.


package Test;

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.coprocessor.BaseWALObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.WALCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.FSHLog;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.wal.WALKey;

public class TestWALEditCP extends BaseWALObserver {
    // Note: this logs under the FSHLog logger category, not this class's own.
    public static final Log LOG = LogFactory.getLog(FSHLog.class);

    @Override
    public void postWALWrite(ObserverContext<? extends WALCoprocessorEnvironment> ctx,
            HRegionInfo info, WALKey logkey, WALEdit logEdit) throws IOException {
        LOG.info("Post WAL edit is being triggered"); // <--- This line
        // does not get printed in the RS log
    }
}


Question in WALEdit

2017-03-22 Thread Sreeram
Hi,

 I have below questions on WALEdit. Looking forward to answer from the
community.

 a) I understand that all Cells in a given WALEdit form part of a single
transaction. Since HBase atomicity is at the row level, this implies that all
Cells in a given WALEdit have the same row key. Is this understanding
correct?


 b) With MultiWAL, does the log sequence increase monotonically with
transaction timestamp?  Specifically, suppose there are two transactions for
two different tables on a single region server at times t0 and t1 (t0 <
t1). In the presence of MultiWAL, will the postWALEdit() coprocessor event
for transaction 0 be triggered before that of transaction 1?



Thanks,

Sreeram

PS: I use HBase version 1.2.0


Re: Question in WALEdit

2017-03-22 Thread Sreeram
I am sorry for the typo. For the second question I meant postWALWrite in
WALObserver.

default void postWALWrite(ObserverContext<? extends WALCoprocessorEnvironment> ctx,
                          HRegionInfo info,
                          WALKey logKey,
                          WALEdit logEdit)
    throws IOException

Thanks

On 23 Mar 2017 02:37, "Ted Yu"  wrote:

> Sreeram:
> For #2, did you mean this method ?
>
>   default void postWALRestore(final ObserverContext<RegionCoprocessorEnvironment> ctx,
>
>   HRegionInfo info, WALKey logKey, WALEdit logEdit) throws IOException
> {}
>
> On Wed, Mar 22, 2017 at 12:56 PM, Vladimir Rodionov <
> vladrodio...@gmail.com>
> wrote:
>
> > a) HBase does not support transactions - it only guarantees that a single
> > mutation to a row key is atomic. A WALEdit can contain cells (mutations)
> > from different rows (for example, when you do batchMutate, all operations
> > go to the same WALEdit afaik).
> > b) I could not find postWALEdit() in the RegionObserver API. What
> > coprocessor hook did you mean exactly?
> >
> > -Vlad
> >
>


RFC: Hash prefix considerations for HBase row key design when storing time series data

2017-04-12 Thread Sreeram
Hi,

 I have put down some thoughts on designing hash prefixes for HBase
row keys at the link below.
https://tmblr.co/Z4Ek8e2KYFB0h

Request the community to kindly take a look and share your comments.

TL;DR version - in certain HBase tables (where data is stored as a time
series, e.g. by day or month) with row keys prefixed by a hash, it
may be beneficial to limit the number of hash bits to avoid a write
performance impact.

Thanks,
Sreeram
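The idea of a limited-width hash prefix can be sketched as follows (an illustrative assumption, not code from the linked post): keep only a few bits of the hash, so writes spread across 2^bits buckets while a time-range scan only needs to fan out over that many prefixes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch (assumed helper, not from the post): prefix each row key with the
// top `bits` bits of its MD5 hash, rendered as hex, giving 2^bits buckets.
public class HashPrefix {
    // Returns "<bucket-in-hex>-<key>"; supports 1..16 bits.
    public static String prefixedKey(String key, int bits) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // Take the top `bits` bits of the first two digest bytes.
            int bucket = ((digest[0] & 0xFF) << 8 | (digest[1] & 0xFF)) >>> (16 - bits);
            int hexDigits = (bits + 3) / 4;   // hex digits needed for the bucket
            return String.format("%0" + hexDigits + "x-%s", bucket, key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    public static void main(String[] args) {
        // 4 hash bits -> 16 buckets; the prefix is deterministic per key.
        System.out.println(HashPrefix.prefixedKey("row-2017-04-12", 4));
    }
}
```

With, say, 4 bits, a scan over one day of data touches at most 16 prefix ranges instead of one range per full-hash value.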


Balancing a table's regions across region servers

2017-04-18 Thread Sreeram
Hi,

I have a requirement where, for one specific table, each region server
always needs to manage at least one region - this table has more
regions than the number of region servers.

Based on HBASE-3373, can I assume that this is taken care of
automatically by the HBase balancer?

The version of HBase that I use is HBase 1.2.0-cdh5.8.2

Kindly let me know

Thank you

-Sreeram


Re: Balancing a table's regions across region servers

2017-04-18 Thread Sreeram
Thank you, Ted, for the reply.

Can the parameter hbase.master.balancer.stochastic.tableSkewCost be
set at a table level (for a very small table) ?

-Sreeram

On Tue, Apr 18, 2017 at 3:12 PM, Ted Yu  wrote:
> If you look at the patch for HBASE-3373, you would see that there is a
> config to enable per table balancing.
> This was developed before StochasticLoadBalancer became the default
> balancer.
>
> In StochasticLoadBalancer, you need to increase the weight for
> hbase.master.balancer.stochastic.tableSkewCost
> Default weight is 35. Consider increasing to 500 range.
>
> FYI
>
> On Tue, Apr 18, 2017 at 2:21 AM, Sreeram  wrote:
>
>> Hi,
>>
>> I have a requirement where, for one specific table, each region server
>> always needs to manage at least one region - this table has more
>> regions that the number of region servers.
>>
>> Based on HBASE-3373, can I assume that this is taken care
>> automatically by HBase Balancer?
>>
>> The version of HBase that I use is HBase 1.2.0-cdh5.8.2
>>
>> Kindly let me know
>>
>> Thank you
>>
>> -Sreeram
>>


ValueFilter returning earlier values

2017-04-20 Thread Sreeram
Hi,

 When I scan with a ValueFilter on a column, I see that it also returns
older versions if they happen to match the value in the ValueFilter.

The table's column family has the property VERSIONS set to 1, and I set
setMaxVersions(1) on the Scan object.

I was expecting the value filter to return only the latest values for
the column, provided they match the filter.

Is this the expected behaviour of ValueFilter? Any suggestions on options I
should set to keep the older values out of
the result?

Thank you

Regards,
Sreeram


API to get HBase replication status

2017-04-24 Thread Sreeram
Hi,

 I am trying to understand whether the HBase shell commands that report
replication status are based on an underlying API.

Specifically, I am trying to fetch the last shipped timestamp and the
replication lag per region server. ReplicationAdmin does not seem
to provide this information (or maybe it is just not obvious to me).

The version of HBase that I use is 1.2.0-cdh5.8.2

Any help in this regard?

Thanks,
Sreeram


How to improve HBase replication throughput between data centers ?

2017-05-18 Thread Sreeram
Hi All,

I have set up HBase replication between two clusters of 25
nodes each. The inter-data-center network link has a capacity of 500
Mbps.

I have been running some tests to understand the speed of replication.
I am observing that the replication speed does not exceed 5
Mbps.

On reading up on this, I understand that the speed of data
transfer depends on the OS-level TCP socket read and write buffer sizes.
Below are the OS level parameters that I see for the socket size

# cat /proc/sys/net/ipv4/tcp_wmem
4096 (min)   16384 (default)  4194304 (max)

# cat /proc/sys/net/ipv4/tcp_rmem
4096 (min)87380 (default)  6291456 (max)

The default write buffer size for sockets is 16 KB and the read buffer
size is around 85 KB.

There are suggestions [1] to set higher values for the default read
and write buffers to fully utilize the link capacity.

But I am not sure how to influence HBase to use higher values for the
socket read/write buffers when it does replication.

Any thoughts from the community on the same?

Thanks
Sreeram

[1] http://www.onlamp.com/pub/a/onlamp/2005/11/17/tcp_tuning.html
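For reference, the kernel defaults are commonly raised along these lines (illustrative values, an assumption — size them to your link's bandwidth-delay product; these are OS settings, not HBase parameters):

```shell
# Illustrative values only -- tune max buffers to bandwidth x RTT.
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
```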


Slow HBase write across data center

2017-06-29 Thread Sreeram
Hi,

 I am facing very slow write speeds when I write to an HBase cluster in
a different data center. The network round trip takes 1 ms. The average
time taken for 100 PUTs (each 500 KB) is over 2 seconds. Are there any
network or OS parameters that I need to check?

Will appreciate any inputs from the community on this.

Thanks,
Sreeram


HBase- Scan with wildcard character

2011-12-12 Thread Sreeram K
Hi,

I have a table defined with 3 columns.
I am looking for a query in the HBase shell to print all the values whose
row key starts with certain characters.

Example:
My row ids are: Column+Key
4E11676773AC3B6E9A3FE1CCD1051B8C&1323736118749497
column=xx:size,timestamp=67667767,value=

4E11676773AC3B6E9A3FE1CCD1051B8C&132373611874988
column=11x:size,timestamp=67667767,value=

4E11676773AC3B6E9A3FE1CCD1051B8C&132373611565656
column=1xx:size,timestamp=67667767,value=


Something similar to mysql => select * from table where id='%4E1167677%'

Do we have any command like this in the HBase shell - a scan with wildcard
characters?

(Or) should we end up using Hive? What are the other options?

Can you please let me know.

-Sreeram



Re: HBase- Scan with wildcard character

2011-12-13 Thread Sreeram K
Thanks Lars, I will look into that.

One more question, on the HBase shell.

If I have:
           hbase> scan 't1.', {COLUMNS => 'info:regioninfo'}, it prints
all the columns of regioninfo.

Can I have a condition like: if column info:regioninfo=2 (value), then print
all the associated columns, like info:regioninfo1, regioninfo2?



- Original Message -
From: lars hofhansl 
To: "user@hbase.apache.org" ; Sreeram K 

Cc: 
Sent: Monday, December 12, 2011 10:45 PM
Subject: Re: HBase- Scan with wildcard character

First off, what you want is:   select * from table where id like '4E1167677%'
in MySQL.
Relational databases can typically use indexes to satisfy like "xxx%" type
queries, but not "%xxx%" queries.

HBase is really good at "xxx%" (prefix) type queries.

Just create a scan object, set the start key to "4E1167677", then call next on
the resulting scanner until the returned key no longer starts with "4E1167677".

In your particular case (since your keys are hex numbers), you can even set the
stop key to "4E1167677z" (the z will sort after any valid hex digit),
and the scanner will automatically stop at the last possible match.


Have a look at the Scan object and HTable.getScanner(...)


-- Lars





Re: HBase- Scan with wildcard character

2011-12-13 Thread Sreeram K
Thanks Lars. I am looking into that.

Is there a way we can search all the entries starting with 565HGOUO and print
all the rows?

Example:
scan 'SAMPLE_TABLE' ,{COLUMNS
=>['sample_info:FILENAME','event_info:FILENAME'],STARTROW=>'sample1%'}

I am seeing all the rows and information after that sample1% row in the DB;
if, for instance, I have extra1rowid after sample1%, I am able to see that too.

I am looking for a query that prints only the rows whose row id starts with
sample1%.

Can you let me know if we can get a query like that in the HBase shell?



- Original Message -
From: lars hofhansl 
To: "user@hbase.apache.org" ; Sreeram K 

Cc: 
Sent: Tuesday, December 13, 2011 11:36 AM
Subject: Re: HBase- Scan with wildcard character

info:regioninfo is actually a serialized Java object (HRegionInfo). What you
see in the shell is the result of HRegionInfo.toString(), which looks like a
ruby object but is really just a string (see HRegionInfo.toString()).






Re: HBase- Scan with wildcard character

2011-12-13 Thread Sreeram K
Thanks Doug. I am looking at this more from the HBase shell side.


- Original Message -
From: Doug Meil 
To: "user@hbase.apache.org" ; Sreeram K 
; lars hofhansl 
Cc: 
Sent: Tuesday, December 13, 2011 2:01 PM
Subject: Re: HBase- Scan with wildcard character


Hi there-

At some point you're probably going to want to get out of the shell, take
a look at this...

http://hbase.apache.org/book.html#scan






On 12/13/11 4:43 PM, "Sreeram K"  wrote:

>Thanks Lars. I am looking into that.
>
>Is there a way we can search all the entries starting  with 565HGOUO and
>print all the rows?
>
>Example:
>scan 'SAMPLE_TABLE' ,{COLUMNS
>=>['sample_info:FILENAME','event_info:FILENAME'],STARTROW=>'sample1%'}
>
>I am seeing all the Rows and information after that sample1% row in the
>DB.
>if for instance I have extra1rowid after sample1%, I am able to see that
>also.
>
>I am looking for a query to print only the rows which has Rowid starting
>with sample1%.
>
>can you let me know if we can get a query like that on hbase shell
>
>
>
>- Original Message -
>From: lars hofhansl 
>To: "user@hbase.apache.org" ; Sreeram K
>
>Cc: 
>Sent: Tuesday, December 13, 2011 11:36 AM
>Subject: Re: HBase- Scan with wildcard character
>
>info:regioninfo is actually a serialized Java object (HRegionInfo). What
>you see in the shell the result of HRegionInfo.toString(), which looks
>like a 
>
>ruby object, but it is really just a string (see HRegionInfo.toString()).
>
>
>
>
>From: Sreeram K 
>To: "user@hbase.apache.org" ; lars hofhansl
>
>Sent: Tuesday, December 13, 2011 12:16 AM
>Subject: Re: HBase- Scan with wildcard character
>
>Thanks Lars, I will look into that .
>
>one more question: on hbase shell.
>
>If I have :
>           hbase> scan 't1.', {COLUMNS => 'info:regioninfo'}  , it is
>printing all the colums of regioninfo.
>
>
>can I have a condition like:if colum,info.regioninfo=2 (value) than print
>all the associated columns like info:regioninfo1, regioninfo2.
>
>
>
>- Original Message -
>From: lars hofhansl 
>To: "user@hbase.apache.org" ; Sreeram K
>
>Cc: 
>Sent: Monday, December 12, 2011 10:45 PM
>Subject: Re: HBase- Scan with wildcard character
>
>First off, what you want is:   select * from table where id like
>'4E1167677%'   in MySQL.
>Relational databases can typically use indexes to satisfy like "xxx%"
>type queries, but not "%xxx%" queries.
>
>HBase is really good at "xxx%" (prefix) type queries.
>
>Just create a scan object, set the startkey to "4E1167677", then call
>next resulting scanner until the returned key no longer start with
>"4E1167677".
>
>In your particular case (since your keys are hex numbers), you can even
>set the stopKey to "4E1167677z" (the z will sort after any valid hex
>digit),
>and the scanner will automatically stop at the last possible match.
>
>
>Have a look at the Scan object and HTable.getScanner(...)
>
>
>-- Lars
>
>
>- Original Message -
>From: Sreeram K 
>To: "user@hbase.apache.org" 
>Cc: 
>Sent: Monday, December 12, 2011 6:58 PM
>Subject: HBase- Scan with wildcard character
>
>Hi,
>
>I have a Table defined with the 3 columns.
>I am looking for a query in HBase shell to print all the values starting
>with some characters in Rowkey.
>
>Example:
>My rowids are:Coulm+Key
>4E11676773AC3B6E9A3FE1CCD1051B8C&1323736118749497
>colum=xx:size,timestamp=67667767,value=
>
>4E11676773AC3B6E9A3FE1CCD1051B8C&132373611874988
>colum=11x:size,timestamp=67667767,value=
>
>4E11676773AC3B6E9A3FE1CCD1051B8C&132373611565656
>colum=1xx:size,timestamp=67667767,value=
>
>
>Something similar to mysql => select * from table where id='%4E1167677%'
>
>do we have any command like this in the HBase shell - Scan with wild
>characters?
>
>(or) should we end up using HIVE ? what are the other options?
>
>Can you please let me know.
>
>-Sreeram 
>


Re: HBase- Scan with wildcard character

2011-12-14 Thread Sreeram K
Thank you Lars.
STOPROW did work in the HBase shell as you suggested.



- Original Message -
From: lars hofhansl 
To: "user@hbase.apache.org" ; Sreeram K 

Cc: 
Sent: Tuesday, December 13, 2011 3:56 PM
Subject: Re: HBase- Scan with wildcard character

The shell only lets you do so much.
HBase does not support the % wildcard. It just happens to work in your case because 
% has a low ASCII code.


You set the startRow of the scan. It does not need to exist, but the value must 
sort before the rows you are looking for and after all rows before it.
Same for the stopRow: it does not need to exist, but it must sort after the 
rows you are looking for and before all rows you do not want to see.

Try setting STARTROW to "sample1" and STOPROW to "sample1\255". That will work 
as long as byte 255 is not used in your row keys.

-- Lars
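[Editor's note] The STARTROW/STOPROW trick above can be made generic instead of relying on a sentinel character like "z" or "\255": the smallest row that sorts strictly after every key with a given prefix is obtained by incrementing the prefix's last byte, carrying past any 0xFF bytes. A standalone Java sketch of that computation — `stopRowForPrefix` is a made-up helper for illustration, not part of the HBase API; in real client code its result would feed the scan's stop row:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixStop {
    // Smallest byte string sorting strictly after every key that
    // begins with `prefix`: increment the last byte, dropping any
    // trailing 0xFF bytes that cannot be incremented.
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1);
            }
        }
        // Prefix was all 0xFF bytes: no finite stop row exists,
        // so scan to the end of the table (empty stop row).
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("sample1".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints "sample2"
    }
}
```

Scanning with STARTROW "sample1" and this computed stop row returns exactly the prefix matches, with no assumption about which bytes appear in the keys.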



____
From: Sreeram K 
To: "user@hbase.apache.org" ; lars hofhansl 
 
Sent: Tuesday, December 13, 2011 2:16 PM
Subject: Re: HBase- Scan with wildcard character

Thanks Doug. I am looking more from HBase shell for this.


- Original Message -
From: Doug Meil 
To: "user@hbase.apache.org" ; Sreeram K 
; lars hofhansl 
Cc: 
Sent: Tuesday, December 13, 2011 2:01 PM
Subject: Re: HBase- Scan with wildcard character


Hi there-

At some point you're probably going to want to get out of the shell, take
a look at this...

http://hbase.apache.org/book.html#scan






On 12/13/11 4:43 PM, "Sreeram K"  wrote:

>Thanks Lars. I am looking into that.
>
>Is there a way we can search all the entries starting  with 565HGOUO and
>print all the rows?
>
>Example:
>scan 'SAMPLE_TABLE' ,{COLUMNS
>=>['sample_info:FILENAME','event_info:FILENAME'],STARTROW=>'sample1%'}
>
>I am seeing all the rows and information after that sample1% row in the
>DB.
>If, for instance, I have extra1rowid after sample1%, I am able to see that
>also.
>
>I am looking for a query that prints only the rows whose row ID starts
>with sample1%.
>
>Can you let me know if there is a query like that in the HBase shell?


Re: HBase- Scan with wildcard character

2011-12-15 Thread Sreeram K
I have one more question.
Can we have a query in the HBase shell based on a column value?

I am looking at scan with a column qualifier. Is that possible, the way we do
it with STARTROW?
Can you please point me to an example.




Re: HBase- Scan with wildcard character

2011-12-15 Thread Sreeram K
Thanks for the reply.
But that is from the Java API. I am looking to do it from the HBase shell.


- Original Message -
From: Stack 
To: user@hbase.apache.org; Sreeram K 
Cc: 
Sent: Thursday, December 15, 2011 10:10 AM
Subject: Re: HBase- Scan with wildcard character

On Thu, Dec 15, 2011 at 8:59 AM, Sreeram K  wrote:
> I have one more question.
> Can we have a query in the HBase shell based on a column value?
>
> I am looking at scan with a column qualifier. Is that possible, the way we do
> it with STARTROW?
> Can you please point me to an example.
>
>

You need to use a value filter:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ValueFilter.html

St.Ack
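[Editor's note] For readers who later move from the shell to the Java API: the ValueFilter linked above keeps only the cells whose value satisfies a comparison. The per-cell selection it performs can be sketched client-side over an in-memory row — `filterByValue` below is a hypothetical stand-in for illustration, not HBase code; in a real client you would attach a ValueFilter to a Scan and let the region server do this work:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ValueFilterSketch {
    // Keep only the column -> value entries of a row whose value
    // equals `wanted` -- roughly what a server-side equality
    // ValueFilter does for each cell it examines.
    static Map<String, String> filterByValue(Map<String, String> row, String wanted) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (e.getValue().equals(wanted)) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("info:regioninfo", "2");
        row.put("info:server", "host1");
        System.out.println(filterByValue(row, "2"));
    }
}
```

Later shell versions also accept filters as strings, e.g. scan 't1', {FILTER => "ValueFilter(=, 'binary:2')"} — check help 'scan' in your version to confirm the syntax it supports.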



Pagination of HBase Scan output

2016-01-05 Thread Sreeram Venkatasubramanian
Hi,

I am looking for options to batch the output of an HBase scan with a prefix
filter, so that it can be paginated at the front end.

Please let me know if there are recommended methods to do the same.

Thank you.

Sreeram
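[Editor's note] One common approach, assuming pages follow row-key order: run a bounded prefix scan with a page-size limit, remember the last row key returned, and start the next page just after it (the smallest key after k is k followed by a zero byte). A self-contained Java sketch over a sorted in-memory map standing in for the table — `pageAfter` is a made-up helper for illustration, not an HBase API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class PageScan {
    // Return up to `pageSize` keys sharing `prefix` that sort strictly
    // after `lastKey` (pass null for the first page). The '\0' suffix
    // makes the resume point exclusive of the already-seen key.
    static List<String> pageAfter(TreeMap<String, String> table, String prefix,
                                  String lastKey, int pageSize) {
        String start = (lastKey == null) ? prefix : lastKey + '\0';
        List<String> page = new ArrayList<>();
        for (String key : table.tailMap(start, true).keySet()) {
            if (!key.startsWith(prefix) || page.size() == pageSize) {
                break; // left the prefix range, or the page is full
            }
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        TreeMap<String, String> t = new TreeMap<>();
        t.put("sample1-a", "x");
        t.put("sample1-b", "y");
        t.put("sample1-c", "z");
        t.put("sample2-a", "w");
        System.out.println(pageAfter(t, "sample1", null, 2));        // first page
        System.out.println(pageAfter(t, "sample1", "sample1-b", 2)); // next page
    }
}
```

With the HBase client, the same loop translates to setting the scan's start row to the saved key plus a 0x00 byte, bounding the scan with a prefix stop row or PrefixFilter, and capping the number of rows fetched per page.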