Error when starting hbase shell
Hi,

I built HBase (the master branch) using Cygwin on my local machine, and I get the error below when starting the HBase shell:

NameError: cannot initialize Java class org.apache.hadoop.hbase.HColumnDescriptor
  get_proxy_or_package_under_package at org/jruby/javasupport/JavaUtilities.java:54
  method_missing at file:/C:/Users//.m2/repository/org/jruby/jruby-complete/1.6.8/jruby-complete-1.6.8.jar!/builtin/javasupport/java.rb:51
  HBaseConstants at d:/hbase-master/hbase-master/bin/../hbase-shell/src/main/ruby/hbase.rb:93
  (root) at d:/hbase-master/hbase-master/bin/../hbase-shell/src/main/ruby/hbase.rb:34
  require at org/jruby/RubyKernel.java:1062
  (root) at d:/hbase-master/hbase-master/bin/../bin/hirb.rb:118

The JDK/JRE version I have is 1.8.0_51. Any thoughts on what is going on here?

Sreeram
Re: Error when starting hbase shell
Just to add: I verified that hbase-client-2.0.0-SNAPSHOT.jar (which contains org.apache.hadoop.hbase.HColumnDescriptor) is on the HBase classpath.
Re: How to implement increment in an idempotent manner
The incremented field is more like an amount field that stores an aggregate amount. Since the field is incremented concurrently by multiple bolts running in parallel, storing the value before the increment and then doing a put on replay will not help.

The reason for this field is to pre-compute a certain aggregate amount and materialize it in the HBase table.

On Fri, Mar 18, 2016 at 3:30 PM, Jean-Marc Spaggiari wrote:
> At the beginning of your Storm bolt process, can you not do a put of "0", so it starts back from scratch? Otherwise you will need to query the value and keep it, so you can put it back if you need to replay your bolt.
>
> Another option: you increment a separate difference column, and at the end, if your bolt is successful, you increment the initial column with the new total counter.
>
> JMS
How to implement increment in an idempotent manner
Hi,

I am looking for suggestions from the community on implementing HBase increments in an idempotent manner.

My use case is a Storm HBase bolt that atomically increments an HBase counter. A replay of the Storm bolt results in a double increment.

Any suggestion on the approach to take is welcome. Thank you.

Regards,
Sreeram
Re: How to implement increment in an idempotent manner
All my HBase processing happens inside a single bolt, so I cannot use the second option.

On Fri, Mar 18, 2016 at 3:46 PM, Jean-Marc Spaggiari wrote:
> What about the other option, where each bolt increments its own column and at the end you aggregate those few columns together?
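One common pattern for the problem discussed in this thread is to make the increment idempotent by recording the id of the last applied event next to the counter, so a replayed tuple becomes a no-op. In HBase this could be done with a checkAndMutate on a marker column; the sketch below (class and method names are mine, not from the thread) only demonstrates the dedup logic itself in plain Java:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IdempotentCounter {
    private final Map<String, Long> counters = new HashMap<>();
    // Stands in for a per-row "applied event ids" marker column in HBase.
    private final Set<String> appliedEventIds = new HashSet<>();

    // Apply the increment only if this (row, eventId) pair has not been seen;
    // a replayed tuple with the same id is a no-op instead of a double increment.
    synchronized long increment(String row, String eventId, long delta) {
        if (appliedEventIds.add(row + "#" + eventId)) {
            counters.merge(row, delta, Long::sum);
        }
        return counters.getOrDefault(row, 0L);
    }

    public static void main(String[] args) {
        IdempotentCounter c = new IdempotentCounter();
        c.increment("acct-1", "evt-42", 100);
        System.out.println(c.increment("acct-1", "evt-42", 100)); // replay -> still 100
    }
}
```

The check and the increment must be atomic per row for this to hold under concurrency, which is why a single-row checkAndMutate (rather than a read-then-write) would be the natural HBase mapping.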
Re: Can not connect local java client to a remote Hbase
Hi Soufiani,

Can you try changing your configuration so the region server listens on 0.0.0.0:16020 and the master listens on 0.0.0.0:16000? 127.0.0.1 is the local loopback and will not be reachable from outside.

Regards,
Sreeram

On Fri, Apr 22, 2016 at 9:00 PM, SOUFIANI Mustapha | السفياني مصطفى wrote:
> Thanks Sachin for your help. I already checked this issue on the official Pentaho users forum and it seems to be OK for them too. I really don't know what the problem could be, but anyway, thanks again for your help.
> Regards.
>
> 2016-04-22 16:17 GMT+01:00 Sachin Mittal:
>> Your ports are open and your settings are fine. The issue seems to be elsewhere, but I am not sure where. Check with Pentaho maybe.
>>
>>> Maybe those ports are not open:
>>> hduser@big-services:~$ telnet localhost 16020
>>> Trying ::1...
>>> Trying 127.0.0.1...
>>> Connected to localhost.
>>> Escape character is '^]'.
Maximum limit on HBase cluster size
Dear All,

Looking forward to your views on the maximum size of an HBase cluster.

We are currently designing an HBase cluster, and one of the tables (designed in wide format) is expected to have roughly 6 billion rows in production after 3 years, with an additional 200 million rows added each month. In addition, we expect roughly 250 columns per row. The expected table data volume is around 250 TB at the end of 3 years (without considering HDFS replication), growing by 7 TB per month.

While we are provisioning the number of nodes based on the expected data volume, we wanted to check whether there are any limits on the number of rows per cluster.

Would it be advisable to split the cluster in such a situation into two or more independent clusters? Will there be any impact on read/write throughput or latency as the table grows over time?

Please advise.

Regards,
Sreeram
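For sizing questions like this, the region count the master and balancer must manage is usually a more relevant limit than the raw row count. Purely illustrative arithmetic from the numbers in the question (the 10 GB target region size is my assumption, not from the thread):

```java
public class ClusterSizing {
    public static void main(String[] args) {
        double tableTB = 250.0;          // table size at end of year 3, pre-replication (from the post)
        double growthTBPerMonth = 7.0;   // monthly growth (from the post)
        double regionGB = 10.0;          // assumed target region size (hbase.hregion.max.filesize)

        long regions = Math.round(tableTB * 1024 / regionGB);               // total regions for the table
        long newRegionsPerMonth = Math.round(growthTBPerMonth * 1024 / regionGB);

        System.out.println(regions + " regions total, ~" + newRegionsPerMonth + " new per month");
        // -> 25600 regions total, ~717 new per month
    }
}
```

Spread over a few dozen region servers, 25,600 regions lands well above the commonly recommended per-server region counts, which is the kind of number that motivates either larger regions or more nodes rather than a split into separate clusters.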
Re: Maximum limit on HBase cluster size
Hi Ted,

From the link: "Around 50-100 regions is a good number for a table with 1 or 2 column families. Remember that a region is a contiguous segment of a column family."

Is this number of 50-100 regions per table at the level of an individual region server, or for the entire cluster?

Thanks,
Sreeram

On Wed, Sep 7, 2016 at 4:18 PM, Ted Yu wrote:
> With a properly designed schema, you don't need to split the cluster.
>
> Please see: http://hbase.apache.org/book.html#schema
Viable approaches to fail over HBase cluster across data centers
Dear All,

Please let me know your thoughts on viable approaches to failing over an HBase cluster across data centers in case of a primary data center outage. The deployment has zero data loss as one of its primary design goals.

The deployment scenario is Active-Passive. If the active cluster goes down, there must be a zero-data-loss failover to the passive cluster. I understand that the built-in table-level replication using 'add_peer' might still lead to data loss, since it is asynchronous.

As a related note, is there a way to specify the location (e.g. a network drive) where the HBase WAL files in HDFS are written? The network drive has synchronous replication across data centers. If the WAL files can be written to the replicated network drive, can we recover in-flight data in the passive cluster and resume operations from there?

Regards,
Sreeram
Maximum size of HBase row
Hi All,

Please let me know whether the maximum size of an HBase row (in terms of storage space) is equal to the configured size of a region.

I understand the parameter hbase.table.max.rowsize to be the maximum number of bytes that can be transferred in a single get/scan operation, not a limit on the actual size of a row in HBase. Is my understanding correct? Kindly let me know.

Regards,
Sreeram
Question on WALEdit
Hi,

TL;DR: In my use case I set attributes on Puts and Deletes using setAttribute(). I would like to know whether it is possible to retrieve those attributes from the WALEdit.

Here is my use case in detail: I have a cluster A that is replicated to cluster B. On cluster B, I would like to track the events as they are written. I set an event-id as an attribute on each mutation in cluster A.

I will have a coprocessor in cluster B that is invoked on postWALWrite. If I can retrieve the event-id from the WALEdit, I will be able to track which events were replicated successfully to cluster B.

I went through the WALEdit API, and it is not obvious to me whether the attributes set on a row mutation can be retrieved.

Kindly let me know your suggestions.

Regards,
Sreeram
Re: Question on WALEdit
Thank you very much, Ted. I understand that fetching the tags will fetch the associated attributes of a mutation. Will try out the same.

Regards,
Sreeram

On 29 Jan 2017 00:37, "Ted Yu" wrote:

In CellUtil, there is the following method:

  public static Tag getTag(Cell cell, byte type)

In MobUtils, you can find sample usage:

  public static Tag getTableNameTag(Cell cell) {
    if (cell.getTagsLength() > 0) {
      return CellUtil.getTag(cell, TagType.MOB_TABLE_NAME_TAG_TYPE);

FYI

On Sat, Jan 28, 2017 at 8:29 AM, Ted Yu wrote:
> I haven't found the API you were looking for.
>
> Which release of hbase are you using? I assume it supports tags.
>
> If you use a tag to pass the event-id, you can retrieve it through this method of WALEdit:
>
>   public ArrayList getCells()
>
> From a Cell, there are 3 methods for retrieving tags, starting with:
>
>   byte[] getTagsArray();
>
> Cheers
Unable to get coprocessor debug logs in regionserver.
Hi,

I am writing a coprocessor for the postWALWrite event, but I do not see the coprocessor's log lines in the region server log. I can see that the coprocessor was loaded by the region server; the line below is from the RS log:

2017-03-20 18:59:17,132 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: System coprocessor Test.TestWALEditCP was loaded successfully with priority (536870912).

Any thoughts on what could be going wrong?

Thanks,
Sreeram

PS: My code is below.

package Test;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.coprocessor.BaseWALObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.WALCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.wal.WALKey;

public class TestWALEditCP extends BaseWALObserver {

    // Log under this coprocessor's own class (not FSHLog.class), so that the
    // log4j level configured for Test.TestWALEditCP applies to these messages.
    public static final Log LOG = LogFactory.getLog(TestWALEditCP.class);

    @Override
    public void postWALWrite(ObserverContext<? extends WALCoprocessorEnvironment> ctx,
            HRegionInfo info, WALKey logKey, WALEdit logEdit) {
        LOG.info("Post WAL edit is being triggered"); // <-- this line does not appear in the RS log
    }
}
Question in WALEdit
Hi,

I have the questions below on WALEdit. Looking forward to answers from the community.

a) I understand that all Cells in a given WALEdit form part of a single transaction. Since HBase atomicity is at the row level, this implies that all Cells in a given WALEdit have the same row key. Is this understanding correct?

b) With MultiWAL, does the log sequence increase monotonically with transaction timestamp? Specifically, suppose there are two transactions against two different tables on a single region server, at times t0 and t1 (t0 < t1). With MultiWAL, will the postWALEdit() coprocessor event for transaction 0 be triggered before that of transaction 1?

Thanks,
Sreeram

PS: I use HBase version 1.2.0
Re: Question in WALEdit
I am sorry for the typo. For the second question I meant postWALWrite in WALObserver:

  default void postWALWrite(ObserverContext<? extends WALCoprocessorEnvironment> ctx,
      HRegionInfo info, WALKey logKey, WALEdit logEdit) throws IOException

Thanks

On 23 Mar 2017 02:37, "Ted Yu" wrote:
> Sreeram:
> For #2, did you mean this method?
>
>   default void postWALRestore(final ObserverContext<? extends RegionCoprocessorEnvironment> ctx,
>       HRegionInfo info, WALKey logKey, WALEdit logEdit) throws IOException {}
>
> On Wed, Mar 22, 2017 at 12:56 PM, Vladimir Rodionov wrote:
>> a) HBase does not support transactions - it only guarantees that a single mutation to a row key is atomic. A WALEdit can contain cells (mutations) from different rows (for example, when you call batchMutate, all operations go into the same WALEdit, afaik).
>> b) I could not find postWALEdit() in the RegionObserver API. Which coprocessor hook did you mean exactly?
>>
>> -Vlad
RFC: Hash prefix considerations for HBase row key design when storing time series data
Hi,

I have put down some thoughts on designing hash prefixes for HBase row keys at the link below:

https://tmblr.co/Z4Ek8e2KYFB0h

I request the community to kindly take a look and share your comments.

TL;DR version: in HBase tables where data is stored as a time series (e.g. by day or month) with the row key prefixed by a hash, it might be beneficial to limit the number of hash bits to avoid hurting write performance.

Thanks,
Sreeram
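The idea of limiting hash bits can be sketched as follows (a hypothetical helper, not from the linked post; MD5 and the hex formatting are my choices): keeping only the top few bits of a hash bounds the number of distinct prefixes, so a time-range scan fans out over at most 2^bits buckets while writes are still spread across region servers.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SaltedKey {

    // Prepend a small hash prefix (bits <= 8 in this sketch, so at most 256
    // buckets) to spread writes while keeping scan fan-out bounded.
    static String saltedKey(String key, int bits) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            int prefix = (md5[0] & 0xFF) >>> (8 - bits); // keep only the top `bits` bits
            return String.format("%02x-%s", prefix, key);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // With bits = 4 the bucket is in [0, 16), so a day's data lives in
        // at most 16 contiguous key ranges.
        System.out.println(saltedKey("20160907#device42", 4));
    }
}
```

Because the prefix is derived deterministically from the key, point reads need no extra lookup; only range scans pay the 2^bits fan-out cost.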
Balancing a table's regions across region servers
Hi,

I have a requirement where, for one specific table, each region server always needs to host at least one region; this table has more regions than the number of region servers.

Based on HBASE-3373, can I assume that this is taken care of automatically by the HBase balancer?

The version of HBase that I use is HBase 1.2.0-cdh5.8.2.

Kindly let me know. Thank you.

-Sreeram
Re: Balancing a table's regions across region servers
Thank you, Ted, for the reply.

Can the parameter hbase.master.balancer.stochastic.tableSkewCost be set at the table level (for a very small table)?

-Sreeram

On Tue, Apr 18, 2017 at 3:12 PM, Ted Yu wrote:
> If you look at the patch for HBASE-3373, you will see that there is a config to enable per-table balancing. This was developed before StochasticLoadBalancer became the default balancer.
>
> In StochasticLoadBalancer, you need to increase the weight for hbase.master.balancer.stochastic.tableSkewCost. The default weight is 35; consider increasing it to the 500 range.
>
> FYI
ValueFilter returning earlier values
Hi,

When I scan with a ValueFilter on a column, I see that it also returns older versions if they happen to match the value in the ValueFilter.

The table's column family has the property VERSIONS set to 1, and I set setMaxVersions(1) on the Scan object. I was expecting the ValueFilter to return only the latest value of the column, provided it matches the filter.

Is this the expected behaviour of ValueFilter? Any suggestions on options I should set to keep the older values out of the result?

Thank you.

Regards,
Sreeram
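This behaviour follows from evaluation order: in an HBase scan, filters run per cell before version counting, so when the ValueFilter skips the newest version, an older matching version survives and counts as "version 1" (VERSIONS=1 only physically removes old cells at compaction). A minimal simulation of that ordering, in plain Java rather than the HBase API:

```java
import java.util.ArrayList;
import java.util.List;

public class FilterOrderDemo {

    // Simulates scan evaluation order: the value predicate runs per version
    // BEFORE maxVersions counting, so an older version that matches can be
    // returned even with maxVersions = 1.
    static List<String> scan(List<String> versionsNewestFirst, String match, int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String v : versionsNewestFirst) {    // versions arrive newest first
            if (!v.equals(match)) continue;       // filter step (per cell)
            out.add(v);                           // version counting sees only surviving cells
            if (out.size() >= maxVersions) break;
        }
        return out;
    }

    public static void main(String[] args) {
        // Newest value is "B", an older version is "A"; filtering on "A"
        // still returns the stale cell.
        System.out.println(scan(List.of("B", "A"), "A", 1)); // -> [A]
    }
}
```

A workaround consistent with this model is to compare against the newest version only, e.g. by forcing a major compaction so older versions are physically gone, or by filtering client-side after a plain latest-version read.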
API to get HBase replication status
Hi,

I am trying to find out whether the HBase shell commands that report replication status are based on an underlying API. Specifically, I am trying to fetch the last shipped timestamp and the replication lag per region server.

ReplicationAdmin does not seem to provide this information (or maybe it is just not obvious to me).

The version of HBase that I use is 1.2.0-cdh5.8.2. Any help in this regard?

Thanks,
Sreeram
How to improve HBase replication throughput between data centers ?
Hi All,

I have set up HBase replication between two clusters of 25 nodes each. The inter-data-center network link has a capacity of 500 Mbps. I have been running some tests to understand the replication speed, and I observe that it does not exceed 5 Mbps.

Reading up on this, I understand that the speed of data transfer depends on the OS-level TCP socket read and write buffer sizes. Below are the OS-level socket parameters I see:

# cat /proc/sys/net/ipv4/tcp_wmem
4096 (min)  16384 (default)  4194304 (max)

# cat /proc/sys/net/ipv4/tcp_rmem
4096 (min)  87380 (default)  6291456 (max)

So the default write buffer size for sockets is 16 KB and the default read buffer size is around 85 KB. There are suggestions [1] to set higher defaults for the read and write buffers in order to fully utilize the link capacity, but I am not sure how to make HBase use larger socket read/write buffers for replication.

Any thoughts from the community on this?

Thanks,
Sreeram

[1] http://www.onlamp.com/pub/a/onlamp/2005/11/17/tcp_tuning.html
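For a long-fat link, the usual sizing rule is buffer ≈ bandwidth × round-trip time: assuming, say, a 50 ms inter-DC RTT, 500 Mbps × 0.05 s ≈ 3 MB per connection, far above the 16 KB default write buffer shown above. A sketch of the corresponding sysctl settings, with illustrative values (not tuned for this specific deployment) in the spirit of [1]:

```
# /etc/sysctl.conf -- illustrative values for a high-latency 500 Mbps link
# min / default / max; the default is what an application gets unless it
# calls setsockopt(SO_SNDBUF/SO_RCVBUF) itself.
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
```

Whether this helps depends on the JVM/Hadoop RPC layer not pinning its own smaller buffer via setsockopt; raising only the max (and relying on Linux auto-tuning of the default) is the less invasive first step.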
Slow HBase write across data center
Hi,

I am seeing very slow write speeds when writing to an HBase cluster in a different data center. The network round trip takes 1 ms, yet 100 Puts (of 500 KB each) take over 2 seconds on average.

Are there any network or OS parameters I should check? I would appreciate any inputs from the community on this.

Thanks,
Sreeram
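A back-of-the-envelope check on the numbers above (assuming the Puts are sequential over a single connection, which the post does not state): 20 ms per 500 KB Put is ~25 MB/s, and a single TCP stream with the stock 16 KB send buffer tops out near window/RTT, the same order of magnitude. That points at socket buffer sizing rather than the 1 ms RTT itself.

```java
public class PutThroughputCheck {
    public static void main(String[] args) {
        // Observed: 100 sequential Puts of 500 KB each in ~2 s
        double observedMBps = (100 * 500e3) / 2.0 / 1e6;   // 25 MB/s, i.e. ~20 ms per Put
        // Rough single-stream cap with a 16 KB send window at a 1 ms RTT
        // (ignores window scaling and kernel buffer auto-tuning):
        double windowCapMBps = 16 * 1024 / 0.001 / 1e6;    // ~16 MB/s
        System.out.printf("observed ~%.1f MB/s vs ~%.1f MB/s window-limited cap%n",
                observedMBps, windowCapMBps);
    }
}
```

If the two figures were orders of magnitude apart instead, the bottleneck would more likely be server-side (WAL sync, region hotspotting) than the network path.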
HBase- Scan with wildcard character
Hi,

I have a table defined with 3 columns. I am looking for a query in the HBase shell that prints all values whose row key starts with certain characters.

Example - my row IDs are (column + key):

4E11676773AC3B6E9A3FE1CCD1051B8C&1323736118749497  column=xx:size, timestamp=67667767, value=
4E11676773AC3B6E9A3FE1CCD1051B8C&132373611874988   column=11x:size, timestamp=67667767, value=
4E11676773AC3B6E9A3FE1CCD1051B8C&132373611565656   column=1xx:size, timestamp=67667767, value=

Something similar to MySQL: select * from table where id='%4E1167677%'

Do we have any command like this in the HBase shell - a scan with wildcard characters? Or should we end up using Hive? What are the other options?

Can you please let me know.

-Sreeram
Re: HBase- Scan with wildcard character
Thanks Lars, I will look into that.

One more question, on the HBase shell. If I have:

  hbase> scan 't1.', {COLUMNS => 'info:regioninfo'}

it prints all the columns of regioninfo. Can I have a condition like: if column info:regioninfo = 2 (a value), then print all the associated columns, like info:regioninfo1, regioninfo2?

----- Original Message -----
From: lars hofhansl
Sent: Monday, December 12, 2011 10:45 PM
Subject: Re: HBase- Scan with wildcard character

First off, what you want is: select * from table where id like '4E1167677%' in MySQL. Relational databases can typically use indexes to satisfy like "xxx%" type queries, but not "%xxx%" queries.

HBase is really good at "xxx%" (prefix) type queries. Just create a scan object, set the start key to "4E1167677", then call next on the resulting scanner until the returned key no longer starts with "4E1167677".

In your particular case (since your keys are hex numbers), you can even set the stop key to "4E1167677z" (the z will sort after any valid hex digit), and the scanner will automatically stop at the last possible match.

Have a look at the Scan object and HTable.getScanner(...).

-- Lars
Re: HBase- Scan with wildcard character
Thanks Lars. I am looking into that.

Is there a way to search for all the entries starting with 565HGOUO and print all those rows? Example:

  scan 'SAMPLE_TABLE', {COLUMNS => ['sample_info:FILENAME','event_info:FILENAME'], STARTROW => 'sample1%'}

With this I see all the rows and information from the sample1% row onward; if, for instance, there is an extra1rowid row after sample1%, I see that too. I am looking for a query that prints only the rows whose row ID starts with sample1.

Can you let me know if we can get a query like that in the HBase shell?

----- Original Message -----
From: lars hofhansl
Sent: Tuesday, December 13, 2011 11:36 AM
Subject: Re: HBase- Scan with wildcard character

info:regioninfo is actually a serialized Java object (HRegionInfo). What you see in the shell is the result of HRegionInfo.toString(), which looks like a ruby object, but it is really just a string (see HRegionInfo.toString()).
Re: HBase- Scan with wildcard character
Thanks Doug. I am looking at this more from the HBase shell side.

----- Original Message -----
From: Doug Meil
Sent: Tuesday, December 13, 2011 2:01 PM
Subject: Re: HBase- Scan with wildcard character

Hi there-

At some point you're probably going to want to get out of the shell; take a look at this:

http://hbase.apache.org/book.html#scan
Re: HBase- Scan with wildcard character
Thank you Lars. STOPROW did work in my hbase shell as you suggested.

- Original Message -
From: lars hofhansl
To: "user@hbase.apache.org"; Sreeram K
Sent: Tuesday, December 13, 2011 3:56 PM
Subject: Re: HBase- Scan with wildcard character

The shell only lets you do that much. HBase does not support the % wildcard; it just happens to work in your case because % has a low ascii code.

You set the startRow of the scan. It does not need to exist, but the value must sort before the rows you are looking for and after all rows before them. Same for the stopRow: it does not need to exist, but it must sort after the rows you are looking for and before all rows you do not want to see.

Try setting STARTROW to "sample1" and STOPROW to "sample1\255". That will work as long as ascii 255 is not used in your row keys.

-- Lars

____
From: Sreeram K
To: "user@hbase.apache.org"; lars hofhansl
Sent: Tuesday, December 13, 2011 2:16 PM
Subject: Re: HBase- Scan with wildcard character

Thanks Doug. I am looking more from the HBase shell for this.

- Original Message -
From: Doug Meil
To: "user@hbase.apache.org"; Sreeram K; lars hofhansl
Sent: Tuesday, December 13, 2011 2:01 PM
Subject: Re: HBase- Scan with wildcard character

Hi there- At some point you're probably going to want to get out of the shell; take a look at this:
http://hbase.apache.org/book.html#scan

On 12/13/11 4:43 PM, "Sreeram K" wrote:
> Thanks Lars. I am looking into that.
>
> Is there a way we can search all the entries starting with 565HGOUO and print all the rows?
>
> Example:
> scan 'SAMPLE_TABLE', {COLUMNS => ['sample_info:FILENAME','event_info:FILENAME'], STARTROW => 'sample1%'}
>
> I am seeing all the rows and information after that sample1% row in the DB. If for instance I have extra1rowid after sample1%, I am able to see that also.
>
> I am looking for a query to print only the rows whose row id starts with sample1%.
>
> Can you let me know if we can get a query like that in the hbase shell.
>
> - Original Message -
> From: lars hofhansl
> To: "user@hbase.apache.org"; Sreeram K
> Sent: Tuesday, December 13, 2011 11:36 AM
> Subject: Re: HBase- Scan with wildcard character
>
> info:regioninfo is actually a serialized Java object (HRegionInfo). What you see in the shell is the result of HRegionInfo.toString(), which looks like a ruby object, but it is really just a string (see HRegionInfo.toString()).
>
> From: Sreeram K
> To: "user@hbase.apache.org"; lars hofhansl
> Sent: Tuesday, December 13, 2011 12:16 AM
> Subject: Re: HBase- Scan with wildcard character
>
> Thanks Lars, I will look into that.
>
> One more question on the hbase shell. If I have:
>   hbase> scan 't1.', {COLUMNS => 'info:regioninfo'}
> it prints all the columns of regioninfo.
>
> Can I have a condition like: if column info:regioninfo = 2 (value) then print all the associated columns, like info:regioninfo1, regioninfo2?
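Lars's STARTROW/STOPROW advice boils down to the half-open range [STARTROW, STOPROW) that a scan covers over byte-sorted keys. The Ruby sketch below simulates that bounding; `bounded_scan` is a hypothetical helper for illustration, not shell syntax or the HBase client API.

```ruby
# A scan returns keys k with startrow <= k < stoprow (a half-open range),
# so a stoprow that sorts just past the prefix ends the scan at the last
# possible match. (Hypothetical sketch, not the HBase API.)
def bounded_scan(sorted_keys, startrow, stoprow)
  sorted_keys.select { |k| k >= startrow && k < stoprow }
end

keys = ["extra1rowid", "sample1%", "sample1a", "sample2", "zzz"].sort

# "sample1\xFF" mirrors Lars's STOPROW => "sample1\255" suggestion;
# .b forces a byte-wise (binary) comparison.
hits = bounded_scan(keys, "sample1", "sample1\xFF".b)
```

Neither bound has to be an existing row key; they only have to sort on the correct side of the rows you want, which is exactly why appending a byte like \255 works as a prefix terminator.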
Re: HBase- Scan with wildcard character
I have one more question: can we have a query in the HBase shell based on a column value? I am looking at scan with a column ID - is that possible, the way we are doing it with STARTROW? Can you please point me to an example.
Re: HBase- Scan with wildcard character
Thanks for the reply. But that is from Java; I am looking for it from the HBase shell.

- Original Message -
From: Stack
To: user@hbase.apache.org; Sreeram K
Sent: Thursday, December 15, 2011 10:10 AM
Subject: Re: HBase- Scan with wildcard character

On Thu, Dec 15, 2011 at 8:59 AM, Sreeram K wrote:
> I have one more question: can we have a query in the HBase shell based on a column value?
>
> I am looking at scan with a column ID - is that possible, the way we are doing it with STARTROW? Can you please point me to an example.

You need to use a value filter:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ValueFilter.html

St.Ack
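What a ValueFilter does can be mimicked in a few lines: it keeps only the cells whose value passes a comparison, dropping everything else from the scan result. A Ruby sketch of that semantics over an in-memory table follows; `value_filter` is a hypothetical helper, not the HBase client API.

```ruby
# Simulate a ValueFilter: keep only cells whose value equals a target.
# (Illustrative sketch of the filter's semantics, not the HBase client API.)
def value_filter(table, target)
  table.each_with_object({}) do |(row, cells), out|
    kept = cells.select { |_col, val| val == target }
    out[row] = kept unless kept.empty?   # rows with no matching cell drop out
  end
end

table = {
  "row1" => { "info:regioninfo" => "2", "info:server" => "hostA" },
  "row2" => { "info:regioninfo" => "5", "info:server" => "hostB" },
}

hits = value_filter(table, "2")
```

For the shell itself, later HBase releases accept filter strings directly, along the lines of scan 't1', {FILTER => "ValueFilter(=, 'binary:2')"}; check the filter-language support in your shell version, since older releases may only expose filters through the Java API.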
Pagination of HBase Scan output
Hi,

I am looking for options to batch the output of an HBase scan with a prefix filter, so that it can be paginated at the front end. Please let me know if there are recommended methods to do the same.

Thank you.

Sreeram
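One common approach (a sketch, not an official recipe): fetch page-sized batches and restart the next scan from the last row key returned, appending a zero byte so the scan resumes strictly after it. The Ruby simulation below runs over a sorted key list; `next_page` is a hypothetical helper, not the HBase client API.

```ruby
# Paginate a prefix scan by remembering the last row key of each page and
# resuming the next scan just past it: last_key + "\x00" is the smallest
# key that sorts strictly after last_key. (Hypothetical sketch, not the
# HBase client API.)
def next_page(sorted_keys, prefix, page_size, after_key = nil)
  start = after_key ? after_key + "\x00" : prefix
  sorted_keys.drop_while { |k| k < start }
             .take_while { |k| k.start_with?(prefix) }
             .first(page_size)
end

keys  = (1..7).map { |i| format("user%03d", i) }  # user001 .. user007
page1 = next_page(keys, "user", 3)                # first page of 3
page2 = next_page(keys, "user", 3, page1.last)    # resume after page1
```

HBase also ships a PageFilter, but as I understand it the limit is applied per region server, so client-side resuming from the last row key as above is the usual way to get stable cross-page boundaries.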