Re: Version in HBase
So you want one version with ts <= given ts?

Have a look at Scan#setTimeRange(long minStamp, long maxStamp).
If you know the exact ts for the cells, you can use Scan#setTimeStamp(long timestamp).

-Anoop-

On Wed, Nov 12, 2014 at 11:17 AM, Krishna Kalyan krishnakaly...@gmail.com wrote:

For example, for table 'test_table', the values inserted are:

Row1 - Val1 = t
Row1 - Val2 = t + 3
Row1 - Val3 = t + 5
Row2 - Val1 = t
Row2 - Val2 = t + 3
Row2 - Val3 = t + 5

A scan on 'test_table' where version <= t + 4 should return:

Row1 - Val2 = t + 3
Row2 - Val2 = t + 3

How do I achieve timestamp-based scans?

Thanks and Regards,
Krishna

On Wed, Nov 12, 2014 at 10:56 AM, Krishna Kalyan krishnakaly...@gmail.com wrote:

Hi,
Is it possible to do a
select * from table_name where version = somedate;
using HBase APIs? (Scanning for values where version = somedate.)
Could you please direct me to appropriate links to achieve this?

Regards,
Krishna
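To make the time-range suggestion concrete, here is a minimal sketch in plain Java that models the selection rule rather than calling HBase. It assumes the range behaves as half-open, [minStamp, maxStamp), and that only the newest matching version per cell is returned; the class and method names (TimeRangeModel, newestInRange, exampleRow) and the base time t = 1000 are made up for illustration. In a real table the column family would also have to retain more than one version for the older cells to exist at all.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java model of a timestamp-bounded read; it does not talk to HBase.
final class TimeRangeModel {

    // Returns the newest value whose timestamp lies in [minStamp, maxStamp),
    // or null when no version falls inside that half-open range.
    static String newestInRange(NavigableMap<Long, String> versions, long minStamp, long maxStamp) {
        Map.Entry<Long, String> e = versions.lowerEntry(maxStamp); // strictly below maxStamp
        if (e == null || e.getKey() < minStamp) {
            return null;
        }
        return e.getValue();
    }

    // Builds the three versions from the question for a hypothetical base time t.
    static NavigableMap<Long, String> exampleRow(long t) {
        NavigableMap<Long, String> versions = new TreeMap<>();
        versions.put(t, "Val1");
        versions.put(t + 3, "Val2");
        versions.put(t + 5, "Val3");
        return versions;
    }

    public static void main(String[] args) {
        long t = 1000L; // hypothetical base timestamp
        NavigableMap<Long, String> row = exampleRow(t);
        // "ts <= t + 4" is expressed as the half-open range [0, t + 4 + 1).
        System.out.println(newestInRange(row, 0L, t + 4 + 1)); // prints Val2
    }
}
```

Because the upper bound is exclusive in this model, asking for "at or before t + 4" means passing t + 4 + 1 as the maximum.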
Re: Is it possible that HBase update performance is much better than read in YCSB test?
Try this HBase YCSB client instead: https://github.com/apurtell/ycsb/tree/new_hbase_client

The HBase YCSB driver in the master repo holds on to one HTable instance per driver thread. We accumulate writes into a 12MB write buffer before flushing them en masse. This is why the behavior you are seeing confounds your expectations. It's not correct behavior IMHO. YCSB wants to measure the round trip of every op, not the non-cost of local caching. Worse, if we have a lot of driver threads accumulating 12MB of edits more or less at the same rate, then we will flush these buffers more or less at the same time and stampede the cluster, which leads to deep valleys in observed write performance of 30-60 seconds or longer.

On Tue, Nov 11, 2014 at 8:40 PM, Liu, Ming (HPIT-GADSC) ming.l...@hp.com wrote:

Hi, all,

I am trying to use YCSB to test our HBase 0.98.5 instance and got a strange result: update is 6x better than read. It is just an exercise, so HBase is running on a workstation in standalone mode. I modified the workloada shipped with YCSB into two new workloads, workloadr and workloadu, where workloadr does 100% read operations and workloadu does 100% update operations. The workloadr and workloadu config files are at the bottom for your reference.

I found that the read performance is much worse than the update performance. Read is about 6000:

YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadr -p columnfamily=family -s -t
[OVERALL], RunTime(ms), 16565.0
[OVERALL], Throughput(ops/sec), 6036.824630244491

And the update performance is about 36000, 6x better than read:

YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadu -p columnfamily=family -s -t
[OVERALL], RunTime(ms), 2767.0
[OVERALL], Throughput(ops/sec), 36140.22406938923

Is this possible? IMHO, read should be faster than update. Maybe I made a mistake in the workload file? Or is there a possibility that update is faster than read?

I couldn't find a YCSB mailing list; if anyone knows of one, please give me a link so I can also ask there. But is it possible that put is faster than get in HBase? If not, the result must be wrong and I need to debug the YCSB code to figure out what is going wrong.

workloadr:
recordcount=10
operationcount=10
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian

workloadu:
recordcount=10
operationcount=10
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=1
scanproportion=0
insertproportion=0
requestdistribution=zipfian

Thanks,
Ming

--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
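The write-buffer explanation above can be sketched as a small plain-Java model. It does not use the HBase client; it only counts how many puts would actually reach the server if edits are held locally until a size threshold is crossed. The 12MB threshold comes from the thread, while the 1KB edit size, the 100,000 operation count, and the names (WriteBufferModel, put, roundTrips) are assumptions for illustration, and the flush rule is deliberately simplified.

```java
// Plain-Java model of a client-side write buffer; not HBase client code.
final class WriteBufferModel {
    private final long flushThresholdBytes;
    private long bufferedBytes = 0;
    private int roundTrips = 0;

    WriteBufferModel(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    // Returns true when this put triggered a flush (a server round trip),
    // false when the edit was only appended to the local buffer.
    boolean put(int editSizeBytes) {
        bufferedBytes += editSizeBytes;
        if (bufferedBytes >= flushThresholdBytes) {
            bufferedBytes = 0;
            roundTrips++;
            return true;
        }
        return false;
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        WriteBufferModel buffer = new WriteBufferModel(12L * 1024 * 1024); // 12MB, as in the thread
        int ops = 100_000;                                                 // assumed operation count
        for (int i = 0; i < ops; i++) {
            buffer.put(1024);                                              // assumed 1KB per edit
        }
        System.out.println(ops + " puts, " + buffer.roundTrips() + " flushes");
    }
}
```

Under these assumptions almost every put returns without touching the server, which is why a per-operation latency measured around such a client says little about the real write path.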
Programmatic HBase version detection/extraction
Hi, Is there a way to detect which version of HBase one is running? Is there an API for that, or a constant with this value, or maybe an MBean or some other way to get to this info? Thanks, Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/
Re: Programmatic HBase version detection/extraction
Using hbase shell:

hbase(main):002:0> status 'detailed'
version 0.98.4.2-hadoop2

Cheers
Re: Programmatic HBase version detection/extraction
Yes, you can use the org.apache.hadoop.hbase.util.VersionInfo class. From Java code, you can use VersionInfo.getVersion(). From shell scripts, you can just run hbase version and parse the output.
Re: Programmatic HBase version detection/extraction
Otis:
You can parse the output from the status 'detailed' command: look for the line starting with 'version'.

I checked the output from /jmx but didn't find such information there. The version would appear in the classpath, but that's not easy to parse.

One note about the hbase version command: it returns the version of HBase the client was built with, not the version of the cluster the client is talking to.

Cheers

On Wed, Nov 12, 2014 at 1:49 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Ted,

Thanks, but I'm looking for something I can grab programmatically (not manually), for example from a Java app. Maybe there is some API that exposes this information, or an MBean?

Here's the use case: SPM monitors HBase (http://sematext.com/spm/), but HBase MBeans and metrics have changed over time. How will the SPM agent know which MBeans to look for, which metrics to extract, and how to interpret the values it extracts without knowing which version of HBase it's monitoring? It could try probing for some known MBeans and deduce the HBase version from that, but that feels a little sloppy. Ideally, we'd be able to grab the version from some MBean and, based on that, extract the metrics we know are exposed in that version of HBase.

Thanks,
Otis
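For the "look for the line starting with 'version'" approach, a minimal sketch of the parsing step in plain Java might look like the following. Only the version line itself is taken from the thread; the other line in the sample text and the names (StatusVersionParser, parseVersion) are hypothetical, and how the shell output is captured is left out.

```java
import java.util.Optional;

// Extracts the version token from captured `status 'detailed'` output.
final class StatusVersionParser {

    // Returns the text after "version " on the first line that starts with it,
    // or an empty Optional when no such line exists.
    static Optional<String> parseVersion(String statusOutput) {
        for (String line : statusOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("version ")) {
                return Optional.of(trimmed.substring("version ".length()).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The second line is made-up filler standing in for the rest of the output.
        String sample = "version 0.98.4.2-hadoop2\nsome other status line\n";
        System.out.println(parseVersion(sample).orElse("unknown"));
    }
}
```

Returning an Optional keeps the "no version line found" case explicit, so a monitoring agent can fall back to another detection method instead of failing.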
Re: Call for Presentations - HBase User group meeting
Just popping this back to the top, we are still looking for people to present at the HBase User Group Meetup in 2 weeks: http://www.meetup.com/hbaseusergroup/events/205219992/ As always, food and beverages are being provided. Come and hear about the cool goings on in HBase land, and possibly even present a few of your own! -ryan On Mon, Nov 10, 2014 at 2:58 PM, Ryan Rawson ryano...@gmail.com wrote: Hi all, The next HBase user group meeting is on November the 20th. We need a few more presenters still! Please send me your proposals - summary and outline of your talk! Thanks! -ryan
Re: Programmatic HBase version detection/extraction
Java-wise, you can use this API in HBaseAdmin:

ClusterStatus getClusterStatus() throws IOException;

ClusterStatus provides:

public String getHBaseVersion()

Cheers
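Once a version string is in hand, whichever way it was obtained, the monitoring use case described in this thread still needs to compare it against known releases. The following is a rough sketch in plain Java, assuming versions look like dotted numbers with an optional "-suffix" (as in 0.98.4.2-hadoop2). The names (VersionCompare, numericParts, compare) and the 0.96 threshold in main are hypothetical and not taken from HBase.

```java
// Compares dotted version strings numerically, ignoring any "-suffix".
final class VersionCompare {

    // "0.98.4.2-hadoop2" -> {0, 98, 4, 2}; assumes every dotted part is a number.
    static int[] numericParts(String version) {
        int dash = version.indexOf('-');
        String core = dash >= 0 ? version.substring(0, dash) : version;
        String[] tokens = core.split("\\.");
        int[] parts = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            parts[i] = Integer.parseInt(tokens[i]);
        }
        return parts;
    }

    // Negative, zero, or positive, like Comparator#compare; missing parts count as 0.
    static int compare(String a, String b) {
        int[] pa = numericParts(a);
        int[] pb = numericParts(b);
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? pa[i] : 0;
            int y = i < pb.length ? pb[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String clusterVersion = "0.98.4.2-hadoop2"; // value shown earlier in the thread
        // The 0.96 cut-off is only an illustrative threshold for switching metric sets.
        boolean newerMetricSet = compare(clusterVersion, "0.96") >= 0;
        System.out.println(newerMetricSet);
    }
}
```

A plain string comparison would sort 0.98 after 0.100, which is why the parts are compared as integers here.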
Re: Programmatic HBase version detection/extraction
Hi,

Thanks Gary, I think this is exactly what I was after! By the way, it might be nice to expose this via JMX, too, for apps that need this info but are not in process.

Otis
Trying to connect HBase Java Client I get: Failed to locate the winutils binary in the hadoop binary path
Hi,

I'm creating my first HBase application and I'm trying to connect from a Java application in my IDE to my HBase server on a Hortonworks 2.1 virtual machine. When I run it I get:

Failed to locate the winutils binary in the hadoop binary path

Does this mean that I have to have Hadoop installed on my laptop to be able to test connections to HBase?

Regards,
Néstor
Re: Trying to connect HBase Java Client I get: Failed to locate the winutils binary in the hadoop binary path
Cycling bits: http://search-hadoop.com/m/DHED4y3J2B
Re: Trying to connect HBase Java Client I get: Failed to locate the winutils binary in the hadoop binary path
Yes, I already applied that. I just wanted to understand whether, if I have a web application, I'll need to have the Hadoop distribution installed to use the HBase client.

Regards,
Néstor
RE: Is it possible that HBase update performance is much better than read in YCSB test?
Thank you Andrew, this is an excellent answer, I get it now. I will try your hbase client for a 'fair' test :-)

Best Regards,
Ming
Re: Is it possible that HBase update performance is much better than read in YCSB test?
Thanks Andrew. This is very useful information, along with the GitHub link.

Regards
Ram
Storing JSON in HBase value cell, which serialization format is most compact?
Hi,

I'm currently saving JSON in plain String format in the value cell and depend on HBase's block compression to reduce the overhead of JSON.

I'm wondering if there's a more space-efficient way to store JSON? (There are lots of 0s and 1s, so a JSON String is actually an OK format.) I want to keep the value as a Map, since the schema of the source data might change over time.

Also, is there a DIFF-based encoding for values? I'm storing historical data (snapshot data), and changes between adjacent value cells are relatively small.

Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
Re: Storing JSON in HBase value cell, which serialization format is most compact?
There is FASTDIFF data block encoding.

See also http://bjson.org/

Cheers
Re: Storing JSON in HBase value cell, which serialization format is most compact?
I thought FASTDIFF was only for rowkeys and columns; great if it also works on the value cell. And thanks for the bjson link!

Jianshi
Re: Storing JSON in HBase value cell, which serialization format is most compact?
Hi,

"Since I'm storing historical data (snapshot data) and changes between adjacent value cells are relatively small."

If the values change at all, even slightly, FASTDIFF will rewrite the whole value part. Only when there is an exact match does it skip the value part. JFYI.

Regards
Ram
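The rule Ram describes can be illustrated with a deliberately simplified model in plain Java: a value identical to the previous one costs no value bytes, and any other value is written in full, no matter how small the change. This is not the real on-disk encoding (flag bytes, key encoding, and block boundaries are all ignored), and the names (SameValueModel, encodedValueBytes) are made up for the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified "skip only exact repeats" model of value storage; not the real encoder.
final class SameValueModel {

    // Sums the value bytes written when exact repeats of the previous value are skipped.
    static long encodedValueBytes(List<byte[]> values) {
        long total = 0;
        byte[] previous = null;
        for (byte[] value : values) {
            if (previous == null || !Arrays.equals(previous, value)) {
                total += value.length; // any difference means the full value is written
            }
            previous = value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<byte[]> snapshots = new ArrayList<>();
        snapshots.add("{\"a\":1,\"b\":0}".getBytes(StandardCharsets.UTF_8));
        snapshots.add("{\"a\":1,\"b\":0}".getBytes(StandardCharsets.UTF_8)); // exact repeat: skipped
        snapshots.add("{\"a\":1,\"b\":1}".getBytes(StandardCharsets.UTF_8)); // one byte differs: full rewrite
        System.out.println(encodedValueBytes(snapshots));
    }
}
```

In this model a one-character change in a JSON snapshot saves nothing on the value side, so for slowly changing snapshots the savings would have to come from block compression or from storing explicit deltas in the application.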