Re: Version in HBase

2014-11-12 Thread Anoop John
So you want one version with ts = a given ts?

Have a look at Scan#setTimeRange(long minStamp, long maxStamp).
If you know the exact ts of the cells, you can use
Scan#setTimeStamp(long timestamp).

-Anoop-
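In Krishna's example an "as of t + 4" read maps onto the time-range approach. The maxStamp passed to Scan#setTimeRange is exclusive, so a scan with setTimeRange(0, t + 5) and the default of one version per column should return the newest cell with ts <= t + 4. setTimeStamp(t + 4) would only return cells written at exactly t + 4. The sketch below is a standalone model of that selection rule in plain Java, not HBase code. The class and method names are hypothetical, and the base timestamp is arbitrary.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Standalone model (not HBase code) of how a cell version is picked for a read.
final class TimeRangeModel {
    // Models Scan#setTimeRange(minStamp, maxStamp) with one version per column:
    // only versions with minStamp <= ts < maxStamp are visible, and the newest wins.
    static String readInRange(NavigableMap<Long, String> versions, long minStamp, long maxStamp) {
        Map.Entry<Long, String> newest = versions.lowerEntry(maxStamp); // greatest ts < maxStamp
        return (newest != null && newest.getKey() >= minStamp) ? newest.getValue() : null;
    }

    // Models Scan#setTimeStamp(ts): only a version written at exactly ts is visible.
    static String readExact(NavigableMap<Long, String> versions, long ts) {
        return versions.get(ts);
    }

    public static void main(String[] args) {
        long t = 1_000L; // arbitrary base timestamp for the example
        NavigableMap<Long, String> row1 = new TreeMap<>();
        row1.put(t, "Val1");
        row1.put(t + 3, "Val2");
        row1.put(t + 5, "Val3");

        long asOf = t + 4;
        // maxStamp is exclusive, so "as of t + 4" means maxStamp = asOf + 1.
        System.out.println(readInRange(row1, 0L, asOf + 1)); // Val2
        System.out.println(readExact(row1, asOf));           // null: nothing written at exactly t + 4
    }
}
```

With a real Scan, the same cutoff would apply to every column of every row the scan visits. Keeping minStamp at 0 means "everything up to the cutoff".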

On Wed, Nov 12, 2014 at 11:17 AM, Krishna Kalyan krishnakaly...@gmail.com
wrote:

 For example, for table 'test_table', the values inserted are:

 Row1 - Val1 = t
 Row1 - Val2 = t + 3
 Row1 - Val3 = t + 5

 Row2 - Val1 = t
 Row2 - Val2 = t + 3
 Row2 - Val3 = t + 5

 A scan of 'test_table' where version = t + 4 should return:
 Row1 - Val2 = t + 3
 Row2 - Val2 = t + 3

 How do I achieve timestamp-based scans?

 Thanks and Regards,
 Krishna




 On Wed, Nov 12, 2014 at 10:56 AM, Krishna Kalyan krishnakaly...@gmail.com
 wrote:

  Hi,
  Is it possible to do a
  select * from table_name where version = somedate;
  using the HBase APIs? (i.e., scanning for values where version = somedate)
  Could you please point me to appropriate links to achieve this?
 
 
  Regards,
  Krishna
 
 
 



Re: Is it possible that HBase update performance is much better than read in YCSB test?

2014-11-12 Thread Andrew Purtell
Try this HBase YCSB client instead:
https://github.com/apurtell/ycsb/tree/new_hbase_client

The HBase YCSB driver in the master repo holds on to one HTable instance
per driver thread. We accumulate writes into a 12MB write buffer before
flushing them en masse. This is why the behavior you are seeing confounds
your expectations. It's not correct behavior IMHO. YCSB wants to measure
the round trip of every op, not the non-cost of local caching. Worse, if we
have a lot of driver threads accumulating 12MB of edits more or less at the
same rate, then we will flush these buffers more or less at the same time
and stampede the cluster, which leads to deep valleys in observed write
performance of 30-60 seconds or longer.
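Buffered updates can look several times faster than reads. To show why, here is a hypothetical model of the client-side write buffer Andrew describes. It is not the HBase or YCSB code. It assumes each update is about 1,000 bytes and counts how many puts actually pay for a round trip to the server with a 12MB buffer, compared with flushing every put. In the 0.98-era client, the related knobs on HTable are setAutoFlush(false), setWriteBufferSize(long) and flushCommits().

```java
// Hypothetical model of a client-side write buffer (not the HBase or YCSB API).
// A put only pays for a round trip when the buffered bytes reach the buffer size;
// otherwise it returns as soon as the edit has been appended locally.
final class WriteBufferModel {
    private final long bufferSizeBytes;
    private long bufferedBytes = 0;
    private int roundTrips = 0;

    WriteBufferModel(long bufferSizeBytes) {
        this.bufferSizeBytes = bufferSizeBytes;
    }

    void put(long editBytes) {
        bufferedBytes += editBytes;          // local append: no network cost
        if (bufferedBytes >= bufferSizeBytes) {
            flush();                         // only now is a round trip paid
        }
    }

    // Sends whatever is buffered, like an explicit flush or close would.
    void flush() {
        if (bufferedBytes > 0) {
            roundTrips++;
            bufferedBytes = 0;
        }
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        long editBytes = 1_000;                                              // assumed size of one update
        int updates = 100_000;                                               // hypothetical operation count
        WriteBufferModel buffered = new WriteBufferModel(12L * 1024 * 1024); // 12MB buffer
        WriteBufferModel unbuffered = new WriteBufferModel(1);               // every put flushes
        for (int i = 0; i < updates; i++) {
            buffered.put(editBytes);
            unbuffered.put(editBytes);
        }
        System.out.println(buffered.roundTrips());   // 7 (the rest is still buffered)
        System.out.println(unbuffered.roundTrips()); // 100000
    }
}
```

In this model the buffered client makes only 7 round trips for 100,000 updates. The last edits have not left the client when the loop ends, so a timer around the loop mostly measures local appends rather than server work.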



On Tue, Nov 11, 2014 at 8:40 PM, Liu, Ming (HPIT-GADSC) ming.l...@hp.com
wrote:

 Hi, all,

 I am using YCSB to test our HBase 0.98.5 instance and got a strange
 result: updates are 6x faster than reads. It is just an exercise, so
 HBase is running on a workstation in standalone mode.
 I modified the workloada shipped with YCSB into two new workloads,
 workloadr and workloadu: workloadr does 100% read operations and
 workloadu does 100% update operations. The workloadr and workloadu
 config files are at the bottom for your reference.

 I found that read performance is much worse than update
 performance; reads run at about 6000 ops/sec:

 YCSB Client 0.1
 Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadr -p
 columnfamily=family -s -t
 [OVERALL], RunTime(ms), 16565.0
 [OVERALL], Throughput(ops/sec), 6036.824630244491

 And update performance is about 36000 ops/sec, 6x better than read.

 YCSB Client 0.1
 Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadu -p
 columnfamily=family -s -t
 [OVERALL], RunTime(ms), 2767.0
 [OVERALL], Throughput(ops/sec), 36140.22406938923

 Is this possible? IMHO, read should be faster than update.
 Maybe my workload file is wrong? Or is it possible that update really
 is faster than read? I couldn't find a YCSB mailing list; if anyone
 knows of one, please give me a link so I can also ask the question there.
 But is it possible that put is faster than get in HBase? If not, the
 result must be wrong and I need to debug the YCSB code to figure out what
 is going wrong.

 workloadr:
 recordcount=10
 operationcount=10
 workload=com.yahoo.ycsb.workloads.CoreWorkload
 readallfields=true
 readproportion=1
 updateproportion=0
 scanproportion=0
 insertproportion=0
 requestdistribution=zipfian

 workloadu:
 recordcount=10
 operationcount=10
 workload=com.yahoo.ycsb.workloads.CoreWorkload
 readallfields=true
 readproportion=0
 updateproportion=1
 scanproportion=0
 insertproportion=0
 requestdistribution=zipfian


 Thanks,
 Ming




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Programmatic HBase version detection/extraction

2014-11-12 Thread Otis Gospodnetic
Hi,

Is there a way to detect which version of HBase one is running?
Is there an API for that, or a constant with this value, or maybe an MBean
or some other way to get to this info?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Ted Yu
Using hbase shell:

hbase(main):002:0> status 'detailed'
version 0.98.4.2-hadoop2

Cheers




Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Gary Helmling
Yes, you can use the org.apache.hadoop.hbase.util.VersionInfo class.

From Java code, you can use VersionInfo.getVersion(). From shell
scripts, you can just run hbase version and parse the output.
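Here is a minimal sketch of the Java route Gary mentions. Calling VersionInfo.getVersion() directly is enough when HBase is a compile-time dependency. This version goes through reflection instead, so a monitoring agent without HBase on its classpath gets an empty result rather than a failure. The class name ClientVersion is hypothetical. As with "hbase version", this reports the version of the HBase jars the JVM loaded, which is not necessarily the cluster's version.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Sketch: reads the HBase client version via org.apache.hadoop.hbase.util.VersionInfo
// when HBase is on the classpath, without a compile-time dependency on HBase.
final class ClientVersion {
    static Optional<String> hbaseClientVersion() {
        try {
            Class<?> versionInfo = Class.forName("org.apache.hadoop.hbase.util.VersionInfo");
            Method getVersion = versionInfo.getMethod("getVersion"); // public static String getVersion()
            return Optional.ofNullable((String) getVersion.invoke(null));
        } catch (ReflectiveOperationException | LinkageError e) {
            return Optional.empty(); // HBase (or one of its dependencies) is not on the classpath
        }
    }

    public static void main(String[] args) {
        System.out.println(hbaseClientVersion().orElse("HBase not on classpath"));
    }
}
```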



Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Ted Yu
Otis:
You can parse the output of the status 'detailed' command: look for the
line starting with 'version'.

I checked the output from /jmx but didn't find such information there. The
version would appear in the classpath but that's not easy to parse.

One note about 'hbase version': it returns the version the HBase client
was built with, not the version of the cluster the client is talking to.

Cheers
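Here is a small sketch of that parsing step. It assumes the shell output has already been captured as a string, for example by running the shell from a script. StatusVersionParser is a hypothetical name. The only output format it relies on is the "version ..." line shown earlier in this thread.

```java
import java.util.Optional;

// Hypothetical helper: extracts the version from captured `status 'detailed'` output.
final class StatusVersionParser {
    private static final String PREFIX = "version ";

    static Optional<String> parseVersion(String statusOutput) {
        for (String line : statusOutput.split("\\R")) { // split on any line break
            String trimmed = line.trim();
            if (trimmed.startsWith(PREFIX)) {
                return Optional.of(trimmed.substring(PREFIX.length()).trim());
            }
        }
        return Optional.empty(); // no "version" line found
    }

    public static void main(String[] args) {
        String captured = "version 0.98.4.2-hadoop2\n"; // the line shown in this thread
        System.out.println(parseVersion(captured).orElse("unknown")); // 0.98.4.2-hadoop2
    }
}
```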

On Wed, Nov 12, 2014 at 1:49 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi Ted,

 Thanks, but I'm looking for something I can grab programmatically (not
 manually), for example from a Java app.  Maybe there is some API that
 exposes this information or an MBean?

 Here's the use case:
 SPM monitors HBase (http://sematext.com/spm/), but HBase MBeans and
 metrics have changed over time.
 How will the SPM agent know which MBeans to look for, which metrics to
 extract, and how to interpret the values it extracts without knowing which
 version of HBase it's monitoring?
 It could try probing for some known MBeans and deduce the HBase version
 from that, but that feels a little sloppy.
 Ideally, we'd be able to grab the version from some MBean and, based on
 that, extract the metrics we know are exposed in that version of HBase.

 Thanks,
 Otis



Re: Call for Presentations - HBase User group meeting

2014-11-12 Thread Ryan Rawson
Just popping this back to the top: we are still looking for people to
present at the HBase User Group Meetup in 2 weeks:

http://www.meetup.com/hbaseusergroup/events/205219992/

As always, food and beverages are being provided.  Come and hear about
the cool goings on in HBase land, and possibly even present a few of
your own!

-ryan


On Mon, Nov 10, 2014 at 2:58 PM, Ryan Rawson ryano...@gmail.com wrote:
 Hi all,

 The next HBase user group meeting is on November the 20th.  We need a
 few more presenters still!

 Please send me your proposals - summary and outline of your talk!

 Thanks!
 -ryan


Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Ted Yu
Java-wise, you can use this API in HBaseAdmin:

  ClusterStatus getClusterStatus() throws IOException;

ClusterStatus provides:

  public String getHBaseVersion()

Cheers
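Once the cluster's version string is available (for example from HBaseAdmin#getClusterStatus().getHBaseVersion()), an agent like SPM still has to decide which metric set applies. Below is a minimal sketch of ordering version strings such as 0.98.4.2-hadoop2 by their numeric components. The helper name, and the rule of ignoring everything after the first '-', are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compares HBase version strings such as "0.98.4.2-hadoop2".
final class HBaseVersions {
    // Leading numeric components, e.g. "0.98.4.2-hadoop2" -> [0, 98, 4, 2].
    static List<Integer> numericParts(String version) {
        String core = version.split("-", 2)[0]; // drop suffixes like "-hadoop2"
        List<Integer> parts = new ArrayList<>();
        for (String p : core.split("\\.")) {
            if (!p.matches("\\d+")) {
                break;                          // stop at the first non-numeric part
            }
            parts.add(Integer.parseInt(p));
        }
        return parts;
    }

    // Negative, zero or positive as a is older than, equal to, or newer than b.
    static int compare(String a, String b) {
        List<Integer> x = numericParts(a);
        List<Integer> y = numericParts(b);
        for (int i = 0; i < Math.max(x.size(), y.size()); i++) {
            int xi = i < x.size() ? x.get(i) : 0; // missing parts count as 0
            int yi = i < y.size() ? y.get(i) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String reported = "0.98.4.2-hadoop2";               // e.g. from ClusterStatus#getHBaseVersion()
        System.out.println(numericParts(reported));          // [0, 98, 4, 2]
        System.out.println(compare(reported, "0.96") > 0);   // true
        System.out.println(compare(reported, "0.98.4.2"));   // 0
    }
}
```

An agent could then pick the newest known metric set whose minimum version compares less than or equal to the reported one.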






Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Otis Gospodnetic
Hi,

Thanks Gary, I think this is exactly what I was after!
Btw, it might be nice to expose this via JMX too, for apps that need this
info but are not in-process.

Otis



Trying to connect HBase Java Client I get: Failed to locate the winutils binary in the hadoop binary path

2014-11-12 Thread Néstor Boscán
Hi

I'm creating my first HBase application and I'm trying to connect from a
Java application in my Java IDE to my HBase server on a Hortonworks 2.1
virtual machine. When I run it, I get:

Failed to locate the winutils binary in the hadoop binary path

Does this mean that I have to have Hadoop installed on my laptop to be able
to test connections to HBase?

Regards,

Néstor


Re: Trying to connect HBase Java Client I get: Failed to locate the winutils binary in the hadoop binary path

2014-11-12 Thread Ted Yu
Cycling bits: http://search-hadoop.com/m/DHED4y3J2B




Re: Trying to connect HBase Java Client I get: Failed to locate the winutils binary in the hadoop binary path

2014-11-12 Thread Néstor Boscán
Yes, I already applied that.

I just want to understand whether, if I have a web application, I'll
need the Hadoop distribution installed to use the HBase client.

Regards,

Néstor
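On a Windows development machine, the Hadoop 2.x client code looks for %HADOOP_HOME%\bin\winutils.exe. It takes the home directory from the hadoop.home.dir system property or the HADOOP_HOME environment variable. In many client versions the message is only logged, and the HBase client keeps working. The sketch below shows the commonly used workaround: point hadoop.home.dir at a directory that contains just bin\winutils.exe before any Hadoop or HBase class is touched. The helper name and the C:\hadoop default are hypothetical.

```java
import java.io.File;

// Sketch of the common Windows workaround: point hadoop.home.dir at a directory
// that contains bin\winutils.exe before any Hadoop or HBase class is loaded.
final class WinutilsSetup {
    // Returns true (and sets hadoop.home.dir) only if dir\bin\winutils.exe exists.
    static boolean configureHadoopHome(File dir) {
        File winutils = new File(new File(dir, "bin"), "winutils.exe");
        if (!winutils.isFile()) {
            return false;
        }
        System.setProperty("hadoop.home.dir", dir.getAbsolutePath());
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical default location; pass a different directory as the first argument.
        File home = new File(args.length > 0 ? args[0] : "C:\\hadoop");
        if (configureHadoopHome(home)) {
            System.out.println("hadoop.home.dir=" + System.getProperty("hadoop.home.dir"));
        } else {
            System.out.println("winutils.exe not found under " + home);
        }
        // ...create the HBase configuration and connection after this point.
    }
}
```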




RE: Is it possible that HBase update performance is much better than read in YCSB test?

2014-11-12 Thread Liu, Ming (HPIT-GADSC)
Thank you Andrew, this is an excellent answer; I get it now. I will try your
HBase client for a 'fair' test :-)

Best Regards,
Ming

-----Original Message-----
From: Andrew Purtell [mailto:apurt...@apache.org] 
Sent: Thursday, November 13, 2014 2:08 AM
To: user@hbase.apache.org
Cc: DeRoo, John
Subject: Re: Is it possible that HBase update performance is much better than 
read in YCSB test?



Re: Is it possible that HBase update performance is much better than read in YCSB test?

2014-11-12 Thread ramkrishna vasudevan
Thanks Andrew. This is very useful information, along with the
GitHub link.

Regards
Ram




Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-12 Thread Jianshi Huang
Hi,

I'm currently saving JSON as a plain String in the value cell and
relying on HBase's block compression to reduce the overhead of JSON.

I'm wondering if there's a more space-efficient way to store JSON.
(There are lots of 0s and 1s, so a JSON String is actually an OK format.)

I want to keep the value as a Map since the schema of the source data might
change over time.

Also, is there a DIFF-based encoding for values? I'm storing
historical (snapshot) data, and the changes between adjacent value cells
are relatively small.


Thanks,
-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-12 Thread Ted Yu
There is the FASTDIFF data block encoding.

See also http://bjson.org/

Cheers



Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-12 Thread Jianshi Huang
I thought FASTDIFF was only for row keys and columns; great if it also works
for value cells.

And thanks for the bjson link!

Jianshi



Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-12 Thread ramkrishna vasudevan
Hi

 Since I'm storing
 historical data (snapshot data) and changes between adjacent value cells
 are relatively small.

If a value changes at all, even slightly, FASTDIFF will rewrite the whole
value part. Only when the value is an exact match does it skip the value
part. JFYI.

Regards
Ram
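Here is a toy model of the behavior Ram describes. It assumes a 1-byte marker for a repeated value, which is an illustrative assumption and not FASTDIFF's real on-disk format. Snapshots that differ in a single character still cost their full size; only an exact repeat is nearly free. The class name is hypothetical. The encoding itself is chosen per column family, e.g. DATA_BLOCK_ENCODING => 'FAST_DIFF' in the HBase shell or HColumnDescriptor#setDataBlockEncoding(DataBlockEncoding.FAST_DIFF) in the Java API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Toy model (not HBase's FAST_DIFF encoder) of the value handling described above:
// a value identical to the previous one costs a small marker, anything else is stored in full.
final class ValueReuseModel {
    static final int SAME_VALUE_MARKER_BYTES = 1; // assumed marker size

    static long encodedValueBytes(List<byte[]> values) {
        long total = 0;
        byte[] previous = null;
        for (byte[] value : values) {
            if (previous != null && Arrays.equals(previous, value)) {
                total += SAME_VALUE_MARKER_BYTES; // exact repeat: value part skipped
            } else {
                total += value.length;            // any change: full value rewritten
            }
            previous = value;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] snap1 = "{\"a\":1,\"b\":0}".getBytes(StandardCharsets.UTF_8); // 13 bytes
        byte[] snap2 = "{\"a\":1,\"b\":1}".getBytes(StandardCharsets.UTF_8); // one character differs
        // snap1, an identical copy of it, then the slightly changed snapshot
        System.out.println(encodedValueBytes(Arrays.asList(snap1, snap1.clone(), snap2))); // 27
    }
}
```

For snapshot values that change slightly on every write, this suggests that block compression, rather than the data block encoding, does most of the work on the value bytes.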
