Re: hbase cell storage different between bulk load and direct api

2018-04-19 Thread James Taylor
I believe we still rely on that empty key value, even for compact storage
formats (though theoretically it could likely be made so we don't - JIRA,
please?) A quick test would confirm:
- upsert a row with no last_name or first_name
- select * from T where last_name IS NULL
If the row isn't returned, then we need that empty key value.
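
The quick test above might look like the following in sqlline. This is only a sketch: it assumes the "example" table from the original post stands in for T, and the key value 3 is an arbitrary placeholder for any unused primary key.

```sql
-- Assumption: the "example" table from the original post already exists.
-- Assumption: my_pk = 3 is an arbitrary, previously unused key.
-- Upsert a row that sets only the primary key, so first_name and
-- last_name are both left null. If autocommit is off, commit afterwards.
UPSERT INTO example (my_pk) VALUES (3);

-- Per the criterion above: if the row with my_pk = 3 is not returned
-- here, the empty key value is still needed.
SELECT * FROM example WHERE last_name IS NULL;
```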

Thanks,
James


Re: hbase cell storage different between bulk load and direct api

2018-04-19 Thread Sergey Soldatov
Heh. That looks like a bug actually. This is a 'dummy' KV (
https://phoenix.apache.org/faq.html#Why_empty_key_value), but I have some
doubts that we need it for compacted rows.

Thanks,
Sergey



Re: hbase cell storage different between bulk load and direct api

2018-04-19 Thread Lew Jackman
I have not tried the master branch yet; however, on Phoenix 4.13 this
storage discrepancy in hbase is still present, with the extra
column=M:\x00\x00\x00\x00 cells appearing when using psql or sqlline.

Does anyone have an understanding of the meaning of the column qualifier
\x00\x00\x00\x00?



Re: hbase cell storage different between bulk load and direct api

2018-04-19 Thread Lew Jackman
The upsert statement gives the same result as psql, i.e. the extra cells. I
will try the master branch next. Thanks for the tip.



Re: hbase cell storage different between bulk load and direct api

2018-04-19 Thread Sergey Soldatov
Hi Lew,
No, the first one looks incorrect. You may file a bug on that (I believe
that the second case is correct, but you may also check by uploading data
using regular upserts). Also, you may check whether the master branch has
this issue.

Thanks,
Sergey



hbase cell storage different between bulk load and direct api

2018-04-18 Thread Lew Jackman
Under Phoenix 4.11 we are seeing some storage discrepancies in hbase between a 
load via psql and a bulk load.

To illustrate in a simple case we have modified the example table from the load 
reference https://phoenix.apache.org/bulk_dataload.html

CREATE TABLE example (
    my_pk bigint not null,
    m.first_name varchar(50),
    m.last_name varchar(50)
    CONSTRAINT pk PRIMARY KEY (my_pk))
    IMMUTABLE_ROWS=true,
    IMMUTABLE_STORAGE_SCHEME = SINGLE_CELL_ARRAY_WITH_OFFSETS,
    COLUMN_ENCODED_BYTES = 1;

Hbase Rows when Loading via PSQL

 \x80\x00\x00\x00\x00\x0009     column=M:\x00\x00\x00\x00, timestamp=1524109827690, value=x
 \x80\x00\x00\x00\x00\x0009     column=M:1, timestamp=1524109827690, value=xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02
 \x80\x00\x00\x00\x00\x01\x092  column=M:\x00\x00\x00\x00, timestamp=1524109827690, value=x
 \x80\x00\x00\x00\x00\x01\x092  column=M:1, timestamp=1524109827690, value=xMaryPoppins\x00\x00\x00\x01\x00\x05\x00\x00\x00\x0C\x00\x00\x00\x03\x02

Hbase Rows when Loading via MapReduce using CsvBulkLoadTool 

 \x80\x00\x00\x00\x00\x0009     column=M:1, timestamp=1524110486638, value=xJohnDoe\x00\x00\x00\x01\x00\x05\x00\x00\x00\x08\x00\x00\x00\x03\x02
 \x80\x00\x00\x00\x00\x01\x092  column=M:1, timestamp=1524110486638, value=xMaryPoppins\x00\x00\x00\x01\x00\x05\x00\x00\x00\x0C\x00\x00\x00\x03\x02


So, the table loaded via psql has 4 cells for the two rows, whereas the
bulk-loaded table has only 2, since it lacks the cells with column
qualifier \x00\x00\x00\x00.

Is this behavior correct?

Thanks much for any insight.
