[jira] [Updated] (PHOENIX-3555) Building async local index by IndexTool generate wrong data

2016-12-29 Thread chenzhiming (JIRA)

 [ https://issues.apache.org/jira/browse/PHOENIX-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenzhiming updated PHOENIX-3555:
-
Description: 
1. A salted table whose PK is a VARCHAR:
CREATE TABLE C_PICRECORD (
  ID VARCHAR NOT NULL PRIMARY KEY,
  "info".CAR_NUM VARCHAR(18)  NULL,
  "info".CAP_DATE VARCHAR  NULL,
  "info".ORG_ID BIGINT  NULL,
  "info".ORG_NAME VARCHAR(255)  NULL
) SALT_BUCKETS=3;

2. Upsert a row into the table:
UPSERT INTO C_PICRECORD(ID,CAR_NUM,CAP_DATE,ORG_ID,ORG_NAME) 
VALUES('1','car1','2016-01-01 00:00:00',11,'orgname1');

3. Create an async local index:
CREATE LOCAL INDEX C_PICRECORD_IDX_1 on 
C_PICRECORD("info".CAR_NUM,"info".CAP_DATE) ASYNC;

4. Use IndexTool to build the index:
hbase org.apache.phoenix.mapreduce.index.IndexTool  --data-table C_PICRECORD 
--index-table C_PICRECORD_IDX_1  --output-path /tmp/C_PICRECORD_IDX_1

5. Enter the HBase shell and scan the salted table:

hbase(main):102:0> scan 'C_PICRECORD'
ROW                                                       COLUMN+CELL
 \x02\x00\x0Ecar1\x002016-01-01 00:00:00\x001\x00\x00\x00\x00
                                                          column=L#0:_0, timestamp=1483108992853, value=x
 \x021                                                    column=info:CAP_DATE, timestamp=1483021375797, value=2016-01-01 00:00:00
 \x021                                                    column=info:CAR_NUM, timestamp=1483021375797, value=car1
 \x021                                                    column=info:ORG_ID, timestamp=1483021375797, value=\x80\x00\x00\x00\x00\x00\x00\x0B
 \x021                                                    column=info:ORG_NAME, timestamp=1483021375797, value=orgname1
 \x021                                                    column=info:_0, timestamp=1483021375797, value=x
--
Look here; the index row key is wrong:
\x02\x00\x0Ecar1\x002016-01-01 00:00:00\x001\x00\x00\x00\x00
The right index row key should be:
\x02\x00\x0Ecar1\x002016-01-01 00:00:00\x001
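To make the difference concrete, here is a minimal Java sketch comparing the two keys (the byte strings are copied verbatim from the scan above; this is an illustration, not Phoenix internals):

import org.apache.hadoop.hbase.util.Bytes;

public class IndexKeyDiff {
    public static void main(String[] args) {
        // Row key that IndexTool actually wrote (from the hbase shell scan).
        byte[] actual = Bytes.toBytesBinary(
            "\\x02\\x00\\x0Ecar1\\x002016-01-01 00:00:00\\x001\\x00\\x00\\x00\\x00");
        // Row key that the online index write path produces for the same row.
        byte[] expected = Bytes.toBytesBinary(
            "\\x02\\x00\\x0Ecar1\\x002016-01-01 00:00:00\\x001");
        System.out.println("actual length:   " + actual.length);   // expected.length + 4
        System.out.println("expected length: " + expected.length);
        // Since the keys differ, index scans resolve to a row that carries no
        // covered columns, which is why non-indexed columns come back null.
        System.out.println("equal: " + Bytes.equals(actual, expected));  // false
    }
}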

This is why I get null values for columns not covered by the index:
0: jdbc:phoenix:master> SELECT ORG_ID,CAP_DATE,CAR_NUM,ORG_NAME FROM 
C_PICRECORD WHERE  CAR_NUM='car1' AND CAP_DATE>='2016-01-01' AND 
CAP_DATE<='2016-05-02'  LIMIT 10;

+---------+----------------------+----------+-----------+
| ORG_ID  |       CAP_DATE       | CAR_NUM  | ORG_NAME  |
+---------+----------------------+----------+-----------+
| null    | 2016-01-01 00:00:00  | car1     |           |
+---------+----------------------+----------+-----------+


PS: I get the right index data if I change the PK's datatype to BIGINT, or if 
I upsert some string as the PK, such as 'abc'.


[jira] [Created] (PHOENIX-3555) Building async local index by IndexTool generate wrong data

2016-12-29 Thread chenzhiming (JIRA)
chenzhiming created PHOENIX-3555:


 Summary: Building async local index by IndexTool generate wrong 
data
 Key: PHOENIX-3555
 URL: https://issues.apache.org/jira/browse/PHOENIX-3555
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.8.0
 Environment: phoenix4.8.0
Reporter: chenzhiming




[jira] [Created] (PHOENIX-3554) Building async local index by IndexTool generate wrong data

2016-12-29 Thread chenzhiming (JIRA)
chenzhiming created PHOENIX-3554:


 Summary: Building async local index by IndexTool generate wrong 
data
 Key: PHOENIX-3554
 URL: https://issues.apache.org/jira/browse/PHOENIX-3554
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.8.0
 Environment: phoenix4.8.0
Reporter: chenzhiming




[jira] [Commented] (PHOENIX-3553) Zookeeper connection should be closed immediately after DefaultStatisticsCollector's collecting stats done

2016-12-29 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/PHOENIX-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787049#comment-15787049 ]

Hadoop QA commented on PHOENIX-3553:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12845150/PHOENIX-3553.patch
  against master branch at commit 07f92732f9c6d2d9464012cebeb4cefc10da95d5.
  ATTACHMENT ID: 12845150

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
42 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+SchemaUtil.getPhysicalTableName(PhoenixDatabaseMetaData.SYSTEM_CATALOG_NAME_BYTES, env.getConfiguration()));
+get.addColumn(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES, PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES);
+guidepostWidth = PLong.INSTANCE.getCodec().decodeLong(cell.getValueArray(), cell.getValueOffset(), SortOrder.getDefault());

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/712//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/712//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/712//console

This message is automatically generated.

> Zookeeper connection should be closed immediately after 
> DefaultStatisticsCollector's collecting stats done
> --
>
> Key: PHOENIX-3553
> URL: https://issues.apache.org/jira/browse/PHOENIX-3553
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0
>Reporter: Yeonseop Kim
>  Labels: stats, zookeeper
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3553.patch
>
>


[jira] [Commented] (PHOENIX-3333) Support Spark 2.0

2016-12-29 Thread James Taylor (JIRA)

[ https://issues.apache.org/jira/browse/PHOENIX-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787001#comment-15787001 ]

James Taylor commented on PHOENIX-:
---

+1 to the patch. Nice work, [~jmahonin] and thanks for the testing, 
[~dalin...@gmail.com] &  [~kalyanhadoop].

> Support Spark 2.0
> -
>
> Key: PHOENIX-
> URL: https://issues.apache.org/jira/browse/PHOENIX-
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.1
> Environment: spark 2.0 ,phoenix 4.8.0 , os is centos 6.7 ,hadoop is 
> hdp 2.5
>Reporter: dalin qin
> Fix For: 4.10.0
>
> Attachments: PHOENIX--interim.patch, PHOENIX-.patch
>
>
> spark version is 2.0.0.2.5.0.0-1245
> As mentioned by Josh, I believe Spark 2.0 changed their API, which broke 
> Phoenix. Please come up with an updated version to adapt to Spark's change.
> In [1]: df = sqlContext.read \
>...:   .format("org.apache.phoenix.spark") \
>...:   .option("table", "TABLE1") \
>...:   .option("zkUrl", "namenode:2181:/hbase-unsecure") \
>...:   .load()
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 df = sqlContext.read   .format("org.apache.phoenix.spark")   
> .option("table", "TABLE1")   .option("zkUrl", 
> "namenode:2181:/hbase-unsecure")   .load()
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/readwriter.pyc in load(self, 
> path, format, schema, **options)
> 151 return 
> self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
> 152 else:
> --> 153 return self._df(self._jreader.load())
> 154
> 155 @since(1.4)
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
> 931 answer = self.gateway_client.send_command(command)
> 932 return_value = get_return_value(
> --> 933 answer, self.gateway_client, self.target_id, self.name)
> 934
> 935 for temp_arg in temp_args:
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>  61 def deco(*a, **kw):
>  62 try:
> ---> 63 return f(*a, **kw)
>  64 except py4j.protocol.Py4JJavaError as e:
>  65 s = e.java_exception.toString()
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py 
> in get_return_value(answer, gateway_client, target_id, name)
> 310 raise Py4JJavaError(
> 311 "An error occurred while calling {0}{1}{2}.\n".
> --> 312 format(target_id, ".", name), value)
> 313 else:
> 314 raise Py4JError(
> Py4JJavaError: An error occurred while calling o43.load.
> : java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at 
> java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
> at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
> at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
> at 
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
> at 
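For context, a minimal Java sketch of the same read against the Spark 2.x API (it assumes the org.apache.phoenix.spark format and zkUrl option from the snippet above; in Spark 2.0, org.apache.spark.sql.DataFrame is only a type alias for Dataset<Row>, so the concrete DataFrame class that 1.x-built phoenix-spark references no longer exists, hence the NoClassDefFoundError):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PhoenixSparkRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("phoenix-read")
            .getOrCreate();
        // Same table/zkUrl options as the pyspark snippet in the report.
        Dataset<Row> df = spark.read()
            .format("org.apache.phoenix.spark")
            .option("table", "TABLE1")
            .option("zkUrl", "namenode:2181:/hbase-unsecure")
            .load();
        df.show();
    }
}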

[jira] [Commented] (PHOENIX-3333) Support Spark 2.0

2016-12-29 Thread dalin qin (JIRA)

[ https://issues.apache.org/jira/browse/PHOENIX-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786977#comment-15786977 ]

dalin qin commented on PHOENIX-:


Hi Josh,

Yes, you are right. To let Spark load/write a Phoenix table, only the newly 
compiled phoenix-spark-4.9.0-HBase-1.1.jar and 
phoenix-4.9.0-HBase-1.1-client.jar are sufficient.
I've also done Spark 2.0.2 write and Spark 1.6.2 read/write testing; all 
works fine.

Thanks.

> Support Spark 2.0
> -
>
> Key: PHOENIX-
> URL: https://issues.apache.org/jira/browse/PHOENIX-
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.1
> Environment: spark 2.0 ,phoenix 4.8.0 , os is centos 6.7 ,hadoop is 
> hdp 2.5
>Reporter: dalin qin
> Fix For: 4.10.0
>
> Attachments: PHOENIX--interim.patch, PHOENIX-.patch
>
>

[jira] [Updated] (PHOENIX-3553) Zookeeper connection should be closed immediately after DefaultStatisticsCollector's collecting stats done

2016-12-29 Thread Yeonseop Kim (JIRA)

 [ https://issues.apache.org/jira/browse/PHOENIX-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yeonseop Kim updated PHOENIX-3553:
--
Attachment: PHOENIX-3553.patch

> Zookeeper connection should be closed immediately after 
> DefaultStatisticsCollector's collecting stats done
> --
>
> Key: PHOENIX-3553
> URL: https://issues.apache.org/jira/browse/PHOENIX-3553
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0
>Reporter: Yeonseop Kim
>  Labels: stats, zookeeper
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3553.patch
>
>


[jira] [Created] (PHOENIX-3553) Zookeeper connection should be closed immediately after DefaultStatisticsCollector's collecting stats done

2016-12-29 Thread Yeonseop Kim (JIRA)
Yeonseop Kim created PHOENIX-3553:
-

 Summary: Zookeeper connection should be closed immediately after 
DefaultStatisticsCollector's collecting stats done
 Key: PHOENIX-3553
 URL: https://issues.apache.org/jira/browse/PHOENIX-3553
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.9.0
Reporter: Yeonseop Kim


In every minor compaction job of HBase,
org.apache.phoenix.schema.stats.DefaultStatisticsCollector.initGuidePostDepth()
is called, and the SYSTEM.CATALOG table is opened to get the guidepost width via

htable = env.getTable(
    SchemaUtil.getPhysicalTableName(PhoenixDatabaseMetaData.SYSTEM_CATALOG_NAME_BYTES,
        env.getConfiguration()));

This call creates one ZooKeeper connection to get the cluster id.

DefaultStatisticsCollector doesn't close this ZooKeeper connection immediately
after getting the guidepost width, and the connection remains alive until the
HRegion is closed.

This is not a problem with a small number of regions, but when the number of
regions is large and upsert operations are frequent, the number of ZooKeeper
connections gradually increases to hundreds, and the ZooKeeper server nodes
run short of available TCP/IP ports.

This ZooKeeper connection should be closed immediately after the guidepost
width is read.
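For reference, a minimal sketch of the shape of such a fix, adapted from the patch lines quoted in the QA report above (assumptions: HBase 1.x Table API; 'tableKey', the SYSTEM.CATALOG row key for the table being compacted, is in scope; this is an illustration, not necessarily identical to the attached patch):

import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.phoenix.jdbc.PhoenixDatabaseMetaData;
import org.apache.phoenix.schema.SortOrder;
import org.apache.phoenix.schema.types.PLong;
import org.apache.phoenix.util.SchemaUtil;

public class GuidepostWidthReader {
    // Read GUIDE_POSTS_WIDTH for one table from SYSTEM.CATALOG, closing the
    // table handle (and whatever connection backs it) as soon as the read is
    // done, instead of holding it until the HRegion closes.
    static long readGuidepostWidth(RegionCoprocessorEnvironment env,
            byte[] tableKey, long defaultWidth) throws IOException {
        Table htable = env.getTable(SchemaUtil.getPhysicalTableName(
            PhoenixDatabaseMetaData.SYSTEM_CATALOG_NAME_BYTES, env.getConfiguration()));
        try {
            Get get = new Get(tableKey);
            get.addColumn(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES,
                PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES);
            Result result = htable.get(get);
            Cell cell = result.getColumnLatestCell(
                PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES,
                PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES);
            if (cell == null) {
                return defaultWidth;
            }
            return PLong.INSTANCE.getCodec().decodeLong(
                cell.getValueArray(), cell.getValueOffset(), SortOrder.getDefault());
        } finally {
            htable.close();
        }
    }
}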





[jira] [Commented] (PHOENIX-3333) Support Spark 2.0

2016-12-29 Thread Josh Mahonin (JIRA)

[ https://issues.apache.org/jira/browse/PHOENIX-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786510#comment-15786510 ]

Josh Mahonin commented on PHOENIX-:
---

Thanks for testing, [~dalin...@gmail.com].

One thing I'm curious about is whether all of those JARs are necessary in the 
spark classpath settings? In my experience, just the 
phoenix--client.jar is sufficient.

> Support Spark 2.0
> -
>
> Key: PHOENIX-
> URL: https://issues.apache.org/jira/browse/PHOENIX-
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.1
> Environment: spark 2.0 ,phoenix 4.8.0 , os is centos 6.7 ,hadoop is 
> hdp 2.5
>Reporter: dalin qin
> Fix For: 4.10.0
>
> Attachments: PHOENIX--interim.patch, PHOENIX-.patch
>
>

[jira] [Created] (PHOENIX-3552) JDBC connectivity is very slow with Phoenix Client driver

2016-12-29 Thread srinivas padala (JIRA)
srinivas padala created PHOENIX-3552:


 Summary: JDBC connectivity is very slow with Phoenix Client driver
 Key: PHOENIX-3552
 URL: https://issues.apache.org/jira/browse/PHOENIX-3552
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.7.0
Reporter: srinivas padala


JDBC connectivity is very slow with Phoenix Client driver
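For reference, the connection path being described, as a minimal Java sketch (the quorum string "zk1,zk2,zk3" is a hypothetical placeholder; the report is presumably about the time spent in getConnection):

import java.sql.Connection;
import java.sql.DriverManager;

public class PhoenixConnect {
    public static void main(String[] args) throws Exception {
        // Phoenix thick-client JDBC URL: jdbc:phoenix:<zookeeper quorum>
        try (Connection conn =
                DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3")) {
            System.out.println("connected: " + !conn.isClosed());
        }
    }
}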





[jira] [Updated] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-29 Thread James Taylor (JIRA)

 [ https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor updated PHOENIX-3453:
--
Assignee: chenglei

> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)





[jira] [Created] (PHOENIX-3551) broken package

2016-12-29 Thread Flavius Nopcea (JIRA)
Flavius Nopcea created PHOENIX-3551:
---

 Summary: broken package
 Key: PHOENIX-3551
 URL: https://issues.apache.org/jira/browse/PHOENIX-3551
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.4.0
Reporter: Flavius Nopcea


Hi,

I want to let you know that the package located here
https://mvnrepository.com/artifact/org.apache.phoenix/phoenix/4.4.0-HBase-1.1
is broken.

We cannot download it from a regular POM XML file.
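For reference, a standard dependency declaration for these coordinates, i.e. what a regular POM would contain (sketch; coordinates taken from the URL above):

<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix</artifactId>
  <version>4.4.0-HBase-1.1</version>
</dependency>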


Thanks.


