[jira] [Created] (HAWQ-1441) Implement SSL Access from RPS to Ranger

2017-04-26 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1441:
-

 Summary: Implement SSL Access from RPS to Ranger
 Key: HAWQ-1441
 URL: https://issues.apache.org/jira/browse/HAWQ-1441
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Security
Reporter: Lili Ma
Assignee: Ed Espino


An SSL connection from the Ranger plugin to Ranger is a way to ensure the 
security of data transferred between Ranger and the Ranger Plugin Service 
(RPS). So we need to implement SSL support in the RPS connection to Ranger.
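As a first sketch of what strict certificate verification on the RPS side could look like (the CA bundle path and the use of Python's standard `ssl` module are illustrative assumptions; the real RPS is a Java web service):

```python
import ssl

def make_ranger_ssl_context(ca_bundle=None):
    # Build an SSL context RPS could use when calling the Ranger admin REST
    # API over HTTPS. ca_bundle is a hypothetical path to the Ranger CA cert.
    ctx = ssl.create_default_context()
    if ca_bundle:
        ctx.load_verify_locations(ca_bundle)  # trust the Ranger admin's CA
    ctx.check_hostname = True                 # cert must match the hostname
    ctx.verify_mode = ssl.CERT_REQUIRED       # reject unverified servers
    return ctx
```

An HTTPS client (or the Java equivalent in the real RPS) would then be opened with this context so that policy downloads from Ranger travel encrypted and authenticated.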



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-1436) Implement RPS High availability on HAWQ

2017-04-26 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984355#comment-15984355
 ] 

Lili Ma edited comment on HAWQ-1436 at 4/26/17 8:20 AM:


I suggest we do solution 1) to simplify the implementation as the first step.

[~Paul Guo]
Since RPS will only be called by the master, and a query usually raises only a 
few RPS requests, there won't be a lot of requests to RPS. So I think a load 
balancer may be over-engineering. 

For services that support many concurrent requests, a proxy server in front of 
multiple web services is an ideal design, since it is convenient for both HA 
and load balancing.

[~lei_chang]
It's a good suggestion to auto-detect RPS failure and auto-restart it, but I 
think RPS is a little different from the Resource Manager process. We need to 
add special handling for it since it's a web service. Do you have any 
suggestions on the detailed implementation?


was (Author: lilima):
I suggest we do solution 1) to simplify the implementation as the first step.

[~Paul Guo]
Since RPS will only be called by the master, and a query usually raises only a 
few RPS requests, there won't be a lot of requests to RPS. So I think a load 
balancer may be over-engineering. 

For services that support many concurrent requests, a proxy server in front of 
multiple web services is an ideal design, since it is convenient for both HA 
and load balancing.

[~lei_chang]
It's a good suggestion to auto-detect RPS failure and auto-restart it, but I 
think RPS is a little different from the Resource Manager process. We need to 
add special handling for it since it's a web service. Do you have any 
suggestions on the detailed implementation?

> Implement RPS High availability on HAWQ
> ---
>
> Key: HAWQ-1436
> URL: https://issues.apache.org/jira/browse/HAWQ-1436
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Hongxu Ma
>Assignee: Hongxu Ma
> Fix For: backlog
>
> Attachments: RPSHADesign_v0.1.pdf
>
>
> Once Ranger is configured, HAWQ will rely on RPS to connect to Ranger. A 
> single-point RPS may affect the robustness of HAWQ. 
> Thus we need to investigate and design a way to implement RPS high 
> availability. 





[jira] [Comment Edited] (HAWQ-1436) Implement RPS High availability on HAWQ

2017-04-26 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984355#comment-15984355
 ] 

Lili Ma edited comment on HAWQ-1436 at 4/26/17 8:20 AM:


I suggest we do solution 1) to simplify the implementation as the first step.

[~Paul Guo]
Since RPS will only be called by the master, and a query usually raises only a 
few RPS requests, there won't be a lot of requests to RPS. So I think a load 
balancer may be over-engineering. 

For services that support many concurrent requests, a proxy server in front of 
multiple web services is an ideal design, since it is convenient for both HA 
and load balancing.

[~lei_chang]
It's a good suggestion to auto-detect RPS failure and auto-restart it, but I 
think RPS is a little different from the Resource Manager process. We need to 
add special handling for it since it's a web service. Do you have any 
suggestions on the detailed implementation?


was (Author: lilima):
I suggest we do solution 1) to simplify the implementation as the first step.

Since RPS will only be called by the master, and a query usually raises only a 
few RPS requests, there won't be a lot of requests to RPS. So I think a load 
balancer may be over-engineering. 

For services that support many concurrent requests, a proxy server in front of 
multiple web services is an ideal design, since it is convenient for both HA 
and load balancing.


> Implement RPS High availability on HAWQ
> ---
>
> Key: HAWQ-1436
> URL: https://issues.apache.org/jira/browse/HAWQ-1436
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Hongxu Ma
>Assignee: Hongxu Ma
> Fix For: backlog
>
> Attachments: RPSHADesign_v0.1.pdf
>
>
> Once Ranger is configured, HAWQ will rely on RPS to connect to Ranger. A 
> single-point RPS may affect the robustness of HAWQ. 
> Thus we need to investigate and design a way to implement RPS high 
> availability. 





[jira] [Commented] (HAWQ-1436) Implement RPS High availability on HAWQ

2017-04-26 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984355#comment-15984355
 ] 

Lili Ma commented on HAWQ-1436:
---

I suggest we do solution 1) to simplify the implementation as the first step.

Since RPS will only be called by the master, and a query usually raises only a 
few RPS requests, there won't be a lot of requests to RPS. So I think a load 
balancer may be over-engineering. 

For services that support many concurrent requests, a proxy server in front of 
multiple web services is an ideal design, since it is convenient for both HA 
and load balancing.


> Implement RPS High availability on HAWQ
> ---
>
> Key: HAWQ-1436
> URL: https://issues.apache.org/jira/browse/HAWQ-1436
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Hongxu Ma
>Assignee: Hongxu Ma
> Fix For: backlog
>
> Attachments: RPSHADesign_v0.1.pdf
>
>
> Once Ranger is configured, HAWQ will rely on RPS to connect to Ranger. A 
> single-point RPS may affect the robustness of HAWQ. 
> Thus we need to investigate and design a way to implement RPS high 
> availability. 





[jira] [Created] (HAWQ-1428) Table name pg_aoseg_$relfilenode does not change after running truncate command

2017-04-09 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1428:
-

 Summary: Table name pg_aoseg_$relfilenode does not change after 
running truncate command
 Key: HAWQ-1428
 URL: https://issues.apache.org/jira/browse/HAWQ-1428
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Core
Reporter: Lili Ma
Assignee: Ed Espino


The table pg_aoseg.pg_aoseg(paqseg)_$relfilenode describes the files stored on 
HDFS for an AO or Parquet table. To make this catalog table easy for users to 
find, the suffix should equal the table's relfilenode.

After running the truncate command, the relfilenode field for the table 
changes, but the pg_aoseg_$ table name is not updated. 

Reproduce Steps:
{code}
postgres=# create table a(a int);
CREATE TABLE
postgres=# insert into a values(51);
INSERT 0 1
postgres=# select oid, * from pg_class where relname='a';
  oid  | relname | relnamespace | reltype | relowner | relam | relfilenode | 
reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | 
relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | 
relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs 
| relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | 
relacl |reloptions
---+-+--+-+--+---+-+---+--+---+---+---+---+---+-+-+-++--+---+-+--+--+-+++-++--++---
 61269 | a   | 2200 |   61270 |   10 | 0 |   61269 |
 0 |1 | 1 | 0 | 0 | 
0 | 0 | f   | f   | r   | a  |1 
| 0 |   0 |0 |0 |   0 | f  | f  
| f   | f  |16214 || {appendonly=true}
(1 row)

postgres=# select oid, * from pg_class, pg_appendonly where 
pg_appendonly.relid=61269 and pg_appendonly.segrelid=pg_class.oid;
  oid  |relname | relnamespace | reltype | relowner | relam | 
relfilenode | reltablespace | relpages | reltuples | reltoastrelid | 
reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | 
relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys 
| relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | 
relfrozenxid | relacl | reloptions | relid | blocksize | safefswritesize | 
compresslevel | majorversion | minorversion | checksum | compresstype | 
columnstore | segrelid | segidxid | blkdirrelid | blkdiridxid | version | 
pagesize | splitsize
---++--+-+--+---+-+---+--+---+---+---+---+---+-+-+-++--+---+-+--+--+-+++-++--+++---+---+-+---+--+--+--+--+-+--+--+-+-+-+--+---
 61271 | pg_aoseg_61269 | 6104 |   61272 |   10 | 0 |   
61271 | 0 |0 | 0 | 0 | 0 |  
   0 | 0 | t   | f   | o   | h  
|5 | 0 |   0 |0 |0 |   0 | f
  | t  | f   | f  |16214 || 
   | 61269 | 32768 |   0 | 0 |2 |   
 0 | f|  | f   |61271 |61273 |  
 0 |   0 |   2 |0 |  67108864
(1 row)

postgres=# truncate a;
TRUNCATE TABLE
postgres=# select oid, * from pg_class where relname='a';   
 oid  | relname | relnamespace | reltype | relowner | relam | 
relfilenode | reltablespace | relpages | reltuples | reltoastrelid | 
reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | 
relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys 
| relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | 
relfrozenxid | relacl |reloptions

[jira] [Resolved] (HAWQ-1426) hawq extract meets error after the table was reorganized.

2017-04-09 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma resolved HAWQ-1426.
---
Resolution: Fixed

Have committed the bug fix.

> hawq extract meets error after the table was reorganized.
> -
>
> Key: HAWQ-1426
> URL: https://issues.apache.org/jira/browse/HAWQ-1426
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Chunling Wang
> Fix For: 2.3.0.0-incubating
>
>
> After a table is reorganized, running hawq extract on it raises an error.
> Reproduce steps:
> 1. Create an AO table.
> 2. Insert several records into it.
> 3. Reorganize the table:  "alter table a set with (reorganize=true);"
> 4. Run hawq extract; an error is thrown.
> For the bug fix, we should also guarantee that hawq extract works if the 
> table is truncated and re-inserted.





[jira] [Commented] (HAWQ-1426) hawq extract meets error after the table was reorganized.

2017-04-06 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960282#comment-15960282
 ] 

Lili Ma commented on HAWQ-1426:
---

RCA:
When hawq extract tries to find the HDFS file information, it wrongly treats 
pg_aoseg.pg_aoseg_$relid as the catalog table storing that information. 
When determining the file path of a table, hawq extract should follow the 
steps below:
1. Find the HDFS directory that stores the actual data for the table. This 
can be achieved by following the "relfilenode" column in the pg_class table. 
2. Find the detailed file names for the table under that directory. This can 
be achieved by searching the catalog table pg_aoseg.pg_aoseg(paqseg)_$. Under 
some circumstances the table name suffix is neither $relid nor $relfilenode, 
so we should obtain it by reading the "segrelid" column in the catalog table 
"pg_appendonly", and then looking up "pg_class" to get the accurate 
table name.
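The segrelid indirection in step 2 can be sketched with in-memory stand-ins for the two catalog tables (real code would issue SQL against pg_class and pg_appendonly; the dictionaries below just mirror the oids from the HAWQ-1428 reproduce output):

```python
# Stand-ins for catalog rows, keyed by oid. In the reproduce output, table
# "a" has relid 61269 and its aoseg table has oid 61271.
pg_class = {
    61269: {"relname": "a"},
    61271: {"relname": "pg_aoseg_61269"},
}
pg_appendonly = {61269: {"segrelid": 61271}}

def aoseg_table_name(relid):
    # Step 2: follow pg_appendonly.segrelid, then look up pg_class.relname,
    # instead of assuming the suffix equals $relid or $relfilenode.
    segrelid = pg_appendonly[relid]["segrelid"]
    return pg_class[segrelid]["relname"]
```

After a truncate or reorganize the suffix may no longer match either oid, which is exactly why the lookup must go through segrelid.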

> hawq extract meets error after the table was reorganized.
> -
>
> Key: HAWQ-1426
> URL: https://issues.apache.org/jira/browse/HAWQ-1426
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Ed Espino
> Fix For: 2.3.0.0-incubating
>
>
> After one table is reorganized, hawq extract the table will meet error.
> Reproduce Steps:
> 1. create an AO table
> 2. insert into several records into it
> 3. Get the table reorganized.  "alter table a set with (reorganize=true);"
> 4. run hawq extract, error thrown out.
> For the bug fix, we should also guarantee that hawq extract should work if 
> the table is truncated and re-inserted.





[jira] [Reopened] (HAWQ-1418) Print executing command for hawq register

2017-04-05 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma reopened HAWQ-1418:
---

> Print executing command for hawq register
> -
>
> Key: HAWQ-1418
> URL: https://issues.apache.org/jira/browse/HAWQ-1418
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Chunling Wang
>Assignee: Chunling Wang
> Fix For: backlog
>
>
> Print executing command for hawq register





[jira] [Commented] (HAWQ-1418) Print executing command for hawq register

2017-04-04 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955459#comment-15955459
 ] 

Lili Ma commented on HAWQ-1418:
---

The aim of this JIRA is to print the exact command the utility is running, so 
that it is easier to analyze the output logs of hawq register, especially 
during concurrent invocations of hawq register.
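A minimal sketch of the idea (the `hdfscmd: "..."` format mirrors the log lines hawq register already emits elsewhere in this digest; the helper name is hypothetical):

```python
import logging

logger = logging.getLogger("hawqregister")

def log_hdfs_cmd(cmd):
    # Emit the exact command before running it, so concurrent hawq register
    # runs can be told apart when reading the logs.
    message = 'hdfscmd: "%s"' % cmd
    logger.info(message)
    return message
```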

> Print executing command for hawq register
> -
>
> Key: HAWQ-1418
> URL: https://issues.apache.org/jira/browse/HAWQ-1418
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Chunling Wang
>Assignee: Chunling Wang
> Fix For: backlog
>
>
> Print executing command for hawq register





[jira] [Commented] (HAWQ-1406) Update HAWQ product version strings to 2.2.0.0 (HAWQ/HAWQ Ambari Plugin) & 3.2.0.0 (PXF)

2017-03-28 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944834#comment-15944834
 ] 

Lili Ma commented on HAWQ-1406:
---

I think we should keep both versions aligned with the HAWQ version. At this 
time we should change it to 2.2.0.0.  [~adenisso] Could you please verify?

Also, the PXF version should be 3.2.1.0.

> Update HAWQ product version strings to 2.2.0.0 (HAWQ/HAWQ Ambari Plugin) & 
> 3.2.0.0 (PXF)
> 
>
> Key: HAWQ-1406
> URL: https://issues.apache.org/jira/browse/HAWQ-1406
> Project: Apache HAWQ
>  Issue Type: Task
>  Components: Build
>Affects Versions: 2.1.0.0-incubating
>Reporter: Ruilong Huo
>Assignee: Ruilong Huo
> Fix For: 2.2.0.0-incubating
>
>
> Need to update the HAWQ (2.2.0.0), HAWQ Ambari Plugin (2.2.0.0) and PXF 
> (3.2.0.0) versions so that we can clearly identify Apache HAWQ 
> 2.2.0.0-incubating artifacts.





[jira] [Created] (HAWQ-1397) Incorrect Message for judging Flex version in the period of configure.

2017-03-21 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1397:
-

 Summary: Incorrect Message for judging Flex version in the period 
of configure.
 Key: HAWQ-1397
 URL: https://issues.apache.org/jira/browse/HAWQ-1397
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Build
Reporter: Lili Ma
Assignee: Ed Espino


I have Flex versions 2.6.0 and 2.5.35 in my local environment, and the 
default is 2.6.0. When I ran ./configure in HAWQ, the configure log indicated 
that HAWQ requires Flex version 2.5.4 or later, while my version is 2.6.0. It 
should not throw this error and should use version 2.6.0.

{code}
  470 configure:7467: checking for flex
  471 configure:7498: WARNING:
  472 *** The installed version of Flex, /usr/local/bin/flex, is too old to use 
with Greenplum DB.
  473 *** Flex version 2.5.4 or later is required, but this is flex 2.6.0.
  474 configure:7512: result: /usr/bin/flex
  475 configure:7532: using flex 2.5.35 Apple(flex-31)
{code}
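A robust check would parse the version and compare it component-wise; here is a sketch of that logic (in Python for illustration, not the actual configure macro, which is m4/shell):

```python
def flex_version_ok(version, minimum="2.5.4"):
    # Parse "x.y.z" into integer components and compare numerically,
    # so 2.6.0 and 2.5.35 both satisfy a 2.5.4 minimum.
    parse = lambda v: tuple(int(x) for x in v.split("."))
    return parse(version) >= parse(minimum)
```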





[jira] [Commented] (HAWQ-760) Hawq register

2017-03-01 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891592#comment-15891592
 ] 

Lili Ma commented on HAWQ-760:
--

[~kdunn926] 

hawq register doesn't check the HAWQ version number. Although HAWQ 2.X 
optimized the storage for AO format tables, it can still read AO files 
generated by HAWQ 1.X. The Parquet format has not changed, so there won't be 
a problem. 
So, I don't think you will encounter problems registering a table from 
HAWQ 1.X into HAWQ 2.X.

If you want to register Parquet files generated by other products such as 
Hive or Impala, which may use a later format version, hawq register won't 
throw an error at registration time, but you may see errors when selecting 
from the registered table. For example, if a data page uses dictionary 
encoding, HAWQ will throw an error indicating that it cannot process it. 

> Hawq register
> -
>
> Key: HAWQ-760
> URL: https://issues.apache.org/jira/browse/HAWQ-760
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: Command Line Tools
>Reporter: Yangcheng Luo
>Assignee: Lili Ma
> Fix For: backlog
>
>
> Scenario: 
> 1. Register a parquet file generated by other systems, such as Hive, Spark, 
> etc.
> 2. For cluster disaster recovery: two clusters co-exist, and data is 
> periodically imported from Cluster A to Cluster B. The data needs to be 
> registered to Cluster B.
> 3. For table rollback: take checkpoints somewhere, and roll back to a 
> previous checkpoint. 
> Usage 1
> Description
> Register a file or folder to an existing table. If registering a file, the 
> eof of the file can be specified; if eof is not specified, the actual file 
> size is used. If registering a folder, the actual file sizes are used.
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f 
> filepath] [-e eof]
> Usage 2
> Description
> Register according to a .yml configuration file. 
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c 
> config] [--force][--repair]  
> Behavior:
> 1. If the table doesn't exist, it is created automatically and the files in 
> the .yml configuration file are registered. The file sizes specified in the 
> .yml are used to update the catalog table. 
> 2. If the table already exists and neither --force nor --repair is 
> configured: do not create any table, and directly register the files 
> specified in the .yml file to the table. Note that if a file is already 
> under the table directory in HDFS, an error is thrown, since 
> to-be-registered files should not be under the table path.
> 3. If the table already exists and --force is specified: clear all the 
> catalog contents in pg_aoseg.pg_paqseg_$relid while keeping the files on 
> HDFS, then re-register all the files to the table. This is for scenario 2.
> 4. If the table already exists and --repair is specified: change both the 
> file folder and the catalog table pg_aoseg.pg_paqseg_$relid to the state 
> the .yml file configures. Note that files generated after the checkpoint 
> may be deleted here. Also note that all the files in the .yml file should 
> be under the table folder on HDFS. Limitation: hash table redistribution, 
> table truncate, and table drop are not supported. This is for scenario 3.
> Requirements for both cases:
> 1. The to-be-registered file path has to be colocated with HAWQ in the same 
> HDFS cluster.
> 2. If the target is a hash table, the registered file number should be one 
> or a multiple of the hash table bucket number.
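The hash-table requirement above can be sketched as a simple pre-check (the function name is hypothetical, not hawq register's actual code):

```python
def valid_hash_table_file_count(file_count, bucket_number):
    # For a hash-distributed table, the number of files to register must be
    # a positive multiple of the table's bucket number (1x, 2x, ...).
    return file_count > 0 and file_count % bucket_number == 0
```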





[jira] [Resolved] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet

2017-02-28 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma resolved HAWQ-1366.
---
Resolution: Fixed
  Assignee: Lili Ma  (was: Ed Espino)

> HAWQ should throw error if finding dictionary encoding type for Parquet
> ---
>
> Key: HAWQ-1366
> URL: https://issues.apache.org/jira/browse/HAWQ-1366
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Storage
>Reporter: Lili Ma
>Assignee: Lili Ma
> Fix For: 2.2.0.0-incubating
>
>
> Since HAWQ is based on Parquet format version 1.0, which does not support 
> dictionary pages, and hawq register may register Parquet format version 2.0 
> data into HAWQ, we should throw an error on finding an unsupported page 
> type for a column.
> Reproduce Steps:
> 1. In Hive, create a table and insert records:
> {code}
> (hive> create table tt (i int,
> >   fname varchar(100),
> >   title varchar(100),
> >   salary double
> > )
> > STORED AS PARQUET;
> OK
> Time taken: 0.029 seconds
> hive> insert into tt values (5,'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW',
> 'Sales',80282.54),
> > (7,'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE','Engineer',10206.65),
> > (4,'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ','Director',63691.23),
> > (9,'CTDCDYRURBZMBLNWHQNOQCYFFVULOP','Engineer',63867.44),
> > (10,'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK','Sales',97720.08);
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Job running in-process (local Hadoop)
> 2017-02-28 17:39:58,713 Stage-1 map = 100%,  reduce = 0%
> Ended Job = job_local2046305831_0004
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to directory 
> hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-1
> Loading data to table default.tt
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 3945 HDFS Write: 4226 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 1.975 seconds
> hive> select * from tt;
> OK
> 5 OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW  Sales   80282.54
> 7 UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE  Engineer10206.65
> 4 PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ  Director63691.23
> 9 CTDCDYRURBZMBLNWHQNOQCYFFVULOP  Engineer63867.44
> 10WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK  Sales   97720.08
> Time taken: 0.056 seconds, Fetched: 5 row(s)
> {code}
> 2. Create table in HAWQ
> {code}
> CREATE TABLE public.tt
> (i int,
>   fname varchar(100),
>   title varchar(100),
>   salary float8)
> WITH (appendonly=true,orientation=parquet);
> {code}
> 3. run hawq register
> {code}
> malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f 
> hdfs://localhost:8020/user/hive/warehouse/tt tt
> 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try 
> to connect database localhost:5432 postgres
> 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New 
> file(s) to be registered: 
> ['hdfs://localhost:8020/user/hive/warehouse/tt/00_0']
> hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/00_0 
> hdfs://localhost:8020/hawq_default/16385/16387/49281/1"
> 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq 
> Register Succeed.
> {code}
> 4. select from hawq
> {code}
> postgres=# select * from tt;
>  i  | fname  | title |  salary
> ++---+--
>   5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW |   | 80282.54
>   7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE |   | 10206.65
>   4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ |   | 63691.23
>   9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP |   | 63867.44
>  10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK |   | 97720.08
> {code}





[jira] [Commented] (HAWQ-401) json type support

2017-02-28 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889312#comment-15889312
 ] 

Lili Ma commented on HAWQ-401:
--

[~kdunn926] I reviewed the pull request for JSON support in Greenplum. It 
seems the modified part can be directly applied to HAWQ. Since it involves 
catalog changes, including pg_proc and pg_type, we may need to consider this 
in HAWQ upgrade.  Thanks




> json type support
> -
>
> Key: HAWQ-401
> URL: https://issues.apache.org/jira/browse/HAWQ-401
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Core
>Reporter: Lei Chang
>Assignee: Lei Chang
> Fix For: backlog
>
>






[jira] [Created] (HAWQ-1368) normal user who doesn't have home directory may have problem when running hawq register

2017-02-28 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1368:
-

 Summary: normal user who doesn't have home directory may have 
problem when running hawq register
 Key: HAWQ-1368
 URL: https://issues.apache.org/jira/browse/HAWQ-1368
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Command Line Tools
Reporter: Lili Ma
Assignee: Ed Espino


hawq register stores its log in hawqregister_MMDD.log under the directory 
~/hawqAdminLogs, so a normal user who doesn't have a home directory may 
encounter failures when running hawq register.

We can add a -l option to set the target log directory and file name for 
hawq register.
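A sketch of the proposed -l fallback (the default filename pattern follows the description above; the exact date format is an assumption):

```python
import os
import time

def register_log_path(cli_log_path=None, home=None):
    # Honor an explicit -l path; otherwise fall back to the default under
    # the user's home, which is what fails for users without a home directory.
    if cli_log_path:
        return cli_log_path
    home = home or os.path.expanduser("~")
    stamp = time.strftime("%Y%m%d")  # assumed date format
    return os.path.join(home, "hawqAdminLogs", "hawqregister_%s.log" % stamp)
```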





[jira] [Commented] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet

2017-02-28 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887674#comment-15887674
 ] 

Lili Ma commented on HAWQ-1366:
---

With the modified code, HAWQ throws the error:

{code}
postgres=# select * from tt;
ERROR:  HAWQ does not support dictionary page type resolver for Parquet format 
in column 'title' (cdbparquetcolumn.c:152)  (seg0 localhost:4 pid=90708)
{code}

> HAWQ should throw error if finding dictionary encoding type for Parquet
> ---
>
> Key: HAWQ-1366
> URL: https://issues.apache.org/jira/browse/HAWQ-1366
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Storage
>Reporter: Lili Ma
>Assignee: Ed Espino
> Fix For: 2.2.0.0-incubating
>
>
> Since HAWQ is based on Parquet format version 1.0, which does not support 
> dictionary pages, and hawq register may register Parquet format version 2.0 
> data into HAWQ, we should throw an error on finding an unsupported page 
> type for a column.
> Reproduce Steps:
> 1. In Hive, create a table and insert records:
> {code}
> (hive> create table tt (i int,
> >   fname varchar(100),
> >   title varchar(100),
> >   salary double
> > )
> > STORED AS PARQUET;
> OK
> Time taken: 0.029 seconds
> hive> insert into tt values (5,'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW',
> 'Sales',80282.54),
> > (7,'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE','Engineer',10206.65),
> > (4,'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ','Director',63691.23),
> > (9,'CTDCDYRURBZMBLNWHQNOQCYFFVULOP','Engineer',63867.44),
> > (10,'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK','Sales',97720.08);
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Job running in-process (local Hadoop)
> 2017-02-28 17:39:58,713 Stage-1 map = 100%,  reduce = 0%
> Ended Job = job_local2046305831_0004
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to directory 
> hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-1
> Loading data to table default.tt
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 3945 HDFS Write: 4226 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 1.975 seconds
> hive> select * from tt;
> OK
> 5 OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW  Sales   80282.54
> 7 UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE  Engineer10206.65
> 4 PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ  Director63691.23
> 9 CTDCDYRURBZMBLNWHQNOQCYFFVULOP  Engineer63867.44
> 10WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK  Sales   97720.08
> Time taken: 0.056 seconds, Fetched: 5 row(s)
> {code}
> 2. Create table in HAWQ
> {code}
> CREATE TABLE public.tt
> (i int,
>   fname varchar(100),
>   title varchar(100),
>   salary float8)
> WITH (appendonly=true,orientation=parquet);
> {code}
> 3. run hawq register
> {code}
> malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f 
> hdfs://localhost:8020/user/hive/warehouse/tt tt
> 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try 
> to connect database localhost:5432 postgres
> 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New 
> file(s) to be registered: 
> ['hdfs://localhost:8020/user/hive/warehouse/tt/00_0']
> hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/00_0 
> hdfs://localhost:8020/hawq_default/16385/16387/49281/1"
> 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq 
> Register Succeed.
> {code}
> 4. select from hawq
> {code}
> postgres=# select * from tt;
>  i  | fname  | title |  salary
> ++---+--
>   5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW |   | 80282.54
>   7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE |   | 10206.65
>   4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ |   | 63691.23
>   9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP |   | 63867.44
>  10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK |   | 97720.08
> {code}





[jira] [Commented] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet

2017-02-28 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887672#comment-15887672
 ] 

Lili Ma commented on HAWQ-1366:
---

The title column is optimized in Hive to dictionary storage. Since HAWQ 
doesn't support this, the output is a little weird.

In the short term, HAWQ should throw an error for this case. In the long 
term, HAWQ should support Parquet 2.0 data read/write.


> HAWQ should throw error if finding dictionary encoding type for Parquet
> ---
>
> Key: HAWQ-1366
> URL: https://issues.apache.org/jira/browse/HAWQ-1366
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Storage
>Reporter: Lili Ma
>Assignee: Ed Espino
> Fix For: 2.2.0.0-incubating
>
>
> Since HAWQ is based on Parquet format version 1.0, which does not support 
> dictionary pages, and hawq register may register Parquet format version 2.0 
> data into HAWQ, we should throw an error on finding an unsupported page 
> type for a column.
> Reproduce Steps:
> 1. In Hive, create a table and insert records:
> {code}
> (hive> create table tt (i int,
> >   fname varchar(100),
> >   title varchar(100),
> >   salary double
> > )
> > STORED AS PARQUET;
> OK
> Time taken: 0.029 seconds
> hive> insert into tt values (5,'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW',
> 'Sales',80282.54),
> > (7,'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE','Engineer',10206.65),
> > (4,'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ','Director',63691.23),
> > (9,'CTDCDYRURBZMBLNWHQNOQCYFFVULOP','Engineer',63867.44),
> > (10,'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK','Sales',97720.08);
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Job running in-process (local Hadoop)
> 2017-02-28 17:39:58,713 Stage-1 map = 100%,  reduce = 0%
> Ended Job = job_local2046305831_0004
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to directory 
> hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-1
> Loading data to table default.tt
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 3945 HDFS Write: 4226 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 1.975 seconds
> hive> select * from tt;
> OK
> 5    OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW  Sales     80282.54
> 7    UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE  Engineer  10206.65
> 4    PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ  Director  63691.23
> 9    CTDCDYRURBZMBLNWHQNOQCYFFVULOP  Engineer  63867.44
> 10   WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK  Sales     97720.08
> Time taken: 0.056 seconds, Fetched: 5 row(s)
> {code}
> 2. Create table in HAWQ
> {code}
> CREATE TABLE public.tt
> (i int,
>   fname varchar(100),
>   title varchar(100),
>   salary float8)
> WITH (appendonly=true,orientation=parquet);
> {code}
> 3. run hawq register
> {code}
> malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f 
> hdfs://localhost:8020/user/hive/warehouse/tt tt
> 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try 
> to connect database localhost:5432 postgres
> 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New 
> file(s) to be registered: 
> ['hdfs://localhost:8020/user/hive/warehouse/tt/00_0']
> hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/00_0 
> hdfs://localhost:8020/hawq_default/16385/16387/49281/1"
> 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq 
> Register Succeed.
> {code}
> 4. select from hawq
> {code}
> postgres=# select * from tt;
>  i  |             fname              | title |  salary
> ----+--------------------------------+-------+----------
>   5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW |   | 80282.54
>   7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE |   | 10206.65
>   4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ |   | 63691.23
>   9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP |   | 63867.44
>  10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK |   | 97720.08
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet

2017-02-28 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1366:
-

 Summary: HAWQ should throw error if finding dictionary encoding 
type for Parquet
 Key: HAWQ-1366
 URL: https://issues.apache.org/jira/browse/HAWQ-1366
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Storage
Reporter: Lili Ma
Assignee: Ed Espino
 Fix For: 2.2.0.0-incubating


Since HAWQ is based on Parquet format version 1.0, which does not support 
dictionary pages, and hawq register may register Parquet format version 2.0 data 
into HAWQ, we should throw an error when finding an unsupported page type for a 
column.

Reproduce Steps:
1. In Hive, create a table and insert into 8 records:
{code}
(hive> create table tt (i int,
>   fname varchar(100),
>   title varchar(100),
>   salary double
> )
> STORED AS PARQUET;
OK
Time taken: 0.029 seconds
hive> insert into tt values (5,'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW',
'Sales',80282.54),
> (7,'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE','Engineer',10206.65),
> (4,'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ','Director',63691.23),
> (9,'CTDCDYRURBZMBLNWHQNOQCYFFVULOP','Engineer',63867.44),
> (10,'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK','Sales',97720.08);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-02-28 17:39:58,713 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local2046305831_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory 
hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-1
Loading data to table default.tt
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 3945 HDFS Write: 4226 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.975 seconds
hive> select * from tt;
OK
5    OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW  Sales     80282.54
7    UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE  Engineer  10206.65
4    PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ  Director  63691.23
9    CTDCDYRURBZMBLNWHQNOQCYFFVULOP  Engineer  63867.44
10   WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK  Sales     97720.08
Time taken: 0.056 seconds, Fetched: 5 row(s)
{code}
2. Create table in HAWQ
{code}
CREATE TABLE public.tt
(i int,
  fname varchar(100),
  title varchar(100),
  salary float8)
WITH (appendonly=true,orientation=parquet);
{code}
3. run hawq register
{code}
malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f 
hdfs://localhost:8020/user/hive/warehouse/tt tt
20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try to 
connect database localhost:5432 postgres
20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New 
file(s) to be registered: 
['hdfs://localhost:8020/user/hive/warehouse/tt/00_0']
hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/00_0 
hdfs://localhost:8020/hawq_default/16385/16387/49281/1"
20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq 
Register Succeed.
{code}
4. select from hawq
{code}
postgres=# select * from tt;
 i  |             fname              | title |  salary
----+--------------------------------+-------+----------
  5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW |   | 80282.54
  7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE |   | 10206.65
  4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ |   | 63691.23
  9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP |   | 63867.44
 10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK |   | 97720.08
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-1353) Provide template for Ranger access audit to Solr from RPS

2017-02-23 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881803#comment-15881803
 ] 

Lili Ma commented on HAWQ-1353:
---

[~adenisso] Once we send the audit log to Solr, can we see the audit information 
on the Ranger Admin UI?

> Provide template for Ranger access audit to Solr from RPS
> -
>
> Key: HAWQ-1353
> URL: https://issues.apache.org/jira/browse/HAWQ-1353
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Alexander Denissov
>Assignee: Alexander Denissov
> Fix For: backlog
>
>
> We currently ship examples of how to configure the audit log into HDFS 
> (disabled by default). Audit to HDFS is intended for long-term storage and is 
> not searchable or presentable on the Ranger Admin UI.
> To be able to see and search audit log entries, they should be sent to Solr, 
> which is the preferred way. We need to provide a set of properties that users 
> can edit in the hawq-ranger-audit.xml file to enable sending audit events to 
> Solr. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-1353) Provide template for Ranger access audit to Solr from RPS

2017-02-22 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880008#comment-15880008
 ] 

Lili Ma edited comment on HAWQ-1353 at 2/23/17 6:59 AM:


[~adenisso] Could you add more description for this JIRA? Why do you add Ranger 
access audit to Solr?  Thanks


was (Author: lilima):
[~adenisso] Could you add more description for this JIRA? Why do you add Ranger 
access audit to Solr?

> Provide template for Ranger access audit to Solr from RPS
> -
>
> Key: HAWQ-1353
> URL: https://issues.apache.org/jira/browse/HAWQ-1353
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Alexander Denissov
>Assignee: Alexander Denissov
> Fix For: backlog
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-1353) Provide template for Ranger access audit to Solr from RPS

2017-02-22 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880008#comment-15880008
 ] 

Lili Ma commented on HAWQ-1353:
---

[~adenisso] Could you add more description for this JIRA? Why do you add Ranger 
access audit to Solr?

> Provide template for Ranger access audit to Solr from RPS
> -
>
> Key: HAWQ-1353
> URL: https://issues.apache.org/jira/browse/HAWQ-1353
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Alexander Denissov
>Assignee: Alexander Denissov
> Fix For: backlog
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-1332) Can not grant database and schema privileges without table privileges in ranger or ranger plugin service

2017-02-15 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867573#comment-15867573
 ] 

Lili Ma commented on HAWQ-1332:
---

[~adenisso] Seems this is a bug in RPS. Could you help look into it? Thanks

> Can not grant database and schema privileges without table privileges in 
> ranger or ranger plugin service
> 
>
> Key: HAWQ-1332
> URL: https://issues.apache.org/jira/browse/HAWQ-1332
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Security
>Reporter: Chunling Wang
>Assignee: Alexander Denissov
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> We try to grant database CONNECT and schema USAGE privileges to a non-super 
> user so it can connect to the database. We find that if we set a policy with 
> database and schema included but table excluded, we cannot connect to the 
> database; but if we include table, we can. We think there may be a bug in the 
> Ranger Plugin Service or Ranger. Here are the steps to reproduce it.
> 1. create a new user "usertest1" in database:
> {code}
> $ psql postgres
> psql (8.2.15)
> Type "help" for help.
> postgres=# CREATE USER usertest1;
> NOTICE:  resource queue required -- using default resource queue "pg_default"
> CREATE ROLE
> postgres=#
> {code}
> 2. add user "usertest1" in pg_hba.conf
> {code}
> local all usertest1 trust
> {code}
> 3. set policy with database and schema included, with table excluded
> !screenshot-1.png|width=800,height=400!
> 4. connect database with user "usertest1" but failed with permission denied
> {code}
> $ psql postgres -U usertest1
> psql: FATAL:  permission denied for database "postgres"
> DETAIL:  User does not have CONNECT privilege.
> {code}
> 5. set policy with database, schema and table included
> !screenshot-2.png|width=800,height=400!
> 6. connect database with user "usertest1" and succeed
> {code}
> $ psql postgres -U usertest1
> psql (8.2.15)
> Type "help" for help.
> postgres=#
> {code}
> But if we do not set table to "*" and instead specify a table like "a", we 
> cannot access the database either.
> !screenshot-3.png|width=800,height=400!
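
One way to read steps 3-6 is that Ranger matches a policy only when every resource level of the incoming request (database, schema, table) is covered by the policy. A minimal sketch of such hierarchical matching (illustrative Python, not Ranger's actual Java implementation; the assumption that the CONNECT check carries all three resource levels is hypothetical):

```python
from fnmatch import fnmatch

def policy_matches(policy_resources, request):
    """A policy grants access only when every resource level named in the
    request is covered by the policy; wildcards like '*' are allowed,
    as in Ranger policy definitions."""
    for level, value in request.items():
        patterns = policy_resources.get(level)
        if not patterns or not any(fnmatch(value, p) for p in patterns):
            return False
    return True

# Policy from step 3: database and schema included, table excluded.
no_table = {"database": ["postgres"], "schema": ["public"]}
# Policy from step 5: table included as '*'.
with_table = {"database": ["postgres"], "schema": ["public"], "table": ["*"]}

# If the CONNECT check arrives with all three resource levels, the first
# policy never matches and the connection is denied.
request = {"database": "postgres", "schema": "public", "table": "t"}
print(policy_matches(no_table, request))    # False -> permission denied
print(policy_matches(with_table, request))  # True  -> connect succeeds
```

Under that reading, the observed behavior would be policy-matching semantics rather than an RPS bug, which is exactly what the investigation needs to confirm.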



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HAWQ-1332) Can not grant database and schema privileges without table privileges in ranger or ranger plugin service

2017-02-15 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma reassigned HAWQ-1332:
-

Assignee: Alexander Denissov  (was: Ed Espino)

> Can not grant database and schema privileges without table privileges in 
> ranger or ranger plugin service
> 
>
> Key: HAWQ-1332
> URL: https://issues.apache.org/jira/browse/HAWQ-1332
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Security
>Reporter: Chunling Wang
>Assignee: Alexander Denissov
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> We try to grant database CONNECT and schema USAGE privileges to a non-super 
> user so it can connect to the database. We find that if we set a policy with 
> database and schema included but table excluded, we cannot connect to the 
> database; but if we include table, we can. We think there may be a bug in the 
> Ranger Plugin Service or Ranger. Here are the steps to reproduce it.
> 1. create a new user "usertest1" in database:
> {code}
> $ psql postgres
> psql (8.2.15)
> Type "help" for help.
> postgres=# CREATE USER usertest1;
> NOTICE:  resource queue required -- using default resource queue "pg_default"
> CREATE ROLE
> postgres=#
> {code}
> 2. add user "usertest1" in pg_hba.conf
> {code}
> local all usertest1 trust
> {code}
> 3. set policy with database and schema included, with table excluded
> !screenshot-1.png|width=800,height=400!
> 4. connect database with user "usertest1" but failed with permission denied
> {code}
> $ psql postgres -U usertest1
> psql: FATAL:  permission denied for database "postgres"
> DETAIL:  User does not have CONNECT privilege.
> {code}
> 5. set policy with database, schema and table included
> !screenshot-2.png|width=800,height=400!
> 6. connect database with user "usertest1" and succeed
> {code}
> $ psql postgres -U usertest1
> psql (8.2.15)
> Type "help" for help.
> postgres=#
> {code}
> But if we do not set table to "*" and instead specify a table like "a", we 
> cannot access the database either.
> !screenshot-3.png|width=800,height=400!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger

2017-02-14 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867342#comment-15867342
 ] 

Lili Ma commented on HAWQ-256:
--

[~kdunn926] I took another look at your input.

1) Why do they want to use Ranger? What are the scenarios and use cases?
Ranger provides the missing (and very important) functionality of synchronizing 
roles and groups from an identity management provider (like LDAP) into HAWQ. 
Without this capability, roles must be provisioned manually or something like 
pg-ldap-sync must be used; neither is a very enterprise-friendly or "baked" 
solution.

Actually, I don't think Ranger provides the functionality to sync role/group 
information into HAWQ; it just syncs that information to itself. We may still 
need to manage the role information in HAWQ to allow those roles to log in. Or, 
a thorough solution would be for HAWQ not to store any user information at all, 
but we may not do that now, given there are some objects not managed by Ranger. 
Thoughts? 
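
The provisioning gap discussed here (Ranger syncs identities only into its own store, while HAWQ still needs login roles) can be illustrated with a small diff sketch; the names and the CREATE ROLE output are hypothetical, roughly what tools like pg-ldap-sync automate:

```python
def roles_to_provision(ldap_members, db_roles):
    """Return LDAP accounts that have no matching database role yet.
    Ranger syncs users/groups into its own store for policy lookup,
    but login roles must still exist in HAWQ, so something external
    has to create them."""
    return sorted(set(ldap_members) - set(db_roles))

ldap_members = ["alice", "bob", "carol"]   # from an LDAP group (assumed)
db_roles = ["bob", "gpadmin"]              # from pg_roles (assumed)

for role in roles_to_provision(ldap_members, db_roles):
    # A real tool would execute this against HAWQ: CREATE ROLE <name> LOGIN;
    print("CREATE ROLE %s LOGIN;" % role)
```
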

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger

2017-02-09 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859265#comment-15859265
 ] 

Lili Ma commented on HAWQ-256:
--

[~kdunn926] Thanks a lot! The information you provided is very helpful.

About item 9, I wonder whether it is a little strange to record the audit 
information for the catalog table/owner check on the Ranger side, given that it 
is not managed by Ranger.

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-1204) Add one option in Ambari to enable user to specify whether they want enable Ranger for ACL check

2017-01-20 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831484#comment-15831484
 ] 

Lili Ma edited comment on HAWQ-1204 at 1/20/17 9:48 AM:


We can do it by using the configurable GUC to specify Ranger as the first step.


was (Author: lilima):
We can do it by using the configurable GUC for specifying Ranger as the first 
step.

> Add one option in Ambari to enable user to specify whether they want enable 
> Ranger for ACL check
> 
>
> Key: HAWQ-1204
> URL: https://issues.apache.org/jira/browse/HAWQ-1204
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Ambari
>Reporter: Lili Ma
>Assignee: Alexander Denissov
> Fix For: backlog
>
>
> Ambari needs to make corresponding modifications to enable Ranger in HAWQ.
> It also needs to do special processing if Ranger is on. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1204) Add one option in Ambari to enable user to specify whether they want enable Ranger for ACL check

2017-01-20 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831484#comment-15831484
 ] 

Lili Ma commented on HAWQ-1204:
---

We can do it by using the configurable GUC for specifying Ranger as the first 
step.
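
The GUC-first approach can be sketched as a simple dispatch (illustrative Python with hypothetical names; HAWQ's actual GUC handling is C code, and `hawq_acl_type` is used here only as an assumed setting name):

```python
# Illustrative sketch: route ACL checks based on a GUC-style setting,
# the "configurable GUC" first step suggested in the comment.
def acl_check(guc, request, native_check, ranger_check):
    """Route the privilege check to Ranger (via RPS) or to the native
    catalog-based ACLs, depending on the configured ACL type."""
    if guc.get("hawq_acl_type") == "ranger":
        return ranger_check(request)
    return native_check(request)

native = lambda req: req["user"] == "gpadmin"   # stand-in native ACL
ranger = lambda req: req["table"] != "secret"   # stand-in RPS call

req = {"user": "usertest1", "table": "tt"}
print(acl_check({"hawq_acl_type": "standalone"}, req, native, ranger))  # False
print(acl_check({"hawq_acl_type": "ranger"}, req, native, ranger))      # True
```

An Ambari option would then just need to set that one value, rather than wiring up any new check logic itself.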

> Add one option in Ambari to enable user to specify whether they want enable 
> Ranger for ACL check
> 
>
> Key: HAWQ-1204
> URL: https://issues.apache.org/jira/browse/HAWQ-1204
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Ambari
>Reporter: Lili Ma
>Assignee: Alexander Denissov
> Fix For: backlog
>
>
> Ambari needs to make corresponding modifications to enable Ranger in HAWQ.
> It also needs to do special processing if Ranger is on. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HAWQ-1206) Process catalog table ACL on Ranger.

2017-01-20 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma resolved HAWQ-1206.
---
   Resolution: Duplicate
Fix Version/s: (was: backlog)
   2.1.0.0-incubating

> Process catalog table ACL on Ranger.
> 
>
> Key: HAWQ-1206
> URL: https://issues.apache.org/jira/browse/HAWQ-1206
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Hubert Zhang
> Fix For: 2.1.0.0-incubating
>
>
> There are a lot of catalog tables in HAWQ which also need to go through the 
> ACL check. We need to find out how to process these tables once Ranger is 
> configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1206) Process catalog table ACL on Ranger.

2017-01-20 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831469#comment-15831469
 ] 

Lili Ma commented on HAWQ-1206:
---

Closing this JIRA, since it duplicates HAWQ-1275. We currently put the catalog 
ACL check on the HAWQ side, assuming that users may require the Ranger feature 
to manage non-heap tables. 

> Process catalog table ACL on Ranger.
> 
>
> Key: HAWQ-1206
> URL: https://issues.apache.org/jira/browse/HAWQ-1206
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Hubert Zhang
> Fix For: backlog
>
>
> There are a lot of catalog tables in HAWQ which also need to go through the 
> ACL check. We need to find out how to process these tables once Ranger is 
> configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAWQ-1207) Gpadmin super user processing on ACL

2017-01-20 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831463#comment-15831463
 ] 

Lili Ma edited comment on HAWQ-1207 at 1/20/17 9:40 AM:


[~thebellhead] I split the stories given that they are from two aspects: 
catalog table and super user. 

For the superuser, HAWQ's behavior without Ranger is that the superuser has all 
privileges on HAWQ internal tables. We need to limit the superuser's ability to 
access tables created by others.

Besides this, there are many superuser-specific behaviors for some objects. 
Only the superuser has the rights for the following operations: 
1. create cast: when function is NULL
2. create filespace
3. create/remove/alter foreign-data wrapper
4. create function: For untrusted language, only superuser can create function.
5. create/drop procedural language
6. create/drop/alter resource queue
7. create tablespace: this means the privilege to create a tablespace, which 
only the superuser can do. (The CREATE privilege on a tablespace means creating 
database/table/index... in the tablespace, which is different.)
8. create external table: Only super user can create EXECUTE external web table 
or create an external table with a file protocol (but in HAWQ 2.0, the file 
protocol is not supported any more).
9. create operator class
10. copy: Only the superuser can copy to or from a file. With Ranger, the 
superuser cannot run COPY TO or COPY FROM without the SELECT or INSERT 
privilege on that table.
11. alter state of system triggers
12. some built-in functions, including pg_logdir_ls, pg_ls_dir, pg_read_file, 
pg_reload_conf, pg_rotate_logfile, pg_signal_backend, pg_start_backup, 
pg_stat_file, pg_stat_get_activity, pg_stat_get_backend_activity_start, 
pg_stat_get_backend_activity, pg_stat_get_backend_client_addr, 
pg_stat_get_backend_client_port, pg_stat_get_backend_start, 
pg_stat_get_backend_waiting, pg_stop_backup, pg_switch_xlog, pg_stat_reset

For the above operations, we'd rather keep the check on the HAWQ side if there 
are no other concerns.



was (Author: lilima):
[~thebellhead] I split the stories given that they are from two aspects: 
catalog table and super user. 

For super user, HAWQ behavior without Ranger is that superuser can have all the 
privileges upon HAWQ internal tables.  We need limit the super user behavior 
for accessing tables create by others.

Besides this, there are a lot of super user specific behaviors for some 
objects. Only superuser has the rights for following operations: 
1. create cast: when function is NULL
2. create filespace
3. create/remove/alter foreign-data wrapper
4. create function: For untrusted language, only superuser can create function.
5. create/drop procedural language
6. create/drop/alter resource queue
7. create tablespace: It means the privilege to create tablespace, and only 
superuser can do. But the CREATE privilege for tablespace means creating 
database/table/index... in tablespace, which is different.
8. create external table: Only super user can create EXECUTE external web table 
or create an external table with a file protocol (but in HAWQ 2.0, the file 
protocol is not supported any more).
9. create operator class
10. copy: Only superuser can copy to or from a file. And in ranger, the 
superuser can not run copy to or from when he doesn't have the privilege for 
that table select or insert.
11. alter state of system triggers
12. some build in functions, including pg_logdir_ls, pg_ls_dir, pg_read_file, 
pg_reload_conf, pg_rotate_logfile, pg_signal_backend, pg_start_backup, 
pg_stat_file, pg_stat_get_activity, pg_stat_get_backend_activity_start, 
pg_stat_get_backend_activity, pg_stat_get_backend_client_addr, 
pg_stat_get_backend_client_port, pg_stat_get_backend_start, 
pg_stat_get_backend_waiting, pg_stop_backup, pg_switch_xlog, pg_stat_reset

For above operations, we'd rather keep it checked in HAWQ side, if there is no 
other concerns.


> Gpadmin super user processing on ACL
> 
>
> Key: HAWQ-1207
> URL: https://issues.apache.org/jira/browse/HAWQ-1207
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Alexander Denissov
> Fix For: backlog
>
>
> Once we specify enable_ranger, we need to process gpadmin user privileges. 
> Ideally, we should also restrict gpadmin behavior, since we won't allow 
> gpadmin to have full control over all user data. 
> During system initialization, we can let gpadmin have all the privileges on 
> all the objects. We may implement this as a seed policy on the Ranger plugin 
> side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAWQ-1207) Gpadmin super user processing on ACL

2017-01-20 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831463#comment-15831463
 ] 

Lili Ma edited comment on HAWQ-1207 at 1/20/17 9:40 AM:


[~thebellhead] I split the stories given that they are from two aspects: 
catalog table and super user. 

For super user, HAWQ behavior without Ranger is that superuser can have all the 
privileges upon HAWQ internal tables.  We need limit the super user behavior 
for accessing tables create by others.

Besides this, there are a lot of super user specific behaviors for some 
objects. Only superuser has the rights for following operations: 
1. create cast: when function is NULL
2. create filespace
3. create/remove/alter foreign-data wrapper
4. create function: For untrusted language, only superuser can create function.
5. create/drop procedural language
6. create/drop/alter resource queue
7. create tablespace: It means the privilege to create tablespace, and only 
superuser can do. But the CREATE privilege for tablespace means creating 
database/table/index... in tablespace, which is different.
8. create external table: Only super user can create EXECUTE external web table 
or create an external table with a file protocol (but in HAWQ 2.0, the file 
protocol is not supported any more).
9. create operator class
10. copy: Only superuser can copy to or from a file. And in ranger, the 
superuser can not run copy to or from when he doesn't have the privilege for 
that table select or insert.
11. alter state of system triggers
12. some build in functions, including pg_logdir_ls, pg_ls_dir, pg_read_file, 
pg_reload_conf, pg_rotate_logfile, pg_signal_backend, pg_start_backup, 
pg_stat_file, pg_stat_get_activity, pg_stat_get_backend_activity_start, 
pg_stat_get_backend_activity, pg_stat_get_backend_client_addr, 
pg_stat_get_backend_client_port, pg_stat_get_backend_start, 
pg_stat_get_backend_waiting, pg_stop_backup, pg_switch_xlog, pg_stat_reset

For above operations, we'd rather keep it checked in HAWQ side, if there is no 
other concerns.



was (Author: lilima):
[~thebellhead] I split the stories given that they are from two aspects: 
catalog table and super user. 

For super user, HAWQ behavior without Ranger is that superuser can have all the 
privileges upon HAWQ internal tables.  We need limit the super user behavior 
for accessing tables create by others.

Besides this, there are a lot of super user specific behaviors for some 
objects. Only superuser can have the right for following behavior: 
1. create cast: when function is NULL
2. create filespace
3. create/remove/alter foreign-data wrapper
4. create function: For untrusted language, only superuser can create function.
5. create/drop procedural language
6. create/drop/alter resource queue
7. create tablespace: It means the privilege to create tablespace, and only 
superuser can do. But the CREATE privilege for tablespace means creating 
database/table/index... in tablespace, which is different.
8. create external table: Only super user can create EXECUTE external web table 
or create an external table with a file protocol (but in HAWQ 2.0, the file 
protocol is not supported any more).
9. create operator class
10. copy: Only superuser can copy to or from a file. And in ranger, the 
superuser can not run copy to or from when he doesn't have the privilege for 
that table select or insert.
11. alter state of system triggers
12. some build in functions, including pg_logdir_ls, pg_ls_dir, pg_read_file, 
pg_reload_conf, pg_rotate_logfile, pg_signal_backend, pg_start_backup, 
pg_stat_file, pg_stat_get_activity, pg_stat_get_backend_activity_start, 
pg_stat_get_backend_activity, pg_stat_get_backend_client_addr, 
pg_stat_get_backend_client_port, pg_stat_get_backend_start, 
pg_stat_get_backend_waiting, pg_stop_backup, pg_switch_xlog, pg_stat_reset

For above operations, we'd rather keep it checked in HAWQ side, if there is no 
other concerns.


> Gpadmin super user processing on ACL
> 
>
> Key: HAWQ-1207
> URL: https://issues.apache.org/jira/browse/HAWQ-1207
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Alexander Denissov
> Fix For: backlog
>
>
> Once we specify enable_ranger, we need to process gpadmin user privileges. 
> Ideally, we should also restrict gpadmin behavior, since we won't allow 
> gpadmin to have full control over all user data. 
> During system initialization, we can let gpadmin have all the privileges on 
> all the objects. We may implement this as a seed policy on the Ranger plugin 
> side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1207) Gpadmin super user processing on ACL

2017-01-20 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831463#comment-15831463
 ] 

Lili Ma commented on HAWQ-1207:
---

[~thebellhead] I split the stories given that they are from two aspects: 
catalog table and super user. 

For super user, HAWQ behavior without Ranger is that superuser can have all the 
privileges upon HAWQ internal tables.  We need limit the super user behavior 
for accessing tables create by others.

Besides this, there are many superuser-specific behaviors for some 
objects. Only the superuser has the right to perform the following: 
1. create cast: when the function is NULL
2. create filespace
3. create/remove/alter foreign-data wrapper
4. create function: for an untrusted language, only the superuser can create a function.
5. create/drop procedural language
6. create/drop/alter resource queue
7. create tablespace: this means the privilege to create a tablespace, which only 
the superuser holds. The CREATE privilege on a tablespace, by contrast, means creating 
databases/tables/indexes... in that tablespace, which is different.
8. create external table: only the superuser can create an EXECUTE external web table 
or an external table with a file protocol (though in HAWQ 2.0 the file 
protocol is no longer supported).
9. create operator class
10. copy: only the superuser can COPY to or from a file. And with Ranger, even the 
superuser cannot run COPY TO or COPY FROM when he doesn't have the SELECT or INSERT 
privilege on that table.
11. alter the state of system triggers
12. some built-in functions, including pg_logdir_ls, pg_ls_dir, pg_read_file, 
pg_reload_conf, pg_rotate_logfile, pg_signal_backend, pg_start_backup, 
pg_stat_file, pg_stat_get_activity, pg_stat_get_backend_activity_start, 
pg_stat_get_backend_activity, pg_stat_get_backend_client_addr, 
pg_stat_get_backend_client_port, pg_stat_get_backend_start, 
pg_stat_get_backend_waiting, pg_stop_backup, pg_switch_xlog, pg_stat_reset

For the above operations, we'd rather keep them checked on the HAWQ side, if there are no 
other concerns.
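The split above amounts to a routing decision per operation. A minimal sketch in C, with a hypothetical `choose_acl_path` helper and an abridged, illustrative list of superuser-only operations — not the actual HAWQ implementation:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: superuser-only operations from the list above stay on
 * the native HAWQ check; everything else is delegated to Ranger. The names
 * and the (abridged) operation list are illustrative, not HAWQ's real code. */
typedef enum { CHECK_NATIVE, CHECK_RANGER } acl_check_path;

static const char *superuser_only_ops[] = {
    "create filespace", "create tablespace", "create operator class",
    "create resource queue", "alter system triggers", NULL
};

acl_check_path choose_acl_path(const char *op) {
    for (int i = 0; superuser_only_ops[i] != NULL; i++)
        if (strcmp(op, superuser_only_ops[i]) == 0)
            return CHECK_NATIVE;   /* keep checked on the HAWQ side */
    return CHECK_RANGER;           /* route to the Ranger plugin service */
}
```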


> Gpadmin super user processing on ACL
> 
>
> Key: HAWQ-1207
> URL: https://issues.apache.org/jira/browse/HAWQ-1207
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Alexander Denissov
> Fix For: backlog
>
>
> Once we specify enable_ranger, we need to process gpadmin user privileges. 
> Ideally, we should also restrict gpadmin behavior, since we won't allow 
> gpadmin to have full control of all user data. 
> During the init system period, we can let gpadmin have all the privileges on 
> all the objects. This may be implemented as a seed policy on the Ranger plugin side.





[jira] [Resolved] (HAWQ-1275) Check build-in catalogs, tables and functions in native aclcheck.

2017-01-20 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma resolved HAWQ-1275.
---
   Resolution: Fixed
Fix Version/s: (was: backlog)
   2.2.0.0-incubating

> Check build-in catalogs, tables and functions in native aclcheck.
> -
>
> Key: HAWQ-1275
> URL: https://issues.apache.org/jira/browse/HAWQ-1275
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: 2.2.0.0-incubating
>
>
> We plan to do the privilege check on the HAWQ side for built-in catalogs, tables, 
> and functions. The reasons are twofold:
> 1 Ranger mainly manages user data, but built-in catalogs and tables are 
> not related to user data (note that some of them contain statistics 
> about user data, such as the catalog table pg_aoseg_*).
> 2 We haven't finished merging all the privilege check requests 
> into one big request. Without it, queries such as "\d" and "analyze" will lead 
> to hundreds of RPS requests.





[jira] [Updated] (HAWQ-256) Integrate Security with Apache Ranger

2017-01-18 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-256:
-
Component/s: (was: PXF)

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 





[jira] [Commented] (HAWQ-1275) Check build-in catalogs, tables and functions in native aclcheck.

2017-01-17 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825715#comment-15825715
 ] 

Lili Ma commented on HAWQ-1275:
---

[~hubertzhang] I think this has already been finished. Please close this JIRA if 
you have finished it. Thanks

> Check build-in catalogs, tables and functions in native aclcheck.
> -
>
> Key: HAWQ-1275
> URL: https://issues.apache.org/jira/browse/HAWQ-1275
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: backlog
>
>
> We plan to do the privilege check on the HAWQ side for built-in catalogs, tables, 
> and functions. The reasons are twofold:
> 1 Ranger mainly manages user data, but built-in catalogs and tables are 
> not related to user data (note that some of them contain statistics 
> about user data, such as the catalog table pg_aoseg_*).
> 2 We haven't finished merging all the privilege check requests 
> into one big request. Without it, queries such as "\d" and "analyze" will lead 
> to hundreds of RPS requests.





[jira] [Created] (HAWQ-1257) If user doesn't have privileges on certain objects, need return user which specific table he doesn't have right.

2017-01-04 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1257:
-

 Summary: If user doesn't have privileges on certain objects, need 
return user which specific table he doesn't have right. 
 Key: HAWQ-1257
 URL: https://issues.apache.org/jira/browse/HAWQ-1257
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Security
Reporter: Lili Ma
Assignee: Ed Espino
 Fix For: 2.2.0.0-incubating


If a user doesn't have privileges on certain objects, we need to return all the 
objects the user lacks rights on, to avoid the user fixing one privilege only to 
find another privilege constraint, and then another... which may bother the 
user a lot.

For example:
The user doesn't have rights on t1 or t2.
{code}
postgres=> select * from test_sa.t1 left join test_sa.t2 on 
test_sa.t1.i=test_sa.t2.i;
ERROR:  permission denied for relation t1
{code}
We wish to also report that the user lacks rights on t2.






[jira] [Updated] (HAWQ-1257) If user doesn't have privileges on certain objects, need return user which specific table he doesn't have right.

2017-01-04 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1257:
--
Assignee: Hongxu Ma  (was: Ed Espino)

> If user doesn't have privileges on certain objects, need return user which 
> specific table he doesn't have right. 
> -
>
> Key: HAWQ-1257
> URL: https://issues.apache.org/jira/browse/HAWQ-1257
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Hongxu Ma
> Fix For: 2.2.0.0-incubating
>
>
> If a user doesn't have privileges on certain objects, we need to return all the 
> objects the user lacks rights on, to avoid the user fixing one privilege only 
> to find another privilege constraint, and then another... which may bother 
> the user a lot.
> For example:
> The user doesn't have rights on t1 or t2.
> {code}
> postgres=> select * from test_sa.t1 left join test_sa.t2 on 
> test_sa.t1.i=test_sa.t2.i;
> ERROR:  permission denied for relation t1
> {code}
> We wish to also report that the user lacks rights on t2.





[jira] [Updated] (HAWQ-1003) Implement batched ACL check through Ranger.

2017-01-04 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1003:
--
Assignee: Hubert Zhang  (was: hongwu)

> Implement batched ACL check through Ranger.
> ---
>
> Key: HAWQ-1003
> URL: https://issues.apache.org/jira/browse/HAWQ-1003
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Lili Ma
>Assignee: Hubert Zhang
> Fix For: backlog
>
>
> Implement an enhanced HAWQ ACL check through Ranger, which means that if a query 
> contains several tables, we can combine the multiple table requests together 
> and send just one REST request to the Ranger REST API server.





[jira] [Updated] (HAWQ-1256) Enhance libcurl connection to RPS(Ranger Plugin Service), keep it as a long-live connection in session level

2017-01-04 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1256:
--
Assignee: Xiang Sheng  (was: Ed Espino)

> Enhance libcurl connection to RPS(Ranger Plugin Service), keep it as a 
> long-live connection in session level
> 
>
> Key: HAWQ-1256
> URL: https://issues.apache.org/jira/browse/HAWQ-1256
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Xiang Sheng
> Fix For: 2.2.0.0-incubating
>
>
> The current implementation of the RESTful API call uses a local libcurl 
> handle, which means that for every RESTful call this handle is initialized 
> and used, and after the call it is finalized.
> Establishing the connection consumes extra time; we can reduce this by keeping 
> the libcurl connection long-lived.
> A better way is to make this libcurl context a global structure: initialize 
> it once before the QD calls the RESTful API, and finalize it before the QD 
> exits.





[jira] [Created] (HAWQ-1256) Enhance libcurl connection to RPS(Ranger Plugin Service), keep it as a long-live connection in session level

2017-01-04 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1256:
-

 Summary: Enhance libcurl connection to RPS(Ranger Plugin Service), 
keep it as a long-live connection in session level
 Key: HAWQ-1256
 URL: https://issues.apache.org/jira/browse/HAWQ-1256
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Security
Reporter: Lili Ma
Assignee: Ed Espino
 Fix For: 2.2.0.0-incubating


The current implementation of the RESTful API call uses a local libcurl handle, 
which means that for every RESTful call this handle is initialized and used, 
and after the call it is finalized.

Establishing the connection consumes extra time; we can reduce this by keeping 
the libcurl connection long-lived.

A better way is to make this libcurl context a global structure: initialize it 
once before the QD calls the RESTful API, and finalize it before the QD exits.
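The proposal above amounts to an init-once, reuse, finalize-at-exit lifetime. A minimal sketch of that pattern, with a plain struct standing in for the libcurl handle so the lifetime logic is visible without linking libcurl; the function names are illustrative, and in HAWQ the handle would wrap curl_easy_init()/curl_easy_cleanup():

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the libcurl easy handle; the real code would hold a
 * CURL * obtained from curl_easy_init(). */
typedef struct { int initialized; } rps_conn;

static rps_conn *session_conn = NULL;   /* one connection per QD session */

rps_conn *rps_conn_get(void) {
    if (session_conn == NULL) {         /* lazily initialize exactly once */
        session_conn = malloc(sizeof(rps_conn));
        session_conn->initialized = 1;  /* would call curl_easy_init() here */
    }
    return session_conn;                /* every RPS call reuses this handle */
}

void rps_conn_destroy(void) {           /* called once, before the QD exits */
    free(session_conn);                 /* would call curl_easy_cleanup() */
    session_conn = NULL;
}
```

Because libcurl keeps the underlying TCP connection alive inside a reused easy handle, calling `rps_conn_get()` on every RPS request avoids repeated connection setup.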





[jira] [Commented] (HAWQ-1003) Implement batched ACL check through Ranger.

2017-01-03 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794682#comment-15794682
 ] 

Lili Ma commented on HAWQ-1003:
---

A query is usually composed of multiple ACL check requests; for example, an 
insert into an empty table generates a series of queries for analyze. If we 
can assemble all the requests into one, we will reduce the load placed on the 
Ranger ACL check.
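The shape such a batched request might take — a single "access" array with one entry per table, instead of one REST call per table — can be sketched as below. The field names follow the JSON request format logged in HAWQ-1246; the helper itself is hypothetical, and it uses snprintf to stay self-contained where the real plugin uses json-c objects:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: combine all tables touched by one query into a single
 * JSON request body, so Ranger receives one REST request instead of many. */
int build_batched_request(char *buf, size_t len, const char *user,
                          const char *tables[], int ntables) {
    int off = snprintf(buf, len, "{ \"user\": \"%s\", \"access\": [", user);
    for (int i = 0; i < ntables; i++)
        off += snprintf(buf + off, len - off,
                        "%s{ \"resource\": { \"table\": \"%s\" }, "
                        "\"privileges\": [ \"SELECT\" ] }",
                        i ? ", " : " ", tables[i]);
    off += snprintf(buf + off, len - off, " ] }");
    return off;   /* length of the assembled request body */
}
```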


> Implement batched ACL check through Ranger.
> ---
>
> Key: HAWQ-1003
> URL: https://issues.apache.org/jira/browse/HAWQ-1003
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Lili Ma
>Assignee: hongwu
> Fix For: backlog
>
>
> Implement an enhanced HAWQ ACL check through Ranger, which means that if a query 
> contains several tables, we can combine the multiple table requests together 
> and send just one REST request to the Ranger REST API server.





[jira] [Updated] (HAWQ-1246) Add generation of RequestID, ClientIP, queryContext(SQL Statement) in HAWQ , and encapsulate these contents to JSON request to RPS

2017-01-03 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1246:
--
Issue Type: Sub-task  (was: Bug)
Parent: HAWQ-256

> Add generation of RequestID, ClientIP, queryContext(SQL Statement) in HAWQ , 
> and encapsulate these contents to JSON request to RPS
> --
>
> Key: HAWQ-1246
> URL: https://issues.apache.org/jira/browse/HAWQ-1246
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Xiang Sheng
>Assignee: Xiang Sheng
> Fix For: 2.2.0.0-incubating
>
>
> This information should be generated and encapsulated into the full JSON 
> request. 
> Currently these values are hardcoded.
> {code}
> json_object *jreqid = json_object_new_string("1");
> json_object_object_add(jrequest, "requestId", jreqid);
> json_object *jclientip = json_object_new_string("123.0.0.21");
> json_object_object_add(jrequest, "clientIp", jclientip);
> json_object *jcontext = json_object_new_string("SELECT * FROM DDD");
> json_object_object_add(jrequest, "context", jcontext);
> {code}





[jira] [Commented] (HAWQ-1246) Add generation of RequestID, ClientIP, queryContext(SQL Statement) in HAWQ , and encapsulate these contents to JSON request to RPS

2017-01-03 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794657#comment-15794657
 ] 

Lili Ma commented on HAWQ-1246:
---

The current Request ID is managed at the session level, and we now print it out 
in the HAWQ master log.
```
2017-01-03 17:38:34.775632 
CST,"malili","postgres",p54608,th2056364032,"[local]",,2017-01-03 17:38:30 
CST,3586,con14,cmd2,seg-1,,,x3586,sx1,"LOG","0","Send JSON request to 
Ranger: { ""requestId"": ""8"", ""user"": ""malili"", ""clientIp"": 
""127.0.0.1"", ""context"": ""SELECT d.datname as \""Name\"",\n   
pg_catalog.pg_get_userbyid(d.datdba) as \""Owner\"",\n   
pg_catalog.pg_encoding_to_char(d.encoding) as \""Encoding\"",\n   
pg_catalog.array_to_string(d.datacl, E'\\n') AS \""Access privileges\""\nFROM 
pg_catalog.pg_database d\nWHERE d.datname <> 'hcatalog'\nORDER BY 1;"", 
""access"": [ { ""resource"": { ""database"": ""postgres"", ""schema"": 
""pg_catalog"", ""function"": ""pg_encoding_to_char"" }, ""privileges"": [ 
""EXECUTE"" ] } ] }",,"SELECT d.datname as ""Name"",
   pg_catalog.pg_get_userbyid(d.datdba) as ""Owner"",
   pg_catalog.pg_encoding_to_char(d.encoding) as ""Encoding"",
   pg_catalog.array_to_string(d.datacl, E'\n') AS ""Access privileges""
FROM pg_catalog.pg_database d
WHERE d.datname <> 'hcatalog'
ORDER BY 1;",0,,"rangerrest.c",391,
```

Note that it's session-level.  We can use this information to determine what a 
query is composed of.
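A session-level request id of this kind reduces to a per-session counter bumped on every RPS request; a minimal sketch under that assumption (illustrative names, not the actual HAWQ code):

```c
#include <assert.h>

/* One counter per QD session: each ACL request to RPS gets the next id, so
 * all checks raised by a single session can be correlated in the master log. */
static int session_request_id = 0;

int next_request_id(void) {
    return ++session_request_id;   /* becomes the "requestId" JSON field */
}
```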

> Add generation of RequestID, ClientIP, queryContext(SQL Statement) in HAWQ , 
> and encapsulate these contents to JSON request to RPS
> --
>
> Key: HAWQ-1246
> URL: https://issues.apache.org/jira/browse/HAWQ-1246
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Security
>Reporter: Xiang Sheng
>Assignee: Xiang Sheng
> Fix For: 2.2.0.0-incubating
>
>
> This information should be generated and encapsulated into the full JSON 
> request. 
> Currently these values are hardcoded.
> {code}
> json_object *jreqid = json_object_new_string("1");
> json_object_object_add(jrequest, "requestId", jreqid);
> json_object *jclientip = json_object_new_string("123.0.0.21");
> json_object_object_add(jrequest, "clientIp", jclientip);
> json_object *jcontext = json_object_new_string("SELECT * FROM DDD");
> json_object_object_add(jrequest, "context", jcontext);
> {code}





[jira] [Commented] (HAWQ-1220) Support ranger plugin server HA in hawq side.

2016-12-13 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747574#comment-15747574
 ] 

Lili Ma commented on HAWQ-1220:
---

I think RPS (Ranger Plugin Service) is independent of the master/standby master, 
which means the HAWQ master may fail while the RPS on the HAWQ master stays 
alive, and, on the other side, the HAWQ standby master may fail while the RPS 
on the HAWQ standby master stays alive, right?

In the current implementation, we just include starting RPS in "hawq start master" 
and "hawq start standby". But whether it is the HAWQ master or the HAWQ standby 
master, it will check the RPS service on the same host; if it finds a failure, it 
will try to connect to the other RPS, right?

Thanks
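The check-local-first, then fail over behavior described here can be sketched as follows; the probe callback stands in for a real health check against the RPS REST endpoint, and all names are illustrative assumptions, not the actual HAWQ code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical failover sketch: prefer the RPS co-located with this node,
 * fall back to the RPS on the other master host, give up if neither answers. */
typedef int (*rps_probe_fn)(const char *host);

const char *pick_rps(const char *local_host, const char *remote_host,
                     rps_probe_fn alive) {
    if (alive(local_host))
        return local_host;    /* RPS on the same host is healthy */
    if (alive(remote_host))
        return remote_host;   /* fail over to the other master's RPS */
    return NULL;              /* no RPS reachable: the ACL check must error out */
}

/* toy probes, for demonstration only */
int probe_up(const char *host)   { (void)host; return 1; }
int probe_down(const char *host) { (void)host; return 0; }
```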

> Support ranger plugin server HA in hawq side.
> -
>
> Key: HAWQ-1220
> URL: https://issues.apache.org/jira/browse/HAWQ-1220
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: backlog
>
>
> RPS will run both at master and at standby master, If connection to master 
> RPS failed, we should try to connect to standby master instead.





[jira] [Updated] (HAWQ-1205) Change hawq start script once finding enable_ranger GUC is on.

2016-12-07 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1205:
--
Assignee: Lili Ma  (was: Lei Chang)

> Change hawq start script once finding enable_ranger GUC is on.
> --
>
> Key: HAWQ-1205
> URL: https://issues.apache.org/jira/browse/HAWQ-1205
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: PXF, Security
>Reporter: Lili Ma
>Assignee: Lili Ma
> Fix For: backlog
>
>
> If hawq start finds enable_ranger GUC is on, it needs to start RPS service.





[jira] [Created] (HAWQ-1207) Gpadmin super user processing on ACL

2016-12-07 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1207:
-

 Summary: Gpadmin super user processing on ACL
 Key: HAWQ-1207
 URL: https://issues.apache.org/jira/browse/HAWQ-1207
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Security
Reporter: Lili Ma
Assignee: Lei Chang


Once we specify enable_ranger, we need to process gpadmin user privileges. 

Ideally, we should also restrict gpadmin behavior, since we won't allow gpadmin 
to have full control of all user data. 

During the init system period, we can let gpadmin have all the privileges on all 
the objects. This may be implemented as a seed policy on the Ranger plugin side.





[jira] [Updated] (HAWQ-1206) Process catalog table ACL on Ranger

2016-12-07 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1206:
--
Assignee: Lin Wen  (was: Lei Chang)

> Process catalog table ACL on Ranger
> ---
>
> Key: HAWQ-1206
> URL: https://issues.apache.org/jira/browse/HAWQ-1206
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Lin Wen
> Fix For: backlog
>
>
> There are a lot of catalog tables in HAWQ which also need to go through the ACL 
> check. We need to find out how to process these tables once Ranger is configured.





[jira] [Updated] (HAWQ-1207) Gpadmin super user processing on ACL

2016-12-07 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1207:
--
Assignee: Alexander Denissov  (was: Lei Chang)

> Gpadmin super user processing on ACL
> 
>
> Key: HAWQ-1207
> URL: https://issues.apache.org/jira/browse/HAWQ-1207
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Alexander Denissov
> Fix For: backlog
>
>
> Once we specify enable_ranger, we need to process gpadmin user privileges. 
> Ideally, we should also restrict gpadmin behavior, since we won't allow 
> gpadmin to have full control of all user data. 
> During the init system period, we can let gpadmin have all the privileges on 
> all the objects. This may be implemented as a seed policy on the Ranger plugin side.





[jira] [Created] (HAWQ-1206) Process catalog table ACL on Ranger

2016-12-07 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1206:
-

 Summary: Process catalog table ACL on Ranger
 Key: HAWQ-1206
 URL: https://issues.apache.org/jira/browse/HAWQ-1206
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Security
Reporter: Lili Ma
Assignee: Lei Chang


There are a lot of catalog tables in HAWQ which also need to go through the ACL 
check. We need to find out how to process these tables once Ranger is configured.





[jira] [Commented] (HAWQ-1001) Implement HAWQ basic user ACL check through Ranger

2016-12-07 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731268#comment-15731268
 ] 

Lili Ma commented on HAWQ-1001:
---

At this time, the HAWQ ACL check should integrate with the Ranger Plugin Service 
(RPS) to establish a first end-to-end cycle. 

> Implement HAWQ basic user ACL check through Ranger
> --
>
> Key: HAWQ-1001
> URL: https://issues.apache.org/jira/browse/HAWQ-1001
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Lili Ma
>Assignee: Hubert Zhang
> Fix For: backlog
>
>
> When a user runs a query, HAWQ can connect to Ranger to judge whether the 
> user has the privilege to do so. 
> For each object with a unique OID, send one request to Ranger.





[jira] [Created] (HAWQ-1205) Change hawq start script once finding enable_ranger GUC is on.

2016-12-07 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1205:
-

 Summary: Change hawq start script once finding enable_ranger GUC 
is on.
 Key: HAWQ-1205
 URL: https://issues.apache.org/jira/browse/HAWQ-1205
 Project: Apache HAWQ
  Issue Type: Sub-task
Reporter: Lili Ma
Assignee: Lei Chang


If hawq start finds enable_ranger GUC is on, it needs to start RPS service.





[jira] [Commented] (HAWQ-1204) Add one option in Ambari to enable user to specify whether they want enable Ranger for ACL check

2016-12-07 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731262#comment-15731262
 ] 

Lili Ma commented on HAWQ-1204:
---

The current life cycle for enabling Ranger processing with Ambari is as follows:
1.  use the script to register the JAR / JSON / policies with Ranger (manual, as 
Ambari cannot ssh into the Ranger host to upload the JAR there)
2.  define additional policies in the Ranger UI if needed; Ranger can talk to HAWQ 
as HAWQ is already up
3.  change the GUC in hawq_site.xml (via Ambari)
4.  restart HAWQ (via Ambari)
5.  upon restart, the 'hawq start' command will detect the GUC setting and start 
up RPS before starting the hawq binary.

> Add one option in Ambari to enable user to specify whether they want enable 
> Ranger for ACL check
> 
>
> Key: HAWQ-1204
> URL: https://issues.apache.org/jira/browse/HAWQ-1204
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Ambari
>Reporter: Lili Ma
>Assignee: Alexander Denissov
> Fix For: backlog
>
>
> Ambari needs to make corresponding modifications to enable Ranger in HAWQ.
> It also needs to do special processing if Ranger is on. 





[jira] [Assigned] (HAWQ-256) Integrate Security with Apache Ranger

2016-12-07 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma reassigned HAWQ-256:


Assignee: Lili Ma  (was: Alexander Denissov)

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 





[jira] [Updated] (HAWQ-256) Integrate Security with Apache Ranger

2016-12-07 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-256:
-
Assignee: Alexander Denissov  (was: Lili Ma)

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Alexander Denissov
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 





[jira] [Created] (HAWQ-1204) Add one option in Ambari to enable user to specify whether they want enable Ranger for ACL check

2016-12-07 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1204:
-

 Summary: Add one option in Ambari to enable user to specify 
whether they want enable Ranger for ACL check
 Key: HAWQ-1204
 URL: https://issues.apache.org/jira/browse/HAWQ-1204
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Ambari
Reporter: Lili Ma
Assignee: Alexander Denissov


Ambari needs to make corresponding modifications to enable Ranger in HAWQ.
It also needs to do special processing if Ranger is on. 





[jira] [Updated] (HAWQ-1203) Implement Ranger Plugin Service which holds HAWQ Ranger Plugin and provide REST Service

2016-12-07 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1203:
--
Assignee: Alexander Denissov  (was: Lei Chang)

> Implement Ranger Plugin Service which holds HAWQ Ranger Plugin and provide 
> REST Service
> ---
>
> Key: HAWQ-1203
> URL: https://issues.apache.org/jira/browse/HAWQ-1203
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Lili Ma
>Assignee: Alexander Denissov
> Fix For: backlog
>
>
> Per the design, we want to create a separate RPS service which hosts the HAWQ 
> Ranger plugin, is in charge of handling HAWQ ACL requests, and periodically 
> fetches policies from the Ranger server.





[jira] [Created] (HAWQ-1203) Implement Ranger Plugin Service which holds HAWQ Ranger Plugin and provide REST Service

2016-12-07 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1203:
-

 Summary: Implement Ranger Plugin Service which holds HAWQ Ranger 
Plugin and provide REST Service
 Key: HAWQ-1203
 URL: https://issues.apache.org/jira/browse/HAWQ-1203
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Security
Reporter: Lili Ma
Assignee: Lei Chang


Per the design, we want to create a separate RPS service which hosts the HAWQ 
Ranger plugin, is in charge of handling HAWQ ACL requests, and periodically 
fetches policies from the Ranger server.





[jira] [Comment Edited] (HAWQ-1171) Support upgrade for hawq register.

2016-11-30 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710623#comment-15710623
 ] 

Lili Ma edited comment on HAWQ-1171 at 12/1/16 2:36 AM:


It aims to provide multiple upgrade functions. But for an upgrade from HAWQ 2.0.0 
to the current version, we only need to upgrade the hawq register part.  For future 
releases we may need to upgrade other parts too, so we keep this script name. 


was (Author: lilima):
It aims to provide multiple update functions. But for upgrade from HAWQ 2.0.X 
to 2.1.0, we only need to upgrade hawq register part.  For future releases, we 
may need upgrade other parts too, so we keep this script name. 

> Support upgrade for hawq register.
> --
>
> Key: HAWQ-1171
> URL: https://issues.apache.org/jira/browse/HAWQ-1171
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: Core
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: 2.0.1.0-incubating
>
>
> For the hawq register feature, we need to add some built-in functions to support 
> some catalog changes. This could be done by a hawqupgrade script.
> User interface:
> Hawq upgrade.





[jira] [Updated] (HAWQ-1145) After registering a partition table, if we want to insert some data into the table, it fails.

2016-11-20 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1145:
--
Attachment: dists.dss
dbgen

[~xunzhang] you can use these two files to generate TPC-H data. The way to 
generate lineitem_1g is 
dbgen -b dists.dss -s 1 -T L > lineitem_1g

> After registering a partition table, if we want to insert some data into the 
> table, it fails.
> -
>
> Key: HAWQ-1145
> URL: https://issues.apache.org/jira/browse/HAWQ-1145
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: Lili Ma
>Assignee: Hubert Zhang
> Fix For: 2.0.1.0-incubating
>
> Attachments: dbgen, dists.dss
>
>
> Reproduce Steps:
> 1. Create a partition table
> {code}
> CREATE TABLE parquet_LINEITEM_uncompressed(
>  L_ORDERKEY INT8,
>  L_PARTKEY BIGINT,
>  L_SUPPKEY BIGINT,
>  L_LINENUMBER BIGINT,
>  L_QUANTITY decimal,
>  L_EXTENDEDPRICE decimal,
>  L_DISCOUNT decimal,
>  L_TAX decimal,
>  L_RETURNFLAG CHAR(1),
>  L_LINESTATUS CHAR(1),
>  L_SHIPDATE date,

[jira] [Updated] (HAWQ-1113) In force mode, hawq register error when files in yaml is disordered

2016-11-10 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1113:
--
Assignee: Chunling Wang  (was: Lei Chang)

> In force mode, hawq register error when files in yaml is disordered
> ---
>
> Key: HAWQ-1113
> URL: https://issues.apache.org/jira/browse/HAWQ-1113
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: Chunling Wang
>Assignee: Chunling Wang
>
> In force mode, hawq register errors out when the files in the yaml are 
> disordered. For example, the file order in the yaml is as follows:
> {code}
>   Files:
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/2
> size: 250
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/4
> size: 250
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/5
> size: 258
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/6
> size: 270
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/3
> size: 258
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW2@/1
> size: 228
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/2
> size: 215
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/3
> size: 215
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/4
> size: 220
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/1
> size: 254
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/6
> size: 215
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/5
> size: 210
> {code}
> After hawq register succeeds, we select data from the table and get the error:
> {code}
> ERROR:  hdfs file length does not equal to metadata logic length! 
> (cdbdatalocality.c:1102)
> {code}





[jira] [Updated] (HAWQ-1113) In force mode, hawq register error when files in yaml is disordered

2016-11-10 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1113:
--
Affects Version/s: 2.0.1.0-incubating

> In force mode, hawq register error when files in yaml is disordered
> ---
>
> Key: HAWQ-1113
> URL: https://issues.apache.org/jira/browse/HAWQ-1113
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: Chunling Wang
>Assignee: Chunling Wang
>
> In force mode, hawq register fails when the files in the YAML file are listed 
> out of order. For example, the file order in the YAML file is as follows:
> {code}
>   Files:
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/2
> size: 250
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/4
> size: 250
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/5
> size: 258
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/6
> size: 270
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/3
> size: 258
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW2@/1
> size: 228
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/2
> size: 215
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/3
> size: 215
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/4
> size: 220
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/1
> size: 254
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/6
> size: 215
>   - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/5
> size: 210
> {code}
> After hawq register succeeds, selecting data from the table returns the error:
> {code}
> ERROR:  hdfs file length does not equal to metadata logic length! 
> (cdbdatalocality.c:1102)
> {code}





[jira] [Resolved] (HAWQ-1035) support partition table register

2016-11-10 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma resolved HAWQ-1035.
---
Resolution: Fixed

> support partition table register
> 
>
> Key: HAWQ-1035
> URL: https://issues.apache.org/jira/browse/HAWQ-1035
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Chunling Wang
> Fix For: 2.0.1.0-incubating
>
>
> Support registering partition tables, limited to 1-level partition tables, 
> since hawq extract only supports 1-level partition tables.
> Expected behavior:
> 1. Create a partition table in HAWQ, then extract its information to a .yml 
> file.
> 2. Call hawq register with the extracted .yml file and a new table name; the 
> files should be registered into the new table.
> The work to implement partition table register breaks down as follows:
> 1. Modify the .yml configuration file parsing function to add content for 
> partition tables.
> 2. Construct the partition table DDL according to the .yml configuration file.
> 3. Map each subpartition table name to the table list in the .yml configuration file.
> 4. Register the subpartition tables one by one.
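Step 3 above could look roughly like the following sketch (hypothetical helper; it assumes HAWQ's usual `<parent>_1_prt_<n>` child-table naming and a parsed YAML table list — neither is taken from the actual hawq register code):

```python
def map_subpartitions(parent, yaml_tables):
    """Map the parent table and each of its subpartition tables to the
    file list recorded for them in the extracted .yml configuration.

    `yaml_tables`: list of dicts like {"name": ..., "files": [...]},
    a simplified stand-in for the per-table sections of the .yml file.
    """
    prefix = parent + "_1_prt_"  # assumed child-table naming convention
    return {
        t["name"]: t.get("files", [])
        for t in yaml_tables
        if t["name"] == parent or t["name"].startswith(prefix)
    }
```

With the mapping in hand, step 4 would iterate over it and register each subpartition's files in turn.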





[jira] [Updated] (HAWQ-991) "HAWQ register" could register tables according to .yml configuration file

2016-11-10 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-991:
-
Component/s: (was: External Tables)

> "HAWQ register" could register tables according to .yml configuration file
> --
>
> Key: HAWQ-991
> URL: https://issues.apache.org/jira/browse/HAWQ-991
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> Scenario: 
> 1. Cluster disaster recovery: two clusters co-exist, and data is periodically 
> imported from Cluster A to Cluster B. The data needs to be registered into 
> Cluster B.
> 2. Table rollback: take checkpoints somewhere, and roll back to a previous 
> checkpoint. 
> Description:
> Register files according to a .yml configuration file. 
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c 
> config] [--force][--repair]  
> Behaviors:
> 1. If the table doesn't exist, automatically create the table and register the 
> files in the .yml configuration file, using the filesize specified in the .yml 
> file to update the catalog table. 
> 2. If the table already exists and neither --force nor --repair is specified, 
> do not create any table; directly register the files specified in the .yml 
> file to the table. Note that if a file is already under the table directory in 
> HDFS, an error is thrown saying that to-be-registered files should not be 
> under the table path.
> 3. If the table already exists and --force is specified, clear all the catalog 
> contents in pg_aoseg.pg_paqseg_$relid while keeping the files on HDFS, then 
> re-register all the files to the table. This is for scenario 2.
> 4. If the table already exists and --repair is specified, change both the file 
> folder and the catalog table pg_aoseg.pg_paqseg_$relid to the state the .yml 
> file configures. Note that files newly generated since the checkpoint may be 
> deleted here. Also note that all the files in the .yml file should be under 
> the table folder on HDFS. Limitation: hash table redistribution, table 
> truncate, and table drop are not supported. This is also for scenario 2.
> Requirements:
> 1. The to-be-registered file path must colocate with HAWQ in the same HDFS 
> cluster.
> 2. If the to-be-registered table is a hash table, the registered file number 
> should be one or a multiple of the hash table bucket number.
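Requirement 2 is a simple divisibility check; a minimal sketch (hypothetical validation helper, not hawq code):

```python
def valid_hash_file_count(file_count, bucket_num):
    """For a hash-distributed table, the number of registered files must be
    a positive multiple of the table's bucket number (requirement 2)."""
    return file_count > 0 and bucket_num > 0 and file_count % bucket_num == 0
```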





[jira] [Updated] (HAWQ-1091) HAWQ InputFormat Bugs

2016-11-10 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1091:
--
Component/s: (was: Command Line Tools)
 Storage

> HAWQ InputFormat Bugs
> -
>
> Key: HAWQ-1091
> URL: https://issues.apache.org/jira/browse/HAWQ-1091
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Storage
>Reporter: hongwu
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> "TPCHLocalTester.java" and "HAWQInputFormatPerformanceTest_TPCH.java" use the 
> "WHERE content>=0" filter, which is a condition from an old version of HAWQ.
> The dbgen binary, which the generate_load_tpch.pl script needs in order to 
> generate data for the MapReduce test cases, is not included in the hawq repo. 
> We should disable these cases.
> There is also a bug when the size in the extracted yaml file is zero.





[jira] [Updated] (HAWQ-1145) After registering a partition table, if we want to insert some data into the table, it fails.

2016-11-03 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1145:
--
Description: 
Reproduce Steps:
1. Create a partition table
{code}
CREATE TABLE parquet_LINEITEM_uncompressed(
 L_ORDERKEY INT8,
 L_PARTKEY BIGINT,
 L_SUPPKEY BIGINT,
 L_LINENUMBER BIGINT,
 L_QUANTITY decimal,
 L_EXTENDEDPRICE decimal,
 L_DISCOUNT decimal,
 L_TAX decimal,
 L_RETURNFLAG CHAR(1),
 L_LINESTATUS CHAR(1),
 L_SHIPDATE date,
 L_COMMITDATE date,
 L_RECEIPTDATE date,
 L_SHIPINSTRUCT CHAR(25),

[jira] [Updated] (HAWQ-1145) After registering a partition table, if we want to insert some data into the table, it fails.

2016-11-03 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1145:
--
Assignee: Hubert Zhang  (was: Lei Chang)

> After registering a partition table, if we want to insert some data into the 
> table, it fails.
> -
>
> Key: HAWQ-1145
> URL: https://issues.apache.org/jira/browse/HAWQ-1145
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Hubert Zhang
> Fix For: 2.0.1.0-incubating
>
>
> Reproduce Steps:
> 1. Create a partition table
> CREATE TABLE parquet_LINEITEM_uncompressed(
>  L_ORDERKEY INT8,
>  L_PARTKEY BIGINT,
>  L_SUPPKEY BIGINT,
>  L_LINENUMBER BIGINT,
>  L_QUANTITY decimal,
>  L_EXTENDEDPRICE decimal,
>  L_DISCOUNT decimal,
>  L_TAX decimal,
>  L_RETURNFLAG CHAR(1),
>  L_LINESTATUS CHAR(1),
>  L_SHIPDATE date,
>  L_COMMITDATE date,

[jira] [Updated] (HAWQ-1144) Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out.

2016-11-03 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1144:
--
Description: 
When registering into a 2-level partition table, hawq register didn't throw an 
error and indicated that it succeeded, but no data can be selected out.

Reproduce Steps:
1. Create a one-level partition table
{code}
 create table parquet_wt (id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean 
DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character 
varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 
real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 
path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 
point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 
date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit 
varying(5),a41 smallint,a42 int )   WITH (appendonly=true, orientation=parquet) 
distributed randomly  Partition by range(a1) (start(1)  end(5000) every(1000) );
{code}
2. insert some data into this table
{code}
insert into parquet_wt 
(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42)
 values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock 
between Republicans and Democrats over how best to reduce the U.S. deficit, and 
over what period, has blocked an agreement to allow the raising of the $14.3 
trillion debt ceiling','2001-12-24 02:26:11','U.S. House of Representatives 
Speaker John Boehner, the top Republican in Congress who has put forward a 
deficit reduction plan to be voted on later on Thursday said he had no control 
over whether his bill would avert a credit 
downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled 
House is tentatively scheduled to vote on Boehner proposal this afternoon at 
around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, 
Kevin McCarthy, would not say if there were enough votes to pass the 
bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending 
cuts in exchange for raising the nations $14.3 trillion debt limit is not 
perfect but is as large a step that a divided government can take that is 
doable and signable by President Barack Obama.The Ohio Republican says the 
measure is an honest and sincere attempt at compromise and was negotiated with 
Democrats last weekend and that passing it would end the ongoing debt crisis. 
The plan blends $900 billion-plus in spending cuts with a companion increase in 
the nations borrowing 
cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 
01:51:15+1359','2001-12-13 
01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans
 had been working throughout the day Thursday to lock down support for their 
plan to raise the nations debt ceiling, even as Senate Democrats vowed to 
swiftly kill it if 
passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson
 & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction 
today. Labels will carry new dosing instructions this fall.The company says it 
will cut the maximum dosage of Regular Strength Tylenol and other 
acetaminophen-containing products in 2012.Acetaminophen is safe when used as 
directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter 
medical affairs. But, when too much is taken, it can cause liver damage.The 
action is intended to cut the risk of such accidental overdoses, the company 
says in a news release.','1','0',12,23);
{code}
3. extract the metadata out for the table
{code}
hawq extract -d postgres -o ~/parquet.yaml parquet_wt
{code}
4. create a two-level partition table
{code}
CREATE TABLE parquet_wt_subpartgzip2
  (id SERIAL,a1 int,a2 
char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 
timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 
text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with 
time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character 
varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 
numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 
time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int ) 
WITH (appendonly=true, orientation=parquet) distributed 
randomly  Partition by range(a1) Subpartition by list(a2) subpartition template 
( default subpartition df_sp, subpartition sp1 values('M') , subpartition sp2 
values('F')   

[jira] [Created] (HAWQ-1145) After registering a partition table, if we want to insert some data into the table, it fails.

2016-11-03 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1145:
-

 Summary: After registering a partition table, if we want to insert 
some data into the table, it fails.
 Key: HAWQ-1145
 URL: https://issues.apache.org/jira/browse/HAWQ-1145
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Command Line Tools
Reporter: Lili Ma
Assignee: Lei Chang
 Fix For: 2.0.1.0-incubating


Reproduce Steps:
1. Create a partition table
CREATE TABLE parquet_LINEITEM_uncompressed(
 L_ORDERKEY INT8,
 L_PARTKEY BIGINT,
 L_SUPPKEY BIGINT,
 L_LINENUMBER BIGINT,
 L_QUANTITY decimal,
 L_EXTENDEDPRICE decimal,
 L_DISCOUNT decimal,
 L_TAX decimal,
 L_RETURNFLAG CHAR(1),
 L_LINESTATUS CHAR(1),
 L_SHIPDATE date,
 L_COMMITDATE date,
 L_RECEIPTDATE date,

[jira] [Updated] (HAWQ-1144) Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out.

2016-11-03 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1144:
--
   Assignee: Lin Wen  (was: Lei Chang)
Description: 
When registering into a 2-level partition table, hawq register didn't throw an 
error and indicated that it succeeded, but no data can be selected out.

Reproduce Steps:
1. Create a one-level partition table
{code}
 create table parquet_wt (id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean 
DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character 
varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 
real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 
path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 
point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 
date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit 
varying(5),a41 smallint,a42 int )   WITH (appendonly=true, orientation=parquet) 
distributed randomly  Partition by range(a1) (start(1)  end(5000) every(1000) );
{code}
2. insert some data into this table
{code}
insert into parquet_wt 
(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42)
 values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock 
between Republicans and Democrats over how best to reduce the U.S. deficit, and 
over what period, has blocked an agreement to allow the raising of the $14.3 
trillion debt ceiling','2001-12-24 02:26:11','U.S. House of Representatives 
Speaker John Boehner, the top Republican in Congress who has put forward a 
deficit reduction plan to be voted on later on Thursday said he had no control 
over whether his bill would avert a credit 
downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled 
House is tentatively scheduled to vote on Boehner proposal this afternoon at 
around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, 
Kevin McCarthy, would not say if there were enough votes to pass the 
bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending 
cuts in exchange for raising the nations $14.3 trillion debt limit is not 
perfect but is as large a step that a divided government can take that is 
doable and signable by President Barack Obama.The Ohio Republican says the 
measure is an honest and sincere attempt at compromise and was negotiated with 
Democrats last weekend and that passing it would end the ongoing debt crisis. 
The plan blends $900 billion-plus in spending cuts with a companion increase in 
the nations borrowing 
cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 
01:51:15+1359','2001-12-13 
01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans
 had been working throughout the day Thursday to lock down support for their 
plan to raise the nations debt ceiling, even as Senate Democrats vowed to 
swiftly kill it if 
passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson
 & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction 
today. Labels will carry new dosing instructions this fall.The company says it 
will cut the maximum dosage of Regular Strength Tylenol and other 
acetaminophen-containing products in 2012.Acetaminophen is safe when used as 
directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter 
medical affairs. But, when too much is taken, it can cause liver damage.The 
action is intended to cut the risk of such accidental overdoses, the company 
says in a news release.','1','0',12,23);
{code}
3. extract the metadata out for the table
{code}
hawq extract -d postgres -o ~/parquet.yaml parquet_wt
{code}
4. create a two-level partition table
{code}
CREATE TABLE parquet_wt_subpartgzip2
  (id SERIAL,a1 int,a2 
char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 
timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 
text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with 
time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character 
varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 
numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 
time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int ) 
WITH (appendonly=true, orientation=parquet) distributed 
randomly  Partition by range(a1) Subpartition by list(a2) subpartition template 
( default subpartition df_sp, subpartition sp1 values('M') , subpartition sp2 
values('F')  

[jira] [Created] (HAWQ-1144) Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out.

2016-11-03 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1144:
-

 Summary: Register into a 2-level partition table, hawq register 
didn't throw error, and indicates that hawq register succeed, but no data can 
be selected out.
 Key: HAWQ-1144
 URL: https://issues.apache.org/jira/browse/HAWQ-1144
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Command Line Tools
Reporter: Lili Ma
Assignee: Lei Chang
 Fix For: 2.0.1.0-incubating


When registering into a 2-level partition table, hawq register didn't throw an 
error and indicated that it succeeded, but no data can be selected out.

Reproduce Steps:
1. Create a one-level partition table
{code}
 create table parquet_wt (id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean 
DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character 
varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 
real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 
path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 
point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 
date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit 
varying(5),a41 smallint,a42 int )   WITH (appendonly=true, orientation=parquet) 
distributed randomly  Partition by range(a1) (start(1)  end(5000) every(1000) );
{code}
2. insert some data into this table
{code}
insert into parquet_wt 
(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42)
 values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock 
between Republicans and Democrats over how best to reduce the U.S. deficit, and 
over what period, has blocked an agreement to allow the raising of the $14.3 
trillion debt ceiling','2001-12-24 02:26:11','U.S. House of Representatives 
Speaker John Boehner, the top Republican in Congress who has put forward a 
deficit reduction plan to be voted on later on Thursday said he had no control 
over whether his bill would avert a credit 
downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled 
House is tentatively scheduled to vote on Boehner proposal this afternoon at 
around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, 
Kevin McCarthy, would not say if there were enough votes to pass the 
bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending 
cuts in exchange for raising the nations $14.3 trillion debt limit is not 
perfect but is as large a step that a divided government can take that is 
doable and signable by President Barack Obama.The Ohio Republican says the 
measure is an honest and sincere attempt at compromise and was negotiated with 
Democrats last weekend and that passing it would end the ongoing debt crisis. 
The plan blends $900 billion-plus in spending cuts with a companion increase in 
the nations borrowing 
cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 
01:51:15+1359','2001-12-13 
01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans
 had been working throughout the day Thursday to lock down support for their 
plan to raise the nations debt ceiling, even as Senate Democrats vowed to 
swiftly kill it if 
passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson
 & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction 
today. Labels will carry new dosing instructions this fall.The company says it 
will cut the maximum dosage of Regular Strength Tylenol and other 
acetaminophen-containing products in 2012.Acetaminophen is safe when used as 
directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter 
medical affairs. But, when too much is taken, it can cause liver damage.The 
action is intended to cut the risk of such accidental overdoses, the company 
says in a news release.','1','0',12,23);
{code}
3. extract the metadata out for the table
{code}
hawq extract -d postgres -o ~/parquet.yaml parquet_wt
{code}
4. create a two-level partition table
{code}
CREATE TABLE parquet_wt_subpartgzip2
  (id SERIAL,a1 int,a2 
char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 
timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 
text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with 
time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character 
varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 
numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 
time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int )

[jira] [Updated] (HAWQ-1035) support partition table register

2016-10-31 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1035:
--
Assignee: Chunling Wang  (was: Hubert Zhang)

> support partition table register
> 
>
> Key: HAWQ-1035
> URL: https://issues.apache.org/jira/browse/HAWQ-1035
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Chunling Wang
> Fix For: 2.0.1.0-incubating
>
>
> Support registering partition tables, limited to 1-level partition tables, 
> since hawq extract only supports 1-level partition tables.
> Expected behavior:
> 1. Create a partition table in HAWQ, then extract its information to a .yml 
> file.
> 2. Call hawq register with the extracted .yml file and a new table name; the 
> files should be registered into the new table.
> The work to implement partition table register breaks down as follows:
> 1. Modify the .yml configuration file parsing function to add content for 
> partition tables.
> 2. Construct the partition table DDL according to the .yml configuration file.
> 3. Map each subpartition table name to the table list in the .yml configuration file.
> 4. Register the subpartition tables one by one.





[jira] [Comment Edited] (HAWQ-1034) add --repair option for hawq register

2016-10-31 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15624133#comment-15624133
 ] 

Lili Ma edited comment on HAWQ-1034 at 11/1/16 2:44 AM:


Repair mode can be thought of as a particular case of force mode.
1) Force mode registers the files according to the yaml configuration file: it 
erases all the records in the catalog (pg_aoseg.pg_aoseg(paqseg)_$relid) and 
redoes the catalog inserts. It requires that all HDFS files for the table be 
included in the yaml configuration file.
2) Repair mode also registers files according to the yaml configuration file, 
erasing the catalog records and re-inserting them, but it doesn't require that 
all the HDFS files for the table be included in the yaml configuration file. It 
directly deletes those files which are under the table directory but not 
included in the yaml configuration file.
Since repair mode may directly delete HDFS files, it carries some risk: if a 
user runs repair mode by mistake, their data may be deleted. We can allow them 
to use force mode instead, and throw an error for files under the directory but 
not included in the yaml configuration file. If the user does think the files 
are unnecessary, he/she can delete the files himself/herself.

The workaround for supporting the repair scenario using the --force option:
1) If no files have been added since the checkpoint at which the yaml 
configuration file was generated, force mode can handle it directly.
2) If some files have been added since the last checkpoint and the user does 
want to delete them, we can output information about those files in force mode 
so that users can delete them by themselves and then run register in force 
mode again.

Since we can use force mode to implement the repair feature, we will remove the 
existing code for repair mode and close this JIRA. Thanks.
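Step 2 of the workaround reduces to a set difference between the files found under the table directory on HDFS and the files listed in the yaml; a hedged sketch (hypothetical helper, not the actual force-mode code):

```python
def extra_files(hdfs_files, yaml_files):
    """Files present under the table directory on HDFS but absent from the
    yaml configuration. Force mode could report these (instead of deleting
    them, as repair mode would), leaving removal to the user."""
    return sorted(set(hdfs_files) - set(yaml_files))
```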


was (Author: lilima):
Repair mode can be thought of as a particular case of force mode.
1) Force mode registers the files according to the yaml configuration file: it 
erases all the records in the catalog (pg_aoseg.pg_aoseg(paqseg)_$relid) and 
redoes the catalog inserts. It requires that all HDFS files for the table be 
included in the yaml configuration file.
2) Repair mode also registers files according to the yaml configuration file, 
erasing the catalog records and re-inserting them, but it doesn't require that 
all the HDFS files for the table be included in the yaml configuration file. It 
directly deletes those files which are under the table directory but not 
included in the yaml configuration file.
I'm a little concerned about directly deleting HDFS files: if a user runs 
repair mode by mistake, their data may be deleted. So, what if we just allow 
them to use force mode, and throw an error for files under the directory but 
not included in the yaml configuration file? If the user does think the files 
are unnecessary, he/she can delete the files himself/herself.

The workaround for supporting the repair scenario using the --force option:
1) If no files have been added since the checkpoint at which the yaml 
configuration file was generated, force mode can handle it directly.
2) If some files have been added since the last checkpoint and the user does 
want to delete them, we can output information about those files in force mode 
so that users can delete them by themselves and then run register in force 
mode again.

Since we can use force mode to implement the repair feature, we will remove the 
existing code for repair mode and close this JIRA. Thanks.

> add --repair option for hawq register
> -
>
> Key: HAWQ-1034
> URL: https://issues.apache.org/jira/browse/HAWQ-1034
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: Lili Ma
>Assignee: Chunling Wang
> Fix For: 2.0.1.0-incubating
>
>
> add --repair option for hawq register
> This will change both the file folder and the catalog table 
> pg_aoseg.pg_paqseg_$relid to the state the .yml file configures. Note that 
> files newly generated since the checkpoint may be deleted here. Also note that 
> all the files in the .yml file should be under the table folder on HDFS. 
> Limitation: hash table redistribution, table truncate, and table drop are not 
> supported. This is for the table rollback scenario: take checkpoints 
> somewhere, then roll back to a previous checkpoint. 





[jira] [Resolved] (HAWQ-1034) add --repair option for hawq register

2016-10-31 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma resolved HAWQ-1034.
---
Resolution: Done

> add --repair option for hawq register
> -
>
> Key: HAWQ-1034
> URL: https://issues.apache.org/jira/browse/HAWQ-1034
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: Lili Ma
>Assignee: Chunling Wang
> Fix For: 2.0.1.0-incubating
>
>
> add --repair option for hawq register
> This will change both the file folder and the catalog table 
> pg_aoseg.pg_paqseg_$relid to the state the .yml file configures. Note that 
> files newly generated since the checkpoint may be deleted here. Also note that 
> all the files in the .yml file should be under the table folder on HDFS. 
> Limitation: hash table redistribution, table truncate, and table drop are not 
> supported. This is for the table rollback scenario: take checkpoints 
> somewhere, then roll back to a previous checkpoint. 





[jira] [Commented] (HAWQ-1034) add --repair option for hawq register

2016-10-31 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15624133#comment-15624133
 ] 

Lili Ma commented on HAWQ-1034:
---

Repair mode can be thought of as a particular case of force mode.
1) Force mode registers the files according to the yaml configuration file: it 
erases all the records in the catalog (pg_aoseg.pg_aoseg(paqseg)_$relid) and 
re-inserts them. It requires that all HDFS files for the table be included in the 
yaml configuration file.
2) Repair mode also registers files according to the yaml configuration file, 
erasing the catalog records and re-inserting them. But it doesn't require that all 
the HDFS files for the table be included in the yaml configuration file; it will 
directly delete those files which are under the table directory but not included 
in the yaml configuration file.
I'm a little concerned about directly deleting HDFS files: if a user runs repair 
mode by mistake, their data may be deleted. So, what if we only allow force mode, 
and throw an error for files under the directory but not included in the yaml 
configuration file? If the user decides the files are unnecessary, they can delete 
the files themselves.

The workaround for supporting the repair scenario with the --force option:
1) If no files have been added since the checkpoint at which the yaml 
configuration file was generated, force mode can handle it directly.
2) If files have been added since the last checkpoint and the user wants them 
deleted, force mode can output those files' information so that users can delete 
the files themselves and then run register in force mode again.

Since we can use force mode to implement the repair feature, we will remove the 
existing code for repair mode and close this JIRA. Thanks
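The force-mode workaround above boils down to a set difference: report the files that sit under the table directory but are not listed in the .yml file, rather than deleting them. A minimal sketch of that check (the helper name and paths are illustrative, not hawq register's actual code):

```python
def extra_files_under_table_dir(hdfs_files, yaml_files):
    """Return files under the table directory that the .yml file does not list.

    Force mode can print these so the user deletes them manually,
    instead of repair mode deleting them automatically.
    """
    return sorted(set(hdfs_files) - set(yaml_files))

extras = extra_files_under_table_dir(
    ["/hawq/t1/1", "/hawq/t1/2", "/hawq/t1/3"],  # files found on HDFS
    ["/hawq/t1/1", "/hawq/t1/2"],                # files listed in a.yml
)
print(extras)  # files the user must remove before re-running --force
```

If the returned list is non-empty, force mode would error out and show it instead of touching HDFS.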

> add --repair option for hawq register
> -
>
> Key: HAWQ-1034
> URL: https://issues.apache.org/jira/browse/HAWQ-1034
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: Lili Ma
>Assignee: Chunling Wang
> Fix For: 2.0.1.0-incubating
>
>
> add --repair option for hawq register
> Will change both the file folder and the catalog table pg_aoseg.pg_paqseg_$relid to 
> the state that the .yml file configures. Note that new files generated since 
> the checkpoint may be deleted here. Also note that all the files in the .yml file 
> should be under the table folder on HDFS. Limitation: does not support cases 
> involving hash table redistribution, table truncation, or table drop. This is for 
> the table rollback scenario: take a checkpoint somewhere, then roll back to the 
> previous checkpoint. 





[jira] [Updated] (HAWQ-1104) Add tupcount, varblockcount and eofuncompressed value in hawq extract yaml configuration, also add implementation in hawq register to recognize these values

2016-10-14 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1104:
--
Assignee: hongwu  (was: Lei Chang)

> Add tupcount, varblockcount and eofuncompressed value in hawq extract yaml 
> configuration, also add implementation in hawq register to recognize these 
> values  
> --
>
> Key: HAWQ-1104
> URL: https://issues.apache.org/jira/browse/HAWQ-1104
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> Add tupcount, varblockcount and eofuncompressed values in the hawq extract yaml 
> configuration, and also add implementation in hawq register to recognize 
> these values so that the information in the catalog table pg_aoseg.pg_aoseg_$relid 
> or pg_aoseg.pg_paqseg_$relid becomes correct.  
> After this work, the information in the catalog table will be correct if we 
> register a table according to a yaml configuration file generated from 
> another table.





[jira] [Created] (HAWQ-1104) Add tupcount, varblockcount and eofuncompressed value in hawq extract yaml configuration, also add implementation in hawq register to recognize these values

2016-10-14 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1104:
-

 Summary: Add tupcount, varblockcount and eofuncompressed value in 
hawq extract yaml configuration, also add implementation in hawq register to 
recognize these values  
 Key: HAWQ-1104
 URL: https://issues.apache.org/jira/browse/HAWQ-1104
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Command Line Tools
Reporter: Lili Ma
Assignee: Lei Chang
 Fix For: 2.0.1.0-incubating


Add tupcount, varblockcount and eofuncompressed values in the hawq extract yaml 
configuration, and also add implementation in hawq register to recognize these 
values so that the information in the catalog table pg_aoseg.pg_aoseg_$relid or 
pg_aoseg.pg_paqseg_$relid becomes correct.  

After this work, the information in the catalog table will be correct if we 
register a table according to a yaml configuration file generated from 
another table.





[jira] [Created] (HAWQ-1061) Improve hawq register for bugs already found

2016-09-18 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1061:
-

 Summary: Improve hawq register for bugs already found
 Key: HAWQ-1061
 URL: https://issues.apache.org/jira/browse/HAWQ-1061
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Command Line Tools
Reporter: Lili Ma
Assignee: Lei Chang


Fix the bugs found by the verification process





[jira] [Updated] (HAWQ-1050) hawq register help can not return correct result indicating the help information

2016-09-13 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1050:
--
Issue Type: Sub-task  (was: Bug)
Parent: HAWQ-991

> hawq register help can not return correct result indicating the help 
> information
> 
>
> Key: HAWQ-1050
> URL: https://issues.apache.org/jira/browse/HAWQ-1050
> Project: Apache HAWQ
>  Issue Type: Sub-task
>Reporter: Lili Ma
>Assignee: Lei Chang
>
> hawq register help does not return the correct help information.
> We should keep help as a keyword and return the same results as hawq register 
> --help.
> {code}
> malilis-MacBook-Pro:~ malili$ hawq register help
> 20160914:09:56:37:007364 
> hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Usage: hadoop [--config 
> confdir] COMMAND
>where COMMAND is one of:
>   fs   run a generic filesystem user client
>   version  print the version
>   jar run a jar file
>   checknative [-a|-h]  check native hadoop and compression libraries 
> availability
>   distcp   copy file or directories recursively
>   archive -archiveName NAME -p  *  create a hadoop 
> archive
>   classpathprints the class path needed to get the
>   credential   interact with credential providers
>Hadoop jar and the required libraries
>   daemonlogget/set the log level for each daemon
>   traceview and modify Hadoop tracing settings
>  or
>   CLASSNAMErun the class named CLASSNAME
> Most commands print help when invoked w/o parameters.
> Traceback (most recent call last):
>   File "/usr/local/hawq/bin/hawqregister", line 398, in 
> check_hash_type(dburl, tablename) # Usage1 only support randomly 
> distributed table
>   File "/usr/local/hawq/bin/hawqregister", line 197, in check_hash_type
> logger.error('Table not found in table gp_distribution_policy.' % 
> tablename)
> TypeError: not all arguments converted during string formatting
> {code}





[jira] [Created] (HAWQ-1050) hawq register help can not return correct result indicating the help information

2016-09-13 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1050:
-

 Summary: hawq register help can not return correct result 
indicating the help information
 Key: HAWQ-1050
 URL: https://issues.apache.org/jira/browse/HAWQ-1050
 Project: Apache HAWQ
  Issue Type: Bug
Reporter: Lili Ma
Assignee: Lei Chang


hawq register help does not return the correct help information.
We should keep help as a keyword and return the same results as hawq register 
--help.

{code}
malilis-MacBook-Pro:~ malili$ hawq register help
20160914:09:56:37:007364 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Usage: 
hadoop [--config confdir] COMMAND
   where COMMAND is one of:
  fs   run a generic filesystem user client
  version  print the version
  jar run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries 
availability
  distcp   copy file or directories recursively
  archive -archiveName NAME -p  *  create a hadoop 
archive
  classpathprints the class path needed to get the
  credential   interact with credential providers
   Hadoop jar and the required libraries
  daemonlogget/set the log level for each daemon
  traceview and modify Hadoop tracing settings
 or
  CLASSNAMErun the class named CLASSNAME

Most commands print help when invoked w/o parameters.
Traceback (most recent call last):
  File "/usr/local/hawq/bin/hawqregister", line 398, in 
check_hash_type(dburl, tablename) # Usage1 only support randomly 
distributed table
  File "/usr/local/hawq/bin/hawqregister", line 197, in check_hash_type
logger.error('Table not found in table gp_distribution_policy.' % tablename)
TypeError: not all arguments converted during string formatting
{code}
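The TypeError at the end of the traceback is itself a bug in hawqregister: the logger.error format string has no %s placeholder, so applying % to it fails. A minimal standalone sketch of the problem and the fix (not the actual hawqregister source):

```python
tablename = "tableA"

# Buggy: the format string contains no %s, so the % operator raises
# "TypeError: not all arguments converted during string formatting".
try:
    msg = 'Table not found in table gp_distribution_policy.' % tablename
except TypeError as e:
    print("TypeError:", e)

# Fixed: include a placeholder for the table name.
msg = 'Table %s not found in table gp_distribution_policy.' % tablename
print(msg)
```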





[jira] [Commented] (HAWQ-1044) Verify the correctness of hawq register

2016-09-12 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486031#comment-15486031
 ] 

Lili Ma commented on HAWQ-1044:
---

We need to design test cases that verify hawq register along the following 
dimensions:
1. partition table / non-partition table
2. format: row-oriented / parquet
3. randomly distributed / hash distributed
4. partition policy: range partition or list partition
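The four dimensions above form a cross-product test matrix; a small sketch of enumerating it (illustrative only, not part of the actual test harness):

```python
import itertools

# The four test dimensions from the comment above.
dimensions = {
    "table": ["partition", "non-partition"],
    "format": ["row-oriented", "parquet"],
    "distribution": ["random", "hash"],
    "partition_policy": ["range", "list"],
}

# Cross product: one dict per combination to cover.
cases = [dict(zip(dimensions, combo))
         for combo in itertools.product(*dimensions.values())]
print(len(cases))  # 16 combinations
```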

> Verify the correctness of hawq register
> ---
>
> Key: HAWQ-1044
> URL: https://issues.apache.org/jira/browse/HAWQ-1044
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: hongwu
> Fix For: backlog
>
>
> Verify the correctness of hawq register: summarize all the use scenarios and 
> design corresponding test cases for it.
> I think the following test cases should be added for hawq register.
> 1. Use Case 1: Register file/folder into HAWQ by specifying file/folder name
> a) hawq register -d postgres -f a.file tableA
> b) hawq register -d postgres -f a.file -e eof tableA
> c) hawq register -d postgres -f folderA tableA
> d) register file to existing table. normal path
> e) register file to existing table. error path: to-be-registered files under 
> the file folder for the existing table on HDFS. Should throw error out.
> f) verify wrong input file. The file format not parquet format.
> 2. Use case 2: Register into HAWQ table using .yml configuration file to a 
> non-existing table
> a) Verify normal input:
> create table a(a int, b int);
> insert into a values(generate_series(1,100), 25);
> hawq extract -d postgres -o a.yml a
> hawq register -d postgres -c a.yml b
> b) Modify the fileSize in .yml file to a value which is different from actual 
> data size of data file
> 3. Use Case 2: Register into HAWQ table using .yml configuration file to an 
> existing table
> a) Verify normal path:
> Call hawq register multiple times to verify it succeeds. Each time, the 
> to-be-registered files are not under the table directory.
> b) Error path: to-be-registered files under the file folder for the existing 
> table on HDFS
> Should throw an error: not supported!
> 4. Use Case 2: Register into HAWQ table using .yml configuration file by 
> specifying --force option
> a) The table not exist: should create a new table, and do the register
> b) The table already exist, but no data there: can directly call hawq register
> c) Table already exist, and already data there -- normal path: .yml 
> configuration file includes the data files under table directory, and 
> just include those data files.
> d) Table already exist, and already data there -- normal path: .yml 
> configuration file includes the data files under table directory, and 
> also includes data files not under table directory.
> e) Table already exist, and already data there -- error path: .yml 
> configuration file doesn't include the data files under that table directory. 
> Should throw error out, "there are already existing files under the table, 
> but not included in .yml configuration file"
> 5. Use Case 2: Register into HAWQ table using .yml configuration file by 
> specifying --repair option
> a) Normal Path 1: (Append to new file)
> create a tableA
> insert some data into tableA
> call hawq extract the metadata to a.yml file
> insert new data into tableA
> call hawq register --repair option to rollback to the state
> b) Normal Path 2: (New files generated)
> Same as Normal Path 1, but during the second insert, use multiple concurrent 
> inserts aiming to produce new files. Then call hawq register --repair;
> the new files should be discarded.
> c) Error Path: redistributed
> Create a table with hash-distributed, distributed by column A
> insert some data into tableA
> call hawq extract the metadata to a.yml file
> alter table redistributed by column B
> insert new data into tableA
> call hawq register --repair option to rollback to the state  
> --> should throw error "the table is redistributed"
> d) Error Path: table being truncated
> Create a table with hash-distributed, distributed by column A
> insert some data into tableA
> call hawq extract the metadata to a.yml file
> truncate tableA
> call hawq register --repair option to rollback to the state  
> --> should throw error "the table becomes smaller than the .yml config file 
> specified."
> e) Error Path: files specified in .yml configuration not under data directory 
> of table A
> --> should throw error "the files should all under the table directory when 
> --repair option specified for hawq register"
> 6. hawq register partition table support
> a) Normal Path: create a 1-level partition table, calling hawq extract and 
> then hawq register, can work
> b) Error Path: create a 2-level partition table, calling hawq extract and 
> 

[jira] [Updated] (HAWQ-1035) support partition table register

2016-09-12 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1035:
--
Description: 
Support partition table register, limited to 1 level partition table, since 
hawq extract only supports 1-level partition table.

Expected behavior:
1. Create a partition table in HAWQ, then extract the information out to .yml 
file
2. Call hawq register, specifying the extracted .yml file and a new table name; 
the files should be registered into the new table.

Work can be detailed down to implement partition table register:
1. modify .yml configuration file parsing function, add content for partition 
table.
2. construct partition table DDL according to the .yml configuration file
3. map sub partition table name to the table list in .yml configuration file
4. register the subpartition table one by one

  was:
Support partition table register, limited to 1 level partition table, since 
hawq extract only supports 1-level partition table.

Expected behavior:
1. Create a partition table in HAWQ, then extract the information out to .yml 
file
2. Call hawq register and specify identified .yml file and a new table name, 
the files should be registered into the new table.

Works can be detailed down to implementation partition table registeration:
1. modify .yml configuration file parsing function, add content for partition 
table.
2. construct partition table DDL regards to .yml configuration file
3. map sub partition table name to the table list in .yml configuration file
4. register the subpartition table one by one


> support partition table register
> 
>
> Key: HAWQ-1035
> URL: https://issues.apache.org/jira/browse/HAWQ-1035
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> Support partition table register, limited to 1 level partition table, since 
> hawq extract only supports 1-level partition table.
> Expected behavior:
> 1. Create a partition table in HAWQ, then extract the information out to .yml 
> file
> 2. Call hawq register, specifying the extracted .yml file and a new table name; 
> the files should be registered into the new table.
> Work can be detailed down to implement partition table register:
> 1. modify .yml configuration file parsing function, add content for partition 
> table.
> 2. construct partition table DDL according to the .yml configuration file
> 3. map sub partition table name to the table list in .yml configuration file
> 4. register the subpartition table one by one
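Steps 3 and 4 above can be sketched as follows, assuming a hypothetical shape for the parsed .yml content (the key names and the register callback are illustrative, not hawq register's real format):

```python
# Hypothetical parsed .yml: a list of partition entries, each carrying
# the sub-partition table name and its data files.
parsed_yaml = {
    "Partitions": [
        {"Name": "sales_1_prt_1", "Files": ["/hawq/sales/1"]},
        {"Name": "sales_1_prt_2", "Files": ["/hawq/sales/2", "/hawq/sales/3"]},
    ]
}

def map_subpartitions(parsed):
    """Step 3: map each sub-partition table name to its file list."""
    return {p["Name"]: p["Files"] for p in parsed["Partitions"]}

def register_all(parsed, register_one):
    """Step 4: register the sub-partition tables one by one."""
    for name, files in map_subpartitions(parsed).items():
        register_one(name, files)

registered = []
register_all(parsed_yaml,
             lambda name, files: registered.append((name, len(files))))
print(registered)
```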





[jira] [Updated] (HAWQ-1035) support partition table register

2016-09-12 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1035:
--
Description: 
Support partition table register, limited to 1 level partition table, since 
hawq extract only supports 1-level partition table.

Expected behavior:
1. Create a partition table in HAWQ, then extract the information out to .yml 
file
2. Call hawq register and specify identified .yml file and a new table name, 
the files should be registered into the new table.

Works can be detailed down to implementation partition table registeration:
1. modify .yml configuration file parsing function, add content for partition 
table.
2. construct partition table DDL regards to .yml configuration file
3. map sub partition table name to the table list in .yml configuration file
4. register the subpartition table one by one

  was:Support partitiont table register, limited to 1 level partition table, 
since hawq extract only supports 1-level partition table


> support partition table register
> 
>
> Key: HAWQ-1035
> URL: https://issues.apache.org/jira/browse/HAWQ-1035
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> Support partition table register, limited to 1 level partition table, since 
> hawq extract only supports 1-level partition table.
> Expected behavior:
> 1. Create a partition table in HAWQ, then extract the information out to .yml 
> file
> 2. Call hawq register and specify identified .yml file and a new table name, 
> the files should be registered into the new table.
> Works can be detailed down to implementation partition table registeration:
> 1. modify .yml configuration file parsing function, add content for partition 
> table.
> 2. construct partition table DDL regards to .yml configuration file
> 3. map sub partition table name to the table list in .yml configuration file
> 4. register the subpartition table one by one





[jira] [Updated] (HAWQ-1044) Verify the correctness of hawq register

2016-09-12 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1044:
--
Description: 
Verify the correctness of hawq register: summarize all the use scenarios and 
design corresponding test cases for it.

I think the following test cases should be added for hawq register.
1. Use Case 1: Register file/folder into HAWQ by specifying file/folder name
a) hawq register -d postgres -f a.file tableA
b) hawq register -d postgres -f a.file -e eof tableA
c) hawq register -d postgres -f folderA tableA
d) register file to existing table. normal path
e) register file to existing table. error path: to-be-registered files under 
the file folder for the existing table on HDFS. Should throw error out.
f) verify wrong input file. The file format not parquet format.

2. Use case 2: Register into HAWQ table using .yml configuration file to a 
non-existing table
a) Verify normal input:
create table a(a int, b int);
insert into a values(generate_series(1,100), 25);
hawq extract -d postgres -o a.yml a
hawq register -d postgres -c a.yml b
b) Modify the fileSize in .yml file to a value which is different from actual 
data size of data file

3. Use Case 2: Register into HAWQ table using .yml configuration file to an 
existing table
a) Verify normal path:
Call hawq register multiple times to verify it succeeds. Each time, the 
to-be-registered files are not under the table directory.
b) Error path: to-be-registered files under the file folder for the existing 
table on HDFS
Should throw an error: not supported!

4. Use Case 2: Register into HAWQ table using .yml configuration file by 
specifying --force option
a) The table not exist: should create a new table, and do the register
b) The table already exist, but no data there: can directly call hawq register
c) Table already exist, and already data there -- normal path: .yml 
configuration file includes the data files under table directory, and 
just include those data files.
d) Table already exist, and already data there -- normal path: .yml 
configuration file includes the data files under table directory, and 
also includes data files not under table directory.
e) Table already exist, and already data there -- error path: .yml 
configuration file doesn't include the data files under that table directory. 
Should throw error out, "there are already existing files under the table, but 
not included in .yml configuration file"

5. Use Case 2: Register into HAWQ table using .yml configuration file by 
specifying --repair option
a) Normal Path 1: (Append to new file)
create a tableA
insert some data into tableA
call hawq extract the metadata to a.yml file
insert new data into tableA
call hawq register --repair option to rollback to the state
b) Normal Path 2: (New files generated)
Same as Normal Path 1, but during the second insert, use multiple concurrent 
inserts aiming to produce new files. Then call hawq register --repair;
the new files should be discarded.
c) Error Path: redistributed
Create a table with hash-distributed, distributed by column A
insert some data into tableA
call hawq extract the metadata to a.yml file
alter table redistributed by column B
insert new data into tableA
call hawq register --repair option to rollback to the state  
--> should throw error "the table is redistributed"
d) Error Path: table being truncated
Create a table with hash-distributed, distributed by column A
insert some data into tableA
call hawq extract the metadata to a.yml file
truncate tableA
call hawq register --repair option to rollback to the state  
--> should throw error "the table becomes smaller than the .yml config file 
specified."
e) Error Path: files specified in .yml configuration not under data directory 
of table A
--> should throw error "the files should all under the table directory when 
--repair option specified for hawq register"

6. hawq register partition table support
a) Normal Path: create a 1-level partition table, calling hawq extract and then 
hawq register, can work
b) Error Path: create a 2-level partition table, calling hawq extract and then 
hawq register, 
--> should throw error "only supports 1-level partition table"

  was:Verify the correctness of hawq register, summary all the use scenarios 
and design corresponding test cases for it.


> Verify the correctness of hawq register
> ---
>
> Key: HAWQ-1044
> URL: https://issues.apache.org/jira/browse/HAWQ-1044
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Lei Chang
> Fix For: backlog
>
>
> Verify the correctness of hawq register: summarize all the use scenarios and 
> design corresponding test cases for it.
> I think the following test cases should be added for hawq register.
> 1. Use Case 1: Register 

[jira] [Updated] (HAWQ-1044) Verify the correctness of hawq register

2016-09-12 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1044:
--
Assignee: hongwu  (was: Lei Chang)

> Verify the correctness of hawq register
> ---
>
> Key: HAWQ-1044
> URL: https://issues.apache.org/jira/browse/HAWQ-1044
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: hongwu
> Fix For: backlog
>
>
> Verify the correctness of hawq register: summarize all the use scenarios and 
> design corresponding test cases for it.
> I think the following test cases should be added for hawq register.
> 1. Use Case 1: Register file/folder into HAWQ by specifying file/folder name
> a) hawq register -d postgres -f a.file tableA
> b) hawq register -d postgres -f a.file -e eof tableA
> c) hawq register -d postgres -f folderA tableA
> d) register file to existing table. normal path
> e) register file to existing table. error path: to-be-registered files under 
> the file folder for the existing table on HDFS. Should throw error out.
> f) verify wrong input file. The file format not parquet format.
> 2. Use case 2: Register into HAWQ table using .yml configuration file to a 
> non-existing table
> a) Verify normal input:
> create table a(a int, b int);
> insert into a values(generate_series(1,100), 25);
> hawq extract -d postgres -o a.yml a
> hawq register -d postgres -c a.yml b
> b) Modify the fileSize in .yml file to a value which is different from actual 
> data size of data file
> 3. Use Case 2: Register into HAWQ table using .yml configuration file to an 
> existing table
> a) Verify normal path:
> Call hawq register multiple times to verify it succeeds. Each time, the 
> to-be-registered files are not under the table directory.
> b) Error path: to-be-registered files under the file folder for the existing 
> table on HDFS
> Should throw an error: not supported!
> 4. Use Case 2: Register into HAWQ table using .yml configuration file by 
> specifying --force option
> a) The table not exist: should create a new table, and do the register
> b) The table already exist, but no data there: can directly call hawq register
> c) Table already exist, and already data there -- normal path: .yml 
> configuration file includes the data files under table directory, and 
> just include those data files.
> d) Table already exist, and already data there -- normal path: .yml 
> configuration file includes the data files under table directory, and 
> also includes data files not under table directory.
> e) Table already exist, and already data there -- error path: .yml 
> configuration file doesn't include the data files under that table directory. 
> Should throw error out, "there are already existing files under the table, 
> but not included in .yml configuration file"
> 5. Use Case 2: Register into HAWQ table using .yml configuration file by 
> specifying --repair option
> a) Normal Path 1: (Append to new file)
> create a tableA
> insert some data into tableA
> call hawq extract the metadata to a.yml file
> insert new data into tableA
> call hawq register --repair option to rollback to the state
> b) Normal Path 2: (New files generated)
> Same as Normal Path 1, but during the second insert, use multiple concurrent 
> inserts aiming to produce new files. Then call hawq register --repair;
> the new files should be discarded.
> c) Error Path: redistributed
> Create a table with hash-distributed, distributed by column A
> insert some data into tableA
> call hawq extract the metadata to a.yml file
> alter table redistributed by column B
> insert new data into tableA
> call hawq register --repair option to rollback to the state  
> --> should throw error "the table is redistributed"
> d) Error Path: table being truncated
> Create a table with hash-distributed, distributed by column A
> insert some data into tableA
> call hawq extract the metadata to a.yml file
> truncate tableA
> call hawq register --repair option to rollback to the state  
> --> should throw error "the table becomes smaller than the .yml config file 
> specified."
> e) Error Path: files specified in .yml configuration not under data directory 
> of table A
> --> should throw error "the files should all under the table directory when 
> --repair option specified for hawq register"
> 6. hawq register partition table support
> a) Normal Path: create a 1-level partition table, calling hawq extract and 
> then hawq register, can work
> b) Error Path: create a 2-level partition table, calling hawq extract and 
> then hawq register, 
> --> should throw error "only supports 1-level partition table"
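Case 2b in the list above (a fileSize in the .yml that differs from the actual data size) amounts to a size-validation pass that register should fail. A minimal sketch with a hypothetical checker (not the actual hawq register code):

```python
def check_file_sizes(yaml_entries, actual_size_of):
    """Return paths whose size on HDFS differs from the .yml fileSize.

    yaml_entries: list of (path, declared_size) pairs from the .yml file.
    actual_size_of: callable returning the real size of a path.
    An empty result means the check passes and register may proceed.
    """
    return [path for path, declared in yaml_entries
            if actual_size_of(path) != declared]

sizes = {"/hawq/a/1": 1024, "/hawq/a/2": 2048}       # sizes found on HDFS
entries = [("/hawq/a/1", 1024), ("/hawq/a/2", 4096)]  # second entry is wrong
print(check_file_sizes(entries, sizes.__getitem__))
```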





[jira] [Created] (HAWQ-1044) Verify the correctness of hawq register

2016-09-12 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1044:
-

 Summary: Verify the correctness of hawq register
 Key: HAWQ-1044
 URL: https://issues.apache.org/jira/browse/HAWQ-1044
 Project: Apache HAWQ
  Issue Type: Sub-task
Reporter: Lili Ma
Assignee: Lei Chang


Verify the correctness of hawq register: summarize all the use scenarios and 
design corresponding test cases for it.





[jira] [Updated] (HAWQ-1032) Bucket number of newly added partition is not consistent with parent table.

2016-09-05 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1032:
--
Description: 
Failure Case
{code}
set default_hash_table_bucket_number = 12;
CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) DISTRIBUTED 
BY (id)   PARTITION BY 
RANGE (date) ( START (date 
'2008-01-01') INCLUSIVEEND (date 
'2009-01-01') EXCLUSIVE EVERY 
(INTERVAL '1 day') );

set default_hash_table_bucket_number = 16;
ALTER TABLE sales3 ADD PARTITION   START (date 
'2009-03-01') INCLUSIVE   END (date 
'2009-04-01') EXCLUSIVE;
{code}

The newly added partition with bucket number 16 is not consistent with the parent 
partition.

  was:
Failure Case
{code}
set deafult_hash_table_bucket_number = 12;
CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) DISTRIBUTED 
BY (id)   PARTITION BY 
RANGE (date) ( START (date 
'2008-01-01') INCLUSIVEEND (date 
'2009-01-01') EXCLUSIVE EVERY 
(INTERVAL '1 day') );

set deafult_hash_table_bucket_number = 16;
ALTER TABLE sales3 ADD PARTITION   START (date 
'2009-03-01') INCLUSIVE   END (date 
'2009-04-01') EXCLUSIVE;
{code}

The newly added partition with buckcet number 16 is not consistent with parent 
partition.


> Bucket number of newly added partition is not consistent with parent table.
> ---
>
> Key: HAWQ-1032
> URL: https://issues.apache.org/jira/browse/HAWQ-1032
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: 2.0.1.0-incubating
>
>
> Failure Case
> {code}
> set default_hash_table_bucket_number = 12;
> CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) 
> DISTRIBUTED BY (id)   
> PARTITION BY RANGE (date) 
> ( START (date '2008-01-01') INCLUSIVE 
>END (date '2009-01-01') EXCLUSIVE  
>EVERY (INTERVAL '1 day') );
> set default_hash_table_bucket_number = 16;
> ALTER TABLE sales3 ADD PARTITION   START 
> (date '2009-03-01') INCLUSIVE   END 
> (date '2009-04-01') EXCLUSIVE;
> {code}
> The newly added partition with bucket number 16 is not consistent with the 
> parent partition.





[jira] [Updated] (HAWQ-1004) Implement calling Ranger REST Service -- use mock server

2016-09-04 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1004:
--
Summary: Implement calling Ranger REST Service -- use mock server  (was: 
Decide How HAWQ connect Ranger, through which user, how to connect to REST 
Server)

> Implement calling Ranger REST Service -- use mock server
> 
>
> Key: HAWQ-1004
> URL: https://issues.apache.org/jira/browse/HAWQ-1004
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Lili Ma
>Assignee: Lin Wen
> Fix For: backlog
>
>
> Decide how HAWQ connects to Ranger: through which user, and how to connect to 
> the REST Server.
> Acceptance Criteria: 
> Provide an interface for HAWQ to connect to the Ranger REST Server.





[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger

2016-08-31 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454137#comment-15454137
 ] 

Lili Ma commented on HAWQ-256:
--

[~thebellhead]  From a technical view, we can definitely restrict the HAWQ 
superuser's privileges in Ranger.

But if we restrict that, the HAWQ superuser's behavior changes. I think this 
needs careful discussion, and it's out of the scope of this JIRA, right? Anyway, 
if everyone agrees to remove the superuser privileges, we can implement that 
function. Thanks

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 





[jira] [Comment Edited] (HAWQ-256) Integrate Security with Apache Ranger

2016-08-31 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451558#comment-15451558
 ] 

Lili Ma edited comment on HAWQ-256 at 8/31/16 8:24 AM:
---

[~thebellhead], quite good questions!

1. In order for tools, syntax checking, etc to work everyone (the HAWQ public 
role) requires access to the catalog and some of the toolkit. Will Ranger-only 
access control apply only to user created tables, views and external tables?
Yes, since the catalog tables and toolkits are shared and used by various 
users, Ranger-only access control applies only to user-defined objects. But the 
objects include not only databases, tables and views, but also functions, 
languages, schemas, tablespaces and protocols. You can find the detailed 
objects and privileges in the design doc. I have reviewed your proposal in 
HAWQ-1036; could you share your handling for the objects which don't lie in the 
HDFS layer, such as functions, schemas, languages, etc?

2. If so - will gpadmin and any other HAWQ-defined roles not have access to the 
data in Ranger managed tables?
Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS; 
say, when a specific userA creates a table in HAWQ, the HDFS files for the 
table are created by gpadmin instead of userA. Since Ranger lies in the Hadoop 
ecosystem, it usually needs to control both HAWQ and HDFS, so I think we need 
to assign gpadmin the full privileges on the HAWQ data file directory on HDFS 
in the Ranger UI beforehand.

About your concern that the superuser can see all the users' data: I think 
it's kind of like the "root" role in an operating system? If users have 
concerns about the DBA/superuser's unlimited access, I totally agree with you 
about the solution of "passing down user identity" for solving this problem :)

3. How would this be extended for the hcatalog virtual database in HAWQ? Could 
the Ranger permissions for the underlying store (for instance Hive) be read and 
enforced/reported at the HAWQ level?
If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just 
need to grant the privilege to the superuser. But if we have implemented 
user-identity passing down, say, the data files on HDFS for a table created by 
userA are owned by userA instead of gpadmin, then we need to connect to Ranger 
twice, from HAWQ and HDFS respectively. I haven't included the underlying 
store's privilege checks on the HAWQ side; that may need multiple code changes. 
I think keeping the privileges in the component is another choice. 
Your thoughts?

Thanks
Lili



was (Author: lilima):
[~thebellhead], quite good questions!

1. In order for tools, syntax checking, etc to work everyone (the HAWQ public 
role) requires access to the catalog and some of the toolkit. Will Ranger-only 
access control apply only to user created tables, views and external tables?
Yes, since the catalog tables and toolkits are shared and used by various 
users, Ranger-only access control applies only to user-defined objects. But the 
objects include not only databases, tables and views, but also functions, 
languages, schemas, tablespaces and protocols. You can find the detailed 
objects and privileges in the design doc.

2. If so - will gpadmin and any other HAWQ-defined roles not have access to the 
data in Ranger managed tables?
Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS; 
say, when a specific userA creates a table in HAWQ, the HDFS files for the 
table are created by gpadmin instead of userA. Since Ranger lies in the Hadoop 
ecosystem, it usually needs to control both HAWQ and HDFS, so I think we need 
to assign gpadmin the full privileges on the HAWQ data file directory on HDFS 
in the Ranger UI beforehand.

About your concern that the superuser can see all the users' data: I think 
it's kind of like the "root" role in an operating system? If users have 
concerns about the DBA/superuser's unlimited access, I totally agree with you 
about the solution of "passing down user identity" for solving this problem :)

3. How would this be extended for the hcatalog virtual database in HAWQ? Could 
the Ranger permissions for the underlying store (for instance Hive) be read and 
enforced/reported at the HAWQ level?
If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just 
need to grant the privilege to the superuser. But if we have implemented 
user-identity passing down, say, the data files on HDFS for a table created by 
userA are owned by userA instead of gpadmin, then we need to connect to Ranger 
twice, from HAWQ and HDFS respectively. I haven't included the underlying 
store's privilege checks on the HAWQ side; that may need multiple code changes. 
I think keeping the privileges in the component is another choice. 
Your thoughts?

Thanks
Lili


> Integrate Security with Apache Ranger
> -
>
>  

[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger

2016-08-31 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451558#comment-15451558
 ] 

Lili Ma commented on HAWQ-256:
--

[~thebellhead], quite good questions!

1. In order for tools, syntax checking, etc to work everyone (the HAWQ public 
role) requires access to the catalog and some of the toolkit. Will Ranger-only 
access control apply only to user created tables, views and external tables?
Yes, since the catalog tables and toolkits are shared and used by various 
users, Ranger-only access control applies only to user-defined objects. But the 
objects include not only databases, tables and views, but also functions, 
languages, schemas, tablespaces and protocols. You can find the detailed 
objects and privileges in the design doc.

2. If so - will gpadmin and any other HAWQ-defined roles not have access to the 
data in Ranger managed tables?
Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS; 
say, when a specific userA creates a table in HAWQ, the HDFS files for the 
table are created by gpadmin instead of userA. Since Ranger lies in the Hadoop 
ecosystem, it usually needs to control both HAWQ and HDFS, so I think we need 
to assign gpadmin the full privileges on the HAWQ data file directory on HDFS 
in the Ranger UI beforehand.

About your concern that the superuser can see all the users' data: I think 
it's kind of like the "root" role in an operating system? If users have 
concerns about the DBA/superuser's unlimited access, I totally agree with you 
about the solution of "passing down user identity" for solving this problem :)

3. How would this be extended for the hcatalog virtual database in HAWQ? Could 
the Ranger permissions for the underlying store (for instance Hive) be read and 
enforced/reported at the HAWQ level?
If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just 
need to grant the privilege to the superuser. But if we have implemented 
user-identity passing down, say, the data files on HDFS for a table created by 
userA are owned by userA instead of gpadmin, then we need to connect to Ranger 
twice, from HAWQ and HDFS respectively. I haven't included the underlying 
store's privilege checks on the HAWQ side; that may need multiple code changes. 
I think keeping the privileges in the component is another choice. 
Your thoughts?

Thanks
Lili


> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 





[jira] [Updated] (HAWQ-1034) add --repair option for hawq register

2016-08-30 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1034:
--
Description: 
add --repair option for hawq register

Will change both the file folder and the catalog table 
pg_aoseg.pg_paqseg_$relid to the state which the .yml file configures. Note 
that files newly generated since the checkpoint may be deleted here. Also note 
that all the files in the .yml file should be under the table folder on HDFS. 
Limitation: does not support cases involving hash table redistribution, table 
truncate, or table drop. This is for the table rollback scenario: take a 
checkpoint somewhere, and later roll back to that previous checkpoint.
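As a hedged sketch (illustrative Python with hypothetical names; the real tool operates on HDFS and the HAWQ catalog), the --repair rollback described above amounts to:

```python
def repair_to_checkpoint(current_files, yml_files):
    """Roll the table folder back to the file set recorded in the .yml
    checkpoint: files not listed in the .yml (generated after the
    checkpoint) are dropped, and every .yml file must already be present
    under the table folder."""
    yml_set = set(yml_files)
    missing = yml_set - set(current_files)
    if missing:
        raise ValueError("files in .yml not found under table folder: %s"
                         % sorted(missing))
    kept = [f for f in current_files if f in yml_set]
    to_delete = [f for f in current_files if f not in yml_set]
    return kept, to_delete

# "c" was written after the checkpoint, so --repair would remove it.
kept, to_delete = repair_to_checkpoint(["a", "b", "c"], ["a", "b"])
assert kept == ["a", "b"] and to_delete == ["c"]
```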

  was:add --repair option for hawq register


> add --repair option for hawq register
> -
>
> Key: HAWQ-1034
> URL: https://issues.apache.org/jira/browse/HAWQ-1034
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Lei Chang
> Fix For: 2.0.1.0-incubating
>
>
> add --repair option for hawq register
> Will change both the file folder and the catalog table 
> pg_aoseg.pg_paqseg_$relid to the state which the .yml file configures. Note 
> that files newly generated since the checkpoint may be deleted here. Also 
> note that all the files in the .yml file should be under the table folder on 
> HDFS. Limitation: does not support cases involving hash table redistribution, 
> table truncate, or table drop. This is for the table rollback scenario: take 
> a checkpoint somewhere, and later roll back to that previous checkpoint.





[jira] [Updated] (HAWQ-1033) add --force option for hawq register

2016-08-30 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1033:
--
Description: 
add --force option for hawq register

Will clear all the catalog contents in pg_aoseg.pg_paqseg_$relid while keeping 
the files on HDFS, and then re-register all the files to the table. This is for 
the cluster Disaster Recovery scenario: two clusters co-exist, and data is 
periodically imported from Cluster A to Cluster B, so the data needs to be 
registered to Cluster B.
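The --force behavior above can be sketched as follows (illustrative Python, hypothetical names; the real tool rewrites the pg_aoseg.pg_paqseg_$relid catalog table):

```python
def force_register(catalog_rows, hdfs_files):
    """Sketch of --force: clear the existing pg_aoseg.pg_paqseg_$relid
    contents, keep the files on HDFS untouched, then re-register every
    file (path and size) into the catalog."""
    catalog_rows.clear()
    for path, eof in hdfs_files:
        catalog_rows.append({"file": path, "eof": eof})
    return catalog_rows

rows = force_register([{"file": "stale", "eof": 0}], [("f1", 100), ("f2", 200)])
assert [r["file"] for r in rows] == ["f1", "f2"]
```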


  was:
add --force option for hawq register

Will clear all the catalog contents in pg_aoseg.pg_paqseg_$relid while keeping 
the files on HDFS, and then re-register all the files to the table. This is for 
scenario 2.



> add --force option for hawq register
> 
>
> Key: HAWQ-1033
> URL: https://issues.apache.org/jira/browse/HAWQ-1033
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Lei Chang
> Fix For: 2.0.1.0-incubating
>
>
> add --force option for hawq register
> Will clear all the catalog contents in pg_aoseg.pg_paqseg_$relid while 
> keeping the files on HDFS, and then re-register all the files to the table. 
> This is for the cluster Disaster Recovery scenario: two clusters co-exist, 
> and data is periodically imported from Cluster A to Cluster B, so the data 
> needs to be registered to Cluster B.





[jira] [Created] (HAWQ-1035) support partition table register

2016-08-30 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1035:
-

 Summary: support partition table register
 Key: HAWQ-1035
 URL: https://issues.apache.org/jira/browse/HAWQ-1035
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Command Line Tools
Reporter: Lili Ma
Assignee: Lei Chang


Support partition table register, limited to 1-level partition tables, since 
hawq extract only supports 1-level partition tables.





[jira] [Created] (HAWQ-1034) add --repair option for hawq register

2016-08-30 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1034:
-

 Summary: add --repair option for hawq register
 Key: HAWQ-1034
 URL: https://issues.apache.org/jira/browse/HAWQ-1034
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Command Line Tools
Reporter: Lili Ma
Assignee: Lei Chang


add --repair option for hawq register





[jira] [Created] (HAWQ-1033) add --force option for hawq register

2016-08-30 Thread Lili Ma (JIRA)
Lili Ma created HAWQ-1033:
-

 Summary: add --force option for hawq register
 Key: HAWQ-1033
 URL: https://issues.apache.org/jira/browse/HAWQ-1033
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Command Line Tools
Reporter: Lili Ma
Assignee: Lei Chang


add --force option for hawq register






[jira] [Closed] (HAWQ-1024) Rollback if hawq register failed in process

2016-08-30 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma closed HAWQ-1024.
-
Resolution: Invalid

> Rollback if hawq register failed in process
> ---
>
> Key: HAWQ-1024
> URL: https://issues.apache.org/jira/browse/HAWQ-1024
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>






[jira] [Updated] (HAWQ-1025) Modify the content of yml file, and change hawq register implementation for the modification

2016-08-30 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1025:
--
Description: 
1. Add the bucket number for hash-distributed tables in the yml file; when hawq 
register runs, ensure the number of files is a multiple of the bucket number.
2. hawq register should use the file size information in the yml file to update 
the catalog table pg_aoseg.pg_paqseg_$relid.
3. hawq register processing steps:
   a. create the table
   b. mv all the files
   c. change the catalog table once.
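The yml validation in step 1 and the size bookkeeping in step 2 can be sketched as (illustrative Python, hypothetical names, not the actual hawq register implementation):

```python
def check_hash_table_files(file_sizes, bucket_number):
    """For a hash-distributed table, hawq register requires the number of
    data files to be a multiple of the table's bucket number; the recorded
    per-file sizes from the yml are what it writes into
    pg_aoseg.pg_paqseg_$relid. Returns the total registered size."""
    if bucket_number <= 0:
        raise ValueError("bucket number must be positive")
    if len(file_sizes) % bucket_number != 0:
        raise ValueError("file count must be a multiple of the bucket number")
    return sum(file_sizes)

# 4 files, bucket number 2 -> valid; total size 100 goes to the catalog.
assert check_hash_table_files([10, 20, 30, 40], 2) == 100
```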


> Modify the content of yml file, and change hawq register implementation for 
> the modification
> 
>
> Key: HAWQ-1025
> URL: https://issues.apache.org/jira/browse/HAWQ-1025
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: Lili Ma
> Fix For: 2.0.1.0-incubating
>
>
> 1. Add the bucket number for hash-distributed tables in the yml file; when 
> hawq register runs, ensure the number of files is a multiple of the bucket 
> number.
> 2. hawq register should use the file size information in the yml file to 
> update the catalog table pg_aoseg.pg_paqseg_$relid.
> 3. hawq register processing steps:
>a. create the table
>b. mv all the files
>c. change the catalog table once.





[jira] [Updated] (HAWQ-1025) Modify the content of yml file, and change hawq register implementation for the modification

2016-08-30 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1025:
--
Description: 
1. Add the bucket number for hash-distributed tables in the yml file; when hawq 
register runs, ensure the number of files is a multiple of the bucket number.
2. hawq register should use the file size information in the yml file to update 
the catalog table pg_aoseg.pg_paqseg_$relid.
3. hawq register processing steps:
   a. create the table
   b. mv all the files
   c. change the catalog table once.


  was:
1. Add the bucket number for hash-distributed tables in the yml file; when hawq 
register runs, ensure the number of files is a multiple of the bucket number.
2. hawq register should use the file size information in the yml file to update 
the catalog table pg_aoseg.pg_paqseg_$relid.
3. hawq register processing steps:
   a. create the table
   b. mv all the files
   c. change the catalog table once.



> Modify the content of yml file, and change hawq register implementation for 
> the modification
> 
>
> Key: HAWQ-1025
> URL: https://issues.apache.org/jira/browse/HAWQ-1025
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: Lili Ma
> Fix For: 2.0.1.0-incubating
>
>
> 1. Add the bucket number for hash-distributed tables in the yml file; when 
> hawq register runs, ensure the number of files is a multiple of the bucket 
> number.
> 2. hawq register should use the file size information in the yml file to 
> update the catalog table pg_aoseg.pg_paqseg_$relid.
> 3. hawq register processing steps:
>a. create the table
>b. mv all the files
>c. change the catalog table once.





[jira] [Updated] (HAWQ-1025) Modify the content of yml file, and change hawq register implementation for the modification

2016-08-30 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma updated HAWQ-1025:
--
Summary: Modify the content of yml file, and change hawq register 
implementation for the modification  (was: Check the consistency of 
AO/Parquet_FileLocations.Files.size attribute in extracted yaml file and the 
actual file size in HDFS.)

> Modify the content of yml file, and change hawq register implementation for 
> the modification
> 
>
> Key: HAWQ-1025
> URL: https://issues.apache.org/jira/browse/HAWQ-1025
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: Lili Ma
> Fix For: 2.0.1.0-incubating
>
>






[jira] [Assigned] (HAWQ-1025) Check the consistency of AO/Parquet_FileLocations.Files.size attribute in extracted yaml file and the actual file size in HDFS.

2016-08-30 Thread Lili Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lili Ma reassigned HAWQ-1025:
-

Assignee: Lili Ma  (was: hongwu)

> Check the consistency of AO/Parquet_FileLocations.Files.size attribute in 
> extracted yaml file and the actual file size in HDFS.
> ---
>
> Key: HAWQ-1025
> URL: https://issues.apache.org/jira/browse/HAWQ-1025
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: Lili Ma
> Fix For: 2.0.1.0-incubating
>
>






  1   2   >