[jira] [Created] (HAWQ-1441) Implement SSL Access from RPS to Ranger
Lili Ma created HAWQ-1441: - Summary: Implement SSL Access from RPS to Ranger Key: HAWQ-1441 URL: https://issues.apache.org/jira/browse/HAWQ-1441 Project: Apache HAWQ Issue Type: Sub-task Components: Security Reporter: Lili Ma Assignee: Ed Espino An SSL connection from the Ranger plugin to Ranger ensures the security of data transferred between Ranger and the Ranger Plugin Service (RPS). We therefore need to implement SSL support in the RPS connection to Ranger. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
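[Editor's note] Ranger plugins are typically pointed at an SSL-enabled Ranger admin through a plugin-side SSL configuration file. A minimal sketch follows; the file name, paths, and values below are illustrative assumptions, not details taken from this issue:

{code}
<!-- ranger-policymgr-ssl.xml (plugin side); paths are placeholders -->
<configuration>
  <property>
    <name>xasecure.policymgr.clientssl.keystore</name>
    <value>/etc/hawq/conf/ranger-plugin-keystore.jks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.truststore</name>
    <value>/etc/hawq/conf/ranger-plugin-truststore.jks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
    <value>jceks://file/etc/hawq/conf/keystore.jceks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
    <value>jceks://file/etc/hawq/conf/truststore.jceks</value>
  </property>
</configuration>
{code}

The keystore identifies the plugin to Ranger for two-way SSL; the truststore lets the plugin verify Ranger's certificate.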
[jira] [Comment Edited] (HAWQ-1436) Implement RPS High availability on HAWQ
[ https://issues.apache.org/jira/browse/HAWQ-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984355#comment-15984355 ] Lili Ma edited comment on HAWQ-1436 at 4/26/17 8:20 AM: I suggest we do solution 1) to simplify the implementation as the first step. [~Paul Guo] Since RPS will only be called by the master, and a query usually raises only a few RPS requests, there won't be many requests to RPS. So I think a load balancer may be over-engineering. For services that handle many concurrent requests, a proxy server in front of multiple web services is an ideal design, since it is convenient for both HA and load balancing. [~lei_chang] Auto-discovering RPS failures and auto-restarting is a good suggestion, but I think RPS is a little different from the Resource Manager process. We need to add special handling for it since it is a web service. Do you have any suggestions on the detailed implementation? was (Author: lilima): I suggest we do solution 1) to simplify the implementation as the first step. [~Paul Guo] Since RPS will only be called by the master, and a query usually raises only a few RPS requests, there won't be many requests to RPS. So I think a load balancer may be over-engineering. For services that handle many concurrent requests, a proxy server in front of multiple web services is an ideal design, since it is convenient for both HA and load balancing. [~lei_chang] Auto-discovering RPS failures and auto-restarting is a good suggestion, but I think RPS is a little different from the Resource Manager process. We need to add special handling for it since it is a web service. Do you have any suggestions on the detailed implementation? 
> Implement RPS High availability on HAWQ > --- > > Key: HAWQ-1436 > URL: https://issues.apache.org/jira/browse/HAWQ-1436 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Hongxu Ma >Assignee: Hongxu Ma > Fix For: backlog > > Attachments: RPSHADesign_v0.1.pdf > > > Once Ranger is configured, HAWQ will rely on RPS to connect to Ranger. A > single-point RPS may compromise the robustness of HAWQ. > Thus we need to investigate and design a way to implement RPS High > availability. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HAWQ-1436) Implement RPS High availability on HAWQ
[ https://issues.apache.org/jira/browse/HAWQ-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984355#comment-15984355 ] Lili Ma edited comment on HAWQ-1436 at 4/26/17 8:20 AM: I suggest we do solution 1) to simplify the implementation as the first step. [~Paul Guo] Since RPS will only be called by the master, and a query usually raises only a few RPS requests, there won't be many requests to RPS. So I think a load balancer may be over-engineering. For services that handle many concurrent requests, a proxy server in front of multiple web services is an ideal design, since it is convenient for both HA and load balancing. [~lei_chang] Auto-discovering RPS failures and auto-restarting is a good suggestion, but I think RPS is a little different from the Resource Manager process. We need to add special handling for it since it is a web service. Do you have any suggestions on the detailed implementation? was (Author: lilima): I suggest we do solution 1) to simplify the implementation as the first step. Since RPS will only be called by the master, and a query usually raises only a few RPS requests, there won't be many requests to RPS. So I think a load balancer may be over-engineering. For services that handle many concurrent requests, a proxy server in front of multiple web services is an ideal design, since it is convenient for both HA and load balancing. > Implement RPS High availability on HAWQ > --- > > Key: HAWQ-1436 > URL: https://issues.apache.org/jira/browse/HAWQ-1436 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Hongxu Ma >Assignee: Hongxu Ma > Fix For: backlog > > Attachments: RPSHADesign_v0.1.pdf > > > Once Ranger is configured, HAWQ will rely on RPS to connect to Ranger. A > single-point RPS may compromise the robustness of HAWQ. > Thus we need to investigate and design a way to implement RPS High > availability. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-1436) Implement RPS High availability on HAWQ
[ https://issues.apache.org/jira/browse/HAWQ-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984355#comment-15984355 ] Lili Ma commented on HAWQ-1436: --- I suggest we do solution 1) to simplify the implementation as the first step. Since RPS will only be called by the master, and a query usually raises only a few RPS requests, there won't be many requests to RPS. So I think a load balancer may be over-engineering. For services that handle many concurrent requests, a proxy server in front of multiple web services is an ideal design, since it is convenient for both HA and load balancing. > Implement RPS High availability on HAWQ > --- > > Key: HAWQ-1436 > URL: https://issues.apache.org/jira/browse/HAWQ-1436 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Hongxu Ma >Assignee: Hongxu Ma > Fix For: backlog > > Attachments: RPSHADesign_v0.1.pdf > > > Once Ranger is configured, HAWQ will rely on RPS to connect to Ranger. A > single-point RPS may compromise the robustness of HAWQ. > Thus we need to investigate and design a way to implement RPS High > availability. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
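[Editor's note] The master-side failover behind solution 1) can be sketched as probe-and-fall-back over an ordered endpoint list. The function name, endpoint URLs, and health probe below are illustrative assumptions, not HAWQ code:

```python
def first_live_endpoint(endpoints, is_alive):
    """Return the first RPS endpoint whose health probe succeeds.

    endpoints: ordered list of RPS base URLs (primary first, then standbys).
    is_alive:  callable(url) -> bool, e.g. an HTTP GET against an RPS
               health path with a short timeout.
    """
    for url in endpoints:
        try:
            if is_alive(url):
                return url
        except Exception:
            # Treat probe errors the same as "down" and try the next standby.
            continue
    raise RuntimeError("no live RPS endpoint")
```

Since the master is the only caller and request volume is low, this simple per-request (or cached) probe avoids the proxy/load-balancer layer the thread argues is over-engineering.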
[jira] [Created] (HAWQ-1428) Table name pg_aoseg_$relfilenode does not change after running truncate command
Lili Ma created HAWQ-1428: - Summary: Table name pg_aoseg_$relfilenode does not change after running truncate command Key: HAWQ-1428 URL: https://issues.apache.org/jira/browse/HAWQ-1428 Project: Apache HAWQ Issue Type: Bug Components: Core Reporter: Lili Ma Assignee: Ed Espino The table pg_aoseg.pg_aoseg(paqseg)_$relfilenode describes the information of the files stored on HDFS for AO and Parquet tables. So that users can easily find this catalog table, its name suffix should equal the table's relfilenode. After running the truncate command, the relfilenode field for the table changes, but the pg_aoseg_$ table name is not changed. Reproduce Steps (wide pg_class output trimmed to the relevant columns):
{code}
postgres=# create table a(a int);
CREATE TABLE
postgres=# insert into a values(51);
INSERT 0 1
postgres=# select oid, relname, relfilenode from pg_class where relname='a';
  oid  | relname | relfilenode
-------+---------+-------------
 61269 | a       |       61269
(1 row)

postgres=# select seg.oid, seg.relname, seg.relfilenode
postgres-# from pg_class seg, pg_appendonly
postgres-# where pg_appendonly.relid=61269 and pg_appendonly.segrelid=seg.oid;
  oid  |    relname     | relfilenode
-------+----------------+-------------
 61271 | pg_aoseg_61269 |       61271
(1 row)

postgres=# truncate a;
TRUNCATE TABLE
postgres=# select oid, relname, relfilenode from pg_class where relname='a';
-- relfilenode of a no longer equals 61269, but the aoseg table is still named pg_aoseg_61269
{code}
[jira] [Resolved] (HAWQ-1426) hawq extract meets error after the table was reorganized.
[ https://issues.apache.org/jira/browse/HAWQ-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma resolved HAWQ-1426. --- Resolution: Fixed Have committed the bug fix. > hawq extract meets error after the table was reorganized. > - > > Key: HAWQ-1426 > URL: https://issues.apache.org/jira/browse/HAWQ-1426 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Reporter: Lili Ma >Assignee: Chunling Wang > Fix For: 2.3.0.0-incubating > > > After a table is reorganized, running hawq extract on the table will meet an error. > Reproduce Steps: > 1. create an AO table > 2. insert several records into it > 3. Get the table reorganized: "alter table a set with (reorganize=true);" > 4. run hawq extract; an error is thrown. > For the bug fix, we should also guarantee that hawq extract works if > the table is truncated and re-inserted. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-1426) hawq extract meets error after the table was reorganized.
[ https://issues.apache.org/jira/browse/HAWQ-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960282#comment-15960282 ] Lili Ma commented on HAWQ-1426: --- RCA: When hawq extract tries to find the HDFS file information, it wrongly treats pg_aoseg.pg_aoseg_$relid as the catalog table that stores that information. When determining the file path of a table, hawq extract should follow the steps below: 1. Find the directory on HDFS which stores the actual data for the table. This can be achieved by following the column "relfilenode" in the pg_class table. 2. Find the detailed file name for the table under the above directory. This can be achieved by searching the catalog table pg_aoseg.pg_aoseg(paqseg)_$. The table name suffix is neither $relid nor $relfilenode under some circumstances. We should get it by referring to the column "segrelid" in the catalog table "pg_appendonly", and then looking up "pg_class" to get the accurate table name. > hawq extract meets error after the table was reorganized. > - > > Key: HAWQ-1426 > URL: https://issues.apache.org/jira/browse/HAWQ-1426 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Reporter: Lili Ma >Assignee: Ed Espino > Fix For: 2.3.0.0-incubating > > > After a table is reorganized, running hawq extract on the table will meet an error. > Reproduce Steps: > 1. create an AO table > 2. insert several records into it > 3. Get the table reorganized: "alter table a set with (reorganize=true);" > 4. run hawq extract; an error is thrown. > For the bug fix, we should also guarantee that hawq extract works if > the table is truncated and re-inserted. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
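[Editor's note] The two-step lookup in the RCA can be expressed directly in SQL. The catalog names (pg_class, pg_appendonly, relfilenode, segrelid) are as described in the comment above; the relation name 'a' is an illustrative assumption:

{code}
-- Step 1: the table's HDFS directory comes from its own relfilenode.
SELECT relfilenode FROM pg_class WHERE relname = 'a';

-- Step 2: the aoseg relation name comes from pg_appendonly.segrelid,
-- not from a $relid or $relfilenode suffix guess.
SELECT seg.relname
FROM pg_class t
JOIN pg_appendonly ao ON ao.relid = t.oid
JOIN pg_class seg ON seg.oid = ao.segrelid
WHERE t.relname = 'a';
{code}

Resolving the aoseg name through segrelid stays correct even after ALTER TABLE ... SET WITH (reorganize=true) or TRUNCATE changes relfilenode.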
[jira] [Reopened] (HAWQ-1418) Print executing command for hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma reopened HAWQ-1418: --- > Print executing command for hawq register > - > > Key: HAWQ-1418 > URL: https://issues.apache.org/jira/browse/HAWQ-1418 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Chunling Wang >Assignee: Chunling Wang > Fix For: backlog > > > Print executing command for hawq register -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-1418) Print executing command for hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955459#comment-15955459 ] Lili Ma commented on HAWQ-1418: --- The aim of this JIRA is to print out the detailed command the utility is running, so that it is easier to analyze the output logs of hawq register, especially during concurrent invocations of hawq register. > Print executing command for hawq register > - > > Key: HAWQ-1418 > URL: https://issues.apache.org/jira/browse/HAWQ-1418 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Chunling Wang >Assignee: Chunling Wang > Fix For: backlog > > > Print executing command for hawq register -- This message was sent by Atlassian JIRA (v6.3.15#6346)
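[Editor's note] A minimal sketch of the requested behavior: log the exact sub-command line before running it, so concurrent hawq register runs can be correlated with their commands in the log. The helper name and message format are hypothetical, not the actual hawq register implementation:

```python
import logging
import subprocess

def run_logged(cmd, logger=logging.getLogger("hawqregister")):
    """Log the exact command line, then execute it and return its exit code."""
    logger.info("executing command: %s", " ".join(cmd))
    return subprocess.call(cmd)

# e.g. run_logged(["hadoop", "fs", "-ls", "/hawq_default"])
```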
[jira] [Commented] (HAWQ-1406) Update HAWQ product version strings to 2.2.0.0 (HAWQ/HAWQ Ambari Plugin) & 3.2.0.0 (PXF)
[ https://issues.apache.org/jira/browse/HAWQ-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944834#comment-15944834 ] Lili Ma commented on HAWQ-1406: --- I think we should keep both versions aligned with the HAWQ version. At this time we should change it to 2.2.0.0. [~adenisso] Could you please verify? Also, the PXF version should be 3.2.1.0. > Update HAWQ product version strings to 2.2.0.0 (HAWQ/HAWQ Ambari Plugin) & > 3.2.0.0 (PXF) > > > Key: HAWQ-1406 > URL: https://issues.apache.org/jira/browse/HAWQ-1406 > Project: Apache HAWQ > Issue Type: Task > Components: Build >Affects Versions: 2.1.0.0-incubating >Reporter: Ruilong Huo >Assignee: Ruilong Huo > Fix For: 2.2.0.0-incubating > > > Need to update the HAWQ (2.2.0.0), HAWQ Ambari Plugin (2.2.0.0) and PXF > (3.2.0.0) versions so that we can clearly identify Apache HAWQ > 2.2.0.0-incubating artifacts. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HAWQ-1397) Incorrect Message for judging Flex version in the period of configure.
Lili Ma created HAWQ-1397: - Summary: Incorrect Message for judging Flex version in the period of configure. Key: HAWQ-1397 URL: https://issues.apache.org/jira/browse/HAWQ-1397 Project: Apache HAWQ Issue Type: Bug Components: Build Reporter: Lili Ma Assignee: Ed Espino I have flex versions 2.6.0 and 2.5.35 in my local environment, and the default is 2.6.0. When I ran ./configure in HAWQ, the configure log indicated that HAWQ requires Flex version 2.5.4 or later, while my version is 2.6.0. It should not throw this error and should use version 2.6.0. {code} 470 configure:7467: checking for flex 471 configure:7498: WARNING: 472 *** The installed version of Flex, /usr/local/bin/flex, is too old to use with Greenplum DB. 473 *** Flex version 2.5.4 or later is required, but this is flex 2.6.0. 474 configure:7512: result: /usr/bin/flex 475 configure:7532: using flex 2.5.35 Apple(flex-31) {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
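[Editor's note] The underlying pitfall is an ad-hoc version check that rejects 2.6.0 while claiming to require only "2.5.4 or later". A robust numeric comparison can be sketched with GNU `sort -V` (an illustration, not the actual configure test; `sort -V` requires GNU coreutils, which stock macOS lacks):

```shell
# version_ge A B: succeed if version A >= version B under numeric
# (not lexicographic) version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

version_ge 2.6.0 2.5.4 && echo "2.6.0 is new enough"
```

A check written this way accepts every future flex release rather than only versions matching a hard-coded 2.5.x pattern.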
[jira] [Commented] (HAWQ-760) Hawq register
[ https://issues.apache.org/jira/browse/HAWQ-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891592#comment-15891592 ] Lili Ma commented on HAWQ-760: -- [~kdunn926] HAWQ register doesn't check the HAWQ version number. Although HAWQ 2.X optimized the storage for AO format tables, it can still read AO files generated by HAWQ 1.X. The Parquet format has not changed, so there won't be a problem. So I don't think you will encounter problems registering tables from HAWQ 1.X into HAWQ 2.X. If you want to register Parquet files generated by other products such as Hive or Impala, which may use a later format version, hawq register doesn't throw an error at registration time, but you may see errors when selecting from the registered table. For example, if some data page is encoded with dictionary encoding, HAWQ will throw an error indicating that it cannot process it. > Hawq register > - > > Key: HAWQ-760 > URL: https://issues.apache.org/jira/browse/HAWQ-760 > Project: Apache HAWQ > Issue Type: New Feature > Components: Command Line Tools >Reporter: Yangcheng Luo >Assignee: Lili Ma > Fix For: backlog > > > Scenario: > 1. Register a parquet file generated by other systems, such as Hive, Spark, > etc. > 2. For cluster Disaster Recovery. Two clusters co-exist; periodically import > data from Cluster A to Cluster B. Need to register data to Cluster B. > 3. For table rollback. Do checkpoints somewhere, and roll back > to a previous checkpoint. > Usage 1 > Description > Register a file/folder to an existing table. Can register a file or a folder. > If we register a file, we can specify the eof of this file. If eof is not specified, > the actual file size is used directly. If we register a folder, the actual file > size is used. > hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f > filepath] [-e eof] > Usage 2 > Description > Register according to a .yml configuration file. 
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c > config] [--force][--repair] > Behavior: > 1. If the table doesn't exist, automatically create the table and register > the files listed in the .yml configuration file, using the filesize specified in the .yml > file to update the catalog table. > 2. If the table already exists and neither --force nor --repair is configured, do > not create any table, and directly register the files specified in the .yml file > to the table. Note that if a file is under the table directory in HDFS, an error is > thrown: to-be-registered files should not be under the table path. > 3. If the table already exists and --force is specified, clear all the > catalog contents in pg_aoseg.pg_paqseg_$relid while keeping the files on HDFS, > and then re-register all the files to the table. This is for scenario 2. > 4. If the table already exists and --repair is specified, change both the file > folder and the catalog table pg_aoseg.pg_paqseg_$relid to the state the .yml > file describes. Note that files newly generated since the checkpoint may be > deleted here. Also note that all the files in the .yml file should be under the > table folder on HDFS. Limitation: does not support hash table > redistribution, table truncate, or table drop. This is for scenario 3. > Requirements for both cases: > 1. The to-be-registered file path has to be colocated with HAWQ in the same HDFS > cluster. > 2. If the to-be-registered table is a hash table, the registered file number should be > one or a multiple of the hash table bucket number. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet
[ https://issues.apache.org/jira/browse/HAWQ-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma resolved HAWQ-1366. --- Resolution: Fixed Assignee: Lili Ma (was: Ed Espino) > HAWQ should throw error if finding dictionary encoding type for Parquet > --- > > Key: HAWQ-1366 > URL: https://issues.apache.org/jira/browse/HAWQ-1366 > Project: Apache HAWQ > Issue Type: Bug > Components: Storage >Reporter: Lili Ma >Assignee: Lili Ma > Fix For: 2.2.0.0-incubating > > > Since HAWQ is based on Parquet format version 1.0, which does not support > dictionary page, and hawq register may register Parquet format version 2.0 > data into HAWQ, we should throw error if finding unsupported page for column. > Reproduce Steps: > 1. In Hive, create a table and insert into 8 records: > {code} > (hive> create table tt (i int, > > fname varchar(100), > > title varchar(100), > > salary double > > ) > > STORED AS PARQUET; > OK > Time taken: 0.029 seconds > hive> insert into tt values (5,'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW', > 'Sales',80282.54), > > (7,'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE','Engineer',10206.65), > > (4,'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ','Director',63691.23), > > (9,'CTDCDYRURBZMBLNWHQNOQCYFFVULOP','Engineer',63867.44), > > (10,'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK','Sales',97720.08); > WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the > future versions. Consider using a different execution engine (i.e. spark, > tez) or using Hive 1.X releases. > Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1 > Total jobs = 3 > Launching Job 1 out of 3 > Number of reduce tasks is set to 0 since there's no reduce operator > Job running in-process (local Hadoop) > 2017-02-28 17:39:58,713 Stage-1 map = 100%, reduce = 0% > Ended Job = job_local2046305831_0004 > Stage-4 is selected by condition resolver. > Stage-3 is filtered out by condition resolver. > Stage-5 is filtered out by condition resolver. 
> Moving data to directory > hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-1 > Loading data to table default.tt > MapReduce Jobs Launched: > Stage-Stage-1: HDFS Read: 3945 HDFS Write: 4226 SUCCESS > Total MapReduce CPU Time Spent: 0 msec > OK > Time taken: 1.975 seconds > hive> select * from tt; > OK > 5 OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW Sales 80282.54 > 7 UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE Engineer10206.65 > 4 PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ Director63691.23 > 9 CTDCDYRURBZMBLNWHQNOQCYFFVULOP Engineer63867.44 > 10WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK Sales 97720.08 > Time taken: 0.056 seconds, Fetched: 5 row(s) > {code} > 2. Create table in HAWQ > {code} > CREATE TABLE public.tt > (i int, > fname varchar(100), > title varchar(100), > salary float8) > WITH (appendonly=true,orientation=parquet); > {code} > 3. run hawq register > {code} > malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f > hdfs://localhost:8020/user/hive/warehouse/tt tt > 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try > to connect database localhost:5432 postgres > 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New > file(s) to be registered: > ['hdfs://localhost:8020/user/hive/warehouse/tt/00_0'] > hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/00_0 > hdfs://localhost:8020/hawq_default/16385/16387/49281/1" > 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq > Register Succeed. > {code} > 4. select from hawq > {code} > postgres=# select * from tt; > i | fname | title | salary > ++---+-- > 5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW | | 80282.54 > 7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE | | 10206.65 > 4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ | | 63691.23 > 9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP | | 63867.44 > 10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK | | 97720.08 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-401) json type support
[ https://issues.apache.org/jira/browse/HAWQ-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889312#comment-15889312 ] Lili Ma commented on HAWQ-401: -- [~kdunn926] I reviewed the pull request for JSON support in Greenplum. It seems the modified part can be applied directly to HAWQ. Since it involves catalog changes, including pg_proc and pg_type, we may need to consider this in hawq upgrade. Thanks > json type support > - > > Key: HAWQ-401 > URL: https://issues.apache.org/jira/browse/HAWQ-401 > Project: Apache HAWQ > Issue Type: Wish > Components: Core >Reporter: Lei Chang >Assignee: Lei Chang > Fix For: backlog > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HAWQ-1368) normal user who doesn't have home directory may have problem when running hawq register
Lili Ma created HAWQ-1368: - Summary: normal user who doesn't have home directory may have problem when running hawq register Key: HAWQ-1368 URL: https://issues.apache.org/jira/browse/HAWQ-1368 Project: Apache HAWQ Issue Type: Bug Components: Command Line Tools Reporter: Lili Ma Assignee: Ed Espino HAWQ register stores information in hawqregister_MMDD.log under the directory ~/hawqAdminLogs, and a normal user who doesn't have a home directory may encounter failures when running hawq register. We can add a -l option to set the target log directory and file name for hawq register. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet
[ https://issues.apache.org/jira/browse/HAWQ-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887674#comment-15887674 ] Lili Ma commented on HAWQ-1366: --- With the modified code, HAWQ throws the error out: {code} postgres=# select * from tt; ERROR: HAWQ does not support dictionary page type resolver for Parquet format in column 'title' (cdbparquetcolumn.c:152) (seg0 localhost:4 pid=90708) {code} > HAWQ should throw error if finding dictionary encoding type for Parquet > --- > > Key: HAWQ-1366 > URL: https://issues.apache.org/jira/browse/HAWQ-1366 > Project: Apache HAWQ > Issue Type: Bug > Components: Storage >Reporter: Lili Ma >Assignee: Ed Espino > Fix For: 2.2.0.0-incubating > > > Since HAWQ is based on Parquet format version 1.0, which does not support > dictionary page, and hawq register may register Parquet format version 2.0 > data into HAWQ, we should throw error if finding unsupported page for column. > Reproduce Steps: > 1. In Hive, create a table and insert into 8 records: > {code} > (hive> create table tt (i int, > > fname varchar(100), > > title varchar(100), > > salary double > > ) > > STORED AS PARQUET; > OK > Time taken: 0.029 seconds > hive> insert into tt values (5,'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW', > 'Sales',80282.54), > > (7,'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE','Engineer',10206.65), > > (4,'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ','Director',63691.23), > > (9,'CTDCDYRURBZMBLNWHQNOQCYFFVULOP','Engineer',63867.44), > > (10,'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK','Sales',97720.08); > WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the > future versions. Consider using a different execution engine (i.e. spark, > tez) or using Hive 1.X releases. 
> Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1 > Total jobs = 3 > Launching Job 1 out of 3 > Number of reduce tasks is set to 0 since there's no reduce operator > Job running in-process (local Hadoop) > 2017-02-28 17:39:58,713 Stage-1 map = 100%, reduce = 0% > Ended Job = job_local2046305831_0004 > Stage-4 is selected by condition resolver. > Stage-3 is filtered out by condition resolver. > Stage-5 is filtered out by condition resolver. > Moving data to directory > hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-1 > Loading data to table default.tt > MapReduce Jobs Launched: > Stage-Stage-1: HDFS Read: 3945 HDFS Write: 4226 SUCCESS > Total MapReduce CPU Time Spent: 0 msec > OK > Time taken: 1.975 seconds > hive> select * from tt; > OK > 5 OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW Sales 80282.54 > 7 UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE Engineer10206.65 > 4 PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ Director63691.23 > 9 CTDCDYRURBZMBLNWHQNOQCYFFVULOP Engineer63867.44 > 10WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK Sales 97720.08 > Time taken: 0.056 seconds, Fetched: 5 row(s) > {code} > 2. Create table in HAWQ > {code} > CREATE TABLE public.tt > (i int, > fname varchar(100), > title varchar(100), > salary float8) > WITH (appendonly=true,orientation=parquet); > {code} > 3. 
run hawq register > {code} > malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f > hdfs://localhost:8020/user/hive/warehouse/tt tt > 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try > to connect database localhost:5432 postgres > 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New > file(s) to be registered: > ['hdfs://localhost:8020/user/hive/warehouse/tt/00_0'] > hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/00_0 > hdfs://localhost:8020/hawq_default/16385/16387/49281/1" > 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq > Register Succeed. > {code} > 4. select from hawq > {code} > postgres=# select * from tt; > i | fname | title | salary > ++---+-- > 5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW | | 80282.54 > 7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE | | 10206.65 > 4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ | | 63691.23 > 9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP | | 63867.44 > 10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK | | 97720.08 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet
[ https://issues.apache.org/jira/browse/HAWQ-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887672#comment-15887672 ] Lili Ma commented on HAWQ-1366: --- The title column is optimized in Hive to dictionary storage. Since HAWQ doesn't support this, the output is a little weird. In the short term, HAWQ should throw an error for this case. In the long term, HAWQ should support Parquet 2.0 data read/write. > HAWQ should throw error if finding dictionary encoding type for Parquet > --- > > Key: HAWQ-1366 > URL: https://issues.apache.org/jira/browse/HAWQ-1366 > Project: Apache HAWQ > Issue Type: Bug > Components: Storage >Reporter: Lili Ma >Assignee: Ed Espino > Fix For: 2.2.0.0-incubating > > > Since HAWQ is based on Parquet format version 1.0, which does not support > dictionary page, and hawq register may register Parquet format version 2.0 > data into HAWQ, we should throw error if finding unsupported page for column. > Reproduce Steps: > 1. In Hive, create a table and insert into 8 records: > {code} > (hive> create table tt (i int, > > fname varchar(100), > > title varchar(100), > > salary double > > ) > > STORED AS PARQUET; > OK > Time taken: 0.029 seconds > hive> insert into tt values (5,'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW', > 'Sales',80282.54), > > (7,'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE','Engineer',10206.65), > > (4,'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ','Director',63691.23), > > (9,'CTDCDYRURBZMBLNWHQNOQCYFFVULOP','Engineer',63867.44), > > (10,'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK','Sales',97720.08); > WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the > future versions. Consider using a different execution engine (i.e. spark, > tez) or using Hive 1.X releases. 
> Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1 > Total jobs = 3 > Launching Job 1 out of 3 > Number of reduce tasks is set to 0 since there's no reduce operator > Job running in-process (local Hadoop) > 2017-02-28 17:39:58,713 Stage-1 map = 100%, reduce = 0% > Ended Job = job_local2046305831_0004 > Stage-4 is selected by condition resolver. > Stage-3 is filtered out by condition resolver. > Stage-5 is filtered out by condition resolver. > Moving data to directory > hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-1 > Loading data to table default.tt > MapReduce Jobs Launched: > Stage-Stage-1: HDFS Read: 3945 HDFS Write: 4226 SUCCESS > Total MapReduce CPU Time Spent: 0 msec > OK > Time taken: 1.975 seconds > hive> select * from tt; > OK > 5 OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW Sales 80282.54 > 7 UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE Engineer10206.65 > 4 PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ Director63691.23 > 9 CTDCDYRURBZMBLNWHQNOQCYFFVULOP Engineer63867.44 > 10WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK Sales 97720.08 > Time taken: 0.056 seconds, Fetched: 5 row(s) > {code} > 2. Create table in HAWQ > {code} > CREATE TABLE public.tt > (i int, > fname varchar(100), > title varchar(100), > salary float8) > WITH (appendonly=true,orientation=parquet); > {code} > 3. 
run hawq register > {code} > malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f > hdfs://localhost:8020/user/hive/warehouse/tt tt > 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try > to connect database localhost:5432 postgres > 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New > file(s) to be registered: > ['hdfs://localhost:8020/user/hive/warehouse/tt/00_0'] > hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/00_0 > hdfs://localhost:8020/hawq_default/16385/16387/49281/1" > 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq > Register Succeed. > {code} > 4. select from hawq > {code} > postgres=# select * from tt; > i | fname | title | salary > ----+--------------------------------+-------+---------- > 5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW | | 80282.54 > 7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE | | 10206.65 > 4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ | | 63691.23 > 9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP | | 63867.44 > 10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK | | 97720.08 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
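The reproduce steps above show the symptom: HAWQ silently returns empty values for the dictionary-encoded title column instead of failing. A minimal sketch of the page-type validation the JIRA asks for, with illustrative names (this is not HAWQ's actual Parquet reader API):

```python
# Sketch of the validation described above: HAWQ's Parquet reader is based
# on format 1.0, so a dictionary page in a column chunk means the file was
# written with features HAWQ cannot decode. All names here are illustrative.
SUPPORTED_PAGE_TYPES = {"DATA_PAGE"}  # Parquet 1.0 reader, no dictionary support

class UnsupportedPageError(Exception):
    pass

def validate_column_chunk(column_name, page_types):
    """Raise instead of silently yielding NULLs for an undecodable column."""
    for page_type in page_types:
        if page_type not in SUPPORTED_PAGE_TYPES:
            raise UnsupportedPageError(
                f"column '{column_name}' contains unsupported page type "
                f"{page_type}; HAWQ supports Parquet format 1.0 only")

# The 'i' column written by Hive is plain-encoded, but 'title' carries a
# dictionary page:
validate_column_chunk("i", ["DATA_PAGE"])  # passes silently
try:
    validate_column_chunk("title", ["DICTIONARY_PAGE", "DATA_PAGE"])
except UnsupportedPageError as e:
    print(e)
```

The point is to fail fast at read (or register) time with a message naming the offending column, rather than producing NULLs as in step 4 above.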
[jira] [Created] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet
Lili Ma created HAWQ-1366: - Summary: HAWQ should throw error if finding dictionary encoding type for Parquet Key: HAWQ-1366 URL: https://issues.apache.org/jira/browse/HAWQ-1366 Project: Apache HAWQ Issue Type: Bug Components: Storage Reporter: Lili Ma Assignee: Ed Espino Fix For: 2.2.0.0-incubating Since HAWQ is based on Parquet format version 1.0, which does not support dictionary pages, and hawq register may register Parquet format version 2.0 data into HAWQ, we should throw an error when finding an unsupported page type for a column. Reproduce Steps: 1. In Hive, create a table and insert 5 records: {code} (hive> create table tt (i int, > fname varchar(100), > title varchar(100), > salary double > ) > STORED AS PARQUET; OK Time taken: 0.029 seconds hive> insert into tt values (5,'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW', 'Sales',80282.54), > (7,'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE','Engineer',10206.65), > (4,'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ','Director',63691.23), > (9,'CTDCDYRURBZMBLNWHQNOQCYFFVULOP','Engineer',63867.44), > (10,'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK','Sales',97720.08); WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1 Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator Job running in-process (local Hadoop) 2017-02-28 17:39:58,713 Stage-1 map = 100%, reduce = 0% Ended Job = job_local2046305831_0004 Stage-4 is selected by condition resolver. Stage-3 is filtered out by condition resolver. Stage-5 is filtered out by condition resolver. 
Moving data to directory hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-1 Loading data to table default.tt MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 3945 HDFS Write: 4226 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK Time taken: 1.975 seconds hive> select * from tt; OK 5 OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW Sales 80282.54 7 UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE Engineer 10206.65 4 PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ Director 63691.23 9 CTDCDYRURBZMBLNWHQNOQCYFFVULOP Engineer 63867.44 10 WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK Sales 97720.08 Time taken: 0.056 seconds, Fetched: 5 row(s) {code} 2. Create table in HAWQ {code} CREATE TABLE public.tt (i int, fname varchar(100), title varchar(100), salary float8) WITH (appendonly=true,orientation=parquet); {code} 3. run hawq register {code} malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f hdfs://localhost:8020/user/hive/warehouse/tt tt 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try to connect database localhost:5432 postgres 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New file(s) to be registered: ['hdfs://localhost:8020/user/hive/warehouse/tt/00_0'] hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/00_0 hdfs://localhost:8020/hawq_default/16385/16387/49281/1" 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq Register Succeed. {code} 4. select from hawq {code} postgres=# select * from tt; i | fname | title | salary ----+--------------------------------+-------+---------- 5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW | | 80282.54 7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE | | 10206.65 4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ | | 63691.23 9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP | | 63867.44 10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK | | 97720.08 {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-1353) Provide template for Ranger access audit to Solr from RPS
[ https://issues.apache.org/jira/browse/HAWQ-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881803#comment-15881803 ] Lili Ma commented on HAWQ-1353: --- [~adenisso] Once we send the audit log to Solr, can we see the audit information on the Ranger Admin UI? > Provide template for Ranger access audit to Solr from RPS > - > > Key: HAWQ-1353 > URL: https://issues.apache.org/jira/browse/HAWQ-1353 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Alexander Denissov >Assignee: Alexander Denissov > Fix For: backlog > > > We currently ship examples of how to configure audit logging to HDFS (disabled > by default). Audit to HDFS is intended for long-term storage and is not > searchable or presentable on the Ranger Admin UI. > To be able to see and search audit log entries, they should be > sent to Solr, which is the preferred way. We need to provide a set of > properties that users can edit in the hawq-ranger-audit.xml file to > enable sending audit events to Solr. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HAWQ-1353) Provide template for Ranger access audit to Solr from RPS
[ https://issues.apache.org/jira/browse/HAWQ-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880008#comment-15880008 ] Lili Ma edited comment on HAWQ-1353 at 2/23/17 6:59 AM: [~adenisso] Could you add more description for this JIRA? Why do you add Ranger access audit to Solr? Thanks was (Author: lilima): [~adenisso] Could you add more description for this JIRA? Why do you add Ranger access audit to Solr? > Provide template for Ranger access audit to Solr from RPS > - > > Key: HAWQ-1353 > URL: https://issues.apache.org/jira/browse/HAWQ-1353 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Alexander Denissov >Assignee: Alexander Denissov > Fix For: backlog > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-1332) Can not grant database and schema privileges without table privileges in ranger or ranger plugin service
[ https://issues.apache.org/jira/browse/HAWQ-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867573#comment-15867573 ] Lili Ma commented on HAWQ-1332: --- [~adenisso] This seems to be a bug in RPS. Could you help look into it? Thanks > Can not grant database and schema privileges without table privileges in > ranger or ranger plugin service > > > Key: HAWQ-1332 > URL: https://issues.apache.org/jira/browse/HAWQ-1332 > Project: Apache HAWQ > Issue Type: Bug > Components: Security >Reporter: Chunling Wang >Assignee: Alexander Denissov > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > We try to grant database connect and schema usage privileges to a non-super > user so it can connect to a database. We find that if we set a policy with the database and > schema included but the table excluded, we cannot connect to the database. But if > we include the table, we can connect to the database. We think there may be a bug in > Ranger Plugin Service or Ranger. Here are the steps to reproduce it. > 1. create a new user "usertest1" in the database: > {code} > $ psql postgres > psql (8.2.15) > Type "help" for help. > postgres=# CREATE USER usertest1; > NOTICE: resource queue required -- using default resource queue "pg_default" > CREATE ROLE > postgres=# > {code} > 2. add user "usertest1" in pg_hba.conf > {code} > local all usertest1 trust > {code} > 3. set a policy with the database and schema included, with the table excluded > !screenshot-1.png|width=800,height=400! > 4. connect to the database with user "usertest1"; it fails with permission denied > {code} > $ psql postgres -U usertest1 > psql: FATAL: permission denied for database "postgres" > DETAIL: User does not have CONNECT privilege. > {code} > 5. set a policy with the database, schema and table included > !screenshot-2.png|width=800,height=400! > 6. connect to the database with user "usertest1"; it succeeds > {code} > $ psql postgres -U usertest1 > psql (8.2.15) > Type "help" for help. 
> postgres=# > {code} > But if we do not set table as "*", and specify table like "a", we can not > access database either. > !screenshot-3.png|width=800,height=400! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
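The observed behavior suggests that a policy only matches a database-level request such as CONNECT when it explicitly defines every resource level, with "*" at the levels the request does not carry. A small model of that matching symptom (illustrative only, not Ranger's actual policy engine; the dict-based resource shapes are assumptions for illustration):

```python
# Models the symptom reported above: a CONNECT request carries only the
# database level, so a policy must either define table as "*" (match-any,
# including absent) or it fails to match. A policy missing the table level
# entirely, or pinning it to a concrete name like "a", denies CONNECT.
def policy_matches(policy, request):
    """policy/request: dicts with optional 'database', 'schema', 'table'."""
    for level in ("database", "schema", "table"):
        wanted = policy.get(level)
        if wanted == "*":
            continue                # wildcard matches anything, even absent
        if wanted is None:
            return False            # level missing from policy: no match (the bug symptom)
        if request.get(level) != wanted:
            return False            # concrete value must match exactly
    return True

connect_request = {"database": "postgres"}  # CONNECT has no schema/table
print(policy_matches({"database": "postgres", "schema": "*", "table": "*"},
                     connect_request))      # step 5/6: succeeds
print(policy_matches({"database": "postgres", "schema": "*"},
                     connect_request))      # step 3/4: table excluded, denied
print(policy_matches({"database": "postgres", "schema": "*", "table": "a"},
                     connect_request))      # concrete table "a": denied
```

If this model is right, the fix would be for a missing lower level to behave like "*" for requests that do not carry that level.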
[jira] [Assigned] (HAWQ-1332) Can not grant database and schema privileges without table privileges in ranger or ranger plugin service
[ https://issues.apache.org/jira/browse/HAWQ-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma reassigned HAWQ-1332: - Assignee: Alexander Denissov (was: Ed Espino) > Can not grant database and schema privileges without table privileges in > ranger or ranger plugin service > > > Key: HAWQ-1332 > URL: https://issues.apache.org/jira/browse/HAWQ-1332 > Project: Apache HAWQ > Issue Type: Bug > Components: Security >Reporter: Chunling Wang >Assignee: Alexander Denissov > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > We try to grant database connect and schema usage privileges to a non-super > user to connect database. We find that if we set policy with database and > schema included, but with table excluded, we can not connect database. But if > we include table, we can connect to database. We think there may be bug in > Ranger Plugin Service or Ranger. Here are steps to reproduce it. > 1. create a new user "usertest1" in database: > {code} > $ psql postgres > psql (8.2.15) > Type "help" for help. > postgres=# CREATE USER usertest1; > NOTICE: resource queue required -- using default resource queue "pg_default" > CREATE ROLE > postgres=# > {code} > 2. add user "usertest1" in pg_hba.conf > {code} > local all usertest1 trust > {code} > 3. set policy with database and schema included, with table excluded > !screenshot-1.png|width=800,height=400! > 4. connect database with user "usertest1" but failed with permission denied > {code} > $ psql postgres -U usertest1 > psql: FATAL: permission denied for database "postgres" > DETAIL: User does not have CONNECT privilege. > {code} > 5. set policy with database, schema and table included > !screenshot-2.png|width=800,height=400! > 6. connect database with user "usertest1" and succeed > {code} > $ psql postgres -U usertest1 > psql (8.2.15) > Type "help" for help. 
> postgres=# > {code} > But if we do not set table as "*", and specify table like "a", we can not > access database either. > !screenshot-3.png|width=800,height=400! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867342#comment-15867342 ] Lili Ma commented on HAWQ-256: -- [~kdunn926] I took another look at your input. 1) Why do they want to use Ranger? What are the scenario and use cases? Ranger provides the missing (and very important) functionality for synchronizing roles and groups from an identity management provider (like LDAP) into HAWQ. Without this capability, roles must be provisioned manually or something like pg-ldap-sync must be used, neither of which is a very enterprise-friendly or "baked" solution. Actually, I don't think Ranger provides the functionality to sync role/group information into HAWQ. It just syncs that information to itself. We may still need to manage the role information in HAWQ to allow those roles to log in. Alternatively, a thorough solution is for HAWQ not to store any user information, but we may not do that now, given that some objects are not managed by Ranger. Thoughts? > Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: Security >Reporter: Michael Andre Pearce (IG) >Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859265#comment-15859265 ] Lili Ma commented on HAWQ-256: -- [~kdunn926] Thanks a lot! The information you provided is very helpful. Regarding item 9, I wonder whether it would be a little strange to record audit information for the catalog table/owner checks on the Ranger side, given that they are not managed by Ranger. > Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: Security >Reporter: Michael Andre Pearce (IG) >Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HAWQ-1204) Add one option in Ambari to enable user to specify whether they want to enable Ranger for ACL check
[ https://issues.apache.org/jira/browse/HAWQ-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831484#comment-15831484 ] Lili Ma edited comment on HAWQ-1204 at 1/20/17 9:48 AM: We can do it by using the configurable GUC to specify Ranger as the first step. was (Author: lilima): We can do it by using the configurable GUC for specifying Ranger as the first step. > Add one option in Ambari to enable user to specify whether they want to enable > Ranger for ACL check > > > Key: HAWQ-1204 > URL: https://issues.apache.org/jira/browse/HAWQ-1204 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Ambari >Reporter: Lili Ma >Assignee: Alexander Denissov > Fix For: backlog > > > Ambari needs corresponding modifications to enable Ranger in HAWQ. > It also needs to do special processing if Ranger is on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAWQ-1206) Process catalog table ACL on Ranger.
[ https://issues.apache.org/jira/browse/HAWQ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma resolved HAWQ-1206. --- Resolution: Duplicate Fix Version/s: (was: backlog) 2.1.0.0-incubating > Process catalog table ACL on Ranger. > > > Key: HAWQ-1206 > URL: https://issues.apache.org/jira/browse/HAWQ-1206 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Lili Ma >Assignee: Hubert Zhang > Fix For: 2.1.0.0-incubating > > > There are a lot of catalog tables in HAWQ which also need to go through the ACL > check. We need to find out how to process these tables once Ranger is configured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1206) Process catalog table ACL on Ranger.
[ https://issues.apache.org/jira/browse/HAWQ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831469#comment-15831469 ] Lili Ma commented on HAWQ-1206: --- Closing this JIRA, since it duplicates HAWQ-1275. We currently put the catalog ACL check on the HAWQ side, assuming that users mainly require the Ranger feature to manage non-heap tables. > Process catalog table ACL on Ranger. > > > Key: HAWQ-1206 > URL: https://issues.apache.org/jira/browse/HAWQ-1206 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Lili Ma >Assignee: Hubert Zhang > Fix For: backlog > > > There are a lot of catalog tables in HAWQ which also need to go through the ACL > check. We need to find out how to process these tables once Ranger is configured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAWQ-1207) Gpadmin super user processing on ACL
[ https://issues.apache.org/jira/browse/HAWQ-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831463#comment-15831463 ] Lili Ma edited comment on HAWQ-1207 at 1/20/17 9:40 AM: [~thebellhead] I split the stories given that they cover two aspects: catalog tables and the super user. For the super user, HAWQ's behavior without Ranger is that the superuser has all privileges on HAWQ internal tables. We need to limit the super user's behavior when accessing tables created by others. Besides this, there are a lot of superuser-specific behaviors for some objects. Only the superuser has the rights for the following operations: 1. create cast: when the function is NULL 2. create filespace 3. create/remove/alter foreign-data wrapper 4. create function: For an untrusted language, only the superuser can create a function. 5. create/drop procedural language 6. create/drop/alter resource queue 7. create tablespace: This means the privilege to create a tablespace, and only the superuser can do it. But the CREATE privilege for a tablespace means creating database/table/index... in the tablespace, which is different. 8. create external table: Only the super user can create an EXECUTE external web table or create an external table with a file protocol (but in HAWQ 2.0, the file protocol is not supported any more). 9. create operator class 10. copy: Only the superuser can copy to or from a file. And with Ranger, the superuser cannot run copy to or from a file when he doesn't have the select or insert privilege for that table. 11. alter state of system triggers 12. 
some built-in functions, including pg_logdir_ls, pg_ls_dir, pg_read_file, pg_reload_conf, pg_rotate_logfile, pg_signal_backend, pg_start_backup, pg_stat_file, pg_stat_get_activity, pg_stat_get_backend_activity_start, pg_stat_get_backend_activity, pg_stat_get_backend_client_addr, pg_stat_get_backend_client_port, pg_stat_get_backend_start, pg_stat_get_backend_waiting, pg_stop_backup, pg_switch_xlog, pg_stat_reset. For the above operations, we'd rather keep the check on the HAWQ side if there are no other concerns. was (Author: lilima): [~thebellhead] I split the stories given that they are from two aspects: catalog table and super user. For super user, HAWQ behavior without Ranger is that superuser can have all the privileges upon HAWQ internal tables. We need limit the super user behavior for accessing tables create by others. Besides this, there are a lot of super user specific behaviors for some objects. Only superuser has the rights for following operations: 1. create cast: when function is NULL 2. create filespace 3. create/remove/alter foreign-data wrapper 4. create function: For untrusted language, only superuser can create function. 5. create/drop procedural language 6. create/drop/alter resource queue 7. create tablespace: It means the privilege to create tablespace, and only superuser can do. But the CREATE privilege for tablespace means creating database/table/index... in tablespace, which is different. 8. create external table: Only super user can create EXECUTE external web table or create an external table with a file protocol (but in HAWQ 2.0, the file protocol is not supported any more). 9. create operator class 10. copy: Only superuser can copy to or from a file. And in ranger, the superuser can not run copy to or from when he doesn't have the privilege for that table select or insert. 11. alter state of system triggers 12. 
some build in functions, including pg_logdir_ls, pg_ls_dir, pg_read_file, pg_reload_conf, pg_rotate_logfile, pg_signal_backend, pg_start_backup, pg_stat_file, pg_stat_get_activity, pg_stat_get_backend_activity_start, pg_stat_get_backend_activity, pg_stat_get_backend_client_addr, pg_stat_get_backend_client_port, pg_stat_get_backend_start, pg_stat_get_backend_waiting, pg_stop_backup, pg_switch_xlog, pg_stat_reset For above operations, we'd rather keep it checked in HAWQ side, if there is no other concerns. > Gpadmin super user processing on ACL > > > Key: HAWQ-1207 > URL: https://issues.apache.org/jira/browse/HAWQ-1207 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Lili Ma >Assignee: Alexander Denissov > Fix For: backlog > > > Once we specify enable_ranger, we need process gpadmin user privileges. > Ideally, we should also restrict gpadmin behavior since we won't allow > gpadmin to have all control on all user data. > During the init system period, we can let gpadmin has all the privileges on > all the objects. May implement this as seed policy in Ranger plugin side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
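The list above amounts to a routing rule: superuser-only operations keep HAWQ's native check even when Ranger is enabled, while ordinary object access is forwarded to RPS. A hedged sketch of that dispatch (operation names and the two-way split are illustrative, not HAWQ's internal identifiers):

```python
# Sketch of the routing described in the comment above: operations that only
# a superuser may perform stay on HAWQ's native ACL check even when Ranger is
# enabled; regular object access goes to the Ranger Plugin Service (RPS).
SUPERUSER_ONLY_OPS = {
    "CREATE FILESPACE", "CREATE TABLESPACE", "CREATE RESOURCE QUEUE",
    "CREATE FOREIGN-DATA WRAPPER", "ALTER SYSTEM TRIGGERS",
    "COPY TO FILE", "COPY FROM FILE",
}

def acl_check_route(operation, ranger_enabled):
    """Return which checker should handle this operation."""
    if not ranger_enabled or operation in SUPERUSER_ONLY_OPS:
        return "native"   # stays in HAWQ, per the comment above
    return "ranger"       # forwarded to RPS

print(acl_check_route("SELECT", ranger_enabled=True))            # goes to RPS
print(acl_check_route("CREATE FILESPACE", ranger_enabled=True))  # stays native
```

The design choice this models is that Ranger governs user data objects, while cluster-level administrative operations remain a superuser concern inside HAWQ.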
[jira] [Resolved] (HAWQ-1275) Check built-in catalogs, tables and functions in native aclcheck.
[ https://issues.apache.org/jira/browse/HAWQ-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma resolved HAWQ-1275. --- Resolution: Fixed Fix Version/s: (was: backlog) 2.2.0.0-incubating > Check built-in catalogs, tables and functions in native aclcheck. > - > > Key: HAWQ-1275 > URL: https://issues.apache.org/jira/browse/HAWQ-1275 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Hubert Zhang >Assignee: Hubert Zhang > Fix For: 2.2.0.0-incubating > > > We plan to do the privilege check on the HAWQ side for built-in catalogs, tables and > functions. The reasons are twofold: > 1. Ranger mainly manages user data, but built-in catalogs and tables are > not related to user data (note that some of them contain statistics > information about user data, such as the catalog table pg_aoseg_*). > 2. We haven't finished the code to merge all the privilege check requests > into one big request. Without it, queries such as "\d" and "analyze" will lead > to hundreds of RPS requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
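The first reason above implies a simple dispatch: if the relation is a built-in catalog object, check it natively; otherwise ask Ranger. A minimal sketch, assuming built-in objects can be recognized by a name prefix (an assumption for illustration; HAWQ would more likely key off catalog OIDs than names):

```python
# Minimal sketch of the dispatch in HAWQ-1275 (illustrative, not HAWQ code):
# built-in catalogs and functions are checked natively so that a query like
# "\d", which touches many pg_* relations, does not turn each lookup into a
# separate RPS request.
BUILTIN_PREFIXES = ("pg_", "gp_", "information_schema")

def use_native_aclcheck(relation_name):
    """True when the relation is a built-in catalog object."""
    return relation_name.startswith(BUILTIN_PREFIXES)

print(use_native_aclcheck("pg_class"))      # built-in: checked in HAWQ
print(use_native_aclcheck("sales_orders"))  # user table: goes to Ranger
```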
[jira] [Updated] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-256: - Component/s: (was: PXF) > Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: Security >Reporter: Michael Andre Pearce (IG) >Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1275) Check build-in catalogs, tables and functions in native aclcheck.
[ https://issues.apache.org/jira/browse/HAWQ-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825715#comment-15825715 ] Lili Ma commented on HAWQ-1275: --- [~hubertzhang] I think this has already been finished. Please close this JIRA if you have finished it. Thanks > Check build-in catalogs, tables and functions in native aclcheck. > - > > Key: HAWQ-1275 > URL: https://issues.apache.org/jira/browse/HAWQ-1275 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Hubert Zhang >Assignee: Hubert Zhang > Fix For: backlog > > > We plan to do the privilege check on the HAWQ side for built-in catalogs, tables and > functions. The reasons are twofold: > 1. Ranger mainly manages user data, but built-in catalogs and tables are > not related to user data (note that some of them contain statistics > information about user data, such as the catalog table pg_aoseg_*). > 2. We haven't finished merging all the privilege check requests > into one big request. Without it, queries such as "\d" and "analyze" will lead > to hundreds of RPS requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1257) If user doesn't have privileges on certain objects, need return user which specific table he doesn't have right.
Lili Ma created HAWQ-1257: - Summary: If user doesn't have privileges on certain objects, need return user which specific table he doesn't have right. Key: HAWQ-1257 URL: https://issues.apache.org/jira/browse/HAWQ-1257 Project: Apache HAWQ Issue Type: Sub-task Components: Security Reporter: Lili Ma Assignee: Ed Espino Fix For: 2.2.0.0-incubating If a user doesn't have privileges on certain objects, we need to return to the user all the objects he doesn't have rights on, to avoid the user fixing one privilege, then finding another privilege constraint, and then another... which may bother the user a lot. For example: the user didn't have rights on t1 and t2. {code} postgres=> select * from test_sa.t1 left join test_sa.t2 on test_sa.t1.i=test_sa.t2.i; ERROR: permission denied for relation t1 {code} We wish to prompt that the user didn't have rights on t2 as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1257) If user doesn't have privileges on certain objects, need return user which specific table he doesn't have right.
[ https://issues.apache.org/jira/browse/HAWQ-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1257: -- Assignee: Hongxu Ma (was: Ed Espino) > If user doesn't have privileges on certain objects, need return user which > specific table he doesn't have right. > - > > Key: HAWQ-1257 > URL: https://issues.apache.org/jira/browse/HAWQ-1257 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Lili Ma >Assignee: Hongxu Ma > Fix For: 2.2.0.0-incubating > > > If a user doesn't have privileges on certain objects, we need to return to the user all the > objects he doesn't have rights on, to avoid the user fixing one privilege, then > finding another privilege constraint, and then another... which may bother > the user a lot. > For example: > the user didn't have rights on t1 and t2. > {code} > postgres=> select * from test_sa.t1 left join test_sa.t2 on > test_sa.t1.i=test_sa.t2.i; > ERROR: permission denied for relation t1 > {code} > We wish to prompt that the user didn't have rights on t2 as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1003) Implement batched ACL check through Ranger.
[ https://issues.apache.org/jira/browse/HAWQ-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1003: -- Assignee: Hubert Zhang (was: hongwu) > Implement batched ACL check through Ranger. > --- > > Key: HAWQ-1003 > URL: https://issues.apache.org/jira/browse/HAWQ-1003 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Core >Reporter: Lili Ma >Assignee: Hubert Zhang > Fix For: backlog > > > Implement an enhanced HAWQ ACL check through Ranger, which means that if a query > involves several tables, we can combine the multiple table requests together > and send just one REST request to the Ranger REST API server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1256) Enhance libcurl connection to RPS(Ranger Plugin Service), keep it as a long-live connection in session level
[ https://issues.apache.org/jira/browse/HAWQ-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1256: -- Assignee: Xiang Sheng (was: Ed Espino) > Enhance libcurl connection to RPS(Ranger Plugin Service), keep it as a > long-live connection in session level > > > Key: HAWQ-1256 > URL: https://issues.apache.org/jira/browse/HAWQ-1256 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Lili Ma >Assignee: Xiang Sheng > Fix For: 2.2.0.0-incubating > > > The current implementation of calling the REST API uses a local libcurl > handle, which means that for every REST call the handle is > initialized and used, and after the call the handle is finalized. > Establishing the connection consumes extra time; we can reduce this by keeping the > libcurl handle as a long-lived connection. > A better way is to make this libcurl context a global structure: just > initialize it once before the QD calls the REST API, and finalize it before the QD > exits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1256) Enhance libcurl connection to RPS(Ranger Plugin Service), keep it as a long-live connection in session level
Lili Ma created HAWQ-1256: - Summary: Enhance libcurl connection to RPS(Ranger Plugin Service), keep it as a long-live connection in session level Key: HAWQ-1256 URL: https://issues.apache.org/jira/browse/HAWQ-1256 Project: Apache HAWQ Issue Type: Sub-task Components: Security Reporter: Lili Ma Assignee: Ed Espino Fix For: 2.2.0.0-incubating The current implementation of calling the REST API uses a local libcurl handle, which means that for every REST call the handle is initialized and used, and after the call the handle is finalized. Establishing the connection consumes extra time; we can reduce this by keeping the libcurl handle as a long-lived connection. A better way is to make this libcurl context a global structure: just initialize it once before the QD calls the REST API, and finalize it before the QD exits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1003) Implement batched ACL check through Ranger.
[ https://issues.apache.org/jira/browse/HAWQ-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794682#comment-15794682 ] Lili Ma commented on HAWQ-1003: --- A query is usually composed of multiple ACL check requests; for example, if we insert into an empty table, a series of queries for analyze will be generated. If we can assemble all the requests into one, we will reduce the cost added on the Ranger ACL side. > Implement batched ACL check through Ranger. > --- > > Key: HAWQ-1003 > URL: https://issues.apache.org/jira/browse/HAWQ-1003 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Core >Reporter: Lili Ma >Assignee: hongwu > Fix For: backlog > > > Implement an enhanced HAWQ ACL check through Ranger, which means that if a query > involves several tables, we can combine the multiple table requests together > and send just one REST request to the Ranger REST API server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1246) Add generation of RequestID, ClientIP, queryContext(SQL Statement) in HAWQ , and encapsulate these contents to JSON request to RPS
[ https://issues.apache.org/jira/browse/HAWQ-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1246: -- Issue Type: Sub-task (was: Bug) Parent: HAWQ-256 > Add generation of RequestID, ClientIP, queryContext(SQL Statement) in HAWQ , > and encapsulate these contents to JSON request to RPS > -- > > Key: HAWQ-1246 > URL: https://issues.apache.org/jira/browse/HAWQ-1246 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Xiang Sheng >Assignee: Xiang Sheng > Fix For: 2.2.0.0-incubating > > > This information should be generated and encapsulated into the full JSON > request. > Currently the values are hardcoded: > {code} > json_object *jreqid = json_object_new_string("1"); > json_object_object_add(jrequest, "requestId", jreqid); > json_object *jclientip = json_object_new_string("123.0.0.21"); > json_object_object_add(jrequest, "clientIp", jclientip); > json_object *jcontext = json_object_new_string("SELECT * FROM DDD"); > json_object_object_add(jrequest, "context", jcontext); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1246) Add generation of RequestID, ClientIP, queryContext(SQL Statement) in HAWQ , and encapsulate these contents to JSON request to RPS
[ https://issues.apache.org/jira/browse/HAWQ-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794657#comment-15794657 ] Lili Ma commented on HAWQ-1246: --- Current Request ID is managed per session level, and now we print it out in HAWQ master log. ``` 2017-01-03 17:38:34.775632 CST,"malili","postgres",p54608,th2056364032,"[local]",,2017-01-03 17:38:30 CST,3586,con14,cmd2,seg-1,,,x3586,sx1,"LOG","0","Send JSON request to Ranger: { ""requestId"": ""8"", ""user"": ""malili"", ""clientIp"": ""127.0.0.1"", ""context"": ""SELECT d.datname as \""Name\"",\n pg_catalog.pg_get_userbyid(d.datdba) as \""Owner\"",\n pg_catalog.pg_encoding_to_char(d.encoding) as \""Encoding\"",\n pg_catalog.array_to_string(d.datacl, E'\\n') AS \""Access privileges\""\nFROM pg_catalog.pg_database d\nWHERE d.datname <> 'hcatalog'\nORDER BY 1;"", ""access"": [ { ""resource"": { ""database"": ""postgres"", ""schema"": ""pg_catalog"", ""function"": ""pg_encoding_to_char"" }, ""privileges"": [ ""EXECUTE"" ] } ] }",,"SELECT d.datname as ""Name"", pg_catalog.pg_get_userbyid(d.datdba) as ""Owner"", pg_catalog.pg_encoding_to_char(d.encoding) as ""Encoding"", pg_catalog.array_to_string(d.datacl, E'\n') AS ""Access privileges"" FROM pg_catalog.pg_database d WHERE d.datname <> 'hcatalog' ORDER BY 1;",0,,"rangerrest.c",391, ``` Note that it's session level. We can use this information to detect what's a query is composed of. > Add generation of RequestID, ClientIP, queryContext(SQL Statement) in HAWQ , > and encapsulate these contents to JSON request to RPS > -- > > Key: HAWQ-1246 > URL: https://issues.apache.org/jira/browse/HAWQ-1246 > Project: Apache HAWQ > Issue Type: Bug > Components: Security >Reporter: Xiang Sheng >Assignee: Xiang Sheng > Fix For: 2.2.0.0-incubating > > > These informations should be generated and encapsulate them to the full json > request. > Currently they are hardcoded. 
> {code} > json_object *jreqid = json_object_new_string("1"); > json_object_object_add(jrequest, "requestId", jreqid); > json_object *jclientip = json_object_new_string("123.0.0.21"); > json_object_object_add(jrequest, "clientIp", jclientip); > json_object *jcontext = json_object_new_string("SELECT * FROM DDD"); > json_object_object_add(jrequest, "context", jcontext); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1220) Support ranger plugin server HA in hawq side.
[ https://issues.apache.org/jira/browse/HAWQ-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747574#comment-15747574 ] Lili Ma commented on HAWQ-1220: --- I think RPS (Ranger Plugin Service) is independent of the master/standby master, which means the HAWQ master may fail while the RPS on the master host is still alive, and conversely the HAWQ standby master may fail while the RPS on the standby host is still alive, right? In the current implementation, we just include starting RPS in "hawq start master" and "hawq start standby". But whether it is the HAWQ master or the HAWQ standby master, it will check the RPS service on the same host, and on finding a failure it will try to connect to the other RPS, right? Thanks > Support ranger plugin server HA in hawq side. > - > > Key: HAWQ-1220 > URL: https://issues.apache.org/jira/browse/HAWQ-1220 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Hubert Zhang >Assignee: Hubert Zhang > Fix For: backlog > > > RPS will run both at the master and at the standby master. If the connection to the master > RPS fails, we should try to connect to the standby master RPS instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1205) Change hawq start script once finding enable_ranger GUC is on.
[ https://issues.apache.org/jira/browse/HAWQ-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1205: -- Assignee: Lili Ma (was: Lei Chang) > Change hawq start script once finding enable_ranger GUC is on. > -- > > Key: HAWQ-1205 > URL: https://issues.apache.org/jira/browse/HAWQ-1205 > Project: Apache HAWQ > Issue Type: Sub-task > Components: PXF, Security >Reporter: Lili Ma >Assignee: Lili Ma > Fix For: backlog > > > If hawq start finds enable_ranger GUC is on, it needs to start RPS service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1207) Gpadmin super user processing on ACL
Lili Ma created HAWQ-1207: - Summary: Gpadmin super user processing on ACL Key: HAWQ-1207 URL: https://issues.apache.org/jira/browse/HAWQ-1207 Project: Apache HAWQ Issue Type: Sub-task Components: Security Reporter: Lili Ma Assignee: Lei Chang Once we specify enable_ranger, we need to process gpadmin user privileges. Ideally, we should also restrict gpadmin behavior, since we won't allow gpadmin to have full control over all user data. During the init system period, we can let gpadmin have all the privileges on all the objects. We may implement this as a seed policy on the Ranger plugin side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1206) Process catalog table ACL on Ranger
[ https://issues.apache.org/jira/browse/HAWQ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1206: -- Assignee: Lin Wen (was: Lei Chang) > Process catalog table ACL on Ranger > --- > > Key: HAWQ-1206 > URL: https://issues.apache.org/jira/browse/HAWQ-1206 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Lili Ma >Assignee: Lin Wen > Fix For: backlog > > > There are a lot of catalog tables in HAWQ which also need to go through the ACL > check. We need to find out how to process these tables once Ranger is configured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1207) Gpadmin super user processing on ACL
[ https://issues.apache.org/jira/browse/HAWQ-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1207: -- Assignee: Alexander Denissov (was: Lei Chang) > Gpadmin super user processing on ACL > > > Key: HAWQ-1207 > URL: https://issues.apache.org/jira/browse/HAWQ-1207 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Lili Ma >Assignee: Alexander Denissov > Fix For: backlog > > > Once we specify enable_ranger, we need to process gpadmin user privileges. > Ideally, we should also restrict gpadmin behavior, since we won't allow > gpadmin to have full control over all user data. > During the init system period, we can let gpadmin have all the privileges on > all the objects. We may implement this as a seed policy on the Ranger plugin side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1206) Process catalog table ACL on Ranger
Lili Ma created HAWQ-1206: - Summary: Process catalog table ACL on Ranger Key: HAWQ-1206 URL: https://issues.apache.org/jira/browse/HAWQ-1206 Project: Apache HAWQ Issue Type: Sub-task Components: Security Reporter: Lili Ma Assignee: Lei Chang There are a lot of catalog tables in HAWQ which also need to go through the ACL check. We need to find out how to process these tables once Ranger is configured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1001) Implement HAWQ basic user ACL check through Ranger
[ https://issues.apache.org/jira/browse/HAWQ-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731268#comment-15731268 ] Lili Ma commented on HAWQ-1001: --- At this time, the HAWQ ACL should integrate with the Ranger Plugin Service (RPS) to establish a first full cycle. > Implement HAWQ basic user ACL check through Ranger > -- > > Key: HAWQ-1001 > URL: https://issues.apache.org/jira/browse/HAWQ-1001 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Core >Reporter: Lili Ma >Assignee: Hubert Zhang > Fix For: backlog > > > When a user runs a query, HAWQ can connect to Ranger to judge whether the > user has the privilege to do so. > For each object with a unique oid, send one request to Ranger -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1205) Change hawq start script once finding enable_ranger GUC is on.
Lili Ma created HAWQ-1205: - Summary: Change hawq start script once finding enable_ranger GUC is on. Key: HAWQ-1205 URL: https://issues.apache.org/jira/browse/HAWQ-1205 Project: Apache HAWQ Issue Type: Sub-task Reporter: Lili Ma Assignee: Lei Chang If hawq start finds enable_ranger GUC is on, it needs to start RPS service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1204) Add one option in Ambari to enable user to specify whether they want enable Ranger for ACL check
[ https://issues.apache.org/jira/browse/HAWQ-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731262#comment-15731262 ] Lili Ma commented on HAWQ-1204: --- The current life cycle for enabling Ranger processing with Ambari is as follows: 1. use the script to register the JAR / JSON / policies with Ranger (manual, as Ambari cannot ssh into the Ranger host to upload the JAR there) 2. define additional policies in the Ranger UI, if needed; Ranger can talk to HAWQ as HAWQ is already up 3. change the GUC in hawq_site.xml (via Ambari) 4. restart HAWQ (via Ambari) 5. Upon restart, the 'hawq start' command will detect the GUC setting and start up RPS before starting the hawq binary. > Add one option in Ambari to enable user to specify whether they want enable > Ranger for ACL check > > > Key: HAWQ-1204 > URL: https://issues.apache.org/jira/browse/HAWQ-1204 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Ambari >Reporter: Lili Ma >Assignee: Alexander Denissov > Fix For: backlog > > > Ambari needs to make the corresponding modifications to enable Ranger in HAWQ. > Special processing is also needed if Ranger is on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
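Step 3 above flips a GUC in hawq_site.xml via Ambari; a sketch of what that property might look like is below. The property name `enable_ranger` is taken from this thread; verify the exact name and value format against the HAWQ configuration reference before relying on it.

```xml
<property>
    <name>enable_ranger</name>
    <value>true</value>
</property>
```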
[jira] [Assigned] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma reassigned HAWQ-256: Assignee: Lili Ma (was: Alexander Denissov) > Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Michael Andre Pearce (IG) >Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-256: - Assignee: Alexander Denissov (was: Lili Ma) > Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Michael Andre Pearce (IG) >Assignee: Alexander Denissov > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1204) Add one option in Ambari to enable user to specify whether they want enable Ranger for ACL check
Lili Ma created HAWQ-1204: - Summary: Add one option in Ambari to enable user to specify whether they want enable Ranger for ACL check Key: HAWQ-1204 URL: https://issues.apache.org/jira/browse/HAWQ-1204 Project: Apache HAWQ Issue Type: Sub-task Components: Ambari Reporter: Lili Ma Assignee: Alexander Denissov Ambari needs to make the corresponding modifications to enable Ranger in HAWQ. Special processing is also needed if Ranger is on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1203) Implement Ranger Plugin Service which holds HAWQ Ranger Plugin and provide REST Service
[ https://issues.apache.org/jira/browse/HAWQ-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1203: -- Assignee: Alexander Denissov (was: Lei Chang) > Implement Ranger Plugin Service which holds HAWQ Ranger Plugin and provide > REST Service > --- > > Key: HAWQ-1203 > URL: https://issues.apache.org/jira/browse/HAWQ-1203 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Security >Reporter: Lili Ma >Assignee: Alexander Denissov > Fix For: backlog > > > Per the design, we want to create a separate RPS service which hosts the HAWQ Ranger > plugin, is in charge of handling HAWQ ACL requests, and periodically fetches policies from the Ranger server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1203) Implement Ranger Plugin Service which holds HAWQ Ranger Plugin and provide REST Service
Lili Ma created HAWQ-1203: - Summary: Implement Ranger Plugin Service which holds HAWQ Ranger Plugin and provide REST Service Key: HAWQ-1203 URL: https://issues.apache.org/jira/browse/HAWQ-1203 Project: Apache HAWQ Issue Type: Sub-task Components: Security Reporter: Lili Ma Assignee: Lei Chang Per the design, we want to create a separate RPS service which hosts the HAWQ Ranger plugin, is in charge of handling HAWQ ACL requests, and periodically fetches policies from the Ranger server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAWQ-1171) Support upgrade for hawq register.
[ https://issues.apache.org/jira/browse/HAWQ-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710623#comment-15710623 ] Lili Ma edited comment on HAWQ-1171 at 12/1/16 2:36 AM: It aims to provide multiple update functions. But for upgrade from HAWQ 2.0.0 to current version, we only need to upgrade hawq register part. For future releases, we may need upgrade other parts too, so we keep this script name. was (Author: lilima): It aims to provide multiple update functions. But for upgrade from HAWQ 2.0.X to 2.1.0, we only need to upgrade hawq register part. For future releases, we may need upgrade other parts too, so we keep this script name. > Support upgrade for hawq register. > -- > > Key: HAWQ-1171 > URL: https://issues.apache.org/jira/browse/HAWQ-1171 > Project: Apache HAWQ > Issue Type: New Feature > Components: Core >Reporter: Hubert Zhang >Assignee: Hubert Zhang > Fix For: 2.0.1.0-incubating > > > For Hawq register feature, we need to add some build-in functions to support > some catalog changes. This could be done by a hawqupgrade script. > User interface: > Hawq upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1145) After registering a partition table, if we want to insert some data into the table, it fails.
[ https://issues.apache.org/jira/browse/HAWQ-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1145: -- Attachment: dists.dss dbgen [~xunzhang] you can use these two files to generate tpch data. The way to generate lineitem_1g is dbgen -b dists.dss -s 1 -T L >lineitem_1g > After registering a partition table, if we want to insert some data into the > table, it fails. > - > > Key: HAWQ-1145 > URL: https://issues.apache.org/jira/browse/HAWQ-1145 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: Lili Ma >Assignee: Hubert Zhang > Fix For: 2.0.1.0-incubating > > Attachments: dbgen, dists.dss > > > Reproduce Steps: > 1. Create a partition table > {code} > CREATE TABLE parquet_LINEITEM_uncompressed( > L_ORDERKEY INT8, > L_PARTKEY BIGINT, > L_SUPPKEY BIGINT, > L_LINENUMBER BIGINT, > L_QUANTITY decimal, > L_EXTENDEDPRICE decimal, > L_DISCOUNT decimal, > L_TAX decimal, > L_RETURNFLAG CHAR(1), > L_LINESTATUS CHAR(1), > L_SHIPDATE date, >
[jira] [Updated] (HAWQ-1113) In force mode, hawq register error when files in yaml is disordered
[ https://issues.apache.org/jira/browse/HAWQ-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1113: -- Assignee: Chunling Wang (was: Lei Chang) > In force mode, hawq register error when files in yaml is disordered > --- > > Key: HAWQ-1113 > URL: https://issues.apache.org/jira/browse/HAWQ-1113 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: Chunling Wang >Assignee: Chunling Wang > > In force mode, hawq register errors when the files in the yaml are disordered. For > example, the file order in the yaml is as follows: > {code} > Files: > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/2 > size: 250 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/4 > size: 250 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/5 > size: 258 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/6 > size: 270 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/3 > size: 258 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW2@/1 > size: 228 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/2 > size: 215 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/3 > size: 215 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/4 > size: 220 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/1 > size: 254 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/6 > size: 215 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/5 > size: 210 > {code} > After hawq register succeeds, we select data from the table and get the error: > {code} > ERROR: hdfs file length does not equal to metadata logic length! > (cdbdatalocality.c:1102) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1113) In force mode, hawq register error when files in yaml is disordered
[ https://issues.apache.org/jira/browse/HAWQ-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1113: -- Affects Version/s: 2.0.1.0-incubating > In force mode, hawq register error when files in yaml is disordered > --- > > Key: HAWQ-1113 > URL: https://issues.apache.org/jira/browse/HAWQ-1113 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: Chunling Wang >Assignee: Chunling Wang > > In force mode, hawq register error when files in yaml is in disordered. For > example, the files order in yaml is as following: > {code} > Files: > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/2 > size: 250 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/4 > size: 250 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/5 > size: 258 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/6 > size: 270 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/3 > size: 258 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW2@/1 > size: 228 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/2 > size: 215 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/3 > size: 215 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/4 > size: 220 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_OLD@/1 > size: 254 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/6 > size: 215 > - path: /hawq_default/16385/@DATABASE_OID@/@TABLE_OID_NEW@/5 > size: 210 > {code} > After hawq register success, we select data from table and get the error: > {code} > ERROR: hdfs file length does not equal to metadata logic length! > (cdbdatalocality.c:1102) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAWQ-1035) support partition table register
[ https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma resolved HAWQ-1035. --- Resolution: Fixed > support partition table register > > > Key: HAWQ-1035 > URL: https://issues.apache.org/jira/browse/HAWQ-1035 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: Chunling Wang > Fix For: 2.0.1.0-incubating > > > Support partition table register, limited to 1-level partition tables, since > hawq extract only supports 1-level partition tables. > Expected behavior: > 1. Create a partition table in HAWQ, then extract the information out to a .yml > file > 2. Call hawq register, specifying the extracted .yml file and a new table name; the > files should be registered into the new table. > The work to implement partition table register can be broken down as follows: > 1. modify the .yml configuration file parsing function, adding content for partition > tables. > 2. construct the partition table DDL according to the .yml configuration file > 3. map sub-partition table names to the table list in the .yml configuration file > 4. register the sub-partition tables one by one -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-991) "HAWQ register" could register tables according to .yml configuration file
[ https://issues.apache.org/jira/browse/HAWQ-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-991: - Component/s: (was: External Tables) > "HAWQ register" could register tables according to .yml configuration file > -- > > Key: HAWQ-991 > URL: https://issues.apache.org/jira/browse/HAWQ-991 > Project: Apache HAWQ > Issue Type: New Feature > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: hongwu >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > > Scenario: > 1. For cluster Disaster Recovery. Two clusters co-exist, periodically import > data from Cluster A to Cluster B. Need Register data to Cluster B. > 2. For the rollback of table. Do checkpoints somewhere, and need to rollback > to previous checkpoint. > Description: > Register according to .yml configuration file. > hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c > config] [--force][--repair] > Behaviors: > 1. If table doesn't exist, will automatically create the table and register > the files in .yml configuration file. Will use the filesize specified in .yml > to update the catalog table. > 2. If table already exist, and neither --force nor --repair configured. Do > not create any table, and directly register the files specified in .yml file > to the table. Note that if the file is under table directory in HDFS, will > throw error, say, to-be-registered files should not under the table path. > 3. If table already exist, and --force is specified. Will clear all the > catalog contents in pg_aoseg.pg_paqseg_$relid while keep the files on HDFS, > and then re-register all the files to the table. This is for scenario 2. > 4. If table already exist, and --repair is specified. Will change both file > folder and catalog table pg_aoseg.pg_paqseg_$relid to the state which .yml > file configures. Note may some new generated files since the checkpoint may > be deleted here. 
Also note that all the files in the .yml file should be under > the table folder on HDFS. Limitation: cases of hash table > redistribution, table truncate and table drop are not supported. This is for the table rollback scenario. > Requirements: > 1. The to-be-registered file path has to be colocated with HAWQ in the same HDFS > cluster. > 2. If the to-be-registered table is a hash table, the registered file number should be > one or a multiple of the hash table bucket number. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
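The hash-table requirement above (the registered file number must be one or a multiple of the table's bucket number) can be sketched as follows. This is illustrative Python, not HAWQ source; the function name is hypothetical:

```python
def valid_file_count_for_hash_table(file_count: int, bucket_number: int) -> bool:
    """Return True if file_count is a positive multiple of bucket_number,
    i.e. one times, two times, ... the bucket number."""
    if bucket_number <= 0 or file_count <= 0:
        return False
    return file_count % bucket_number == 0
```

For example, with a bucket number of 6, registering 6 or 12 files is acceptable, while 7 files should be rejected.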
[jira] [Updated] (HAWQ-1091) HAWQ InputFormat Bugs
[ https://issues.apache.org/jira/browse/HAWQ-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1091: -- Component/s: (was: Command Line Tools) Storage > HAWQ InputFormat Bugs > - > > Key: HAWQ-1091 > URL: https://issues.apache.org/jira/browse/HAWQ-1091 > Project: Apache HAWQ > Issue Type: Bug > Components: Storage >Reporter: hongwu >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > > "TPCHLocalTester.java" and "HAWQInputFormatPerformanceTest_TPCH.java" > use the "WHERE content>=0" filter, which is an obsolete condition from an older version of HAWQ. > The dbgen binary, which the generate_load_tpch.pl script needs to generate data > for running the mapreduce test cases, is not included in the hawq repo, so we should > disable these cases. > There is also a bug when the size in the extracted yaml file is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1145) After registering a partition table, if we want to insert some data into the table, it fails.
[ https://issues.apache.org/jira/browse/HAWQ-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1145: -- Description: Reproduce Steps: 1. Create a partition table {code} CREATE TABLE parquet_LINEITEM_uncompressed( L_ORDERKEY INT8, L_PARTKEY BIGINT, L_SUPPKEY BIGINT, L_LINENUMBER BIGINT, L_QUANTITY decimal, L_EXTENDEDPRICE decimal, L_DISCOUNT decimal, L_TAX decimal, L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE date, L_COMMITDATE date, L_RECEIPTDATE date, L_SHIPINSTRUCT CHAR(25),
[jira] [Updated] (HAWQ-1145) After registering a partition table, if we want to insert some data into the table, it fails.
[ https://issues.apache.org/jira/browse/HAWQ-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1145: -- Assignee: Hubert Zhang (was: Lei Chang) > After registering a partition table, if we want to insert some data into the > table, it fails. > - > > Key: HAWQ-1145 > URL: https://issues.apache.org/jira/browse/HAWQ-1145 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Reporter: Lili Ma >Assignee: Hubert Zhang > Fix For: 2.0.1.0-incubating > > > Reproduce Steps: > 1. Create a partition table > CREATE TABLE parquet_LINEITEM_uncompressed( > L_ORDERKEY INT8, > L_PARTKEY BIGINT, > L_SUPPKEY BIGINT, > L_LINENUMBER BIGINT, > L_QUANTITY decimal, > L_EXTENDEDPRICE decimal, > L_DISCOUNT decimal, > L_TAX decimal, > L_RETURNFLAG CHAR(1), > L_LINESTATUS CHAR(1), > L_SHIPDATE date, > L_COMMITDATE date, >
[jira] [Updated] (HAWQ-1144) Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out.
[ https://issues.apache.org/jira/browse/HAWQ-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1144: -- Description: Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out. Reproduce Steps: 1. Create a one-level partition table {code} create table parquet_wt (id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int ) WITH (appendonly=true, orientation=parquet) distributed randomly Partition by range(a1) (start(1) end(5000) every(1000) ); {code} 2. insert some data into this table {code} insert into parquet_wt (a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock between Republicans and Democrats over how best to reduce the U.S. deficit, and over what period, has blocked an agreement to allow the raising of the $14.3 trillion debt ceiling','2001-12-24 02:26:11','U.S. House of Representatives Speaker John Boehner, the top Republican in Congress who has put forward a deficit reduction plan to be voted on later on Thursday said he had no control over whether his bill would avert a credit downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled House is tentatively scheduled to vote on Boehner proposal this afternoon at around 6 p.m. EDT (2200 GMT). 
The main Republican vote counter in the House, Kevin McCarthy, would not say if there were enough votes to pass the bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending cuts in exchange for raising the nations $14.3 trillion debt limit is not perfect but is as large a step that a divided government can take that is doable and signable by President Barack Obama.The Ohio Republican says the measure is an honest and sincere attempt at compromise and was negotiated with Democrats last weekend and that passing it would end the ongoing debt crisis. The plan blends $900 billion-plus in spending cuts with a companion increase in the nations borrowing cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 01:51:15+1359','2001-12-13 01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans had been working throughout the day Thursday to lock down support for their plan to raise the nations debt ceiling, even as Senate Democrats vowed to swiftly kill it if passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction today. Labels will carry new dosing instructions this fall.The company says it will cut the maximum dosage of Regular Strength Tylenol and other acetaminophen-containing products in 2012.Acetaminophen is safe when used as directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter medical affairs. But, when too much is taken, it can cause liver damage.The action is intended to cut the risk of such accidental overdoses, the company says in a news release.','1','0',12,23); {code} 3. extract the metadata out for the table {code} hawq extract -d postgres -o ~/parquet.yaml parquet_wt {code} 4. 
create a two-level partition table {code} CREATE TABLE parquet_wt_subpartgzip2 (id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int ) WITH (appendonly=true, orientation=parquet) distributed randomly Partition by range(a1) Subpartition by list(a2) subpartition template ( default subpartition df_sp, subpartition sp1 values('M') , subpartition sp2 values('F')
[jira] [Created] (HAWQ-1145) After registering a partition table, if we want to insert some data into the table, it fails.
Lili Ma created HAWQ-1145: - Summary: After registering a partition table, if we want to insert some data into the table, it fails. Key: HAWQ-1145 URL: https://issues.apache.org/jira/browse/HAWQ-1145 Project: Apache HAWQ Issue Type: Bug Components: Command Line Tools Reporter: Lili Ma Assignee: Lei Chang Fix For: 2.0.1.0-incubating Reproduce Steps: 1. Create a partition table CREATE TABLE parquet_LINEITEM_uncompressed( L_ORDERKEY INT8, L_PARTKEY BIGINT, L_SUPPKEY BIGINT, L_LINENUMBER BIGINT, L_QUANTITY decimal, L_EXTENDEDPRICE decimal, L_DISCOUNT decimal, L_TAX decimal, L_RETURNFLAG CHAR(1), L_LINESTATUS CHAR(1), L_SHIPDATE date, L_COMMITDATE date, L_RECEIPTDATE date,
[jira] [Updated] (HAWQ-1144) Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out.
[ https://issues.apache.org/jira/browse/HAWQ-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1144: -- Assignee: Lin Wen (was: Lei Chang) Description: Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out. Reproduce Steps: 1. Create a one-level partition table {code} create table parquet_wt (id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int ) WITH (appendonly=true, orientation=parquet) distributed randomly Partition by range(a1) (start(1) end(5000) every(1000) ); {code} 2. insert some data into this table ``` insert into parquet_wt (a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock between Republicans and Democrats over how best to reduce the U.S. deficit, and over what period, has blocked an agreement to allow the raising of the $14.3 trillion debt ceiling','2001-12-24 02:26:11','U.S. 
House of Representatives Speaker John Boehner, the top Republican in Congress who has put forward a deficit reduction plan to be voted on later on Thursday said he had no control over whether his bill would avert a credit downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled House is tentatively scheduled to vote on Boehner proposal this afternoon at around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, Kevin McCarthy, would not say if there were enough votes to pass the bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending cuts in exchange for raising the nations $14.3 trillion debt limit is not perfect but is as large a step that a divided government can take that is doable and signable by President Barack Obama.The Ohio Republican says the measure is an honest and sincere attempt at compromise and was negotiated with Democrats last weekend and that passing it would end the ongoing debt crisis. The plan blends $900 billion-plus in spending cuts with a companion increase in the nations borrowing cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 01:51:15+1359','2001-12-13 01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans had been working throughout the day Thursday to lock down support for their plan to raise the nations debt ceiling, even as Senate Democrats vowed to swiftly kill it if passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction today. Labels will carry new dosing instructions this fall.The company says it will cut the maximum dosage of Regular Strength Tylenol and other acetaminophen-containing products in 2012.Acetaminophen is safe when used as directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter medical affairs. 
But, when too much is taken, it can cause liver damage.The action is intended to cut the risk of such accidental overdoses, the company says in a news release.','1','0',12,23); ``` 3. extract the metadata out for the table ``` hawq extract -d postgres -o ~/parquet.yaml parquet_wt ``` 4. create a two-level partition table ``` CREATE TABLE parquet_wt_subpartgzip2 (id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int ) WITH (appendonly=true, orientation=parquet) distributed randomly Partition by range(a1) Subpartition by list(a2) subpartition template ( default subpartition df_sp, subpartition sp1 values('M') , subpartition sp2 values('F')
[jira] [Created] (HAWQ-1144) Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out.
Lili Ma created HAWQ-1144: - Summary: Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out. Key: HAWQ-1144 URL: https://issues.apache.org/jira/browse/HAWQ-1144 Project: Apache HAWQ Issue Type: Bug Components: Command Line Tools Reporter: Lili Ma Assignee: Lei Chang Fix For: 2.0.1.0-incubating Register into a 2-level partition table, hawq register didn't throw error, and indicates that hawq register succeed, but no data can be selected out. Reproduce Steps: 1. Create a one-level partition table ``` create table parquet_wt (id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int ) WITH (appendonly=true, orientation=parquet) distributed randomly Partition by range(a1) (start(1) end(5000) every(1000) ); ``` 2. insert some data into this table ``` insert into parquet_wt (a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23,a24,a25,a26,a27,a28,a29,a30,a31,a32,a33,a34,a35,a36,a37,a38,a39,a40,a41,a42) values(generate_series(1,20),'M',2011,'t','a','This is news of today: Deadlock between Republicans and Democrats over how best to reduce the U.S. deficit, and over what period, has blocked an agreement to allow the raising of the $14.3 trillion debt ceiling','2001-12-24 02:26:11','U.S. 
House of Representatives Speaker John Boehner, the top Republican in Congress who has put forward a deficit reduction plan to be voted on later on Thursday said he had no control over whether his bill would avert a credit downgrade.',generate_series(2490,2505),'2011-10-11','The Republican-controlled House is tentatively scheduled to vote on Boehner proposal this afternoon at around 6 p.m. EDT (2200 GMT). The main Republican vote counter in the House, Kevin McCarthy, would not say if there were enough votes to pass the bill.','WASHINGTON:House Speaker John Boehner says his plan mixing spending cuts in exchange for raising the nations $14.3 trillion debt limit is not perfect but is as large a step that a divided government can take that is doable and signable by President Barack Obama.The Ohio Republican says the measure is an honest and sincere attempt at compromise and was negotiated with Democrats last weekend and that passing it would end the ongoing debt crisis. The plan blends $900 billion-plus in spending cuts with a companion increase in the nations borrowing cap.','1234.56',323453,generate_series(3452,3462),7845,'0011','2005-07-16 01:51:15+1359','2001-12-13 01:51:15','((1,2),(0,3),(2,1))','((2,3)(4,5))','08:00:2b:01:02:03','1-2','Republicans had been working throughout the day Thursday to lock down support for their plan to raise the nations debt ceiling, even as Senate Democrats vowed to swiftly kill it if passed.','((2,3)(4,5))','(6,7)',11.222,'((4,5),7)',32,3214,'(1,0,2,3)','2010-02-21',43564,'$1,000.00','192.168.1','126.1.3.4','12:30:45','Johnson & Johnsons McNeil Consumer Healthcare announced the voluntary dosage reduction today. Labels will carry new dosing instructions this fall.The company says it will cut the maximum dosage of Regular Strength Tylenol and other acetaminophen-containing products in 2012.Acetaminophen is safe when used as directed, says Edwin Kuffner, MD, McNeil vice president of over-the-counter medical affairs. 
But, when too much is taken, it can cause liver damage.The action is intended to cut the risk of such accidental overdoses, the company says in a news release.','1','0',12,23); ``` 3. extract the metadata out for the table ``` hawq extract -d postgres -o ~/parquet.yaml parquet_wt ``` 4. create a two-level partition table ``` CREATE TABLE parquet_wt_subpartgzip2 (id SERIAL,a1 int,a2 char(5),a3 numeric,a4 boolean DEFAULT false ,a5 char DEFAULT 'd',a6 text,a7 timestamp,a8 character varying(705),a9 bigint,a10 date,a11 varchar(600),a12 text,a13 decimal,a14 real,a15 bigint,a16 int4 ,a17 bytea,a18 timestamp with time zone,a19 timetz,a20 path,a21 box,a22 macaddr,a23 interval,a24 character varying(800),a25 lseg,a26 point,a27 double precision,a28 circle,a29 int4,a30 numeric(8),a31 polygon,a32 date,a33 real,a34 money,a35 cidr,a36 inet,a37 time,a38 text,a39 bit,a40 bit varying(5),a41 smallint,a42 int )
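Since hawq extract only supports 1-level partition tables, hawq register could fail fast on a 2-level configuration instead of silently "succeeding" with no selectable data. A minimal sketch, assuming a hypothetical partition_depth field in the parsed yaml configuration:

```python
def check_partition_depth(yml: dict, max_supported: int = 1) -> None:
    """Reject configurations whose partition depth exceeds what is supported.

    The "partition_depth" key is hypothetical, used here only to illustrate
    the guard; it is not the actual hawq extract yaml schema.
    """
    depth = yml.get("partition_depth", 0)
    if depth > max_supported:
        raise ValueError(
            "hawq register does not support %d-level partition tables" % depth)
```

With such a guard, step 5 of the reproduce sequence would report an error instead of registering files that can never be read back.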
[jira] [Updated] (HAWQ-1035) support partition table register
[ https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1035: -- Assignee: Chunling Wang (was: Hubert Zhang) > support partition table register > > > Key: HAWQ-1035 > URL: https://issues.apache.org/jira/browse/HAWQ-1035 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: Chunling Wang > Fix For: 2.0.1.0-incubating > > > Support partition table register, limited to 1-level partition tables, since > hawq extract only supports 1-level partition tables. > Expected behavior: > 1. Create a partition table in HAWQ, then extract the information out to a .yml > file. > 2. Call hawq register with the extracted .yml file and a new table name; the > files should be registered into the new table. > The work to implement partition table register can be broken down as follows: > 1. modify the .yml configuration file parsing function, adding content for partition > tables > 2. construct the partition table DDL according to the .yml configuration file > 3. map each sub-partition table name to its file list in the .yml configuration file > 4. register the sub-partition tables one by one -- This message was sent by Atlassian JIRA (v6.3.4#6332)
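Step 3 of the breakdown above (mapping sub-partition table names to their file lists) can be sketched like this. The dictionary layout is illustrative only and does not claim to match the actual hawq extract yaml schema:

```python
def map_partition_files(yml: dict) -> dict:
    """Build {sub-partition table name: [HDFS file paths]} from a parsed
    configuration. "Partitions", "Name" and "Files" are hypothetical keys."""
    mapping = {}
    for part in yml.get("Partitions", []):
        mapping[part["Name"]] = [f["path"] for f in part.get("Files", [])]
    return mapping
```

Step 4 would then iterate over this mapping and register each sub-partition table's files one by one.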
[jira] [Comment Edited] (HAWQ-1034) add --repair option for hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15624133#comment-15624133 ] Lili Ma edited comment on HAWQ-1034 at 11/1/16 2:44 AM: Repair mode can be thought of as a particular case of force mode. 1) Force mode registers the files according to the yaml configuration file, erases all the records in the catalog (pg_aoseg.pg_aoseg(paqseg)_$relid) and re-inserts them. It requires that all HDFS files for the table be included in the yaml configuration file. 2) Repair mode also registers files according to the yaml configuration file, erases the catalog records and re-inserts them, but it doesn't require that all the HDFS files for the table be included in the yaml configuration file: it directly deletes those files which are under the table directory but not included in the yaml configuration file. Since repair mode may directly delete HDFS files, a user who runs repair mode by mistake may have his/her data deleted, so it brings some risk. We can instead allow users to use force mode, and throw an error for files under the directory but not included in the yaml configuration file. If users consider the files unnecessary, they can delete the files themselves. The workaround for supporting repair mode using the --force option: 1) If no files have been added since the last checkpoint at which the yaml configuration file was generated, force mode can directly handle it. 2) If some files have been added since the last checkpoint and the user does want to delete them, force mode can output the file information so that users can delete those files themselves and then run register in force mode again. Since we can use force mode to implement the repair feature, we will remove the existing code for repair mode and close this JIRA. Thanks was (Author: lilima): Repair mode can be thought of particular case of force mode.
1) Force mode registers the files according to yaml configuration file, erase all the records in catalog (pg_aoseg.pg_aoseg(paqseg)_$relid) and re-implement catalog insert. It requires HDFS files for the table be included in yaml configuation file. 2) Repair mode also registers files according to yaml configuration file, erase the catalog records and re-insert. But it doesn't require all the HDFS files for the table be included in yaml configuration file. It will directly delete those files which are under the table directory but not included in yaml configuration file. I'm a little concerned about directly deleting HDFS files, say, if user uses repair mode by mistake, his/her data may be deleted. So, what if we just allow them to use force mode, and throw error for files under the directory but not included in yaml configuration file. If user does think the files are unnecessary, he/she can delete the files by himself/herself. The workaround for supporting repair mode use --force option: 1) If there is no added files since last checkpoint where the yaml configuration file is generated, force mode can directly handle it. 2) If there are some added files since last checkpoint which the user does want to delete, we can output those file information in force mode so that users can delete those files by themselves and then do register force mode again. Since we can use force mode to implement repair feature, we will remove existing code for repair mode and close this JIRA. Thanks > add --repair option for hawq register > - > > Key: HAWQ-1034 > URL: https://issues.apache.org/jira/browse/HAWQ-1034 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: Lili Ma >Assignee: Chunling Wang > Fix For: 2.0.1.0-incubating > > > add --repair option for hawq register > Will change both file folder and catalog table pg_aoseg.pg_paqseg_$relid to > the state which .yml file configures. 
Note that some files newly generated since > the checkpoint may be deleted here. Also note that all the files in the .yml file > should be under the table folder on HDFS. Limitation: cases of hash table > redistribution, table truncate and table drop are not supported. This is for the > table rollback scenario: take checkpoints somewhere, and roll back to a > previous checkpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAWQ-1034) add --repair option for hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma resolved HAWQ-1034. --- Resolution: Done > add --repair option for hawq register > - > > Key: HAWQ-1034 > URL: https://issues.apache.org/jira/browse/HAWQ-1034 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: Lili Ma >Assignee: Chunling Wang > Fix For: 2.0.1.0-incubating > > > add --repair option for hawq register > Will change both file folder and catalog table pg_aoseg.pg_paqseg_$relid to > the state which .yml file configures. Note may some new generated files since > the checkpoint may be deleted here. Also note the all the files in .yml file > should all under the table folder on HDFS. Limitation: Do not support cases > for hash table redistribution, table truncate and table drop. This is for > scenario rollback of table: Do checkpoints somewhere, and need to rollback to > previous checkpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1034) add --repair option for hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486031#comment-15624133 ] Lili Ma commented on HAWQ-1034: --- Repair mode can be thought of as a particular case of force mode. 1) Force mode registers the files according to the yaml configuration file, erases all the records in the catalog (pg_aoseg.pg_aoseg(paqseg)_$relid) and re-inserts them. It requires that all HDFS files for the table be included in the yaml configuration file. 2) Repair mode also registers files according to the yaml configuration file, erases the catalog records and re-inserts them, but it doesn't require that all the HDFS files for the table be included in the yaml configuration file: it directly deletes those files which are under the table directory but not included in the yaml configuration file. I'm a little concerned about directly deleting HDFS files: if a user runs repair mode by mistake, his/her data may be deleted. So, what if we just allow users to use force mode, and throw an error for files under the directory but not included in the yaml configuration file? If users consider the files unnecessary, they can delete the files themselves. The workaround for supporting repair mode using the --force option: 1) If no files have been added since the last checkpoint at which the yaml configuration file was generated, force mode can directly handle it. 2) If some files have been added since the last checkpoint and the user does want to delete them, force mode can output the file information so that users can delete those files themselves and then run register in force mode again. Since we can use force mode to implement the repair feature, we will remove the existing code for repair mode and close this JIRA.
Thanks > add --repair option for hawq register > - > > Key: HAWQ-1034 > URL: https://issues.apache.org/jira/browse/HAWQ-1034 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: Lili Ma >Assignee: Chunling Wang > Fix For: 2.0.1.0-incubating > > > add --repair option for hawq register > Will change both file folder and catalog table pg_aoseg.pg_paqseg_$relid to > the state which .yml file configures. Note may some new generated files since > the checkpoint may be deleted here. Also note the all the files in .yml file > should all under the table folder on HDFS. Limitation: Do not support cases > for hash table redistribution, table truncate and table drop. This is for > scenario rollback of table: Do checkpoints somewhere, and need to rollback to > previous checkpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
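The force-mode behavior agreed on above (erase and re-insert the catalog records, but error out on stray files rather than deleting them as repair mode would) can be sketched as follows. This is illustrative Python, not HAWQ source; the function and error text are hypothetical:

```python
def check_force_register(files_on_hdfs: set, files_in_yaml: set) -> list:
    """Return the file list to re-register under force mode, or raise if
    files exist under the table directory that the yaml does not cover.

    Deleting the stray files (the repair-mode behavior) is deliberately
    avoided; the user is expected to remove them manually and retry.
    """
    stray = files_on_hdfs - files_in_yaml
    if stray:
        raise ValueError(
            "files under table directory not in yaml configuration: %s"
            % sorted(stray))
    # force mode: old catalog records are erased, then every yaml file
    # is re-registered
    return sorted(files_in_yaml)
```

This matches workaround 2): when stray files exist, the error output tells users which files to delete before running register --force again.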
[jira] [Updated] (HAWQ-1104) Add tupcount, varblockcount and eofuncompressed value in hawq extract yaml configuration, also add implementation in hawq register to recognize these values
[ https://issues.apache.org/jira/browse/HAWQ-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1104: -- Assignee: hongwu (was: Lei Chang) > Add tupcount, varblockcount and eofuncompressed value in hawq extract yaml > configuration, also add implementation in hawq register to recognize these > values > -- > > Key: HAWQ-1104 > URL: https://issues.apache.org/jira/browse/HAWQ-1104 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > > Add tupcount, varblockcount and eofuncompressed value in hawq extract yaml > configuration, and also add implementation in hawq register to recognize > these values so the information in catalog table pg_aoseg.pg_aoseg_$relid or > pg_aoseg.pg_paqseg_$relid can become correct. > After the work, the information in catalog table will become correct if we > register table according to the yaml configuration file which is generated by > another table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1104) Add tupcount, varblockcount and eofuncompressed value in hawq extract yaml configuration, also add implementation in hawq register to recognize these values
Lili Ma created HAWQ-1104: - Summary: Add tupcount, varblockcount and eofuncompressed value in hawq extract yaml configuration, also add implementation in hawq register to recognize these values Key: HAWQ-1104 URL: https://issues.apache.org/jira/browse/HAWQ-1104 Project: Apache HAWQ Issue Type: Sub-task Components: Command Line Tools Reporter: Lili Ma Assignee: Lei Chang Fix For: 2.0.1.0-incubating Add tupcount, varblockcount and eofuncompressed value in hawq extract yaml configuration, and also add implementation in hawq register to recognize these values so the information in catalog table pg_aoseg.pg_aoseg_$relid or pg_aoseg.pg_paqseg_$relid can become correct. After the work, the information in catalog table will become correct if we register table according to the yaml configuration file which is generated by another table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
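Carrying tupcount, varblockcount and eofuncompressed from an extracted yaml entry into the catalog could look roughly like this. The dict keys mirror the field names named in the issue, but the row layout and "size" key are hypothetical, not the actual pg_aoseg schema:

```python
def catalog_row_from_yaml(segno: int, entry: dict) -> tuple:
    """Assemble an illustrative pg_aoseg-style row (segno, eof, tupcount,
    varblockcount, eofuncompressed) from one yaml file entry."""
    return (
        segno,
        entry["size"],             # eof: on-disk file size
        entry["tupcount"],         # number of tuples in the segment file
        entry["varblockcount"],    # number of varblocks
        entry["eofuncompressed"],  # logical size before compression
    )
```

With these fields populated, the catalog no longer holds zeros for a table registered from another table's yaml configuration.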
[jira] [Created] (HAWQ-1061) Improve hawq register for bugs already found
Lili Ma created HAWQ-1061: - Summary: Improve hawq register for bugs already found Key: HAWQ-1061 URL: https://issues.apache.org/jira/browse/HAWQ-1061 Project: Apache HAWQ Issue Type: Sub-task Components: Command Line Tools Reporter: Lili Ma Assignee: Lei Chang Fix the bugs found by the verification process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1050) hawq register help can not return correct result indicating the help information
[ https://issues.apache.org/jira/browse/HAWQ-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1050: -- Issue Type: Sub-task (was: Bug) Parent: HAWQ-991 > hawq register help can not return correct result indicating the help > information > > > Key: HAWQ-1050 > URL: https://issues.apache.org/jira/browse/HAWQ-1050 > Project: Apache HAWQ > Issue Type: Sub-task >Reporter: Lili Ma >Assignee: Lei Chang > > hawq register help can not return correct result indicating the help > information. > should keep help as a keyword and return same results as hawq register --help. > {code} > malilis-MacBook-Pro:~ malili$ hawq register help > 20160914:09:56:37:007364 > hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Usage: hadoop [--config > confdir] COMMAND >where COMMAND is one of: > fs run a generic filesystem user client > version print the version > jar run a jar file > checknative [-a|-h] check native hadoop and compression libraries > availability > distcp copy file or directories recursively > archive -archiveName NAME -p * create a hadoop > archive > classpathprints the class path needed to get the > credential interact with credential providers >Hadoop jar and the required libraries > daemonlogget/set the log level for each daemon > traceview and modify Hadoop tracing settings > or > CLASSNAMErun the class named CLASSNAME > Most commands print help when invoked w/o parameters. > Traceback (most recent call last): > File "/usr/local/hawq/bin/hawqregister", line 398, in > check_hash_type(dburl, tablename) # Usage1 only support randomly > distributed table > File "/usr/local/hawq/bin/hawqregister", line 197, in check_hash_type > logger.error('Table not found in table gp_distribution_policy.' % > tablename) > TypeError: not all arguments converted during string formatting > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1050) hawq register help can not return correct result indicating the help information
Lili Ma created HAWQ-1050: - Summary: hawq register help can not return correct result indicating the help information Key: HAWQ-1050 URL: https://issues.apache.org/jira/browse/HAWQ-1050 Project: Apache HAWQ Issue Type: Bug Reporter: Lili Ma Assignee: Lei Chang hawq register help can not return correct result indicating the help information. should keep help as a keyword and return the same results as hawq register --help. {code} malilis-MacBook-Pro:~ malili$ hawq register help 20160914:09:56:37:007364 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Usage: hadoop [--config confdir] COMMAND where COMMAND is one of: fs run a generic filesystem user client version print the version jar run a jar file checknative [-a|-h] check native hadoop and compression libraries availability distcp copy file or directories recursively archive -archiveName NAME -p * create a hadoop archive classpath prints the class path needed to get the Hadoop jar and the required libraries credential interact with credential providers daemonlog get/set the log level for each daemon trace view and modify Hadoop tracing settings or CLASSNAME run the class named CLASSNAME Most commands print help when invoked w/o parameters. Traceback (most recent call last): File "/usr/local/hawq/bin/hawqregister", line 398, in check_hash_type(dburl, tablename) # Usage1 only support randomly distributed table File "/usr/local/hawq/bin/hawqregister", line 197, in check_hash_type logger.error('Table not found in table gp_distribution_policy.' % tablename) TypeError: not all arguments converted during string formatting {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
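The TypeError at the bottom of the traceback is a plain Python string-formatting bug in hawqregister: the format string passed to logger.error has no %s placeholder, so applying the % operator fails before anything is logged. A minimal standalone reproduction and fix (the tablename value here is illustrative):

```python
tablename = "tableA"

# Buggy form, as at hawqregister line 197: the format string has no
# %s placeholder, so the % operator raises TypeError instead of
# producing the error message.
try:
    'Table not found in table gp_distribution_policy.' % tablename
except TypeError as e:
    print(e)  # not all arguments converted during string formatting

# Fixed form: include a placeholder for the table name.
print('Table %s not found in table gp_distribution_policy.' % tablename)
```

With the placeholder in place, the logger call formats cleanly and `hawq register help` can fail with the intended error message instead of a traceback.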
[jira] [Commented] (HAWQ-1044) Verify the correctness of hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486031#comment-15486031 ] Lili Ma commented on HAWQ-1044: --- We need to design our test cases to verify hawq register from the following aspects: 1. partition table/non-partition table, 2. format: row-oriented/parquet 3. randomly distributed/hash distributed 4. partition policy, range partition or list partition. > Verify the correctness of hawq register > --- > > Key: HAWQ-1044 > URL: https://issues.apache.org/jira/browse/HAWQ-1044 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools > Reporter: Lili Ma > Assignee: hongwu > Fix For: backlog > > > Verify the correctness of hawq register, summarize all the use scenarios and > design corresponding test cases for it. > I think the following test cases should be added for the HAWQ register. > 1. Use Case 1: Register file/folder into HAWQ by specifying file/folder name > a) hawq register -d postgres -f a.file tableA > b) hawq register -d postgres -f a.file -e eof tableA > c) hawq register -d postgres -f folderA tableA > d) register file to existing table. normal path > e) register file to existing table. error path: to-be-registered files under > the file folder for the existing table on HDFS. Should throw an error. > f) verify wrong input file. The file format is not parquet. > 2. Use case 2: Register into HAWQ table using .yml configuration file to a > non-existing table > a) Verify normal input: > create table a(a int, b int); > insert into a values(generate_series(1,100), 25); > hawq extract -d postgres -o a.yml a > hawq register -d postgres -c a.yml b > b) Modify the fileSize in .yml file to a value which is different from the actual > data size of the data file > 3. Use Case 2: Register into HAWQ table using .yml configuration file to an > existing table > a) Verify normal path: > Call hawq register multiple times, to verify whether it can succeed.
Each > time the to-be-registered files are not under the table directory. > b) Error path: to-be-registered files under the file folder for the existing > table on HDFS > Should throw an error: not supported! > 4. Use Case 2: Register into HAWQ table using .yml configuration file by > specifying --force option > a) The table does not exist: should create a new table, and do the register > b) The table already exists, but no data there: can directly call hawq register > c) Table already exists, and data already there -- normal path: .yml > configuration file includes the data files under the table directory, and > just includes those data files. > d) Table already exists, and data already there -- normal path: .yml > configuration file includes the data files under the table directory, and > also includes data files not under the table directory. > e) Table already exists, and data already there -- error path: .yml > configuration file doesn't include the data files under that table directory. > Should throw an error, "there are already existing files under the table, > but not included in .yml configuration file" > 5. Use Case 2: Register into HAWQ table using .yml configuration file by > specifying --repair option > a) Normal Path 1: (Append to new file) > create a tableA > insert some data into tableA > call hawq extract to extract the metadata to a.yml file > insert new data into tableA > call hawq register with the --repair option to roll back to the state > b) Normal Path 2: (New files generated) > Same as Normal Path 1, but during the second insert, use multiple inserts > concurrently, aiming at producing new files. Then call hawq register --repair, > the new files should be discarded.
> c) Error Path: redistributed > Create a hash-distributed table, distributed by column A > insert some data into tableA > call hawq extract to extract the metadata to a.yml file > alter the table to redistribute by column B > insert new data into tableA > call hawq register with the --repair option to roll back to the state > --> should throw error "the table is redistributed" > d) Error Path: table being truncated > Create a hash-distributed table, distributed by column A > insert some data into tableA > call hawq extract to extract the metadata to a.yml file > truncate tableA > call hawq register with the --repair option to roll back to the state > --> should throw error "the table becomes smaller than the .yml config file > specified." > e) Error Path: files specified in .yml configuration not under the data directory > of table A > --> should throw error "the files should all be under the table directory when > the --repair option is specified for hawq register" > 6. hawq register partition table support > a) Normal Path: create a 1-level partition table, calling hawq extract and > then hawq register, should work > b) Error Path: create a 2-level partition table, calling hawq extract and >
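The four aspects listed at the top of this comment (partitioning, storage format, distribution policy, partition policy) multiply into a 16-case test matrix; a quick sketch to enumerate the combinations (the dimension names are illustrative, not taken from the hawq test suite):

```python
from itertools import product

# Dimensions from the comment above: partitioning, storage format,
# distribution policy, and partition policy.
dimensions = {
    "partitioned": ["partition", "non-partition"],
    "format": ["row-oriented", "parquet"],
    "distribution": ["random", "hash"],
    "partition_policy": ["range", "list"],
}

# Cartesian product of all dimension values: 2 * 2 * 2 * 2 = 16 cases.
cases = [dict(zip(dimensions, combo)) for combo in product(*dimensions.values())]
print(len(cases))  # 16
```

Enumerating the matrix mechanically like this makes it easy to spot which combinations the hand-written use cases below do not yet cover.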
[jira] [Updated] (HAWQ-1035) support partition table register
[ https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1035: -- Description: Support partition table register, limited to 1 level partition table, since hawq extract only supports 1-level partition table. Expected behavior: 1. Create a partition table in HAWQ, then extract the information out to .yml file 2. Call hawq register and specify identified .yml file and a new table name, the files should be registered into the new table. Work can be detailed down to implement partition table register: 1. modify .yml configuration file parsing function, add content for partition table. 2. construct partition table DDL regards to .yml configuration file 3. map sub partition table name to the table list in .yml configuration file 4. register the subpartition table one by one was: Support partition table register, limited to 1 level partition table, since hawq extract only supports 1-level partition table. Expected behavior: 1. Create a partition table in HAWQ, then extract the information out to .yml file 2. Call hawq register and specify identified .yml file and a new table name, the files should be registered into the new table. Works can be detailed down to implementation partition table registeration: 1. modify .yml configuration file parsing function, add content for partition table. 2. construct partition table DDL regards to .yml configuration file 3. map sub partition table name to the table list in .yml configuration file 4. register the subpartition table one by one > support partition table register > > > Key: HAWQ-1035 > URL: https://issues.apache.org/jira/browse/HAWQ-1035 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > > Support partition table register, limited to 1 level partition table, since > hawq extract only supports 1-level partition table. > Expected behavior: > 1. 
Create a partition table in HAWQ, then extract the information out to .yml > file > 2. Call hawq register and specify identified .yml file and a new table name, > the files should be registered into the new table. > Work can be detailed down to implement partition table register: > 1. modify .yml configuration file parsing function, add content for partition > table. > 2. construct partition table DDL regards to .yml configuration file > 3. map sub partition table name to the table list in .yml configuration file > 4. register the subpartition table one by one -- This message was sent by Atlassian JIRA (v6.3.4#6332)
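The four work items above can be sketched end to end; this is a hypothetical illustration of steps 2 through 4 (building the partition-table DDL from a parsed .yml configuration and mapping each subpartition to its file list). The field names in the config dict are assumptions for illustration, not the actual hawq extract schema:

```python
# Stand-in for a parsed .yml configuration; keys like 'Partitions' and
# 'Files' are invented here, not hawq extract's real field names.
config = {
    "TableName": "sales",
    "Columns": [("id", "int"), ("date", "date")],
    "Partitions": [
        {"Name": "sales_1_prt_1", "Start": "2008-01-01", "End": "2008-01-02",
         "Files": ["/hawq_data/16385/1.file"]},
    ],
}

def build_ddl(cfg):
    """Step 2: construct the 1-level range-partition DDL from the config."""
    cols = ", ".join("%s %s" % c for c in cfg["Columns"])
    parts = ", ".join(
        "PARTITION %s START (date '%s') INCLUSIVE END (date '%s') EXCLUSIVE"
        % (p["Name"], p["Start"], p["End"]) for p in cfg["Partitions"])
    return "CREATE TABLE %s (%s) PARTITION BY RANGE (date) (%s);" % (
        cfg["TableName"], cols, parts)

# Step 3: map each subpartition name to the file list recorded for it;
# step 4 would then register each entry one by one.
file_map = {p["Name"]: p["Files"] for p in config["Partitions"]}
print(build_ddl(config))
```

The generated DDL gives the target table the same partition layout the .yml records, after which each `file_map` entry can be registered into its subpartition in turn.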
[jira] [Updated] (HAWQ-1035) support partition table register
[ https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1035: -- Description: Support partition table register, limited to 1 level partition table, since hawq extract only supports 1-level partition table. Expected behavior: 1. Create a partition table in HAWQ, then extract the information out to .yml file 2. Call hawq register and specify identified .yml file and a new table name, the files should be registered into the new table. Works can be detailed down to implementation partition table registeration: 1. modify .yml configuration file parsing function, add content for partition table. 2. construct partition table DDL regards to .yml configuration file 3. map sub partition table name to the table list in .yml configuration file 4. register the subpartition table one by one was:Support partitiont table register, limited to 1 level partition table, since hawq extract only supports 1-level partition table > support partition table register > > > Key: HAWQ-1035 > URL: https://issues.apache.org/jira/browse/HAWQ-1035 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > > Support partition table register, limited to 1 level partition table, since > hawq extract only supports 1-level partition table. > Expected behavior: > 1. Create a partition table in HAWQ, then extract the information out to .yml > file > 2. Call hawq register and specify identified .yml file and a new table name, > the files should be registered into the new table. > Works can be detailed down to implementation partition table registeration: > 1. modify .yml configuration file parsing function, add content for partition > table. > 2. construct partition table DDL regards to .yml configuration file > 3. map sub partition table name to the table list in .yml configuration file > 4. 
register the subpartition table one by one -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1044) Verify the correctness of hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1044: -- Description: Verify the correctness of hawq register, summarize all the use scenarios and design corresponding test cases for it. I think the following test cases should be added for the HAWQ register. 1. Use Case 1: Register file/folder into HAWQ by specifying file/folder name a) hawq register -d postgres -f a.file tableA b) hawq register -d postgres -f a.file -e eof tableA c) hawq register -d postgres -f folderA tableA d) register file to existing table. normal path e) register file to existing table. error path: to-be-registered files under the file folder for the existing table on HDFS. Should throw an error. f) verify wrong input file. The file format is not parquet. 2. Use case 2: Register into HAWQ table using .yml configuration file to a non-existing table a) Verify normal input: create table a(a int, b int); insert into a values(generate_series(1,100), 25); hawq extract -d postgres -o a.yml a hawq register -d postgres -c a.yml b b) Modify the fileSize in .yml file to a value which is different from the actual data size of the data file 3. Use Case 2: Register into HAWQ table using .yml configuration file to an existing table a) Verify normal path: Call hawq register multiple times, to verify whether it can succeed. Each time the to-be-registered files are not under the table directory. b) Error path: to-be-registered files under the file folder for the existing table on HDFS Should throw an error: not supported! 4.
Use Case 2: Register into HAWQ table using .yml configuration file by specifying --force option a) The table does not exist: should create a new table, and do the register b) The table already exists, but no data there: can directly call hawq register c) Table already exists, and data already there -- normal path: .yml configuration file includes the data files under the table directory, and just includes those data files. d) Table already exists, and data already there -- normal path: .yml configuration file includes the data files under the table directory, and also includes data files not under the table directory. e) Table already exists, and data already there -- error path: .yml configuration file doesn't include the data files under that table directory. Should throw an error, "there are already existing files under the table, but not included in .yml configuration file" 5. Use Case 2: Register into HAWQ table using .yml configuration file by specifying --repair option a) Normal Path 1: (Append to new file) create a tableA insert some data into tableA call hawq extract to extract the metadata to a.yml file insert new data into tableA call hawq register with the --repair option to roll back to the state b) Normal Path 2: (New files generated) Same as Normal Path 1, but during the second insert, use multiple inserts concurrently, aiming at producing new files. Then call hawq register --repair, the new files should be discarded.
c) Error Path: redistributed Create a hash-distributed table, distributed by column A insert some data into tableA call hawq extract to extract the metadata to a.yml file alter the table to redistribute by column B insert new data into tableA call hawq register with the --repair option to roll back to the state --> should throw error "the table is redistributed" d) Error Path: table being truncated Create a hash-distributed table, distributed by column A insert some data into tableA call hawq extract to extract the metadata to a.yml file truncate tableA call hawq register with the --repair option to roll back to the state --> should throw error "the table becomes smaller than the .yml config file specified." e) Error Path: files specified in .yml configuration not under the data directory of table A --> should throw error "the files should all be under the table directory when the --repair option is specified for hawq register" 6. hawq register partition table support a) Normal Path: create a 1-level partition table, calling hawq extract and then hawq register, should work b) Error Path: create a 2-level partition table, calling hawq extract and then hawq register, --> should throw error "only supports 1-level partition table" was: Verify the correctness of hawq register, summarize all the use scenarios and design corresponding test cases for it. > Verify the correctness of hawq register > --- > > Key: HAWQ-1044 > URL: https://issues.apache.org/jira/browse/HAWQ-1044 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools > Reporter: Lili Ma > Assignee: Lei Chang > Fix For: backlog > > > Verify the correctness of hawq register, summarize all the use scenarios and > design corresponding test cases for it. > I think the following test cases should be added for the HAWQ register. > 1. Use Case 1: Register
[jira] [Updated] (HAWQ-1044) Verify the correctness of hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1044: -- Assignee: hongwu (was: Lei Chang) > Verify the correctness of hawq register > --- > > Key: HAWQ-1044 > URL: https://issues.apache.org/jira/browse/HAWQ-1044 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools > Reporter: Lili Ma > Assignee: hongwu > Fix For: backlog > > > Verify the correctness of hawq register, summarize all the use scenarios and > design corresponding test cases for it. > I think the following test cases should be added for the HAWQ register. > 1. Use Case 1: Register file/folder into HAWQ by specifying file/folder name > a) hawq register -d postgres -f a.file tableA > b) hawq register -d postgres -f a.file -e eof tableA > c) hawq register -d postgres -f folderA tableA > d) register file to existing table. normal path > e) register file to existing table. error path: to-be-registered files under > the file folder for the existing table on HDFS. Should throw an error. > f) verify wrong input file. The file format is not parquet. > 2. Use case 2: Register into HAWQ table using .yml configuration file to a > non-existing table > a) Verify normal input: > create table a(a int, b int); > insert into a values(generate_series(1,100), 25); > hawq extract -d postgres -o a.yml a > hawq register -d postgres -c a.yml b > b) Modify the fileSize in .yml file to a value which is different from the actual > data size of the data file > 3. Use Case 2: Register into HAWQ table using .yml configuration file to an > existing table > a) Verify normal path: > Call hawq register multiple times, to verify whether it can succeed. Each > time the to-be-registered files are not under the table directory. > b) Error path: to-be-registered files under the file folder for the existing > table on HDFS > Should throw an error: not supported! > 4.
Use Case 2: Register into HAWQ table using .yml configuration file by > specifying --force option > a) The table does not exist: should create a new table, and do the register > b) The table already exists, but no data there: can directly call hawq register > c) Table already exists, and data already there -- normal path: .yml > configuration file includes the data files under the table directory, and > just includes those data files. > d) Table already exists, and data already there -- normal path: .yml > configuration file includes the data files under the table directory, and > also includes data files not under the table directory. > e) Table already exists, and data already there -- error path: .yml > configuration file doesn't include the data files under that table directory. > Should throw an error, "there are already existing files under the table, > but not included in .yml configuration file" > 5. Use Case 2: Register into HAWQ table using .yml configuration file by > specifying --repair option > a) Normal Path 1: (Append to new file) > create a tableA > insert some data into tableA > call hawq extract to extract the metadata to a.yml file > insert new data into tableA > call hawq register with the --repair option to roll back to the state > b) Normal Path 2: (New files generated) > Same as Normal Path 1, but during the second insert, use multiple inserts > concurrently, aiming at producing new files. Then call hawq register --repair, > the new files should be discarded.
> c) Error Path: redistributed > Create a hash-distributed table, distributed by column A > insert some data into tableA > call hawq extract to extract the metadata to a.yml file > alter the table to redistribute by column B > insert new data into tableA > call hawq register with the --repair option to roll back to the state > --> should throw error "the table is redistributed" > d) Error Path: table being truncated > Create a hash-distributed table, distributed by column A > insert some data into tableA > call hawq extract to extract the metadata to a.yml file > truncate tableA > call hawq register with the --repair option to roll back to the state > --> should throw error "the table becomes smaller than the .yml config file > specified." > e) Error Path: files specified in .yml configuration not under the data directory > of table A > --> should throw error "the files should all be under the table directory when > the --repair option is specified for hawq register" > 6. hawq register partition table support > a) Normal Path: create a 1-level partition table, calling hawq extract and > then hawq register, should work > b) Error Path: create a 2-level partition table, calling hawq extract and > then hawq register, > --> should throw error "only supports 1-level partition table" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1044) Verify the correctness of hawq register
Lili Ma created HAWQ-1044: - Summary: Verify the correctness of hawq register Key: HAWQ-1044 URL: https://issues.apache.org/jira/browse/HAWQ-1044 Project: Apache HAWQ Issue Type: Sub-task Reporter: Lili Ma Assignee: Lei Chang Verify the correctness of hawq register, summarize all the use scenarios and design corresponding test cases for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1032) Bucket number of newly added partition is not consistent with parent table.
[ https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1032: -- Description: Failure Case {code} set default_hash_table_bucket_number = 12; CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) DISTRIBUTED BY (id) PARTITION BY RANGE (date) ( START (date '2008-01-01') INCLUSIVE END (date '2009-01-01') EXCLUSIVE EVERY (INTERVAL '1 day') ); set default_hash_table_bucket_number = 16; ALTER TABLE sales3 ADD PARTITION START (date '2009-03-01') INCLUSIVE END (date '2009-04-01') EXCLUSIVE; {code} The newly added partition with bucket number 16 is not consistent with the parent partition. was: Failure Case {code} set deafult_hash_table_bucket_number = 12; CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) DISTRIBUTED BY (id) PARTITION BY RANGE (date) ( START (date '2008-01-01') INCLUSIVEEND (date '2009-01-01') EXCLUSIVE EVERY (INTERVAL '1 day') ); set deafult_hash_table_bucket_number = 16; ALTER TABLE sales3 ADD PARTITION START (date '2009-03-01') INCLUSIVE END (date '2009-04-01') EXCLUSIVE; {code} The newly added partition with buckcet number 16 is not consistent with parent partition. > Bucket number of newly added partition is not consistent with parent table.
> --- > > Key: HAWQ-1032 > URL: https://issues.apache.org/jira/browse/HAWQ-1032 > Project: Apache HAWQ > Issue Type: Bug > Components: Core > Reporter: Hubert Zhang > Assignee: Hubert Zhang > Fix For: 2.0.1.0-incubating > > > Failure Case > {code} > set default_hash_table_bucket_number = 12; > CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) > DISTRIBUTED BY (id) > PARTITION BY RANGE (date) > ( START (date '2008-01-01') INCLUSIVE > END (date '2009-01-01') EXCLUSIVE > EVERY (INTERVAL '1 day') ); > set default_hash_table_bucket_number = 16; > ALTER TABLE sales3 ADD PARTITION START > (date '2009-03-01') INCLUSIVE END > (date '2009-04-01') EXCLUSIVE; > {code} > The newly added partition with bucket number 16 is not consistent with the > parent partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1004) Implement calling Ranger REST Service -- use mock server
[ https://issues.apache.org/jira/browse/HAWQ-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1004: -- Summary: Implement calling Ranger REST Service -- use mock server (was: Decide How HAWQ connect Ranger, through which user, how to connect to REST Server) > Implement calling Ranger REST Service -- use mock server > > > Key: HAWQ-1004 > URL: https://issues.apache.org/jira/browse/HAWQ-1004 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Core > Reporter: Lili Ma > Assignee: Lin Wen > Fix For: backlog > > > Decide how HAWQ connects to Ranger, through which user, and how to connect to the REST > Server > Acceptance Criteria: > Provide an interface for HAWQ to connect to the Ranger REST Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
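The mock-server approach can be prototyped with nothing more than a stub HTTP endpoint. Below is a hedged sketch: the /check-access path and the request/response payloads are invented for illustration, and Ranger's real REST API (and the RPS itself, which is not Python) differs; the point is only to show a client exercised against a mock before the real service exists:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class MockRanger(BaseHTTPRequestHandler):
    """Mock policy endpoint that allows every request (illustrative only)."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = json.loads(body)
        resp = json.dumps({"user": req["user"], "allowed": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(resp)
    def log_message(self, *args):
        pass  # silence per-request logging

# Start the mock server on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), MockRanger)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the access-check call the plugin code would make.
url = "http://127.0.0.1:%d/check-access" % server.server_port
req = Request(url, json.dumps({"user": "userA", "table": "t1"}).encode(),
              {"Content-Type": "application/json"})
result = json.loads(urlopen(req).read())
print(result["allowed"])  # True
server.shutdown()
```

Swapping the mock for the real Ranger endpoint then only changes the URL and payload shapes, which is exactly what makes the mock useful while the interface is still being decided.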
[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454137#comment-15454137 ] Lili Ma commented on HAWQ-256: -- [~thebellhead] From a technical view, we can definitely restrict the HAWQ superuser privilege in Ranger. But if we restrict that, the HAWQ superuser behavior changes. I think this needs careful discussion, and it's out of the scope of this JIRA. Right? Anyway, if everyone agrees to remove the superuser privileges, we can implement that function. Thanks > Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security > Reporter: Michael Andre Pearce (IG) > Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451558#comment-15451558 ] Lili Ma edited comment on HAWQ-256 at 8/31/16 8:24 AM: --- [~thebellhead], quite good questions! 1. In order for tools, syntax checking, etc to work everyone (the HAWQ public role) requires access to the catalog and some of the toolkit. Will Ranger-only access control apply only to user created tables, views and external tables? Yes, since the catalog tables and toolkits are shared and used by various users, Ranger-only access control just applies to user defined objects. But the objects include not only database, table and view, but also function, language, schema, tablespace and protocol. You can find the detailed objects and privileges in the design doc. I have reviewed your proposal in HAWQ-1036; could you share your handling of the objects which don't lie in the HDFS layer, such as function, schema, language, etc? 2. If so - will gpadmin and any other HAWQ-defined roles not have access to the data in Ranger managed tables? Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS; say, when a specified userA creates a table in HAWQ, the HDFS files for the table are created by gpadmin instead of userA. Since Ranger lies in the Hadoop ecosystem, it usually needs to control both HAWQ and HDFS, so I think we need to grant gpadmin full privileges on the hawq data file directory on HDFS in the Ranger UI beforehand. About your concern that the superuser can see all the users' data, I think it's kind of like the "root" role in an operating system? If the users have concerns about the DBA/Superuser's unlimited access, I totally agree with you about the solution of "passing down user-identity" for solving this problem :) 3. How would this be extended for the hcatalog virtual database in HAWQ?
Could the Ranger permissions for the underlying store (for instance Hive) be read and enforced/reported at the HAWQ level? If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just need to grant the privilege to the superuser. But if we have implemented the user-identity passing down, say, the data files on HDFS for a table created by userA are owned by userA instead of gpadmin; in this way we need to connect to Ranger twice, from HAWQ and HDFS respectively. I haven't included the underlying store privilege check on the HAWQ side; that may need multiple code changes. I think keeping the privileges in the component is another choice. Your thoughts? Thanks Lili was (Author: lilima): [~thebellhead], quite good questions! 1. In order for tools, syntax checking, etc to work everyone (the HAWQ public role) requires access to the catalog and some of the toolkit. Will Ranger-only access control apply only to user created tables, views and external tables? Yes, since the catalog tables and toolkits are shared and used by various users, Ranger-only access control just applies to user defined objects. But the objects include not only database, table and view, but also function, language, schema, tablespace and protocol. You can find the detailed objects and privileges in the design doc. 2. If so - will gpadmin and any other HAWQ-defined roles not have access to the data in Ranger managed tables? Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS; say, when a specified userA creates a table in HAWQ, the HDFS files for the table are created by gpadmin instead of userA. Since Ranger lies in the Hadoop ecosystem, it usually needs to control both HAWQ and HDFS, so I think we need to grant gpadmin full privileges on the hawq data file directory on HDFS in the Ranger UI beforehand. About your concern that the superuser can see all the users' data, I think it's kind of like the "root" role in an operating system?
If the users have concerns about the DBA/Superuser's unlimited access, I totally agree with you about the solution of "passing down user-identity" for solving this problem :) 3. How would this be extended for the hcatalog virtual database in HAWQ? Could the Ranger permissions for the underlying store (for instance Hive) be read and enforced/reported at the HAWQ level? If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just need to grant the privilege to the superuser. But if we have implemented the user-identity passing down, say, the data files on HDFS for a table created by userA are owned by userA instead of gpadmin; in this way we need to connect to Ranger twice, from HAWQ and HDFS respectively. I haven't included the underlying store privilege check on the HAWQ side; that may need multiple code changes. I think keeping the privileges in the component is another choice. Your thoughts? Thanks Lili > Integrate Security with Apache Ranger > - > >
[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451558#comment-15451558 ] Lili Ma commented on HAWQ-256: -- [~thebellhead], quite good questions! 1. In order for tools, syntax checking, etc to work everyone (the HAWQ public role) requires access to the catalog and some of the toolkit. Will Ranger-only access control apply only to user created tables, views and external tables? Yes, since the catalog tables and toolkits are shared and used by various users, Ranger-only access control just applies to user defined objects. But the objects include not only database, table and view, but also function, language, schema, tablespace and protocol. You can find the detailed objects and privileges in the design doc. 2. If so - will gpadmin and any other HAWQ-defined roles not have access to the data in Ranger managed tables? Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS; say, when a specified userA creates a table in HAWQ, the HDFS files for the table are created by gpadmin instead of userA. Since Ranger lies in the Hadoop ecosystem, it usually needs to control both HAWQ and HDFS, so I think we need to grant gpadmin full privileges on the hawq data file directory on HDFS in the Ranger UI beforehand. About your concern that the superuser can see all the users' data, I think it's kind of like the "root" role in an operating system? If the users have concerns about the DBA/Superuser's unlimited access, I totally agree with you about the solution of "passing down user-identity" for solving this problem :) 3. How would this be extended for the hcatalog virtual database in HAWQ? Could the Ranger permissions for the underlying store (for instance Hive) be read and enforced/reported at the HAWQ level? If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just need to grant the privilege to the superuser.
But if we have implemented user-identity pass-down, say, the data files on HDFS for a table created by userA are owned by userA instead of gpadmin, then we would need to connect to Ranger twice, from HAWQ and from HDFS respectively. I haven't included the underlying store's privilege checks on the HAWQ side; that may need multiple code changes. I think keeping the privilege checks within each component is another choice. Your thoughts? Thanks Lili > Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Michael Andre Pearce (IG) >Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1034) add --repair option for hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1034: -- Description: add --repair option for hawq register Will change both the file folder and the catalog table pg_aoseg.pg_paqseg_$relid to the state the .yml file configures. Note that files newly generated since the checkpoint may be deleted here. Also note that all the files in the .yml file should be under the table folder on HDFS. Limitation: does not support cases of hash table redistribution, table truncate and table drop. This is for the table rollback scenario: take a checkpoint somewhere, then roll back to the previous checkpoint. was:add --repair option for hawq register > add --repair option for hawq register > - > > Key: HAWQ-1034 > URL: https://issues.apache.org/jira/browse/HAWQ-1034 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: Lei Chang > Fix For: 2.0.1.0-incubating > > > add --repair option for hawq register > Will change both the file folder and the catalog table pg_aoseg.pg_paqseg_$relid to > the state the .yml file configures. Note that files newly generated since > the checkpoint may be deleted here. Also note that all the files in the .yml file > should be under the table folder on HDFS. Limitation: does not support cases > of hash table redistribution, table truncate and table drop. This is for > the table rollback scenario: take a checkpoint somewhere, then roll back to the > previous checkpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
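The repair semantics above can be sketched as a planning step. The following is an illustrative Python sketch, not the actual hawq register implementation; the function and argument names are hypothetical. Given the file sizes recorded in the .yml checkpoint and the files currently under the table folder, files created after the checkpoint are slated for deletion, and the catalog rows for pg_aoseg.pg_paqseg_$relid are reset to the checkpointed state.

```python
def plan_repair(yml_files, hdfs_files):
    """Plan the --repair actions (hypothetical helper, for illustration only).

    yml_files  -- {path: size} recorded in the .yml checkpoint file
    hdfs_files -- {path: size} currently present under the table folder

    Returns (files_to_delete, catalog_rows):
    - files_to_delete: files on HDFS absent from the checkpoint (e.g.
      generated after the checkpoint), which --repair may remove
    - catalog_rows: the (path, size) rows pg_aoseg.pg_paqseg_$relid should
      hold after repair, i.e. exactly the state the .yml file configures
    """
    files_to_delete = sorted(set(hdfs_files) - set(yml_files))
    catalog_rows = sorted(yml_files.items())
    return files_to_delete, catalog_rows
```

For example, if the checkpoint records one file and the folder now also holds a newer file, the newer file is planned for deletion while the catalog is reset to the checkpointed row.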
[jira] [Updated] (HAWQ-1033) add --force option for hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1033: -- Description: add --force option for hawq register Will clear all the catalog contents in pg_aoseg.pg_paqseg_$relid while keeping the files on HDFS, and then re-register all the files to the table. This is for the cluster disaster-recovery scenario: two clusters co-exist, and data is periodically imported from Cluster A to Cluster B, so we need to register the data to Cluster B. was: add --force option for hawq register Will clear all the catalog contents in pg_aoseg.pg_paqseg_$relid while keeping the files on HDFS, and then re-register all the files to the table. This is for scenario 2. > add --force option for hawq register > > > Key: HAWQ-1033 > URL: https://issues.apache.org/jira/browse/HAWQ-1033 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: Lei Chang > Fix For: 2.0.1.0-incubating > > > add --force option for hawq register > Will clear all the catalog contents in pg_aoseg.pg_paqseg_$relid while keeping > the files on HDFS, and then re-register all the files to the table. This is > for the cluster disaster-recovery scenario: two clusters co-exist, and data is periodically > imported from Cluster A to Cluster B, so we need to register the data to Cluster B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
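The --force flow described above (clear the catalog, keep the HDFS files, re-register everything found under the table folder) can be sketched as follows. This is a minimal illustrative Python sketch with hypothetical names, not the real hawq register code.

```python
def force_register(catalog_rows, hdfs_files):
    """Sketch of the --force flow (illustrative only; names are hypothetical).

    catalog_rows -- mutable list of (path, size) rows standing in for
                    pg_aoseg.pg_paqseg_$relid
    hdfs_files   -- {path: size} of the files under the table folder,
                    which are left untouched on HDFS

    Clears all existing catalog contents, then re-registers every file
    found under the table folder.
    """
    catalog_rows.clear()  # drop old catalog entries; HDFS files stay
    for path in sorted(hdfs_files):
        catalog_rows.append((path, hdfs_files[path]))
    return catalog_rows
```

This matches the disaster-recovery use: after files are copied from Cluster A to Cluster B, only the catalog on Cluster B needs rebuilding.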
[jira] [Created] (HAWQ-1035) support partition table register
Lili Ma created HAWQ-1035: - Summary: support partition table register Key: HAWQ-1035 URL: https://issues.apache.org/jira/browse/HAWQ-1035 Project: Apache HAWQ Issue Type: Sub-task Components: Command Line Tools Reporter: Lili Ma Assignee: Lei Chang Support partition table register, limited to 1-level partition tables, since hawq extract only supports 1-level partition tables -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1034) add --repair option for hawq register
Lili Ma created HAWQ-1034: - Summary: add --repair option for hawq register Key: HAWQ-1034 URL: https://issues.apache.org/jira/browse/HAWQ-1034 Project: Apache HAWQ Issue Type: Sub-task Components: Command Line Tools Reporter: Lili Ma Assignee: Lei Chang add --repair option for hawq register -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HAWQ-1033) add --force option for hawq register
Lili Ma created HAWQ-1033: - Summary: add --force option for hawq register Key: HAWQ-1033 URL: https://issues.apache.org/jira/browse/HAWQ-1033 Project: Apache HAWQ Issue Type: Sub-task Components: Command Line Tools Reporter: Lili Ma Assignee: Lei Chang add --force option for hawq register -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (HAWQ-1024) Rollback if hawq register failed in process
[ https://issues.apache.org/jira/browse/HAWQ-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma closed HAWQ-1024. - Resolution: Invalid > Rollback if hawq register failed in process > --- > > Key: HAWQ-1024 > URL: https://issues.apache.org/jira/browse/HAWQ-1024 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: hongwu >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1025) Modify the content of yml file, and change hawq register implementation for the modification
[ https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1025: -- Description: 1. Add the bucket number for hash-distributed tables in the yml file; when running hawq register, ensure the number of files is a multiple of the bucket number 2. hawq register should use the file size information in the yml file to update the catalog table pg_aoseg.pg_paqseg_$relid 3. hawq register processing steps: a. create the table b. mv all the files c. change the catalog table once. > Modify the content of yml file, and change hawq register implementation for > the modification > > > Key: HAWQ-1025 > URL: https://issues.apache.org/jira/browse/HAWQ-1025 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: hongwu >Assignee: Lili Ma > Fix For: 2.0.1.0-incubating > > > 1. Add the bucket number for hash-distributed tables in the yml file; when running hawq > register, ensure the number of files is a multiple of the bucket number > 2. hawq register should use the file size information in the yml file to update > the catalog table pg_aoseg.pg_paqseg_$relid > 3. hawq register processing steps: >a. create the table >b. mv all the files >c. change the catalog table once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
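Point 1 above (the file count must be a multiple of the bucket number for hash-distributed tables) can be expressed as a small validation check. This is an illustrative Python sketch; the function name is hypothetical and not part of hawq register.

```python
def files_match_bucket_number(num_files, bucket_number):
    """For a hash-distributed table, hawq register requires the number of
    files being registered to be a multiple of the table's bucket number
    (hypothetical check mirroring point 1 above)."""
    if bucket_number <= 0:
        raise ValueError("bucket number must be positive")
    return num_files % bucket_number == 0
```

For instance, 12 files against a bucket number of 6 would pass the check, while 10 files would be rejected.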
[jira] [Updated] (HAWQ-1025) Modify the content of yml file, and change hawq register implementation for the modification
[ https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1025: -- Description: 1. Add the bucket number for hash-distributed tables in the yml file; when running hawq register, ensure the number of files is a multiple of the bucket number 2. hawq register should use the file size information in the yml file to update the catalog table pg_aoseg.pg_paqseg_$relid 3. hawq register processing steps: a. create the table b. mv all the files c. change the catalog table once. was: 1. Add the bucket number for hash-distributed tables in the yml file; when running hawq register, ensure the number of files is a multiple of the bucket number 2. hawq register should use the file size information in the yml file to update the catalog table pg_aoseg.pg_paqseg_$relid 3. hawq register processing steps: a. create the table b. mv all the files c. change the catalog table once. > Modify the content of yml file, and change hawq register implementation for > the modification > > > Key: HAWQ-1025 > URL: https://issues.apache.org/jira/browse/HAWQ-1025 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: hongwu >Assignee: Lili Ma > Fix For: 2.0.1.0-incubating > > > 1. Add the bucket number for hash-distributed tables in the yml file; when running hawq > register, ensure the number of files is a multiple of the bucket number > 2. hawq register should use the file size information in the yml file to update > the catalog table pg_aoseg.pg_paqseg_$relid > 3. hawq register processing steps: >a. create the table >b. mv all the files >c. change the catalog table once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1025) Modify the content of yml file, and change hawq register implementation for the modification
[ https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma updated HAWQ-1025: -- Summary: Modify the content of yml file, and change hawq register implementation for the modification (was: Check the consistency of AO/Parquet_FileLocations.Files.size attribute in extracted yaml file and the actual file size in HDFS.) > Modify the content of yml file, and change hawq register implementation for > the modification > > > Key: HAWQ-1025 > URL: https://issues.apache.org/jira/browse/HAWQ-1025 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: hongwu >Assignee: Lili Ma > Fix For: 2.0.1.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HAWQ-1025) Check the consistency of AO/Parquet_FileLocations.Files.size attribute in extracted yaml file and the actual file size in HDFS.
[ https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lili Ma reassigned HAWQ-1025: - Assignee: Lili Ma (was: hongwu) > Check the consistency of AO/Parquet_FileLocations.Files.size attribute in > extracted yaml file and the actual file size in HDFS. > --- > > Key: HAWQ-1025 > URL: https://issues.apache.org/jira/browse/HAWQ-1025 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: hongwu >Assignee: Lili Ma > Fix For: 2.0.1.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)