[GitHub] incubator-hawq issue #873: HAWQ-992. PXF Hive data type check in Fragmenter ...
Github user kavinderd commented on the issue: https://github.com/apache/incubator-hawq/pull/873 Overall LGTM, but I'd like to spend more time reviewing. This is a lot of code --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (HAWQ-1039) Add test case of bucket number may not be consistent with parent table.
Hubert Zhang created HAWQ-1039: -- Summary: Add test case of bucket number may not be consistent with parent table. Key: HAWQ-1039 URL: https://issues.apache.org/jira/browse/HAWQ-1039 Project: Apache HAWQ Issue Type: Sub-task Components: Core Reporter: Hubert Zhang Assignee: Lei Chang add test case for HAWQ-1032 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454137#comment-15454137 ] Lili Ma commented on HAWQ-256: -- [~thebellhead] From a technical view, we can definitely restrict the HAWQ superuser privilege in Ranger. But if we restrict that, HAWQ superuser behavior changes. I think this needs careful discussion, and it's out of the scope of this JIRA. Right? Anyway, if everyone agrees to remove the superuser privileges, we can implement that function. Thanks > Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Michael Andre Pearce (IG) >Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution.
[GitHub] incubator-hawq issue #881: HAWQ-1032. Bucket number of new added partition i...
Github user wengyanqing commented on the issue: https://github.com/apache/incubator-hawq/pull/881 +1
[GitHub] incubator-hawq issue #881: HAWQ-1032. Bucket number of new added partition i...
Github user ictmalili commented on the issue: https://github.com/apache/incubator-hawq/pull/881 LGTM. +1
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user GodenYao commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77103957 --- Diff: pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/EnumHawqType.java --- @@ -43,37 +45,40 @@ public void serialize(EnumHawqType value, JsonGenerator generator, */ @JsonSerialize(using = EnumHawqTypeSerializer.class) public enum EnumHawqType { -Int2Type("int2"), -Int4Type("int4"), -Int8Type("int8"), -Float4Type("float4"), -Float8Type("float8"), -TextType("text"), -VarcharType("varchar", (byte) 1, true), -ByteaType("bytea"), -DateType("date"), -TimestampType("timestamp"), -BoolType("bool"), -NumericType("numeric", (byte) 2, true), -BpcharType("bpchar", (byte) 1, true); +Int2Type("int2", DataType.SMALLINT), +Int4Type("int4", DataType.INTEGER), +Int8Type("int8", DataType.BIGINT), +Float4Type("float4", DataType.REAL), +Float8Type("float8", DataType.FLOAT8), +TextType("text", DataType.TEXT), +VarcharType("varchar", DataType.VARCHAR, (byte) 1, false), +ByteaType("bytea", DataType.BYTEA), +DateType("date", DataType.DATE), +TimestampType("timestamp", DataType.TIMESTAMP), +BoolType("bool", DataType.BOOLEAN), +NumericType("numeric", DataType.NUMERIC, (byte) 2, false), +BpcharType("bpchar", DataType.BPCHAR, (byte) 1, false), --- End diff -- yes, if the modifier is omitted, bpchar will have dynamic length
[GitHub] incubator-hawq pull request #881: HAWQ-1032. Bucket number of new added part...
GitHub user zhangh43 opened a pull request: https://github.com/apache/incubator-hawq/pull/881 HAWQ-1032. Bucket number of new added partition is not consistent with parent table. Failure Case {code} set default_hash_table_bucket_number = 12; CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) DISTRIBUTED BY (id) PARTITION BY RANGE (date) ( START (date '2008-01-01') INCLUSIVE END (date '2009-01-01') EXCLUSIVE EVERY (INTERVAL '1 day') ); set default_hash_table_bucket_number = 16; ALTER TABLE sales3 ADD PARTITION START (date '2009-03-01') INCLUSIVE END (date '2009-04-01') EXCLUSIVE; {code} The newly added partition with bucket number 16 is not consistent with the parent partition. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhangh43/incubator-hawq hawq1032 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq/pull/881.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #881 commit 2b1124d44917c2955652fe4e7583f4e9d855897b Author: hzhang2 Date: 2016-09-01T01:33:58Z HAWQ-1032. Bucket number of new added partition is not consistent with parent table.
[jira] [Commented] (HAWQ-1032) Bucket number of newly added partition is not consistent with parent table.
[ https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453989#comment-15453989 ] Hubert Zhang commented on HAWQ-1032: Not the partition number: the bucket number of the sub-partition is not consistent with the bucket number of the root partition. In the above case, the bucket number of the root partition is 12, while the newly added partition's bucket number is 16. The query will fail if the bucket numbers of the partitions are not all the same. > Bucket number of newly added partition is not consistent with parent table. > --- > > Key: HAWQ-1032 > URL: https://issues.apache.org/jira/browse/HAWQ-1032 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Hubert Zhang >Assignee: Hubert Zhang > Fix For: 2.0.1.0-incubating > > > Failure Case > {code} > set default_hash_table_bucket_number = 12; > CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) > DISTRIBUTED BY (id) > PARTITION BY RANGE (date) > ( START (date '2008-01-01') INCLUSIVE >END (date '2009-01-01') EXCLUSIVE >EVERY (INTERVAL '1 day') ); > set default_hash_table_bucket_number = 16; > ALTER TABLE sales3 ADD PARTITION START > (date '2009-03-01') INCLUSIVE END > (date '2009-04-01') EXCLUSIVE; > {code} > The newly added partition with bucket number 16 is not consistent with > the parent partition.
[GitHub] incubator-hawq pull request #880: HAWQ-1037. Modify way to get HDFS port in ...
GitHub user wcl14 opened a pull request: https://github.com/apache/incubator-hawq/pull/880 HAWQ-1037. Modify way to get HDFS port in TestHawqRegister You can merge this pull request into a Git repository by running: $ git pull https://github.com/wcl14/incubator-hawq HAWQ-1037 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq/pull/880.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #880 commit caaace3cb340dbeb41da1eaed02f0aa5152c0b38 Author: Chunling Wang Date: 2016-08-31T10:19:05Z HAWQ-1037. Modify way to get HDFS port in TestHawqRegister
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77096574 --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/EnumHiveToHawqType.java --- @@ -110,4 +122,68 @@ public static EnumHiveToHawqType getHiveToHawqType(String hiveType) { + hiveType + " to HAWQ's type"); } + +/** + * + * @param dataType HAWQ data type + * @return compatible Hive type for the given HAWQ type; if there is more than one compatible type, it returns the one with the bigger size + * @throws UnsupportedTypeException if there is no corresponding Hive type for the given HAWQ type + */ +public static EnumHiveToHawqType getCompatibleHawqToHiveType(DataType dataType) { + +SortedSet<EnumHiveToHawqType> types = new TreeSet<EnumHiveToHawqType>( +new Comparator<EnumHiveToHawqType>() { +public int compare(EnumHiveToHawqType a, +EnumHiveToHawqType b) { +return Byte.compare(a.getSize(), b.getSize()); +} +}); --- End diff -- Can this `types` variable be instantiated outside the method? It doesn't change based on input, right?
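kavinderd's suggestion of hoisting the comparator out of the method could look roughly like the sketch below. The enum and its `getSize()` here are simplified stand-ins for `EnumHiveToHawqType`, not the actual PXF code; the point is only that a comparator with no dependence on method input can live in a shared `static final` field instead of being rebuilt on every call.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

public class ComparatorHoisting {
    // Simplified stand-in for EnumHiveToHawqType: each constant carries a size.
    enum HiveType {
        TINYINT((byte) 1), SMALLINT((byte) 2), INT((byte) 4), BIGINT((byte) 8);

        private final byte size;
        HiveType(byte size) { this.size = size; }
        byte getSize() { return size; }
    }

    // The comparator does not depend on method input, so it can be a
    // shared static field instead of being instantiated per call.
    private static final Comparator<HiveType> BY_SIZE =
        new Comparator<HiveType>() {
            public int compare(HiveType a, HiveType b) {
                return Byte.compare(a.getSize(), b.getSize());
            }
        };

    // Mirrors the shape of getCompatibleHawqToHiveType: collect candidates
    // into a TreeSet ordered by size and take the biggest one.
    static HiveType biggestCompatible() {
        SortedSet<HiveType> types = new TreeSet<HiveType>(BY_SIZE);
        for (HiveType t : HiveType.values()) {
            types.add(t);
        }
        return types.last(); // the candidate with the biggest size
    }

    public static void main(String[] args) {
        System.out.println(biggestCompatible()); // prints BIGINT
    }
}
```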
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user GodenYao commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77085040 --- Diff: pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/EnumHawqType.java --- @@ -43,37 +45,40 @@ public void serialize(EnumHawqType value, JsonGenerator generator, */ @JsonSerialize(using = EnumHawqTypeSerializer.class) public enum EnumHawqType { -Int2Type("int2"), -Int4Type("int4"), -Int8Type("int8"), -Float4Type("float4"), -Float8Type("float8"), -TextType("text"), -VarcharType("varchar", (byte) 1, true), -ByteaType("bytea"), -DateType("date"), -TimestampType("timestamp"), -BoolType("bool"), -NumericType("numeric", (byte) 2, true), -BpcharType("bpchar", (byte) 1, true); +Int2Type("int2", DataType.SMALLINT), +Int4Type("int4", DataType.INTEGER), +Int8Type("int8", DataType.BIGINT), +Float4Type("float4", DataType.REAL), +Float8Type("float8", DataType.FLOAT8), +TextType("text", DataType.TEXT), +VarcharType("varchar", DataType.VARCHAR, (byte) 1, false), +ByteaType("bytea", DataType.BYTEA), +DateType("date", DataType.DATE), +TimestampType("timestamp", DataType.TIMESTAMP), +BoolType("bool", DataType.BOOLEAN), +NumericType("numeric", DataType.NUMERIC, (byte) 2, false), +BpcharType("bpchar", DataType.BPCHAR, (byte) 1, false), +//modifier is mandatory for this type because by default it's 1 +CharType("char", DataType.CHAR, (byte) 1, true); --- End diff -- this type is not used in the Hive-to-HAWQ conversion
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user GodenYao commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77084340 --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java --- @@ -256,4 +257,68 @@ private static boolean verifyIntegerModifiers(String[] modifiers) { throw new RuntimeException("Failed connecting to Hive MetaStore service: " + cause.getMessage(), cause); } } + + +/** + * Converts HAWQ type to Hive type. The supported mappings are: + * {@code BOOLEAN -> boolean} + * {@code SMALLINT -> smallint (tinyint is converted to smallint)} + * {@code BIGINT -> bigint} + * {@code TIMESTAMP -> timestamp} + * {@code NUMERIC -> decimal} + * {@code BYTEA -> binary} + * {@code INTEGER -> int} + * {@code TEXT -> string} + * {@code REAL -> float} + * {@code FLOAT8 -> double} + * + * All other types (both in HAWQ and in Hive) are not supported. + * + * @param type HAWQ data type + * @param name field name + * @return Hive type + * @throws UnsupportedTypeException if type is not supported + */ +public static String toCompatibleHiveType(DataType type) { + +EnumHiveToHawqType hiveToHawqType = EnumHiveToHawqType.getCompatibleHawqToHiveType(type); +return hiveToHawqType.getTypeName(); +} + + + +public static void validateTypeCompatible(DataType hawqDataType, Integer[] hawqTypeMods, String hiveType, String hawqColumnName) { + +EnumHiveToHawqType hiveToHawqType = EnumHiveToHawqType.getHiveToHawqType(hiveType); +EnumHawqType expectedHawqType = hiveToHawqType.getHawqType(); + +if (!expectedHawqType.getDataType().equals(hawqDataType)) { +throw new UnsupportedTypeException("Invalid definition for column " + hawqColumnName ++ ": expected HAWQ type " + expectedHawqType.getDataType() + +", actual HAWQ type " + hawqDataType); +} + +if ((hawqTypeMods == null || hawqTypeMods.length == 0) && expectedHawqType.isMandatoryModifiers()) +throw new UnsupportedTypeException("Invalid definition for column " + hawqColumnName + ": modifiers are mandatory for type " + expectedHawqType.getTypeName()); + --- End diff -- Another thing: ``` CharType("char", EnumHawqType.BpcharType, "[(,)]"), ``` According to the code above, we convert Hive char to the HAWQ bpchar type. In that case, why do we need to check the modifier? A HAWQ char(xxx) won't be compared; it has a different OID than HAWQ bpchar.
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77082531 --- Diff: pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/EnumHawqType.java --- @@ -43,37 +45,40 @@ public void serialize(EnumHawqType value, JsonGenerator generator, */ @JsonSerialize(using = EnumHawqTypeSerializer.class) public enum EnumHawqType { -Int2Type("int2"), -Int4Type("int4"), -Int8Type("int8"), -Float4Type("float4"), -Float8Type("float8"), -TextType("text"), -VarcharType("varchar", (byte) 1, true), -ByteaType("bytea"), -DateType("date"), -TimestampType("timestamp"), -BoolType("bool"), -NumericType("numeric", (byte) 2, true), -BpcharType("bpchar", (byte) 1, true); +Int2Type("int2", DataType.SMALLINT), +Int4Type("int4", DataType.INTEGER), +Int8Type("int8", DataType.BIGINT), +Float4Type("float4", DataType.REAL), +Float8Type("float8", DataType.FLOAT8), +TextType("text", DataType.TEXT), +VarcharType("varchar", DataType.VARCHAR, (byte) 1, false), +ByteaType("bytea", DataType.BYTEA), +DateType("date", DataType.DATE), +TimestampType("timestamp", DataType.TIMESTAMP), +BoolType("bool", DataType.BOOLEAN), +NumericType("numeric", DataType.NUMERIC, (byte) 2, false), +BpcharType("bpchar", DataType.BPCHAR, (byte) 1, false), --- End diff -- Never mind, I understand why: the modifiers are not mandatory, right?
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77082166 --- Diff: pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/EnumHawqType.java --- @@ -43,37 +45,40 @@ public void serialize(EnumHawqType value, JsonGenerator generator, */ @JsonSerialize(using = EnumHawqTypeSerializer.class) public enum EnumHawqType { -Int2Type("int2"), -Int4Type("int4"), -Int8Type("int8"), -Float4Type("float4"), -Float8Type("float8"), -TextType("text"), -VarcharType("varchar", (byte) 1, true), -ByteaType("bytea"), -DateType("date"), -TimestampType("timestamp"), -BoolType("bool"), -NumericType("numeric", (byte) 2, true), -BpcharType("bpchar", (byte) 1, true); +Int2Type("int2", DataType.SMALLINT), +Int4Type("int4", DataType.INTEGER), +Int8Type("int8", DataType.BIGINT), +Float4Type("float4", DataType.REAL), +Float8Type("float8", DataType.FLOAT8), +TextType("text", DataType.TEXT), +VarcharType("varchar", DataType.VARCHAR, (byte) 1, false), +ByteaType("bytea", DataType.BYTEA), +DateType("date", DataType.DATE), +TimestampType("timestamp", DataType.TIMESTAMP), +BoolType("bool", DataType.BOOLEAN), +NumericType("numeric", DataType.NUMERIC, (byte) 2, false), +BpcharType("bpchar", DataType.BPCHAR, (byte) 1, false), --- End diff -- Can you explain why the last parameter is changed to `false`?
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user GodenYao commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77079334 --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/EnumHiveToHawqType.java --- @@ -29,8 +35,8 @@ */ public enum EnumHiveToHawqType { -TinyintType("tinyint", EnumHawqType.Int2Type), -SmallintType("smallint", EnumHawqType.Int2Type), +TinyintType("tinyint", EnumHawqType.Int2Type, (byte) 1), +SmallintType("smallint", EnumHawqType.Int2Type, (byte) 2), --- End diff -- maybe add some comments about the byte sizes 1/2 here, to differentiate the two types on the HAWQ side.
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user GodenYao commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77077634 --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/EnumHiveToHawqType.java --- @@ -110,4 +122,68 @@ public static EnumHiveToHawqType getHiveToHawqType(String hiveType) { + hiveType + " to HAWQ's type"); } + +/** + * + * @param dataType HAWQ data type + * @return compatible Hive type for the given HAWQ type; if there is more than one compatible type, it returns the one with the bigger size + * @throws UnsupportedTypeException if there is no corresponding Hive type for the given HAWQ type + */ +public static EnumHiveToHawqType getCompatibleHawqToHiveType(DataType dataType) { + +SortedSet<EnumHiveToHawqType> types = new TreeSet<EnumHiveToHawqType>( +new Comparator<EnumHiveToHawqType>() { +public int compare(EnumHiveToHawqType a, +EnumHiveToHawqType b) { +return Byte.compare(a.getSize(), b.getSize()); +} +}); + +for (EnumHiveToHawqType t : values()) { +if (t.getHawqType().getDataType().equals(dataType)) { +types.add(t); +} +} + +if (types.size() == 0) +throw new UnsupportedTypeException("Unable to find compatible Hive type for given HAWQ's type: " + dataType); + --- End diff -- the error message should be reversed: > Unable to find compatible **HAWQ** type for given **Hive**'s type
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user GodenYao commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77076942 --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java --- @@ -256,4 +257,68 @@ private static boolean verifyIntegerModifiers(String[] modifiers) { throw new RuntimeException("Failed connecting to Hive MetaStore service: " + cause.getMessage(), cause); } } + + +/** + * Converts HAWQ type to Hive type. The supported mappings are: + * {@code BOOLEAN -> boolean} + * {@code SMALLINT -> smallint (tinyint is converted to smallint)} + * {@code BIGINT -> bigint} + * {@code TIMESTAMP -> timestamp} + * {@code NUMERIC -> decimal} + * {@code BYTEA -> binary} + * {@code INTEGER -> int} + * {@code TEXT -> string} + * {@code REAL -> float} + * {@code FLOAT8 -> double} + * + * All other types (both in HAWQ and in Hive) are not supported. + * + * @param type HAWQ data type + * @param name field name + * @return Hive type + * @throws UnsupportedTypeException if type is not supported + */ +public static String toCompatibleHiveType(DataType type) { + +EnumHiveToHawqType hiveToHawqType = EnumHiveToHawqType.getCompatibleHawqToHiveType(type); +return hiveToHawqType.getTypeName(); +} + + + +public static void validateTypeCompatible(DataType hawqDataType, Integer[] hawqTypeMods, String hiveType, String hawqColumnName) { + +EnumHiveToHawqType hiveToHawqType = EnumHiveToHawqType.getHiveToHawqType(hiveType); +EnumHawqType expectedHawqType = hiveToHawqType.getHawqType(); + +if (!expectedHawqType.getDataType().equals(hawqDataType)) { +throw new UnsupportedTypeException("Invalid definition for column " + hawqColumnName ++ ": expected HAWQ type " + expectedHawqType.getDataType() + +", actual HAWQ type " + hawqDataType); +} + +if ((hawqTypeMods == null || hawqTypeMods.length == 0) && expectedHawqType.isMandatoryModifiers()) +throw new UnsupportedTypeException("Invalid definition for column " + hawqColumnName + ": modifiers are mandatory for type " + expectedHawqType.getTypeName()); + +switch (hawqDataType) { +case NUMERIC: +case VARCHAR: +case BPCHAR: +case CHAR: +if (hawqTypeMods != null && hawqTypeMods.length > 0) { +Integer[] hiveTypeModifiers = EnumHiveToHawqType +.extractModifiers(hiveType); +for (int i = 0; i < hiveTypeModifiers.length; i++) { +if (hawqTypeMods[i] < hiveTypeModifiers[i]) +throw new UnsupportedTypeException( +"Invalid definition for column " + hawqColumnName ++ ": modifiers are not compatible, " ++ Arrays.toString(hiveTypeModifiers) + ", " ++ Arrays.toString(hawqTypeMods)); --- End diff -- Same as the comment above: we should tell the user the expected modifier length. For NUMERIC it needs to match exactly, I suppose? For VARCHAR, BPCHAR and CHAR, if a modifier exists on the HAWQ side, it needs to be >= the modifier of the same Hive type.
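The element-wise modifier rule GodenYao describes (each HAWQ modifier must be at least as large as the corresponding Hive one, so e.g. a HAWQ varchar(20) can hold a Hive varchar(10) but varchar(5) cannot) can be sketched as below. The `validateModifiers` helper and its message text are illustrative assumptions, not the actual `HiveUtilities` code.

```java
import java.util.Arrays;

public class ModifierCheck {
    // Illustrative element-wise modifier comparison: each HAWQ modifier
    // (e.g. varchar length, numeric precision/scale) must be >= the
    // corresponding Hive modifier, or the column risks truncation.
    // The method name and messages are assumptions, not PXF code.
    static void validateModifiers(Integer[] hawqMods, Integer[] hiveMods,
                                  String column) {
        if (hawqMods == null || hawqMods.length == 0) {
            return; // no HAWQ modifiers given, nothing to compare
        }
        // Assumes both arrays describe the same modifier positions.
        int n = Math.min(hawqMods.length, hiveMods.length);
        for (int i = 0; i < n; i++) {
            if (hawqMods[i] < hiveMods[i]) {
                throw new IllegalArgumentException(
                    "Invalid definition for column " + column
                    + ": modifiers are not compatible, "
                    + Arrays.toString(hiveMods) + ", "
                    + Arrays.toString(hawqMods));
            }
        }
    }

    public static void main(String[] args) {
        // HAWQ varchar(20) can hold Hive varchar(10): passes silently.
        validateModifiers(new Integer[]{20}, new Integer[]{10}, "name");
        // HAWQ varchar(5) cannot hold Hive varchar(10): throws.
        boolean threw = false;
        try {
            validateModifiers(new Integer[]{5}, new Integer[]{10}, "name");
        } catch (IllegalArgumentException e) {
            threw = true;
        }
        System.out.println(threw); // prints true
    }
}
```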
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user GodenYao commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77076673 --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java --- @@ -256,4 +257,68 @@ private static boolean verifyIntegerModifiers(String[] modifiers) { throw new RuntimeException("Failed connecting to Hive MetaStore service: " + cause.getMessage(), cause); } } + + +/** + * Converts HAWQ type to Hive type. The supported mappings are: + * {@code BOOLEAN -> boolean} + * {@code SMALLINT -> smallint (tinyint is converted to smallint)} + * {@code BIGINT -> bigint} + * {@code TIMESTAMP -> timestamp} + * {@code NUMERIC -> decimal} + * {@code BYTEA -> binary} + * {@code INTEGER -> int} + * {@code TEXT -> string} + * {@code REAL -> float} + * {@code FLOAT8 -> double} + * --- End diff -- based on the logic we also support: `VARCHAR`, `BPCHAR`, `CHAR`
[jira] [Updated] (HAWQ-1038) Missing BPCHAR in Data Type
[ https://issues.apache.org/jira/browse/HAWQ-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-1038: Fix Version/s: backlog > Missing BPCHAR in Data Type > --- > > Key: HAWQ-1038 > URL: https://issues.apache.org/jira/browse/HAWQ-1038 > Project: Apache HAWQ > Issue Type: Bug > Components: Documentation >Reporter: Goden Yao >Assignee: David Yozie > Fix For: backlog > > > referring to 3rd party site: > http://hdb.docs.pivotal.io/20/reference/catalog/pg_type.html > and > http://hdb.docs.pivotal.io/20/reference/HAWQDataTypes.html > It's quite out of date if you check source code: > https://github.com/apache/incubator-hawq/blob/master/src/interfaces/ecpg/ecpglib/pg_type.h > {code} > ... > #define BPCHAROID 1042 > ... > {code} > We at least miss BPCHAR in the type table, maybe more.
[jira] [Updated] (HAWQ-1038) Missing BPCHAR in Data Type
[ https://issues.apache.org/jira/browse/HAWQ-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-1038: Summary: Missing BPCHAR in Data Type (was: Missing bpchar in Data Type) > Missing BPCHAR in Data Type > --- > > Key: HAWQ-1038 > URL: https://issues.apache.org/jira/browse/HAWQ-1038 > Project: Apache HAWQ > Issue Type: Bug > Components: Documentation >Reporter: Goden Yao >Assignee: David Yozie > Fix For: backlog > > > referring to 3rd party site: > http://hdb.docs.pivotal.io/20/reference/catalog/pg_type.html > and > http://hdb.docs.pivotal.io/20/reference/HAWQDataTypes.html > It's quite out of date if you check source code: > https://github.com/apache/incubator-hawq/blob/master/src/interfaces/ecpg/ecpglib/pg_type.h > {code} > ... > #define BPCHAROID 1042 > ... > {code} > We at least miss BPCHAR in the type table, maybe more.
[jira] [Created] (HAWQ-1038) Missing bpchar in Data Type
Goden Yao created HAWQ-1038: --- Summary: Missing bpchar in Data Type Key: HAWQ-1038 URL: https://issues.apache.org/jira/browse/HAWQ-1038 Project: Apache HAWQ Issue Type: Bug Components: Documentation Reporter: Goden Yao Assignee: David Yozie referring to 3rd party site: http://hdb.docs.pivotal.io/20/reference/catalog/pg_type.html and http://hdb.docs.pivotal.io/20/reference/HAWQDataTypes.html It's quite out of date if you check source code: https://github.com/apache/incubator-hawq/blob/master/src/interfaces/ecpg/ecpglib/pg_type.h {code} ... #define BPCHAROID 1042 ... {code} We at least miss BPCHAR in the type table, maybe more.
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user GodenYao commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77071575 --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java --- @@ -256,4 +257,68 @@ private static boolean verifyIntegerModifiers(String[] modifiers) { throw new RuntimeException("Failed connecting to Hive MetaStore service: " + cause.getMessage(), cause); } } + + +/** + * Converts HAWQ type to Hive type. The supported mappings are: + * {@code BOOLEAN -> boolean} + * {@code SMALLINT -> smallint (tinyint is converted to smallint)} + * {@code BIGINT -> bigint} + * {@code TIMESTAMP -> timestamp} + * {@code NUMERIC -> decimal} + * {@code BYTEA -> binary} + * {@code INTEGER -> int} + * {@code TEXT -> string} + * {@code REAL -> float} + * {@code FLOAT8 -> double} + * + * All other types (both in HAWQ and in Hive) are not supported. + * + * @param type HAWQ data type + * @param name field name + * @return Hive type + * @throws UnsupportedTypeException if type is not supported + */ +public static String toCompatibleHiveType(DataType type) { + +EnumHiveToHawqType hiveToHawqType = EnumHiveToHawqType.getCompatibleHawqToHiveType(type); +return hiveToHawqType.getTypeName(); +} + + + +public static void validateTypeCompatible(DataType hawqDataType, Integer[] hawqTypeMods, String hiveType, String hawqColumnName) { + +EnumHiveToHawqType hiveToHawqType = EnumHiveToHawqType.getHiveToHawqType(hiveType); +EnumHawqType expectedHawqType = hiveToHawqType.getHawqType(); + +if (!expectedHawqType.getDataType().equals(hawqDataType)) { +throw new UnsupportedTypeException("Invalid definition for column " + hawqColumnName ++ ": expected HAWQ type " + expectedHawqType.getDataType() + +", actual HAWQ type " + hawqDataType); +} + +if ((hawqTypeMods == null || hawqTypeMods.length == 0) && expectedHawqType.isMandatoryModifiers()) +throw new UnsupportedTypeException("Invalid definition for column " + hawqColumnName + ": modifiers are mandatory for type " + expectedHawqType.getTypeName()); + --- End diff -- In the case of converting a Hive char(xxx) with a fixed length to a HAWQ char with the length omitted (defaulting to 1), we should give the user information about the length from the Hive char(xxx), so they can modify the table definition.
[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...
Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/873#discussion_r77047856 --- Diff: pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/ColumnDescriptor.java --- @@ -26,10 +26,11 @@ */ public class ColumnDescriptor { - int gpdbColumnTypeCode; -String gpdbColumnName; -String gpdbColumnTypeName; -int gpdbColumnIndex; + int dbColumnTypeCode; --- End diff -- Indent
[GitHub] incubator-hawq pull request #837: HAWQ-779 support pxf filter pushdwon at th...
Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/837#discussion_r77047594

--- Diff: pxf/pxf-hbase/src/main/java/org/apache/hawq/pxf/plugins/hbase/HBaseFilterBuilder.java ---
@@ -165,6 +165,14 @@ private Filter handleSimpleOperations(FilterParser.Operation opId, ByteArrayComparable comparator = getComparator(hbaseColumn.columnTypeCode(), constant.constant());
+if(operatorsMap.get(opId) == null){
+//HBase not support HDOP_LIKE, use 'NOT NULL' Comarator
--- End diff --

No, @hsyuan's comment was that the comment should read "//HBase does not support...". HBase does not support the LIKE filter as far as I am aware.
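A sketch of the fallback behavior discussed in this review: when an operator has no HBase comparison mapping (LIKE being the example), the builder degrades to a filter that only asserts the column is present, leaving the real predicate to be re-evaluated upstream. The `Op` enum, string-valued filters, and `buildFilter` are illustrative assumptions, not the real `HBaseFilterBuilder` API:

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch (not the actual HBaseFilterBuilder): operators with no
// HBase CompareOp mapping fall back to a "NOT NULL" column-exists check.
public class FilterFallback {
    enum Op { EQ, LT, GT, LIKE }

    private static final Map<Op, String> OPERATORS = new EnumMap<>(Op.class);
    static {
        OPERATORS.put(Op.EQ, "EQUAL");
        OPERATORS.put(Op.LT, "LESS");
        OPERATORS.put(Op.GT, "GREATER");
        // Op.LIKE intentionally absent: HBase has no LIKE comparison operator.
    }

    public static String buildFilter(Op op, String column) {
        String compareOp = OPERATORS.get(op);
        if (compareOp == null) {
            // Unsupported operator: fall back to a column-exists check and let
            // the database side re-apply the original predicate.
            return "SingleColumnValueFilter(" + column + ", NOT_NULL)";
        }
        return "SingleColumnValueFilter(" + column + ", " + compareOp + ")";
    }
}
```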
[jira] [Updated] (HAWQ-1032) Bucket number of newly added partition is not consistent with parent table.
[ https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-1032: Summary: Bucket number of newly added partition is not consistent with parent table. (was: Bucket number of new added partition is not consistent with parent table.) > Bucket number of newly added partition is not consistent with parent table. > --- > > Key: HAWQ-1032 > URL: https://issues.apache.org/jira/browse/HAWQ-1032 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Hubert Zhang >Assignee: Hubert Zhang > Fix For: 2.0.1.0-incubating > > > Failure Case > {code} > set deafult_hash_table_bucket_number = 12; > CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) > DISTRIBUTED BY (id) > PARTITION BY RANGE (date) > ( START (date '2008-01-01') INCLUSIVE >END (date '2009-01-01') EXCLUSIVE >EVERY (INTERVAL '1 day') ); > set deafult_hash_table_bucket_number = 16; > ALTER TABLE sales3 ADD PARTITION START > (date '2009-03-01') INCLUSIVE END > (date '2009-04-01') EXCLUSIVE; > {code} > The newly added partition with buckcet number 16 is not consistent with > parent partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1036) Support user impersonation in PXF for external tables
[ https://issues.apache.org/jira/browse/HAWQ-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-1036: Summary: Support user impersonation in PXF for external tables (was: Support user impersonation in HAWQ) > Support user impersonation in PXF for external tables > - > > Key: HAWQ-1036 > URL: https://issues.apache.org/jira/browse/HAWQ-1036 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Alastair "Bell" Turner >Assignee: Goden Yao >Priority: Critical > Fix For: backlog > > Attachments: HAWQ_Impersonation_rationale.txt > > > Currently HAWQ executes all queries as the user running the HAWQ process or > the user running the PXF process, not as the user who issued the query via > ODBC/JDBC/... This restricts the options available for integrating with > existing security defined in HDFS, Hive, etc. > Impersonation provides an alternative Ranger integration (as discussed in > HAWQ-256 ) for consistent security across HAWQ, HDFS, Hive... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1036) Support user impersonation in HAWQ
[ https://issues.apache.org/jira/browse/HAWQ-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-1036: Priority: Critical (was: Major) > Support user impersonation in HAWQ > -- > > Key: HAWQ-1036 > URL: https://issues.apache.org/jira/browse/HAWQ-1036 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Alastair "Bell" Turner >Assignee: Goden Yao >Priority: Critical > Fix For: backlog > > Attachments: HAWQ_Impersonation_rationale.txt > > > Currently HAWQ executes all queries as the user running the HAWQ process or > the user running the PXF process, not as the user who issued the query via > ODBC/JDBC/... This restricts the options available for integrating with > existing security defined in HDFS, Hive, etc. > Impersonation provides an alternative Ranger integration (as discussed in > HAWQ-256 ) for consistent security across HAWQ, HDFS, Hive... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1032) Bucket number of newly added partition is not consistent with parent table.
[ https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452924#comment-15452924 ] Goden Yao commented on HAWQ-1032: - what's the partition number of newly added partition in this case? or do you see any errors? > Bucket number of newly added partition is not consistent with parent table. > --- > > Key: HAWQ-1032 > URL: https://issues.apache.org/jira/browse/HAWQ-1032 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Hubert Zhang >Assignee: Hubert Zhang > Fix For: 2.0.1.0-incubating > > > Failure Case > {code} > set deafult_hash_table_bucket_number = 12; > CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) > DISTRIBUTED BY (id) > PARTITION BY RANGE (date) > ( START (date '2008-01-01') INCLUSIVE >END (date '2009-01-01') EXCLUSIVE >EVERY (INTERVAL '1 day') ); > set deafult_hash_table_bucket_number = 16; > ALTER TABLE sales3 ADD PARTITION START > (date '2009-03-01') INCLUSIVE END > (date '2009-04-01') EXCLUSIVE; > {code} > The newly added partition with buckcet number 16 is not consistent with > parent partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1032) Bucket number of new added partition is not consistent with parent table.
[ https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-1032: Description: Failure Case {code} set deafult_hash_table_bucket_number = 12; CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) DISTRIBUTED BY (id) PARTITION BY RANGE (date) ( START (date '2008-01-01') INCLUSIVEEND (date '2009-01-01') EXCLUSIVE EVERY (INTERVAL '1 day') ); set deafult_hash_table_bucket_number = 16; ALTER TABLE sales3 ADD PARTITION START (date '2009-03-01') INCLUSIVE END (date '2009-04-01') EXCLUSIVE; {code} The newly added partition with buckcet number 16 is not consistent with parent partition. was: Failure Case set deafult_hash_table_bucket_number = 12; CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) DISTRIBUTED BY (id) PARTITION BY RANGE (date) ( START (date '2008-01-01') INCLUSIVEEND (date '2009-01-01') EXCLUSIVE EVERY (INTERVAL '1 day') ); set deafult_hash_table_bucket_number = 16; ALTER TABLE sales3 ADD PARTITION START (date '2009-03-01') INCLUSIVE END (date '2009-04-01') EXCLUSIVE; The new added partition with bukcet number 16 which is not consistent with parent partition. > Bucket number of new added partition is not consistent with parent table. 
> - > > Key: HAWQ-1032 > URL: https://issues.apache.org/jira/browse/HAWQ-1032 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Hubert Zhang >Assignee: Hubert Zhang > Fix For: 2.0.1.0-incubating > > > Failure Case > {code} > set deafult_hash_table_bucket_number = 12; > CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) > DISTRIBUTED BY (id) > PARTITION BY RANGE (date) > ( START (date '2008-01-01') INCLUSIVE >END (date '2009-01-01') EXCLUSIVE >EVERY (INTERVAL '1 day') ); > set deafult_hash_table_bucket_number = 16; > ALTER TABLE sales3 ADD PARTITION START > (date '2009-03-01') INCLUSIVE END > (date '2009-04-01') EXCLUSIVE; > {code} > The newly added partition with buckcet number 16 is not consistent with > parent partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1032) Bucket number of new added partition is not consistent with parent table.
[ https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-1032: Fix Version/s: 2.0.1.0-incubating > Bucket number of new added partition is not consistent with parent table. > - > > Key: HAWQ-1032 > URL: https://issues.apache.org/jira/browse/HAWQ-1032 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Hubert Zhang >Assignee: Hubert Zhang > Fix For: 2.0.1.0-incubating > > > Failure Case > set deafult_hash_table_bucket_number = 12; > CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) > DISTRIBUTED BY (id) > PARTITION BY RANGE (date) > ( START (date '2008-01-01') INCLUSIVE >END (date '2009-01-01') EXCLUSIVE >EVERY (INTERVAL '1 day') ); > set deafult_hash_table_bucket_number = 16; > ALTER TABLE sales3 ADD PARTITION START > (date '2009-03-01') INCLUSIVE END > (date '2009-04-01') EXCLUSIVE; > The new added partition with bukcet number 16 which is not consistent with > parent partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1036) Support user impersonation in HAWQ
[ https://issues.apache.org/jira/browse/HAWQ-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-1036: Assignee: Goden Yao (was: Lei Chang) > Support user impersonation in HAWQ > -- > > Key: HAWQ-1036 > URL: https://issues.apache.org/jira/browse/HAWQ-1036 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Alastair "Bell" Turner >Assignee: Goden Yao > Fix For: backlog > > Attachments: HAWQ_Impersonation_rationale.txt > > > Currently HAWQ executes all queries as the user running the HAWQ process or > the user running the PXF process, not as the user who issued the query via > ODBC/JDBC/... This restricts the options available for integrating with > existing security defined in HDFS, Hive, etc. > Impersonation provides an alternative Ranger integration (as discussed in > HAWQ-256 ) for consistent security across HAWQ, HDFS, Hive... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-1037) modify way to get HDFS port in TestHawqRegister
[ https://issues.apache.org/jira/browse/HAWQ-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-1037: Fix Version/s: backlog > modify way to get HDFS port in TestHawqRegister > --- > > Key: HAWQ-1037 > URL: https://issues.apache.org/jira/browse/HAWQ-1037 > Project: Apache HAWQ > Issue Type: Bug > Components: Tests >Reporter: Chunling Wang >Assignee: Chunling Wang > Fix For: backlog > > > In test TestHawqRegister, the HDFS port is hard-coded. Now we get the HDFS > port from HdfsConfig. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452587#comment-15452587 ] Alastair "Bell" Turner commented on HAWQ-256: -

Thanks [~lilima]. There are three gpadmin users, and I think we could have a better discussion if we give them different names.

1. The gpadmin operating system user, who owns the HAWQ processes and the /hawq/* data on the local file system (OSGPAdmin). This user is not relevant to this issue.
2. The gpadmin Hadoop user (HAWQFileOwner). This is the user identity HAWQ uses to access HDFS; it owns the files created by HAWQ in HDFS.
3. The gpadmin user in HAWQ (HAWQSuperUser). This user is subject to very few, if any, restrictions on access to data held in HAWQ.

For PXF there is also a user which accesses HDFS, Hive, etc. on behalf of PXF queries. For consistency let's call this PXFFileOwner.

My question about gpadmin access to data in Ranger-managed tables is about access by HAWQSuperUser: if access to a table is managed by Ranger, then the files containing that table's data in HDFS would be owned by HAWQFileOwner. This is not an issue as long as nobody can log in as HAWQFileOwner. The problem occurs when HAWQSuperUser can read any data in any table, which is currently the case for HAWQ internal tables. If PXFFileOwner has access to data, then HAWQSuperUser would also be able to access it through external tables.

If access to a database were managed by Ranger through this feature, would HAWQSuperUser have access to read the data in that table? If only users authenticated through Ranger had access to data in the table, it would not matter that HAWQFileOwner controlled the underlying file; HAWQ would be acting as a PEP (policy enforcement point) and controlling access to the data. This is different from the scenario I describe in HAWQ-1036, where policy is enforced by HDFS. Either approach would satisfy the requirement for HAWQSuperUser not to have access to the data.
> Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Michael Andre Pearce (IG) >Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'
[ https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunling Wang updated HAWQ-975: --- Affects Version/s: 2.0.0.0-incubating > Queries run much slower with 'explain analyze' than which without 'explain > analyze' > > > Key: HAWQ-975 > URL: https://issues.apache.org/jira/browse/HAWQ-975 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Affects Versions: 2.0.0.0-incubating >Reporter: Chunling Wang >Assignee: Chunling Wang >Priority: Critical > Labels: performance > Fix For: 2.0.1.0-incubating > > > When we run queries with 'explain analyze' in AWS cluster, the total running > time is about 2-3 times longer than which without 'explain analyze'. > Here is a group of TPC-H results for queries with 'explain analyze' and > queries without 'explain analyze'. > ||query ||without 'explain analyze' ||with 'explain analyze' > ||multiple > |TPCH_Query_01| 311843 | 818658 | 2.63 > |TPCH_Query_02| 34675 | 117884 | 3.40 > |TPCH_Query_03| 166155 | 422131 | 2.54 > |TPCH_Query_04| 157807 | 507143 | 3.21 > |TPCH_Query_05| 272657 | 710573 | 2.61 > |TPCH_Query_06| 12508 | 22276 | 1.78 > |TPCH_Query_07| 71893 | 370338 | 5.15 > |TPCH_Query_08| 12 | 672625 | 5.17 > |TPCH_Query_09| 575709 | 1171672 | 2.04 > |TPCH_Query_10| 93770 | 233391 | 2.49 > |TPCH_Query_11| 16252 | 58360 | 3.59 > |TPCH_Query_12| 142576 | 237270 | 1.66 > |TPCH_Query_13| 72682 | 343257 | 4.72 > |TPCH_Query_14| 10410 | 32337 | 3.11 > |TPCH_Query_15| 25719 | 98705 | 3.84 > |TPCH_Query_16| 21382 | 76877 | 3.60 > |TPCH_Query_17| 839683 | 2041169 | 2.43 > |TPCH_Query_18| 460570 | 1065940 | 2.31 > |TPCH_Query_19| 69075 | 82286 | 1.19 > |TPCH_Query_20| 78263 | 292041 | 3.73 > |TPCH_Query_21| 505606 | 1549690 | 3.07 > |TPCH_Query_22| 56450 | 329837 | 5.84 > |Total| 4125684 | 11254460| > 2.73 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'
[ https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunling Wang closed HAWQ-975. -- Resolution: Not A Bug > Queries run much slower with 'explain analyze' than which without 'explain > analyze' > > > Key: HAWQ-975 > URL: https://issues.apache.org/jira/browse/HAWQ-975 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Affects Versions: 2.0.0.0-incubating >Reporter: Chunling Wang >Assignee: Chunling Wang >Priority: Critical > Labels: performance > Fix For: 2.0.1.0-incubating > > > When we run queries with 'explain analyze' in AWS cluster, the total running > time is about 2-3 times longer than which without 'explain analyze'. > Here is a group of TPC-H results for queries with 'explain analyze' and > queries without 'explain analyze'. > ||query ||without 'explain analyze' ||with 'explain analyze' > ||multiple > |TPCH_Query_01| 311843 | 818658 | 2.63 > |TPCH_Query_02| 34675 | 117884 | 3.40 > |TPCH_Query_03| 166155 | 422131 | 2.54 > |TPCH_Query_04| 157807 | 507143 | 3.21 > |TPCH_Query_05| 272657 | 710573 | 2.61 > |TPCH_Query_06| 12508 | 22276 | 1.78 > |TPCH_Query_07| 71893 | 370338 | 5.15 > |TPCH_Query_08| 12 | 672625 | 5.17 > |TPCH_Query_09| 575709 | 1171672 | 2.04 > |TPCH_Query_10| 93770 | 233391 | 2.49 > |TPCH_Query_11| 16252 | 58360 | 3.59 > |TPCH_Query_12| 142576 | 237270 | 1.66 > |TPCH_Query_13| 72682 | 343257 | 4.72 > |TPCH_Query_14| 10410 | 32337 | 3.11 > |TPCH_Query_15| 25719 | 98705 | 3.84 > |TPCH_Query_16| 21382 | 76877 | 3.60 > |TPCH_Query_17| 839683 | 2041169 | 2.43 > |TPCH_Query_18| 460570 | 1065940 | 2.31 > |TPCH_Query_19| 69075 | 82286 | 1.19 > |TPCH_Query_20| 78263 | 292041 | 3.73 > |TPCH_Query_21| 505606 | 1549690 | 3.07 > |TPCH_Query_22| 56450 | 329837 | 5.84 > |Total| 4125684 | 11254460| > 2.73 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'
[ https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunling Wang reopened HAWQ-975: > Queries run much slower with 'explain analyze' than which without 'explain > analyze' > > > Key: HAWQ-975 > URL: https://issues.apache.org/jira/browse/HAWQ-975 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Chunling Wang >Assignee: Chunling Wang >Priority: Critical > Labels: performance > Fix For: 2.0.1.0-incubating > > > When we run queries with 'explain analyze' in AWS cluster, the total running > time is about 2-3 times longer than which without 'explain analyze'. > Here is a group of TPC-H results for queries with 'explain analyze' and > queries without 'explain analyze'. > ||query ||without 'explain analyze' ||with 'explain analyze' > ||multiple > |TPCH_Query_01| 311843 | 818658 | 2.63 > |TPCH_Query_02| 34675 | 117884 | 3.40 > |TPCH_Query_03| 166155 | 422131 | 2.54 > |TPCH_Query_04| 157807 | 507143 | 3.21 > |TPCH_Query_05| 272657 | 710573 | 2.61 > |TPCH_Query_06| 12508 | 22276 | 1.78 > |TPCH_Query_07| 71893 | 370338 | 5.15 > |TPCH_Query_08| 12 | 672625 | 5.17 > |TPCH_Query_09| 575709 | 1171672 | 2.04 > |TPCH_Query_10| 93770 | 233391 | 2.49 > |TPCH_Query_11| 16252 | 58360 | 3.59 > |TPCH_Query_12| 142576 | 237270 | 1.66 > |TPCH_Query_13| 72682 | 343257 | 4.72 > |TPCH_Query_14| 10410 | 32337 | 3.11 > |TPCH_Query_15| 25719 | 98705 | 3.84 > |TPCH_Query_16| 21382 | 76877 | 3.60 > |TPCH_Query_17| 839683 | 2041169 | 2.43 > |TPCH_Query_18| 460570 | 1065940 | 2.31 > |TPCH_Query_19| 69075 | 82286 | 1.19 > |TPCH_Query_20| 78263 | 292041 | 3.73 > |TPCH_Query_21| 505606 | 1549690 | 3.07 > |TPCH_Query_22| 56450 | 329837 | 5.84 > |Total| 4125684 | 11254460| > 2.73 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'
[ https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunling Wang resolved HAWQ-975. Resolution: Not A Bug It is a system configuration issue other than a bug in HAWQ. > Queries run much slower with 'explain analyze' than which without 'explain > analyze' > > > Key: HAWQ-975 > URL: https://issues.apache.org/jira/browse/HAWQ-975 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Chunling Wang >Assignee: Chunling Wang >Priority: Critical > Labels: performance > Fix For: 2.0.1.0-incubating > > > When we run queries with 'explain analyze' in AWS cluster, the total running > time is about 2-3 times longer than which without 'explain analyze'. > Here is a group of TPC-H results for queries with 'explain analyze' and > queries without 'explain analyze'. > ||query ||without 'explain analyze' ||with 'explain analyze' > ||multiple > |TPCH_Query_01| 311843 | 818658 | 2.63 > |TPCH_Query_02| 34675 | 117884 | 3.40 > |TPCH_Query_03| 166155 | 422131 | 2.54 > |TPCH_Query_04| 157807 | 507143 | 3.21 > |TPCH_Query_05| 272657 | 710573 | 2.61 > |TPCH_Query_06| 12508 | 22276 | 1.78 > |TPCH_Query_07| 71893 | 370338 | 5.15 > |TPCH_Query_08| 12 | 672625 | 5.17 > |TPCH_Query_09| 575709 | 1171672 | 2.04 > |TPCH_Query_10| 93770 | 233391 | 2.49 > |TPCH_Query_11| 16252 | 58360 | 3.59 > |TPCH_Query_12| 142576 | 237270 | 1.66 > |TPCH_Query_13| 72682 | 343257 | 4.72 > |TPCH_Query_14| 10410 | 32337 | 3.11 > |TPCH_Query_15| 25719 | 98705 | 3.84 > |TPCH_Query_16| 21382 | 76877 | 3.60 > |TPCH_Query_17| 839683 | 2041169 | 2.43 > |TPCH_Query_18| 460570 | 1065940 | 2.31 > |TPCH_Query_19| 69075 | 82286 | 1.19 > |TPCH_Query_20| 78263 | 292041 | 3.73 > |TPCH_Query_21| 505606 | 1549690 | 3.07 > |TPCH_Query_22| 56450 | 329837 | 5.84 > |Total| 4125684 | 11254460| > 2.73 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'
[ https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451831#comment-15451831 ] Chunling Wang edited comment on HAWQ-975 at 8/31/16 10:30 AM: -- The performance of explain analyze on AWS is low because the VDSO on agents of AWS is not properly configured and does not work well. To be specific, gettimeofday() takes too much time. was (Author: wcl14): It is because that the VDSO on agents of AWS does not work well. So the execution time of function 'gettimeofday()' is too much. > Queries run much slower with 'explain analyze' than which without 'explain > analyze' > > > Key: HAWQ-975 > URL: https://issues.apache.org/jira/browse/HAWQ-975 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Chunling Wang >Assignee: Chunling Wang >Priority: Critical > Labels: performance > Fix For: 2.0.1.0-incubating > > > When we run queries with 'explain analyze' in AWS cluster, the total running > time is about 2-3 times longer than which without 'explain analyze'. > Here is a group of TPC-H results for queries with 'explain analyze' and > queries without 'explain analyze'. 
> ||query ||without 'explain analyze' ||with 'explain analyze' > ||multiple > |TPCH_Query_01| 311843 | 818658 | 2.63 > |TPCH_Query_02| 34675 | 117884 | 3.40 > |TPCH_Query_03| 166155 | 422131 | 2.54 > |TPCH_Query_04| 157807 | 507143 | 3.21 > |TPCH_Query_05| 272657 | 710573 | 2.61 > |TPCH_Query_06| 12508 | 22276 | 1.78 > |TPCH_Query_07| 71893 | 370338 | 5.15 > |TPCH_Query_08| 12 | 672625 | 5.17 > |TPCH_Query_09| 575709 | 1171672 | 2.04 > |TPCH_Query_10| 93770 | 233391 | 2.49 > |TPCH_Query_11| 16252 | 58360 | 3.59 > |TPCH_Query_12| 142576 | 237270 | 1.66 > |TPCH_Query_13| 72682 | 343257 | 4.72 > |TPCH_Query_14| 10410 | 32337 | 3.11 > |TPCH_Query_15| 25719 | 98705 | 3.84 > |TPCH_Query_16| 21382 | 76877 | 3.60 > |TPCH_Query_17| 839683 | 2041169 | 2.43 > |TPCH_Query_18| 460570 | 1065940 | 2.31 > |TPCH_Query_19| 69075 | 82286 | 1.19 > |TPCH_Query_20| 78263 | 292041 | 3.73 > |TPCH_Query_21| 505606 | 1549690 | 3.07 > |TPCH_Query_22| 56450 | 329837 | 5.84 > |Total| 4125684 | 11254460| > 2.73 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'
[ https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunling Wang reassigned HAWQ-975: -- Assignee: Chunling Wang (was: Lei Chang) > Queries run much slower with 'explain analyze' than which without 'explain > analyze' > > > Key: HAWQ-975 > URL: https://issues.apache.org/jira/browse/HAWQ-975 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Chunling Wang >Assignee: Chunling Wang >Priority: Critical > Labels: performance > Fix For: 2.0.1.0-incubating > > > When we run queries with 'explain analyze' in AWS cluster, the total running > time is about 2-3 times longer than which without 'explain analyze'. > Here is a group of TPC-H results for queries with 'explain analyze' and > queries without 'explain analyze'. > ||query ||without 'explain analyze' ||with 'explain analyze' > ||multiple > |TPCH_Query_01| 311843 | 818658 | 2.63 > |TPCH_Query_02| 34675 | 117884 | 3.40 > |TPCH_Query_03| 166155 | 422131 | 2.54 > |TPCH_Query_04| 157807 | 507143 | 3.21 > |TPCH_Query_05| 272657 | 710573 | 2.61 > |TPCH_Query_06| 12508 | 22276 | 1.78 > |TPCH_Query_07| 71893 | 370338 | 5.15 > |TPCH_Query_08| 12 | 672625 | 5.17 > |TPCH_Query_09| 575709 | 1171672 | 2.04 > |TPCH_Query_10| 93770 | 233391 | 2.49 > |TPCH_Query_11| 16252 | 58360 | 3.59 > |TPCH_Query_12| 142576 | 237270 | 1.66 > |TPCH_Query_13| 72682 | 343257 | 4.72 > |TPCH_Query_14| 10410 | 32337 | 3.11 > |TPCH_Query_15| 25719 | 98705 | 3.84 > |TPCH_Query_16| 21382 | 76877 | 3.60 > |TPCH_Query_17| 839683 | 2041169 | 2.43 > |TPCH_Query_18| 460570 | 1065940 | 2.31 > |TPCH_Query_19| 69075 | 82286 | 1.19 > |TPCH_Query_20| 78263 | 292041 | 3.73 > |TPCH_Query_21| 505606 | 1549690 | 3.07 > |TPCH_Query_22| 56450 | 329837 | 5.84 > |Total| 4125684 | 11254460| > 2.73 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'
[ https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451831#comment-15451831 ] Chunling Wang commented on HAWQ-975: It is because that the VDSO on agents of AWS does not work well. So the execution time of function 'gettimeofday()' is too much. > Queries run much slower with 'explain analyze' than which without 'explain > analyze' > > > Key: HAWQ-975 > URL: https://issues.apache.org/jira/browse/HAWQ-975 > Project: Apache HAWQ > Issue Type: Bug > Components: Core >Reporter: Chunling Wang >Assignee: Lei Chang >Priority: Critical > Labels: performance > Fix For: 2.0.1.0-incubating > > > When we run queries with 'explain analyze' in AWS cluster, the total running > time is about 2-3 times longer than which without 'explain analyze'. > Here is a group of TPC-H results for queries with 'explain analyze' and > queries without 'explain analyze'. > ||query ||without 'explain analyze' ||with 'explain analyze' > ||multiple > |TPCH_Query_01| 311843 | 818658 | 2.63 > |TPCH_Query_02| 34675 | 117884 | 3.40 > |TPCH_Query_03| 166155 | 422131 | 2.54 > |TPCH_Query_04| 157807 | 507143 | 3.21 > |TPCH_Query_05| 272657 | 710573 | 2.61 > |TPCH_Query_06| 12508 | 22276 | 1.78 > |TPCH_Query_07| 71893 | 370338 | 5.15 > |TPCH_Query_08| 12 | 672625 | 5.17 > |TPCH_Query_09| 575709 | 1171672 | 2.04 > |TPCH_Query_10| 93770 | 233391 | 2.49 > |TPCH_Query_11| 16252 | 58360 | 3.59 > |TPCH_Query_12| 142576 | 237270 | 1.66 > |TPCH_Query_13| 72682 | 343257 | 4.72 > |TPCH_Query_14| 10410 | 32337 | 3.11 > |TPCH_Query_15| 25719 | 98705 | 3.84 > |TPCH_Query_16| 21382 | 76877 | 3.60 > |TPCH_Query_17| 839683 | 2041169 | 2.43 > |TPCH_Query_18| 460570 | 1065940 | 2.31 > |TPCH_Query_19| 69075 | 82286 | 1.19 > |TPCH_Query_20| 78263 | 292041 | 3.73 > |TPCH_Query_21| 505606 | 1549690 | 3.07 > |TPCH_Query_22| 56450 | 329837 | 5.84 > |Total| 4125684 | 11254460| > 2.73 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
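The mechanism behind HAWQ-975 can be illustrated with a small sketch: EXPLAIN ANALYZE instruments execution with per-tuple clock reads, so when the clock source is slow (e.g. gettimeofday() falling back to a syscall because the VDSO is misconfigured, as found on the AWS agents), total runtime inflates by a multiple. This only demonstrates the shape of the overhead; the class and method names are illustrative and the actual multiple depends entirely on the host:

```java
// Illustrative sketch: the same "work" loop with and without a clock read per
// iteration, mimicking how EXPLAIN ANALYZE's per-tuple instrumentation adds a
// timestamp call on every row.
public class TimerOverhead {
    public static long workOnly(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += i;
        return acc;
    }

    public static long workWithTimestamps(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i;
            // One clock read per "tuple". On a host with a broken VDSO this
            // call degrades to a syscall and dominates the loop.
            acc += (System.nanoTime() & 1);
        }
        return acc;
    }
}
```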
[jira] [Updated] (HAWQ-1037) modify way to get HDFS port in TestHawqRegister
[ https://issues.apache.org/jira/browse/HAWQ-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunling Wang updated HAWQ-1037: Summary: modify way to get HDFS port in TestHawqRegister (was: modify to get HDFS port in TestHawqRegister) > modify way to get HDFS port in TestHawqRegister > --- > > Key: HAWQ-1037 > URL: https://issues.apache.org/jira/browse/HAWQ-1037 > Project: Apache HAWQ > Issue Type: Bug > Components: Tests >Reporter: Chunling Wang >Assignee: Chunling Wang > > In test TestHawqRegister, the HDFS port is hard-coded. Now we get the HDFS > port from HdfsConfig. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451720#comment-15451720 ] Hubert Zhang commented on HAWQ-256: ---

+1 for two-stage authorization. The HAWQ Ranger plugin (REST service) manages the access privileges of HAWQ objects, including database, table, function, language and so on, while the HDFS Ranger plugin manages the access privileges of HDFS files. They do not conflict with each other: a user must first have the privilege to access the HAWQ object (calculated in the planner), and then also needs the privilege to access the HDFS files. Currently, HAWQ uses the admin user to create/append HDFS files, which is convenient for HAWQ user management. For example, user A owns table t1, and if user A grants the select and insert privileges on table t1 to user B, user B can directly access table t1, because on HDFS the files of table t1 are both created and accessed by admin. But passing the user identity down would mean that the files of table t1 are created by user A, and user B cannot access them directly unless user B is added to user A's group or the file privileges are changed. I do agree that passing the user identity down is useful, especially in the Hadoop ecosystem, but when implementing it, pay attention to the problem mentioned above. (Also, this is beyond the discussion of HAWQ-256.)

> Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Michael Andre Pearce (IG) >Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
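The two-stage authorization described in this comment can be sketched as two independent checks that must both pass. This is a minimal model with in-memory grant sets; the class name, grant encoding, and methods are assumptions for illustration only — the real flow would consult the HAWQ Ranger plugin and the HDFS permission model:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of two-stage authorization: stage 1 checks the HAWQ
// object privilege (as the planner would), stage 2 checks access to the
// table's underlying HDFS files. Both stores are stubbed with string sets.
public class TwoStageAuth {
    private final Set<String> objectGrants = new HashSet<>(); // "user:object:action"
    private final Set<String> hdfsGrants = new HashSet<>();   // "user:path"

    public void grantObject(String user, String object, String action) {
        objectGrants.add(user + ":" + object + ":" + action);
    }

    public void grantHdfs(String user, String path) {
        hdfsGrants.add(user + ":" + path);
    }

    // A query runs only if both the object-level and file-level checks pass.
    public boolean canRun(String user, String object, String action, String hdfsPath) {
        return objectGrants.contains(user + ":" + object + ":" + action)
                && hdfsGrants.contains(user + ":" + hdfsPath);
    }
}
```

This also makes Hubert's caveat concrete: if user A's identity is passed down to HDFS, user B holds the object grant (stage 1) but fails stage 2 until the file-level grant is added.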
[jira] [Comment Edited] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451558#comment-15451558 ] Lili Ma edited comment on HAWQ-256 at 8/31/16 8:24 AM: ---

[~thebellhead], quite good questions!

1. In order for tools, syntax checking, etc. to work, everyone (the HAWQ public role) requires access to the catalog and some of the toolkit. Will Ranger-only access control apply only to user-created tables, views and external tables?
Yes, since the catalog tables and toolkits are shared and used by various users, Ranger-only access control just applies to user-defined objects. But the objects include not only database, table and view, but also function, language, schema, tablespace and protocol. You can find the detailed objects and privileges in the design doc. I have reviewed your proposal in HAWQ-1036; could you share your handling of the objects which don't lie in the HDFS layer, such as function, schema, language, etc.?

2. If so, will gpadmin and any other HAWQ-defined roles not have access to the data in Ranger-managed tables?
Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS; say, when a specified userA creates a table in HAWQ, the HDFS files for the table are created by gpadmin instead of userA. Since Ranger lies in the Hadoop eco-system and usually needs to control both HAWQ and HDFS, I think we need to assign gpadmin the full privileges on the HAWQ data file directory on HDFS in the Ranger UI beforehand. About your concern that the superuser can see all the users' data: I think it's kind of like the "root" role in an operating system? If users have concerns about the DBA/superuser's unlimited access, I totally agree with you about the solution of "passing down user-identity" for solving this problem :)

3. How would this be extended for the hcatalog virtual database in HAWQ? Could the Ranger permissions for the underlying store (for instance Hive) be read and enforced/reported at the HAWQ level?
If HAWQ keeps gpadmin for operating on HDFS or external storage, I think we just need to grant the privilege to the superuser. But if we have implemented the user-identity passing down, say, the data files on HDFS for a table created by userA are owned by userA instead of gpadmin; in this way we need to connect to Ranger twice, from HAWQ and HDFS respectively. I haven't included the underlying store privilege checks on the HAWQ side; that may need multiple code changes. I think keeping the privileges in the component is another choice. Your thoughts?

Thanks
Lili

was (Author: lilima): [~thebellhead], quit good questions! 1. In order for tools, syntax checking, etc to work everyone (the HAWQ public role) requires access to the catalog and some of the toolkit. Will Ranger-only access control apply only to user created tables, views and external tables? Yes, since the catalog tables and toolkits are shared and used by various users, Ranger-only access control just applies to user defined objects. But the objects include not only database, table and view, but also include function, language, schema, tablespace and protocol. You can find the detailed objects and privileges in the design doc. 2. If so - will gpadmin and any other HAWQ-defined roles not have access to the data in Ranger managed tables? Just as you mentioned, HAWQ uses gpadmin identity to create files on HDFS, say, when a specified userA creates a table in HAWQ, the HDFS files for the table are created by gpadmin instead of userA. Since Ranger lies in Hadoop eco-system, it usually needs to control both HAWQ and HDFS, I think we need assign gpadmin to the full privileges of hawq data file directory on HDFS in Ranger UI previously. About your concern about the superuser can see all the users' data, I think it's kind of like the "root" role in operation system?
If the users have concerns about the DBA/Superuser's unlimited access, I totally agree with you about the solution of "passing down user-identifiy" for solving this problem :) 3. How would this be extended for the hcatalog virtual database in HAWQ? Could the Ranger permissions for the underlying store (for instance Hive) be read and enforced/reported at the HAWQ level? If HAWQ keeps the gpadmin for operating HDFS or external storage, I think we just need grant the privilege to superuser. But if we have implemented the user-identity passing down, say, the data files on HDFS for a table created by userA are owned by userA instead of gpadmin, in this way we need to double connect to Ranger, from HAWQ and HDFS respectively. I haven't include the underlying store privileges check into HAWQ side, that may need multiple code changes. I think keeping the privileges in the component is another choice. Your thoughts? Thanks Lili > Integrate Security with Apache Ranger > - > >
[jira] [Assigned] (HAWQ-1003) Implement enhanced hawq ACL check through Ranger
[ https://issues.apache.org/jira/browse/HAWQ-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hubert Zhang reassigned HAWQ-1003: -- Assignee: Hubert Zhang (was: Lei Chang) > Implement enhanced hawq ACL check through Ranger > > > Key: HAWQ-1003 > URL: https://issues.apache.org/jira/browse/HAWQ-1003 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Core >Reporter: Lili Ma >Assignee: Hubert Zhang > Fix For: backlog > > > Implement enhanced hawq ACL check through Ranger, which means, if a query > contains several tables, we can combine the multiple table request together, > to send just one REST request to Ranger REST API Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
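The batching idea behind HAWQ-1003 (one REST request covering every table a query touches, instead of one round trip per table) could look roughly like the sketch below. This is an assumption-laden illustration: the payload field names and endpoint are invented for the sketch and do not reflect the real Ranger REST API schema.

```python
import json

def build_batch_acl_request(user, tables, privilege):
    # Combine the ACL checks for all tables in a query into a single
    # JSON payload, so the HAWQ side issues one REST call to the Ranger
    # plugin service instead of N. Field names here are illustrative.
    payload = {
        "user": user,
        "access": [
            {"resource": {"database": db, "table": tbl},
             "privileges": [privilege]}
            for db, tbl in tables
        ],
    }
    return json.dumps(payload)
```

For a query joining `db1.t1` and `db1.t2`, `build_batch_acl_request("userB", [("db1", "t1"), ("db1", "t2")], "select")` yields one request body with two resource entries; the server can then evaluate all entries and return a combined allow/deny result.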
[jira] [Assigned] (HAWQ-1002) Implement a switch in hawq-site.xml to configure whether use Ranger or not for ACL
[ https://issues.apache.org/jira/browse/HAWQ-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hubert Zhang reassigned HAWQ-1002: -- Assignee: Hubert Zhang (was: Lei Chang) > Implement a switch in hawq-site.xml to configure whether use Ranger or not > for ACL > -- > > Key: HAWQ-1002 > URL: https://issues.apache.org/jira/browse/HAWQ-1002 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Core >Reporter: Lili Ma >Assignee: Hubert Zhang > Fix For: backlog > > > Implement a switch in hawq-site.xml to configure whether use Ranger or not > for ACL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
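A switch like the one HAWQ-1002 describes would plausibly be a hawq-site.xml property along these lines. This is a sketch only: the property name and values are illustrative placeholders, not taken from a shipped HAWQ release.

```xml
<!-- Hypothetical sketch of the HAWQ-1002 switch; the property name
     "hawq_acl_type" and its values are assumptions for illustration. -->
<property>
    <name>hawq_acl_type</name>
    <!-- "ranger" delegates ACL checks to Ranger; another value, e.g.
         "standalone", would keep the built-in catalog-based ACL check. -->
    <value>ranger</value>
</property>
```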
[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger
[ https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451558#comment-15451558 ] Lili Ma commented on HAWQ-256: -- [~thebellhead], quite good questions! 1. In order for tools, syntax checking, etc. to work, everyone (the HAWQ public role) requires access to the catalog and some of the toolkit. Will Ranger-only access control apply only to user-created tables, views and external tables? Yes, since the catalog tables and toolkits are shared and used by various users, Ranger-only access control applies only to user-defined objects. But the objects include not only database, table and view, but also function, language, schema, tablespace and protocol. You can find the detailed objects and privileges in the design doc. 2. If so, will gpadmin and any other HAWQ-defined roles not have access to the data in Ranger-managed tables? Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS; say, when a specific userA creates a table in HAWQ, the HDFS files for the table are created by gpadmin instead of userA. Since Ranger lies in the Hadoop ecosystem, it usually needs to control both HAWQ and HDFS, so I think we need to assign gpadmin full privileges on the HAWQ data file directory on HDFS in the Ranger UI beforehand. About your concern that the superuser can see all the users' data: I think it's kind of like the "root" role in an operating system? If users have concerns about the DBA/superuser's unlimited access, I totally agree with your solution of "passing down the user identity" for solving this problem :) 3. How would this be extended for the hcatalog virtual database in HAWQ? Could the Ranger permissions for the underlying store (for instance Hive) be read and enforced/reported at the HAWQ level? If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just need to grant the privilege to the superuser. But if we have implemented user-identity passing down (say, the data files on HDFS for a table created by userA are owned by userA instead of gpadmin), then we would need to connect to Ranger twice, from HAWQ and HDFS respectively. I haven't included the underlying store's privilege check on the HAWQ side; that may need multiple code changes. I think keeping the privilege checks in each component is another choice. Your thoughts? Thanks Lili > Integrate Security with Apache Ranger > - > > Key: HAWQ-256 > URL: https://issues.apache.org/jira/browse/HAWQ-256 > Project: Apache HAWQ > Issue Type: New Feature > Components: PXF, Security >Reporter: Michael Andre Pearce (IG) >Assignee: Lili Ma > Fix For: backlog > > Attachments: HAWQRangerSupportDesign.pdf, > HAWQRangerSupportDesign_v0.2.pdf > > > Integrate security with Apache Ranger for a unified Hadoop security solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-hawq issue #879: HAWQ-845. Parameterize kerberos principal service...
Github user ztao1987 commented on the issue: https://github.com/apache/incubator-hawq/pull/879 +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-hawq pull request #879: HAWQ-845. Parameterize kerberos principal ...
GitHub user linwen opened a pull request: https://github.com/apache/incubator-hawq/pull/879 HAWQ-845. Parameterize kerberos principal service name for HAWQ This fix removes the hardcoded check for "postgres" as the Kerberos service name when HAWQ is running with Kerberos enabled, so that customers can replace "postgres" with a different service name. If users want to use a different name, the property krb_srvname should be added to hawq-site.xml with the desired value (e.g. gpadmin); otherwise "postgres" is used. Please review, thanks! You can merge this pull request into a Git repository by running: $ git pull https://github.com/linwen/incubator-hawq hawq_845 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq/pull/879.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #879 commit 030491b896875e574a2563d77c92fbfd1503d5bf Author: Wen Lin Date: 2016-08-31T06:10:51Z HAWQ-845. Parameterize kerberos principal name for HAWQ
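The property/value pair in the PR description was flattened by extraction; reconstructed as a hawq-site.xml entry it would read as below (the krb_srvname name and gpadmin value come from the message itself; the surrounding XML shape is the standard Hadoop-style site-file format).

```xml
<property>
    <name>krb_srvname</name>
    <value>gpadmin</value>
</property>
```

If the property is absent, the default service name "postgres" is used.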
[jira] [Assigned] (HAWQ-1033) add --force option for hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongwu reassigned HAWQ-1033: Assignee: hongwu (was: Lei Chang) > add --force option for hawq register > > > Key: HAWQ-1033 > URL: https://issues.apache.org/jira/browse/HAWQ-1033 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > > add --force option for hawq register > Will clear all the catalog contents in pg_aoseg.pg_paqseg_$relid while > keeping the files on HDFS, and then re-register all the files to the table. > This is for the cluster disaster-recovery scenario: two clusters co-exist, > and data is periodically imported from Cluster A to Cluster B, so it needs > to be registered in Cluster B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HAWQ-1035) support partition table register
[ https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongwu reassigned HAWQ-1035: Assignee: hongwu (was: Lei Chang) > support partition table register > > > Key: HAWQ-1035 > URL: https://issues.apache.org/jira/browse/HAWQ-1035 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > > Support partition table register, limited to 1-level partition tables, since > hawq extract only supports 1-level partition tables -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HAWQ-1034) add --repair option for hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongwu reassigned HAWQ-1034: Assignee: hongwu (was: Lei Chang) > add --repair option for hawq register > - > > Key: HAWQ-1034 > URL: https://issues.apache.org/jira/browse/HAWQ-1034 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Lili Ma >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > > add --repair option for hawq register > Will change both the file folder and the catalog table > pg_aoseg.pg_paqseg_$relid to the state that the .yml file describes. Note > that files newly generated since the checkpoint may be deleted here. Also > note that all the files in the .yml file should be under the table folder on > HDFS. Limitation: does not support hash table redistribution, table truncate > or table drop. This is for the table-rollback scenario: take checkpoints > somewhere, and roll back to a previous checkpoint when needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-845) Parameterize kerberos principal name for HAWQ
[ https://issues.apache.org/jira/browse/HAWQ-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451291#comment-15451291 ] Lin Wen commented on HAWQ-845: -- For now I think we can keep 'postgres' as the default Kerberos service name, but customers should be able to parameterize it with another name. If users want to use a different name, the property krb_srvname should be added to hawq-site.xml with the desired value (e.g. gpadmin). > Parameterize kerberos principal name for HAWQ > - > > Key: HAWQ-845 > URL: https://issues.apache.org/jira/browse/HAWQ-845 > Project: Apache HAWQ > Issue Type: Improvement >Reporter: bhuvnesh chaudhary >Assignee: Lei Chang >Priority: Minor > Fix For: backlog > > > Currently HAWQ only accepts the principal 'postgres' for kerberos settings. > This is because it is hardcoded in gpcheckhdfs; we should ensure that it can > be parameterized. > Also, it's better to change the default principal name postgres to gpadmin. > It will avoid the need of changing the hdfs directory to postgres while > securing the cluster, and will avoid the need of maintaining the postgres > user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HAWQ-1025) Modify the content of yml file, and change hawq register implementation for the modification
[ https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongwu reassigned HAWQ-1025: Assignee: hongwu (was: Lili Ma) > Modify the content of yml file, and change hawq register implementation for > the modification > > > Key: HAWQ-1025 > URL: https://issues.apache.org/jira/browse/HAWQ-1025 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Affects Versions: 2.0.1.0-incubating >Reporter: hongwu >Assignee: hongwu > Fix For: 2.0.1.0-incubating > > > 1. Add the bucket number for hash-distributed tables in the yml file; when > hawq register runs, ensure the number of files is a multiple of the bucket > number. > 2. hawq register should use the file size information in the yml file to > update the catalog table pg_aoseg.pg_paqseg_$relid. > 3. hawq register processing steps: >a. create the table; >b. move all the files; >c. change the catalog table once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
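A yml file for hawq register along the lines HAWQ-1025 describes might look like the sketch below. The key names and values are illustrative assumptions (the authoritative schema is whatever `hawq extract` emits); the two details taken from the issue itself are the added bucket number and the per-file size (eof) used to update pg_aoseg.pg_paqseg_$relid.

```yaml
# Hypothetical sketch of a hawq register .yml; key names are assumptions.
TableName: public.t1
FileFormat: AO
Bucketnum: 6                    # added by HAWQ-1025 for hash-distributed
                                # tables; file count must be a multiple of it
AO_FileLocations:
  Files:
  - path: /hawq_default/16385/16387/16388/1
    size: 4134192               # actual eof, used to update
                                # pg_aoseg.pg_paqseg_$relid
  - path: /hawq_default/16385/16387/16388/2
    size: 4129536
```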
[GitHub] incubator-hawq pull request #878: HAWQ-1025.
GitHub user xunzhang opened a pull request: https://github.com/apache/incubator-hawq/pull/878 HAWQ-1025. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xunzhang/incubator-hawq HAWQ-1025 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq/pull/878.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #878 commit b060a365c8d42e22afaaffca78affef7f4cd556f Author: xunzhang Date: 2016-08-30T08:03:42Z HAWQ-1025. Add bucket number in the yaml file of hawq extract, modify to use actual eof for usage1.