[GitHub] incubator-hawq issue #873: HAWQ-992. PXF Hive data type check in Fragmenter ...

2016-08-31 Thread kavinderd
Github user kavinderd commented on the issue:

https://github.com/apache/incubator-hawq/pull/873
  
Overall LGTM, but I'd like to spend more time reviewing. This is a lot of code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (HAWQ-1039) Add test case of bucket number may not be consistent with parent table.

2016-08-31 Thread Hubert Zhang (JIRA)
Hubert Zhang created HAWQ-1039:
--

 Summary: Add test case of bucket number may not be consistent with 
parent table.
 Key: HAWQ-1039
 URL: https://issues.apache.org/jira/browse/HAWQ-1039
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Core
Reporter: Hubert Zhang
Assignee: Lei Chang


Add a test case for HAWQ-1032.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger

2016-08-31 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454137#comment-15454137
 ] 

Lili Ma commented on HAWQ-256:
--

[~thebellhead]  From a technical view, we can definitely restrict the 
HAWQSuperUser privilege in Ranger.

But if we restrict that, HAWQ superuser behavior changes. I think this needs 
careful discussion, and it's out of the scope of this JIRA, right? Anyway, if 
everyone agrees to remove the superuser privileges, we can implement that 
function. Thanks

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 





[GitHub] incubator-hawq issue #881: HAWQ-1032. Bucket number of new added partition i...

2016-08-31 Thread wengyanqing
Github user wengyanqing commented on the issue:

https://github.com/apache/incubator-hawq/pull/881
  
+1




[GitHub] incubator-hawq issue #881: HAWQ-1032. Bucket number of new added partition i...

2016-08-31 Thread ictmalili
Github user ictmalili commented on the issue:

https://github.com/apache/incubator-hawq/pull/881
  
LGTM. +1




[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread GodenYao
Github user GodenYao commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77103957
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/EnumHawqType.java 
---
@@ -43,37 +45,40 @@ public void serialize(EnumHawqType value, JsonGenerator 
generator,
  */
 @JsonSerialize(using = EnumHawqTypeSerializer.class)
 public enum EnumHawqType {
-Int2Type("int2"),
-Int4Type("int4"),
-Int8Type("int8"),
-Float4Type("float4"),
-Float8Type("float8"),
-TextType("text"),
-VarcharType("varchar", (byte) 1, true),
-ByteaType("bytea"),
-DateType("date"),
-TimestampType("timestamp"),
-BoolType("bool"),
-NumericType("numeric", (byte) 2, true),
-BpcharType("bpchar", (byte) 1, true);
+Int2Type("int2", DataType.SMALLINT),
+Int4Type("int4", DataType.INTEGER),
+Int8Type("int8", DataType.BIGINT),
+Float4Type("float4", DataType.REAL),
+Float8Type("float8", DataType.FLOAT8),
+TextType("text", DataType.TEXT),
+VarcharType("varchar", DataType.VARCHAR, (byte) 1, false),
+ByteaType("bytea", DataType.BYTEA),
+DateType("date", DataType.DATE),
+TimestampType("timestamp", DataType.TIMESTAMP),
+BoolType("bool", DataType.BOOLEAN),
+NumericType("numeric", DataType.NUMERIC, (byte) 2, false),
+BpcharType("bpchar", DataType.BPCHAR, (byte) 1, false),
--- End diff --

Yes, if the modifier is omitted, bpchar will have a dynamic length.




[GitHub] incubator-hawq pull request #881: HAWQ-1032. Bucket number of new added part...

2016-08-31 Thread zhangh43
GitHub user zhangh43 opened a pull request:

https://github.com/apache/incubator-hawq/pull/881

HAWQ-1032. Bucket number of new added partition is not consistent with parent table.

Failure Case
{code}
set default_hash_table_bucket_number = 12;
CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) 
DISTRIBUTED BY (id)   
PARTITION BY RANGE (date) ( 
START (date '2008-01-01') INCLUSIVE
END (date '2009-01-01') EXCLUSIVE 
EVERY (INTERVAL '1 day') );

set default_hash_table_bucket_number = 16;
ALTER TABLE sales3 ADD PARTITION   START 
(date '2009-03-01') INCLUSIVE   END 
(date '2009-04-01') EXCLUSIVE;
{code}
The newly added partition with bucket number 16 is not consistent with the 
parent partition.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhangh43/incubator-hawq hawq1032

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/881.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #881


commit 2b1124d44917c2955652fe4e7583f4e9d855897b
Author: hzhang2 
Date:   2016-09-01T01:33:58Z

HAWQ-1032. Bucket number of new added partition is not consistent with 
parent table.






[jira] [Commented] (HAWQ-1032) Bucket number of newly added partition is not consistent with parent table.

2016-08-31 Thread Hubert Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453989#comment-15453989
 ] 

Hubert Zhang commented on HAWQ-1032:


Not the partition number: the bucket number of the sub-partition is not 
consistent with the bucket number of the root partition.
In the above case, the bucket number of the root partition is 12, while the 
newly added partition's bucket number is 16.
The query will fail if the bucket numbers of all partitions are not the same.

> Bucket number of newly added partition is not consistent with parent table.
> ---
>
> Key: HAWQ-1032
> URL: https://issues.apache.org/jira/browse/HAWQ-1032
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: 2.0.1.0-incubating
>
>
> Failure Case
> {code}
> set default_hash_table_bucket_number = 12;
> CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) 
> DISTRIBUTED BY (id)   
> PARTITION BY RANGE (date) 
> ( START (date '2008-01-01') INCLUSIVE 
>END (date '2009-01-01') EXCLUSIVE  
>EVERY (INTERVAL '1 day') );
> set default_hash_table_bucket_number = 16;
> ALTER TABLE sales3 ADD PARTITION   START 
> (date '2009-03-01') INCLUSIVE   END 
> (date '2009-04-01') EXCLUSIVE;
> {code}
> The newly added partition with bucket number 16 is not consistent with the 
> parent partition.





[GitHub] incubator-hawq pull request #880: HAWQ-1037. Modify way to get HDFS port in ...

2016-08-31 Thread wcl14
GitHub user wcl14 opened a pull request:

https://github.com/apache/incubator-hawq/pull/880

HAWQ-1037. Modify way to get HDFS port in TestHawqRegister



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wcl14/incubator-hawq HAWQ-1037

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/880.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #880


commit caaace3cb340dbeb41da1eaed02f0aa5152c0b38
Author: Chunling Wang 
Date:   2016-08-31T10:19:05Z

HAWQ-1037. Modify way to get HDFS port in TestHawqRegister






[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77096574
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/EnumHiveToHawqType.java
 ---
@@ -110,4 +122,68 @@ public static EnumHiveToHawqType 
getHiveToHawqType(String hiveType) {
 + hiveType + " to HAWQ's type");
 }
 
+
+/**
+ * 
+ * @param dataType Hawq data type
+ * @return compatible Hive type for the given Hawq type; if there is more than
+ *         one compatible type, the one with the bigger size is returned
+ * @throws UnsupportedTypeException if there is no corresponding Hive 
type for given Hawq type
+ */
+public static EnumHiveToHawqType getCompatibleHawqToHiveType(DataType 
dataType) {
+
+SortedSet<EnumHiveToHawqType> types = new TreeSet<EnumHiveToHawqType>(
+        new Comparator<EnumHiveToHawqType>() {
+            public int compare(EnumHiveToHawqType a,
+                    EnumHiveToHawqType b) {
+                return Byte.compare(a.getSize(), b.getSize());
+            }
+        });
--- End diff --

Can this `types` variable be instantiated outside the method? It doesn't 
change based on input, right?
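A minimal sketch (editorial, not from the thread) of hoisting the comparator into a static field, using a simplified hypothetical stand-in for `EnumHiveToHawqType`:

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for EnumHiveToHawqType; only the size field matters here.
enum HiveType {
    TINYINT((byte) 1), SMALLINT((byte) 2);

    private final byte size;
    HiveType(byte size) { this.size = size; }
    byte getSize() { return size; }

    // The comparator depends only on the enum values, never on method input,
    // so it can be hoisted into a static final field and reused by every call.
    static final Comparator<HiveType> BY_SIZE = new Comparator<HiveType>() {
        public int compare(HiveType a, HiveType b) {
            return Byte.compare(a.getSize(), b.getSize());
        }
    };
}

public class ComparatorHoistDemo {
    public static void main(String[] args) {
        SortedSet<HiveType> types = new TreeSet<HiveType>(HiveType.BY_SIZE);
        types.add(HiveType.SMALLINT);
        types.add(HiveType.TINYINT);
        System.out.println(types.last()); // largest compatible type
    }
}
```

Each call then only builds a new `TreeSet` around the shared comparator instead of allocating a fresh anonymous comparator per invocation.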




[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread GodenYao
Github user GodenYao commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77085040
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/EnumHawqType.java 
---
@@ -43,37 +45,40 @@ public void serialize(EnumHawqType value, JsonGenerator 
generator,
  */
 @JsonSerialize(using = EnumHawqTypeSerializer.class)
 public enum EnumHawqType {
-Int2Type("int2"),
-Int4Type("int4"),
-Int8Type("int8"),
-Float4Type("float4"),
-Float8Type("float8"),
-TextType("text"),
-VarcharType("varchar", (byte) 1, true),
-ByteaType("bytea"),
-DateType("date"),
-TimestampType("timestamp"),
-BoolType("bool"),
-NumericType("numeric", (byte) 2, true),
-BpcharType("bpchar", (byte) 1, true);
+Int2Type("int2", DataType.SMALLINT),
+Int4Type("int4", DataType.INTEGER),
+Int8Type("int8", DataType.BIGINT),
+Float4Type("float4", DataType.REAL),
+Float8Type("float8", DataType.FLOAT8),
+TextType("text", DataType.TEXT),
+VarcharType("varchar", DataType.VARCHAR, (byte) 1, false),
+ByteaType("bytea", DataType.BYTEA),
+DateType("date", DataType.DATE),
+TimestampType("timestamp", DataType.TIMESTAMP),
+BoolType("bool", DataType.BOOLEAN),
+NumericType("numeric", DataType.NUMERIC, (byte) 2, false),
+BpcharType("bpchar", DataType.BPCHAR, (byte) 1, false),
+//modifier is mandatory for this type because by default it's 1
+CharType("char", DataType.CHAR, (byte) 1, true);
--- End diff --

this type is not used in hivetohawq conversion




[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread GodenYao
Github user GodenYao commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77084340
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -256,4 +257,68 @@ private static boolean verifyIntegerModifiers(String[] 
modifiers) {
 throw new RuntimeException("Failed connecting to Hive 
MetaStore service: " + cause.getMessage(), cause);
 }
 }
+
+
+/**
+ * Converts HAWQ type to hive type. The supported mappings are:
+ * {@code BOOLEAN -> boolean}
+ * {@code SMALLINT -> smallint (tinyint is converted to 
smallint)}
+ * {@code BIGINT -> bigint}
+ * {@code TIMESTAMP -> timestamp}
+ * {@code NUMERIC -> decimal}
+ * {@code BYTEA -> binary}
+ * {@code INTEGER -> int}
+ * {@code TEXT -> string}
+ * {@code REAL -> float}
+ * {@code FLOAT8 -> double}
+ * 
+ * All other types (both in HAWQ and in HIVE) are not supported.
+ *
+ * @param type HAWQ data type
+ * @param name field name
+ * @return Hive type
+ * @throws UnsupportedTypeException if type is not supported
+ */
+public static String toCompatibleHiveType(DataType type) {
+
+EnumHiveToHawqType hiveToHawqType = 
EnumHiveToHawqType.getCompatibleHawqToHiveType(type);
+return hiveToHawqType.getTypeName();
+}
+
+
+
+public static void validateTypeCompatible(DataType hawqDataType, 
Integer[] hawqTypeMods, String hiveType, String hawqColumnName) {
+
+EnumHiveToHawqType hiveToHawqType = 
EnumHiveToHawqType.getHiveToHawqType(hiveType);
+EnumHawqType expectedHawqType = hiveToHawqType.getHawqType();
+
+if (!expectedHawqType.getDataType().equals(hawqDataType)) {
+throw new UnsupportedTypeException("Invalid definition for 
column " + hawqColumnName
++  ": expected HAWQ type " + 
expectedHawqType.getDataType() +
+", actual HAWQ type " + hawqDataType);
+}
+
+if ((hawqTypeMods == null || hawqTypeMods.length == 0) && 
expectedHawqType.isMandatoryModifiers())
+throw new UnsupportedTypeException("Invalid definition for 
column " + hawqColumnName +  ": modifiers are mandatory for type " + 
expectedHawqType.getTypeName());
+
--- End diff --

Another thing:
```
 CharType("char", EnumHawqType.BpcharType, "[(,)]"),
```
according to the code above, we convert Hive char to the HAWQ Bpchar type. In 
that case, why do we need to check the modifier? HAWQ char(xxx) won't be 
compared; it has a different oid from HAWQ Bpchar.




[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77082531
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/EnumHawqType.java 
---
@@ -43,37 +45,40 @@ public void serialize(EnumHawqType value, JsonGenerator 
generator,
  */
 @JsonSerialize(using = EnumHawqTypeSerializer.class)
 public enum EnumHawqType {
-Int2Type("int2"),
-Int4Type("int4"),
-Int8Type("int8"),
-Float4Type("float4"),
-Float8Type("float8"),
-TextType("text"),
-VarcharType("varchar", (byte) 1, true),
-ByteaType("bytea"),
-DateType("date"),
-TimestampType("timestamp"),
-BoolType("bool"),
-NumericType("numeric", (byte) 2, true),
-BpcharType("bpchar", (byte) 1, true);
+Int2Type("int2", DataType.SMALLINT),
+Int4Type("int4", DataType.INTEGER),
+Int8Type("int8", DataType.BIGINT),
+Float4Type("float4", DataType.REAL),
+Float8Type("float8", DataType.FLOAT8),
+TextType("text", DataType.TEXT),
+VarcharType("varchar", DataType.VARCHAR, (byte) 1, false),
+ByteaType("bytea", DataType.BYTEA),
+DateType("date", DataType.DATE),
+TimestampType("timestamp", DataType.TIMESTAMP),
+BoolType("bool", DataType.BOOLEAN),
+NumericType("numeric", DataType.NUMERIC, (byte) 2, false),
+BpcharType("bpchar", DataType.BPCHAR, (byte) 1, false),
--- End diff --

Never mind, I understand why: the modifiers are not mandatory, right?




[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77082166
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/EnumHawqType.java 
---
@@ -43,37 +45,40 @@ public void serialize(EnumHawqType value, JsonGenerator 
generator,
  */
 @JsonSerialize(using = EnumHawqTypeSerializer.class)
 public enum EnumHawqType {
-Int2Type("int2"),
-Int4Type("int4"),
-Int8Type("int8"),
-Float4Type("float4"),
-Float8Type("float8"),
-TextType("text"),
-VarcharType("varchar", (byte) 1, true),
-ByteaType("bytea"),
-DateType("date"),
-TimestampType("timestamp"),
-BoolType("bool"),
-NumericType("numeric", (byte) 2, true),
-BpcharType("bpchar", (byte) 1, true);
+Int2Type("int2", DataType.SMALLINT),
+Int4Type("int4", DataType.INTEGER),
+Int8Type("int8", DataType.BIGINT),
+Float4Type("float4", DataType.REAL),
+Float8Type("float8", DataType.FLOAT8),
+TextType("text", DataType.TEXT),
+VarcharType("varchar", DataType.VARCHAR, (byte) 1, false),
+ByteaType("bytea", DataType.BYTEA),
+DateType("date", DataType.DATE),
+TimestampType("timestamp", DataType.TIMESTAMP),
+BoolType("bool", DataType.BOOLEAN),
+NumericType("numeric", DataType.NUMERIC, (byte) 2, false),
+BpcharType("bpchar", DataType.BPCHAR, (byte) 1, false),
--- End diff --

Can you explain why the last parameter was changed to `false`?




[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread GodenYao
Github user GodenYao commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77079334
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/EnumHiveToHawqType.java
 ---
@@ -29,8 +35,8 @@
  */
 public enum EnumHiveToHawqType {
 
-TinyintType("tinyint", EnumHawqType.Int2Type),
-SmallintType("smallint", EnumHawqType.Int2Type),
+TinyintType("tinyint", EnumHawqType.Int2Type, (byte) 1),
+SmallintType("smallint", EnumHawqType.Int2Type, (byte) 2),
--- End diff --

Maybe add a comment about the byte sizes 1/2 here, to differentiate the two 
types on the HAWQ side.




[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread GodenYao
Github user GodenYao commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77077634
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/EnumHiveToHawqType.java
 ---
@@ -110,4 +122,68 @@ public static EnumHiveToHawqType 
getHiveToHawqType(String hiveType) {
 + hiveType + " to HAWQ's type");
 }
 
+
+/**
+ * 
+ * @param dataType Hawq data type
+ * @return compatible Hive type for the given Hawq type; if there is more than
+ *         one compatible type, the one with the bigger size is returned
+ * @throws UnsupportedTypeException if there is no corresponding Hive 
type for given Hawq type
+ */
+public static EnumHiveToHawqType getCompatibleHawqToHiveType(DataType 
dataType) {
+
+SortedSet<EnumHiveToHawqType> types = new TreeSet<EnumHiveToHawqType>(
+        new Comparator<EnumHiveToHawqType>() {
+            public int compare(EnumHiveToHawqType a,
+                    EnumHiveToHawqType b) {
+                return Byte.compare(a.getSize(), b.getSize());
+            }
+        });
+
+for (EnumHiveToHawqType t : values()) {
+if (t.getHawqType().getDataType().equals(dataType)) {
+types.add(t);
+}
+}
+
+if (types.size() == 0)
+throw new UnsupportedTypeException("Unable to find compatible 
Hive type for given HAWQ's type: " + dataType);
+
--- End diff --

the error message should be reversed: 

> Unable to find compatible **HAWQ** type for given **Hive**'s type




[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread GodenYao
Github user GodenYao commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77076942
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -256,4 +257,68 @@ private static boolean verifyIntegerModifiers(String[] 
modifiers) {
 throw new RuntimeException("Failed connecting to Hive 
MetaStore service: " + cause.getMessage(), cause);
 }
 }
+
+
+/**
+ * Converts HAWQ type to hive type. The supported mappings are:
+ * {@code BOOLEAN -> boolean}
+ * {@code SMALLINT -> smallint (tinyint is converted to 
smallint)}
+ * {@code BIGINT -> bigint}
+ * {@code TIMESTAMP -> timestamp}
+ * {@code NUMERIC -> decimal}
+ * {@code BYTEA -> binary}
+ * {@code INTEGER -> int}
+ * {@code TEXT -> string}
+ * {@code REAL -> float}
+ * {@code FLOAT8 -> double}
+ * 
+ * All other types (both in HAWQ and in HIVE) are not supported.
+ *
+ * @param type HAWQ data type
+ * @param name field name
+ * @return Hive type
+ * @throws UnsupportedTypeException if type is not supported
+ */
+public static String toCompatibleHiveType(DataType type) {
+
+EnumHiveToHawqType hiveToHawqType = 
EnumHiveToHawqType.getCompatibleHawqToHiveType(type);
+return hiveToHawqType.getTypeName();
+}
+
+
+
+public static void validateTypeCompatible(DataType hawqDataType, 
Integer[] hawqTypeMods, String hiveType, String hawqColumnName) {
+
+EnumHiveToHawqType hiveToHawqType = 
EnumHiveToHawqType.getHiveToHawqType(hiveType);
+EnumHawqType expectedHawqType = hiveToHawqType.getHawqType();
+
+if (!expectedHawqType.getDataType().equals(hawqDataType)) {
+throw new UnsupportedTypeException("Invalid definition for 
column " + hawqColumnName
++  ": expected HAWQ type " + 
expectedHawqType.getDataType() +
+", actual HAWQ type " + hawqDataType);
+}
+
+if ((hawqTypeMods == null || hawqTypeMods.length == 0) && 
expectedHawqType.isMandatoryModifiers())
+throw new UnsupportedTypeException("Invalid definition for 
column " + hawqColumnName +  ": modifiers are mandatory for type " + 
expectedHawqType.getTypeName());
+
+switch (hawqDataType) {
+case NUMERIC:
+case VARCHAR:
+case BPCHAR:
+case CHAR:
+if (hawqTypeMods != null && hawqTypeMods.length > 0) {
+Integer[] hiveTypeModifiers = EnumHiveToHawqType
+.extractModifiers(hiveType);
+for (int i = 0; i < hiveTypeModifiers.length; i++) {
+if (hawqTypeMods[i] < hiveTypeModifiers[i])
+throw new UnsupportedTypeException(
+"Invalid definition for column " + 
hawqColumnName
++ ": modifiers are not 
compatible, "
++ 
Arrays.toString(hiveTypeModifiers) + ", "
++ 
Arrays.toString(hawqTypeMods));
--- End diff --

Same as the above comment: we should tell the user the length for the modifier. 
For numeric, it needs to match exactly, I suppose? For VARCHAR, BPCHAR, and 
CHAR, if a modifier exists on the HAWQ side, it needs to be >= the modifier of 
the same Hive type.
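An editorial sketch of the rule as stated above (hypothetical helper, not the PXF implementation): NUMERIC modifiers must match exactly, while character-type lengths on the HAWQ side only need to be at least the Hive-side length.

```java
// Hypothetical sketch of the modifier-compatibility rule discussed above.
// Names and signatures are illustrative, not the actual PXF API.
public class ModifierCheckDemo {

    static boolean modifiersCompatible(String hawqTypeName,
                                       Integer[] hawqMods, Integer[] hiveMods) {
        if (hawqMods == null || hawqMods.length == 0) {
            return true; // nothing declared on the HAWQ side: nothing to compare
        }
        if (hawqMods.length != hiveMods.length) {
            return false;
        }
        for (int i = 0; i < hiveMods.length; i++) {
            if ("numeric".equals(hawqTypeName)) {
                if (!hawqMods[i].equals(hiveMods[i])) {
                    return false; // precision/scale must match exactly
                }
            } else if (hawqMods[i] < hiveMods[i]) {
                return false; // HAWQ column must be at least as wide
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // HAWQ varchar(20) can hold Hive varchar(10), but not the reverse.
        System.out.println(modifiersCompatible("varchar",
                new Integer[]{20}, new Integer[]{10})); // true
        System.out.println(modifiersCompatible("varchar",
                new Integer[]{5}, new Integer[]{10}));  // false
        System.out.println(modifiersCompatible("numeric",
                new Integer[]{10, 2}, new Integer[]{10, 2})); // true
    }
}
```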




[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread GodenYao
Github user GodenYao commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77076673
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -256,4 +257,68 @@ private static boolean verifyIntegerModifiers(String[] 
modifiers) {
 throw new RuntimeException("Failed connecting to Hive 
MetaStore service: " + cause.getMessage(), cause);
 }
 }
+
+
+/**
+ * Converts HAWQ type to hive type. The supported mappings are:
+ * {@code BOOLEAN -> boolean}
+ * {@code SMALLINT -> smallint (tinyint is converted to 
smallint)}
+ * {@code BIGINT -> bigint}
+ * {@code TIMESTAMP -> timestamp}
+ * {@code NUMERIC -> decimal}
+ * {@code BYTEA -> binary}
+ * {@code INTERGER -> int}
+ * {@code TEXT -> string}
+ * {@code REAL -> float}
+ * {@code FLOAT8 -> double}
+ * 
--- End diff --

Based on the logic, we also support:
```
case VARCHAR:
case BPCHAR:
case CHAR:
```




[jira] [Updated] (HAWQ-1038) Missing BPCHAR in Data Type

2016-08-31 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-1038:

Fix Version/s: backlog

> Missing BPCHAR in Data Type
> ---
>
> Key: HAWQ-1038
> URL: https://issues.apache.org/jira/browse/HAWQ-1038
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Goden Yao
>Assignee: David Yozie
> Fix For: backlog
>
>
> referring to 3rd party site:
> http://hdb.docs.pivotal.io/20/reference/catalog/pg_type.html 
> and 
> http://hdb.docs.pivotal.io/20/reference/HAWQDataTypes.html
> It's quite out of date if you check source code:
> https://github.com/apache/incubator-hawq/blob/master/src/interfaces/ecpg/ecpglib/pg_type.h
> {code}
> ...
> #define BPCHAROID 1042
> ...
> {code}
> We at least miss BPCHAR in the type table, maybe more.





[jira] [Updated] (HAWQ-1038) Missing BPCHAR in Data Type

2016-08-31 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-1038:

Summary: Missing BPCHAR in Data Type  (was: Missing bpchar in Data Type)

> Missing BPCHAR in Data Type
> ---
>
> Key: HAWQ-1038
> URL: https://issues.apache.org/jira/browse/HAWQ-1038
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Goden Yao
>Assignee: David Yozie
> Fix For: backlog
>
>
> referring to 3rd party site:
> http://hdb.docs.pivotal.io/20/reference/catalog/pg_type.html 
> and 
> http://hdb.docs.pivotal.io/20/reference/HAWQDataTypes.html
> It's quite out of date if you check source code:
> https://github.com/apache/incubator-hawq/blob/master/src/interfaces/ecpg/ecpglib/pg_type.h
> {code}
> ...
> #define BPCHAROID 1042
> ...
> {code}
> We at least miss BPCHAR in the type table, maybe more.





[jira] [Created] (HAWQ-1038) Missing bpchar in Data Type

2016-08-31 Thread Goden Yao (JIRA)
Goden Yao created HAWQ-1038:
---

 Summary: Missing bpchar in Data Type
 Key: HAWQ-1038
 URL: https://issues.apache.org/jira/browse/HAWQ-1038
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Documentation
Reporter: Goden Yao
Assignee: David Yozie


referring to 3rd party site:
http://hdb.docs.pivotal.io/20/reference/catalog/pg_type.html 
and 
http://hdb.docs.pivotal.io/20/reference/HAWQDataTypes.html

It's quite out of date if you check source code:
https://github.com/apache/incubator-hawq/blob/master/src/interfaces/ecpg/ecpglib/pg_type.h
{code}
...
#define BPCHAROID   1042
...
{code}

We at least miss BPCHAR in the type table, maybe more.







[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread GodenYao
Github user GodenYao commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77071575
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -256,4 +257,68 @@ private static boolean verifyIntegerModifiers(String[] 
modifiers) {
 throw new RuntimeException("Failed connecting to Hive 
MetaStore service: " + cause.getMessage(), cause);
 }
 }
+
+
+/**
+ * Converts HAWQ type to hive type. The supported mappings are:
+ * {@code BOOLEAN -> boolean}
+ * {@code SMALLINT -> smallint (tinyint is converted to 
smallint)}
+ * {@code BIGINT -> bigint}
+ * {@code TIMESTAMP -> timestamp}
+ * {@code NUMERIC -> decimal}
+ * {@code BYTEA -> binary}
+ * {@code INTEGER -> int}
+ * {@code TEXT -> string}
+ * {@code REAL -> float}
+ * {@code FLOAT8 -> double}
+ * 
+ * All other types (both in HAWQ and in HIVE) are not supported.
+ *
+ * @param type HAWQ data type
+ * @param name field name
+ * @return Hive type
+ * @throws UnsupportedTypeException if type is not supported
+ */
+public static String toCompatibleHiveType(DataType type) {
+
+EnumHiveToHawqType hiveToHawqType = 
EnumHiveToHawqType.getCompatibleHawqToHiveType(type);
+return hiveToHawqType.getTypeName();
+}
+
+
+
+public static void validateTypeCompatible(DataType hawqDataType, 
Integer[] hawqTypeMods, String hiveType, String hawqColumnName) {
+
+EnumHiveToHawqType hiveToHawqType = 
EnumHiveToHawqType.getHiveToHawqType(hiveType);
+EnumHawqType expectedHawqType = hiveToHawqType.getHawqType();
+
+if (!expectedHawqType.getDataType().equals(hawqDataType)) {
+throw new UnsupportedTypeException("Invalid definition for 
column " + hawqColumnName
++  ": expected HAWQ type " + 
expectedHawqType.getDataType() +
+", actual HAWQ type " + hawqDataType);
+}
+
+if ((hawqTypeMods == null || hawqTypeMods.length == 0) && 
expectedHawqType.isMandatoryModifiers())
+throw new UnsupportedTypeException("Invalid definition for 
column " + hawqColumnName +  ": modifiers are mandatory for type " + 
expectedHawqType.getTypeName());
+
--- End diff --

In the case of mapping a Hive char(xxx) with a fixed length to a HAWQ char with 
the length omitted (defaulting to 1), we should give the user information about 
the length from the Hive char(xxx), so they can modify the table definition.
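The HAWQ-to-Hive mapping listed in the Javadoc above can be sketched as a plain enum. This is a simplified, self-contained sketch: `HawqToHiveType` and its lookup method are illustrative stand-ins, not the PR's actual `EnumHiveToHawqType` class.

```java
// Simplified sketch of the HAWQ -> Hive type mapping from the Javadoc above.
// The enum name and lookup method are illustrative, not the PR's real classes.
public enum HawqToHiveType {
    BOOLEAN("boolean"), SMALLINT("smallint"), BIGINT("bigint"),
    TIMESTAMP("timestamp"), NUMERIC("decimal"), BYTEA("binary"),
    INTEGER("int"), TEXT("string"), REAL("float"), FLOAT8("double");

    private final String hiveName;

    HawqToHiveType(String hiveName) {
        this.hiveName = hiveName;
    }

    /** Maps a HAWQ type name to its Hive equivalent, or throws if unsupported. */
    public static String toCompatibleHiveType(String hawqType) {
        try {
            return valueOf(hawqType.toUpperCase()).hiveName;
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported HAWQ type: " + hawqType);
        }
    }
}
```

In the same spirit, the error message asked for above could include the original Hive type string (e.g. char(20)) so the user can see which length to add to the HAWQ column definition.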


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #873: HAWQ-992. PXF Hive data type check in Frag...

2016-08-31 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/873#discussion_r77047856
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/ColumnDescriptor.java
 ---
@@ -26,10 +26,11 @@
  */
 public class ColumnDescriptor {
 
-   int gpdbColumnTypeCode;
-String gpdbColumnName;
-String gpdbColumnTypeName;
-int gpdbColumnIndex;
+   int dbColumnTypeCode;
--- End diff --

Indent




[GitHub] incubator-hawq pull request #837: HAWQ-779 support pxf filter pushdwon at th...

2016-08-31 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/837#discussion_r77047594
  
--- Diff: 
pxf/pxf-hbase/src/main/java/org/apache/hawq/pxf/plugins/hbase/HBaseFilterBuilder.java
 ---
@@ -165,6 +165,14 @@ private Filter 
handleSimpleOperations(FilterParser.Operation opId,
 ByteArrayComparable comparator = 
getComparator(hbaseColumn.columnTypeCode(),
 constant.constant());
 
+if(operatorsMap.get(opId) == null){
+//HBase not support HDOP_LIKE, use 'NOT NULL' Comarator
--- End diff --

No, @hsyuan's comment was that the comment should read "//HBase does not 
support...". As far as I am aware, HBase does not support the LIKE filter.




[jira] [Updated] (HAWQ-1032) Bucket number of newly added partition is not consistent with parent table.

2016-08-31 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-1032:

Summary: Bucket number of newly added partition is not consistent with 
parent table.  (was: Bucket number of new added partition is not consistent 
with parent table.)

> Bucket number of newly added partition is not consistent with parent table.
> ---
>
> Key: HAWQ-1032
> URL: https://issues.apache.org/jira/browse/HAWQ-1032
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: 2.0.1.0-incubating
>
>
> Failure Case
> {code}
> set default_hash_table_bucket_number = 12;
> CREATE TABLE sales3 (id int, date date, amt decimal(10,2))
> DISTRIBUTED BY (id)
> PARTITION BY RANGE (date)
> ( START (date '2008-01-01') INCLUSIVE
>   END (date '2009-01-01') EXCLUSIVE
>   EVERY (INTERVAL '1 day') );
>
> set default_hash_table_bucket_number = 16;
> ALTER TABLE sales3 ADD PARTITION
>   START (date '2009-03-01') INCLUSIVE
>   END (date '2009-04-01') EXCLUSIVE;
> {code}
> The newly added partition gets bucket number 16, which is not consistent with
> the parent partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-1036) Support user impersonation in PXF for external tables

2016-08-31 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-1036:

Summary: Support user impersonation in PXF for external tables  (was: 
Support user impersonation in HAWQ)

> Support user impersonation in PXF for external tables
> -
>
> Key: HAWQ-1036
> URL: https://issues.apache.org/jira/browse/HAWQ-1036
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Alastair "Bell" Turner
>Assignee: Goden Yao
>Priority: Critical
> Fix For: backlog
>
> Attachments: HAWQ_Impersonation_rationale.txt
>
>
> Currently HAWQ executes all queries as the user running the HAWQ process or 
> the user running the PXF process, not as the user who issued the query via 
> ODBC/JDBC/... This restricts the options available for integrating with 
> existing security defined in HDFS, Hive, etc.
> Impersonation provides an alternative Ranger integration (as discussed in 
> HAWQ-256 ) for consistent security across HAWQ, HDFS, Hive...





[jira] [Updated] (HAWQ-1036) Support user impersonation in HAWQ

2016-08-31 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-1036:

Priority: Critical  (was: Major)

> Support user impersonation in HAWQ
> --
>
> Key: HAWQ-1036
> URL: https://issues.apache.org/jira/browse/HAWQ-1036
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Alastair "Bell" Turner
>Assignee: Goden Yao
>Priority: Critical
> Fix For: backlog
>
> Attachments: HAWQ_Impersonation_rationale.txt
>
>
> Currently HAWQ executes all queries as the user running the HAWQ process or 
> the user running the PXF process, not as the user who issued the query via 
> ODBC/JDBC/... This restricts the options available for integrating with 
> existing security defined in HDFS, Hive, etc.
> Impersonation provides an alternative Ranger integration (as discussed in 
> HAWQ-256 ) for consistent security across HAWQ, HDFS, Hive...





[jira] [Commented] (HAWQ-1032) Bucket number of newly added partition is not consistent with parent table.

2016-08-31 Thread Goden Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452924#comment-15452924
 ] 

Goden Yao commented on HAWQ-1032:
-

What's the bucket number of the newly added partition in this case? Or do you 
see any errors?

> Bucket number of newly added partition is not consistent with parent table.
> ---
>
> Key: HAWQ-1032
> URL: https://issues.apache.org/jira/browse/HAWQ-1032
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: 2.0.1.0-incubating
>
>
> Failure Case
> {code}
> set default_hash_table_bucket_number = 12;
> CREATE TABLE sales3 (id int, date date, amt decimal(10,2))
> DISTRIBUTED BY (id)
> PARTITION BY RANGE (date)
> ( START (date '2008-01-01') INCLUSIVE
>   END (date '2009-01-01') EXCLUSIVE
>   EVERY (INTERVAL '1 day') );
>
> set default_hash_table_bucket_number = 16;
> ALTER TABLE sales3 ADD PARTITION
>   START (date '2009-03-01') INCLUSIVE
>   END (date '2009-04-01') EXCLUSIVE;
> {code}
> The newly added partition gets bucket number 16, which is not consistent with
> the parent partition.





[jira] [Updated] (HAWQ-1032) Bucket number of new added partition is not consistent with parent table.

2016-08-31 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-1032:

Description: 
Failure Case
{code}
set default_hash_table_bucket_number = 12;
CREATE TABLE sales3 (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
  END (date '2009-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 day') );

set default_hash_table_bucket_number = 16;
ALTER TABLE sales3 ADD PARTITION
  START (date '2009-03-01') INCLUSIVE
  END (date '2009-04-01') EXCLUSIVE;
{code}

The newly added partition gets bucket number 16, which is not consistent with the
parent partition.

  was:
Failure Case
set deafult_hash_table_bucket_number = 12;
CREATE TABLE sales3 (id int, date date, amt decimal(10,2)) DISTRIBUTED 
BY (id)   PARTITION BY 
RANGE (date) ( START (date 
'2008-01-01') INCLUSIVEEND (date 
'2009-01-01') EXCLUSIVE EVERY 
(INTERVAL '1 day') );

set deafult_hash_table_bucket_number = 16;
ALTER TABLE sales3 ADD PARTITION   START (date 
'2009-03-01') INCLUSIVE   END (date 
'2009-04-01') EXCLUSIVE;

The new added partition with bukcet number 16 which is not consistent with 
parent partition.


> Bucket number of new added partition is not consistent with parent table.
> -
>
> Key: HAWQ-1032
> URL: https://issues.apache.org/jira/browse/HAWQ-1032
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: 2.0.1.0-incubating
>
>
> Failure Case
> {code}
> set default_hash_table_bucket_number = 12;
> CREATE TABLE sales3 (id int, date date, amt decimal(10,2))
> DISTRIBUTED BY (id)
> PARTITION BY RANGE (date)
> ( START (date '2008-01-01') INCLUSIVE
>   END (date '2009-01-01') EXCLUSIVE
>   EVERY (INTERVAL '1 day') );
> set default_hash_table_bucket_number = 16;
> ALTER TABLE sales3 ADD PARTITION
>   START (date '2009-03-01') INCLUSIVE
>   END (date '2009-04-01') EXCLUSIVE;
> {code}
> The newly added partition gets bucket number 16, which is not consistent with
> the parent partition.





[jira] [Updated] (HAWQ-1032) Bucket number of new added partition is not consistent with parent table.

2016-08-31 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-1032:

Fix Version/s: 2.0.1.0-incubating

> Bucket number of new added partition is not consistent with parent table.
> -
>
> Key: HAWQ-1032
> URL: https://issues.apache.org/jira/browse/HAWQ-1032
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Hubert Zhang
>Assignee: Hubert Zhang
> Fix For: 2.0.1.0-incubating
>
>
> Failure Case
> set default_hash_table_bucket_number = 12;
> CREATE TABLE sales3 (id int, date date, amt decimal(10,2))
> DISTRIBUTED BY (id)
> PARTITION BY RANGE (date)
> ( START (date '2008-01-01') INCLUSIVE
>   END (date '2009-01-01') EXCLUSIVE
>   EVERY (INTERVAL '1 day') );
> set default_hash_table_bucket_number = 16;
> ALTER TABLE sales3 ADD PARTITION
>   START (date '2009-03-01') INCLUSIVE
>   END (date '2009-04-01') EXCLUSIVE;
> The newly added partition gets bucket number 16, which is not consistent with
> the parent partition.





[jira] [Updated] (HAWQ-1036) Support user impersonation in HAWQ

2016-08-31 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-1036:

Assignee: Goden Yao  (was: Lei Chang)

> Support user impersonation in HAWQ
> --
>
> Key: HAWQ-1036
> URL: https://issues.apache.org/jira/browse/HAWQ-1036
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Alastair "Bell" Turner
>Assignee: Goden Yao
> Fix For: backlog
>
> Attachments: HAWQ_Impersonation_rationale.txt
>
>
> Currently HAWQ executes all queries as the user running the HAWQ process or 
> the user running the PXF process, not as the user who issued the query via 
> ODBC/JDBC/... This restricts the options available for integrating with 
> existing security defined in HDFS, Hive, etc.
> Impersonation provides an alternative Ranger integration (as discussed in 
> HAWQ-256 ) for consistent security across HAWQ, HDFS, Hive...





[jira] [Updated] (HAWQ-1037) modify way to get HDFS port in TestHawqRegister

2016-08-31 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-1037:

Fix Version/s: backlog

> modify way to get HDFS port in TestHawqRegister
> ---
>
> Key: HAWQ-1037
> URL: https://issues.apache.org/jira/browse/HAWQ-1037
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Tests
>Reporter: Chunling Wang
>Assignee: Chunling Wang
> Fix For: backlog
>
>
> In test TestHawqRegister, the HDFS port is hard-coded. Now we get the HDFS 
> port from HdfsConfig.





[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger

2016-08-31 Thread Alastair "Bell" Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452587#comment-15452587
 ] 

Alastair "Bell" Turner commented on HAWQ-256:
-

Thanks [~lilima] 

There are three gpadmin users and I think we could have a better discussion if 
we give them different names.

 1. The gpadmin operating system user, who owns the HAWQ processes and the 
/hawq/* data on the local file system (OSGPAdmin). This user is not relevant to 
this issue.
 2. The gpadmin Hadoop user (HAWQFileOwner). This is the user identity HAWQ 
uses to access HDFS; it owns the files created by HAWQ in HDFS.
 3. The gpadmin user in HAWQ (HAWQSuperUser). This user is subject to very few, 
if any, restrictions on access to data held in HAWQ.

For PXF there is also a user which accesses HDFS, Hive, etc. on behalf of PXF 
queries. For consistency let's call this PXFFileOwner.

My question about gpadmin access to data in Ranger managed tables is about 
access by HAWQSuperUser:

If access to a table is managed by Ranger then the files containing that 
table's data in HDFS would be owned by HAWQFileOwner. This is not an issue as 
long as nobody can log in as HAWQFileOwner. The problem occurs when 
HAWQSuperUser can read any data in any table. This is currently the case for 
HAWQ internal tables. If PXFFileOwner has access to data then HAWQSuperUser 
would also be able to access it through external tables.

If access to a database were managed by Ranger through this feature, would 
HAWQSuperUser be able to read the data in that table?

If only users authenticated through Ranger had access to data in the table, it 
would not matter that HAWQFileOwner controlled the underlying file; HAWQ would 
be acting as a PEP and controlling access to the data. This is different from 
the scenario I describe in HAWQ-1036, where policy is enforced by HDFS. 
Either approach would satisfy the requirement that HAWQSuperUser not have 
access to the data.

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 





[jira] [Updated] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'

2016-08-31 Thread Chunling Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunling Wang updated HAWQ-975:
---
Affects Version/s: 2.0.0.0-incubating

> Queries run much slower with 'explain analyze' than which without  'explain 
> analyze'
> 
>
> Key: HAWQ-975
> URL: https://issues.apache.org/jira/browse/HAWQ-975
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 2.0.0.0-incubating
>Reporter: Chunling Wang
>Assignee: Chunling Wang
>Priority: Critical
>  Labels: performance
> Fix For: 2.0.1.0-incubating
>
>
> When we run queries with 'explain analyze' in an AWS cluster, the total running 
> time is about 2-3 times longer than that without 'explain analyze'.
> Here is a group of TPC-H results for queries with 'explain analyze' and 
> queries without 'explain analyze'.
> ||query||without 'explain analyze'||with 'explain analyze'||multiple||
> |TPCH_Query_01|   311843  |   818658  |   2.63
> |TPCH_Query_02|   34675   |   117884  |   3.40
> |TPCH_Query_03|   166155  |   422131  |   2.54
> |TPCH_Query_04|   157807  |   507143  |   3.21
> |TPCH_Query_05|   272657  |   710573  |   2.61
> |TPCH_Query_06|   12508   |   22276   |   1.78
> |TPCH_Query_07|   71893   |   370338  |   5.15
> |TPCH_Query_08|   12  |   672625  |   5.17
> |TPCH_Query_09|   575709  |   1171672 |   2.04
> |TPCH_Query_10|   93770   |   233391  |   2.49
> |TPCH_Query_11|   16252   |   58360   |   3.59
> |TPCH_Query_12|   142576  |   237270  |   1.66
> |TPCH_Query_13|   72682   |   343257  |   4.72
> |TPCH_Query_14|   10410   |   32337   |   3.11
> |TPCH_Query_15|   25719   |   98705   |   3.84
> |TPCH_Query_16|   21382   |   76877   |   3.60
> |TPCH_Query_17|   839683  |   2041169 |   2.43
> |TPCH_Query_18|   460570  |   1065940 |   2.31
> |TPCH_Query_19|   69075   |   82286   |   1.19
> |TPCH_Query_20|   78263   |   292041  |   3.73
> |TPCH_Query_21|   505606  |   1549690 |   3.07
> |TPCH_Query_22|   56450   |   329837  |   5.84
> |Total|   4125684 |   11254460|   2.73





[jira] [Closed] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'

2016-08-31 Thread Chunling Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunling Wang closed HAWQ-975.
--
Resolution: Not A Bug

> Queries run much slower with 'explain analyze' than which without  'explain 
> analyze'
> 
>
> Key: HAWQ-975
> URL: https://issues.apache.org/jira/browse/HAWQ-975
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 2.0.0.0-incubating
>Reporter: Chunling Wang
>Assignee: Chunling Wang
>Priority: Critical
>  Labels: performance
> Fix For: 2.0.1.0-incubating
>
>
> When we run queries with 'explain analyze' in an AWS cluster, the total running 
> time is about 2-3 times longer than that without 'explain analyze'.
> Here is a group of TPC-H results for queries with 'explain analyze' and 
> queries without 'explain analyze'.
> ||query||without 'explain analyze'||with 'explain analyze'||multiple||
> |TPCH_Query_01|   311843  |   818658  |   2.63
> |TPCH_Query_02|   34675   |   117884  |   3.40
> |TPCH_Query_03|   166155  |   422131  |   2.54
> |TPCH_Query_04|   157807  |   507143  |   3.21
> |TPCH_Query_05|   272657  |   710573  |   2.61
> |TPCH_Query_06|   12508   |   22276   |   1.78
> |TPCH_Query_07|   71893   |   370338  |   5.15
> |TPCH_Query_08|   12  |   672625  |   5.17
> |TPCH_Query_09|   575709  |   1171672 |   2.04
> |TPCH_Query_10|   93770   |   233391  |   2.49
> |TPCH_Query_11|   16252   |   58360   |   3.59
> |TPCH_Query_12|   142576  |   237270  |   1.66
> |TPCH_Query_13|   72682   |   343257  |   4.72
> |TPCH_Query_14|   10410   |   32337   |   3.11
> |TPCH_Query_15|   25719   |   98705   |   3.84
> |TPCH_Query_16|   21382   |   76877   |   3.60
> |TPCH_Query_17|   839683  |   2041169 |   2.43
> |TPCH_Query_18|   460570  |   1065940 |   2.31
> |TPCH_Query_19|   69075   |   82286   |   1.19
> |TPCH_Query_20|   78263   |   292041  |   3.73
> |TPCH_Query_21|   505606  |   1549690 |   3.07
> |TPCH_Query_22|   56450   |   329837  |   5.84
> |Total|   4125684 |   11254460|   2.73





[jira] [Reopened] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'

2016-08-31 Thread Chunling Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunling Wang reopened HAWQ-975:


> Queries run much slower with 'explain analyze' than which without  'explain 
> analyze'
> 
>
> Key: HAWQ-975
> URL: https://issues.apache.org/jira/browse/HAWQ-975
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Chunling Wang
>Assignee: Chunling Wang
>Priority: Critical
>  Labels: performance
> Fix For: 2.0.1.0-incubating
>
>
> When we run queries with 'explain analyze' in an AWS cluster, the total running 
> time is about 2-3 times longer than that without 'explain analyze'.
> Here is a group of TPC-H results for queries with 'explain analyze' and 
> queries without 'explain analyze'.
> ||query||without 'explain analyze'||with 'explain analyze'||multiple||
> |TPCH_Query_01|   311843  |   818658  |   2.63
> |TPCH_Query_02|   34675   |   117884  |   3.40
> |TPCH_Query_03|   166155  |   422131  |   2.54
> |TPCH_Query_04|   157807  |   507143  |   3.21
> |TPCH_Query_05|   272657  |   710573  |   2.61
> |TPCH_Query_06|   12508   |   22276   |   1.78
> |TPCH_Query_07|   71893   |   370338  |   5.15
> |TPCH_Query_08|   12  |   672625  |   5.17
> |TPCH_Query_09|   575709  |   1171672 |   2.04
> |TPCH_Query_10|   93770   |   233391  |   2.49
> |TPCH_Query_11|   16252   |   58360   |   3.59
> |TPCH_Query_12|   142576  |   237270  |   1.66
> |TPCH_Query_13|   72682   |   343257  |   4.72
> |TPCH_Query_14|   10410   |   32337   |   3.11
> |TPCH_Query_15|   25719   |   98705   |   3.84
> |TPCH_Query_16|   21382   |   76877   |   3.60
> |TPCH_Query_17|   839683  |   2041169 |   2.43
> |TPCH_Query_18|   460570  |   1065940 |   2.31
> |TPCH_Query_19|   69075   |   82286   |   1.19
> |TPCH_Query_20|   78263   |   292041  |   3.73
> |TPCH_Query_21|   505606  |   1549690 |   3.07
> |TPCH_Query_22|   56450   |   329837  |   5.84
> |Total|   4125684 |   11254460|   2.73





[jira] [Resolved] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'

2016-08-31 Thread Chunling Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunling Wang resolved HAWQ-975.

Resolution: Not A Bug

It is a system configuration issue rather than a bug in HAWQ.

> Queries run much slower with 'explain analyze' than which without  'explain 
> analyze'
> 
>
> Key: HAWQ-975
> URL: https://issues.apache.org/jira/browse/HAWQ-975
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Chunling Wang
>Assignee: Chunling Wang
>Priority: Critical
>  Labels: performance
> Fix For: 2.0.1.0-incubating
>
>
> When we run queries with 'explain analyze' in an AWS cluster, the total running 
> time is about 2-3 times longer than that without 'explain analyze'.
> Here is a group of TPC-H results for queries with 'explain analyze' and 
> queries without 'explain analyze'.
> ||query||without 'explain analyze'||with 'explain analyze'||multiple||
> |TPCH_Query_01|   311843  |   818658  |   2.63
> |TPCH_Query_02|   34675   |   117884  |   3.40
> |TPCH_Query_03|   166155  |   422131  |   2.54
> |TPCH_Query_04|   157807  |   507143  |   3.21
> |TPCH_Query_05|   272657  |   710573  |   2.61
> |TPCH_Query_06|   12508   |   22276   |   1.78
> |TPCH_Query_07|   71893   |   370338  |   5.15
> |TPCH_Query_08|   12  |   672625  |   5.17
> |TPCH_Query_09|   575709  |   1171672 |   2.04
> |TPCH_Query_10|   93770   |   233391  |   2.49
> |TPCH_Query_11|   16252   |   58360   |   3.59
> |TPCH_Query_12|   142576  |   237270  |   1.66
> |TPCH_Query_13|   72682   |   343257  |   4.72
> |TPCH_Query_14|   10410   |   32337   |   3.11
> |TPCH_Query_15|   25719   |   98705   |   3.84
> |TPCH_Query_16|   21382   |   76877   |   3.60
> |TPCH_Query_17|   839683  |   2041169 |   2.43
> |TPCH_Query_18|   460570  |   1065940 |   2.31
> |TPCH_Query_19|   69075   |   82286   |   1.19
> |TPCH_Query_20|   78263   |   292041  |   3.73
> |TPCH_Query_21|   505606  |   1549690 |   3.07
> |TPCH_Query_22|   56450   |   329837  |   5.84
> |Total|   4125684 |   11254460|   2.73





[jira] [Comment Edited] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'

2016-08-31 Thread Chunling Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451831#comment-15451831
 ] 

Chunling Wang edited comment on HAWQ-975 at 8/31/16 10:30 AM:
--

The performance of 'explain analyze' on AWS is low because the vDSO on the AWS 
agents is not properly configured and does not work well. To be specific, 
gettimeofday() takes too much time.


was (Author: wcl14):
It is because that the VDSO on agents of AWS does not work well. So the 
execution time of function 'gettimeofday()' is too much.
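The gettimeofday() overhead described above can be made concrete with a small timing loop. This is a rough sketch only: `ClockOverheadSketch` is a hypothetical name, and it measures the JVM's wall-clock read, not HAWQ's actual instrumentation path.

```java
public class ClockOverheadSketch {

    /** Returns the approximate cost in nanoseconds of one wall-clock read.
     *  On a host with a working vDSO this path is cheap; when each read falls
     *  back to a real syscall it is far more expensive, which is the kind of
     *  per-node-timing cost that inflates 'explain analyze' runtimes. */
    public static long perCallNanos(long iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            sink += System.currentTimeMillis();  // the clock read under test
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 0) {
            throw new IllegalStateException("unreachable; keeps the loop live");
        }
        return elapsed / iterations;
    }
}
```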

> Queries run much slower with 'explain analyze' than which without  'explain 
> analyze'
> 
>
> Key: HAWQ-975
> URL: https://issues.apache.org/jira/browse/HAWQ-975
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Chunling Wang
>Assignee: Chunling Wang
>Priority: Critical
>  Labels: performance
> Fix For: 2.0.1.0-incubating
>
>
> When we run queries with 'explain analyze' in an AWS cluster, the total running 
> time is about 2-3 times longer than that without 'explain analyze'.
> Here is a group of TPC-H results for queries with 'explain analyze' and 
> queries without 'explain analyze'.
> ||query||without 'explain analyze'||with 'explain analyze'||multiple||
> |TPCH_Query_01|   311843  |   818658  |   2.63
> |TPCH_Query_02|   34675   |   117884  |   3.40
> |TPCH_Query_03|   166155  |   422131  |   2.54
> |TPCH_Query_04|   157807  |   507143  |   3.21
> |TPCH_Query_05|   272657  |   710573  |   2.61
> |TPCH_Query_06|   12508   |   22276   |   1.78
> |TPCH_Query_07|   71893   |   370338  |   5.15
> |TPCH_Query_08|   12  |   672625  |   5.17
> |TPCH_Query_09|   575709  |   1171672 |   2.04
> |TPCH_Query_10|   93770   |   233391  |   2.49
> |TPCH_Query_11|   16252   |   58360   |   3.59
> |TPCH_Query_12|   142576  |   237270  |   1.66
> |TPCH_Query_13|   72682   |   343257  |   4.72
> |TPCH_Query_14|   10410   |   32337   |   3.11
> |TPCH_Query_15|   25719   |   98705   |   3.84
> |TPCH_Query_16|   21382   |   76877   |   3.60
> |TPCH_Query_17|   839683  |   2041169 |   2.43
> |TPCH_Query_18|   460570  |   1065940 |   2.31
> |TPCH_Query_19|   69075   |   82286   |   1.19
> |TPCH_Query_20|   78263   |   292041  |   3.73
> |TPCH_Query_21|   505606  |   1549690 |   3.07
> |TPCH_Query_22|   56450   |   329837  |   5.84
> |Total|   4125684 |   11254460|   2.73





[jira] [Assigned] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'

2016-08-31 Thread Chunling Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunling Wang reassigned HAWQ-975:
--

Assignee: Chunling Wang  (was: Lei Chang)

> Queries run much slower with 'explain analyze' than which without  'explain 
> analyze'
> 
>
> Key: HAWQ-975
> URL: https://issues.apache.org/jira/browse/HAWQ-975
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Chunling Wang
>Assignee: Chunling Wang
>Priority: Critical
>  Labels: performance
> Fix For: 2.0.1.0-incubating
>
>
> When we run queries with 'explain analyze' in an AWS cluster, the total running 
> time is about 2-3 times longer than that without 'explain analyze'.
> Here is a group of TPC-H results for queries with 'explain analyze' and 
> queries without 'explain analyze'.
> ||query||without 'explain analyze'||with 'explain analyze'||multiple||
> |TPCH_Query_01|   311843  |   818658  |   2.63
> |TPCH_Query_02|   34675   |   117884  |   3.40
> |TPCH_Query_03|   166155  |   422131  |   2.54
> |TPCH_Query_04|   157807  |   507143  |   3.21
> |TPCH_Query_05|   272657  |   710573  |   2.61
> |TPCH_Query_06|   12508   |   22276   |   1.78
> |TPCH_Query_07|   71893   |   370338  |   5.15
> |TPCH_Query_08|   12  |   672625  |   5.17
> |TPCH_Query_09|   575709  |   1171672 |   2.04
> |TPCH_Query_10|   93770   |   233391  |   2.49
> |TPCH_Query_11|   16252   |   58360   |   3.59
> |TPCH_Query_12|   142576  |   237270  |   1.66
> |TPCH_Query_13|   72682   |   343257  |   4.72
> |TPCH_Query_14|   10410   |   32337   |   3.11
> |TPCH_Query_15|   25719   |   98705   |   3.84
> |TPCH_Query_16|   21382   |   76877   |   3.60
> |TPCH_Query_17|   839683  |   2041169 |   2.43
> |TPCH_Query_18|   460570  |   1065940 |   2.31
> |TPCH_Query_19|   69075   |   82286   |   1.19
> |TPCH_Query_20|   78263   |   292041  |   3.73
> |TPCH_Query_21|   505606  |   1549690 |   3.07
> |TPCH_Query_22|   56450   |   329837  |   5.84
> |Total|   4125684 |   11254460|   2.73





[jira] [Commented] (HAWQ-975) Queries run much slower with 'explain analyze' than which without 'explain analyze'

2016-08-31 Thread Chunling Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451831#comment-15451831
 ] 

Chunling Wang commented on HAWQ-975:


This is because the vDSO on the AWS agents does not work well, so the function 
'gettimeofday()' takes too much time.

> Queries run much slower with 'explain analyze' than which without  'explain 
> analyze'
> 
>
> Key: HAWQ-975
> URL: https://issues.apache.org/jira/browse/HAWQ-975
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Reporter: Chunling Wang
>Assignee: Lei Chang
>Priority: Critical
>  Labels: performance
> Fix For: 2.0.1.0-incubating
>
>
> When we run queries with 'explain analyze' in an AWS cluster, the total running 
> time is about 2-3 times longer than that without 'explain analyze'.
> Here is a group of TPC-H results for queries with 'explain analyze' and 
> queries without 'explain analyze'.
> ||query||without 'explain analyze'||with 'explain analyze'||multiple||
> |TPCH_Query_01|   311843  |   818658  |   2.63
> |TPCH_Query_02|   34675   |   117884  |   3.40
> |TPCH_Query_03|   166155  |   422131  |   2.54
> |TPCH_Query_04|   157807  |   507143  |   3.21
> |TPCH_Query_05|   272657  |   710573  |   2.61
> |TPCH_Query_06|   12508   |   22276   |   1.78
> |TPCH_Query_07|   71893   |   370338  |   5.15
> |TPCH_Query_08|   12  |   672625  |   5.17
> |TPCH_Query_09|   575709  |   1171672 |   2.04
> |TPCH_Query_10|   93770   |   233391  |   2.49
> |TPCH_Query_11|   16252   |   58360   |   3.59
> |TPCH_Query_12|   142576  |   237270  |   1.66
> |TPCH_Query_13|   72682   |   343257  |   4.72
> |TPCH_Query_14|   10410   |   32337   |   3.11
> |TPCH_Query_15|   25719   |   98705   |   3.84
> |TPCH_Query_16|   21382   |   76877   |   3.60
> |TPCH_Query_17|   839683  |   2041169 |   2.43
> |TPCH_Query_18|   460570  |   1065940 |   2.31
> |TPCH_Query_19|   69075   |   82286   |   1.19
> |TPCH_Query_20|   78263   |   292041  |   3.73
> |TPCH_Query_21|   505606  |   1549690 |   3.07
> |TPCH_Query_22|   56450   |   329837  |   5.84
> |Total|   4125684 |   11254460|   2.73





[jira] [Updated] (HAWQ-1037) modify way to get HDFS port in TestHawqRegister

2016-08-31 Thread Chunling Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunling Wang updated HAWQ-1037:

Summary: modify way to get HDFS port in TestHawqRegister  (was: modify to 
get HDFS port in TestHawqRegister)

> modify way to get HDFS port in TestHawqRegister
> ---
>
> Key: HAWQ-1037
> URL: https://issues.apache.org/jira/browse/HAWQ-1037
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Tests
>Reporter: Chunling Wang
>Assignee: Chunling Wang
>
> In test TestHawqRegister, the HDFS port is hard-coded. Now we get the HDFS 
> port from HdfsConfig.





[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger

2016-08-31 Thread Hubert Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451720#comment-15451720
 ] 

Hubert Zhang commented on HAWQ-256:
---

+1 for two-stage authorization.
The HAWQ Ranger plugin (REST service) manages access privileges for HAWQ 
objects, including databases, tables, functions, languages and so on, while the 
HDFS Ranger plugin manages access privileges for HDFS files. The two do not 
conflict with each other: a user must first have the privilege to access the 
HAWQ object (checked in the planner), and then also needs the privilege to 
access the HDFS files.
Currently, HAWQ uses the admin user to create/append HDFS files, which is 
convenient for HAWQ user management.
For example, suppose user A owns table t1 and grants the select and insert 
privileges on t1 to user B. User B can directly access table t1, because on 
HDFS the files of t1 are created and accessed by admin in both cases. With 
user-identity pass-down, however, the files of t1 would be created by user A, 
and user B could not access them directly unless user B is added to user A's 
group or the file privileges are changed.
I do agree that "user-identity pass-down" is useful, especially in the Hadoop 
ecosystem, but when implementing it, pay attention to the problem I mentioned 
above. (Also, this is beyond the scope of HAWQ-256.)
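The two-stage check described above can be sketched as follows. This is a 
minimal illustration with hypothetical policy tables and function names, not 
the actual HAWQ or Ranger API:

```python
# Sketch of two-stage authorization: a query is allowed only if the user
# passes both the HAWQ object check and the HDFS file check.
# All names here are hypothetical, not real HAWQ/Ranger APIs.

# Stage 1: HAWQ-object privileges, as a Ranger-style policy table.
hawq_policies = {
    ("userB", "table:t1"): {"select", "insert"},   # granted by userA
}

# Stage 2: HDFS file privileges; today the files are owned by the admin user.
hdfs_policies = {
    ("gpadmin", "/hawq_data/t1"): {"read", "write"},
}

def authorize(user, table, hdfs_path, action, effective_hdfs_user="gpadmin"):
    """Allow only if both the HAWQ and the HDFS stage permit the action."""
    object_ok = action in hawq_policies.get((user, "table:" + table), set())
    # HDFS access is performed as the admin user, so the second stage is
    # checked against the effective HDFS identity, not the end user.
    hdfs_action = "read" if action == "select" else "write"
    file_ok = hdfs_action in hdfs_policies.get(
        (effective_hdfs_user, hdfs_path), set())
    return object_ok and file_ok

print(authorize("userB", "t1", "/hawq_data/t1", "select"))  # True
print(authorize("userC", "t1", "/hawq_data/t1", "select"))  # False: no grant
```

With user-identity pass-down, the second lookup would use the end user instead 
of gpadmin, which is exactly why user B would lose file access in the example 
above.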
 

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 





[jira] [Comment Edited] (HAWQ-256) Integrate Security with Apache Ranger

2016-08-31 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451558#comment-15451558
 ] 

Lili Ma edited comment on HAWQ-256 at 8/31/16 8:24 AM:
---

[~thebellhead], quite good questions!

1. In order for tools, syntax checking, etc. to work, everyone (the HAWQ public 
role) requires access to the catalog and some of the toolkit. Will Ranger-only 
access control apply only to user-created tables, views and external tables?
Yes, since the catalog tables and toolkits are shared by various users, 
Ranger-only access control applies just to user-defined objects. But the 
objects include not only databases, tables and views, but also functions, 
languages, schemas, tablespaces and protocols. You can find the detailed 
objects and privileges in the design doc. I have reviewed your proposal in 
HAWQ-1036; could you share how you would handle the objects which don't lie in 
the HDFS layer, such as functions, schemas, languages, etc.?

2. If so - will gpadmin and any other HAWQ-defined roles not have access to the 
data in Ranger-managed tables?
Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS: 
when a specific userA creates a table in HAWQ, the HDFS files for the table are 
created by gpadmin instead of userA. Since Ranger lies in the Hadoop ecosystem 
and usually needs to control both HAWQ and HDFS, I think we need to assign 
gpadmin full privileges on the HAWQ data file directory on HDFS in the Ranger 
UI beforehand.

About your concern that the superuser can see all users' data, I think it's 
kind of like the "root" role in an operating system? If users have concerns 
about the DBA/superuser's unlimited access, I totally agree with you that 
"passing down the user identity" is the solution for this problem :)

3. How would this be extended for the hcatalog virtual database in HAWQ? Could 
the Ranger permissions for the underlying store (for instance Hive) be read and 
enforced/reported at the HAWQ level?
If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just 
need to grant the privilege to the superuser. But if we have implemented 
user-identity pass-down - say, the data files on HDFS for a table created by 
userA are owned by userA instead of gpadmin - then we need to connect to Ranger 
twice, from HAWQ and from HDFS respectively. I haven't included the underlying 
store's privilege check on the HAWQ side; that may need multiple code changes. 
I think keeping the privileges in the component is another choice. Your 
thoughts?

Thanks
Lili



was (Author: lilima):
[~thebellhead], quite good questions!

1. In order for tools, syntax checking, etc. to work, everyone (the HAWQ public 
role) requires access to the catalog and some of the toolkit. Will Ranger-only 
access control apply only to user-created tables, views and external tables?
Yes, since the catalog tables and toolkits are shared by various users, 
Ranger-only access control applies just to user-defined objects. But the 
objects include not only databases, tables and views, but also functions, 
languages, schemas, tablespaces and protocols. You can find the detailed 
objects and privileges in the design doc.

2. If so - will gpadmin and any other HAWQ-defined roles not have access to the 
data in Ranger-managed tables?
Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS: 
when a specific userA creates a table in HAWQ, the HDFS files for the table are 
created by gpadmin instead of userA. Since Ranger lies in the Hadoop ecosystem 
and usually needs to control both HAWQ and HDFS, I think we need to assign 
gpadmin full privileges on the HAWQ data file directory on HDFS in the Ranger 
UI beforehand.

About your concern that the superuser can see all users' data, I think it's 
kind of like the "root" role in an operating system? If users have concerns 
about the DBA/superuser's unlimited access, I totally agree with you that 
"passing down the user identity" is the solution for this problem :)

3. How would this be extended for the hcatalog virtual database in HAWQ? Could 
the Ranger permissions for the underlying store (for instance Hive) be read and 
enforced/reported at the HAWQ level?
If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just 
need to grant the privilege to the superuser. But if we have implemented 
user-identity pass-down - say, the data files on HDFS for a table created by 
userA are owned by userA instead of gpadmin - then we need to connect to Ranger 
twice, from HAWQ and from HDFS respectively. I haven't included the underlying 
store's privilege check on the HAWQ side; that may need multiple code changes. 
I think keeping the privileges in the component is another choice. Your 
thoughts?

Thanks
Lili


> Integrate Security with Apache Ranger
> -
>
>  

[jira] [Assigned] (HAWQ-1003) Implement enhanced hawq ACL check through Ranger

2016-08-31 Thread Hubert Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hubert Zhang reassigned HAWQ-1003:
--

Assignee: Hubert Zhang  (was: Lei Chang)

> Implement enhanced hawq ACL check through Ranger
> 
>
> Key: HAWQ-1003
> URL: https://issues.apache.org/jira/browse/HAWQ-1003
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Lili Ma
>Assignee: Hubert Zhang
> Fix For: backlog
>
>
> Implement enhanced HAWQ ACL check through Ranger, which means that if a query 
> involves several tables, we can combine the requests for the multiple tables 
> into a single REST request to the Ranger REST API server.
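One way to batch the per-table checks into a single request can be sketched as 
follows. The payload shape and function name are hypothetical illustrations, 
not the actual Ranger REST API schema:

```python
# Illustrative only: batch the ACL checks for all tables referenced by a query
# into one request payload, instead of issuing one REST call per table.
def build_batch_acl_request(user, tables, action="select"):
    """Build a single batched access-check payload (hypothetical schema)."""
    return {
        "user": user,
        "accessRequests": [
            {"resource": {"database": db, "table": tbl}, "accessType": action}
            for db, tbl in tables
        ],
    }

req = build_batch_acl_request("userB", [("db1", "t1"), ("db1", "t2")])
print(len(req["accessRequests"]))  # 2 table checks, one REST round trip
```

The design point is latency: a query touching N tables costs one round trip to 
the Ranger service rather than N.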





[jira] [Assigned] (HAWQ-1002) Implement a switch in hawq-site.xml to configure whether use Ranger or not for ACL

2016-08-31 Thread Hubert Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hubert Zhang reassigned HAWQ-1002:
--

Assignee: Hubert Zhang  (was: Lei Chang)

> Implement a switch in hawq-site.xml to configure whether use Ranger or not 
> for ACL
> --
>
> Key: HAWQ-1002
> URL: https://issues.apache.org/jira/browse/HAWQ-1002
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Lili Ma
>Assignee: Hubert Zhang
> Fix For: backlog
>
>
> Implement a switch in hawq-site.xml to configure whether use Ranger or not 
> for ACL





[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger

2016-08-31 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451558#comment-15451558
 ] 

Lili Ma commented on HAWQ-256:
--

[~thebellhead], quite good questions!

1. In order for tools, syntax checking, etc. to work, everyone (the HAWQ public 
role) requires access to the catalog and some of the toolkit. Will Ranger-only 
access control apply only to user-created tables, views and external tables?
Yes, since the catalog tables and toolkits are shared by various users, 
Ranger-only access control applies just to user-defined objects. But the 
objects include not only databases, tables and views, but also functions, 
languages, schemas, tablespaces and protocols. You can find the detailed 
objects and privileges in the design doc.

2. If so - will gpadmin and any other HAWQ-defined roles not have access to the 
data in Ranger-managed tables?
Just as you mentioned, HAWQ uses the gpadmin identity to create files on HDFS: 
when a specific userA creates a table in HAWQ, the HDFS files for the table are 
created by gpadmin instead of userA. Since Ranger lies in the Hadoop ecosystem 
and usually needs to control both HAWQ and HDFS, I think we need to assign 
gpadmin full privileges on the HAWQ data file directory on HDFS in the Ranger 
UI beforehand.

About your concern that the superuser can see all users' data, I think it's 
kind of like the "root" role in an operating system? If users have concerns 
about the DBA/superuser's unlimited access, I totally agree with you that 
"passing down the user identity" is the solution for this problem :)

3. How would this be extended for the hcatalog virtual database in HAWQ? Could 
the Ranger permissions for the underlying store (for instance Hive) be read and 
enforced/reported at the HAWQ level?
If HAWQ keeps gpadmin for operating HDFS or external storage, I think we just 
need to grant the privilege to the superuser. But if we have implemented 
user-identity pass-down - say, the data files on HDFS for a table created by 
userA are owned by userA instead of gpadmin - then we need to connect to Ranger 
twice, from HAWQ and from HDFS respectively. I haven't included the underlying 
store's privilege check on the HAWQ side; that may need multiple code changes. 
I think keeping the privileges in the component is another choice. Your 
thoughts?

Thanks
Lili


> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF, Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 





[GitHub] incubator-hawq issue #879: HAWQ-845. Parameterize kerberos principal service...

2016-08-31 Thread ztao1987
Github user ztao1987 commented on the issue:

https://github.com/apache/incubator-hawq/pull/879
  
+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #879: HAWQ-845. Parameterize kerberos principal ...

2016-08-31 Thread linwen
GitHub user linwen opened a pull request:

https://github.com/apache/incubator-hawq/pull/879

HAWQ-845. Parameterize kerberos principal service name for HAWQ

This fix removes the check for "postgres" as the Kerberos service name when 
HAWQ is running with Kerberos enabled, so that customers can replace 
"postgres" with a different service name.
If users want to use a different name, the property below should be added to 
hawq-site.xml; otherwise "postgres" is used.

<property>
<name>krb_srvname</name>
<value>gpadmin</value>
</property>


Please review, thanks! 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/linwen/incubator-hawq hawq_845

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/879.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #879


commit 030491b896875e574a2563d77c92fbfd1503d5bf
Author: Wen Lin 
Date:   2016-08-31T06:10:51Z

HAWQ-845. Parameterize kerberos principal name for HAWQ






[jira] [Assigned] (HAWQ-1033) add --force option for hawq register

2016-08-31 Thread hongwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongwu reassigned HAWQ-1033:


Assignee: hongwu  (was: Lei Chang)

> add --force option for hawq register
> 
>
> Key: HAWQ-1033
> URL: https://issues.apache.org/jira/browse/HAWQ-1033
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> add --force option for hawq register
> Will clear all the catalog contents in pg_aoseg.pg_paqseg_$relid while 
> keeping the files on HDFS, and then re-register all the files to the table. 
> This is for the cluster disaster-recovery scenario: two clusters co-exist, 
> and data is periodically imported from Cluster A to Cluster B, where it needs 
> to be registered.





[jira] [Assigned] (HAWQ-1035) support partition table register

2016-08-31 Thread hongwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongwu reassigned HAWQ-1035:


Assignee: hongwu  (was: Lei Chang)

> support partition table register
> 
>
> Key: HAWQ-1035
> URL: https://issues.apache.org/jira/browse/HAWQ-1035
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> Support partition table register, limited to 1-level partition tables, since 
> hawq extract only supports 1-level partition tables





[jira] [Assigned] (HAWQ-1034) add --repair option for hawq register

2016-08-31 Thread hongwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongwu reassigned HAWQ-1034:


Assignee: hongwu  (was: Lei Chang)

> add --repair option for hawq register
> -
>
> Key: HAWQ-1034
> URL: https://issues.apache.org/jira/browse/HAWQ-1034
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> add --repair option for hawq register
> Will change both the file folder and the catalog table 
> pg_aoseg.pg_paqseg_$relid to the state that the .yml file describes. Note 
> that files newly generated since the checkpoint may be deleted here. Also 
> note that all the files listed in the .yml file should be under the table 
> folder on HDFS. Limitation: does not support cases involving hash table 
> redistribution, table truncate, or table drop. This is for the table 
> rollback scenario: checkpoints are taken at some point, and the table needs 
> to be rolled back to a previous checkpoint.





[jira] [Commented] (HAWQ-845) Parameterize kerberos principal name for HAWQ

2016-08-31 Thread Lin Wen (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451291#comment-15451291
 ] 

Lin Wen commented on HAWQ-845:
--

For now I think we can keep 'postgres' as the default Kerberos service name, 
but customers should be able to parameterize it with another name.
If users want to use a different name, the property below should be added to 
hawq-site.xml:

<property>
<name>krb_srvname</name>
<value>gpadmin</value>
</property>



> Parameterize kerberos principal name for HAWQ
> -
>
> Key: HAWQ-845
> URL: https://issues.apache.org/jira/browse/HAWQ-845
> Project: Apache HAWQ
>  Issue Type: Improvement
>Reporter: bhuvnesh chaudhary
>Assignee: Lei Chang
>Priority: Minor
> Fix For: backlog
>
>
> Currently HAWQ only accepts the principal 'postgres' for Kerberos settings, 
> because it is hardcoded in gpcheckhdfs; we should ensure that it can be 
> parameterized.
> Also, it is better to change the default principal name from postgres to 
> gpadmin. That will avoid the need to change the HDFS directory to postgres 
> while securing the cluster, and will avoid the need to maintain a postgres 
> user.





[jira] [Assigned] (HAWQ-1025) Modify the content of yml file, and change hawq register implementation for the modification

2016-08-31 Thread hongwu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongwu reassigned HAWQ-1025:


Assignee: hongwu  (was: Lili Ma)

> Modify the content of yml file, and change hawq register implementation for 
> the modification
> 
>
> Key: HAWQ-1025
> URL: https://issues.apache.org/jira/browse/HAWQ-1025
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> 1. Add the bucket number for hash-distributed tables in the yml file; when 
> running hawq register, ensure the number of files is a multiple of the 
> bucket number
> 2. hawq register should use the file size information in the yml file to 
> update the catalog table pg_aoseg.pg_paqseg_$relid
> 3. hawq register processing steps:
>a. create the table
>b. move all the files
>c. change the catalog table once.
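The bucket-number constraint in step 1 can be illustrated with a small check. 
The helper name is hypothetical, not part of the actual hawq register code:

```python
def check_hash_table_files(num_files, bucket_number):
    """For a hash-distributed table, hawq register requires the number of
    data files to be a multiple of the table's bucket number.
    Hypothetical illustration of the rule, not actual hawq register code."""
    if bucket_number <= 0:
        raise ValueError("bucket number must be positive")
    return num_files % bucket_number == 0

print(check_hash_table_files(16, 8))  # True: 16 files over 8 buckets
print(check_hash_table_files(12, 8))  # False: not a multiple, register fails
```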





[GitHub] incubator-hawq pull request #878: HAWQ-1025.

2016-08-31 Thread xunzhang
GitHub user xunzhang opened a pull request:

https://github.com/apache/incubator-hawq/pull/878

HAWQ-1025. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xunzhang/incubator-hawq HAWQ-1025

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/878.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #878


commit b060a365c8d42e22afaaffca78affef7f4cd556f
Author: xunzhang 
Date:   2016-08-30T08:03:42Z

HAWQ-1025. Add bucket number in the yaml file of hawq extract, modify to 
use actual eof for usage1.



