[GitHub] incubator-hawq pull request: HAWQ-703. Serialize HCatalog Complex ...

2016-05-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/633#discussion_r61827025
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/EnumHawqType.java 
---
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hawq.pxf.api.utilities;
+
+import java.io.IOException;
+import org.codehaus.jackson.JsonGenerator;
+import org.codehaus.jackson.map.JsonSerializer;
+import org.codehaus.jackson.map.annotate.JsonSerialize;
+import org.codehaus.jackson.map.SerializerProvider;
+import org.codehaus.jackson.JsonProcessingException;
+
+class EnumHawqTypeSerializer extends JsonSerializer {
+
+@Override
+public void serialize(EnumHawqType value, JsonGenerator generator,
+  SerializerProvider provider) throws IOException,
+  JsonProcessingException {
+  generator.writeString(value.getTypeName());
--- End diff --

is it planned to add the modifiers to the serialized data? if so, maybe add 
a TODO here?




[GitHub] incubator-hawq pull request: HAWQ-703. Serialize HCatalog Complex ...

2016-05-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/633#discussion_r61826885
  
--- Diff: src/bin/psql/describe.c ---
@@ -4263,8 +4263,13 @@ describePxfTable(const char *profile, const char 
*pattern, bool verbose)
printQueryOpt myopt = pset.popt;
printTableContent cont;
int cols = 0;
+   if (verbose)
+   {
--- End diff --

remove the braces here or add them to the else block
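
For example (a minimal sketch of the two consistent styles - the function
calls are placeholders, not the actual PR code):

    if (verbose)
        setup_verbose_columns();    /* no braces on single-statement branches */
    else
        setup_default_columns();

    /* or */
    if (verbose)
    {
        setup_verbose_columns();    /* braces on both branches */
    }
    else
    {
        setup_default_columns();
    }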




[GitHub] incubator-hawq pull request: HAWQ-703. Serialize HCatalog Complex ...

2016-05-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/633#discussion_r61819180
  
--- Diff: 
pxf/pxf-service/src/test/java/org/apache/hawq/pxf/service/MetadataResponseFormatterTest.java
 ---
@@ -85,26 +86,45 @@ public void formatResponseStringWithModifiers() throws 
Exception {
 List fields = new ArrayList();
 Metadata.Item itemName = new Metadata.Item("default", "table1");
 Metadata metadata = new Metadata(itemName, fields);
-fields.add(new Metadata.Field("field1", "int"));
-fields.add(new Metadata.Field("field2", "numeric",
+fields.add(new Metadata.Field("field1", EnumHawqType.Int8Type, 
"bigint"));
+fields.add(new Metadata.Field("field2", EnumHawqType.NumericType, 
"decimal",
 new String[] {"1349", "1789"}));
-fields.add(new Metadata.Field("field3", "char",
+fields.add(new Metadata.Field("field3", EnumHawqType.BpcharType, 
"char",
 new String[] {"50"}));
 metadataList.add(metadata);
 
 response = MetadataResponseFormatter.formatResponse(metadataList, 
"path.file");
 StringBuilder expected = new StringBuilder("{\"PXFMetadata\":[{");
 
expected.append("\"item\":{\"path\":\"default\",\"name\":\"table1\"},")
 .append("\"fields\":[")
-.append("{\"name\":\"field1\",\"type\":\"int\"},")
-
.append("{\"name\":\"field2\",\"type\":\"numeric\",\"modifiers\":[\"1349\",\"1789\"]},")
-
.append("{\"name\":\"field3\",\"type\":\"char\",\"modifiers\":[\"50\"]}")
+
.append("{\"name\":\"field1\",\"type\":\"int8\",\"sourceType\":\"bigint\"},")
+
.append("{\"name\":\"field2\",\"type\":\"numeric\",\"sourceType\":\"decimal\",\"modifiers\":[\"1349\",\"1789\"]},")
+
.append("{\"name\":\"field3\",\"type\":\"bpchar\",\"sourceType\":\"char\",\"modifiers\":[\"50\"]}")
 .append("]}]}");
 
 assertEquals(expected.toString(), 
convertResponseToString(response));
 }
 
 @Test
+public void formatResponseStringWithSourceType() throws Exception {
+List metadataList = new ArrayList();
+List fields = new ArrayList();
+Metadata.Item itemName = new Metadata.Item("default", "table1");
+Metadata metadata = new Metadata(itemName, fields);
+fields.add(new Metadata.Field("field1", EnumHawqType.Float8Type, 
"double"));
+metadataList.add(metadata);
+
+response = MetadataResponseFormatter.formatResponse(metadataList, 
"path.file");
+StringBuilder expected = new StringBuilder("{\"PXFMetadata\":[{");
+
expected.append("\"item\":{\"path\":\"default\",\"name\":\"table1\"},")
+.append("\"fields\":[")
+
.append("{\"name\":\"field1\",\"type\":\"float8\",\"sourceType\":\"double\"}")
+.append("]}]}");
+
+//assertEquals(expected.toString(), 
convertResponseToString(response));
--- End diff --

uncomment?




[GitHub] incubator-hawq pull request: HAWQ-703. Serialize HCatalog Complex ...

2016-05-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/633#discussion_r61818951
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -186,7 +155,7 @@ public static Table getHiveTable(HiveMetaStoreClient 
client, Metadata.Item itemN
  * @param modifiers type modifiers to be verified
  * @return whether modifiers are null or integers
  */
-private static boolean verifyModifers(String[] modifiers) {
+private static boolean verifyIntegerModifers(String[] modifiers) {
--- End diff --

typo: should be `Modifiers`




[GitHub] incubator-hawq pull request: HAWQ-703. Serialize HCatalog Complex ...

2016-05-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/633#discussion_r61818778
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -88,94 +90,61 @@ public static Table getHiveTable(HiveMetaStoreClient 
client, Metadata.Item itemN
  * Unsupported types will result in an exception.
  * 
  * The supported mappings are:
- * {@code tinyint -> int2}
- * {@code smallint -> int2}
- * {@code int -> int4}
- * {@code bigint -> int8}
- * {@code boolean -> bool}
- * {@code float -> float4}
- * {@code double -> float8}
- * {@code string -> text}
- * {@code binary -> bytea}
- * {@code timestamp -> timestamp}
- * {@code date -> date}
- * {@code decimal(precision, scale) -> numeric(precision, 
scale)}
- * {@code varchar(size) -> varchar(size)}
- * {@code char(size) -> bpchar(size)}
+ * {@code tinyint -> int2}
--- End diff --

indentation?




[GitHub] incubator-hawq pull request: HAWQ-703. Serialize HCatalog Complex ...

2016-05-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/633#discussion_r61818748
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -20,9 +20,12 @@
  */
 
 
+import java.util.Arrays;
 import java.util.List;
 import java.util.ArrayList;
 
+import org.apache.hawq.pxf.api.utilities.EnumHawqType;
--- End diff --

please put these imports together with the other PXF imports below.




[GitHub] incubator-hawq pull request: HAWQ-703. Serialize HCatalog Complex ...

2016-05-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/633#discussion_r61818073
  
--- Diff: pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/Metadata.java 
---
@@ -67,36 +68,43 @@ public String toString() {
 }
 
 /**
- * Class representing item field - name and type.
+ * Class representing item field - name, type, source type, modifiers.
+ * Type - exposed type of field
+ * Source type - type of field in underlying source
+ * Modifiers - additional attributes which describe type or field
  */
 public static class Field {
 private String name;
-private String type; // TODO: change to enum
+private EnumHawqType type; // field type which PXF exposes
+private String sourceType; // filed type PXF reads from
 private String[] modifiers; // type modifiers, optional field
 
-public Field(String name, String type) {
-
-if (StringUtils.isBlank(name) || StringUtils.isBlank(type)) {
-throw new IllegalArgumentException("Field name and type 
cannot be empty");
-}
-
-this.name = name;
-this.type = type;
+public Field(String name, EnumHawqType type, String sourceType) {
+if (StringUtils.isBlank(name) || 
StringUtils.isBlank(type.getTypeName())
--- End diff --

type could be null, right? maybe just check that it's not null - it 
probably can't be an empty string because it's an enum.
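
For example (just a sketch based on the diff above - the sourceType check is
an assumption):

    public Field(String name, EnumHawqType type, String sourceType) {
        // type is an enum, so a non-null value can't have a blank name -
        // checking for null is enough
        if (StringUtils.isBlank(name) || type == null
                || StringUtils.isBlank(sourceType)) {
            throw new IllegalArgumentException("Field name, type and source type cannot be empty");
        }
        this.name = name;
        this.type = type;
        this.sourceType = sourceType;
    }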




[GitHub] incubator-hawq pull request: Fix Orca error message: Orca supports...

2016-04-26 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/630#discussion_r61138397
  
--- Diff: src/backend/gpopt/translate/CTranslatorUtils.cpp ---
@@ -1583,9 +1585,9 @@ CTranslatorUtils::PdrgpbsGroupBy
return PdrgpbsRollup(pmp, pgrcl, ulCols, phmululGrpColPos, 
pbsGrpCols);
}
 
-   if (GROUPINGTYPE_GROUPING_SETS != pgrcl->groupType)
+   if (GROUPINGTYPE_CUBE == pgrcl->groupType)
{
-   GPOS_RAISE(gpdxl::ExmaDXL, 
gpdxl::ExmiQuery2DXLUnsupportedFeature, GPOS_WSZ_LIT("Rollup and cube"));
+   GPOS_RAISE(gpdxl::ExmaDXL, 
gpdxl::ExmiQuery2DXLUnsupportedFeature, GPOS_WSZ_LIT(“Cube”));
--- End diff --

the quote marks look off. should be `"` and not `“`




[GitHub] incubator-hawq pull request: HAWQ-683. Fix param name for Protocol...

2016-04-19 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/625#discussion_r60321148
  
--- Diff: pxf/README.md ---
@@ -20,7 +20,7 @@ Package Contents
 
 
 PXF is distributed as a set of RPMs -
-
+git dif
--- End diff --

dif?




[GitHub] incubator-hawq pull request: HAWQ-615. Handle incomptible tables w...

2016-04-04 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/551#discussion_r58473430
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveMetadataFetcher.java
 ---
@@ -62,11 +74,24 @@ public HiveMetadataFetcher(InputData md) {
 
 List metadataList = new ArrayList();
 
+if(tblsDesc.size() > 1) {
--- End diff --

What if there are 2 tables but neither is supported? In that case the
function will return 0 tables. Is that an acceptable scenario? If so, perhaps
add a test to cover it?

Did you consider changing the condition to require at least one 'good' table -
check at the end of the loop whether metadataList is empty, and if it is,
throw an exception?
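
Something like this sketch (the helper and exception type are illustrative,
not the actual code):

    for (Metadata.Item tblDesc : tblsDesc) {
        try {
            metadataList.add(getMetadata(tblDesc));  // hypothetical helper
        } catch (UnsupportedTypeException e) {
            LOG.warn("Skipping unsupported table " + tblDesc, e);
        }
    }
    // fail only if no table at all could be processed
    if (metadataList.isEmpty()) {
        throw new IllegalArgumentException("no supported tables found for pattern");
    }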




[GitHub] incubator-hawq pull request: HAWQ-615. Handle incomptible tables w...

2016-04-04 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/551#discussion_r58473077
  
--- Diff: 
pxf/pxf-hive/src/test/java/org/apache/hawq/pxf/plugins/hive/HiveMetadataFetcherTest.java
 ---
@@ -154,6 +155,115 @@ public void getTableMetadata() throws Exception {
 assertEquals("int4", field.getType());
 }
 
+@Test
+public void getTableMetadataWithMultipleTables() throws Exception {
+prepareConstruction();
+
+fetcher = new HiveMetadataFetcher(inputData);
+
+String tablepattern = "*";
+String dbpattern = "*";
+String dbname = "default";
+String tablenamebase = "regulartable";
+String pattern = dbpattern + "." + tablepattern;
+
+List dbNames = new 
ArrayList(Arrays.asList(dbname));
+List tableNames = new ArrayList();
+
+// Prepare for tables
+List fields = new ArrayList();
+fields.add(new FieldSchema("field1", "string", null));
+fields.add(new FieldSchema("field2", "int", null));
+StorageDescriptor sd = new StorageDescriptor();
+sd.setCols(fields);
+
+// Mock hive tables returned from hive client
+for(int index=1;index<=2;index++) {
+String tableName = tablenamebase + index;
+tableNames.add(tableName);;
--- End diff --

two ;




[GitHub] incubator-hawq pull request: HAWQ-601. Clear external table code.

2016-03-28 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/525#discussion_r57665601
  
--- Diff: src/backend/cdb/cdbdatalocality.c ---
@@ -768,12 +769,23 @@ static void check_keep_hash_and_external_table(
 * whose default value is set to default_segment_num
 */
ExtTableEntry* extEnrty = GetExtTableEntry(rel->rd_id);
-   if(extEnrty->command){
-   // command external table case
-   if (context->externTableOnClauseSegNum == 0) {
-   context->externTableOnClauseSegNum = 
targetPolicy->bucketnum;
+
+   bool isPxf = false;
+   if(!extEnrty->command && extEnrty->locations){
--- End diff --

missing spaces after `if` and before `{`




[GitHub] incubator-hawq pull request: HAWQ-577. Updated PXF metadata api to...

2016-03-28 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/522#issuecomment-202643273
  
+1. looks good




[GitHub] incubator-hawq pull request: HAWQ-577. Updated PXF metadata api to...

2016-03-28 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/522#discussion_r57654981
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/rest/MetadataResource.java
 ---
@@ -41,6 +41,7 @@
 import org.apache.hawq.pxf.api.MetadataFetcher;
 import org.apache.hawq.pxf.api.utilities.InputData;
 import org.apache.hawq.pxf.service.MetadataFetcherFactory;
+import org.apache.hawq.pxf.service.MetadataResponse;
--- End diff --

is this a new file? 




[GitHub] incubator-hawq pull request: HAWQ-577. Updated PXF metadata api to...

2016-03-28 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/522#discussion_r57654541
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/MetadataResponseFormatter.java
 ---
@@ -46,76 +45,50 @@
  * @return JSON formatted response
  * @throws IOException if converting the data to JSON fails
  */
-public static String formatResponseString(List metadataList) 
throws IOException {
-return MetadataResponseFormatter.metadataToJSON(metadataList);
-}
-
-/**
- * Serializes a metadata in JSON,
- * To be used as the result string for HAWQ.
- * An example result is as follows:
- *
- * 
{"PXFMetadata":[{"item":{"path":"default","name":"t1"},"fields":[{"name":"a","type":"int"},{"name":"b","type":"float"}]}]}
- */
-private static String metadataToJSON(List metadataList) 
throws IOException {
-
-if (metadataList == null || metadataList.isEmpty()) {
-   return METADATA_DEFAULT_RESPONSE;
+public static MetadataResponse formatResponse(List 
metadataList, String path) throws IOException {
+/* print the fragment list to log when in debug level */
+if (LOG.isDebugEnabled()) {
+MetadataResponseFormatter.printMetadata(metadataList, path);
 }
 
-StringBuilder result = null;
-
-for(Metadata metadata: metadataList) {
-if(metadata == null) {
-throw new IllegalArgumentException("metadata object is 
null - cannot serialize");
-}
-if ((metadata.getFields() == null) || 
metadata.getFields().isEmpty()) {
-throw new IllegalArgumentException("metadata for " + 
metadata.getItem() + " contains no fields - cannot serialize");
-}
-if (result == null) {
-result = new StringBuilder("{\"PXFMetadata\":["); /* 
prefix info */
-} else {
-result.append(",");
-}
-
-ObjectMapper mapper = new ObjectMapper();
-mapper.setSerializationInclusion(Inclusion.NON_EMPTY); // 
ignore empty fields
-result.append(mapper.writeValueAsString(metadata));
-}
-
-return result.append("]}").toString(); /* append suffix info */
-
+return new MetadataResponse(metadataList);
 }
 
 /**
  * Converts metadata list to a readable string.
  * Intended for debugging purposes only.
  */
-private static String metadataToString(List metadataList) {
-StringBuilder result = new StringBuilder("Metadata:");
+private static void printMetadata(List metadataList, String 
path) {
+LOG.debug("Metadata List for path " + path + ": ");
+
+if (null == metadataList || metadataList.isEmpty()) {
+LOG.debug("No metadata");
+return;
+}
 
 for(Metadata metadata: metadataList) {
-result.append(" Metadata for item \"");
+StringBuilder result = new StringBuilder();
+result.append("Metadata for item \"");
--- End diff --

the quotes are not closed for the case of null metadata.
perhaps move `\"` to be part of line 79.
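
For example (a sketch - the handling of the item is an assumption, since
line 79 isn't shown here):

    // open and close the quote in the same append as the item,
    // so nothing is left unclosed when there is no item to print
    result.append("Metadata for item \"").append(metadata.getItem()).append("\"");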




[GitHub] incubator-hawq pull request: HAWQ-462. Dispatch dfs_address from m...

2016-03-24 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/503#issuecomment-201108761
  
I really like this solution - it looks really clean. 
What happens in the non-secured case - do we always pass this var
regardless?




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/497#issuecomment-200624680
  
Not sure I understood everything, so correct me if I am wrong.
Right now we have two flows:
1. For readable tables we append dfs_address in hd_work_mgr and parse it in
pxfuriparser.
2. For writable tables we append dfs_address as part of the URI that is
part of the external table context. Then we parse it in gpbridgeapi, but only
in the export case (writable).
My question is - shouldn't we consolidate the two solutions and adopt the
2nd flow for both readable and writable?





[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57267606
  
--- Diff: src/bin/gpfusion/gpbridgeapi.c ---
@@ -273,6 +277,26 @@ void gpbridge_export_start(PG_FUNCTION_ARGS)
 
 }
 
+void parse_namespace(char *uri, char** extracted_namespace){
+   int i = 0;
+   int len = strlen(uri);
+
+   while(uri[i] != '*' && i < len){
--- End diff --

Actually, shouldn't we parse the URI as part of the parse function we 
already have? that way we can know that we parsed everything else before we 
expect to read the delimiter and the dfs_address - similar to the way it's 
implemented right now, when we parse dfs_address that was appended by 
hd_work_mgr.




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57267537
  
--- Diff: src/backend/cdb/cdbquerycontextdispatching.c ---
@@ -616,6 +632,37 @@ RebuildFilesystemCredentials(QueryContextInfo *cxt, 
HTAB ** currentFilesystemCre
pfree(binary);
 }
 
+
+/*
+ * deserialize the namespace data
+ */
+
+static void
+RebuildNamespace(QueryContextInfo *cxt)
+{
+
+   int len;
+   char buffer[4], *binary;
+   ReadData(cxt, buffer, sizeof(buffer), TRUE);
+
+   len = (int) ntohl(*(uint32 *) buffer);
+   binary = palloc(len);
+   if(ReadData(cxt, binary, len, TRUE))
+   {
+   StringInfoData buffer;
+   initStringInfoOfString(, binary, len);
+
+   cxt->sharedPath = palloc(buffer.len + 1);
+   while(buffer.cursor < buffer.len) {
--- End diff --

missing space after `while`.
Also I think the palloc + loop can be replaced by a copy function, `pnstrdup`:
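
A sketch of the suggested change, reusing binary and len from the diff above
(pnstrdup copies len bytes and adds the terminating null):

    if (ReadData(cxt, binary, len, TRUE))
    {
        cxt->sharedPath = pnstrdup(binary, len);
    }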




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57267450
  
--- Diff: src/backend/access/external/hd_work_mgr.c ---
@@ -768,6 +769,16 @@ make_allocation_output_string(List *segment_fragments)
initStringInfo();
appendStringInfoString(, SEGWORK_PREFIX);

+   if (enable_secure_filesystem)
--- End diff --

if we append the dfs_address through the external framework, why do we need 
to append it here as well? wouldn't it work for both readable and writable 
tables?





[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57267369
  
--- Diff: src/backend/access/external/fileam.c ---
@@ -681,6 +687,8 @@ external_insert(ExternalInsertDesc extInsertDesc, 
HeapTuple instup)
 
fcinfo.arg[0] = HeapTupleGetDatum(instup);
fcinfo.argnull[0] = false;
+   fcinfo.arg[1] = "test";
--- End diff --

what does this mean?




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57267295
  
--- Diff: src/include/access/fileam.h ---
@@ -44,12 +44,13 @@
  * ExternalInsertDescData is used for storing state related
  * to inserting data into a writable external table.
  */
-typedef struct ExternalInsertDescData
+typedef struct ExternalInsertDescDat
 {
Relationext_rel;
FILE*   ext_file;
char*   ext_uri;/* "command:" or 
"tablespace:" */
boolext_noop;   /* no op. this segdb 
needs to do nothing (e.g. mirror seg) */
+   char*   ext_namespace;  /* Namespace data */
--- End diff --

indentation looks off




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57267242
  
--- Diff: src/bin/gpfusion/gpbridgeapi.c ---
@@ -508,17 +532,16 @@ size_t fill_buffer(gphadoop_context* context, char* 
start, size_t size)
 void add_delegation_token(PxfInputData *inputData)
 {
PxfHdfsTokenData *token = NULL;
-   char* dfs_address = NULL;
+   char *dfs_address = inputData->gphduri->dfs_address;
 
if (!enable_secure_filesystem)
return;
 
token = palloc0(sizeof(PxfHdfsTokenData));
 
-   get_hdfs_location_from_filespace(_address);
-
-elog(DEBUG2, "locating token for %s", dfs_address);
+// get_hdfs_location_from_filespace(_address);
--- End diff --

remove :)




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57267229
  
--- Diff: src/bin/gpfusion/gpbridgeapi.c ---
@@ -273,6 +277,26 @@ void gpbridge_export_start(PG_FUNCTION_ARGS)
 
 }
 
+void parse_namespace(char *uri, char** extracted_namespace){
+   int i = 0;
+   int len = strlen(uri);
+
+   while(uri[i] != '*' && i < len){
+   i++;
+   }
+
+   uri[i++] = '\0';
+   if(i < len){
+   *extracted_namespace = palloc(len - i + 1);
+   int extracted_namespace_len = len - i;
+   int j = 0;
+   while(i < len && j < extracted_namespace_len){
--- End diff --

I think this loop can be replaced by `pnstrdup`
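
For example (a sketch, assuming i points just past the delimiter as in the
diff above):

    if (i < len)
        *extracted_namespace = pnstrdup(uri + i, len - i);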




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57267124
  
--- Diff: src/bin/gpfusion/gpbridgeapi.c ---
@@ -273,6 +277,26 @@ void gpbridge_export_start(PG_FUNCTION_ARGS)
 
 }
 
+void parse_namespace(char *uri, char** extracted_namespace){
+   int i = 0;
+   int len = strlen(uri);
+
+   while(uri[i] != '*' && i < len){
--- End diff --

`*` can be part of the PXF location (URL), because we support wildcard
matching for file names (e.g. `file*.txt`).
If this is going to be a permanent part of the URL (not just for a secured
env), perhaps we can scan from the end?
Another option is to replace the delimiter.
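
A sketch of the scan-from-the-end option (keeping `*` as the delimiter just
for illustration):

    /* last occurrence, so earlier wildcards in the location are ignored */
    char *delim = strrchr(uri, '*');
    if (delim != NULL)
    {
        *extracted_namespace = pstrdup(delim + 1);
        *delim = '\0';  /* truncate the URI at the delimiter */
    }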




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57266957
  
--- Diff: src/backend/gpopt/ivy.xml ---
@@ -38,7 +38,7 @@ under the License.
 
 
 
-  
+  
--- End diff --

relevant to this commit?




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57266801
  
--- Diff: src/backend/cdb/cdbquerycontextdispatching.c ---
@@ -616,6 +632,37 @@ RebuildFilesystemCredentials(QueryContextInfo *cxt, 
HTAB ** currentFilesystemCre
pfree(binary);
 }
 
+
+/*
+ * deserialize the namespace data
+ */
+
--- End diff --

remove the extra new line




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57266715
  
--- Diff: src/backend/cdb/cdbquerycontextdispatching.c ---
@@ -313,6 +313,22 @@ FinalizeQueryContextInfo(QueryContextInfo *cxt)
WriteData(cxt, buffer.data, buffer.len);
WriteData(cxt, credential, size);
 }
+
+}
+
+//Add Namespace for PXF
+if(Gp_role != GP_ROLE_EXECUTE){
+   const char *namespace = cxt->sharedPath;
+   int size = strlen(namespace);
--- End diff --

indentation looks off...




[GitHub] incubator-hawq pull request: Writable security fix

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/497#discussion_r57266695
  
--- Diff: src/backend/cdb/cdbquerycontextdispatching.c ---
@@ -313,6 +313,22 @@ FinalizeQueryContextInfo(QueryContextInfo *cxt)
WriteData(cxt, buffer.data, buffer.len);
WriteData(cxt, credential, size);
 }
+
+}
+
+//Add Namespace for PXF
--- End diff --

please change the comment to the standard C style `/* */`




[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r57214858
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -203,19 +205,37 @@ private static boolean verifyModifers(String[] 
modifiers) {
  * It can be either table_name or 
db_name.table_name.
  *
  * @param qualifiedName Hive table name
- * @return {@link org.apache.hawq.pxf.api.Metadata.Table} object 
holding the full table name
+ * @return {@link Metadata.Item} object holding the full table name
  */
-public static Metadata.Table parseTableQualifiedName(String 
qualifiedName) {
+public static Metadata.Item extractTableFromName(String qualifiedName) 
{
+List items = extractTablesFromPattern(null, 
qualifiedName);
+if(items.isEmpty()) {
+throw new IllegalArgumentException("No tables found");
--- End diff --

We have code in HAWQ (in libchurl) that already handles exceptions from
PXF. It could be improved of course, but we cannot rely solely on JSON because
some exceptions are thrown directly from Tomcat, or are the result of a
connectivity issue.




[GitHub] incubator-hawq pull request: HAWQ-465. Implement stored procedure ...

2016-03-22 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/479#issuecomment-200136727
  
@sansanichfb Thank you for the link to the design doc - it made things much
clearer.
I think the examples in the JIRA need to be updated, though - the pattern
and path are combined.
Great job, it looks really good!

@shivzone Good point regarding the API changes. We didn't take releases
into account until now when changing the protocol version.




[GitHub] incubator-hawq pull request: HAWQ-465. Implement stored procedure ...

2016-03-22 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/479#discussion_r57101750
  
--- Diff: src/backend/utils/adt/pxf_functions.c ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "catalog/external/externalmd.h"
+#include "postgres.h"
+#include "fmgr.h"
+#include "funcapi.h"
+#include "utils/builtins.h"
+
+
+typedef struct ObjectContext
+{
+   ListCell *current_object;
+   ListCell *current_field;
+} ObjectContext;
+
+ListCell* pxf_object_fields_enum_start(text *profile, text *pattern);
+ObjectContext* pxf_object_fields_enum_next(ObjectContext *object_context);
+void pxf_object_fields_enum_end(void);
+
+ListCell*
+pxf_object_fields_enum_start(text *profile, text *pattern)
+{
+   List *objects = NIL;
+
+   char *profile_cstr = text_to_cstring(profile);
+   char *pattern_cstr = text_to_cstring(pattern);
+
+   objects = get_pxf_object_metadata(profile_cstr, pattern_cstr, NULL);
+
+   return list_head(objects);
+}
+
+ObjectContext*
+pxf_object_fields_enum_next(ObjectContext *object_context)
+{
+
+   //first time call
+   if (object_context->current_object && !object_context->current_field)
+   object_context->current_field = list_head(((PxfItem *) 
lfirst(object_context->current_object))->fields);
+
+   //next field for the same object
+   else if lnext(object_context->current_field)
+   object_context->current_field = 
lnext(object_context->current_field);
+   //next table
+   else if lnext(object_context->current_object)
+   {
+   object_context->current_object = 
lnext(object_context->current_object);
+   object_context->current_field = list_head(((PxfItem *) 
lfirst(object_context->current_object))->fields);
+
+   //no objects, no fields left
+   } else
+   object_context = NULL;
+
+   return object_context;
+}
+
+void pxf_object_fields_enum_end(void)
+{
+   //cleanup
+}
+
+Datum pxf_get_object_fields(PG_FUNCTION_ARGS)
+{
+   MemoryContext oldcontext;
+   FuncCallContext *funcctx;
+   HeapTuple tuple;
+   Datum result;
+   Datum values[4];
+   bool nulls[4];
+
+   ObjectContext *object_context;
+
+   text *profile = PG_GETARG_TEXT_P(0);
+   text *pattern = PG_GETARG_TEXT_P(1);
+
+   /* stuff done only on the first call of the function */
+   if (SRF_IS_FIRSTCALL())
+   {
+   TupleDesc tupdesc;
+
+   /* create a function context for cross-call persistence */
+   funcctx = SRF_FIRSTCALL_INIT();
+
+   /*
+* switch to memory context appropriate for multiple function 
calls
+*/
+   oldcontext = 
MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+   /* initialize object fileds metadata scanning code */
+   object_context = (ObjectContext *) 
palloc0(sizeof(ObjectContext));
+   object_context->current_object = 
pxf_object_fields_enum_start(profile, pattern);
+   funcctx->user_fctx = (void *) object_context;
+
+   /*
+* build tupdesc for result tuples. This must match this 
function's
+* pg_proc entry!
+*/
+   tupdesc = CreateTemplateTupleDesc(4, false);
+   TupleDescInitEntry(tupdesc, (AttrNumber) 1, "path",
+   TEXTOID, -1, 0);
+   TupleDescInitEntry(tupdesc, (AttrNumber) 2, "objectname",
+   TEXTOID, -1, 0);
+   TupleDescInitEntry(tupdesc, (AttrNumber) 3, "columnname",
+   TEXTOID, -1, 0);
+   TupleDescInitEntry(tupdesc, (AttrNumber) 4, "columntype",
 

[GitHub] incubator-hawq pull request: HAWQ-465. Implement stored procedure ...

2016-03-22 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/479#discussion_r57101396
  
--- Diff: src/include/catalog/namespace.h ---
@@ -95,4 +95,6 @@ extern char *namespace_search_path;
 
 extern List *fetch_search_path(bool includeImplicit);
 
+#define HiveProfileName "Hive"
--- End diff --

then I think it should be part of namespace.c - the header file is used by 
other files, so if it's not useful for them it shouldn't be there.




[GitHub] incubator-hawq pull request: HAWQ-465. Implement stored procedure ...

2016-03-22 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/479#discussion_r57044806
  
--- Diff: src/test/unit/mock/backend/utils/adt/pxf_functions_mock.c ---
@@ -0,0 +1,58 @@
+/*
--- End diff --

I don't think auto-generated files should be committed.




[GitHub] incubator-hawq pull request: HAWQ-465. Implement stored procedure ...

2016-03-22 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/479#discussion_r57044714
  
--- Diff: src/test/regress/output/json_load.source ---
@@ -184,7 +184,7 @@ END TRANSACTION;
 -- negative test: duplicated tables
 BEGIN TRANSACTION;
 SELECT 
load_json_data('@abs_builddir@/data/hcatalog/multi_table_duplicates.json');
-ERROR:  relation "hcatalog.db.t" already exists
+ERROR:  relation "db.t" already exists in namespace with oid=4284481535
--- End diff --

is this oid fixed?




[GitHub] incubator-hawq pull request: HAWQ-465. Implement stored procedure ...

2016-03-22 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/479#discussion_r57044521
  
--- Diff: src/backend/utils/adt/pxf_functions.c ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "catalog/external/externalmd.h"
+#include "postgres.h"
+#include "fmgr.h"
+#include "funcapi.h"
+#include "utils/builtins.h"
+
+
+typedef struct ObjectContext
+{
+   ListCell *current_object;
+   ListCell *current_field;
+} ObjectContext;
+
+ListCell* pxf_object_fields_enum_start(text *profile, text *pattern);
+ObjectContext* pxf_object_fields_enum_next(ObjectContext *object_context);
+void pxf_object_fields_enum_end(void);
+
+ListCell*
+pxf_object_fields_enum_start(text *profile, text *pattern)
+{
+   List *objects = NIL;
+
+   char *profile_cstr = text_to_cstring(profile);
+   char *pattern_cstr = text_to_cstring(pattern);
+
+   objects = get_pxf_object_metadata(profile_cstr, pattern_cstr, NULL);
+
+   return list_head(objects);
+}
+
+ObjectContext*
+pxf_object_fields_enum_next(ObjectContext *object_context)
+{
+
+   //first time call
+   if (object_context->current_object && !object_context->current_field)
+   object_context->current_field = list_head(((PxfItem *) 
lfirst(object_context->current_object))->fields);
+
+   //next field for the same object
+   else if lnext(object_context->current_field)
+   object_context->current_field = 
lnext(object_context->current_field);
+   //next table
+   else if lnext(object_context->current_object)
+   {
+   object_context->current_object = 
lnext(object_context->current_object);
+   object_context->current_field = list_head(((PxfItem *) 
lfirst(object_context->current_object))->fields);
+
+   //no objects, no fields left
+   } else
+   object_context = NULL;
+
+   return object_context;
+}
+
+void pxf_object_fields_enum_end(void)
+{
+   //cleanup
+}
+
+Datum pxf_get_object_fields(PG_FUNCTION_ARGS)
+{
+   MemoryContext oldcontext;
+   FuncCallContext *funcctx;
+   HeapTuple tuple;
+   Datum result;
+   Datum values[4];
+   bool nulls[4];
+
+   ObjectContext *object_context;
+
+   text *profile = PG_GETARG_TEXT_P(0);
+   text *pattern = PG_GETARG_TEXT_P(1);
+
+   /* stuff done only on the first call of the function */
+   if (SRF_IS_FIRSTCALL())
+   {
+   TupleDesc tupdesc;
+
+   /* create a function context for cross-call persistence */
+   funcctx = SRF_FIRSTCALL_INIT();
+
+   /*
+* switch to memory context appropriate for multiple function 
calls
+*/
+   oldcontext = 
MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+   /* initialize object fileds metadata scanning code */
+   object_context = (ObjectContext *) 
palloc0(sizeof(ObjectContext));
+   object_context->current_object = 
pxf_object_fields_enum_start(profile, pattern);
+   funcctx->user_fctx = (void *) object_context;
+
+   /*
+* build tupdesc for result tuples. This must match this 
function's
+* pg_proc entry!
+*/
+   tupdesc = CreateTemplateTupleDesc(4, false);
+   TupleDescInitEntry(tupdesc, (AttrNumber) 1, "path",
+   TEXTOID, -1, 0);
+   TupleDescInitEntry(tupdesc, (AttrNumber) 2, "objectname",
+   TEXTOID, -1, 0);
+   TupleDescInitEntry(tupdesc, (AttrNumber) 3, "columnname",
+   TEXTOID, -1, 0);
+   TupleDescInitEntry(tupdesc, (AttrNumber) 4, "columntype",

[GitHub] incubator-hawq pull request: HAWQ-465. Implement stored procedure ...

2016-03-22 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/479#discussion_r57041688
  
--- Diff: src/backend/utils/adt/pxf_functions.c ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "catalog/external/externalmd.h"
--- End diff --

please re-org the includes in lexicographic order (but I think postgres.h
should be at the beginning)




[GitHub] incubator-hawq pull request: HAWQ-465. Implement stored procedure ...

2016-03-22 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/479#discussion_r57040549
  
--- Diff: src/backend/catalog/external/externalmd.c ---
@@ -23,10 +23,12 @@
  *  Author: antova
  *
  *
- * Utilities for loading external hcatalog metadata
+ * Utilities for loading external PXF metadata
  *
  */
 
+#include "catalog/external/externalmd.h"
--- End diff --

please move to its right place in the include list (after 
catalog/catquery.h)




[GitHub] incubator-hawq pull request: HAWQ-465. Implement stored procedure ...

2016-03-22 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/479#discussion_r57040438
  
--- Diff: src/backend/access/external/pxfmasterapi.c ---
@@ -258,14 +258,20 @@ ha_failover(GPHDUri *hadoop_uri,
}
 }
 
-/* Concatenate two literal strings using stringinfo */
-char* concat(char *body, char *tail)
+/* Concatenate multiple literal strings using stringinfo */
+char* concat(int num_args, ...)
--- End diff --

:)




[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/477#issuecomment-199623793
  
@shivzone Really nice changes!
Does it affect the way PXF interacts with HAWQ? IIRC we already assume the 
returned data will be a list of items, but I think the JSON field names are 
different now.
I think since the JSON response is different, the API version should change.




[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56933654
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/utilities/ProtocolData.java
 ---
@@ -95,6 +95,7 @@ public ProtocolData(Map<String, String> paramsMap) {
 accessor = getProperty("ACCESSOR");
 resolver = getProperty("RESOLVER");
 fragmenter = getOptionalProperty("FRAGMENTER");
+metadata = getOptionalProperty("METADATA");
--- End diff --

perhaps call it MDFETCHER?




[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56933558
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/MetadataResponseFormatter.java
 ---
@@ -55,50 +54,65 @@ public static String formatResponseString(Metadata 
metadata) throws IOException
  * To be used as the result string for HAWQ.
  * An example result is as follows:
  *
- * 
{"PXFMetadata":[{"table":{"dbName":"default","tableName":"t1"},"fields":[{"name":"a","type":"int"},{"name":"b","type":"float"}]}]}
+ * 
{"PXFMetadata":[{"item":{"path":"default","name":"t1"},"fields":[{"name":"a","type":"int"},{"name":"b","type":"float"}]}]}
  */
-private static String metadataToJSON(Metadata metadata) throws 
IOException {
+private static String metadataToJSON(List metadataList) 
throws IOException {
--- End diff --

Have a look at the way we serialize the Fragments - we had a problem with
having too many fragments, so we needed to serialize them inside a streaming
object.
I think that now that the metadata can include multiple tables, the same
approach should be taken here as well.
The same also applies to the debug function that prints all of the items -
if there are too many of them the StringBuilder will run out of memory. The
solution in the fragments case was to log one fragment at a time -
perhaps the same should be done here as well.
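
A rough sketch of the streaming approach (the wiring is illustrative - see
the Fragments serialization for the real pattern):

    ObjectMapper mapper = new ObjectMapper();
    mapper.setSerializationInclusion(Inclusion.NON_EMPTY); // ignore empty fields
    JsonGenerator generator =
            mapper.getJsonFactory().createJsonGenerator(outputStream, JsonEncoding.UTF8);
    generator.writeRaw("{\"PXFMetadata\":[");
    boolean first = true;
    for (Metadata metadata : metadataList) {
        if (!first) {
            generator.writeRaw(",");
        }
        mapper.writeValue(generator, metadata); // serialize one item at a time
        first = false;
    }
    generator.writeRaw("]}");
    generator.flush();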




[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56933465
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/rest/MetadataResource.java
 ---
@@ -56,51 +61,53 @@ public MetadataResource() throws IOException {
 }
 
 /**
- * This function queries the HiveMetaStore to get the given table's
- * metadata: Table name, field names, field types. The types are 
converted
- * from HCatalog types to HAWQ types. Supported HCatalog types: 
TINYINT,
- * SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, BINARY, 
TIMESTAMP,
- * DATE, DECIMAL, VARCHAR, CHAR. 
+ * This function queries the underlying store based on the given 
profile to get schema for items that match the given pattern
+ * metadata: Item name, field names, field types. The types are 
converted
+ * from the underlying types to HAWQ types.
  * Unsupported types result in an error. 
  * Response Examples:
  * For a table default.t1 with 2 fields (a int, b float) 
will
  * be returned as:
- * 
{"PXFMetadata":[{"table":{"dbName":"default","tableName":"t1"},"fields":[{"name":"a","type":"int"},{"name":"b","type":"float"}]}]}
+ * 
{"PXFMetadata":[{"item":{"path":"default","name":"t1"},"fields":[{"name":"a","type":"int"},{"name":"b","type":"float"}]}]}
  *
  * @param servletContext servlet context
  * @param headers http headers
- * @param table HCatalog table name
- * @return JSON formatted response with metadata for given table
- * @throws Exception if connection to Hcatalog failed, table didn't 
exist or
+ * @param profile based on this the metadata source can be inferred
+ * @param pattern table/file name or pattern in the given source
+ * @return JSON formatted response with metadata of each item that 
corresponds to the pattern
+ * @throws Exception if connection to the source/catalog failed, item 
didn't exist for the pattern
  * its type or fields are not supported
  */
 @GET
-@Path("getTableMetadata")
+@Path("getMetadata")
 @Produces("application/json")
 public Response read(@Context final ServletContext servletContext,
  @Context final HttpHeaders headers,
- @QueryParam("table") final String table)
+ @QueryParam("profile") final String profile,
+ @QueryParam("pattern") final String pattern)
 throws Exception {
-LOG.debug("getTableMetadata started");
+LOG.debug("getMetadata started");
 String jsonOutput;
 try {
+
+// Convert headers into a regular map
+Map<String, String> params = 
convertToCaseInsensitiveMap(headers.getRequestHeaders());
+
+// Add profile and verify token
+params.put("X-GP-PROFILE", profile.toLowerCase());
+ProtocolData protData = new ProtocolData(params);
+SecuredHDFS.verifyToken(protData, servletContext);
+
 // 1. start MetadataFetcher
-MetadataFetcher metadataFetcher = 
MetadataFetcherFactory.create("org.apache.hawq.pxf.plugins.hive.HiveMetadataFetcher");
 // TODO:
-   
  // nhorn
-   
  // -
-   
  // 09-03-15
-   
  // -
-   
  // pass
-   
  // as
-   
  // param
+MetadataFetcher metadataFetcher = 
MetadataFetcherFactory.create(protData);
--- End diff --

awesome!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56933378
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/MetadataResponseFormatter.java
 ---
@@ -55,50 +54,65 @@ public static String formatResponseString(Metadata 
metadata) throws IOException
  * To be used as the result string for HAWQ.
  * An example result is as follows:
  *
- * 
{"PXFMetadata":[{"table":{"dbName":"default","tableName":"t1"},"fields":[{"name":"a","type":"int"},{"name":"b","type":"float"}]}]}
+ * 
{"PXFMetadata":[{"item":{"path":"default","name":"t1"},"fields":[{"name":"a","type":"int"},{"name":"b","type":"float"}]}]}
  */
-private static String metadataToJSON(Metadata metadata) throws IOException {
+private static String metadataToJSON(List<Metadata> metadataList) throws IOException {
 
-if (metadata == null) {
-throw new IllegalArgumentException("metadata object is null - 
cannot serialize");
+if (metadataList == null || metadataList.isEmpty()) {
+throw new IllegalArgumentException("no metadata objects found 
- cannot serialize");
 }
 
-if ((metadata.getFields() == null) || 
metadata.getFields().isEmpty()) {
-throw new IllegalArgumentException("metadata contains no 
fields - cannot serialize");
+StringBuilder result = null;
+
+for(Metadata metadata: metadataList) {
+if(metadata == null) {
+throw new IllegalArgumentException("metadata object is 
null - cannot serialize");
+}
+if ((metadata.getFields() == null) || 
metadata.getFields().isEmpty()) {
+throw new IllegalArgumentException("metadata contains no 
fields - cannot serialize");
--- End diff --

do you want to specify which item is at fault, or do we not care?
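e.g., a quick sketch (the `getItem()` accessor is an assumption):

```
// hypothetical - name the offending item in the error message
if ((metadata.getFields() == null) || metadata.getFields().isEmpty()) {
    throw new IllegalArgumentException("metadata for item " + metadata.getItem()
            + " contains no fields - cannot serialize");
}
```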


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r5695
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/MetadataResponseFormatter.java
 ---
@@ -55,50 +54,65 @@ public static String formatResponseString(Metadata 
metadata) throws IOException
  * To be used as the result string for HAWQ.
  * An example result is as follows:
  *
- * 
{"PXFMetadata":[{"table":{"dbName":"default","tableName":"t1"},"fields":[{"name":"a","type":"int"},{"name":"b","type":"float"}]}]}
+ * 
{"PXFMetadata":[{"item":{"path":"default","name":"t1"},"fields":[{"name":"a","type":"int"},{"name":"b","type":"float"}]}]}
--- End diff --

if it's a list of items, should there be more `[` or `{` in the JSON?
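For example, with two items I would expect the array to look something like
this (fields elided):
{"PXFMetadata":[{"item":{"path":"default","name":"t1"},"fields":[...]},{"item":{"path":"default","name":"t2"},"fields":[...]}]}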


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56933223
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/MetadataFetcherFactory.java
 ---
@@ -28,7 +30,12 @@
  * abstract class which is returned by the MetadataFetcherFactory. 
  */
 public class MetadataFetcherFactory {
-static public MetadataFetcher create(String fetcherName) throws 
Exception {
-return (MetadataFetcher) Class.forName(fetcherName).newInstance();
+/* TODO: This is a tempororary workaround.
+ * The metadata class will be moved to the pxf-profile.xml in the 
future
+ */
--- End diff --

btw, why is it temporary? currently, both profile and specific plugin names 
are supported - so it will probably be possible to define a profile that 
includes the metadata fetcher, or specify the metadata fetcher itself. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56933120
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/MetadataFetcherFactory.java
 ---
@@ -28,7 +30,12 @@
  * abstract class which is returned by the MetadataFetcherFactory. 
  */
 public class MetadataFetcherFactory {
-static public MetadataFetcher create(String fetcherName) throws 
Exception {
-return (MetadataFetcher) Class.forName(fetcherName).newInstance();
+/* TODO: This is a tempororary workaround.
--- End diff --

typo: temporary


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56933030
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -225,15 +245,47 @@ private static boolean verifyModifers(String[] 
modifiers) {
 }
 
 if (toks.size() == 1) {
-dbName = HIVE_DEFAULT_DBNAME;
-tableName = toks.get(0);
+dbPattern = HIVE_DEFAULT_DBNAME;
+tablePattern = toks.get(0);
 } else if (toks.size() == 2) {
-dbName = toks.get(0);
-tableName = toks.get(1);
+dbPattern = toks.get(0);
+tablePattern = toks.get(1);
 } else {
-throw new IllegalArgumentException("\"" + qualifiedName + "\"" 
+ errorMsg);
+throw new IllegalArgumentException("\"" + pattern + "\"" + 
errorMsg);
+}
+
+return getTablesFromPattern(client, dbPattern, tablePattern);
+
+}
+
+private static List<Metadata.Item> getTablesFromPattern(HiveMetaStoreClient client, String dbPattern, String tablePattern) {
+
+List<String> databases = null;
+List<Metadata.Item> itemList = new ArrayList<Metadata.Item>();
+List<String> tables = new ArrayList<String>();
+
+if(client == null || (!dbPattern.contains(WILDCARD) && 
!tablePattern.contains(WILDCARD)) ) {
+/* This case occurs when the call is invoked as part of the 
fragmenter api or when metadata is requested for a specific table name */
+itemList.add(new Metadata.Item(dbPattern, tablePattern));
+return itemList;
 }
 
-return new Metadata.Table(dbName, tableName);
+try {
+databases = client.getDatabases(dbPattern);
+if(databases.isEmpty()) {
+throw new IllegalArgumentException("No database found for 
the given pattern");
+}
+for(String dbName: databases) {
+for(String tableName: client.getTables(dbName, 
tablePattern)) {
--- End diff --

can `getTables` accept patterns for both `dbName` and `tablePattern`, or
must the `dbName` be a real db?
If it's possible, perhaps we can get the list in one call.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56932987
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -225,15 +245,47 @@ private static boolean verifyModifers(String[] 
modifiers) {
 }
 
 if (toks.size() == 1) {
-dbName = HIVE_DEFAULT_DBNAME;
-tableName = toks.get(0);
+dbPattern = HIVE_DEFAULT_DBNAME;
+tablePattern = toks.get(0);
 } else if (toks.size() == 2) {
-dbName = toks.get(0);
-tableName = toks.get(1);
+dbPattern = toks.get(0);
+tablePattern = toks.get(1);
 } else {
-throw new IllegalArgumentException("\"" + qualifiedName + "\"" 
+ errorMsg);
+throw new IllegalArgumentException("\"" + pattern + "\"" + 
errorMsg);
+}
+
+return getTablesFromPattern(client, dbPattern, tablePattern);
+
+}
+
+private static List<Metadata.Item> getTablesFromPattern(HiveMetaStoreClient client, String dbPattern, String tablePattern) {
+
+List<String> databases = null;
+List<Metadata.Item> itemList = new ArrayList<Metadata.Item>();
+List<String> tables = new ArrayList<String>();
+
+if(client == null || (!dbPattern.contains(WILDCARD) && 
!tablePattern.contains(WILDCARD)) ) {
+/* This case occurs when the call is invoked as part of the 
fragmenter api or when metadata is requested for a specific table name */
+itemList.add(new Metadata.Item(dbPattern, tablePattern));
+return itemList;
 }
 
-return new Metadata.Table(dbName, tableName);
+try {
+databases = client.getDatabases(dbPattern);
+if(databases.isEmpty()) {
+throw new IllegalArgumentException("No database found for 
the given pattern");
+}
+for(String dbName: databases) {
+for(String tableName: client.getTables(dbName, 
tablePattern)) {
+if (!tableName.isEmpty()) {
--- End diff --

why do you need to check for an empty string?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56932849
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -225,15 +245,47 @@ private static boolean verifyModifers(String[] 
modifiers) {
 }
 
 if (toks.size() == 1) {
-dbName = HIVE_DEFAULT_DBNAME;
-tableName = toks.get(0);
+dbPattern = HIVE_DEFAULT_DBNAME;
+tablePattern = toks.get(0);
 } else if (toks.size() == 2) {
-dbName = toks.get(0);
-tableName = toks.get(1);
+dbPattern = toks.get(0);
+tablePattern = toks.get(1);
 } else {
-throw new IllegalArgumentException("\"" + qualifiedName + "\"" 
+ errorMsg);
+throw new IllegalArgumentException("\"" + pattern + "\"" + 
errorMsg);
+}
+
+return getTablesFromPattern(client, dbPattern, tablePattern);
+
--- End diff --

remove extra line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56932656
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -203,19 +205,37 @@ private static boolean verifyModifers(String[] 
modifiers) {
  * It can be either table_name or 
db_name.table_name.
  *
  * @param qualifiedName Hive table name
- * @return {@link org.apache.hawq.pxf.api.Metadata.Table} object 
holding the full table name
+ * @return {@link Metadata.Item} object holding the full table name
  */
-public static Metadata.Table parseTableQualifiedName(String qualifiedName) {
+public static Metadata.Item extractTableFromName(String qualifiedName) {
+List<Metadata.Item> items = extractTablesFromPattern(null, qualifiedName);
+if(items.isEmpty()) {
+throw new IllegalArgumentException("No tables found");
+}
+return items.get(0);
+}
 
-String dbName, tableName;
+/**
+ * Extracts the db_name and table_name from the qualifiedName.
--- End diff --

qualifiedName -> pattern? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56932687
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/utilities/HiveUtilities.java
 ---
@@ -203,19 +205,37 @@ private static boolean verifyModifers(String[] 
modifiers) {
  * It can be either table_name or 
db_name.table_name.
  *
  * @param qualifiedName Hive table name
- * @return {@link org.apache.hawq.pxf.api.Metadata.Table} object 
holding the full table name
+ * @return {@link Metadata.Item} object holding the full table name
  */
-public static Metadata.Table parseTableQualifiedName(String qualifiedName) {
+public static Metadata.Item extractTableFromName(String qualifiedName) {
+List<Metadata.Item> items = extractTablesFromPattern(null, qualifiedName);
+if(items.isEmpty()) {
+throw new IllegalArgumentException("No tables found");
+}
+return items.get(0);
+}
 
-String dbName, tableName;
+/**
+ * Extracts the db_name and table_name from the qualifiedName.
+ * qualifiedName is the Hive table name or pattern that the user 
enters in the CREATE EXTERNAL TABLE statement
+ * or when querying HCatalog table.
+ * It can be either table_name_pattern or 
db_name_pattern.table_name_pattern.
+ *
+ * @param client Hivemetastore client
+ * @param pattern Hive table name or pattern
+ * @return {@link Metadata.Item} object holding the full table name
--- End diff --

returns a list, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56932473
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java
 ---
@@ -251,11 +251,11 @@ private void fetchTableMetaData(Metadata.Table 
tblDesc) throws Exception {
 StorageDescriptor descPartition = partition.getSd();
 props = MetaStoreUtils.getSchema(descPartition, descTable,
 null, // Map<string, string> parameters - can be 
empty
-tblDesc.getDbName(), tblDesc.getTableName(), // 
table
+tblDesc.getPath(), tblDesc.getName(), // table
--- End diff --

fix comment?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56932413
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/InputData.java ---
@@ -46,6 +46,7 @@
 protected String accessor;
 protected String resolver;
 protected String fragmenter;
+protected String metadata;
--- End diff --

since this is actually the MetadataFetcher class name, perhaps rename it to
metadataFetcher?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ 459 Enhanced metadata api to sup...

2016-03-21 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/477#discussion_r56932345
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/MetadataFetcher.java ---
@@ -19,27 +19,33 @@
  * under the License.
  */
 
+import java.util.List;
+
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+
 
 /**
- * Abstract class that defines getting metadata of a table.
+ * Abstract class that defines getting metadata of an item.
--- End diff --

an item -> items :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: Fixing error message for exttab1 sour...

2016-03-10 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/433#discussion_r55791714
  
--- Diff: src/test/regress/output/exttab1.source ---
@@ -669,7 +669,7 @@ HINT:  use the gpfdist protocol or COPY FROM instead
 create writable external table wet_neg1(a text, b text) 
location('gpfdist://@hostname@:7070/wet.out', 
'gpfdist://@hostname@:7070/wet.out') format 'text';
 ERROR:  location uri "gpfdist://@hostname@:7070/wet.out" appears more than 
once
 create writable external web table wet_pos5(a text, b text) execute 'some 
command' on segment 0 format 'text';
-ERROR:  ON clause may not be used with a writable external table
+ERROR:  the ON segment syntax for writableexternal tables is 
deprecated
--- End diff --

just looks like a typo: `writableexternal`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: Fixing error message for exttab1 sour...

2016-03-10 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/433#discussion_r55789501
  
--- Diff: src/test/regress/output/exttab1.source ---
@@ -669,7 +669,7 @@ HINT:  use the gpfdist protocol or COPY FROM instead
 create writable external table wet_neg1(a text, b text) 
location('gpfdist://@hostname@:7070/wet.out', 
'gpfdist://@hostname@:7070/wet.out') format 'text';
 ERROR:  location uri "gpfdist://@hostname@:7070/wet.out" appears more than 
once
 create writable external web table wet_pos5(a text, b text) execute 'some 
command' on segment 0 format 'text';
-ERROR:  ON clause may not be used with a writable external table
+ERROR:  the ON segment syntax for writableexternal tables is 
deprecated
--- End diff --

should there be so many spaces in the middle of the error?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-462. Adding dfs_address from pgf...

2016-03-08 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/412#discussion_r55459533
  
--- Diff: src/backend/access/external/pxfuriparser.c ---
@@ -634,9 +638,20 @@ GPHDUri_parse_segwork(GPHDUri *uri, const char 
*uri_str)
segwork = strstr(uri_str, segwork_substring);
if (segwork == NULL)
return;
-
segwork += strlen(segwork_substring);
 
+   if (enable_secure_filesystem)
+   {
+   /* parse dfs address */
+   size_end = strchr(segwork, segwork_dfs_separator);
+   if(size_end != NULL)
--- End diff --

if it's a secured environment and there is no dfs address data, maybe we
should error out. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-462. Adding dfs_address from pgf...

2016-03-08 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/412#issuecomment-194056909
  
Looks good. Do you plan to open a separate story for writable?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-462. Adding dfs_address from pgf...

2016-03-04 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/412#issuecomment-192448360
  
General question - does it also solve the problem for writable? I thought 
for writable tables the code path didn't pass through hd_work_mgr.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-462. Adding dfs_address from pgf...

2016-03-04 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/412#discussion_r55083101
  
--- Diff: src/backend/access/external/pxfuriparser.c ---
@@ -719,7 +727,7 @@ GPHDUri_parse_fragment(char* fragment, List* fragments)
has_user_data = true;
*value_end = '\0';
}
-   fragment_data->fragment_md = pstrdup(value_start);
+   fragment_data->dfs_address = pstrdup(value_start);
--- End diff --

please make sure to pfree it in GPHDUri_free_fragments.
I also noticed that we don't free fragment_md and user_data.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-462. Adding dfs_address from pgf...

2016-03-04 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/412#discussion_r55082603
  
--- Diff: src/backend/access/external/hd_work_mgr.c ---
@@ -783,6 +786,10 @@ make_allocation_output_string(List *segment_fragments)
appendStringInfoChar(_str, SEGWORK_IN_PAIR_DELIM);
if (frag->fragment_md)
appendStringInfo(_str, "%s", 
frag->fragment_md);
+   /* Adding dfs_address from pg_filespace entry required for 
HAWQ-462 */
+   appendStringInfoChar(_str, SEGWORK_IN_PAIR_DELIM);
+   if (dfs_address)
--- End diff --

why do we need to pass this value for each fragment? wouldn't it be better
to pass it just once, perhaps appending it at the end of the segments
string (after the loop)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request:

2016-03-01 Thread hornn
Github user hornn commented on the pull request:


https://github.com/apache/incubator-hawq/commit/495bb2a2b264da533be237d312507d5e97faec53#commitcomment-16435206
  
In src/bin/gpfusion/gpbridgeapi.c:
In src/bin/gpfusion/gpbridgeapi.c on line 518:
I don't know if the filespace info is available from the segment or not -
but from your description it sounds like it's not.

In that case, I think the solution might be to propagate the dfs_url from
the master to the segment as part of the external scan info. For readable
tables it should be easy - the data can be added in hd_work_mgr, together
with the fragments info. For writable tables it's trickier, because there is
no pxf-specific code that is called in the master for writable tables. So I
think a better solution would be to always make this info available to the
external protocol code, by making it part of the external scan, or perhaps
by populating a global variable when the segment starts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: Hawq 317

2016-02-26 Thread hornn
Github user hornn closed the pull request at:

https://github.com/apache/incubator-hawq/pull/279


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: Karthifoyzur hawq 368

2016-02-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/375#discussion_r53857118
  
--- Diff: src/test/regress/pg_regress_main.c ---
@@ -108,12 +108,7 @@ static void
 psql_init(void)
 {
/* set default regression database name */
-   if (!dblist)
--- End diff --

why is the if check not needed anymore? can't psql_init be called a second 
time when dblist is not null?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: Karthifoyzur hawq 368

2016-02-23 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/375#discussion_r53856594
  
--- Diff: src/backend/tcop/postgres.c ---
@@ -4652,7 +4652,22 @@ PostgresMain(int argc, char *argv[], const char 
*username)
}
 
IdleTracker_DeactivateProcess();
-   firstchar = ReadCommand(&input_message);
+   /*
+* During read command, if we have any exception we want to 
reactivate the process
+* before falling back to postgres main exception handler.
+*/
+   PG_TRY();
+   {
+   firstchar = ReadCommand(&input_message);
+   }
+   PG_CATCH();
+   {
+   IdleTracker_ActivateProcess();
+   elog(LOG, "Caught exception while reading command");
+   PG_RE_THROW();
+   }
+PG_END_TRY();
--- End diff --

fix indentation


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-17 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/302#issuecomment-185331246
  
There are also a few minor fixes to be merged: 
https://github.com/tzolov/incubator-hawq/pull/1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: Hawq 317

2016-02-17 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/279#issuecomment-185330578
  
Awaiting verification on Isilon cluster.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-405. Fix for handling of pg_temp...

2016-02-10 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/338#discussion_r52531944
  
--- Diff: src/backend/catalog/namespace.c ---
@@ -1495,18 +1495,34 @@ LookupInternalNamespaceId(const char *nspname)
 Oid
 LookupNamespaceId(const char *nspname, Oid dboid)
 {
-if(gp_upgrade_mode){
+
+   /* check for pg_temp alias */
+   if (NSPDBOID_CURRENT == dboid && strcmp(nspname, "pg_temp") == 0)
+   {
+   if (TempNamespaceValid(true))
+   return myTempNamespace;
+   /*
+* Since this is used only for looking up existing objects, 
there
+* is no point in trying to initialize the temp namespace here;
+* and doing so might create problems for some callers.
+* Just fall through and give the "does not exist" error.
+*/
+   }
+
+if(gp_upgrade_mode)
+{
 // This code is only need at hawq2.0 upgrade, in this case the old 
pg_namespace doesn't 
 // have column nspdboid, so we report error in else clause
-if(ObjectIdGetDatum(dboid)==NSPDBOID_CURRENT){
+if(ObjectIdGetDatum(dboid)==NSPDBOID_CURRENT)
+{
 return caql_getoid(
NULL,
cql("SELECT oid FROM pg_namespace "
" WHERE nspname = :1",
CStringGetDatum((char *) nspname)));
-}else{
-elog(ERROR, "Upgrade cann't process the namespace: %s in dbid: 
%i", nspname, dboid);
 }
+else
+   elog(ERROR, "Upgrade cann't process the namespace: %s in dbid: 
%i", nspname, dboid);
--- End diff --

typo cannot


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-405. Fix for handling of pg_temp...

2016-02-10 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/338#discussion_r52534496
  
--- Diff: src/backend/catalog/namespace.c ---
@@ -1495,18 +1495,34 @@ LookupInternalNamespaceId(const char *nspname)
 Oid
 LookupNamespaceId(const char *nspname, Oid dboid)
 {
-if(gp_upgrade_mode){
+
+   /* check for pg_temp alias */
+   if (NSPDBOID_CURRENT == dboid && strcmp(nspname, "pg_temp") == 0)
+   {
+   if (TempNamespaceValid(true))
+   return myTempNamespace;
+   /*
+* Since this is used only for looking up existing objects, 
there
+* is no point in trying to initialize the temp namespace here;
+* and doing so might create problems for some callers.
+* Just fall through and give the "does not exist" error.
+*/
+   }
+
+if(gp_upgrade_mode)
--- End diff --

It should not be part of this commit though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-405. Fix for handling of pg_temp...

2016-02-10 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/338#discussion_r52548075
  
--- Diff: src/test/regress/expected/temp.out ---
@@ -170,3 +170,19 @@ select * from whereami;
 -- you can invoke a temp function explicitly, though
 -- select pg_temp.whoami();
 drop table public.whereami;
+create temp table temptest (row integer, count integer);
--- End diff --

I think it would be better to have the exact scenario that failed:
```
CREATE TABLE pg_temp.temptest ...
SELECT avg(pg_temp.temptest.count) ... 
```
Do you think we should cover all three cases (db.table.col, table.col, col)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-05 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r52079820
  
--- Diff: 
pxf/pxf-json/src/main/java/org/apache/hawq/pxf/plugins/json/JsonResolver.java 
---
@@ -0,0 +1,252 @@
+package org.apache.hawq.pxf.plugins.json;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadResolver;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.codehaus.jackson.JsonFactory;
+import org.codehaus.jackson.JsonNode;
+import org.codehaus.jackson.map.ObjectMapper;
+
+/**
+ * This JSON resolver for PXF will decode a given object from the {@link 
JsonAccessor} into a row for HAWQ. It will
+ * decode this data into a JsonNode and walk the tree for each column. It 
supports normal value mapping via projections
+ * and JSON array indexing.
+ */
+public class JsonResolver extends Plugin implements ReadResolver {
+
+   private static final Log LOG = LogFactory.getLog(JsonResolver.class);
+
+   private ArrayList<OneField> oneFieldList;
+   private ColumnDescriptorCache[] columnDescriptorCache;
+   private ObjectMapper mapper;
+
+   /**
+* Row with empty fields. Returned in case of broken or malformed json 
records.
+*/
+   private final List<OneField> emptyRow;
+
+   public JsonResolver(InputData inputData) throws Exception {
+   super(inputData);
+   oneFieldList = new ArrayList<OneField>();
+   mapper = new ObjectMapper(new JsonFactory());
+
+   // Precompute the column metadata. The metadata is used for 
mapping column names to json nodes.
+   columnDescriptorCache = new 
ColumnDescriptorCache[inputData.getColumns()];
+   for (int i = 0; i < inputData.getColumns(); ++i) {
+   ColumnDescriptor cd = inputData.getColumn(i);
+   columnDescriptorCache[i] = new 
ColumnDescriptorCache(cd);
+   }
+
+   emptyRow = createEmptyRow();
+   }
+
+   @Override
+   public List<OneField> getFields(OneRow row) throws Exception {
+   oneFieldList.clear();
+
+   String jsonRecordAsText = row.getData().toString();
+
+   JsonNode root = decodeLineToJsonNode(jsonRecordAsText);
+
+   if (root == null) {
+   LOG.warn("Return empty-fields row due to invalid JSON: 
" + jsonRecordAsText);
+   return emptyRow;
+   }
+
+   // Iterate through the column definition and fetch our JSON data
+   for (ColumnDescriptorCache columnMetadata : 
columnDescriptorCache) {
+
+   JsonNode node = getChildJsonNode(root, 
columnMetadata.getNormalizedProjections());
+
+   // If this node is null or missing, add a null value 
here
+   if (node == null || node.isMissingNode()) {
+   addNullField(columnMetadata.getColumnType());
+   } else if (columnMetadata.isArray()) {
+   // If this column is an array index, ex. 
"tweet.hashtags[0]"
+   if (node.isArray()) {
+   // If the JSON node is an array, then 
add it to our list
+   
addFieldFromJsonArray(columnMetadata.getColumnType(), node, 
columnMetadata.getArrayNodeIndex());
+   } else {
+ 

[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-04 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51906457
  
--- Diff: 
pxf/pxf-json/src/test/java/org/apache/hawq/pxf/plugins/json/PxfUnit.java ---
@@ -0,0 +1,672 @@
+package org.apache.hawq.pxf.plugins.json;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.BufferedReader;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+import java.lang.reflect.Constructor;
+import java.security.InvalidParameterException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hawq.pxf.api.Fragment;
+import org.apache.hawq.pxf.api.Fragmenter;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.ReadResolver;
+import org.apache.hawq.pxf.api.WriteAccessor;
+import org.apache.hawq.pxf.api.WriteResolver;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.service.FragmentsResponse;
--- End diff --

Let's leave it as a follow-up.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51818643
  
--- Diff: 
pxf/pxf-json/src/main/java/org/apache/hawq/pxf/plugins/json/ColumnDescriptorCache.java
 ---
@@ -0,0 +1,165 @@
+package org.apache.hawq.pxf.plugins.json;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+
+/**
+ * Helper class used to retrieve all column details relevant for the json 
processing.
+ */
+public class ColumnDescriptorCache {
+
+   private static Pattern ARRAY_PROJECTION_PATTERN = 
Pattern.compile("(.*)\\[([0-9]+)\\]");
+   private static int ARRAY_NAME_GROUPID = 1;
+   private static int ARRAY_INDEX_GROUPID = 2;
+
+   private final DataType columnType;
+   private final String[] projections;
+   private final String nodeName;
+   private final int arrayIndex;
+   private final boolean isArray;
+
+   public ColumnDescriptorCache(ColumnDescriptor columnDescriptor) {
+
+   this.columnType = 
DataType.get(columnDescriptor.columnTypeCode());
+
+   this.projections = columnDescriptor.columnName().split("\\.");
+
+   this.isArray = isArrayIndex(projections);
--- End diff --

slight improvement - if we encapsulate the whole array logic in a function, 
we can call `getMatchGroup` once, and use the results to populate this.isArray, 
this.nodeName and this.arrayIndex. 
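Something like this sketch, e.g. inside the constructor (assuming only the
last projection can carry an index; the -1 sentinel for non-array columns
is an assumption):

```
// match the array pattern once and derive all three fields from it
String lastProjection = projections[projections.length - 1];
Matcher matcher = ARRAY_PROJECTION_PATTERN.matcher(lastProjection);
if (matcher.matches()) {
    this.isArray = true;
    this.nodeName = matcher.group(ARRAY_NAME_GROUPID);
    this.arrayIndex = Integer.parseInt(matcher.group(ARRAY_INDEX_GROUPID));
} else {
    this.isArray = false;
    this.nodeName = lastProjection;
    this.arrayIndex = -1; // sentinel: not an array column
}
```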


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51818210
  
--- Diff: 
pxf/pxf-json/src/main/java/org/apache/hawq/pxf/plugins/json/ColumnDescriptorCache.java
 ---
@@ -0,0 +1,165 @@
+package org.apache.hawq.pxf.plugins.json;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+
+/**
+ * Helper class used to retrieve all column details relevant for the json 
processing.
+ */
+public class ColumnDescriptorCache {
+
+   private static Pattern ARRAY_PROJECTION_PATTERN = 
Pattern.compile("(.*)\\[([0-9]+)\\]");
--- End diff --

just to be safe, I think the first group should be `.+` and not `.*`.
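A tiny self-contained demo of the difference:

```
import java.util.regex.Pattern;

public class GroupDemo {
    public static void main(String[] args) {
        Pattern star = Pattern.compile("(.*)\\[([0-9]+)\\]");
        Pattern plus = Pattern.compile("(.+)\\[([0-9]+)\\]");
        // (.*) accepts a bare index with an empty column name:
        System.out.println(star.matcher("[0]").matches()); // true
        // (.+) rejects it, forcing a non-empty name before the brackets:
        System.out.println(plus.matcher("[0]").matches()); // false
    }
}
```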


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51833835
  
--- Diff: 
pxf/pxf-json/src/main/java/org/apache/hawq/pxf/plugins/json/parser/PartitionedJsonParser.java
 ---
@@ -0,0 +1,198 @@
+package org.apache.hawq.pxf.plugins.json.parser;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.EnumSet;
+import java.util.List;
+
+import org.apache.hawq.pxf.plugins.json.parser.JsonLexer.JsonLexerState;
+
+/**
+ * A simple parser that can support reading JSON objects from a random 
point in JSON text. It reads from the supplied
+ * stream (which is assumed to be positioned at any arbitrary position 
inside some JSON text) until it find the first
--- End diff --

typo finds


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51834024
  
--- Diff: 
pxf/pxf-json/src/main/java/org/apache/hawq/pxf/plugins/json/parser/PartitionedJsonParser.java
 ---
@@ -0,0 +1,198 @@
+package org.apache.hawq.pxf.plugins.json.parser;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.EnumSet;
+import java.util.List;
+
+import org.apache.hawq.pxf.plugins.json.parser.JsonLexer.JsonLexerState;
+
+/**
+ * A simple parser that can support reading JSON objects from a random 
point in JSON text. It reads from the supplied
+ * stream (which is assumed to be positioned at any arbitrary position 
inside some JSON text) until it find the first
+ * JSON begin-object "{". From this point on it will keep reading JSON 
objects until it finds one containing a member
+ * string that the user supplies.
+ * 
+ * It is not recommended to use this with JSON text where individual JSON 
objects that can be large (MB's or larger).
+ */
+public class PartitionedJsonParser {
+
+   private static final int EOF = -1;
+   private final InputStreamReader inputStreamReader;
+   private final JsonLexer lexer;
+   private long bytesRead = 0;
+   private boolean endOfStream;
+
+   public PartitionedJsonParser(InputStream is) {
+   this.lexer = new JsonLexer();
+
+   // You need to wrap the InputStream with an InputStreamReader, 
so that it can encode the incoming byte stream as
+   // UTF-8 characters
+   this.inputStreamReader = new InputStreamReader(is, 
StandardCharsets.UTF_8);
+   }
+
+   private boolean scanToFirstBeginObject() throws IOException {
+   // seek until we hit the first begin-object
+   char prev = ' ';
+   int i;
+   while ((i = inputStreamReader.read()) != EOF) {
+   char c = (char) i;
+   bytesRead++;
+   if (c == '{' && prev != '\\') {
--- End diff --

`{` and `\\` should probably be consts
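e.g. (names are just a suggestion):

```
public class PartitionedJsonParser {
    // hypothetical constants replacing the magic literals
    private static final char BEGIN_OBJECT = '{';
    private static final char ESCAPE_CHAR = '\\';

    private boolean isUnescapedBeginObject(char c, char prev) {
        return c == BEGIN_OBJECT && prev != ESCAPE_CHAR;
    }
}
```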


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51835615
  
--- Diff: 
pxf/pxf-json/src/test/java/org/apache/hawq/pxf/plugins/json/JsonExtensionTest.java
 ---
@@ -0,0 +1,288 @@
+package org.apache.hawq.pxf.plugins.json;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hawq.pxf.api.Fragmenter;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.ReadResolver;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter;
+import org.apache.hawq.pxf.plugins.json.JsonAccessor;
+import org.apache.hawq.pxf.plugins.json.JsonResolver;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+public class JsonExtensionTest extends PxfUnit {
+
+   private List<Pair<String, DataType>> columnDefs = null;
+   private List<Pair<String, String>> extraParams = new 
ArrayList<Pair<String, String>>();
+
+   @Before
+   public void before() {
+
+   columnDefs = new ArrayList<Pair<String, DataType>>();
+
+   columnDefs.add(new Pair<String, DataType>("created_at", 
DataType.TEXT));
+   columnDefs.add(new Pair<String, DataType>("id", 
DataType.BIGINT));
+   columnDefs.add(new Pair<String, DataType>("text", 
DataType.TEXT));
+   columnDefs.add(new Pair<String, DataType>("user.screen_name", 
DataType.TEXT));
+   columnDefs.add(new Pair<String, 
DataType>("entities.hashtags[0]", DataType.TEXT));
+   columnDefs.add(new Pair<String, 
DataType>("coordinates.coordinates[0]", DataType.FLOAT8));
+   columnDefs.add(new Pair<String, 
DataType>("coordinates.coordinates[1]", DataType.FLOAT8));
+   }
+
+   @After
+   public void cleanup() throws Exception {
+   columnDefs.clear();
+   extraParams.clear();
+   }
+
+   @Test
+   public void testCompressedMultilineJsonFile() throws Exception {
+
+   extraParams.add(new Pair<String, String>("IDENTIFIER", 
"created_at"));
+
+   List<String> output = new ArrayList<String>();
+
+   output.add("Fri Jun 07 22:45:02 + 
2013,343136547115253761,text1,SpreadButter,tweetCongress,,");
+   output.add("Fri Jun 07 22:45:02 + 
2013,343136547123646465,text2,patronusdeadly,,,");
+   output.add("Fri Jun 07 22:45:02 + 
2013,343136547136233472,text3,NoSecrets_Vagas,,,");
+
+   super.assertOutput(new Path(System.getProperty("user.dir") + 
File.separator
+   + "src/test/resources/tweets.tar.gz"), output);
+   }
+
+   @Test
+   public void testMaxRecordLength() throws Exception {
+
+   // variable-size-objects.json contains 3 json objects but only 
2 of them fit in the 27 byte length limitation
+
+   extraParams.add(new Pair<String, String>("IDENTIFIER", 
"key666"));
+   extraParams.add(new Pair<String, String>("MAXLENGTH", "27"));
+
+   columnDefs.clear();
+   columnDefs.add(new Pair<String, DataType>("key666", 
DataType.TEXT));
+
+   List output = new ArrayList();
+
+   output.add("small object1");
+   // skip the large object2 

+   output.add("small object3");
+
+   super.assertOutput(new Path(System.getProperty("user.dir"

[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51836013
  
--- Diff: 
pxf/pxf-json/src/test/java/org/apache/hawq/pxf/plugins/json/PxfUnit.java ---
@@ -0,0 +1,672 @@
+package org.apache.hawq.pxf.plugins.json;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.BufferedReader;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+import java.lang.reflect.Constructor;
+import java.security.InvalidParameterException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hawq.pxf.api.Fragment;
+import org.apache.hawq.pxf.api.Fragmenter;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.ReadResolver;
+import org.apache.hawq.pxf.api.WriteAccessor;
+import org.apache.hawq.pxf.api.WriteResolver;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.service.FragmentsResponse;
--- End diff --

This test file breaks the convention of separating pxf-service from the plugins - only pxf-api is supposed to be used by the plugins.
I think it's a really good test mechanism, and we should use it, but it 
definitely should be moved to another location - perhaps its own package, 
perhaps under pxf-service.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/302#issuecomment-179682943
  
@tzolov regarding test coverage, what you did is very impressive.
Normally we have an automation suite where we test end-to-end queries using a 
specific plugin, but I really like the idea of PxfUnit.
I think after your commit is done we'll also have to add automation tests, 
but they will probably not be as exhaustive as what you already did.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51834454
  
--- Diff: 
pxf/pxf-json/src/main/java/org/apache/hawq/pxf/plugins/json/parser/PartitionedJsonParser.java
 ---
@@ -0,0 +1,198 @@
+package org.apache.hawq.pxf.plugins.json.parser;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.EnumSet;
+import java.util.List;
+
+import org.apache.hawq.pxf.plugins.json.parser.JsonLexer.JsonLexerState;
+
+/**
+ * A simple parser that can support reading JSON objects from a random 
point in JSON text. It reads from the supplied
+ * stream (which is assumed to be positioned at any arbitrary position 
inside some JSON text) until it find the first
+ * JSON begin-object "{". From this point on it will keep reading JSON 
objects until it finds one containing a member
+ * string that the user supplies.
+ * 
+ * It is not recommended to use this with JSON text where individual JSON 
objects that can be large (MB's or larger).
+ */
+public class PartitionedJsonParser {
+
+   private static final int EOF = -1;
+   private final InputStreamReader inputStreamReader;
+   private final JsonLexer lexer;
+   private long bytesRead = 0;
+   private boolean endOfStream;
--- End diff --

for clarity, perhaps explicitly assign `false` to it?
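
A trivial sketch of what I mean (field names taken from the diff above):

    private long bytesRead = 0;
    // explicit initialization, even though Java zero-initializes boolean fields
    private boolean endOfStream = false;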


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51835237
  
--- Diff: 
pxf/pxf-json/src/test/java/org/apache/hawq/pxf/plugins/json/JsonExtensionTest.java
 ---
@@ -0,0 +1,288 @@
+package org.apache.hawq.pxf.plugins.json;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hawq.pxf.api.Fragmenter;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.ReadResolver;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter;
+import org.apache.hawq.pxf.plugins.json.JsonAccessor;
+import org.apache.hawq.pxf.plugins.json.JsonResolver;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+public class JsonExtensionTest extends PxfUnit {
+
+   private List<Pair<String, DataType>> columnDefs = null;
+   private List<Pair<String, String>> extraParams = new 
ArrayList<Pair<String, String>>();
+
+   @Before
+   public void before() {
+
+   columnDefs = new ArrayList<Pair<String, DataType>>();
+
+   columnDefs.add(new Pair<String, DataType>("created_at", 
DataType.TEXT));
+   columnDefs.add(new Pair<String, DataType>("id", 
DataType.BIGINT));
+   columnDefs.add(new Pair<String, DataType>("text", 
DataType.TEXT));
+   columnDefs.add(new Pair<String, DataType>("user.screen_name", 
DataType.TEXT));
+   columnDefs.add(new Pair<String, 
DataType>("entities.hashtags[0]", DataType.TEXT));
+   columnDefs.add(new Pair<String, 
DataType>("coordinates.coordinates[0]", DataType.FLOAT8));
+   columnDefs.add(new Pair<String, 
DataType>("coordinates.coordinates[1]", DataType.FLOAT8));
+   }
+
+   @After
+   public void cleanup() throws Exception {
+   columnDefs.clear();
+   extraParams.clear();
+   }
+
+   @Test
+   public void testCompressedMultilineJsonFile() throws Exception {
+
+   extraParams.add(new Pair<String, String>("IDENTIFIER", 
"created_at"));
+
+   List output = new ArrayList();
+
+   output.add("Fri Jun 07 22:45:02 + 
2013,343136547115253761,text1,SpreadButter,tweetCongress,,");
+   output.add("Fri Jun 07 22:45:02 + 
2013,343136547123646465,text2,patronusdeadly,,,");
+   output.add("Fri Jun 07 22:45:02 + 
2013,343136547136233472,text3,NoSecrets_Vagas,,,");
+
+   super.assertOutput(new Path(System.getProperty("user.dir") + 
File.separator
+   + "src/test/resources/tweets.tar.gz"), output);
+   }
+
+   @Test
+   public void testMaxRecordLength() throws Exception {
+
+   // variable-size-objects.json contains 3 json objects but only 
2 of them fit in the 27 byte length limitation
+
+   extraParams.add(new Pair<String, String>("IDENTIFIER", 
"key666"));
+   extraParams.add(new Pair<String, String>("MAXLENGTH", "27"));
+
+   columnDefs.clear();
--- End diff --

I think the clear() is not necessary here, since it's also done in the @After 
method - see the sketch below.
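
For reference, the @After method quoted above already covers it (same code as 
in the diff, with a comment added):

    @After
    public void cleanup() throws Exception {
        // runs after every test, so a per-test columnDefs.clear() is redundant
        columnDefs.clear();
        extraParams.clear();
    }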


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51836238
  
--- Diff: 
pxf/pxf-json/src/test/java/org/apache/hawq/pxf/plugins/json/parser/PartitionedJsonParserOffsetTest.java
 ---
@@ -0,0 +1,54 @@
+package org.apache.hawq.pxf.plugins.json.parser;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.junit.Test;
+
+public class PartitionedJsonParserOffsetTest {
+   /*
+* [{"color": "red","v": "vv"},{"color": "red","v": "vv"}]
+*/
+   public static String json2 = "[{\"color\": \"red\",\"v\": 
\"vv\"},{\"color\": \"red\",\"v\": \"vv\"}]";
+
+   @Test
+   public void testOffset() throws IOException {
+   InputStream jsonInputStream = createFromString(json2);
--- End diff --

close the stream - see the sketch below
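
A minimal sketch (assuming the test keeps its existing createFromString helper):

    InputStream jsonInputStream = createFromString(json2);
    try {
        // ... run the offset assertions ...
    } finally {
        jsonInputStream.close();
    }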


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51827170
  
--- Diff: 
pxf/pxf-json/src/main/java/org/apache/hawq/pxf/plugins/json/JsonRecordReader.java
 ---
@@ -0,0 +1,176 @@
+package org.apache.hawq.pxf.plugins.json;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.RecordReader;
+import org.apache.hawq.pxf.plugins.json.parser.PartitionedJsonParser;
+
+/**
+ * Multi-line json object reader. JsonRecordReader uses a member name (set 
by the IDENTIFIER PXF parameter) to
+ * determine the encapsulating object to extract and read.
+ * 
+ * JsonRecordReader supports compressed input files as well.
+ * 
+ * As a safe guard set the optional MAXLENGTH parameter to limit 
the max size of a record.
+ */
+public class JsonRecordReader implements RecordReader<LongWritable, Text> {
+
+   private static final Log LOG = 
LogFactory.getLog(JsonRecordReader.class);
+
+   public static final String RECORD_MEMBER_IDENTIFIER = 
"json.input.format.record.identifier";
+   public static final String RECORD_MAX_LENGTH = 
"multilinejsonrecordreader.maxlength";
+
+   private CompressionCodecFactory compressionCodecs = null;
+   private long start;
+   private long pos;
+   private long end;
+   private int maxObjectLength;
+   private InputStream is;
+   private PartitionedJsonParser parser;
+   private final String jsonMemberName;
+
+   /**
+* Create new multi-line json object reader.
+* 
+* @param conf
+*Hadoop context
+* @param split
+*HDFS split to start the reading from
+* @throws IOException
+*/
+   public JsonRecordReader(JobConf conf, FileSplit split) throws 
IOException {
+
+   this.jsonMemberName = conf.get(RECORD_MEMBER_IDENTIFIER);
+   this.maxObjectLength = conf.getInt(RECORD_MAX_LENGTH, 
Integer.MAX_VALUE);
+
+   start = split.getStart();
+   end = start + split.getLength();
+   final Path file = split.getPath();
+   compressionCodecs = new CompressionCodecFactory(conf);
+   final CompressionCodec codec = compressionCodecs.getCodec(file);
--- End diff --

Regarding compression, we support it for HDFS files, and the implementation 
here looks very similar to the one in LineRecordReader. The issue worth 
mentioning here is that the BZip2 codec is not thread safe. For map-reduce jobs 
that doesn't matter, because each job runs in its own JVM, but we handle 
multiple concurrent requests through Tomcat, and that introduced a problem.
For HDFS files we "solved" it by forcing bzip2 files to be processed in a 
single thread - see HdfsUtilities.isThreadSafe(), which is called by 
HdfsSplittableDataAccessor. Since JsonAccessor inherits from this class, I 
think it should be fine, but it's worth verifying.
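
A rough sketch of the kind of check isThreadSafe() does (not the exact code; 
it reuses the same Hadoop codec lookup as in the diff above):

    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(file);
    // bzip2 streams must be processed in a single thread
    boolean threadSafe = !(codec instanceof BZip2Codec);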


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/302#discussion_r51828041
  
--- Diff: 
pxf/pxf-json/src/main/java/org/apache/hawq/pxf/plugins/json/JsonResolver.java 
---
@@ -0,0 +1,255 @@
+package org.apache.hawq.pxf.plugins.json;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadResolver;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.codehaus.jackson.JsonFactory;
+import org.codehaus.jackson.JsonNode;
+import org.codehaus.jackson.map.ObjectMapper;
+
+/**
+ * This JSON resolver for PXF will decode a given object from the {@link 
JsonAccessor} into a row for HAWQ. It will
+ * decode this data into a JsonNode and walk the tree for each column. It 
supports normal value mapping via projections
+ * and JSON array indexing.
+ */
+public class JsonResolver extends Plugin implements ReadResolver {
+
+   private static final Log LOG = LogFactory.getLog(JsonResolver.class);
+
+   private ArrayList oneFieldList;
+   private ColumnDescriptorCache[] columnDescriptorCache;
+   private ObjectMapper mapper;
+
+   public JsonResolver(InputData inputData) throws Exception {
+   super(inputData);
+   oneFieldList = new ArrayList();
+   mapper = new ObjectMapper(new JsonFactory());
+
+   // Pre-generate all column structure attributes concerning the 
JSON to column value resolution.
+   columnDescriptorCache = new 
ColumnDescriptorCache[inputData.getColumns()];
+   for (int i = 0; i < inputData.getColumns(); ++i) {
+   ColumnDescriptor cd = inputData.getColumn(i);
+   columnDescriptorCache[i] = new 
ColumnDescriptorCache(cd);
+   }
+   }
+
+   @Override
+   public List getFields(OneRow row) throws Exception {
+   oneFieldList.clear();
+
+   String jsonRecordAsText = row.getData().toString();
+
+   JsonNode root = decodeLineToJsonNode(jsonRecordAsText);
+
+   if (root == null) {
+   LOG.warn("Return null-fields row due to invalid JSON:" 
+ jsonRecordAsText);
--- End diff --

another possible improvement: create a final list of null fields for the 
case where root is null, and return that list here immediately - see the 
sketch below.
That would save us iterating over the columns and checking root == null for 
each one.
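
Roughly (nullRow would be a new field; the OneField construction is approximated):

    private List<OneField> nullRow; // built once, reused for every malformed record

    private List<OneField> nullFieldsRow() {
        if (nullRow == null) {
            nullRow = new ArrayList<OneField>();
            for (ColumnDescriptorCache column : columnDescriptorCache) {
                nullRow.add(new OneField(column.getColumnType().getOID(), null));
            }
        }
        return nullRow;
    }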


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-343. Core when setting enable_se...

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/328#discussion_r51771858
  
--- Diff: src/backend/cdb/cdbfilesystemcredential.c ---
@@ -382,6 +382,17 @@ cleanup_filesystem_credentials(Portal portal)
}
else
{
+   /**
+* this case happened for the command "set 
enable_secure_filesystem to true;"
+* enable_secure_filesystem is true but credentials did 
not allocated"
--- End diff --

redundant `"` at the end of the sentence.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-369. Hcatalog as reserved name.

2016-02-03 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/324#discussion_r51762431
  
--- Diff: src/backend/utils/init/postinit.c ---
@@ -368,6 +368,22 @@ InitPostgres(const char *in_dbname, Oid dboid, const 
char *username,
chardbname[NAMEDATALEN];
 
/*
+* User is not supposed to connect to hcatalog database,
+* because it's reserved for hcatalog feature integration
+*/
+   if (!bootstrap)
+   {
+   if (strcmp(in_dbname, HcatalogDbName) == 0)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_DATABASE),
+   errmsg("\"%s\" database is only for 
system use",
--- End diff --

Fair enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-369. Hcatalog as reserved name.

2016-02-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/324#discussion_r51658725
  
--- Diff: src/backend/commands/dbcommands.c ---
@@ -1533,6 +1540,15 @@ RenameDatabase(const char *oldname, const char 
*newname)
cqContext   cqc;
cqContext  *pcqCtx;
 
+
+   /*
+* Make sure "hcatalog" is not used as new name, because it's reserved 
for
+* hcatalog feature integration
+*/
+   if (strcmp(newname, HcatalogDbName) == 0)
+   ereport(ERROR,
+   (errcode(ERRCODE_RESERVED_NAME),
+   errmsg("\"%s\" is a reserved name for hcatalog 
feature integration", HcatalogDbName)));
--- End diff --

same


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-369. Hcatalog as reserved name.

2016-02-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/324#discussion_r51658990
  
--- Diff: src/backend/utils/init/postinit.c ---
@@ -368,6 +368,22 @@ InitPostgres(const char *in_dbname, Oid dboid, const 
char *username,
chardbname[NAMEDATALEN];
 
/*
+* User is not supposed to connect to hcatalog database,
+* because it's reserved for hcatalog feature integration
+*/
+   if (!bootstrap)
+   {
+   if (strcmp(in_dbname, HcatalogDbName) == 0)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_DATABASE),
+   errmsg("\"%s\" database is only for 
system use",
--- End diff --

perhaps add something like "cannot connect to database hcatalog. Database 
is only for system use"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-369. Hcatalog as reserved name.

2016-02-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/324#discussion_r51658794
  
--- Diff: src/backend/utils/init/postinit.c ---
@@ -368,6 +368,22 @@ InitPostgres(const char *in_dbname, Oid dboid, const 
char *username,
chardbname[NAMEDATALEN];
 
/*
+* User is not supposed to connect to hcatalog database,
+* because it's reserved for hcatalog feature integration
--- End diff --

same


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-369. Hcatalog as reserved name.

2016-02-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/324#discussion_r51658617
  
--- Diff: src/backend/commands/dbcommands.c ---
@@ -848,11 +848,18 @@ createdb(CreatedbStmt *stmt)
 * Check for db name conflict.  This is just to give a more friendly 
error
 * message than "unique index violation".  There's a race condition but
 * we're willing to accept the less friendly message in that case.
+* Also check that user is not trying to use "hcatalog" as a database 
name,
+* because it's already reserved for hcatalog feature integration.
 */
if (OidIsValid(get_database_oid(dbname)))
-   ereport(ERROR,
-   (errcode(ERRCODE_DUPLICATE_DATABASE),
-errmsg("database \"%s\" already exists", 
dbname)));
+   if (strcmp(dbname, HcatalogDbName) == 0)
+   ereport(ERROR,
+   (errcode(ERRCODE_RESERVED_NAME),
+errmsg("\"%s\" is a reserved 
name for hcatalog feature integration", HcatalogDbName)));
--- End diff --

same comment about the message :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-369. Hcatalog as reserved name.

2016-02-02 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/324#issuecomment-178903189
  
Amazing job! 
The only standing comment is about the error messages - we should decide 
whether it's going to be "HCatalog integration feature" or "HCatalog feature 
integration".
+100


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-369. Hcatalog as reserved name.

2016-02-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/324#discussion_r51659023
  
--- Diff: src/backend/utils/init/postinit.c ---
@@ -368,6 +368,22 @@ InitPostgres(const char *in_dbname, Oid dboid, const 
char *username,
chardbname[NAMEDATALEN];
 
/*
+* User is not supposed to connect to hcatalog database,
+* because it's reserved for hcatalog feature integration
+*/
+   if (!bootstrap)
+   {
+   if (strcmp(in_dbname, HcatalogDbName) == 0)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_DATABASE),
--- End diff --

instead of undefined database, maybe use the reserved name error code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-369. Hcatalog as reserved name.

2016-02-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/324#discussion_r51659823
  
--- Diff: src/test/regress/input/hcatalog_lookup.source ---
@@ -138,16 +138,49 @@ alter table test_schema.p exchange partition p1 with 
table hcatalog.test_schema.
 select pg_catalog.pg_database_size('hcatalog');
 select pg_catalog.pg_database_size(6120);
 
+--positive test: should be able to create table named "hcatalog"
+CREATE TABLE hcatalog(a int);
+
+--negative test: cannot create database named "hcatalog"
+CREATE DATABASE hcatalog;
+
+--allow renaming schemas and databases
+SET gp_called_by_pgdump = true;
+
+--negative test: cannot rename existing database to "hcatalog"
+ALTER DATABASE regression RENAME TO hcatalog;
+
+--positive test: can rename existing schema to "hcatalog"
+CREATE SCHEMA test_schema3;
+ALTER SCHEMA test_schema3 RENAME to hcatalog;
+
+--positive test: can rename schema "hcatalog"
+ALTER SCHEMA hcatalog RENAME to hcatalog1;
+
+--positive test: should be able to create schema named "hcatalog"
+CREATE SCHEMA hcatalog;
+
+--negative test: cannot create a database using "hcatalog" as a template
+CREATE DATABASE hcatalog2 TEMPLATE hcatalog;
+
+--restrict renaming schemas and databases
+SET gp_called_by_pgdump = false;
+
 -- cleanup
 DROP schema test_schema cascade;
 SELECT convert_to_internal_schema('test_schema');
 DROP schema test_schema cascade;
 DROP schema test_schema2 cascade;
+DROP schema hcatalog1 cascade;
--- End diff --

missing DROP for the "hcatalog" schema


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...

2016-02-02 Thread hornn
Github user hornn commented on the pull request:

https://github.com/apache/incubator-hawq/pull/302#issuecomment-178685025
  
I agree. We can afford to wait a little longer for a better plugin.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request: HAWQ-369. Hcatalog as reserved name.

2016-02-02 Thread hornn
Github user hornn commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/324#discussion_r51658480
  
--- Diff: doc/src/sgml/ref/create_database.sgml ---
@@ -184,6 +184,10 @@ CREATE DATABASE name
connection slot remains for the database, it is possible that
both will fail.  Also, the limit is not enforced against superusers.
   
+
+  
+   User can not create database named "hcatalog", because it's reserved 
for Hcatalog feature integration.
--- End diff --

change to "HCatalog integration feature" to match comments above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

