[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-06-22 Thread sansanichfb
Github user sansanichfb closed the pull request at:

https://github.com/apache/incubator-hawq/pull/1225


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-26 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118808410
  
--- Diff: pxf/pxf-hive/src/test/java/org/apache/hawq/pxf/plugins/hive/utilities/ProfileFactoryTest.java ---
@@ -34,31 +34,31 @@
 public void get() throws Exception {
 
 // For TextInputFormat when table has no complex types, HiveText profile should be used
-String profileName = ProfileFactory.get(new TextInputFormat(), false);
+String profileName = ProfileFactory.get(new TextInputFormat(), false, null);
--- End diff --

Sure




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-26 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118807815
  
--- Diff: pxf/pxf-hive/src/test/java/org/apache/hawq/pxf/plugins/hive/utilities/ProfileFactoryTest.java ---
@@ -34,31 +34,31 @@
 public void get() throws Exception {
 
 // For TextInputFormat when table has no complex types, HiveText profile should be used
-String profileName = ProfileFactory.get(new TextInputFormat(), false);
+String profileName = ProfileFactory.get(new TextInputFormat(), false, null);
--- End diff --

Can we revert these changes now that the two-argument function is back?
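The two- versus three-argument question above is the standard overload-delegation pattern. A minimal, hypothetical sketch of it (the class name, profile names, and selection logic here are illustrative only, not the actual ProfileFactory implementation):

```java
// Hypothetical sketch: the two-argument convenience overload delegates to
// the three-argument version, so existing callers need not change.
class ProfileFactorySketch {
    static String get(Object inputFormat, boolean hasComplexTypes) {
        return get(inputFormat, hasComplexTypes, null);
    }

    static String get(Object inputFormat, boolean hasComplexTypes, String requestedProfile) {
        if (requestedProfile != null) {
            return requestedProfile; // caller explicitly chose a profile
        }
        return hasComplexTypes ? "Hive" : "HiveText";
    }
}
```

With this shape, tests that only exercise the default behavior can keep calling the two-argument form.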




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-26 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118807905
  
--- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/BridgeOutputBuilder.java ---
@@ -137,6 +137,18 @@ public Writable getErrorOutput(Exception ex) throws Exception {
 return outputList;
 }
 
+public LinkedList makeVectorizedOutput(List recordsBatch) throws BadRecordException {
+outputList.clear();
+for (List record : recordsBatch) {
--- End diff --

No null checks necessary for recordsBatch and record?
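A minimal sketch of the defensive checks being asked about, assuming a null batch should be treated as empty and null records skipped; element types are simplified to Object so the example is self-contained (it is not the actual BridgeOutputBuilder code):

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: guard against a null batch and null records before
// iterating, instead of failing later with a NullPointerException.
class VectorizedOutputSketch {
    private final LinkedList<List<Object>> outputList = new LinkedList<>();

    public LinkedList<List<Object>> makeVectorizedOutput(List<List<Object>> recordsBatch) {
        outputList.clear();
        if (recordsBatch == null) {
            return outputList; // treat a null batch as empty
        }
        for (List<Object> record : recordsBatch) {
            if (record == null) {
                continue; // skip null records rather than dereference them
            }
            outputList.add(record);
        }
        return outputList;
    }
}
```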




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-26 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118788222
  
--- Diff: pxf/pxf-service/src/main/resources/pxf-profiles-default.xml ---
@@ -101,6 +101,17 @@ under the License.
 
org.apache.hawq.pxf.service.io.GPDBWritable
 
 
+
+HiveVectorizedORC
--- End diff --

Renamed all classes to use "vectorized"




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-26 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118766404
  
--- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/ReadVectorizedBridge.java ---
@@ -0,0 +1,126 @@
+package org.apache.hawq.pxf.service;
--- End diff --

Makes sense, extended.




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-25 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118606404
  
--- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
@@ -0,0 +1,257 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
+import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
+import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
+import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
+import static org.apache.hawq.pxf.api.io.DataType.DATE;
+import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
+import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
+import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
+import static org.apache.hawq.pxf.api.io.DataType.REAL;
+import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
+import static org.apache.hawq.pxf.api.io.DataType.TEXT;
+import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
+import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Calendar;
+import java.util.List;
+import java.sql.Timestamp;
+import java.sql.Date;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadVectorizedResolver;
+import org.apache.hawq.pxf.api.UnsupportedTypeException;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.serde2.*;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
+import org.apache.hadoop.hive.ql.exec.vector.*;
+
+@SuppressWarnings("deprecation")
+public class HiveORCBatchResolver extends Plugin implements ReadVectorizedResolver {
+
+private static final Log LOG = LogFactory.getLog(HiveORCBatchResolver.class);
+
+private List resolvedBatch;
+private StructObjectInspector soi;
+
+public HiveORCBatchResolver(InputData input) throws Exception {
+super(input);
+try {
+soi = (StructObjectInspector) HiveUtilities.getOrcReader(input).getObjectInspector();
+} catch (Exception e) {
+LOG.error("Unable to create an object inspector.");
+throw e;
+}
+}
+
+@Override
+public List getFieldsForBatch(OneRow batch) {
+
+Writable writableObject = null;
+Object fieldValue = null;
+VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) batch.getData();
+
+// Allocate empty result set
+resolvedBatch = new ArrayList(vectorizedBatch.size);
+for (int i = 0; i < vectorizedBatch.size; i++) {
+ArrayList row = new ArrayList(inputData.getColumns());
+resolvedBatch.add(row);
+for (int j = 0; j < inputData.getColumns(); j++) {
+

[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-25 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118602026
  
--- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
@@ -0,0 +1,257 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
+import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
+import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
+import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
+import static org.apache.hawq.pxf.api.io.DataType.DATE;
+import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
+import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
+import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
+import static org.apache.hawq.pxf.api.io.DataType.REAL;
+import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
+import static org.apache.hawq.pxf.api.io.DataType.TEXT;
+import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
+import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Calendar;
+import java.util.List;
+import java.sql.Timestamp;
+import java.sql.Date;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadVectorizedResolver;
+import org.apache.hawq.pxf.api.UnsupportedTypeException;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.serde2.*;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
+import org.apache.hadoop.hive.ql.exec.vector.*;
+
+@SuppressWarnings("deprecation")
+public class HiveORCBatchResolver extends Plugin implements ReadVectorizedResolver {
+
+private static final Log LOG = LogFactory.getLog(HiveORCBatchResolver.class);
+
+private List resolvedBatch;
+private StructObjectInspector soi;
+
+public HiveORCBatchResolver(InputData input) throws Exception {
+super(input);
+try {
+soi = (StructObjectInspector) HiveUtilities.getOrcReader(input).getObjectInspector();
+} catch (Exception e) {
+LOG.error("Unable to create an object inspector.");
+throw e;
+}
+}
+
+@Override
+public List getFieldsForBatch(OneRow batch) {
+
+Writable writableObject = null;
+Object fieldValue = null;
+VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) batch.getData();
+
+// Allocate empty result set
+resolvedBatch = new ArrayList(vectorizedBatch.size);
+for (int i = 0; i < vectorizedBatch.size; i++) {
+ArrayList row = new ArrayList(inputData.getColumns());
--- End diff --

Thanks, updated



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-25 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118601254
  
--- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
@@ -0,0 +1,115 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.*;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.api.utilities.Utilities;
+import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcFile;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.Reader.Options;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.io.LongWritable;
+
+/**
+ * Accessor class which reads data in batches.
+ * One batch is 1024 rows of all projected columns
+ *
+ */
+public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
--- End diff --

Sure, updated




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-25 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118601231
  
--- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
@@ -0,0 +1,115 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.*;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.api.utilities.Utilities;
+import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcFile;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.Reader.Options;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.io.LongWritable;
+
+/**
+ * Accessor class which reads data in batches.
+ * One batch is 1024 rows of all projected columns
+ *
+ */
+public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
+
+protected RecordReader vrr;
+private int batchIndex;
+private VectorizedRowBatch batch;
+
+public HiveORCBatchAccessor(InputData input) throws Exception {
+super(input);
+}
+
+@Override
+public boolean openForRead() throws Exception {
+Reader reader = HiveUtilities.getOrcReader(inputData);
+Options options = new Options();
+addColumns(options);
+addFragments(options);
+vrr = reader.rowsOptions(options);
+return vrr.hasNext();
+}
+
+/**
+ * File might have multiple splits, so this method restricts
+ * reader to one split.
+ * @param options reader options to modify
+ */
+private void addFragments(Options options) {
+FileSplit fileSplit = HdfsUtilities.parseFileSplit(inputData);
+options.range(fileSplit.getStart(), fileSplit.getLength());
+}
+
+/**
+ * Reads next batch for current fragment.
+ * @return next batch in OneRow format, key is a batch number, data is a batch
+ */
+@Override
+public OneRow readNextObject() throws IOException {
+if (vrr.hasNext()) {
+batch = vrr.nextBatch(batch);
+batchIndex++;
+return new OneRow(new LongWritable(batchIndex), batch);
+} else {
+//All batches are exhausted
+return null;
+}
+}
+
+/**
+ * This method updated reader optionst to include projected columns only.
--- End diff --

Thanks, fixed
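The iterate-until-null contract that readNextObject() follows in the accessor quoted above can be sketched as follows; OneRow is replaced by a plain int pair and the ORC reader is simulated with a list, so this is an illustration of the batching loop, not the PXF API:

```java
import java.util.Iterator;
import java.util.List;

// Illustration of the iterate-until-null contract: each call returns the
// next batch keyed by a running batch index, and null signals that all
// batches are exhausted. The "reader" here is simulated by a list.
class BatchReaderSketch {
    private final Iterator<int[]> batches;
    private int batchIndex;

    BatchReaderSketch(List<int[]> data) {
        this.batches = data.iterator();
    }

    // Returns {batchNumber, batchSize}, or null when exhausted.
    int[] readNextObject() {
        if (!batches.hasNext()) {
            return null; // all batches are exhausted
        }
        int[] batch = batches.next();
        batchIndex++;
        return new int[] { batchIndex, batch.length };
    }
}
```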




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-25 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118599822
  
--- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
@@ -0,0 +1,115 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.*;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.api.utilities.Utilities;
+import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcFile;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.Reader.Options;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.io.LongWritable;
+
+/**
+ * Accessor class which reads data in batches.
+ * One batch is 1024 rows of all projected columns
+ *
+ */
+public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
+
+protected RecordReader vrr;
+private int batchIndex;
+private VectorizedRowBatch batch;
+
+public HiveORCBatchAccessor(InputData input) throws Exception {
+super(input);
+}
+
+@Override
+public boolean openForRead() throws Exception {
+Reader reader = HiveUtilities.getOrcReader(inputData);
+Options options = new Options();
+addColumns(options);
+addFragments(options);
+vrr = reader.rowsOptions(options);
+return vrr.hasNext();
+}
+
+/**
+ * File might have multiple splits, so this method restricts
+ * reader to one split.
+ * @param options reader options to modify
+ */
+private void addFragments(Options options) {
+FileSplit fileSplit = HdfsUtilities.parseFileSplit(inputData);
+options.range(fileSplit.getStart(), fileSplit.getLength());
+}
+
+/**
+ * Reads next batch for current fragment.
+ * @return next batch in OneRow format, key is a batch number, data is a batch
+ */
+@Override
+public OneRow readNextObject() throws IOException {
+if (vrr.hasNext()) {
+batch = vrr.nextBatch(batch);
+batchIndex++;
+return new OneRow(new LongWritable(batchIndex), batch);
+} else {
+//All batches are exhausted
+return null;
+}
+}
+
+/**
+ * This method updated reader optionst to include projected columns only.
+ * @param options reader options to modify
+ * @throws Exception
+ */
+private void addColumns(Options options) throws Exception {
+boolean[] includeColumns = new boolean[inputData.getColumns() + 1];
--- End diff --

That's the way which ORC batch API expects this parameter.
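The array shape referred to here follows the ORC reader's convention that index 0 stands for the root struct and column i maps to index i + 1, which is why the array is sized `columns + 1`. A simplified illustration (not the actual PXF code; the method and class names are invented):

```java
// Sketch of the boolean "include" array the ORC row reader expects:
// columnCount + 1 entries, where index 0 is the root struct itself.
class OrcIncludeSketch {
    static boolean[] buildIncludeArray(int columnCount, int[] projectedColumns) {
        boolean[] include = new boolean[columnCount + 1];
        include[0] = true; // the root struct is always read
        for (int col : projectedColumns) {
            include[col + 1] = true; // shift past the root entry
        }
        return include;
    }
}
```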




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-24 Thread shivzone
Github user shivzone commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118358798
  
--- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/ReadVectorizedBridge.java ---
@@ -0,0 +1,126 @@
+package org.apache.hawq.pxf.service;
--- End diff --

ReadVectorizedBridge looks very similar to ReadBridge except for the getNext() function. Please refactor both classes to avoid duplication.
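One possible shape for the suggested refactoring is the template-method pattern: shared plumbing lives in an abstract base and each bridge overrides only the part that differs. A minimal sketch under that assumption (all names here are illustrative, not the actual PXF classes):

```java
// Template-method sketch: the shared serving logic is final in the base
// class, and only getNext() diverges between the two bridges.
abstract class BaseReadBridgeSketch {
    protected int rowsServed;

    // Shared logic: count what was served, delegate the actual fetch.
    final String serve() {
        String next = getNext();
        if (next != null) {
            rowsServed++;
        }
        return next;
    }

    protected abstract String getNext(); // the only divergent piece
}

class RowBridgeSketch extends BaseReadBridgeSketch {
    private int remaining = 2;
    protected String getNext() {
        return remaining-- > 0 ? "row" : null;
    }
}

class VectorizedBridgeSketch extends BaseReadBridgeSketch {
    private int remaining = 1;
    protected String getNext() {
        return remaining-- > 0 ? "batch" : null;
    }
}
```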




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-24 Thread shivzone
Github user shivzone commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118332954
  
--- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
@@ -0,0 +1,257 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
+import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
+import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
+import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
+import static org.apache.hawq.pxf.api.io.DataType.DATE;
+import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
+import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
+import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
+import static org.apache.hawq.pxf.api.io.DataType.REAL;
+import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
+import static org.apache.hawq.pxf.api.io.DataType.TEXT;
+import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
+import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Calendar;
+import java.util.List;
+import java.sql.Timestamp;
+import java.sql.Date;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadVectorizedResolver;
+import org.apache.hawq.pxf.api.UnsupportedTypeException;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.serde2.*;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
+import org.apache.hadoop.hive.ql.exec.vector.*;
+
+@SuppressWarnings("deprecation")
+public class HiveORCBatchResolver extends Plugin implements ReadVectorizedResolver {
+
+private static final Log LOG = LogFactory.getLog(HiveORCBatchResolver.class);
+
+private List resolvedBatch;
+private StructObjectInspector soi;
+
+public HiveORCBatchResolver(InputData input) throws Exception {
+super(input);
+try {
+soi = (StructObjectInspector) HiveUtilities.getOrcReader(input).getObjectInspector();
+} catch (Exception e) {
+LOG.error("Unable to create an object inspector.");
+throw e;
+}
+}
+
+@Override
+public List getFieldsForBatch(OneRow batch) {
+
+Writable writableObject = null;
+Object fieldValue = null;
+VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) batch.getData();
+
+// Allocate empty result set
+resolvedBatch = new ArrayList(vectorizedBatch.size);
+for (int i = 0; i < vectorizedBatch.size; i++) {
+ArrayList row = new ArrayList(inputData.getColumns());
+resolvedBatch.add(row);
+for (int j = 0; j < inputData.getColumns(); j++) {
+row.add(null);
  

[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-24 Thread shivzone
Github user shivzone commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118339930
  
--- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
@@ -0,0 +1,257 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
+import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
+import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
+import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
+import static org.apache.hawq.pxf.api.io.DataType.DATE;
+import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
+import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
+import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
+import static org.apache.hawq.pxf.api.io.DataType.REAL;
+import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
+import static org.apache.hawq.pxf.api.io.DataType.TEXT;
+import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
+import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Calendar;
+import java.util.List;
+import java.sql.Timestamp;
+import java.sql.Date;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadVectorizedResolver;
+import org.apache.hawq.pxf.api.UnsupportedTypeException;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.serde2.*;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
+import 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
+import 
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
+import org.apache.hadoop.hive.ql.exec.vector.*;
+
+@SuppressWarnings("deprecation")
+public class HiveORCBatchResolver extends Plugin implements 
ReadVectorizedResolver {
+
+private static final Log LOG = 
LogFactory.getLog(HiveORCBatchResolver.class);
+
+private List<List<OneField>> resolvedBatch;
+private StructObjectInspector soi;
+
+public HiveORCBatchResolver(InputData input) throws Exception {
+super(input);
+try {
+soi = (StructObjectInspector) 
HiveUtilities.getOrcReader(input).getObjectInspector();
+} catch (Exception e) {
+LOG.error("Unable to create an object inspector.");
+throw e;
+}
+}
+
+@Override
+public List<List<OneField>> getFieldsForBatch(OneRow batch) {
+
+Writable writableObject = null;
+Object fieldValue = null;
+VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) 
batch.getData();
+
+// Allocate empty result set
+resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
+for (int i = 0; i < vectorizedBatch.size; i++) {
+ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
+resolvedBatch.add(row);
+for (int j = 0; j < inputData.getColumns(); j++) {
+row.add(null);
  

[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118135496
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/Utilities.java ---
@@ -234,4 +235,15 @@ public static boolean useStats(ReadAccessor accessor, 
InputData inputData) {
 return false;
 }
 }
+
+public static boolean useVectorization(InputData inputData) {
+boolean isVectorizedResolver = false;
+try {
+isVectorizedResolver = 
ArrayUtils.contains(Class.forName(inputData.getResolver()).getInterfaces(), 
ReadVectorizedResolver.class);
+} catch (ClassNotFoundException e) {
+LOG.error("Unable to load resolver class: " + e.getMessage());
+return false;
--- End diff --

Sure, thanks
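For reference, the interface check being discussed can also be written with Class.isAssignableFrom, which additionally matches interfaces inherited through a superclass, something that scanning getInterfaces() directly misses. A minimal standalone sketch follows; the class and method names are hypothetical, not the PXF Utilities API:

```java
// Hypothetical standalone sketch, not the actual PXF Utilities method.
// isAssignableFrom also matches interfaces inherited via superclasses,
// which ArrayUtils.contains(cls.getInterfaces(), ...) does not.
public class ResolverCheck {
    public static boolean implementsInterface(String className, Class<?> iface) {
        try {
            return iface.isAssignableFrom(Class.forName(className));
        } catch (ClassNotFoundException e) {
            return false; // unknown class: fall back to the non-vectorized path
        }
    }

    public static void main(String[] args) {
        // Collection is not among ArrayList's directly declared interfaces,
        // but isAssignableFrom still reports the relationship correctly.
        System.out.println(implementsInterface("java.util.ArrayList", java.util.Collection.class)); // true
        System.out.println(implementsInterface("no.such.Clazz", java.util.Collection.class));       // false
    }
}
```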



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118132215
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/ReadBridge.java ---
@@ -149,9 +149,10 @@ public static ReadAccessor getFileAccessor(InputData 
inputData)
 inputData.getAccessor(), inputData);
 }
 
-public static ReadResolver getFieldsResolver(InputData inputData)
+@SuppressWarnings("unchecked")
--- End diff --

ouch, can you make Utilities.createAnyInstance templatized instead?
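A templatized createAnyInstance might look like the sketch below; the simplified signature and class name are assumptions for illustration, not the actual PXF utility:

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch of a generified factory; the real Utilities method
// takes different parameters. Returning T instead of Object lets callers
// drop the @SuppressWarnings("unchecked") cast.
public class InstanceFactory {
    public static <T> T createAnyInstance(Class<T> expectedType, String className) throws Exception {
        Class<?> cls = Class.forName(className);
        Constructor<?> con = cls.getConstructor();
        // cast() makes the runtime type check explicit instead of unchecked
        return expectedType.cast(con.newInstance());
    }

    public static void main(String[] args) throws Exception {
        CharSequence cs = createAnyInstance(CharSequence.class, "java.lang.StringBuilder");
        System.out.println(cs.length()); // 0: a freshly constructed StringBuilder is empty
    }
}
```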




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118131278
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java
 ---
@@ -0,0 +1,257 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
+import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
+import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
+import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
+import static org.apache.hawq.pxf.api.io.DataType.DATE;
+import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
+import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
+import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
+import static org.apache.hawq.pxf.api.io.DataType.REAL;
+import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
+import static org.apache.hawq.pxf.api.io.DataType.TEXT;
+import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
+import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Calendar;
+import java.util.List;
+import java.sql.Timestamp;
+import java.sql.Date;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadVectorizedResolver;
+import org.apache.hawq.pxf.api.UnsupportedTypeException;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.serde2.*;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
+import 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
+import 
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
+import org.apache.hadoop.hive.ql.exec.vector.*;
+
+@SuppressWarnings("deprecation")
+public class HiveORCBatchResolver extends Plugin implements 
ReadVectorizedResolver {
+
+private static final Log LOG = 
LogFactory.getLog(HiveORCBatchResolver.class);
+
+private List<List<OneField>> resolvedBatch;
+private StructObjectInspector soi;
+
+public HiveORCBatchResolver(InputData input) throws Exception {
+super(input);
+try {
+soi = (StructObjectInspector) 
HiveUtilities.getOrcReader(input).getObjectInspector();
+} catch (Exception e) {
+LOG.error("Unable to create an object inspector.");
+throw e;
+}
+}
+
+@Override
+public List<List<OneField>> getFieldsForBatch(OneRow batch) {
+
+Writable writableObject = null;
+Object fieldValue = null;
+VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) 
batch.getData();
+
+// Allocate empty result set
+resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
+for (int i = 0; i < vectorizedBatch.size; i++) {
+ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
+resolvedBatch.add(row);
+for (int j = 0; j < inputData.getColumns(); j++) {
+row.add(null);
   

[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129347
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java
 ---
@@ -289,7 +289,7 @@ private void fetchMetaData(HiveTablePartition 
tablePartition, boolean hasComplex
 if (inputData.getProfile() != null) {
 // evaluate optimal profile based on file format if profile 
was explicitly specified in url
 // if user passed accessor+fragmenter+resolver - use them
-profile = ProfileFactory.get(fformat, hasComplexTypes);
+profile = ProfileFactory.get(fformat, hasComplexTypes, 
inputData.getProfile());
--- End diff --

getProfile() is called twice (in the if statement and here); it's better to call it once, store the result in a variable, and reuse it
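The suggested refactoring can be sketched with stub methods standing in for InputData and ProfileFactory; all names and the selection logic here are illustrative placeholders:

```java
public class ProfileSelection {
    // Stub for InputData.getProfile(); null would mean no profile in the URL.
    static String getProfile() { return "HiveORC"; }

    // Stub for ProfileFactory.get(fformat, hasComplexTypes, profile).
    static String chooseProfile(String fformat, boolean hasComplexTypes, String requested) {
        return requested; // placeholder logic
    }

    public static void main(String[] args) {
        // Call getProfile() once and reuse the local in both the null
        // check and the factory call, instead of calling it twice.
        String requested = getProfile();
        String profile = null;
        if (requested != null) {
            profile = chooseProfile("ORC", false, requested);
        }
        System.out.println(profile); // HiveORC
    }
}
```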




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129472
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveMetadataFetcher.java
 ---
@@ -136,7 +136,7 @@ public HiveMetadataFetcher(InputData md) {
 private OutputFormat getOutputFormat(String inputFormat, boolean 
hasComplexTypes) throws Exception {
 OutputFormat outputFormat = null;
 InputFormat fformat = 
HiveDataFragmenter.makeInputFormat(inputFormat, jobConf);
-String profile = ProfileFactory.get(fformat, hasComplexTypes);
+String profile = ProfileFactory.get(fformat, hasComplexTypes, 
null);
--- End diff --

passing explicit null params should be avoided; if possible, overload the function when more/fewer params are desired.
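One way to avoid the explicit null at call sites is an overload that forwards a default, as in the sketch below; the class name and selection logic are hypothetical placeholders, not ProfileFactory's actual rules:

```java
// Hypothetical sketch of the overloading suggestion; the profile-selection
// logic below is a placeholder, not ProfileFactory's real behavior.
public class ProfileFactorySketch {
    // Callers with no user-specified profile use the two-arg form,
    // so no null literal appears at the call site.
    public static String get(String format, boolean hasComplexTypes) {
        return get(format, hasComplexTypes, null);
    }

    public static String get(String format, boolean hasComplexTypes, String userProfile) {
        if (userProfile != null) {
            return userProfile; // honor an explicitly requested profile
        }
        return "Hive" + format; // placeholder: derive a profile from the format
    }

    public static void main(String[] args) {
        System.out.println(get("ORC", false));              // HiveORC
        System.out.println(get("Text", false, "HiveText")); // HiveText
    }
}
```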




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129835
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java
 ---
@@ -0,0 +1,115 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.*;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.api.utilities.Utilities;
+import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcFile;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.Reader.Options;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.io.LongWritable;
+
+/**
+ * Accessor class which reads data in batches.
+ * One batch is 1024 rows of all projected columns
+ *
+ */
+public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
+
+protected RecordReader vrr;
--- End diff --

why protected? is any child class using it?




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129724
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java
 ---
@@ -0,0 +1,115 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.*;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.api.utilities.Utilities;
+import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcFile;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.Reader.Options;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.io.LongWritable;
+
+/**
+ * Accessor class which reads data in batches.
+ * One batch is 1024 rows of all projected columns
+ *
+ */
+public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
--- End diff --

would it be useful if it extended HiveORCAccessor and overrode its functions?




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118132761
  
--- Diff: pxf/pxf-service/src/main/resources/pxf-profiles-default.xml ---
@@ -101,6 +101,17 @@ under the License.
 
             <outputFormat>org.apache.hawq.pxf.service.io.GPDBWritable</outputFormat>
         </plugins>
     </profile>
+    <profile>
+        <name>HiveVectorizedORC</name>
--- End diff --

seems like "batch" and "vectorized" are used interchangeably; should we use just one term?




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118131080
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java
 ---
@@ -0,0 +1,257 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
+import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
+import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
+import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
+import static org.apache.hawq.pxf.api.io.DataType.DATE;
+import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
+import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
+import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
+import static org.apache.hawq.pxf.api.io.DataType.REAL;
+import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
+import static org.apache.hawq.pxf.api.io.DataType.TEXT;
+import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
+import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Calendar;
+import java.util.List;
+import java.sql.Timestamp;
+import java.sql.Date;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadVectorizedResolver;
+import org.apache.hawq.pxf.api.UnsupportedTypeException;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.serde2.*;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
+import 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
+import 
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
+import org.apache.hadoop.hive.ql.exec.vector.*;
+
+@SuppressWarnings("deprecation")
+public class HiveORCBatchResolver extends Plugin implements 
ReadVectorizedResolver {
+
+private static final Log LOG = 
LogFactory.getLog(HiveORCBatchResolver.class);
+
+private List<List<OneField>> resolvedBatch;
+private StructObjectInspector soi;
+
+public HiveORCBatchResolver(InputData input) throws Exception {
+super(input);
+try {
+soi = (StructObjectInspector) 
HiveUtilities.getOrcReader(input).getObjectInspector();
+} catch (Exception e) {
+LOG.error("Unable to create an object inspector.");
+throw e;
+}
+}
+
+@Override
+public List<List<OneField>> getFieldsForBatch(OneRow batch) {
+
+Writable writableObject = null;
+Object fieldValue = null;
+VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) 
batch.getData();
+
+// Allocate empty result set
+resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
+for (int i = 0; i < vectorizedBatch.size; i++) {
+ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
+resolvedBatch.add(row);
+for (int j = 0; j < inputData.getColumns(); j++) {
+row.add(null);
   

[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118132449
  
--- Diff: 
pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/ReadVectorizedBridge.java
 ---
@@ -0,0 +1,126 @@
+package org.apache.hawq.pxf.service;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.hawq.pxf.api.BadRecordException;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.ReadVectorizedResolver;
+import org.apache.hawq.pxf.service.io.Writable;
+import org.apache.hawq.pxf.service.utilities.ProtocolData;
+
+public class ReadVectorizedBridge implements Bridge {
+
+ReadAccessor fileAccessor = null;
+ReadVectorizedResolver fieldsResolver;
+BridgeOutputBuilder outputBuilder = null;
+LinkedList<Writable> outputQueue = null;
+
+public ReadVectorizedBridge(ProtocolData protData) throws Exception {
+outputBuilder = new BridgeOutputBuilder(protData);
+outputQueue = new LinkedList<Writable>();
+fileAccessor = ReadBridge.getFileAccessor(protData);
+fieldsResolver = ReadBridge.getFieldsResolver(protData);
+}
+
+@Override
+public Writable getNext() throws Exception {
+Writable output = null;
+OneRow batch = null;
+
+if (!outputQueue.isEmpty()) {
+return outputQueue.pop();
+}
+
+try {
+while (outputQueue.isEmpty()) {
+batch = fileAccessor.readNextObject();
+if (batch == null) {
+output = outputBuilder.getPartialLine();
+if (output != null) {
+//LOG.warn("A partial record in the end of the 
fragment");
--- End diff --

remove commented-out lines?




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129564
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java
 ---
@@ -0,0 +1,115 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.*;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.api.utilities.Utilities;
+import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcFile;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.Reader.Options;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.io.LongWritable;
+
+/**
+ * Accessor class which reads data in batches.
+ * One batch is 1024 rows of all projected columns
+ *
+ */
+public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
+
+protected RecordReader vrr;
+private int batchIndex;
+private VectorizedRowBatch batch;
+
+public HiveORCBatchAccessor(InputData input) throws Exception {
+super(input);
+}
+
+@Override
+public boolean openForRead() throws Exception {
+Reader reader = HiveUtilities.getOrcReader(inputData);
+Options options = new Options();
+addColumns(options);
+addFragments(options);
+vrr = reader.rowsOptions(options);
+return vrr.hasNext();
+}
+
+/**
+ * File might have multiple splits, so this method restricts
+ * reader to one split.
+ * @param options reader options to modify
+ */
+private void addFragments(Options options) {
+FileSplit fileSplit = HdfsUtilities.parseFileSplit(inputData);
+options.range(fileSplit.getStart(), fileSplit.getLength());
+}
+
+/**
+ * Reads next batch for current fragment.
+ * @return next batch in OneRow format, key is a batch number, data is 
a batch
+ */
+@Override
+public OneRow readNextObject() throws IOException {
+if (vrr.hasNext()) {
+batch = vrr.nextBatch(batch);
+batchIndex++;
+return new OneRow(new LongWritable(batchIndex), batch);
+} else {
+//All batches are exhausted
+return null;
+}
+}
+
+/**
+ * This method updated reader optionst to include projected columns 
only.
--- End diff --

typo "optionst"




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118131006
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java
 ---
@@ -0,0 +1,257 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
+import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
+import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
+import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
+import static org.apache.hawq.pxf.api.io.DataType.DATE;
+import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
+import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
+import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
+import static org.apache.hawq.pxf.api.io.DataType.REAL;
+import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
+import static org.apache.hawq.pxf.api.io.DataType.TEXT;
+import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
+import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Calendar;
+import java.util.List;
+import java.sql.Timestamp;
+import java.sql.Date;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hawq.pxf.api.OneField;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadVectorizedResolver;
+import org.apache.hawq.pxf.api.UnsupportedTypeException;
+import org.apache.hawq.pxf.api.io.DataType;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
+import org.apache.hadoop.hive.serde2.*;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
+import org.apache.hadoop.hive.ql.exec.vector.*;
+
+@SuppressWarnings("deprecation")
+public class HiveORCBatchResolver extends Plugin implements ReadVectorizedResolver {
+
+    private static final Log LOG = LogFactory.getLog(HiveORCBatchResolver.class);
+
+    private List<List<OneField>> resolvedBatch;
+    private StructObjectInspector soi;
+
+    public HiveORCBatchResolver(InputData input) throws Exception {
+        super(input);
+        try {
+            soi = (StructObjectInspector) HiveUtilities.getOrcReader(input).getObjectInspector();
+        } catch (Exception e) {
+            LOG.error("Unable to create an object inspector.");
+            throw e;
+        }
+    }
+
+    @Override
+    public List<List<OneField>> getFieldsForBatch(OneRow batch) {
+
+        Writable writableObject = null;
+        Object fieldValue = null;
+        VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) batch.getData();
+
+        // Allocate empty result set
+        resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
+        for (int i = 0; i < vectorizedBatch.size; i++) {
+            ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
+            resolvedBatch.add(row);
+            for (int j = 0; j < inputData.getColumns(); j++) {
+                row.add(null);

[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-05-23 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1225#discussion_r118130590
  
--- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
+        VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) batch.getData();
+
+        // Allocate empty result set
+        resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
+        for (int i = 0; i < vectorizedBatch.size; i++) {
+            ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
--- End diff --

Call inputData.getColumns() once, outside the for loop, if the value returned is always the same.
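A minimal sketch of the suggested hoisting, assuming inputData.getColumns() returns the same value on every call. FakeInputData and buildBatch are illustrative stand-ins for this comment, not the PXF API:

```java
import java.util.ArrayList;
import java.util.List;

public class HoistExample {

    // Stand-in for InputData#getColumns(); assumed invariant across the loop.
    static class FakeInputData {
        int getColumns() { return 3; }
    }

    static List<List<Object>> buildBatch(FakeInputData inputData, int batchSize) {
        List<List<Object>> resolvedBatch = new ArrayList<>(batchSize);
        int columns = inputData.getColumns(); // hoisted: evaluated once, not once per row/column
        for (int i = 0; i < batchSize; i++) {
            List<Object> row = new ArrayList<>(columns);
            for (int j = 0; j < columns; j++) {
                row.add(null); // pre-fill so later set(j, value) calls are valid
            }
            resolvedBatch.add(row);
        }
        return resolvedBatch;
    }

    public static void main(String[] args) {
        List<List<Object>> batch = buildBatch(new FakeInputData(), 2);
        System.out.println(batch.size());        // prints 2
        System.out.println(batch.get(0).size()); // prints 3
    }
}
```

Besides avoiding a repeated method call per row and per column, the local variable makes the invariant explicit to the reader.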



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

2017-04-28 Thread sansanichfb
GitHub user sansanichfb opened a pull request:

https://github.com/apache/incubator-hawq/pull/1225

HAWQ-1446: Introduce vectorized profile for ORC.

Work is still in progress; I want to get early feedback.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sansanichfb/incubator-hawq HAWQ-1446

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/1225.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1225


commit 9fb7929120910163e30043b4fd2ebd000f869b4c
Author: Oleksandr Diachenko 
Date:   2017-04-18T21:38:45Z

[#143733171] Added vectorized accessor and new profile.

commit b65e0e25f6a0520af9fc84ffe71d340c3c896948
Author: Oleksandr Diachenko 
Date:   2017-04-21T08:27:05Z

[#143192433] Added batch resolver.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---