[jira] [Created] (HIVE-21163) ParseUtils.parseQueryAndGetSchema fails on views with global limit

2019-01-24 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-21163:
---

 Summary: ParseUtils.parseQueryAndGetSchema fails on views with global limit
 Key: HIVE-21163
 URL: https://issues.apache.org/jira/browse/HIVE-21163
 Project: Hive
  Issue Type: Bug
Reporter: Eric Wohlstadter


{code:java}
hive> USE tpcds_bin_partitioned_orc_1000;
hive> CREATE VIEW profit_view AS SELECT ss_net_profit, d_date FROM store_sales, date_dim WHERE d_date = ss_sold_date LIMIT 100;
hive> SELECT get_splits("SELECT * from profit_view", 0);

Error: java.io.IOException: org.apache.hadoop.hive.ql.parse.SemanticException: View profit_view is corresponding to HiveSortLimit#3447, rather than a HiveProject. (state=,code=0)
{code}

This works fine if the view doesn't have a global limit. 
It also works fine if you define a view without a global limit and then apply 
a limit on top of the view.

{{CalcitePlanner.genLogicalPlan}} expects a {{HiveProject}} at the root of the 
view's plan, but when the query goes through 
{{ParseUtils.parseQueryAndGetSchema}}, a {{HiveSortLimit}} appears at the root 
instead. Perhaps a step is simply missing that wraps the limit with a 
projection?

{code}
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: View profit_view is corresponding to HiveSortLimit#2275, rather than a HiveProject.
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4931)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1741)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1689)
  at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
  at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1043)
  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
  at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1448)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genLogicalPlan(CalcitePlanner.java:395)
  at org.apache.hadoop.hive.ql.parse.ParseUtils.parseQueryAndGetSchema(ParseUtils.java:561)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:254)
{code}
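
A minimal sketch of the wrapping step suggested above, using a toy plan model 
rather than Calcite. {{PlanNode}}, {{Scan}}, {{SortLimit}}, {{Project}} and 
{{ViewPlans}} are hypothetical stand-ins for {{RelNode}}, {{HiveSortLimit}} and 
{{HiveProject}}; this is not the actual fix, only the shape of the root check 
and an identity projection placed over a limit root.

{code:java}
import java.util.List;

// Hypothetical toy plan nodes; these are not Calcite's RelNode classes.
interface PlanNode {
    List<String> fields();
}

// Leaf node, e.g. the join of store_sales and date_dim.
record Scan(List<String> fields) implements PlanNode {}

// Stand-in for HiveSortLimit: passes its input's fields through unchanged.
record SortLimit(PlanNode input, int fetch) implements PlanNode {
    public List<String> fields() {
        return input.fields();
    }
}

// Stand-in for HiveProject.
record Project(PlanNode input, List<String> fields) implements PlanNode {}

final class ViewPlans {
    private ViewPlans() {}

    // Mirrors the failing check: a view's plan must have a Project at the root.
    static Project requireProjectRoot(String view, PlanNode root) {
        if (root instanceof Project p) {
            return p;
        }
        throw new IllegalStateException("View " + view + " is corresponding to "
            + root.getClass().getSimpleName() + ", rather than a Project.");
    }

    // The suggested step: put an identity projection over a non-Project root,
    // so a global LIMIT ends up below the Project the caller expects.
    static PlanNode ensureProjectRoot(PlanNode root) {
        return (root instanceof Project) ? root : new Project(root, root.fields());
    }
}
{code}

An identity projection keeps the same fields in the same order, so it changes 
the shape of the plan but not the rows it returns.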



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20981) streaming/AbstractRecordWriter leaks HeapMemoryMonitor

2018-11-28 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-20981:
---

 Summary: streaming/AbstractRecordWriter leaks HeapMemoryMonitor
 Key: HIVE-20981
 URL: https://issues.apache.org/jira/browse/HIVE-20981
 Project: Hive
  Issue Type: Bug
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


Each record writer registers a memory monitor with the MemoryMXBean, but the 
monitors are never removed. As a result, the listener objects/lambdas 
accumulate in the bean over time.
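
A minimal sketch of the expected lifecycle, assuming each writer owns exactly 
one listener on the platform {{MemoryMXBean}} (which the JDK documents as also 
implementing {{NotificationEmitter}}). {{MonitoredRecordWriter}} is 
hypothetical, not Hive's {{AbstractRecordWriter}} or {{HeapMemoryMonitor}}; the 
point is that {{close()}} removes the same listener the constructor added.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import javax.management.ListenerNotFoundException;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

// Hypothetical writer that registers a heap-usage listener for its own lifetime only.
class MonitoredRecordWriter implements AutoCloseable {
    // The platform MemoryMXBean also implements NotificationEmitter.
    private final NotificationEmitter emitter =
        (NotificationEmitter) ManagementFactory.getMemoryMXBean();
    private final NotificationListener listener;
    private volatile boolean lowMemory;
    private boolean closed;

    MonitoredRecordWriter() {
        // The lambda touches a field, so it captures this writer.
        listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notification.getType())) {
                lowMemory = true;  // e.g. a signal to flush buffered records early
            }
        };
        emitter.addNotificationListener(listener, null, null);
    }

    boolean isLowMemory() {
        return lowMemory;
    }

    NotificationListener listener() {
        return listener;
    }

    @Override
    public void close() {
        if (closed) {
            return;  // idempotent: remove the listener at most once
        }
        closed = true;
        try {
            // Without this call the process-wide bean keeps every writer's listener.
            emitter.removeNotificationListener(listener);
        } catch (ListenerNotFoundException e) {
            // Already removed; nothing left to leak.
        }
    }
}
{code}

In this sketch the lambda captures the writer, so a listener left on the 
process-wide bean would also keep the writer itself reachable.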





[jira] [Created] (HIVE-20322) FlakyTest: TestMiniDruidCliDriver

2018-08-06 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-20322:
---

 Summary: FlakyTest: TestMiniDruidCliDriver
 Key: HIVE-20322
 URL: https://issues.apache.org/jira/browse/HIVE-20322
 Project: Hive
  Issue Type: Bug
Reporter: Eric Wohlstadter


TestMiniDruidCliDriver fails intermittently, and I'm seeing it fail a 
significant percentage of the time.





[jira] [Created] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-03 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-20312:
---

 Summary: Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService
 Key: HIVE-20312
 URL: https://issues.apache.org/jira/browse/HIVE-20312
 Project: Hive
  Issue Type: Improvement
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


Clients should be able to provide their own BufferAllocator to 
LlapBaseInputFormat if allocator operations depend on client-side logic. For 
example, clients may want to manage the allocator hierarchy per client-side 
task, thread, etc.

Currently the client is forced to use one global RootAllocator per process.





[jira] [Created] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-02 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-20300:
---

 Summary: VectorFileSinkArrowOperator
 Key: HIVE-20300
 URL: https://issues.apache.org/jira/browse/HIVE-20300
 Project: Hive
  Issue Type: Improvement
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
LlapOutputFormatService.





[jira] [Created] (HIVE-20290) Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits

2018-08-01 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-20290:
---

 Summary: Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits
 Key: HIVE-20290
 URL: https://issues.apache.org/jira/browse/HIVE-20290
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


When using {{GenericUDTFGetSplits}} to create {{LlapInputSplit}} for submission 
to {{LlapOutputFormatService}}, the physical plan generation initializes 
whatever SerDe is being used.

At this point, inside HS2, {{ArrowColumnarBatchSerDe}} allocates buffers for 
Arrow and {{VectorizedRowBatch}} that are never used.
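
A minimal sketch of lazy initialization, assuming the SerDe can defer its 
buffers until the first row is serialized. {{LazyBatchSerDe}} and its methods 
are hypothetical and do not follow Hive's {{AbstractSerDe}} signatures.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical SerDe. initialize() may run during plan generation inside HS2, so it
// only records configuration; the row buffer is allocated on the first serialize()
// call, which this sketch assumes happens only on the task side.
class LazyBatchSerDe {
    private final int batchSize;
    private int columnCount = -1;
    private List<Object[]> batch;   // stands in for Arrow / VectorizedRowBatch buffers
    private int allocations;        // counts buffer allocations, for illustration

    LazyBatchSerDe(int batchSize) {
        this.batchSize = batchSize;
    }

    void initialize(List<String> columnNames) {
        columnCount = columnNames.size();   // cheap metadata only; no buffers yet
    }

    void serialize(Object[] row) {
        if (columnCount < 0) {
            throw new IllegalStateException("initialize() was not called");
        }
        if (row.length != columnCount) {
            throw new IllegalArgumentException("expected " + columnCount + " columns");
        }
        if (batch == null) {
            batch = new ArrayList<>(batchSize);   // first real use: allocate now
            allocations++;
        }
        batch.add(row);
    }

    int allocations() {
        return allocations;
    }
}
{code}

With this split, compiling a plan fragment for {{get_splits}} only pays for 
{{initialize()}}; the buffer cost moves to whoever actually writes rows.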





[jira] [Created] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-18 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-20203:
---

 Summary: Arrow SerDe leaks a DirectByteBuffer
 Key: HIVE-20203
 URL: https://issues.apache.org/jira/browse/HIVE-20203
 Project: Hive
  Issue Type: Bug
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


ArrowColumnarBatchSerDe allocates an Arrow NullableMapVector for each task that 
uses the serde.

The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.

This buffer is never closed and leaks about 1K of physical memory for each task.

This patch does three things:
 # Ensures the buffer is closed when the RecordWriter for the task is closed.
 # Adds per-task memory accounting by assigning a ChildAllocator to each task from the RootAllocator.
 # Enforces that the ChildAllocator for a task has released all memory assigned to it when the task is completed.

The patch assumes that close() is always called on the RecordWriter when a task 
is finished (even if there is a failure during task execution).
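
The three steps above can be illustrated with a toy allocator hierarchy. 
{{ToyAllocator}} and {{TaskWriter}} are hypothetical and only model the 
root/child accounting idea; they do not use Arrow's {{BufferAllocator}} API, 
and the 1024-byte allocation is an illustrative stand-in for the roughly 1K 
vector.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Toy model of hierarchical allocator accounting (not Arrow's API): a child charges
// every allocation to itself and to its parent, and refuses to close while it still
// holds memory.
class ToyAllocator implements AutoCloseable {
    private final String name;
    private final ToyAllocator parent;
    private final AtomicLong allocated = new AtomicLong();

    ToyAllocator(String name) {
        this(name, null);
    }

    private ToyAllocator(String name, ToyAllocator parent) {
        this.name = name;
        this.parent = parent;
    }

    ToyAllocator newChild(String childName) {
        return new ToyAllocator(childName, this);
    }

    void allocate(long bytes) {
        allocated.addAndGet(bytes);
        if (parent != null) parent.allocate(bytes);
    }

    void release(long bytes) {
        allocated.addAndGet(-bytes);
        if (parent != null) parent.release(bytes);
    }

    long allocatedBytes() {
        return allocated.get();
    }

    @Override
    public void close() {
        long leaked = allocated.get();
        if (leaked != 0) {
            throw new IllegalStateException(name + " closed with " + leaked + " bytes still allocated");
        }
    }
}

// Hypothetical per-task writer following the three steps of the patch description.
class TaskWriter implements AutoCloseable {
    private static final long VECTOR_BYTES = 1024;   // illustrative size only
    private final ToyAllocator taskAllocator;

    TaskWriter(ToyAllocator root, String taskId) {
        taskAllocator = root.newChild(taskId);   // step 2: per-task child allocator
        taskAllocator.allocate(VECTOR_BYTES);    // the per-task vector
    }

    @Override
    public void close() {
        taskAllocator.release(VECTOR_BYTES);     // step 1: close the buffer
        taskAllocator.close();                   // step 3: fail if the task leaked memory
    }
}
{code}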





[jira] [Created] (HIVE-20093) LlapOutputFormatService: Use ArrowBuf with Netty for Accounting

2018-07-05 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-20093:
---

 Summary: LlapOutputFormatService: Use ArrowBuf with Netty for Accounting
 Key: HIVE-20093
 URL: https://issues.apache.org/jira/browse/HIVE-20093
 Project: Hive
  Issue Type: Bug
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can corrupt buffers 
through a buffer-reuse race condition.

This change ensures that Arrow memory is accounted for by the same 
BufferAllocator.

The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
accounting after Netty calls {{release(1)}} on the buffer.
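
A toy model of the accounting idea, assuming the corruption comes from memory 
being handed back for reuse while another holder still references it. 
{{CountingAllocator}} and {{RefCountedBuf}} are hypothetical; they borrow only 
the shape of Netty's {{release(int)}} and are not Arrow's {{ArrowBuf}} or the 
actual patch.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Toy allocator (not Arrow's API) that tracks how many bytes are still in use.
class CountingAllocator {
    private final AtomicLong allocated = new AtomicLong();

    RefCountedBuf buffer(int size) {
        allocated.addAndGet(size);
        return new RefCountedBuf(this, size);
    }

    void free(int size) {
        allocated.addAndGet(-size);
    }

    long allocatedBytes() {
        return allocated.get();
    }
}

// Toy buffer whose memory stays charged to the allocator that created it until the
// last reference is released, so it cannot be reused while still being written.
class RefCountedBuf {
    private final CountingAllocator owner;
    private final int size;
    private final AtomicInteger refCnt = new AtomicInteger(1);

    RefCountedBuf(CountingAllocator owner, int size) {
        this.owner = owner;
        this.size = size;
    }

    RefCountedBuf retain() {
        if (refCnt.getAndIncrement() <= 0) {
            refCnt.decrementAndGet();
            throw new IllegalStateException("buffer already freed");
        }
        return this;
    }

    // Same shape as Netty's release(int decrement): returns true once freed.
    boolean release(int decrement) {
        int remaining = refCnt.addAndGet(-decrement);
        if (remaining < 0) {
            throw new IllegalStateException("released too many times");
        }
        if (remaining == 0) {
            owner.free(size);   // only now is the memory returned to the allocator
            return true;
        }
        return false;
    }
}
{code}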





[jira] [Created] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19808:
---

 Summary: GenericUDTFGetSplits should support ACID reads in the temp. table read path
 Key: HIVE-19808
 URL: https://issues.apache.org/jira/browse/HIVE-19808
 Project: Hive
  Issue Type: Bug
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


1. Map-only reads work on ACID tables.
2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
3. But temp. table reads don't work on ACID tables.

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create temp table: java.lang.IllegalStateException: calling recordValidTxn() more than once in the same txnid:420
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
  ... 16 more
{code}





[jira] [Created] (HIVE-19703) GenericUDTFGetSplits never uses num splits argument

2018-05-24 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19703:
---

 Summary: GenericUDTFGetSplits never uses num splits argument
 Key: HIVE-19703
 URL: https://issues.apache.org/jira/browse/HIVE-19703
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Eric Wohlstadter


The description for GenericUDTFGetSplits says
{code}
Returns an array of length int serialized splits for the referenced tables string.
{code}

but the argument that controls the number of splits is never used.





[jira] [Created] (HIVE-19682) Provide option for GenericUDTFGetSplits to return only schema metadata

2018-05-23 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19682:
---

 Summary: Provide option for GenericUDTFGetSplits to return only schema metadata
 Key: HIVE-19682
 URL: https://issues.apache.org/jira/browse/HIVE-19682
 Project: Hive
  Issue Type: Improvement
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


For some use cases it is necessary to know the output schema of a HiveQL query 
before executing it, but no existing client API provides this information.

Hive JDBC doesn't provide the schema for parametric types in 
{{ResultSetMetaData}}.

GenericUDTFGetSplits bundles the proper schema metadata with the fragments for 
input splits. An option can be added to return only the schema metadata from 
compilation, and the generation of input splits can be skipped.







[jira] [Created] (HIVE-19627) Add support for LlapArrowBatchRecordReader to be used through a Hadoop InputFormat

2018-05-21 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19627:
---

 Summary: Add support for LlapArrowBatchRecordReader to be used through a Hadoop InputFormat
 Key: HIVE-19627
 URL: https://issues.apache.org/jira/browse/HIVE-19627
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


LlapArrowBatchRecordReader would need to support configuration through JobConf, 
rather than, or in addition to, the external client's native configuration.





[jira] [Created] (HIVE-19495) Arrow SerDe itest failure

2018-05-10 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19495:
---

 Summary: Arrow SerDe itest failure
 Key: HIVE-19495
 URL: https://issues.apache.org/jira/browse/HIVE-19495
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Eric Wohlstadter
Assignee: Teddy Choi
 Fix For: 3.1.0


"You tried to write a Bit type when you are using a ValueWriter of type 
NullableMapWriter."





[jira] [Created] (HIVE-19445) Graceful handling of "close" in WritableByteChannelAdapter

2018-05-07 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19445:
---

 Summary: Graceful handling of "close" in WritableByteChannelAdapter
 Key: HIVE-19445
 URL: https://issues.apache.org/jira/browse/HIVE-19445
 Project: Hive
  Issue Type: Bug
Reporter: Eric Wohlstadter


org.apache.hadoop.hive.llap.WritableByteChannelAdapter
{quote}"I see now that the writeListener could be implemented in such a way as 
to propagate a write error back to the writer (so we can possibly throw an 
exception and fail the current operation rather than just log and ignore the 
error). Plus on close I'm wondering if it is better just to wait for the close 
future to complete so we can check the status."
{quote}
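
Both suggestions in the quote (propagate a write failure back to the writer, 
and wait for the close future) can be sketched with {{CompletableFuture}} 
standing in for Netty's {{ChannelFuture}}. {{AsyncWriteAdapter}} and its 
constructor arguments are hypothetical; this is not the real 
{{WritableByteChannelAdapter}}.

{code:java}
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical adapter over an async transport whose write and close operations
// return futures. A failed write is remembered and rethrown to the writer on its next
// call; close() blocks on the close future and reports its status.
class AsyncWriteAdapter implements AutoCloseable {
    private final Function<byte[], CompletableFuture<Void>> transportWrite;
    private final Supplier<CompletableFuture<Void>> transportClose;
    private final AtomicReference<Throwable> writeError = new AtomicReference<>();

    AsyncWriteAdapter(Function<byte[], CompletableFuture<Void>> transportWrite,
                      Supplier<CompletableFuture<Void>> transportClose) {
        this.transportWrite = transportWrite;
        this.transportClose = transportClose;
    }

    void write(byte[] data) throws IOException {
        throwIfFailed();
        // Instead of only logging, record the first failure for the writer to see.
        transportWrite.apply(data).whenComplete((ok, err) -> {
            if (err != null) {
                writeError.compareAndSet(null, err);
            }
        });
    }

    @Override
    public void close() throws IOException {
        try {
            transportClose.get().get();   // wait for the close instead of fire-and-forget
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while closing", e);
        } catch (ExecutionException e) {
            throw new IOException("close failed", e.getCause());
        }
        throwIfFailed();
    }

    private void throwIfFailed() throws IOException {
        Throwable t = writeError.get();
        if (t != null) {
            throw new IOException("an earlier write failed", t);
        }
    }
}
{code}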





[jira] [Created] (HIVE-19359) itest for Arrow LLAP OutputFormat

2018-04-29 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19359:
---

 Summary: itest for Arrow LLAP OutputFormat
 Key: HIVE-19359
 URL: https://issues.apache.org/jira/browse/HIVE-19359
 Project: Hive
  Issue Type: Task
  Components: Tests
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


Modified version of TestJdbcWithMiniLlap

Exercises HIVE-19306, HIVE-19307, and HIVE-19308.





[jira] [Created] (HIVE-19309) Add Arrow dependencies to LlapServiceDriver

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19309:
---

 Summary: Add Arrow dependencies to LlapServiceDriver
 Key: HIVE-19309
 URL: https://issues.apache.org/jira/browse/HIVE-19309
 Project: Hive
  Issue Type: Task
  Components: llap
Reporter: Eric Wohlstadter


The Arrow jars need to be made available to the LLAP daemons.





[jira] [Created] (HIVE-19308) Provide an Arrow stream reader for external LLAP clients

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19308:
---

 Summary: Provide an Arrow stream reader for external LLAP clients 
 Key: HIVE-19308
 URL: https://issues.apache.org/jira/browse/HIVE-19308
 Project: Hive
  Issue Type: Task
  Components: llap
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


This is a subclass of LlapBaseRecordReader that wraps the socket InputStream 
and produces Arrow batches for an external client.





[jira] [Created] (HIVE-19307) Support ArrowOutputStream in LlapOutputFormatService

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19307:
---

 Summary: Support ArrowOutputStream in LlapOutputFormatService
 Key: HIVE-19307
 URL: https://issues.apache.org/jira/browse/HIVE-19307
 Project: Hive
  Issue Type: Task
  Components: llap
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


Support pushing Arrow batches through 
org.apache.arrow.vector.ipc.ArrowOutputStream in LlapOutputFormatService.





[jira] [Created] (HIVE-19306) Arrow batch serializer

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19306:
---

 Summary: Arrow batch serializer
 Key: HIVE-19306
 URL: https://issues.apache.org/jira/browse/HIVE-19306
 Project: Hive
  Issue Type: Task
  Components: Serializers/Deserializers
Reporter: Eric Wohlstadter
Assignee: Teddy Choi


Leverage the ThriftJDBCBinarySerDe code path that already exists in 
SemanticAnalyzer/FileSinkOperator to create a serializer that batches rows into 
Arrow vector batches.
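
The core transformation, turning a stream of rows into column-oriented batches, 
can be sketched in plain Java. {{ColumnBatcher}} is hypothetical and uses 
{{Object[]}} columns instead of Arrow vectors; it ignores types, nulls and 
memory management, and is not the serializer from the patch.

{code:java}
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical row-to-column batcher: rows arrive one at a time (as on the row-mode
// FileSinkOperator path) and are transposed into per-column arrays, which are emitted
// as one batch every batchSize rows.
final class ColumnBatcher {
    private final int columnCount;
    private final int batchSize;
    private final Consumer<Object[][]> sink;   // receives columns[col][row]
    private final Object[][] columns;
    private int rowsInBatch;

    ColumnBatcher(int columnCount, int batchSize, Consumer<Object[][]> sink) {
        this.columnCount = columnCount;
        this.batchSize = batchSize;
        this.sink = sink;
        this.columns = new Object[columnCount][batchSize];
    }

    void addRow(Object[] row) {
        if (row.length != columnCount) {
            throw new IllegalArgumentException("expected " + columnCount + " columns");
        }
        for (int c = 0; c < columnCount; c++) {
            columns[c][rowsInBatch] = row[c];
        }
        if (++rowsInBatch == batchSize) {
            flush();
        }
    }

    // Emits a (possibly partial) batch trimmed to the rows actually filled. The data
    // is copied out, so the working arrays can be reused for the next batch.
    void flush() {
        if (rowsInBatch == 0) {
            return;
        }
        Object[][] out = new Object[columnCount][];
        for (int c = 0; c < columnCount; c++) {
            out[c] = Arrays.copyOf(columns[c], rowsInBatch);
        }
        sink.accept(out);
        rowsInBatch = 0;
    }
}
{code}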





[jira] [Created] (HIVE-19305) Arrow format for LlapOutputFormatService (umbrella)

2018-04-25 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19305:
---

 Summary: Arrow format for LlapOutputFormatService (umbrella)
 Key: HIVE-19305
 URL: https://issues.apache.org/jira/browse/HIVE-19305
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter


Allows external clients to consume output from LLAP daemons in Arrow stream 
format.



