[jira] [Created] (HIVE-21163) ParseUtils.parseQueryAndGetSchema fails on views with global limit
Eric Wohlstadter created HIVE-21163:
---

Summary: ParseUtils.parseQueryAndGetSchema fails on views with global limit
Key: HIVE-21163
URL: https://issues.apache.org/jira/browse/HIVE-21163
Project: Hive
Issue Type: Bug
Reporter: Eric Wohlstadter

{code:java}
hive> USE tpcds_bin_partitioned_orc_1000;
hive> CREATE VIEW profit_view AS SELECT ss_net_profit, d_date FROM store_sales, date_dim WHERE d_date = ss_sold_date LIMIT 100;
hive> SELECT get_splits("SELECT * from profit_view", 0);
Error: java.io.IOException: org.apache.hadoop.hive.ql.parse.SemanticException: View profit_view is corresponding to HiveSortLimit#3447, rather than a HiveProject. (state=,code=0)
{code}

This works fine if the view doesn't have a global limit. It also works fine if you define a view without a global limit and then apply a limit on top of the view.

{{CalcitePlanner.genLogicalPlan}} expects a {{HiveProject}} root, but when going through {{ParseUtils.parseQueryAndGetSchema}} the {{HiveSortLimit}} appears at the root. Perhaps it is simply missing a step to wrap the limit with a projection?

{code}
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: View profit_view is corresponding to HiveSortLimit#2275, rather than a HiveProject.
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4931)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1741)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1689)
    at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
    at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1043)
    at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
    at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1448)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genLogicalPlan(CalcitePlanner.java:395)
    at org.apache.hadoop.hive.ql.parse.ParseUtils.parseQueryAndGetSchema(ParseUtils.java:561)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:254)
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20981) streaming/AbstractRecordWriter leaks HeapMemoryMonitor
Eric Wohlstadter created HIVE-20981:
---

Summary: streaming/AbstractRecordWriter leaks HeapMemoryMonitor
Key: HIVE-20981
URL: https://issues.apache.org/jira/browse/HIVE-20981
Project: Hive
Issue Type: Bug
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

Each record writer registers a memory monitor with the MemoryMXBean but never removes it, so the listener objects/lambdas accumulate in the bean over time.
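The register-without-remove pattern described in HIVE-20981 can be illustrated with a minimal sketch. This is not the Hive patch: the class name MonitoredWriter and its listener() accessor are hypothetical, and the only real APIs used are the JDK's MemoryMXBean and NotificationEmitter. The idea is simply that whatever is added to the bean when the writer is created is removed again in close().

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.ListenerNotFoundException;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

/** Hypothetical writer that pairs listener registration with removal. */
final class MonitoredWriter implements AutoCloseable {
  private final NotificationEmitter emitter;
  private final NotificationListener listener;
  private boolean closed;

  MonitoredWriter() {
    MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
    // The platform MemoryMXBean also implements NotificationEmitter.
    this.emitter = (NotificationEmitter) bean;
    // Placeholder reaction; a real writer might flush its buffers here.
    this.listener = (notification, handback) -> { };
    emitter.addNotificationListener(listener, null, null);
  }

  /** Exposed only so the sketch can be checked; hypothetical accessor. */
  NotificationListener listener() {
    return listener;
  }

  @Override
  public void close() {
    if (closed) {
      return;
    }
    closed = true;
    try {
      // Without this call the bean keeps a reference to the listener forever.
      emitter.removeNotificationListener(listener);
    } catch (ListenerNotFoundException e) {
      // Already removed; nothing left to clean up.
    }
  }
}
```

Used with try-with-resources, the listener is removed even when the writer fails partway through.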
[jira] [Created] (HIVE-20322) FlakyTest: TestMiniDruidCliDriver
Eric Wohlstadter created HIVE-20322:
---

Summary: FlakyTest: TestMiniDruidCliDriver
Key: HIVE-20322
URL: https://issues.apache.org/jira/browse/HIVE-20322
Project: Hive
Issue Type: Bug
Reporter: Eric Wohlstadter

TestMiniDruidCliDriver fails intermittently, and I'm seeing it fail a significant percentage of the time.
[jira] [Created] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService
Eric Wohlstadter created HIVE-20312:
---

Summary: Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService
Key: HIVE-20312
URL: https://issues.apache.org/jira/browse/HIVE-20312
Project: Hive
Issue Type: Improvement
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

Clients should be able to provide their own BufferAllocator to LlapBaseInputFormat if allocator operations depend on client-side logic. For example, clients may want to manage the allocator hierarchy per client-side task, thread, etc. Currently the client is forced to use one global RootAllocator per process.
[jira] [Created] (HIVE-20300) VectorFileSinkArrowOperator
Eric Wohlstadter created HIVE-20300:
---

Summary: VectorFileSinkArrowOperator
Key: HIVE-20300
URL: https://issues.apache.org/jira/browse/HIVE-20300
Project: Hive
Issue Type: Improvement
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

Bypass the row-mode FileSinkOperator for pushing Arrow format to the LlapOutputFormatService.
[jira] [Created] (HIVE-20290) Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits
Eric Wohlstadter created HIVE-20290:
---

Summary: Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits
Key: HIVE-20290
URL: https://issues.apache.org/jira/browse/HIVE-20290
Project: Hive
Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

When using {{GenericUDTFGetSplits}} to create {{LlapInputSplit}} for submission to {{LlapOutputFormatService}}, the physical plan generation initializes whatever SerDe is being used. At this point, inside HS2, {{ArrowColumnarBatchSerDe}} initializes buffers for Arrow and {{VectorizedRowBatch}}, and these buffers are never used.
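The lazy initialization proposed in HIVE-20290 can be sketched in a few lines. This is a toy illustration rather than the actual SerDe: the class LazySerDe, its byte[] buffer, and the allocations() counter are all hypothetical stand-ins. The point it shows is that initialize(), which runs during planning, only records configuration, while the buffer is allocated on the first serialize() call.

```java
/** Hypothetical SerDe-like class that defers buffer allocation until first use. */
final class LazySerDe {
  private final int bufferSize;
  private byte[] buffer;      // stays null until serialize() is first called
  private int allocations;    // counts allocations, for illustration only

  LazySerDe(int bufferSize) {
    this.bufferSize = bufferSize;
  }

  /** Called during plan generation: records configuration, allocates nothing. */
  void initialize() {
    // Intentionally empty: no buffers are created here.
  }

  /** Copies as much of the row as fits into the buffer and returns the byte count. */
  int serialize(byte[] row) {
    if (buffer == null) {
      buffer = new byte[bufferSize];
      allocations++;
    }
    int n = Math.min(row.length, buffer.length);
    System.arraycopy(row, 0, buffer, 0, n);
    return n;
  }

  int allocations() {
    return allocations;
  }
}
```

A process that only plans the query, as HS2 does during GetSplits, calls initialize() and never pays for the buffer.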
[jira] [Created] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer
Eric Wohlstadter created HIVE-20203:
---

Summary: Arrow SerDe leaks a DirectByteBuffer
Key: HIVE-20203
URL: https://issues.apache.org/jira/browse/HIVE-20203
Project: Hive
Issue Type: Bug
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

ArrowColumnarBatchSerDe allocates an Arrow NullableMapVector for each task that uses the serde. The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool. This buffer is never closed and leaks about 1K of physical memory for each task.

This patch does three things:
# Ensures the buffer is closed when the RecordWriter for the task is closed.
# Adds per-task memory accounting by assigning a ChildAllocator to each task from the RootAllocator.
# Enforces that the ChildAllocator for a task has released all memory assigned to it when the task is completed.

The patch assumes that close() is always called on the RecordWriter when a task is finished (even if there is a failure during task execution).
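The per-task accounting and enforcement steps listed in HIVE-20203 can be modeled without Arrow. The sketch below is a toy under stated assumptions: TaskAllocator is a hypothetical class that only counts bytes, not Arrow's ChildAllocator, and it allocates no real memory. It shows the shape of the check: each task gets its own accounting object, and closing it fails loudly if anything is still outstanding.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-task allocator that only tracks byte counts. */
final class TaskAllocator implements AutoCloseable {
  private final String taskName;
  private final AtomicLong outstanding = new AtomicLong();

  TaskAllocator(String taskName) {
    this.taskName = taskName;
  }

  /** Records that the task now holds {@code bytes} more bytes. */
  void allocate(long bytes) {
    if (bytes < 0) {
      throw new IllegalArgumentException("negative allocation");
    }
    outstanding.addAndGet(bytes);
  }

  /** Records that the task has given back {@code bytes} bytes. */
  void release(long bytes) {
    if (bytes < 0 || bytes > outstanding.get()) {
      throw new IllegalArgumentException("release exceeds outstanding bytes");
    }
    outstanding.addAndGet(-bytes);
  }

  long outstanding() {
    return outstanding.get();
  }

  /** Enforces that everything assigned to the task was released. */
  @Override
  public void close() {
    long leaked = outstanding.get();
    if (leaked != 0) {
      throw new IllegalStateException(
          "task " + taskName + " leaked " + leaked + " bytes");
    }
  }
}
```

Because the check runs in close(), it relies on the same assumption as the patch: close() must be called when the task finishes, including on failure.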
[jira] [Created] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
Eric Wohlstadter created HIVE-20093:
---

Summary: LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
Key: HIVE-20093
URL: https://issues.apache.org/jira/browse/HIVE-20093
Project: Hive
Issue Type: Bug
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can produce corrupted buffers because of a buffer-reuse race condition. This change ensures that Arrow memory is accounted for by the same BufferAllocator. RootAllocator returns an ArrowBuf, which cooperates with Arrow memory accounting after Netty calls {{release(1)}} on the buffer.
[jira] [Created] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path
Eric Wohlstadter created HIVE-19808:
---

Summary: GenericUDTFGetSplits should support ACID reads in the temp. table read path
Key: HIVE-19808
URL: https://issues.apache.org/jira/browse/HIVE-19808
Project: Hive
Issue Type: Bug
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

1. Map-only reads work on ACID tables.
2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
3. But temp. table reads don't work on ACID tables.

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create temp table: java.lang.IllegalStateException: calling recordValidTxn() more than once in the same txnid:420
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
    at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
    ... 16 more
{code}
[jira] [Created] (HIVE-19703) GenericUDTFGetSplits never uses num splits argument
Eric Wohlstadter created HIVE-19703:
---

Summary: GenericUDTFGetSplits never uses num splits argument
Key: HIVE-19703
URL: https://issues.apache.org/jira/browse/HIVE-19703
Project: Hive
Issue Type: Bug
Components: UDF
Reporter: Eric Wohlstadter

The description for GenericUDTFGetSplits says

{code}
Returns an array of length int serialized splits for the referenced tables string.
{code}

but the argument that controls the number of splits is never used.
[jira] [Created] (HIVE-19682) Provide option for GenericUDTFGetSplits to return only schema metadata
Eric Wohlstadter created HIVE-19682:
---

Summary: Provide option for GenericUDTFGetSplits to return only schema metadata
Key: HIVE-19682
URL: https://issues.apache.org/jira/browse/HIVE-19682
Project: Hive
Issue Type: Improvement
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

For some use cases it is necessary to know the output schema of a HiveQL query before executing it, but no existing client API provides this information. Hive JDBC doesn't provide the schema for parametric types in {{ResultSetMetaData}}.

GenericUDTFGetSplits bundles the proper schema metadata with the fragments for input splits. An option can be added to return only the schema metadata from compilation, so that the generation of input splits can be skipped.
[jira] [Created] (HIVE-19627) Add support for LlapArrowBatchRecordReader to be used through a Hadoop InputFormat
Eric Wohlstadter created HIVE-19627:
---

Summary: Add support for LlapArrowBatchRecordReader to be used through a Hadoop InputFormat
Key: HIVE-19627
URL: https://issues.apache.org/jira/browse/HIVE-19627
Project: Hive
Issue Type: Sub-task
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

LlapArrowBatchRecordReader would need to support configuration through JobConf, rather than, or in addition to, the external client's native configuration.
[jira] [Created] (HIVE-19495) Arrow SerDe itest failure
Eric Wohlstadter created HIVE-19495:
---

Summary: Arrow SerDe itest failure
Key: HIVE-19495
URL: https://issues.apache.org/jira/browse/HIVE-19495
Project: Hive
Issue Type: Sub-task
Components: Serializers/Deserializers
Reporter: Eric Wohlstadter
Assignee: Teddy Choi
Fix For: 3.1.0

"You tried to write a Bit type when you are using a ValueWriter of type NullableMapWriter."
[jira] [Created] (HIVE-19445) Graceful handling of "close" in WritableByteChannelAdapter
Eric Wohlstadter created HIVE-19445:
---

Summary: Graceful handling of "close" in WritableByteChannelAdapter
Key: HIVE-19445
URL: https://issues.apache.org/jira/browse/HIVE-19445
Project: Hive
Issue Type: Bug
Reporter: Eric Wohlstadter

org.apache.hadoop.hive.llap.WritableByteChannelAdapter

{quote}"I see now that the writeListener could be implemented in such a way as to propagate a write error back to the writer (so we can possibly throw an exception and fail the current operation rather than just log and ignore the error). Plus on close I'm wondering if it is better just to wait for the close future to complete so we can check the status."
{quote}
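The two ideas in the quoted comment, propagating an asynchronous write error back to the writer and waiting on the close future, can be sketched with the JDK alone. This is an illustration, not the adapter's code: CheckedChannel is a hypothetical class, and CompletableFuture stands in for the Netty futures that the real WritableByteChannelAdapter would use.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical channel wrapper that surfaces asynchronous failures. */
final class CheckedChannel {
  // Remembers the first asynchronous write failure, if any.
  private final AtomicReference<Throwable> writeError = new AtomicReference<>();

  /** Plays the role of the write listener: records a failed write. */
  void track(CompletableFuture<Void> writeFuture) {
    writeFuture.whenComplete((ok, err) -> {
      if (err != null) {
        writeError.compareAndSet(null, err);
      }
    });
  }

  /** Fails the current operation if an earlier write failed. */
  void checkWriteError() throws IOException {
    Throwable t = writeError.get();
    if (t != null) {
      throw new IOException("earlier write failed", t);
    }
  }

  /** Waits for the close future so that its status can be checked. */
  void close(CompletableFuture<Void> closeFuture) throws IOException {
    checkWriteError();
    try {
      closeFuture.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while closing", e);
    } catch (ExecutionException e) {
      throw new IOException("close failed", e.getCause());
    }
  }
}
```

A writer would call checkWriteError() at the start of each write, so a failure is reported on the next operation instead of being logged and ignored.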
[jira] [Created] (HIVE-19359) itest for Arrow LLAP OutputFormat
Eric Wohlstadter created HIVE-19359:
---

Summary: itest for Arrow LLAP OutputFormat
Key: HIVE-19359
URL: https://issues.apache.org/jira/browse/HIVE-19359
Project: Hive
Issue Type: Task
Components: Tests
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

Modified version of TestJdbcWithMiniLlap. Exercises HIVE-19306, HIVE-19307, and HIVE-19308.
[jira] [Created] (HIVE-19309) Add Arrow dependencies to LlapServiceDriver
Eric Wohlstadter created HIVE-19309:
---

Summary: Add Arrow dependencies to LlapServiceDriver
Key: HIVE-19309
URL: https://issues.apache.org/jira/browse/HIVE-19309
Project: Hive
Issue Type: Task
Components: llap
Reporter: Eric Wohlstadter

Need to make Arrow jars available to daemons.
[jira] [Created] (HIVE-19308) Provide an Arrow stream reader for external LLAP clients
Eric Wohlstadter created HIVE-19308:
---

Summary: Provide an Arrow stream reader for external LLAP clients
Key: HIVE-19308
URL: https://issues.apache.org/jira/browse/HIVE-19308
Project: Hive
Issue Type: Task
Components: llap
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

This is a subclass of LlapBaseRecordReader that wraps the socket inputStream and produces Arrow batches for an external client.
[jira] [Created] (HIVE-19307) Support ArrowOutputStream in LlapOutputFormatService
Eric Wohlstadter created HIVE-19307:
---

Summary: Support ArrowOutputStream in LlapOutputFormatService
Key: HIVE-19307
URL: https://issues.apache.org/jira/browse/HIVE-19307
Project: Hive
Issue Type: Task
Components: llap
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

Support pushing Arrow batches through org.apache.arrow.vector.ipc.ArrowOutputStream in LlapOutputFormatService.
[jira] [Created] (HIVE-19306) Arrow batch serializer
Eric Wohlstadter created HIVE-19306:
---

Summary: Arrow batch serializer
Key: HIVE-19306
URL: https://issues.apache.org/jira/browse/HIVE-19306
Project: Hive
Issue Type: Task
Components: Serializers/Deserializers
Reporter: Eric Wohlstadter
Assignee: Teddy Choi

Leverage the ThriftJDBCBinarySerDe code path that already exists in SemanticAnalyzer/FileSinkOperator to create a serializer that batches rows into Arrow vector batches.
[jira] [Created] (HIVE-19305) Arrow format for LlapOutputFormatService (umbrella)
Eric Wohlstadter created HIVE-19305:
---

Summary: Arrow format for LlapOutputFormatService (umbrella)
Key: HIVE-19305
URL: https://issues.apache.org/jira/browse/HIVE-19305
Project: Hive
Issue Type: Improvement
Components: llap
Reporter: Eric Wohlstadter
Assignee: Eric Wohlstadter

Allows external clients to consume output from LLAP daemons in Arrow stream format.