[jira] [Created] (HIVE-23041) LLAP purge command can lead to resource leak
Slim Bouguerra created HIVE-23041:
----------------------------------

Summary: LLAP purge command can lead to resource leak
Key: HIVE-23041
URL: https://issues.apache.org/jira/browse/HIVE-23041
Project: Hive
Issue Type: Bug
Reporter: Slim Bouguerra

As per the Java spec (https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html), an unused ExecutorService should be shut down to allow reclamation of its resources. Code like this causes a serious resource leak if a user fires multiple commands:
https://github.com/apache/hive/blob/7ae6756d40468d18b65423a0b5174b827dc42b60/ql/src/java/org/apache/hadoop/hive/ql/processors/LlapCacheResourceProcessor.java#L132

The other question this raises is how those tasks respond to interrupt or cancel at the thread level. [~prasanth_j], any idea what happens if one task hangs on IO?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
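For reference, a minimal stdlib sketch (not the Hive code in question) of the shutdown pattern the ExecutorService javadoc recommends — release the pool even when a task fails, with a bounded grace period before interrupting:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch, assuming a short-lived per-command pool as in the report.
public class ExecutorShutdownSketch {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            executor.submit(() -> System.out.println("purge task running"));
        } finally {
            executor.shutdown();  // stop accepting new tasks
            try {
                // Give in-flight tasks a bounded grace period, then interrupt them.
                if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                    List<Runnable> dropped = executor.shutdownNow();
                    System.out.println("forcibly dropped " + dropped.size() + " queued tasks");
                }
            } catch (InterruptedException e) {
                executor.shutdownNow();
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Note that shutdownNow() only interrupts; a task blocked on non-interruptible IO will keep its thread alive anyway, which is exactly the second concern raised above.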
[jira] [Created] (HIVE-23040) Checkpointing for repl dump incremental phase
Aasha Medhi created HIVE-23040:
-------------------------------

Summary: Checkpointing for repl dump incremental phase
Key: HIVE-23040
URL: https://issues.apache.org/jira/browse/HIVE-23040
Project: Hive
Issue Type: Bug
Reporter: Aasha Medhi
Assignee: Aasha Medhi
[jira] [Created] (HIVE-23039) Checkpointing for repl dump bootstrap phase
Aasha Medhi created HIVE-23039:
-------------------------------

Summary: Checkpointing for repl dump bootstrap phase
Key: HIVE-23039
URL: https://issues.apache.org/jira/browse/HIVE-23039
Project: Hive
Issue Type: Bug
Reporter: Aasha Medhi
Assignee: Aasha Medhi
[jira] [Created] (HIVE-23038) Suspect Direct SQL Statement Regarding BitVector
David Mollitor created HIVE-23038:
----------------------------------

Summary: Suspect Direct SQL Statement Regarding BitVector
Key: HIVE-23038
URL: https://issues.apache.org/jira/browse/HIVE-23038
Project: Hive
Issue Type: Improvement
Components: Standalone Metastore, Statistics
Affects Versions: 3.1.2, 4.0.0
Reporter: David Mollitor

https://github.com/apache/hive/blob/26cc3154c061d2194fba1c3bb156bb7e06e4a6c5/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1249

The list of columns that gets SELECT-ed from the backend database depends on a flag:

{code:java}
final String queryText0 = "select " + getStatsList(enableBitVector) + " from " + TAB_COL_STATS
    + " where \"CAT_NAME\" = ? and \"DB_NAME\" = ? and \"TABLE_NAME\" = ? "
    + " and \"ENGINE\" = ? and \"COLUMN_NAME\" in (";
{code}

However, the same flag is not passed to the Java marshaling code, so I sincerely doubt that the result is being parsed correctly. How can it know how many columns there are?
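To illustrate the suspected failure mode, here is a small sketch (not the actual MetaStoreDirectSql code; column names and positions are hypothetical) of why a marshaler that assumes one fixed column layout misreads rows when the SELECT list is flag-dependent:

```java
// Hypothetical row layouts, depending on the flag:
//   enableBitVector = true:  [COLUMN_NAME, NUM_NULLS, BIT_VECTOR, NUM_DISTINCTS]
//   enableBitVector = false: [COLUMN_NAME, NUM_NULLS, NUM_DISTINCTS]
public class ColumnMismatchSketch {
    static long readNumDistincts(Object[] row, boolean enableBitVector) {
        // Correct only because the parser is told which layout was selected.
        int idx = enableBitVector ? 3 : 2;
        return ((Number) row[idx]).longValue();
    }

    public static void main(String[] args) {
        Object[] withBitVector = {"col_a", 5L, new byte[] {1, 2}, 100L};
        Object[] withoutBitVector = {"col_a", 5L, 100L};

        System.out.println(readNumDistincts(withBitVector, true));     // 100
        System.out.println(readNumDistincts(withoutBitVector, false)); // 100

        // The suspected bug: parsing the short row with the long layout
        // (or vice versa) reads the wrong cell entirely, e.g.
        // readNumDistincts(withoutBitVector, true) throws
        // ArrayIndexOutOfBoundsException.
    }
}
```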
[jira] [Created] (HIVE-23037) Print Logging Information for Exception in AcidUtils tryListLocatedHdfsStatus
David Mollitor created HIVE-23037:
----------------------------------

Summary: Print Logging Information for Exception in AcidUtils tryListLocatedHdfsStatus
Key: HIVE-23037
URL: https://issues.apache.org/jira/browse/HIVE-23037
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor
Attachments: HIVE-23037.1.patch
[jira] [Created] (HIVE-23036) Incorrect ORC PPD eval with sub-millisecond timestamps
Panagiotis Garefalakis created HIVE-23036:
------------------------------------------

Summary: Incorrect ORC PPD eval with sub-millisecond timestamps
Key: HIVE-23036
URL: https://issues.apache.org/jira/browse/HIVE-23036
Project: Hive
Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis

See [ORC-611|https://issues.apache.org/jira/browse/ORC-611] for more details.

ORC stores timestamps with:
- nanosecond precision for the data itself
- millisecond precision for min-max statistics

As both min and max are rounded to the same value, timestamps with nanosecond precision will not pass the PPD evaluator.

{code:java}
create table tsstat (ts timestamp) stored as orc;
insert into tsstat values ("1970-01-01 00:00:00.0005");
select * from tsstat where ts = "1970-01-01 00:00:00.0005"; -- returns 0 rows
{code}

ORC PPD evaluation currently happens as part of OrcInputFormat:
https://github.com/apache/hive/blob/7e39a2c13711f9377c9ce1edb4224880421b1ea5/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2314
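A stdlib sketch (not ORC code) of the rounding problem: the data value keeps nanosecond precision, but when min/max statistics keep only milliseconds, a sub-millisecond value falls outside its own [min, max] range and the range check wrongly rules out a match:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Simulates ms-precision stats for a single ns-precision value, assuming
// truncation to milliseconds as in the report.
public class TimestampPpdSketch {
    public static void main(String[] args) {
        // The single value written to the file: 1970-01-01 00:00:00.0005
        Instant value = Instant.ofEpochSecond(0, 500_000);  // 0.5 ms = 500,000 ns

        // Stats stored with ms precision: min == max == 1970-01-01 00:00:00.000
        Instant statsMin = value.truncatedTo(ChronoUnit.MILLIS);
        Instant statsMax = value.truncatedTo(ChronoUnit.MILLIS);

        // PPD-style range check for the predicate `ts = value`:
        boolean mightMatch = !value.isBefore(statsMin) && !value.isAfter(statsMax);

        System.out.println(mightMatch);  // false -- the data is wrongly skipped
    }
}
```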
[jira] [Created] (HIVE-23035) Scheduled query executor may hang in case TezAMs are launched on-demand
Zoltan Haindrich created HIVE-23035:
------------------------------------

Summary: Scheduled query executor may hang in case TezAMs are launched on-demand
Key: HIVE-23035
URL: https://issues.apache.org/jira/browse/HIVE-23035
Project: Hive
Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich

Right now the scheduled query executor hangs during session initialization, because it tries to open the Tez session while it is still initializing the SessionState.
[jira] [Created] (HIVE-23034) Arrow serializer should not keep the reference of arrow offset and validity buffers
Shubham Chaurasia created HIVE-23034:
-------------------------------------

Summary: Arrow serializer should not keep the reference of arrow offset and validity buffers
Key: HIVE-23034
URL: https://issues.apache.org/jira/browse/HIVE-23034
Project: Hive
Issue Type: Bug
Components: llap, Serializers/Deserializers
Reporter: Shubham Chaurasia
Assignee: Shubham Chaurasia

Currently, part of the writeList() method in the Arrow serializer is implemented like this:

{code:java}
final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
int nextOffset = 0;

for (int rowIndex = 0; rowIndex < size; rowIndex++) {
  int selectedIndex = rowIndex;
  if (vectorizedRowBatch.selectedInUse) {
    selectedIndex = vectorizedRowBatch.selected[rowIndex];
  }
  if (hiveVector.isNull[selectedIndex]) {
    offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
  } else {
    offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
    nextOffset += (int) hiveVector.lengths[selectedIndex];
    arrowVector.setNotNull(rowIndex);
  }
}
offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
{code}

Here we obtain a reference via {{final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();}} and keep updating the arrow vector and the offset buffer through it.

Problem: {{arrowVector.setNotNull(rowIndex)}} keeps checking the index and, when a threshold is crossed, reallocates the offset and validity buffers, updates the references internally, and releases the old buffers (which decrements their reference count). At that point the reference obtained above becomes stale.

Furthermore, if we try to read or write the old buffer, we see:

{code:java}
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
	at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
	at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
	at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
	at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
{code}

Solution: this can be fixed by re-fetching the buffer ({{arrowVector.getOffsetBuffer()}}) each time we want to update it. In our internal tests this is seen very frequently on Arrow 0.8.0 but not on 0.10.0; however, 0.10.0 reallocates in the same way, so it should be handled the same way there too.
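The stale-reference pattern can be demonstrated without Arrow. This stdlib sketch uses a hypothetical vector class (not the Arrow API) whose internal buffer is replaced on reallocation; Arrow additionally releases the old buffer, which turns the lost write below into the IllegalReferenceCountException above:

```java
import java.util.Arrays;

// Hypothetical stand-in for an Arrow vector that reallocates its internal
// buffer when capacity is exceeded (analogous to setNotNull() triggering a
// realloc). Names are illustrative only.
class ReallocatingVector {
    private int[] buffer = new int[4];

    int[] getBuffer() { return buffer; }

    // May replace the internal buffer, invalidating any cached reference.
    void ensureCapacity(int index) {
        if (index >= buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, index + 1));
        }
    }
}

public class StaleBufferDemo {
    public static void main(String[] args) {
        ReallocatingVector vector = new ReallocatingVector();

        // Buggy pattern: cache the buffer once, then trigger a realloc.
        int[] cached = vector.getBuffer();
        vector.ensureCapacity(10);                  // internal buffer replaced
        cached[0] = 42;                             // lands in the abandoned buffer
        System.out.println(vector.getBuffer()[0]);  // 0 -- the write was lost

        // Fixed pattern: re-fetch the buffer after any call that may realloc.
        vector.ensureCapacity(20);
        vector.getBuffer()[0] = 42;
        System.out.println(vector.getBuffer()[0]);  // 42
    }
}
```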