[jira] [Created] (HIVE-23041) LLAP purge command can lead to resource leak

2020-03-17 Thread Slim Bouguerra (Jira)
Slim Bouguerra created HIVE-23041:
-

 Summary: LLAP purge command can lead to resource leak
 Key: HIVE-23041
 URL: https://issues.apache.org/jira/browse/HIVE-23041
 Project: Hive
  Issue Type: Bug
Reporter: Slim Bouguerra


As per the Java Spec 
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html

An unused ExecutorService should be shut down to allow reclamation of its 
resources.

Code like the following is a serious resource leak when a user fires multiple commands:

https://github.com/apache/hive/blob/7ae6756d40468d18b65423a0b5174b827dc42b60/ql/src/java/org/apache/hadoop/hive/ql/processors/LlapCacheResourceProcessor.java#L132
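A minimal sketch of the fix in the spirit of the spec's recommendation (the class name, task body, and 30-second timeout are illustrative, not taken from the Hive code): create the executor, use it, and shut it down in a finally block so its threads are reclaimed even if the command fails or times out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorShutdownSketch {

  // Hypothetical per-command task; stands in for the LLAP purge calls.
  static String runCommand() throws Exception {
    ExecutorService executor = Executors.newCachedThreadPool();
    try {
      return executor.submit(() -> "purged").get(30, TimeUnit.SECONDS);
    } finally {
      // Always release the pool's threads, even on timeout or interrupt;
      // without this, each command leaks a live thread pool.
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runCommand());
  }
}
```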

The other question this raises is how those tasks respond to interrupt or 
cancel at the thread level. [~prasanth_j], any idea what happens if one task 
hangs on IO?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23040) Checkpointing for repl dump incremental phase

2020-03-17 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-23040:
--

 Summary: Checkpointing for repl dump incremental phase
 Key: HIVE-23040
 URL: https://issues.apache.org/jira/browse/HIVE-23040
 Project: Hive
  Issue Type: Bug
Reporter: Aasha Medhi
Assignee: Aasha Medhi








[jira] [Created] (HIVE-23039) Checkpointing for repl dump bootstrap phase

2020-03-17 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-23039:
--

 Summary: Checkpointing for repl dump bootstrap phase
 Key: HIVE-23039
 URL: https://issues.apache.org/jira/browse/HIVE-23039
 Project: Hive
  Issue Type: Bug
Reporter: Aasha Medhi
Assignee: Aasha Medhi








[jira] [Created] (HIVE-23038) Suspect Direct SQL Statement Regarding BitVector

2020-03-17 Thread David Mollitor (Jira)
David Mollitor created HIVE-23038:
-

 Summary: Suspect Direct SQL Statement Regarding BitVector
 Key: HIVE-23038
 URL: https://issues.apache.org/jira/browse/HIVE-23038
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore, Statistics
Affects Versions: 3.1.2, 4.0.0
Reporter: David Mollitor


https://github.com/apache/hive/blob/26cc3154c061d2194fba1c3bb156bb7e06e4a6c5/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1249

The list of columns that get SELECT-ed from the backend database depends on a 
flag:

{code:java}
final String queryText0 = "select " + getStatsList(enableBitVector) + " from " + TAB_COL_STATS
    + " where \"CAT_NAME\" = ? and \"DB_NAME\" = ? and \"TABLE_NAME\" = ? "
    + " and \"ENGINE\" = ? and \"COLUMN_NAME\" in (";
{code}

However, the same flag is not passed to the Java marshaling code, so I 
sincerely doubt that the result set is being parsed correctly.  How can it know 
how many columns there are?
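To illustrate the concern, here is a hypothetical sketch (the column layout and names are invented for illustration, not Hive's actual schema): when the SELECT list's width depends on the flag, the unpacking side must be given the same flag, or its positional assumptions about the row break.

```java
public class StatsRowSketch {

  // Hypothetical layout: name, numNulls, numDVs [, bitVector].
  // The SELECT list length depends on enableBitVector, so the unpacking
  // side needs the same flag to know the expected row width.
  static long numDVs(Object[] row, boolean enableBitVector) {
    int expected = enableBitVector ? 4 : 3;
    if (row.length != expected) {
      throw new IllegalStateException(
          "row has " + row.length + " columns, expected " + expected);
    }
    return (long) row[2];
  }

  public static void main(String[] args) {
    Object[] withBitVector = {"c1", 0L, 42L, new byte[] {1}};
    // Flags agree: the row is unpacked correctly.
    System.out.println(numDVs(withBitVector, true));
    // Flags disagree: the width check fails instead of silently
    // misreading a column.
    try {
      numDVs(withBitVector, false);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```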





[jira] [Created] (HIVE-23037) Print Logging Information for Exception in AcidUtils tryListLocatedHdfsStatus

2020-03-17 Thread David Mollitor (Jira)
David Mollitor created HIVE-23037:
-

 Summary: Print Logging Information for Exception in AcidUtils 
tryListLocatedHdfsStatus
 Key: HIVE-23037
 URL: https://issues.apache.org/jira/browse/HIVE-23037
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-23037.1.patch







[jira] [Created] (HIVE-23036) Incorrect ORC PPD eval with sub-millisecond timestamps

2020-03-17 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23036:
-

 Summary: Incorrect ORC PPD eval with sub-millisecond timestamps
 Key: HIVE-23036
 URL: https://issues.apache.org/jira/browse/HIVE-23036
 Project: Hive
  Issue Type: Bug
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


See [ORC-611|https://issues.apache.org/jira/browse/ORC-611] for more details

ORC stores timestamps with:
 - nanosecond precision for the data itself
 - milliseconds precision for min-max statistics

Because both min and max are rounded to the same millisecond value, timestamps 
with nanosecond precision will not pass the PPD evaluator:
{code:sql}
create table tsstat (ts timestamp) stored as orc;
insert into tsstat values ("1970-01-01 00:00:00.0005");
select * from tsstat where ts = "1970-01-01 00:00:00.0005";
-- returns 0 rows
{code}

ORC PPD evaluation currently happens as part of OrcInputFormat 
[https://github.com/apache/hive/blob/7e39a2c13711f9377c9ce1edb4224880421b1ea5/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2314]
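The precision mismatch can be sketched numerically (a deliberate simplification, not ORC's actual evaluator): the row's timestamp is 0.5 ms past the epoch, but the millisecond-truncated stats give min = max = 0 ms, so a naive range check prunes a stripe that really contains the row.

```java
public class PpdRoundingSketch {

  static final long NANOS_PER_MILLI = 1_000_000L;

  // Stats keep only millisecond precision; truncation toward zero is assumed.
  static long toStatMillis(long nanos) {
    return nanos / NANOS_PER_MILLI;
  }

  // Naive PPD check: could a value equal to predicateNanos exist in a
  // stripe whose min/max stats are in milliseconds?
  static boolean mightMatch(long predicateNanos, long minMillis, long maxMillis) {
    return predicateNanos >= minMillis * NANOS_PER_MILLI
        && predicateNanos <= maxMillis * NANOS_PER_MILLI;
  }

  public static void main(String[] args) {
    long ts = 500_000L;          // 1970-01-01 00:00:00.0005 as nanos (0.5 ms)
    long min = toStatMillis(ts); // 0
    long max = toStatMillis(ts); // 0
    // The row exists, yet the naive evaluator prunes the stripe:
    System.out.println(mightMatch(ts, min, max));     // false
    // A safe evaluator must widen max to the end of its millisecond:
    System.out.println(mightMatch(ts, min, max + 1)); // true
  }
}
```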





[jira] [Created] (HIVE-23035) Scheduled query executor may hang in case TezAMs are launched on-demand

2020-03-17 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23035:
---

 Summary: Scheduled query executor may hang in case TezAMs are 
launched on-demand
 Key: HIVE-23035
 URL: https://issues.apache.org/jira/browse/HIVE-23035
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Right now the scheduled query executor hangs during session initialization, 
because it tries to open the Tez session while it is still initializing the 
SessionState.





[jira] [Created] (HIVE-23034) Arrow serializer should not keep the reference of arrow offset and validity buffers

2020-03-17 Thread Shubham Chaurasia (Jira)
Shubham Chaurasia created HIVE-23034:


 Summary: Arrow serializer should not keep the reference of arrow 
offset and validity buffers
 Key: HIVE-23034
 URL: https://issues.apache.org/jira/browse/HIVE-23034
 Project: Hive
  Issue Type: Bug
  Components: llap, Serializers/Deserializers
Reporter: Shubham Chaurasia
Assignee: Shubham Chaurasia


Currently, part of the writeList() method in the Arrow serializer is 
implemented like this:
{code:java}
final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
int nextOffset = 0;

for (int rowIndex = 0; rowIndex < size; rowIndex++) {
  int selectedIndex = rowIndex;
  if (vectorizedRowBatch.selectedInUse) {
    selectedIndex = vectorizedRowBatch.selected[rowIndex];
  }
  if (hiveVector.isNull[selectedIndex]) {
    offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
  } else {
    offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
    nextOffset += (int) hiveVector.lengths[selectedIndex];
    arrowVector.setNotNull(rowIndex);
  }
}
offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
{code}

1) Here we obtain a reference via {{final ArrowBuf offsetBuffer = 
arrowVector.getOffsetBuffer();}} and keep updating both the arrow vector and 
the offset buffer through it.

Problem - 

{{arrowVector.setNotNull(rowIndex)}} checks the index and, when a threshold is 
crossed, reallocates the offset and validity buffers, updates the references 
internally, and releases the old buffers (which decrements their reference 
count). The reference obtained in 1) then becomes stale. If we subsequently 
try to read or write through the old buffer, we see:
{code:java}
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
	at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
	at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
	at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
	at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
	at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
{code}
 
Solution - 
This can be fixed by re-fetching the buffers ({{arrowVector.getOffsetBuffer()}}) 
each time we want to update them.

In our internal tests this is seen very frequently on Arrow 0.8.0 but not on 
0.10.0; it should nevertheless be handled the same way for 0.10.0, since that 
version reallocates and releases buffers the same way.
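The stale-reference pattern and the re-fetch fix can be sketched with a toy growable vector (a stand-in for illustration, not the real Arrow API): the reference taken before the loop goes stale once a reallocation happens, while re-fetching the buffer on every write always sees the live allocation.

```java
import java.util.Arrays;

public class BufferRefetchSketch {

  // Minimal stand-in for an Arrow vector whose backing buffer is
  // reallocated (invalidating old references) when capacity is exceeded.
  static class GrowableVector {
    private int[] buffer = new int[4];

    int[] getOffsetBuffer() {
      return buffer;
    }

    void setNotNull(int index) {
      if (index >= buffer.length) {
        // Reallocation: any previously obtained reference is now stale.
        buffer = Arrays.copyOf(buffer, buffer.length * 2);
      }
    }
  }

  public static void main(String[] args) {
    GrowableVector v = new GrowableVector();
    int[] stale = v.getOffsetBuffer();  // reference taken once, as in 1)
    for (int i = 0; i < 8; i++) {
      v.setNotNull(i);                  // may reallocate mid-loop
      v.getOffsetBuffer()[i] = i;       // fix: re-fetch the buffer per write
    }
    // The early reference no longer points at the live buffer.
    System.out.println(stale != v.getOffsetBuffer()); // true
    System.out.println(v.getOffsetBuffer()[7]);       // 7
  }
}
```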


