[jira] [Created] (HIVE-24196) Refactor getAcidState in AcidUtils to use HMS endpoint

2020-09-24 Thread Barnabas Maidics (Jira)
Barnabas Maidics created HIVE-24196:
---

 Summary: Refactor getAcidState in AcidUtils to use HMS endpoint
 Key: HIVE-24196
 URL: https://issues.apache.org/jira/browse/HIVE-24196
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Barnabas Maidics
Assignee: Barnabas Maidics






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23987) Upgrade arrow version to 0.11.0

2020-08-04 Thread Barnabas Maidics (Jira)
Barnabas Maidics created HIVE-23987:
---

 Summary: Upgrade arrow version to 0.11.0
 Key: HIVE-23987
 URL: https://issues.apache.org/jira/browse/HIVE-23987
 Project: Hive
  Issue Type: Improvement
Reporter: Barnabas Maidics
Assignee: Barnabas Maidics


As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], we're 
introducing FlatBuffers as a dependency. 
Arrow 0.10.0 has an unofficial FlatBuffers dependency, which is incompatible 
with the official one: https://issues.apache.org/jira/browse/ARROW-3175

This was fixed in 0.11.0, so we should upgrade to that version.






[jira] [Created] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-07-21 Thread Barnabas Maidics (Jira)
Barnabas Maidics created HIVE-23890:
---

 Summary: Create HMS endpoint for querying file lists using 
FlatBuffers as serialization
 Key: HIVE-23890
 URL: https://issues.apache.org/jira/browse/HIVE-23890
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Barnabas Maidics
Assignee: Barnabas Maidics








[jira] [Created] (HIVE-23849) Hive skips the creation of ColumnAccessInfo when creating a view

2020-07-15 Thread Barnabas Maidics (Jira)
Barnabas Maidics created HIVE-23849:
---

 Summary: Hive skips the creation of ColumnAccessInfo when creating 
a view
 Key: HIVE-23849
 URL: https://issues.apache.org/jira/browse/HIVE-23849
 Project: Hive
  Issue Type: Bug
Reporter: Barnabas Maidics
Assignee: Barnabas Maidics


When creating a view, Hive skips the creation of ColumnAccessInfo, which should 
be created at [step 
8|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12601].
 This causes an authorization error. 

Currently, this issue is "hidden" when CBO is enabled: since 
[HIVE-14496|https://issues.apache.org/jira/browse/HIVE-14496], CalcitePlanner 
creates this ColumnAccessInfo at [step 
2|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12460].
 With CBO turned off, however, the issue is still there. 

I think the return statement in [step 
5|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]
 is not necessary.





[jira] [Created] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java

2020-06-29 Thread Barnabas Maidics (Jira)
Barnabas Maidics created HIVE-23774:
---

 Summary: Reduce log level at aggrColStatsForPartitions in 
MetaStoreDirectSql.java
 Key: HIVE-23774
 URL: https://issues.apache.org/jira/browse/HIVE-23774
 Project: Hive
  Issue Type: Improvement
Reporter: Barnabas Maidics
Assignee: Barnabas Maidics


[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589]

This message does not need to be logged at the INFO level.





[jira] [Created] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-15 Thread Barnabas Maidics (Jira)
Barnabas Maidics created HIVE-23211:
---

 Summary: Fix metastore schema differences between init scripts, 
and upgrade scripts
 Key: HIVE-23211
 URL: https://issues.apache.org/jira/browse/HIVE-23211
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Barnabas Maidics
Assignee: Barnabas Maidics


There are some differences (character encoding, defaults, etc.) in the metastore 
schema depending on whether we initialize it using the init scripts or upgrade 
it using the upgrade scripts. The schema should be identical in both cases.





[jira] [Created] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-04 Thread Barnabas Maidics (Jira)
Barnabas Maidics created HIVE-22976:
---

 Summary: Oracle and MSSQL upgrade script missing the addition of 
WM_RESOURCEPLAN_FK1 constraint
 Key: HIVE-22976
 URL: https://issues.apache.org/jira/browse/HIVE-22976
 Project: Hive
  Issue Type: Bug
Reporter: Barnabas Maidics
Assignee: Barnabas Maidics


The schema init script (hive-schema-3.1.3000) adds a constraint 
(WM_RESOURCEPLAN_FK1) that is missing from the Oracle and MSSQL upgrade scripts. 





[jira] [Created] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)
Barnabas Maidics created HIVE-22037:
---

 Summary: HS2 should log when shutting down due to OOM
 Key: HIVE-22037
 URL: https://issues.apache.org/jira/browse/HIVE-22037
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Barnabas Maidics
Assignee: Barnabas Maidics


Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in 
and runs oomHook, which stops HS2. All of this happens without any logging: in 
the log, you can only see that HS2 stopped. 

We should log the stack trace. 
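
To make the proposal concrete, here is a minimal sketch of an executor that logs the error before running its hook. It is not Hive's actual ThreadPoolExecutorWithOomHook: the class name LoggingOomHookExecutor, the handleFailure method, and the use of java.util.logging are all assumptions made for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for an executor with an OOM hook; names are illustrative only.
class LoggingOomHookExecutor extends ThreadPoolExecutor {
    private static final Logger LOG =
        Logger.getLogger(LoggingOomHookExecutor.class.getName());

    private final Runnable oomHook;

    LoggingOomHookExecutor(int threads, Runnable oomHook) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        this.oomHook = oomHook;
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        handleFailure(t);
    }

    // Logs the stack trace first, then runs the hook, so the log explains the shutdown.
    // Returns true if the hook was run.
    boolean handleFailure(Throwable t) {
        if (t instanceof OutOfMemoryError) {
            LOG.log(Level.SEVERE, "OutOfMemoryError in worker thread, running OOM hook", t);
            oomHook.run();
            return true;
        }
        return false;
    }
}
```

This sketch only sees errors from tasks started with execute(). Tasks started with submit() capture their failure in the returned Future, so afterExecute receives a null Throwable and a real implementation would have to inspect the Future as well. Logging may itself fail when the heap is exhausted, so the hook should still run even if the log call throws.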





[jira] [Created] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-10-17 Thread Barnabas Maidics (JIRA)
Barnabas Maidics created HIVE-20760:
---

 Summary: Reducing memory overhead due to multiple HiveConfs
 Key: HIVE-20760
 URL: https://issues.apache.org/jira/browse/HIVE-20760
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Barnabas Maidics
 Attachments: hiveconf_interned.html, hiveconf_original.html

The issue is that every Hive task has to load its own copy of {{HiveConf}}. 
When running with a large number of cores per executor (HoS), a significant 
amount of memory (~10%) is wasted due to this duplication. 

I looked into the problem and found a way to reduce the overhead caused by the 
multiple HiveConf objects.

I've created an implementation of Properties, somewhat similar to 
CopyOnFirstWriteProperties. CopyOnFirstWriteProperties can't be used to solve 
this problem, because it drops the interned Properties as soon as we add a new 
property.

So my implementation looks like this:
 * When we create a new HiveConf from an existing one (copy constructor), we 
replace the properties object stored by HiveConf with the new Properties 
implementation (HiveConfProperties). There are two possible ways to do this: 
either we change the visibility of the properties field in the ancestor class 
(Configuration, which comes from Hadoop) to protected, or, more simply, we 
change the type using reflection.
 * HiveConfProperties immediately interns the given properties. After this, 
every time we add a new property to HiveConf, it goes into an additional 
Properties object. This way, if we create multiple HiveConfs with the same base 
properties, they share the same Properties object, but each session/task can 
still add its own unique properties.
 * Getting a property from HiveConfProperties would look like this (I stored 
the non-interned properties in the superclass):

                String property = super.getProperty(key);
                if (property == null) property = interned.getProperty(key);
                return property;
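
The steps above can be sketched roughly as follows. This is not the attached implementation: the class name HiveConfPropertiesSketch, the strong-reference intern pool, and the sharedBase accessor are assumptions made for illustration, and the HiveConf wiring (field visibility or reflection) is left out.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a shared, interned base plus a per-instance overlay
// that lives in the superclass.
class HiveConfPropertiesSketch extends Properties {
    // Assumed pool of interned base Properties; a real version would likely
    // use weak references so unused bases can be garbage collected.
    private static final ConcurrentHashMap<Properties, Properties> POOL =
        new ConcurrentHashMap<>();

    private final Properties interned;

    HiveConfPropertiesSketch(Properties base) {
        this.interned = intern(base);
    }

    // Returns one canonical, never-mutated copy for each distinct set of entries.
    static Properties intern(Properties base) {
        Properties snapshot = new Properties();
        snapshot.putAll(base); // defensive copy; nested defaults are not copied
        Properties existing = POOL.putIfAbsent(snapshot, snapshot);
        return existing != null ? existing : snapshot;
    }

    @Override
    public String getProperty(String key) {
        // Per-instance values (stored in the superclass) win over the shared base.
        String property = super.getProperty(key);
        if (property == null) {
            property = interned.getProperty(key);
        }
        return property;
    }

    // Exposed only so the sharing can be observed.
    Properties sharedBase() {
        return interned;
    }
}
```

In this sketch, two instances built from equal base properties end up holding the same base object, while setProperty on either one writes only to that instance. Only getProperty is overridden here; other views such as size() or stringPropertyNames() would not see the shared entries without further overrides.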

Running some tests showed that the interning works (with 50 connections to 
HiveServer2, heap dumps taken after the sessions were created for queries): 

Overall memory:

         original: 34,599K              interned: 20,582K

Retained memory of HiveConfs:

         original: 16,366K              interned: 10,804K

I've attached the JXray reports for the heap dumps.

What are your thoughts about this solution? 


