[jira] [Created] (HIVE-19265) Potential NPE and hiding actual exception in Hive#copyFiles

2018-04-21 Thread Igor Kryvenko (JIRA)
Igor Kryvenko created HIVE-19265:


 Summary: Potential NPE and hiding actual exception in Hive#copyFiles
 Key: HIVE-19265
 URL: https://issues.apache.org/jira/browse/HIVE-19265
 Project: Hive
  Issue Type: Bug
Reporter: Igor Kryvenko
Assignee: Igor Kryvenko


In {{Hive#copyFiles}} we have the following code:
{code:java}
if (src.isDirectory()) {
  try {
    files = srcFs.listStatus(src.getPath(), FileUtils.HIDDEN_FILES_PATH_FILTER);
  } catch (IOException e) {
    pool.shutdownNow();
    throw new HiveException(e);
  }
}
{code}
If {{pool}} is null, {{pool.shutdownNow()}} throws an NPE and the actual cause is lost.

The pool is initialized as follows:
{code:java}
final ExecutorService pool =
    conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25) > 0
        ? Executors.newFixedThreadPool(
            conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25),
            new ThreadFactoryBuilder().setDaemon(true).setNameFormat("Move-Thread-%d").build())
        : null;
{code}
So when the pool is not created (HIVE_MOVE_FILES_THREAD_COUNT set to 0 or less), we can get an NPE that swallows the actual {{IOException}}.
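A minimal sketch of one possible fix, assuming it is enough to guard the shutdown with a null check so that the original {{IOException}} always reaches the {{HiveException}}. {{HiveException}}, {{listStatus}}, {{createPool}} and {{listFiles}} below are simplified, hypothetical stand-ins for the Hive and Hadoop code, not their real signatures:

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, simplified model of the directory-listing step in Hive#copyFiles.
class CopyFilesNullSafeSketch {
    // Stand-in for Hive's HiveException (assumption, not the real class).
    static class HiveException extends Exception {
        HiveException(Throwable cause) {
            super(cause);
        }
    }

    // Stand-in for srcFs.listStatus(...): always fails, to exercise the catch block.
    static String[] listStatus(String path) throws IOException {
        throw new IOException("cannot list " + path);
    }

    // Mirrors the original initialization: no pool when the thread count is <= 0.
    static ExecutorService createPool(int threadCount) {
        return threadCount > 0 ? Executors.newFixedThreadPool(threadCount) : null;
    }

    static String[] listFiles(String path, ExecutorService pool) throws HiveException {
        try {
            return listStatus(path);
        } catch (IOException e) {
            // The null check keeps an NPE from replacing the IOException being handled.
            if (pool != null) {
                pool.shutdownNow();
            }
            throw new HiveException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = createPool(0); // null, like a thread count of 0
        try {
            listFiles("/tmp/src", pool);
        } catch (HiveException e) {
            System.out.println(e.getCause().getClass().getSimpleName()); // prints IOException
        }
    }
}
```

With a thread count of 0 the pool is null, yet the caller still sees the {{IOException}} as the cause instead of an NPE.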



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19264) Vectorization: Reenable vectorization in vector_adaptor_usage_mode.q

2018-04-21 Thread Matt McCline (JIRA)
Matt McCline created HIVE-19264:

 Summary: Vectorization: Reenable vectorization in vector_adaptor_usage_mode.q
 Key: HIVE-19264
 URL: https://issues.apache.org/jira/browse/HIVE-19264
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline
 Fix For: 3.0.0, 3.1.0


[~vihangk1] observed that vectorization had accidentally been turned off in this test.



[jira] [Created] (HIVE-19263) Improve ugly exception handling in HiveMetaStore

2018-04-21 Thread Igor Kryvenko (JIRA)
Igor Kryvenko created HIVE-19263:


 Summary: Improve ugly exception handling in HiveMetaStore
 Key: HIVE-19263
 URL: https://issues.apache.org/jira/browse/HIVE-19263
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Igor Kryvenko
Assignee: Igor Kryvenko


In the {{HiveMetaStore}} class we have a lot of ugly exception-handling code that uses {{instanceof}}:
{code:java}
} catch (Exception e) {
  ex = e;
  if (e instanceof MetaException) {
    throw (MetaException) e;
  } else if (e instanceof InvalidObjectException) {
    throw (InvalidObjectException) e;
  } else if (e instanceof AlreadyExistsException) {
    throw (AlreadyExistsException) e;
  } else {
    throw newMetaException(e);
  }
}
{code}
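One possible cleanup, sketched under the assumption that the handler only needs to rethrow the three declared exception types unchanged and wrap everything else: Java 7 multi-catch replaces the {{instanceof}}/cast chain. The exception classes, {{newMetaException}}, {{Operation}} and {{lastError}} below are hypothetical stand-ins, not the actual metastore API:

```java
// Hypothetical, simplified model of a HiveMetaStore handler method.
class MetaStoreExceptionSketch {
    // Stand-ins for the metastore API exception classes (assumption).
    static class MetaException extends Exception {
        MetaException(String message) { super(message); }
    }
    static class InvalidObjectException extends Exception {
        InvalidObjectException(String message) { super(message); }
    }
    static class AlreadyExistsException extends Exception {
        AlreadyExistsException(String message) { super(message); }
    }

    // Stand-in for the newMetaException helper: wraps any exception, keeping the cause.
    static MetaException newMetaException(Exception e) {
        MetaException me = new MetaException(e.toString());
        me.initCause(e);
        return me;
    }

    // Hypothetical unit of work that may fail with any exception.
    interface Operation {
        void run() throws Exception;
    }

    // Plays the role of "ex" in the original snippet.
    static Exception lastError;

    static void execute(Operation op)
            throws MetaException, InvalidObjectException, AlreadyExistsException {
        try {
            op.run();
        } catch (MetaException | InvalidObjectException | AlreadyExistsException e) {
            // Declared exceptions are rethrown unchanged; no instanceof or casts needed.
            lastError = e;
            throw e;
        } catch (Exception e) {
            // Everything else is wrapped, like the final else branch of the original.
            lastError = e;
            throw newMetaException(e);
        }
    }
}
```

The multi-catch parameter is implicitly final and typed as the union of the three exceptions, so {{throw e}} needs no cast and the method still declares only those checked exceptions.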



[jira] [Created] (HIVE-19262) empty array will be saved as NULL by insert into select

2018-04-21 Thread liupengcheng (JIRA)
liupengcheng created HIVE-19262:

 Summary: empty array will be saved as NULL by insert into select
 Key: HIVE-19262
 URL: https://issues.apache.org/jira/browse/HIVE-19262
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 0.13.1
Reporter: liupengcheng


The data is written as Parquet by a MapReduce job, and it contains empty lists.

When executing the following SQL, the empty-list column in the result differs from the original data.

{{insert into table b select * from a}}
{code:sql}
> select col1 from a where size(col1) = 0 limit 1;
[]      -- shows []

> insert into table b select col1 from a;
> select col1 from b;
NULL    -- shows NULL
{code}
I wonder whether we should return the same result as the original data, rather than changing the data that is saved.


[jira] [Created] (HIVE-19261) Avro SerDe's InstanceCache should not be synchronized on retrieve

2018-04-21 Thread Fangshi Li (JIRA)
Fangshi Li created HIVE-19261:

 Summary: Avro SerDe's InstanceCache should not be synchronized on retrieve
 Key: HIVE-19261
 URL: https://issues.apache.org/jira/browse/HIVE-19261
 Project: Hive
  Issue Type: Improvement
Reporter: Fangshi Li
Assignee: Fangshi Li


In HIVE-16175, upstream fixed a thread-safety issue in AvroSerDe's InstanceCache by making the retrieve method of InstanceCache synchronized. While this makes InstanceCache thread-safe, synchronizing retrieve can be expensive in highly concurrent environments such as Spark, because every thread must acquire the same lock to enter retrieve, even when the requested instance is already cached.

We propose another way to fix this thread-safety issue: make the underlying map of InstanceCache a ConcurrentHashMap. Ideally, the retrieve method would use the atomic computeIfAbsent, so the entire method no longer needs to be synchronized.
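A minimal sketch of the Java 8 variant, assuming a simplified cache whose {{makeInstance}} step is passed in as a {{Function}}; {{ConcurrentInstanceCache}} is a hypothetical name, not Hive's {{InstanceCache}}. One caveat from the {{ConcurrentHashMap.computeIfAbsent}} documentation: the mapping function should be short and must not update other mappings of the same map, so a factory that recursively calls {{retrieve}} on the same cache is not safe with this approach.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical, simplified cache; not Hive's actual InstanceCache class.
class ConcurrentInstanceCache<S, I> {
    private final ConcurrentMap<S, I> cache = new ConcurrentHashMap<>();
    // Plays the role of InstanceCache#makeInstance (assumption).
    private final Function<S, I> factory;

    ConcurrentInstanceCache(Function<S, I> factory) {
        this.factory = factory;
    }

    // No method-level lock: lookups of cached keys do not contend on a shared
    // monitor, and ConcurrentHashMap applies the factory atomically, at most
    // once per key.
    I retrieve(S seed) {
        return cache.computeIfAbsent(seed, factory);
    }

    int size() {
        return cache.size();
    }
}
```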

However, computeIfAbsent is only available in Java 8, and Hive still supports Java 7, so we use a pattern that simulates the behavior of computeIfAbsent. Once Hive requires Java 8, we should switch to computeIfAbsent.
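The report does not show the exact pattern it uses. One common Java 7 way to simulate {{computeIfAbsent}} on a {{ConcurrentMap}} is a lock-free {{get}} followed by {{putIfAbsent}}; the sketch below assumes that approach, and {{Java7InstanceCache}} with its abstract {{makeInstance}} is hypothetical:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified cache written for Java 7 (no lambdas, no computeIfAbsent).
abstract class Java7InstanceCache<S, I> {
    private final ConcurrentMap<S, I> cache = new ConcurrentHashMap<S, I>();

    // Builds a new instance for a seed; subclasses supply the real logic.
    protected abstract I makeInstance(S seed);

    public I retrieve(S seed) {
        // Fast path: a plain read, with no lock, for keys that are already cached.
        I existing = cache.get(seed);
        if (existing != null) {
            return existing;
        }
        // Slow path: racing threads may each build an instance, but only the
        // first putIfAbsent wins, and every caller returns that winning instance.
        I created = makeInstance(seed);
        I previous = cache.putIfAbsent(seed, created);
        return previous != null ? previous : created;
    }
}
```

The trade-off versus {{computeIfAbsent}} is that two racing threads may both call {{makeInstance}} for the same key; only one result is kept, so this fits when building an instance has no side effects that matter. Because {{makeInstance}} runs outside any map operation, it may also call {{retrieve}} recursively.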


