[jira] [Created] (HIVE-24018) Review necessity of AggregationDesc#setGenericUDAFWritableEvaluator for bloom filter aggregations

2020-08-07 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-24018:
--

 Summary: Review necessity of 
AggregationDesc#setGenericUDAFWritableEvaluator for bloom filter aggregations
 Key: HIVE-24018
 URL: https://issues.apache.org/jira/browse/HIVE-24018
 Project: Hive
  Issue Type: Improvement
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


A few places in the code have the following pattern:
{code:java}
GenericUDAFBloomFilterEvaluator bloomFilterEval = new GenericUDAFBloomFilterEvaluator();
...
AggregationDesc bloom = new AggregationDesc("bloom_filter", bloomFilterEval, p, false, mode);
bloom.setGenericUDAFWritableEvaluator(bloomFilterEval);
{code}
where the bloom filter evaluator is passed both in the constructor of the 
aggregation and, directly afterwards, via a setter. The setter call is necessary 
(removing it causes runtime query failures), but the pattern is a bit confusing. 

Investigate if there is a way to avoid the double passing of the evaluator. 
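One possibility, sketched below with hypothetical stand-in classes (this is not the actual AggregationDesc/GenericUDAFBloomFilterEvaluator API), is to have the constructor recognize a writable evaluator and set the field itself, so the second, easy-to-forget setter call disappears:

```java
// Hypothetical stand-ins for the Hive classes, to illustrate the idea only.
interface Evaluator {}

// Stand-in for a writable evaluator such as GenericUDAFBloomFilterEvaluator.
class BloomFilterEvaluator implements Evaluator {}

class AggDesc {
    private final String name;
    private final Evaluator evaluator;
    private Evaluator writableEvaluator; // today this is set via an extra setter call

    AggDesc(String name, Evaluator evaluator) {
        this.name = name;
        this.evaluator = evaluator;
        // Possible fix: derive the writable evaluator from the constructor
        // argument instead of requiring callers to pass it a second time.
        if (evaluator instanceof BloomFilterEvaluator) {
            this.writableEvaluator = evaluator;
        }
    }

    Evaluator getWritableEvaluator() { return writableEvaluator; }
}

public class DoublePassSketch {
    public static void main(String[] args) {
        BloomFilterEvaluator eval = new BloomFilterEvaluator();
        AggDesc bloom = new AggDesc("bloom_filter", eval);
        // No separate setGenericUDAFWritableEvaluator(eval) call is needed:
        System.out.println(bloom.getWritableEvaluator() == eval); // prints "true"
    }
}
```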

To reproduce the failure, remove the setter call and run the following test:
{noformat}
mvn test -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=vectorized_dynamic_semijoin_reduction.q -Dtest.output.overwrite 
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24017) "CREATE VIEW" command failing with AlreadyExistsException

2020-08-07 Thread Rohit Saxena (Jira)
Rohit Saxena created HIVE-24017:
---

 Summary: "CREATE VIEW" command failing with AlreadyExistsException
 Key: HIVE-24017
 URL: https://issues.apache.org/jira/browse/HIVE-24017
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Spark
Affects Versions: 2.1.1
Reporter: Rohit Saxena


We are using our custom code to fire 50 concurrent Hive view creation queries. 
The view name is randomly generated for each query and is always unique. We get 
the stacktrace below, which we do not get at lower concurrency (e.g. 20 queries). 
This is very puzzling; we tried the following scenarios to avoid the error:
 # using "DROP VIEW IF EXISTS" then "CREATE VIEW IF NOT EXISTS"
 # using "DROP VIEW IF EXISTS" then "CREATE VIEW"
 # using "CREATE OR REPLACE VIEW"

We still get the "AlreadyExistsException". It seems like a concurrency/threading 
issue in either Hive or Spark. We are running our queries through SparkSQL, which 
internally invokes the Hive libraries. Here is one of the query sequences 
from our custom code:

{noformat}
// [4] -> [USE default]
// [5] -> [DROP VIEW IF EXISTS `default`.`w2314287698276073922_generatedsource_46_view_102__m_sparkengine_alltx_allsrc`]
// [6] -> [CREATE VIEW IF NOT EXISTS `default`.`w2314287698276073922_generatedsource_46_view_102__m_sparkengine_alltx_allsrc` (`a0`, `a1`, `a2`, `a3`) AS SELECT CAST(CAST(5 * CAST(alias.c_custkey AS DECIMAL(18, 0)) AS DECIMAL(28, 0)) AS DECIMAL(18, 0)) as a0, alias.c_name as a1, alias.c_address as a2, alias.c_nationkey as a3 FROM default.dst_sanity_test_customer_hive alias WHERE (CAST(CAST(5 * CAST(alias.c_custkey AS DECIMAL(18, 0)) AS DECIMAL(28, 0)) AS DECIMAL(18, 0)) % 3) = 0]
{noformat}
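A toy illustration of the suspected race (this is not the actual Hive/Spark code; the catalog and names are made up): if "CREATE VIEW IF NOT EXISTS" is implemented as a separate existence check followed by a create, two threads can both pass the check, and the loser of the race still hits AlreadyExists. A single atomic create closes that window.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy in-memory "catalog" standing in for the metastore.
public class CheckThenCreateRace {
    private final ConcurrentHashMap<String, String> views = new ConcurrentHashMap<>();

    // Racy shape: the existence check and the create are two separate steps,
    // so two threads can both see "absent"; the loser then hits AlreadyExists.
    void createIfNotExistsRacy(String name) {
        if (!views.containsKey(name)) {                  // step 1: check
            if (views.putIfAbsent(name, "def") != null)  // step 2: create (may lose the race)
                throw new IllegalStateException("AlreadyExistsException: " + name);
        }
    }

    // Safe shape: one atomic operation; concurrent callers never fail, and
    // exactly one of them performs the create.
    boolean createIfNotExistsAtomic(String name) {
        return views.putIfAbsent(name, "def") == null;
    }

    public static void main(String[] args) throws Exception {
        CheckThenCreateRace catalog = new CheckThenCreateRace();
        ExecutorService pool = Executors.newFixedThreadPool(50);
        AtomicInteger created = new AtomicInteger();
        for (int i = 0; i < 50; i++) {
            pool.submit(() -> {
                if (catalog.createIfNotExistsAtomic("view_102"))
                    created.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(created.get()); // prints 1: exactly one thread created the view
    }
}
```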

 

This is the stacktrace we are getting:

{noformat}
com.informatica.sdk.dtm.ExecutionException: [SPARK_1003] Spark task [InfaSpark0] failed with the following error: [User class threw exception: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at com.informatica.compiler.InfaSparkMain$.main(InfaSparkMain.scala:124)
 at com.informatica.compiler.InfaSparkMain.main(InfaSparkMain.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Caused by: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: AlreadyExistsException(message:Table w2314287698276073922_generatedsource_46_view_102__m_sparkengine_alltx_allsrc already exists);
 at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108)
 at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:238)
 at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:102)
 at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:319)
 at org.apache.spark.sql.execution.command.CreateViewCommand.run(views.scala:175)
 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
 at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
 at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
 at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3384)
 at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
 at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
 at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
 at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3383)
 at org.apache.spark.sql.Dataset.<init>(Dataset.scala:196)
 at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
 at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
 at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
 at com.informatica.exec.InfaSpark0$.main(InfaSpark0.scala:55)
 at com.informatica.exec.InfaSpark0.main(InfaSpark0.scala)
 ... 11 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: AlreadyExistsException(message:Table
{noformat}

[jira] [Created] (HIVE-24016) Share bloom filter construction branch in multi column semijoin reducers

2020-08-07 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-24016:
--

 Summary: Share bloom filter construction branch in multi column 
semijoin reducers
 Key: HIVE-24016
 URL: https://issues.apache.org/jira/browse/HIVE-24016
 Project: Hive
  Issue Type: Improvement
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


In HIVE-21196, we added a transformation capable of merging single column 
semijoin reducers into a multi column semijoin reducer.

Currently it transforms subplan SB0 into subplan SB1.

+SB0+
{noformat}
  / RS -> TS_1[Editor] 
 / SEL[fname] - GB - RS - GB -  RS -> TS_0[Author] 
 SOURCE 
 \ SEL[lname] - GB - RS - GB -  RS -> TS_0[Author]
  \ RS -> TS_1[Editor]

TS_0[Author] - FIL[in_bloom(fname) ^ in_bloom(lname)]
TS_1[Editor] - FIL[in_bloom(fname) ^ in_bloom(lname)]  
{noformat}

+SB1+
{noformat}
 / SEL[fname,lname] - GB - RS - GB - RS -> TS[Author] - FIL[in_bloom(hash(fname,lname))]
 SOURCE
 \ SEL[fname,lname] - GB - RS - GB - RS -> TS[Editor] - FIL[in_bloom(hash(fname,lname))]
{noformat}

Observe that in SB1 we could share the common path that creates the bloom 
filter (SEL - GB - RS - GB) to obtain a plan like SB2.

+SB2+
{noformat}
   / RS -> TS[Author] - FIL[in_bloom(hash(fname,lname))]
 SOURCE - SEL[fname,lname] - GB - RS - GB -
   \ RS -> TS[Editor] - FIL[in_bloom(hash(fname,lname))]
{noformat}
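The single shared branch in SB2 works because every consumer probes the same filter with the same combined key, hash(fname, lname). A minimal bloom filter sketch of that idea (the hashing scheme and sizes here are illustrative, not Hive's actual implementation):

```java
import java.util.BitSet;
import java.util.Objects;

// Minimal bloom filter; hashing is illustrative, not Hive's implementation.
public class MultiColumnBloom {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    MultiColumnBloom(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // The multi column key: both columns are folded into one hash, so one
    // filter (one shared SEL - GB - RS - GB branch) serves every consumer.
    static int hashColumns(String fname, String lname) {
        return Objects.hash(fname, lname);
    }

    // Derive the i-th probe position from the combined key.
    private int bit(int key, int i) {
        int h = key * 31 + i * 0x9e3779b9;
        return Math.floorMod(h, size);
    }

    void add(int key) {
        for (int i = 0; i < numHashes; i++) bits.set(bit(key, i));
    }

    // May return false positives, never false negatives.
    boolean mightContain(int key) {
        for (int i = 0; i < numHashes; i++)
            if (!bits.get(bit(key, i))) return false;
        return true;
    }

    public static void main(String[] args) {
        MultiColumnBloom bloom = new MultiColumnBloom(1024, 3);
        // Built once on the shared branch:
        bloom.add(hashColumns("Ada", "Lovelace"));
        // Both the TS[Author] and TS[Editor] consumers probe the same filter:
        System.out.println(bloom.mightContain(hashColumns("Ada", "Lovelace"))); // prints "true"
    }
}
```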





[jira] [Created] (HIVE-24015) Disable query-based compaction on MR execution engine

2020-08-07 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-24015:


 Summary: Disable query-based compaction on MR execution engine
 Key: HIVE-24015
 URL: https://issues.apache.org/jira/browse/HIVE-24015
 Project: Hive
  Issue Type: Task
Reporter: Karen Coppage
Assignee: Karen Coppage


Major compaction can be run when the execution engine is MR. This can cause 
data loss as in HIVE-23703 (the fix for data loss when the execution engine is 
MR was reverted by HIVE-23763).
Currently, query-based minor compaction can only be run when the execution 
engine is Tez; otherwise it falls back to (non-query-based) MR compaction. We 
should extend this behavior to major compaction as well.





[jira] [Created] (HIVE-24014) Need to delete DumpDirectoryCleanerTask

2020-08-07 Thread Arko Sharma (Jira)
Arko Sharma created HIVE-24014:
--

 Summary: Need to delete DumpDirectoryCleanerTask
 Key: HIVE-24014
 URL: https://issues.apache.org/jira/browse/HIVE-24014
 Project: Hive
  Issue Type: Bug
Reporter: Arko Sharma
Assignee: Arko Sharma


With the newer implementation, every dump operation cleans up the 
dump directory previously consumed by the load operation. Hence, for a policy, 
there will be at most one dump directory. Also, the dump directory base 
location config is now a policy-level config, and hence this DumpDirCleanerTask 
will not be effective.


