[jira] [Created] (HIVE-23743) hive-druid-handler shaded jar doesn't include maven-artifact classes

2020-06-22 Thread Hankó Gergely (Jira)
Hankó Gergely created HIVE-23743:


 Summary: hive-druid-handler shaded jar doesn't include 
maven-artifact classes
 Key: HIVE-23743
 URL: https://issues.apache.org/jira/browse/HIVE-23743
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: Hankó Gergely
Assignee: Nishant Bangarwa


hive-druid-handler depends on the druid-processing jar, which in turn depends 
on classes from the maven-artifact jar. These classes are not included in the 
shaded jar, so the following exception may occur:
{code:java}
java.lang.ClassNotFoundException: org.apache.maven.artifact.versioning.ArtifactVersion
...
at org.apache.hive.druid.org.apache.druid.query.ordering.StringComparators.<clinit>(StringComparators.java:44)
at org.apache.hive.druid.org.apache.druid.query.ordering.StringComparator.fromString(StringComparator.java:35)
{code}
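A quick way to confirm the problem is to try loading the class in isolation 
from the shaded jar. Below is a minimal diagnostic sketch, not part of the 
Hive build; the jar path is an assumption and should be adjusted to the actual 
build output:

{code:java}
import java.net.URL;
import java.net.URLClassLoader;

// Diagnostic sketch (hypothetical): checks whether the shaded
// hive-druid-handler jar bundles the maven-artifact classes that
// druid-processing needs at runtime.
public class ShadedJarCheck {
    public static void main(String[] args) throws Exception {
        // jar path is an assumption; point this at the actual shaded jar
        URL jar = new URL("file:druid-handler/target/hive-druid-handler.jar");
        try (URLClassLoader cl = new URLClassLoader(new URL[]{jar}, null)) {
            cl.loadClass("org.apache.maven.artifact.versioning.ArtifactVersion");
            System.out.println("maven-artifact classes are bundled");
        } catch (ClassNotFoundException e) {
            System.out.println("maven-artifact classes are missing from the shaded jar");
        }
    }
}
{code}

If the class fails to load, including maven-artifact in the shade 
configuration of druid-handler's pom.xml is the likely fix.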



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23744) Reduce query startup latency

2020-06-22 Thread Mustafa Iman (Jira)
Mustafa Iman created HIVE-23744:
---

 Summary: Reduce query startup latency
 Key: HIVE-23744
 URL: https://issues.apache.org/jira/browse/HIVE-23744
 Project: Hive
  Issue Type: Task
  Components: llap
Affects Versions: 4.0.0
Reporter: Mustafa Iman
Assignee: Mustafa Iman
 Attachments: am_schedule_and_transmit.png, task_start.png

When I run queries with a large number of tasks for a single vertex, I see a 
significant delay before all tasks start execution in the LLAP daemons.

Although the LLAP daemons have free capacity to run the tasks, it takes 
significant time to schedule all the tasks in the AM and actually transmit 
them to the executors.

"am_schedule_and_transmit" shows the scheduling of the tasks of TPC-DS query 
55, restricted to the tasks scheduled for one of the 10 LLAP daemons. The 
scheduler works in a single thread, scheduling tasks one by one, so a delay in 
scheduling one task delays all subsequent tasks.

!am_schedule_and_transmit.png|width=831,height=573!

 

Another issue is that it takes a long time to fill all the execution slots in 
the LLAP daemons even though they are all empty initially. This is caused by 
LlapTaskCommunicator using a fixed number of threads (10 by default) to send 
the tasks to the daemons. Moreover, this communication is synchronous, so the 
threads stay idle while blocked on the network. "task_start.png" shows the 
running tasks on an LLAP daemon that has 12 execution slots. By the time the 
12th task starts running, more than 100 ms have already passed, and that slot 
stays idle the whole time.

!task_start.png|width=1166,height=635!
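A back-of-envelope model of the dispatch bottleneck follows; all numbers 
except the 10-thread default are assumptions for illustration. With a fixed 
pool of sender threads and synchronous RPCs, dispatch throughput is bounded by 
the pool size divided by the per-task round-trip latency, no matter how many 
execution slots are free.

{code:java}
// Hypothetical model of the dispatch bottleneck; rpcMillis and slotsToFill
// are assumed numbers, senderThreads is the documented default.
public class DispatchModel {
    public static void main(String[] args) {
        int senderThreads = 10;   // LlapTaskCommunicator default thread count
        double rpcMillis = 10.0;  // assumed per-task synchronous submit latency
        int slotsToFill = 120;    // e.g. 10 daemons x 12 execution slots

        // each "round" of senderThreads parallel synchronous sends takes ~rpcMillis
        double millisToFill = Math.ceil((double) slotsToFill / senderThreads) * rpcMillis;
        System.out.printf("~%.0f ms to fill %d slots%n", millisToFill, slotsToFill);
    }
}
{code}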



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23747) Increase the number of parallel tasks sent to daemons from am

2020-06-22 Thread Mustafa Iman (Jira)
Mustafa Iman created HIVE-23747:
---

 Summary: Increase the number of parallel tasks sent to daemons 
from am
 Key: HIVE-23747
 URL: https://issues.apache.org/jira/browse/HIVE-23747
 Project: Hive
  Issue Type: Sub-task
Reporter: Mustafa Iman
Assignee: Mustafa Iman


The number of in-flight tasks from the AM to a single executor is currently 
hardcoded to 1 
([https://github.com/apache/hive/blob/master/llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java#L57]). 
It does not make sense to increase this right now, as communication between 
the AM and the daemons happens synchronously anyway. After 
https://issues.apache.org/jira/browse/HIVE-23746 is resolved, this should be 
increased to at least the number of execution slots per daemon.
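A minimal sketch of the eventual change, with hypothetical names (the actual 
constant and configuration surface in LlapProtocolClientProxy may differ):

{code:java}
// Hypothetical sketch: derive the per-node in-flight limit from the daemon's
// executor count instead of the current hardcoded value of 1.
public class InflightLimitSketch {
    public static void main(String[] args) {
        int numExecutorsPerDaemon = 12; // assumed; would come from LLAP configuration
        // was: a hardcoded limit of 1 request in flight per node
        int maxInflightPerNode = Math.max(1, numExecutorsPerDaemon);
        System.out.println("in-flight limit per node: " + maxInflightPerNode);
    }
}
{code}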



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23746) Send task attempts async from AM to daemons

2020-06-22 Thread Mustafa Iman (Jira)
Mustafa Iman created HIVE-23746:
---

 Summary: Send task attempts async from AM to daemons
 Key: HIVE-23746
 URL: https://issues.apache.org/jira/browse/HIVE-23746
 Project: Hive
  Issue Type: Sub-task
  Components: llap
Reporter: Mustafa Iman
Assignee: Mustafa Iman


LlapTaskCommunicator uses a synchronous client to send task attempts, with a 
fixed number of communication threads (10 by default). This causes unnecessary 
delays: even when there are enough free execution slots in the daemons, the 
daemons do not receive all the tasks because of this bottleneck. 
LlapTaskCommunicator could use an asynchronous client to pass these tasks to 
the daemons instead.
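A minimal, self-contained sketch of the asynchronous submission pattern; the 
submitWork stub and all names are hypothetical, not Hive's actual RPC API. The 
point is that the scheduling thread hands the request off and continues 
immediately; a real async client would additionally avoid parking a pool 
thread per in-flight request.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of async task submission; submitWork stands in for the
// real RPC and simulates a 10 ms round trip.
public class AsyncSubmitSketch {
    static String submitWork(int taskId) {
        try { Thread.sleep(10); } catch (InterruptedException ignored) { }
        return "ACCEPTED task-" + taskId;
    }

    public static void main(String[] args) {
        ExecutorService rpcPool = Executors.newFixedThreadPool(10);
        CompletableFuture<?>[] inflight = new CompletableFuture<?>[120];
        for (int i = 0; i < inflight.length; i++) {
            final int taskId = i;
            // the caller never blocks on the response; completion is handled
            // by a callback instead of a dedicated waiting thread
            inflight[i] = CompletableFuture
                .supplyAsync(() -> submitWork(taskId), rpcPool)
                .thenAccept(System.out::println);
        }
        CompletableFuture.allOf(inflight).join(); // wait only for the demo
        rpcPool.shutdown();
    }
}
{code}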



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23745) Avoid copying userpayload in task communicator

2020-06-22 Thread Mustafa Iman (Jira)
Mustafa Iman created HIVE-23745:
---

 Summary: Avoid copying userpayload in task communicator
 Key: HIVE-23745
 URL: https://issues.apache.org/jira/browse/HIVE-23745
 Project: Hive
  Issue Type: Sub-task
Reporter: Mustafa Iman
Assignee: Mustafa Iman


[https://github.com/apache/hive/blob/master/llap-common/src/java/org/apache/hadoop/hive/llap/tez/Converters.java#L182]

This copy sometimes takes a few milliseconds. The delay adds up across all the 
tasks of a single vertex, because LlapTaskCommunicator processes tasks one by 
one. The user payload never changes on this code path; the copy is made only 
because of a limitation of the Protobuf library. Protobuf 3.1 added an 
UnsafeByteOperations class that avoids copying ByteBuffers, so this can be 
resolved once Protobuf is upgraded.
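A minimal sketch of the two options, assuming protobuf-java 3.1+ on the 
classpath: ByteString.copyFrom duplicates the payload on every call, while 
UnsafeByteOperations.unsafeWrap shares the backing buffer, which is safe on 
this code path only because the payload is never mutated afterwards.

{code:java}
import java.nio.ByteBuffer;
import com.google.protobuf.ByteString;
import com.google.protobuf.UnsafeByteOperations;

// Contrast of the copying and zero-copy paths (requires protobuf-java 3.1+).
public class PayloadWrapSketch {
    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap(new byte[16 * 1024 * 1024]);

        ByteString copied = ByteString.copyFrom(payload.duplicate());              // O(n) copy
        ByteString wrapped = UnsafeByteOperations.unsafeWrap(payload.duplicate()); // zero-copy

        System.out.println(copied.size() + " bytes copied, "
            + wrapped.size() + " bytes wrapped");
    }
}
{code}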



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23748) Tez task with File Merge operator generates tmp file with wrong suffix

2020-06-22 Thread wanguangping (Jira)
wanguangping created HIVE-23748:
---

 Summary: Tez task with File Merge operator generates tmp file 
with wrong suffix
 Key: HIVE-23748
 URL: https://issues.apache.org/jira/browse/HIVE-23748
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 3.1.0
Reporter: wanguangping


h1. Background
 * SQL on Tez
 * It is an occasional (intermittent) problem

h1. HiveServer2 log

[^hiveserver2 log.txt]


--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23749) Improve Hive Hook Documentation

2020-06-22 Thread liuyan (Jira)
liuyan created HIVE-23749:
-

 Summary: Improve Hive Hook Documentation 
 Key: HIVE-23749
 URL: https://issues.apache.org/jira/browse/HIVE-23749
 Project: Hive
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
Reporter: liuyan


h5. It seems we have little documentation and few examples around 
_"org.apache.hadoop.hive.ql.hooks"_ on how to develop a hook and use it with:

hive.exec.post.hooks
hive.exec.pre.hooks
hive.exec.failure.hooks
hive.query.lifetime.hooks

A minimal example sketch follows below.
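For illustration, a minimal post-execution hook sketch; the class and package 
names are hypothetical, while ExecuteWithHookContext is the actual interface 
in org.apache.hadoop.hive.ql.hooks:

{code:java}
package com.example.hooks; // hypothetical package

import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

// Logs the query string after each query; wired in via hive.exec.post.hooks.
public class QueryLoggingHook implements ExecuteWithHookContext {
  @Override
  public void run(HookContext hookContext) throws Exception {
    System.err.println("query completed: "
        + hookContext.getQueryPlan().getQueryStr());
  }
}
{code}

The hook would then be enabled with something like 
set hive.exec.post.hooks=com.example.hooks.QueryLoggingHook; 
and the class must be on HiveServer2's classpath.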



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23738) DBLockManager::lock() : Move lock request to debug level

2020-06-22 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-23738:
---

 Summary: DBLockManager::lock() : Move lock request to debug level
 Key: HIVE-23738
 URL: https://issues.apache.org/jira/browse/HIVE-23738
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java#L102]

 

For Q78 at 30 TB scale, it ends up dumping a couple of MBs of log at INFO 
level just to print the lock request. If possible, this should be moved to 
DEBUG level.
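A minimal sketch of the proposed change; the surrounding DbLockManager code is 
elided and the message text is hypothetical, but the SLF4J pattern is standard:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch: demote the per-request lock dump from INFO to DEBUG. Parameterized
// logging already defers the expensive toString() until the level is enabled;
// the explicit guard just makes the intent obvious.
public class LockLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LockLoggingSketch.class);

  void logLockRequest(Object lockRequest) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Requesting: {}", lockRequest); // was logged at INFO
    }
  }
}
{code}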



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced

2020-06-22 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23736:
-

 Summary: Disable topn in ReduceSinkOp if a TNK is introduced
 Key: HIVE-23736
 URL: https://issues.apache.org/jira/browse/HIVE-23736
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Both the ReduceSink and the TopNKey operator have top-n key filtering 
functionality. If a TopNKey (TNK) operator is introduced, the filtering is 
done twice, so the top-n filtering in the ReduceSink operator should be 
disabled in that case.
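A self-contained toy sketch of the idea (not Hive's actual classes; the -1 
"disabled" sentinel and the accessor names are assumptions): once a TopNKey 
operator enforces the limit upstream, the ReduceSink's own top-n limit is 
cleared.

{code:java}
// Toy sketch: clear the ReduceSink's top-n limit when a TopNKey operator
// already filters the rows, so the limit is not applied twice.
public class TopNDedupSketch {
  static final class ReduceSinkDescToy {
    private int topN = 100;             // limit pushed into the ReduceSink
    int getTopN() { return topN; }
    void setTopN(int n) { topN = n; }
  }

  static void disableRedundantTopN(ReduceSinkDescToy rsDesc, boolean hasTopNKeyParent) {
    if (hasTopNKeyParent && rsDesc.getTopN() >= 0) {
      rsDesc.setTopN(-1);               // TopNKey already enforces the limit
    }
  }

  public static void main(String[] args) {
    ReduceSinkDescToy rs = new ReduceSinkDescToy();
    disableRedundantTopN(rs, true);
    System.out.println("topN after dedup: " + rs.getTopN()); // prints -1
  }
}
{code}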



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-06-22 Thread Syed Shameerur Rahman (Jira)
Syed Shameerur Rahman created HIVE-23737:


 Summary: LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle 
Handler Instead Of LLAP's dagDelete
 Key: HIVE-23737
 URL: https://issues.apache.org/jira/browse/HIVE-23737
 Project: Hive
  Issue Type: Improvement
Reporter: Syed Shameerur Rahman
Assignee: Syed Shameerur Rahman


LLAP has a dagDelete feature, added as part of HIVE-9911. Now that Tez has 
added support for dagDelete in its custom shuffle handler (TEZ-3362), we could 
reuse that feature in LLAP.
There are some added advantages to using Tez's dagDelete feature rather than 
LLAP's current one:

1) We can easily extend this feature to accommodate upcoming features such as 
vertex-level and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 
and TEZ-4129.

2) It will be easier to maintain this feature by separating it out from Hive's 
code path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23741) Store CacheTags in the file cache level

2020-06-22 Thread Antal Sinkovits (Jira)
Antal Sinkovits created HIVE-23741:
--

 Summary: Store CacheTags in the file cache level
 Key: HIVE-23741
 URL: https://issues.apache.org/jira/browse/HIVE-23741
 Project: Hive
  Issue Type: Improvement
Reporter: Antal Sinkovits
Assignee: Antal Sinkovits


CacheTags are currently stored for every data buffer. The strings are 
interned, but the number of cache tag objects can be reduced further by moving 
them to the file cache level and back-referencing them from the buffers.
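A self-contained sketch of the proposed layout (names are hypothetical, not 
Hive's actual classes): one CacheTag instance lives in the file-level entry, 
and each buffer reaches its tag through a back reference instead of holding 
its own copy.

{code:java}
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one CacheTag per file entry, shared by all buffers.
public class FileCacheSketch {
  static final class CacheTag {
    final String tag;
    CacheTag(String tag) { this.tag = tag; }
  }

  static final class FileEntry {
    final CacheTag tag;                 // single tag for the whole file
    FileEntry(CacheTag tag) { this.tag = tag; }
  }

  static final class DataBuffer {
    final FileEntry file;               // back reference; no per-buffer tag
    DataBuffer(FileEntry file) { this.file = file; }
    CacheTag tag() { return file.tag; }
  }

  public static void main(String[] args) {
    ConcurrentHashMap<String, FileEntry> fileCache = new ConcurrentHashMap<>();
    FileEntry f = fileCache.computeIfAbsent("warehouse/t/f0.orc",
        p -> new FileEntry(new CacheTag("db.table/part=0")));
    DataBuffer b1 = new DataBuffer(f);
    DataBuffer b2 = new DataBuffer(f);
    System.out.println(b1.tag() == b2.tag()); // true: one shared tag object
  }
}
{code}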



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23739) ShuffleHandler: Unordered partitioned data could be evicted immediately after transfer

2020-06-22 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-23739:
---

 Summary: ShuffleHandler: Unordered partitioned data could be 
evicted immediately after transfer
 Key: HIVE-23739
 URL: https://issues.apache.org/jira/browse/HIVE-23739
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


[https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L1019]

This is not optimal for unordered partitioned data, as it ends up evicting the 
data immediately after the transfer (e.g. Q78).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23740) [Hive] delete from <table>; without where clause not giving correct error msg

2020-06-22 Thread ABHISHEK KUMAR GUPTA (Jira)
ABHISHEK KUMAR GUPTA created HIVE-23740:
---

 Summary: [Hive] delete from <table>; without where clause not 
giving correct error msg
 Key: HIVE-23740
 URL: https://issues.apache.org/jira/browse/HIVE-23740
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.0
Reporter: ABHISHEK KUMAR GUPTA


Created a Hive table from Hive, inserted data, and fired delete from <table>;:

CREATE TABLE insert_only (key int, value string) STORED AS ORC
 TBLPROPERTIES ("transactional"="true", 
"transactional_properties"="insert_only");
 INSERT INTO insert_only VALUES (13,'BAD'), (14,'SUCCESS');
 delete from insert_only;

Error thrown:
Error: Error while compiling statement: FAILED: SemanticException [Error 
10414]: Attempt to do update or delete on table hive.insert_only that is 
insert-only transactional (state=42000,code=10414)

Expectation:
It should complain that the WHERE clause is missing, because Hive provides 
truncate table <table>; to delete all the content of a table.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Hive TPC-DS metastore dumps in Postgres

2020-06-22 Thread Stamatis Zampetakis
Hey guys,

I put up a small project on GitHub [1] with Hive metastore dumps from
tpcds10tb/tpcds30tb (+partitioning) and some scripts to quickly spin up a
dockerized Postgres with those loaded.

Personally, I find it useful to check the plans of TPC-DS queries using the
usual qtest mechanism (without external tools and tapping into a real
cluster) having at hand beefy stats + partitioning info. The driver and
other changes needed to run these tests are located in [2].

I am sharing it here in case it might be of use to somebody else.

The two main commands that you will need if you wanna try this out:
docker build --tag postgres-tpcds-metastore:1.0 .
mvn test -Dtest=TestTezPerfDBCliDriver -Dtest.output.overwrite=true
-Dtest.metastore.db=postgres.tpcds

Small caveat: Currently in [2] the dockerized postgres is restarted for
every query which makes things slow. This will be fixed later on.

Best,
Stamatis

[1] https://github.com/zabetak/hive-postgres-metastore
[2] https://github.com/zabetak/hive/tree/qtest_postgres_driver


[jira] [Created] (HIVE-23742) Remove unintentional execution of TPC-DS query39 in qtests

2020-06-22 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-23742:
--

 Summary: Remove unintentional execution of TPC-DS query39 in qtests
 Key: HIVE-23742
 URL: https://issues.apache.org/jira/browse/HIVE-23742
 Project: Hive
  Issue Type: Task
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The TPC-DS queries under clientpositive/perf are meant only to check for plan 
regressions, so they should never actually be executed. Thus the execution 
part should be removed from query39.q and cbo_query39.q.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)