[jira] [Created] (HIVE-20091) Tez: Add security credentials for FileSinkOperator output

2018-07-04 Thread Matt McCline (JIRA)
Matt McCline created HIVE-20091:
---

 Summary: Tez: Add security credentials for FileSinkOperator output
 Key: HIVE-20091
 URL: https://issues.apache.org/jira/browse/HIVE-20091
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


DagUtils needs to add security credentials for the output for the 
FileSinkOperator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20090) Extend creation of semijoin reduction filters to be able to discover new opportunities

2018-07-04 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-20090:
--

 Summary: Extend creation of semijoin reduction filters to be able 
to discover new opportunities
 Key: HIVE-20090
 URL: https://issues.apache.org/jira/browse/HIVE-20090
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Assume the following plan:
{noformat}
TS[0] - RS[1] - JOIN[4] - RS[5] - JOIN[8] - FS[9]
TS[2] - RS[3] - JOIN[4] 
TS[6] - RS[7] - JOIN[8]
{noformat}

Currently, {{TS[6]}} may only be reduced with the output of {{RS[5]}}, i.e., 
the input to the join between both subplans.
However, it may be useful to consider other possibilities too, e.g., reducing 
it with the output of {{RS[1]}} or {{RS[3]}}. For instance, this is important 
when, given a large plan, an edge between {{RS[5]}} and {{TS[0]}} would create 
a cycle, while an edge between {{RS[1]}} and {{TS[6]}} would not.

This patch comprises two parts. First, it creates additional predicates when 
possible. Second, it removes duplicate semijoin reduction branches/predicates, 
e.g., when another semijoin that consumes the output of the same expression 
already reduces a certain table scan operator (a heuristic, since this may not 
yield the most efficient plan in all cases). Ultimately, the decision on which 
one to use should be cost-driven (follow-up).
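The deduplication heuristic described above can be sketched roughly as follows. This is an illustrative model, not Hive's actual planner code: a semijoin reduction branch is keyed by the expression it consumes and the table scan it reduces, and when two branches share both, only the first is kept. All class and field names here are hypothetical.

```java
import java.util.*;

public class SemijoinDedup {
    // A branch: which RS output feeds it and which TS it filters.
    // Records compare by component, so identical pairs are duplicates.
    record Branch(String sourceExpr, String targetTs) {}

    // Keep the first occurrence of each (sourceExpr, targetTs) pair.
    static List<Branch> dedupe(List<Branch> branches) {
        Set<Branch> seen = new LinkedHashSet<>(branches);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Branch> branches = List.of(
            new Branch("RS[5].key", "TS[6]"),
            new Branch("RS[1].key", "TS[6]"),   // newly discovered opportunity
            new Branch("RS[5].key", "TS[6]"));  // duplicate, removed
        System.out.println(dedupe(branches).size()); // 2 distinct branches remain
    }
}
```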





[jira] [Created] (HIVE-20089) CTAS doesn't work into nonexisting /tmp/... directory while CT works

2018-07-04 Thread Laszlo Bodor (JIRA)
Laszlo Bodor created HIVE-20089:
---

 Summary: CTAS doesn't work into nonexisting /tmp/... directory 
while CT works
 Key: HIVE-20089
 URL: https://issues.apache.org/jira/browse/HIVE-20089
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Laszlo Bodor


While checking negative qtests I've found some strange behavior regarding CT 
and CTAS statements.

ct_noperm_loc.q
ctas_noperm_loc.q

The common part of these tests is the initialization:
{code}
set hive.test.authz.sstd.hs2.mode=true;
set hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
set hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
set hive.security.authorization.enabled=true;
set user.name=user1;
{code}

 

But while a simple 'create table' into a nonexistent directory works...
{code}
create table foo0(id int) location 'hdfs:///tmp/ct_noperm_loc_foo0';
{code}

...'create table as select' doesn't work:
{code}
create table foo0 location 'hdfs:///tmp/ctas_noperm_loc_foo0' as select 1 as c1;
{code}

The expected result is:
{code}
FAILED: HiveAccessControlException Permission denied: Principal [name=user1, 
type=USER] does not have following privileges for operation 
CREATETABLE_AS_SELECT [[INSERT, DELETE] on Object [type=DFS_URI, 
name=hdfs://### HDFS PATH ###]]
{code}

 

Is this by design, or am I missing something here?

 

{code}
mvn test -Dtest=TestNegativeMinimrCliDriver -Dqfile=ct_noperm_loc.q 
-Pitests,hadoop-2 -pl itests/qtest
mvn test -Dtest=TestNegativeMinimrCliDriver -Dqfile=ctas_noperm_loc.q 
-Pitests,hadoop-2 -pl itests/qtest
{code}
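The observed asymmetry can be modeled as a privilege table. This is an illustrative sketch, not Hive's actual authorization code: the assumption (based on the error message above) is that under SQL standard authorization, CREATETABLE_AS_SELECT also writes data into the target location and is therefore checked for INSERT/DELETE on the DFS_URI, while plain CREATETABLE is not.

```java
import java.util.*;

public class CtasPrivilegeModel {
    // Hypothetical map of operation -> privileges required on the target URI,
    // inferred from the observed behavior; not Hive's real privilege tables.
    static final Map<String, Set<String>> REQUIRED_URI_PRIVS = Map.of(
        "CREATETABLE", Set.of(),
        "CREATETABLE_AS_SELECT", Set.of("INSERT", "DELETE"));

    static boolean allowed(String op, Set<String> grantedOnUri) {
        return grantedOnUri.containsAll(
            REQUIRED_URI_PRIVS.getOrDefault(op, Set.of()));
    }

    public static void main(String[] args) {
        Set<String> user1 = Set.of(); // user1 holds no privileges on the URI
        System.out.println(allowed("CREATETABLE", user1));            // true
        System.out.println(allowed("CREATETABLE_AS_SELECT", user1));  // false
    }
}
```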

 





[jira] [Created] (HIVE-20088) Beeline config location path is assembled incorrectly

2018-07-04 Thread Denes Bodo (JIRA)
Denes Bodo created HIVE-20088:
-

 Summary: Beeline config location path is assembled incorrectly
 Key: HIVE-20088
 URL: https://issues.apache.org/jira/browse/HIVE-20088
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 3.0.0
Reporter: Denes Bodo
Assignee: Denes Bodo


Checking the code in

[https://github.com/apache/hive/blob/branch-3/beeline/src/java/org/apache/hive/beeline/hs2connection/UserHS2ConnectionFileParser.java]
or in
[https://github.com/apache/hive/blob/branch-3/beeline/src/java/org/apache/hive/beeline/hs2connection/BeelineSiteParser.java]
I see
{code}locations.add(ETC_HIVE_CONF_LOCATION + DEFAULT_BEELINE_SITE_FILE_NAME);{code}
where a file separator should be used:
{code}locations.add(ETC_HIVE_CONF_LOCATION + File.separator + DEFAULT_BEELINE_SITE_FILE_NAME);{code}
Due to this, BeeLine cannot use the configuration file when this location 
would be the only place to find it.

In my hadoop-3 setup, the locations list contains the following:
{code}
/home/myuser/.beeline/beeline-site.xml
/etc/hive/confbeeline-site.xml
{code}
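A minimal standalone demonstration of the concatenation bug: without File.separator, the directory name and the file name fuse into one path component. The constant values below are taken from the observed output; treat them as illustrative stand-ins for the parser's real constants.

```java
import java.io.File;

public class BeelineSitePath {
    static final String ETC_HIVE_CONF_LOCATION = "/etc/hive/conf";
    static final String DEFAULT_BEELINE_SITE_FILE_NAME = "beeline-site.xml";

    // Current (buggy) concatenation: no separator between dir and file name.
    static String buggy() {
        return ETC_HIVE_CONF_LOCATION + DEFAULT_BEELINE_SITE_FILE_NAME;
    }

    // Proposed fix: insert the platform file separator.
    static String fixed() {
        return ETC_HIVE_CONF_LOCATION + File.separator
             + DEFAULT_BEELINE_SITE_FILE_NAME;
    }

    public static void main(String[] args) {
        System.out.println(buggy()); // /etc/hive/confbeeline-site.xml
        System.out.println(fixed()); // /etc/hive/conf/beeline-site.xml on Unix
    }
}
```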





[jira] [Created] (HIVE-20087) Fix reoptimization for semijoin reduction cases

2018-07-04 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-20087:
---

 Summary: Fix reoptimization for semijoin reduction cases
 Key: HIVE-20087
 URL: https://issues.apache.org/jira/browse/HIVE-20087
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


The real TS will get further information about the other table, which makes 
the physically read record count inaccurate.





[jira] [Created] (HIVE-20086) Druid-hive kafka ingestion: indexing tasks kept running even after setting 'druid.kafka.ingestion' = 'STOP'

2018-07-04 Thread Dileep Kumar Chiguruvada (JIRA)
Dileep Kumar Chiguruvada created HIVE-20086:
---

 Summary: Druid-hive kafka ingestion: indexing tasks kept running 
even after setting 'druid.kafka.ingestion' = 'STOP'
 Key: HIVE-20086
 URL: https://issues.apache.org/jira/browse/HIVE-20086
 Project: Hive
  Issue Type: Bug
  Components: Hive, StorageHandler
Affects Versions: 3.0.0
Reporter: Dileep Kumar Chiguruvada
 Fix For: 3.0.0
 Attachments: Screen Shot 2018-07-02 at 8.51.10 PM.png

Druid-hive kafka ingestion: indexing tasks kept running even after setting 
'druid.kafka.ingestion' = 'STOP'.

When ingestion is started ('druid.kafka.ingestion' = 'START'), the indexing 
task starts running and is able to load rows into the Druid-Hive table.

But after stopping it, the indexing task still keeps running without shutting 
down gracefully.

The issue is that every START of ingestion pools up additional indexing tasks.







[jira] [Created] (HIVE-20085) Druid-Hive (managed) table creation fails with strict managed table checks: Table is marked as a managed table but is not transactional

2018-07-04 Thread Dileep Kumar Chiguruvada (JIRA)
Dileep Kumar Chiguruvada created HIVE-20085:
---

 Summary: Druid-Hive (managed) table creation fails with strict 
managed table checks: Table is marked as a managed table but is not 
transactional
 Key: HIVE-20085
 URL: https://issues.apache.org/jira/browse/HIVE-20085
 Project: Hive
  Issue Type: Bug
  Components: Hive, StorageHandler
Affects Versions: 3.0.0
Reporter: Dileep Kumar Chiguruvada
 Fix For: 3.0.0


Druid-Hive (managed) table creation fails with strict managed table checks: 
Table is marked as a managed table but is not transactional

{code}
drop table if exists calcs;
create table calcs
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
"druid.segment.granularity" = "MONTH",
"druid.query.granularity" = "DAY")
AS SELECT
cast(datetime0 as timestamp with local time zone) `__time`,
key,
str0, str1, str2, str3,
date0, date1, date2, date3,
time0, time1,
datetime0, datetime1,
zzz,
cast(bool0 as string) bool0,
cast(bool1 as string) bool1,
cast(bool2 as string) bool2,
cast(bool3 as string) bool3,
int0, int1, int2, int3,
num0, num1, num2, num3, num4
from tableau_orc.calcs;

2018-07-03 04:57:31,911|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : Status: Running 
(Executing on YARN cluster with App id application_1530592209763_0009)
...
...
2018-07-03 04:57:36,334|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : SHUFFLE_BYTES_TO_MEM: 0
2018-07-03 04:57:36,334|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : SHUFFLE_PHASE_TIME: 330
2018-07-03 04:57:36,334|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : SPILLED_RECORDS: 17
2018-07-03 04:57:36,334|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : 
TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
2018-07-03 04:57:36,335|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : OUTPUT_RECORDS: 0
2018-07-03 04:57:36,335|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : 
org.apache.hadoop.hive.llap.counters.LlapWmCounters:
2018-07-03 04:57:36,335|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : GUARANTEED_QUEUED_NS: 0
2018-07-03 04:57:36,335|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : GUARANTEED_RUNNING_NS: 0
2018-07-03 04:57:36,335|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : SPECULATIVE_QUEUED_NS: 
2162643606
2018-07-03 04:57:36,335|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : SPECULATIVE_RUNNING_NS: 
12151664909
2018-07-03 04:57:36,335|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : Starting task 
[Stage-2:DEPENDENCY_COLLECTION] in serial mode
2018-07-03 04:57:36,335|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : Starting task 
[Stage-0:MOVE] in serial mode
2018-07-03 04:57:36,336|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : Moving data to 
directory 
hdfs://mycluster/warehouse/tablespace/managed/hive/druid_tableau.db/calcs from 
hdfs://mycluster/warehouse/tablespace/managed/hive/druid_tableau.db/.hive-staging_hive_2018-07-03_04-57-27_351_7124633902209008283-3/-ext-10002
2018-07-03 04:57:36,336|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : Starting task 
[Stage-4:DDL] in serial mode
2018-07-03 04:57:36,336|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|ERROR : FAILED: Execution 
Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:Table druid_tableau.calcs failed strict managed table 
checks due to the following reason: Table is marked as a managed table but is 
not transactional.)
2018-07-03 04:57:36,336|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|INFO : Completed executing 
command(queryId=hive_20180703045727_c39c40d2-7d4a-46c7-a36d-7925e7c4a788); Time 
taken: 6.794 seconds
2018-07-03 04:57:36,337|INFO|Thread-721|machine.py:111 - 
tee_pipe()||aa121a45-29eb-48a8-8628-ae5368aa172d|Error: Error while processing 
statement: FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Table 
druid_tableau.calcs failed strict managed table checks due to the following 
reason: Table is marked as a managed table but is not transactional.) 
(state=08S01,code=1)
{code}
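The failing check in the log above can be modeled as follows. This is an illustrative sketch, not Hive's actual metastore code: the assumption is that strict managed table checks require any MANAGED table to be transactional, which the Druid storage-handler table is not.

```java
public class StrictManagedCheck {
    // Hypothetical reduction of the strict managed table check: a table
    // marked managed but not transactional is rejected at DDL time.
    static String validate(boolean managed, boolean transactional) {
        if (managed && !transactional) {
            return "failed strict managed table checks: "
                 + "Table is marked as a managed table but is not transactional.";
        }
        return "OK";
    }

    public static void main(String[] args) {
        // Druid storage-handler CTAS creates a managed, non-transactional table:
        System.out.println(validate(true, false));  // rejected
        // An external table would bypass the managed check:
        System.out.println(validate(false, false)); // OK
    }
}
```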

This does not allow Druid tables to be managed, so creating Druid tables is 
not straightforward.

While trying to switch to external tables, we see the issues below:
1)