Build failing on latest Apache master

2017-01-11 Thread Deepak Jaiswal
Hi all,

The build is failing on the latest Apache master in the itests directory. 
Please find attached the last few lines of the build log.

Regards,
Deepak


[jira] [Created] (HIVE-15591) Hive can not use "," in quoted column name

2017-01-11 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-15591:
--

 Summary: Hive can not use "," in quoted column name
 Key: HIVE-15591
 URL: https://issues.apache.org/jira/browse/HIVE-15591
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15590) add separate spnego principal config for LLAP Web UI

2017-01-11 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-15590:
---

 Summary: add separate spnego principal config for LLAP Web UI
 Key: HIVE-15590
 URL: https://issues.apache.org/jira/browse/HIVE-15590
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-15590.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15589) Remove redundant test from TestDbTxnManager.testHeartbeater

2017-01-11 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15589:


 Summary: Remove redundant test from 
TestDbTxnManager.testHeartbeater
 Key: HIVE-15589
 URL: https://issues.apache.org/jira/browse/HIVE-15589
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Case 1 claims there's no delay for the heartbeat startup, but in fact, when the 
delay is specified as 0, we unconditionally set the delay to 
HiveConf.ConfVars.HIVE_TXN_TIMEOUT / 2. So case 1 is not needed, as it's 
covered by case 2.
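The delay logic described above can be sketched as follows (a minimal
sketch, not the actual DbTxnManager code; the constant stands in for
HiveConf.ConfVars.HIVE_TXN_TIMEOUT):

```java
// Minimal sketch (not the actual Hive code) of the heartbeat delay logic
// described above: a requested delay of 0 is unconditionally replaced by
// half the transaction timeout, so case 1 (delay == 0) exercises the same
// path as case 2.
public class HeartbeatDelaySketch {
    static long effectiveDelayMs(long requestedDelayMs, long txnTimeoutMs) {
        if (requestedDelayMs == 0) {
            // stand-in for HiveConf.ConfVars.HIVE_TXN_TIMEOUT / 2
            return txnTimeoutMs / 2;
        }
        return requestedDelayMs;
    }

    public static void main(String[] args) {
        // a delay of 0 never means "no delay": it becomes timeout / 2
        System.out.println(effectiveDelayMs(0, 300_000));     // 150000
        System.out.println(effectiveDelayMs(5_000, 300_000)); // 5000
    }
}
```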



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15588) Vectorization: Defer deallocation of scratch columns in complex VectorExpressions like VectorUDFAdaptor, VectorUDFCoalesce, etc to prevent wrong reuse

2017-01-11 Thread Matt McCline (JIRA)
Matt McCline created HIVE-15588:
---

 Summary: Vectorization: Defer deallocation of scratch columns in 
complex VectorExpressions like VectorUDFAdaptor, VectorUDFCoalesce, etc to 
prevent wrong reuse
 Key: HIVE-15588
 URL: https://issues.apache.org/jira/browse/HIVE-15588
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical


Make sure we don't deallocate a scratch column too quickly and cause result 
corruption due to scratch column reuse.
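A hedged illustration of the deferral idea (the pool and method names are
hypothetical, not the Hive vectorization API): instead of freeing a scratch
column as soon as a child expression finishes, the release is recorded and
applied only after the whole expression tree has evaluated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of deferred scratch-column deallocation; names are
// illustrative, not the Hive vectorization API.
public class ScratchPoolSketch {
    private final Deque<Integer> free = new ArrayDeque<>();
    private final List<Integer> deferred = new ArrayList<>();
    private int next = 0;

    int allocate() {
        return free.isEmpty() ? next++ : free.pop();
    }

    // eager free (the risky pattern): the column can be handed out again
    // while a parent expression still reads it
    void freeNow(int col) {
        free.push(col);
    }

    // deferred free (the idea above): remember the column, release later
    void freeDeferred(int col) {
        deferred.add(col);
    }

    // called once the whole expression tree has been evaluated
    void releaseDeferred() {
        deferred.forEach(free::push);
        deferred.clear();
    }

    public static void main(String[] args) {
        ScratchPoolSketch p = new ScratchPoolSketch();
        int a = p.allocate();
        p.freeDeferred(a);
        // the deferred column is NOT reused mid-evaluation
        System.out.println(p.allocate() == a); // false
        p.releaseDeferred();
        System.out.println(p.allocate() == a); // true
    }
}
```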



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15587) Using ChangeManager to copy files in ReplCopyTask

2017-01-11 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-15587:
-

 Summary: Using ChangeManager to copy files in ReplCopyTask 
 Key: HIVE-15587
 URL: https://issues.apache.org/jira/browse/HIVE-15587
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Reporter: Daniel Dai
Assignee: Daniel Dai


Currently ReplCopyTask copies files directly from the source repo, but those 
files may have been dropped or changed since the event. We should use the 
checksum transferred to ReplCopyTask to verify each file; if it differs, 
retrieve the file from cmroot instead.
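The verify-then-fallback flow reads roughly like this (an illustrative
sketch only; the checksum lookup, path handling, and cmroot layout here are
hypothetical stand-ins, not the Hive replication API):

```java
import java.util.Map;

// Hedged sketch of the verify-then-fallback flow described above; helper
// names and the cmroot layout are hypothetical, not the Hive API.
public class ReplCopySketch {
    // stand-in for a real file checksum lookup; returns null if the file
    // has been dropped from the source repo
    static String checksumOf(Map<String, String> liveFiles, String path) {
        return liveFiles.get(path);
    }

    // pick the source repo file if its checksum still matches the one
    // recorded in the event; otherwise (changed or dropped) fall back to
    // the retained copy under cmroot
    static String resolveSource(Map<String, String> liveFiles,
                                String path, String cmRoot, String expected) {
        String actual = checksumOf(liveFiles, path);
        if (expected.equals(actual)) {
            return path;
        }
        return cmRoot + "/" + expected;
    }

    public static void main(String[] args) {
        Map<String, String> live = Map.of("/repo/f1", "abc");
        // unchanged file: copy straight from the source repo
        System.out.println(resolveSource(live, "/repo/f1", "/cmroot", "abc"));
        // file rewritten since the event: fall back to cmroot
        System.out.println(resolveSource(live, "/repo/f1", "/cmroot", "xyz"));
    }
}
```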



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Separate release of storage-api

2017-01-11 Thread Owen O'Malley
There is also the wiki page that I put together, which was discussed on the jira.

proposal:
https://cwiki.apache.org/confluence/display/Hive/Storage+API+Release+Proposal
discussion on jira: https://issues.apache.org/jira/browse/HIVE-15419

.. Owen

On Wed, Jan 11, 2017 at 9:36 AM, Alan Gates  wrote:

> 90% of the discussion on the previous thread was on version numbering,
> which never came to a conclusion.  Based on your RC candidate I assume you
> chose the separate version numbering, but starting from the same 2.2.0 base
> number.
>
> I agree this is the only viable option[1], but I wanted to point it out
> here to be clear we're choosing this rather than there being questions next
> time we release something on what the number should be.
>
> Alan.
>
> 1. Support for my statement that this is the only viable option:
> a) Tying together the version numbers of Hive proper and the storage API
> undoes exactly the independence we're trying to enable here.
> b) Lefty's concern that Sergey's solution will make a hash of the JIRAs is
> a deal breaker for that one.  It also will be harder to truly separate
> these in the future if Sergio is correct and this eventually evolves into a
> proper sub-project or separate project.
> c) Other projects with this same issue let their version numbers move
> independently (e.g. datanucleus)
> d) Our pom files will give us the mapping of how the versions fit together.
>
>
> > On Jan 1, 2017, at 10:19 AM, Owen O'Malley  wrote:
> >
> > Hi all,
> >   As we discussed back in
> > https://mid.mail-archive.com/dev@hive.apache.org/msg121112.html . I'd
> like
> > to make a release of storage-api. I've written up a proposal at
> > https://cwiki.apache.org/confluence/display/Hive/
> Storage+API+Release+Proposal
> > and the patch for HIVE-15419 is pretty close. Once HIVE-15419 is
> committed,
> > I'd like to cut a branch (eg storage-branch-2.2) and make a release
> > candidate (eg. storage-release-2.2.0rc0).
> >
> > Any concerns?
> >
> > Thanks,
> >   Owen
>
>


[jira] [Created] (HIVE-15586) Make Insert and Create statement Transactional

2017-01-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15586:
-

 Summary: Make Insert and Create statement Transactional
 Key: HIVE-15586
 URL: https://issues.apache.org/jira/browse/HIVE-15586
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently insert/create returns the handle to the user without waiting for the 
data to be loaded by the Druid cluster. To avoid that, we will add a passive 
wait until the segments are loaded by the historicals, provided the coordinator 
is up.
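The passive wait could look something like this (a hedged sketch; the
coordinator check is abstracted as an IntSupplier returning the number of
segments not yet loaded, which is not the actual Druid/Hive API):

```java
import java.util.function.IntSupplier;

// Hedged sketch of the passive wait described above: poll until all
// segments report loaded, or give up at the deadline. The unloadedCount
// supplier stands in for a real coordinator status call.
public class SegmentWaitSketch {
    static boolean waitForSegments(IntSupplier unloadedCount,
                                   long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (unloadedCount.getAsInt() == 0) {
                return true;   // every segment is loaded; hand back control
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;  // timed out; caller decides how to report it
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(waitForSegments(() -> 0, 1000, 1)); // true
        System.out.println(waitForSegments(() -> 1, 0, 1));    // false
    }
}
```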




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 55392: HIVE-15469: Fix REPL DUMP/LOAD DROP_PTN so it works on non-string-ptn-key tables

2017-01-11 Thread Sushanth Sowmyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55392/#review161290
---




itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java
 (line 550)


This has minor clashes with issues.apache.org/jira/browse/HIVE-15365, and it is 
easier to fix here after that goes in rather than there.

Instead of this code segment, we can use the following:

```java
DropPartitionMessage dropPtnMsg = md.getDropPartitionMessage(event.getMessage());
Table tableObj = dropPtnMsg.getTableObj();
// .. and the asserts can remain as-is.
```

Note that the first line is likely spurious as well if HIVE-15365 goes in, 
since it will create the dropPtnMsg here, so the only line needing changing is 
the line instantiating tableObj.

I can regenerate this patch post-HIVE-15365, not a problem.



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java
 (line 345)


One more post-HIVE-15365 comment. :)

run(..) followed by verifyResults(..) is being replaced by two methods:

verifyRun(.. , ..) or
verifySetup(.. , ..)

verifySetup is called in cases where you're still setting up the test, and 
verifying that your setup happened correctly. In this case, for instance, the 
run followed by verifyResults would be replaced by verifySetup instead.

verifyRun is called when running some command that we're interested in 
testing where the results showcase the functionality we're testing.

The idea is that in steady state, after we finish our initial development, 
we flip a switch, and all verifySetups don't do the additional verification 
step, whereas verifyRun still would.



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java
 (line 372)


still verifySetup case, as per prior comment.



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java
 (line 385)


still verifySetup, since we're testing that the source dropped the data 
correctly.



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java
 (line 415)


This is now a verifyRun, finally. :)


- Sushanth Sowmyan


On Jan. 10, 2017, 9:29 p.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55392/
> ---
> 
> (Updated Jan. 10, 2017, 9:29 p.m.)
> 
> 
> Review request for hive, Daniel Dai, Sushanth Sowmyan, and Thejas Nair.
> 
> 
> Bugs: HIVE-15469
> https://issues.apache.org/jira/browse/HIVE-15469
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-15469
> 
> 
> Diffs
> -
> 
>   
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java
>  4eabb24 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestReplicationScenarios.java
>  6b86080 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/messaging/DropPartitionMessage.java
>  26aecb3 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONDropPartitionMessage.java
>  b8ea224 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/messaging/json/JSONMessageFactory.java
>  2749371 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java 
> 85f8c64 
> 
> Diff: https://reviews.apache.org/r/55392/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



Re: [VOTE] Should we release hive-storage 2.2.0RC0

2017-01-11 Thread Alan Gates
+1.  Did a build, checked the signatures, looked over the new LICENSE and 
NOTICE files, ran rat.

Alan.

> On Jan 5, 2017, at 9:39 AM, Owen O'Malley  wrote:
> 
> All,
>   I'd like to make a release of Hive's storage-api module. This will allow
> ORC to remove its fork of storage-api and make a release based on it. The
> RC is based on Hive's master branch as of this morning.
> 
> Artifacts:
>   tag: https://github.com/apache/hive/releases/tag/storage-release-2.2.0rc0
>   tar ball: http://home.apache.org/~omalley/hive-storage-2.2.0rc0/
>   release branch: https://github.com/apache/hive/tree/storage-branch-2.2
> 
> If you download the tag or the release branch, you'll need to go into
> storage-api to build it, because I disconnected it from the main hive build.
> 
> Should we release storage-api 2.2.0RC0?
> 
> Thanks,
>   Owen



Re: [DISCUSS] Separate release of storage-api

2017-01-11 Thread Alan Gates
90% of the discussion on the previous thread was on version numbering, which 
never came to a conclusion.  Based on your RC candidate I assume you chose the 
separate version numbering, but starting from the same 2.2.0 base number.

I agree this is the only viable option[1], but I wanted to point it out here to 
be clear we're choosing this rather than there being questions next time we 
release something on what the number should be.

Alan.

1. Support for my statement that this is the only viable option:
a) Tying together the version numbers of Hive proper and the storage API undoes 
exactly the independence we're trying to enable here.
b) Lefty's concern that Sergey's solution will make a hash of the JIRAs is a 
deal breaker for that one.  It also will be harder to truly separate these in 
the future if Sergio is correct and this eventually evolves into a proper 
sub-project or separate project.
c) Other projects with this same issue let their version numbers move 
independently (e.g. datanucleus)
d) Our pom files will give us the mapping of how the versions fit together.


> On Jan 1, 2017, at 10:19 AM, Owen O'Malley  wrote:
> 
> Hi all,
>   As we discussed back in
> https://mid.mail-archive.com/dev@hive.apache.org/msg121112.html . I'd like
> to make a release of storage-api. I've written up a proposal at
> https://cwiki.apache.org/confluence/display/Hive/Storage+API+Release+Proposal
> and the patch for HIVE-15419 is pretty close. Once HIVE-15419 is committed,
> I'd like to cut a branch (eg storage-branch-2.2) and make a release
> candidate (eg. storage-release-2.2.0rc0).
> 
> Any concerns?
> 
> Thanks,
>   Owen



[jira] [Created] (HIVE-15585) LLAP failed to start on a host with only 1 cp

2017-01-11 Thread Attila Magyar (JIRA)
Attila Magyar created HIVE-15585:


 Summary: LLAP failed to start on a host with only 1 cp
 Key: HIVE-15585
 URL: https://issues.apache.org/jira/browse/HIVE-15585
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 2.1.1
Reporter: Attila Magyar
Assignee: Attila Magyar


LLAP failed to start on a host with only 1 CPU. The number of threads was 
calculated by dividing the number of CPUs by 2, which yielded zero when the CPU 
count was 1 and caused an IllegalArgumentException upon startup. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] hive pull request #131: LLAP failed to start on a host with only 1 cpu

2017-01-11 Thread zeroflag
GitHub user zeroflag opened a pull request:

https://github.com/apache/hive/pull/131

LLAP failed to start on a host with only 1 cpu

LLAP failed to start on a host with only 1 CPU. The number of threads was 
calculated by dividing the number of CPUs by 2, which yielded zero when the CPU 
count was 1 and caused an IllegalArgumentException upon startup. Fixed this by 
using Math.max(1, cpus / 2).
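The fix as described can be sketched like this (a minimal illustration, not
the actual LLAP daemon code):

```java
// Minimal illustration of the fix described above: clamp the computed
// executor count so a 1-CPU host yields 1 thread instead of 0 (a pool
// size of 0 is what triggers the IllegalArgumentException at startup).
public class LlapThreadCountSketch {
    static int executorThreads(int availableCpus) {
        return Math.max(1, availableCpus / 2);
    }

    public static void main(String[] args) {
        System.out.println(executorThreads(1)); // 1 (was 0 before the fix)
        System.out.println(executorThreads(8)); // 4
    }
}
```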

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zeroflag/hive master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #131


commit b00463f5228b459c1bd6d0e6022142cd2fa84952
Author: Attila Magyar 
Date:   2017-01-11T15:46:54Z

Thread pool is initialized with 0 nThreads, which causes 
IllegalArgumentException while starting LLAP on a host with 1 cpu




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (HIVE-15584) Early bail out when we use CTAS and Druid source already exists

2017-01-11 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15584:
--

 Summary: Early bail out when we use CTAS and Druid source already 
exists
 Key: HIVE-15584
 URL: https://issues.apache.org/jira/browse/HIVE-15584
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: slim bouguerra
Priority: Minor


If we create a Druid source from Hive with CTAS, but a Druid source with the 
same name already exists, we fail (as expected).

However, we currently bail out only after the query that creates the results 
has already been executed.

We should bail out earlier so we do not execute the query (and thus launch the 
Tez job, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15583) CTAS query removes leading underscore from column names with CBO

2017-01-11 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15583:
--

 Summary: CTAS query removes leading underscore from column names 
with CBO
 Key: HIVE-15583
 URL: https://issues.apache.org/jira/browse/HIVE-15583
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Minor


L209 in PlanModifierForASTConv.java:

{code:java}
  if (colAlias.startsWith("_")) {
colAlias = colAlias.substring(1);
colAlias = getNewColAlias(newSelAliases, colAlias);
  }
{code}

I would like to explore if we can just remove this limitation.

For instance, due to this issue, when we create a table with the Druid storage 
handler, we need to add an extra leading underscore to column names, as Druid 
expects columns with specific names.
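To illustrate the quoted rewrite (a simplified stand-in; the real
getNewColAlias also resolves alias collisions, which is omitted here):

```java
// Simplified stand-in for the alias rewrite quoted above from
// PlanModifierForASTConv; collision handling (getNewColAlias) is omitted.
public class AliasSketch {
    static String rewriteAlias(String colAlias) {
        if (colAlias.startsWith("_")) {
            return colAlias.substring(1);
        }
        return colAlias;
    }

    public static void main(String[] args) {
        // a Druid-facing "_time" must be written with a doubled underscore
        // so that one underscore survives the rewrite
        System.out.println(rewriteAlias("__time")); // _time
        System.out.println(rewriteAlias("_c0"));    // c0
    }
}
```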



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15582) Druid CTAS should support BYTE/SHORT/INT types

2017-01-11 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15582:
--

 Summary: Druid CTAS should support BYTE/SHORT/INT types
 Key: HIVE-15582
 URL: https://issues.apache.org/jira/browse/HIVE-15582
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently these types are not recognized and we throw an exception when we try 
to create a table with them.

{noformat}
Caused by: org.apache.hadoop.hive.serde2.SerDeException: Unknown type: INT
at 
org.apache.hadoop.hive.druid.serde.DruidSerDe.serialize(DruidSerDe.java:414)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:715)
... 22 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15581) Unable to use advanced aggregation with multiple inserts clause

2017-01-11 Thread James Ball (JIRA)
James Ball created HIVE-15581:
-

 Summary: Unable to use advanced aggregation with multiple inserts 
clause
 Key: HIVE-15581
 URL: https://issues.apache.org/jira/browse/HIVE-15581
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1
Reporter: James Ball


■Use Cases
- Use multiple insert clauses within a single query to insert multiple static 
(user-defined) partitions into a single table.
- Use advanced aggregation (cube) features within each insert clause to include 
subtotals of columns for each partition

■Expected Behaviour
- Subtotals are inserted for all combinations of the set of columns

■Observed Behaviour
- No subtotals are inserted for any combination of the set of columns

■Sample Queries
{code:sql}
// Create test tables
create table if not exists
table1
(
column1 string,
column2 string,
column3 int
)
stored as orc
tblproperties
(
"orc.compress" = "SNAPPY"
);

create table if not exists
table2
(
column1 string,
column2 string,
column3 int
)
partitioned by
(
partition1 string
)
stored as orc
tblproperties
(
"orc.compress" = "SNAPPY"
);

create table if not exists
table3
(
column1 string,
column2 string,
column3 int
)
partitioned by
(
partition1 string
)
stored as orc
tblproperties
(
"orc.compress" = "SNAPPY"
);
{code}

{code:sql}
// Insert test values
insert overwrite table
table1
values
('value1', 'value1', 1),
('value2', 'value2', 1),
('value3', 'value3', 1);
{code}

{code:sql}
// Single insert clause with multiple inserts syntax
// Subtotals are inserted into target table
from
table1
insert overwrite table
table2
partition
(
partition1 = 'value1'
)
select
column1,
column2,
sum(column3) as column3
group by
column1,
column2
with cube;
{code}

{code:sql}
// Multiple insert clauses with multiple inserts syntax
// Subtotals are not inserted into target table
from
table1
insert overwrite table
table3
partition
(
partition1 = 'value1'
)
select
column1,
column2,
sum(column3) as column3
group by
column1,
column2
with cube
insert overwrite table
table3
partition
(
partition1 = 'value2'
)
select
column1,
column2,
sum(column3) as column3
group by
column1,
column2
with cube;
{code}

■Executions Plans
- Single insert clause with multiple inserts syntax
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: table1
Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: column1 (type: string), column2 (type: string), 
column3 (type: int)
  outputColumnNames: column1, column2, column3
  Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE 
Column stats: NONE
  Group By Operator
aggregations: sum(column3)
keys: column1 (type: string), column2 (type: string), '0' 
(type: string)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: string), _col1 (type: string), 
_col2 (type: string)
  sort order: +++
  Map-reduce partition columns: _col0 (type: string), _col1 
(type: string), _col2 (type: string)
  Statistics: Num rows: 12 Data size: 2208 Basic stats: 
COMPLETE Column stats: NONE
  value expressions: _col3 (type: bigint)
  Reduce Operator Tree:
Group By Operator
  aggregations: sum(VALUE._col0)
  keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2