[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-7193: Attachment: HIVE-7193.5.patch Incorporating additional suggestions from the review board. A big change is to revert treating Atn Providers as services (singleton instances through the life of the HS2). These instances will now be created on every Atn request. The concern was that we don't know what the user-coded CustomAuthenticationProvider could do. Since this is user-written code, we have no control over what it can and cannot do. If each request takes a long time, we could have a bottleneck. Similarly, the PAMAuthenticator could become a bottleneck too. So the decision was to have the AtnFactory be consistent across all forms of Atn. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication, like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
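For readers unfamiliar with how such group filters are typically enforced, the sketch below shows the general JNDI pattern an LDAP authenticator can follow: bind as the user, then apply a group search base and filter. It is illustrative only; the class name, method shape, and the choice to treat any group match as success are assumptions, not the HIVE-7193 patch.
{code:java}
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class LdapGroupFilterSketch {
  // Binds as the user (verifying the password), then checks group membership
  // with a filter such as "member={0}" under the configured group search base.
  public static boolean authenticate(String ldapUrl, String userDn, String password,
      String groupSearchBase, String groupSearchFilter) throws NamingException {
    Hashtable<String, String> env = new Hashtable<>();
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
    env.put(Context.PROVIDER_URL, ldapUrl);
    env.put(Context.SECURITY_AUTHENTICATION, "simple");
    env.put(Context.SECURITY_PRINCIPAL, userDn);
    env.put(Context.SECURITY_CREDENTIALS, password); // bad credentials throw here
    DirContext ctx = new InitialDirContext(env);
    try {
      SearchControls controls = new SearchControls();
      controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
      // JNDI substitutes {0} in the filter with the user's DN.
      NamingEnumeration<SearchResult> matches =
          ctx.search(groupSearchBase, groupSearchFilter, new Object[] {userDn}, controls);
      return matches.hasMore(); // member of at least one matching group
    } finally {
      ctx.close();
    }
  }
}
{code}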
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Summary: Switch Tez to 0.6.1 (was: Switch Tez to 0.6.0) Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.patch.txt Tez 0.6.0 has been released. Research to switch to version 0.6.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Component/s: Tez Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.patch.txt Tez 0.6.1 has been released. Research to switch to version 0.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Description: Tez 0.6.1 has been released. Research to switch to version 0.6.1 was: Tez 0.6.0 has been released. Research to switch to version 0.6.0 Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.patch.txt Tez 0.6.1 has been released. Research to switch to version 0.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Attachment: HIVE-9511.4.patch.txt Updated to TEZ 0.6.1. Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.4.patch.txt, HIVE-9511.patch.txt Tez 0.6.1 has been released. Research to switch to version 0.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-7018: --- Attachment: HIVE-7018.5.patch The scripts are called in sequence; I should not put the same script in both upgrade-1.2.0-to-1.3.0.mysql.sql and upgrade-1.2.0-to-2.0.0.mysql.sql. Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch, HIVE-7018.3.patch, HIVE-7018.4.patch, HIVE-7018.5.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11018: Attachment: HIVE-11018.1.patch Reupload to trigger QA run. Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8329) Enable postgres for storing stats
[ https://issues.apache.org/jira/browse/HIVE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol resolved HIVE-8329. Resolution: Won't Fix Some JIRAs added postgres scripts to the testing infra. Enable postgres for storing stats - Key: HIVE-8329 URL: https://issues.apache.org/jira/browse/HIVE-8329 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-8329.1.patch, HIVE-8329.1.patch, HIVE-8329.1.patch Simple patch to enable postgresql as a JDBC publisher for statistics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
[ https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11023: Priority: Critical (was: Major) Disable directSQL if datanucleus.identifierFactory = datanucleus2 - Key: HIVE-11023 URL: https://issues.apache.org/jira/browse/HIVE-11023 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect. One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate line in the appropriate mapping table, with a column called INTEGER_IDX which holds the position in the list. Then, upon reading, it automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and when they do any sort of optimized read, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>: while IDX holds 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,..., and thus any attempt to describe the table or fetch its schema can come back in the table's native hashing order rather than sorted by the index. This can then result in schema ordering being different from the actual table. For example, if a user has a table (a:int, b:string, c:string), a describe on it may return (c:string, a:int, b:string), and thus queries which insert after selecting from another table can hit ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. This problem, however, can be far worse if there are no type problems: it is possible, for example, that if a, b, c were all strings, the insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad. We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing) or change directSql to support both (easier to code, but it increases the test-coverage matrix significantly and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifier factory is datanucleus2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
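The short-term mitigation proposed in the last sentence amounts to a configuration check before directSQL is used. A minimal sketch, assuming a plain Properties view of the metastore configuration (the real MetaStoreDirectSql wiring differs):
{code:java}
import java.util.Properties;

public class DirectSqlGuard {
  // Returns false when the JDO identifier factory is datanucleus2, since the
  // hand-generated SQL assumes datanucleus1 column names (INTEGER_IDX vs. IDX).
  public static boolean directSqlAllowed(Properties metastoreProps) {
    String idFactory =
        metastoreProps.getProperty("datanucleus.identifierFactory", "datanucleus1");
    if ("datanucleus2".equalsIgnoreCase(idFactory)) {
      System.err.println("Disabling directSQL: identifierFactory=datanucleus2 does not"
          + " match the datanucleus1 naming scheme assumed by the generated SQL");
      return false;
    }
    return true;
  }
}
{code}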
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: HIVE-10999.1-spark.patch Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: (was: HIVE-10999.1-spark.patch) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8133) Support Postgres via DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588121#comment-14588121 ] Damien Carol commented on HIVE-8133: As we switched to a MySQL-backed metastore in our production cluster, I can't continue to work on this one. Support Postgres via DirectSQL -- Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Damien Carol -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: (was: HIVE-10999.1-spark.patch) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9853) Bad version tested in org/apache/hive/hcatalog/templeton/TestWebHCatE2e.java
[ https://issues.apache.org/jira/browse/HIVE-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588113#comment-14588113 ] Damien Carol commented on HIVE-9853: [~laurent.gay] As nobody wants to review the patch, do you mind if I close this one? Are you still blocked? Bad version tested in org/apache/hive/hcatalog/templeton/TestWebHCatE2e.java Key: HIVE-9853 URL: https://issues.apache.org/jira/browse/HIVE-9853 Project: Hive Issue Type: Test Affects Versions: 1.0.0 Reporter: Laurent GAY Assignee: Damien Carol Attachments: correct_version_test.patch The test getHiveVersion in class org.apache.hive.hcatalog.templeton.TestWebHCatE2e checks a bad version format. It checks 0.[0-9]+.[0-9]+.* and not 1.[0-9]+.[0-9]+.* This test fails for Hive tag release-1.0.0. I propose a patch to correct it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8133) Support Postgres via DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-8133: --- Assignee: (was: Damien Carol) Support Postgres via DirectSQL -- Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: HIVE-10999.1-spark.patch Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10991) CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q
[ https://issues.apache.org/jira/browse/HIVE-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588258#comment-14588258 ] Ashutosh Chauhan commented on HIVE-10991: - +1 CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q Key: HIVE-10991 URL: https://issues.apache.org/jira/browse/HIVE-10991 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Jesus Camacho Rodriguez Attachments: HIVE-10991.patch NonBlockingOpDeDupProc did not kick in rcfile_merge2.q in the return path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11022) Support collecting lists in user defined order
[ https://issues.apache.org/jira/browse/HIVE-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Haeusler updated HIVE-11022: Description: Hive currently supports aggregation of lists in order of input rows with the UDF collect_list. Unfortunately, the order is not well defined when map-side aggregations are used. Hive could support collecting lists in user-defined order by providing a UDF COLLECT_LIST_SORTED(valueColumn, sortColumn[, limit]), that would return a list of values sorted in a user defined order. An optional limit parameter can restrict this to the n first values within that order. Especially in the limit case, this can be efficiently pre-aggregated and reduces the amount of data transferred to reducers. was: Hive currently supports aggregation of lists in order of input rows with the UDF collect_list. Unfortunately, the order is not well defined when map-side aggregations are used. Hive could support collecting lists in user-defined order by providing a UDF COLLECT_LIST_SORTED(valueColumn, sortColumn[, limit]), that would return a list of values sorted in a user defined order. An optional limit parameter can restrict this to the n first values within that order. Especially in the limit case, this can be efficiently pre-aggregated and reduce the amount of data transferred to reducers. Support collecting lists in user defined order -- Key: HIVE-11022 URL: https://issues.apache.org/jira/browse/HIVE-11022 Project: Hive Issue Type: New Feature Components: UDF Reporter: Michael Haeusler Hive currently supports aggregation of lists in order of input rows with the UDF collect_list. Unfortunately, the order is not well defined when map-side aggregations are used. Hive could support collecting lists in user-defined order by providing a UDF COLLECT_LIST_SORTED(valueColumn, sortColumn[, limit]), that would return a list of values sorted in a user defined order. An optional limit parameter can restrict this to the n first values within that order. Especially in the limit case, this can be efficiently pre-aggregated and reduces the amount of data transferred to reducers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
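The efficiency claim in the limited case deserves one concrete detail: with limit n, each map-side partial aggregate only has to retain the n best (sortKey, value) pairs per group, so both memory and shuffle traffic stay bounded. A self-contained sketch of such a bounded accumulator follows; it is illustrative only, not a shape the actual GenericUDAF implementation would have to take.
{code:java}
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopNAccumulator<K extends Comparable<K>, V> {
  private final int limit;
  // Max-heap on the sort key: the root is the worst entry currently retained.
  private final PriorityQueue<Map.Entry<K, V>> heap;

  public TopNAccumulator(int limit) {
    this.limit = limit;
    this.heap = new PriorityQueue<>((a, b) -> b.getKey().compareTo(a.getKey()));
  }

  // Called once per input row on the map side.
  public void add(K sortKey, V value) {
    heap.offer(new SimpleEntry<>(sortKey, value));
    if (heap.size() > limit) {
      heap.poll(); // evict the worst entry; memory stays O(limit) per group
    }
  }

  // Merging two partial aggregates is just re-adding the other side's entries.
  public void merge(TopNAccumulator<K, V> other) {
    for (Map.Entry<K, V> e : other.heap) {
      add(e.getKey(), e.getValue());
    }
  }

  // Final result: the retained values, in the user-defined sort order.
  public List<V> sortedValues() {
    List<Map.Entry<K, V>> entries = new ArrayList<>(heap);
    entries.sort((a, b) -> a.getKey().compareTo(b.getKey()));
    List<V> out = new ArrayList<>(entries.size());
    for (Map.Entry<K, V> e : entries) {
      out.add(e.getValue());
    }
    return out;
  }
}
{code}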
[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive
[ https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588483#comment-14588483 ] Thejas M Nair commented on HIVE-11008: -- [~jianhe] would have more background on the fix from [~cwelch]. What is the behavior in the above case mentioned by [~ekoifman]? I understand that in the above case as well we can have the RM having the job information but the History server not having it. Would you recommend having retries in that case? Can that result in timeouts? webhcat GET /jobs retries on getting job details from history server is too aggressive - Key: HIVE-11008 URL: https://issues.apache.org/jira/browse/HIVE-11008 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-11008.1.patch The webhcat jobs API gets the list of jobs from the RM and then gets details from the history server. The RM has a policy of retaining a fixed number of jobs to accommodate the memory it has, while the HistoryServer retains jobs based on their age. As a result, jobs that the RM returns might not be present in the HistoryServer, which can result in a failure. WebHCat also ends up retrying on failures even if they happen because the job actually does not exist. The retries to get details from the HistoryServer in such cases are too aggressive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive
[ https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588367#comment-14588367 ] Thejas M Nair commented on HIVE-11008: -- As mentioned in the description, this issue happens because of the difference between the jobs retained by the RM and the job history server, and it is applicable only to the showJobList() call, when showDetails is set to true. This is not an ideal solution, but since the jobclient is not able to distinguish between real failures that it needs to retry on (e.g. transient fs errors) and failures due to the job not existing, we don't have any good alternative. For showJobId(), it is better to still retry. If we move this to StatusDelegator.run(), we will have to pass some boolean to it, so that this is set only in the case of the showJobList() call. Please let me know if you think that is better. webhcat GET /jobs retries on getting job details from history server is too aggressive - Key: HIVE-11008 URL: https://issues.apache.org/jira/browse/HIVE-11008 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-11008.1.patch The webhcat jobs API gets the list of jobs from the RM and then gets details from the history server. The RM has a policy of retaining a fixed number of jobs to accommodate the memory it has, while the HistoryServer retains jobs based on their age. As a result, jobs that the RM returns might not be present in the HistoryServer, which can result in a failure. WebHCat also ends up retrying on failures even if they happen because the job actually does not exist. The retries to get details from the HistoryServer in such cases are too aggressive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
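The distinction being debated here, retrying transient failures while giving up fast on jobs that genuinely are not in the history server, can be pictured with a small sketch. Everything below is hypothetical scaffolding, not the WebHCat code; in particular isJobNotFound() stands in for whatever signal the real jobclient can actually extract.
{code:java}
public class HistoryRetrySketch {
  interface Fetch<T> { T call() throws Exception; }

  static <T> T fetchWithRetry(Fetch<T> fetch, int maxAttempts, long sleepMs)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return fetch.call();
      } catch (Exception e) {
        if (isJobNotFound(e)) {
          throw e; // not transient: retrying just burns time per stale job
        }
        last = e;           // transient (e.g. fs hiccup):
        Thread.sleep(sleepMs); // back off and try again
      }
    }
    throw last;
  }

  // Heuristic stand-in: the client cannot always distinguish "missing" from
  // "broken", which is exactly the difficulty raised in the comments above.
  static boolean isJobNotFound(Exception e) {
    String msg = String.valueOf(e.getMessage());
    return msg.contains("not found") || msg.contains("does not exist");
  }
}
{code}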
[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9736: --- Affects Version/s: 1.2.1 StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Affects Versions: 1.2.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
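As a rough picture of the batching idea, the stock Hadoop FileSystem API already accepts an array of paths, which lets the authorization loop run over one combined FileStatus array instead of issuing a listStatus() per directory at each call site. This is a simplified sketch, not the actual patch: only the user permission bits are checked, and listStatus(Path[]) may still loop per path under the hood.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class BatchedAuthSketch {
  // Lists all partition directories in one call, then authorizes each
  // returned FileStatus in memory.
  public static void checkAll(FileSystem fs, Path[] partitionDirs, FsAction required)
      throws IOException {
    FileStatus[] statuses = fs.listStatus(partitionDirs);
    for (FileStatus status : statuses) {
      if (!status.getPermission().getUserAction().implies(required)) {
        throw new IOException("Permission denied on " + status.getPath());
      }
    }
  }
}
{code}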
[jira] [Commented] (HIVE-11006) improve logging wrt ACID module
[ https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588395#comment-14588395 ] Sushanth Sowmyan commented on HIVE-11006: - +1 for inclusion to 1.2.1, please add it to the Release Status wiki page, and when you commit to master/branch-1, you can commit it to 1.2.1 as well. improve logging wrt ACID module --- Key: HIVE-11006 URL: https://issues.apache.org/jira/browse/HIVE-11006 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11006.patch especially around metastore DB operations (TxnHandler) which are retried or fail for some reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588488#comment-14588488 ] Hive QA commented on HIVE-11018: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739884/HIVE-11018.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4273/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4273/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4273/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739884 - PreCommit-HIVE-TRUNK-Build Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive
[ https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588401#comment-14588401 ] Eugene Koifman commented on HIVE-11008: --- Suppose the call http://www.myserver.com/templeton/v1/jobs/job123 is made and job123 doesn't exist. Why would the same retry logic not kick in? webhcat GET /jobs retries on getting job details from history server is too aggressive - Key: HIVE-11008 URL: https://issues.apache.org/jira/browse/HIVE-11008 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-11008.1.patch The webhcat jobs API gets the list of jobs from the RM and then gets details from the history server. The RM has a policy of retaining a fixed number of jobs to accommodate the memory it has, while the HistoryServer retains jobs based on their age. As a result, jobs that the RM returns might not be present in the HistoryServer, which can result in a failure. WebHCat also ends up retrying on failures even if they happen because the job actually does not exist. The retries to get details from the HistoryServer in such cases are too aggressive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588466#comment-14588466 ] Hive QA commented on HIVE-10999: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739892/HIVE-10999.1-spark.patch {color:red}ERROR:{color} -1 due to 706 failed/errored test(s), 7406 tests executed *Failed tests:* {noformat} TestCliDriver-alter_file_format.q-udf_tan.q-bucket_map_join_tez1.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-alter_table_not_sorted.q-ppd_join3.q-authorization_delete_own_table.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_1_sql_std.q-disallow_incompatible_type_change_off.q-encryption_insert_values.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_create_temp_table.q-skewjoinopt10.q-mapjoin_subquery2.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_parts.q-parquet_map_of_maps.q-join_vc.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_role_grant2.q-alter_char2.q-avro_joins_native.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-authorization_update.q-udf_pmod.q-leadlag_queries.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-auto_join18.q-smb_mapjoin_7.q-join_merge_multi_expressions.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-auto_join9.q-udtf_posexplode.q-udf_least.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-auto_join_reordering_values.q-authorization_cli_stdconfigauth.q-subquery_in.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-avro_decimal_native.q-udf_E.q-bucketmapjoin4.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-ba_table3.q-tez_union_dynamic_partition.q-union30.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bool_literal.q-authorization_cli_createtab.q-udf_when.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bucketcontext_4.q-orc_ends_with_nulls.q-correlationoptimizer9.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bucketmapjoin3.q-vector_partition_diff_num_cols.q-stats2.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bucketsortoptimize_insert_7.q-dynpart_sort_optimization2.q-decimal_3.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-cluster.q-groupby_sort_6.q-tez_schema_evolution.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-columnstats_partlvl_dp.q-input31.q-leadlag.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-compute_stats_string.q-show_columns.q-noalias_subq1.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-cp_mj_rc.q-decimal_2.q-union32.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-create_func1.q-enforce_order.q-interval_comparison.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-create_genericudf.q-dynamic_partition_insert.q-auto_join10.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-describe_xpath.q-autogen_colalias.q-skewjoinopt3.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-escape_distributeby1.q-ambiguitycheck.q-udf_bitwise_and.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby3_map.q-current_date_timestamp.q-skewjoinopt8.q-and-12-more - did not produce a TEST-*.xml file 
TestCliDriver-groupby4.q-convert_enum_to_string.q-load_dyn_part3.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby8_map.q-insert_values_tmp_table.q-union_remove_11.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby_complex_types.q-groupby_map_ppr_multi_distinct.q-vector_decimal_round.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby_grouping_id2.q-udf_decode.q-protectmode.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby_grouping_sets5.q-auto_sortmerge_join_13.q-show_tblproperties.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-index_bitmap_rc.q-desc_tbl_part_cols.q-bucketmapjoin10.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-infer_bucket_sort.q-nonreserved_keywords_input37.q-udf_nvl.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-infer_bucket_sort_list_bucket.q-parquet_avro_array_of_primitives.q-fileformat_sequencefile.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-infer_bucket_sort_multi_insert.q-insert_compressed.q-udf4.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-input10.q-orc_empty_files.q-ppd_multi_insert.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-input19.q-index_auth.q-input16.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-interval_udf.q-metadataonly1.q-union13.q-and-12-more -
[jira] [Updated] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depend on the last SEL
[ https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11007: --- Attachment: HIVE-11007.02.patch CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depend on the last SEL - Key: HIVE-11007 URL: https://issues.apache.org/jira/browse/HIVE-11007 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11007.01.patch, HIVE-11007.02.patch In the dynamic partitioning case, for example, we are going to have TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than SEL2, which causes an error in the return path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10999: --- Attachment: (was: HIVE-10999.1-spark.patch) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive
[ https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588584#comment-14588584 ] Eugene Koifman commented on HIVE-11008: --- Whether you start with Server.showJobId() or Server.showJobList(), you end up in StatusDelegator.run(); i.e., the calls to the Hadoop daemons are exactly the same, so this has to behave the same way... webhcat GET /jobs retries on getting job details from history server is too aggressive - Key: HIVE-11008 URL: https://issues.apache.org/jira/browse/HIVE-11008 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-11008.1.patch The webhcat jobs API gets the list of jobs from the RM and then gets details from the history server. The RM has a policy of retaining a fixed number of jobs to accommodate the memory it has, while the HistoryServer retains jobs based on their age. As a result, jobs that the RM returns might not be present in the HistoryServer, which can result in a failure. WebHCat also ends up retrying on failures even if they happen because the job actually does not exist. The retries to get details from the HistoryServer in such cases are too aggressive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-1643) support range scans and non-key columns in HBase filter pushdown
[ https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588759#comment-14588759 ] Ran Postar commented on HIVE-1643: -- I understand that pushing down the expression to HBase is complicated, especially when working with multiple tables. But is it possible to add a hinting mechanism? We could add to each table in the FROM clause a hint with a startRow and stopRow for the scan, and when HiveHBaseTableInputFormat scans the table it would apply the startRow/stopRow. For this first part we would leave the user the possibility to optimize a specific table scan. support range scans and non-key columns in HBase filter pushdown Key: HIVE-1643 URL: https://issues.apache.org/jira/browse/HIVE-1643 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: John Sichi Assignee: bharath v Labels: patch Attachments: HIVE-1643.patch, Hive-1643.2.patch, hbase_handler.patch HIVE-1226 added support for WHERE rowkey=3. We would like to support WHERE rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus conjunctions etc). Non-rowkey conditions can't be used to filter out entire ranges, but they can be used to push the per-row filter processing as far down as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
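Whatever syntax such a hint takes, it would ultimately just narrow the Scan that HiveHBaseTableInputFormat builds. A minimal sketch against the classic HBase client API follows; the hint plumbing itself is hypothetical.
{code:java}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HintedScanSketch {
  // If the table-level hint supplies startRow/stopRow, narrow the Scan so
  // HBase never reads outside the range.
  public static Scan buildScan(byte[] startRow, byte[] stopRow) {
    Scan scan = new Scan();
    if (startRow != null) {
      scan.setStartRow(startRow); // inclusive lower bound from the hint
    }
    if (stopRow != null) {
      scan.setStopRow(stopRow);   // exclusive upper bound from the hint
    }
    return scan;
  }

  public static void main(String[] args) {
    // e.g. the WHERE rowkey BETWEEN 10 AND 20 case from the description
    Scan scan = buildScan(Bytes.toBytes("10"), Bytes.toBytes("21"));
    System.out.println(scan);
  }
}
{code}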
[jira] [Updated] (HIVE-10984) After lock table shared explicit lock, lock database exclusive should fail.
[ https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10984: Description: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass since the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. was: There is an issue in ZooKeeperHiveLockManager.java, in which when locking exclusively on a table, it doesn't lock the database object (which it does when the lock comes from a query). The current implementation of ZooKeeperHiveLockManager will lock the object and its parents, but won't check the children when it tries to acquire a lock on a certain object. This causes the following scenario, which should not be allowed but right now goes through. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. After lock table shared explicit lock, lock database exclusive should fail. --- Key: HIVE-10984 URL: https://issues.apache.org/jira/browse/HIVE-10984 Project: Hive Issue Type: Bug Components: Locking Reporter: Aihua Xu Assignee: Aihua Xu The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass since the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10984) After lock table shared explicit lock, lock database exclusive should fail.
[ https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10984: Description: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. was: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass since the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. After lock table shared explicit lock, lock database exclusive should fail. --- Key: HIVE-10984 URL: https://issues.apache.org/jira/browse/HIVE-10984 Project: Hive Issue Type: Bug Components: Locking Reporter: Aihua Xu Assignee: Aihua Xu The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10984) After lock table shared explicit lock, lock database exclusive should fail.
[ https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10984: Description: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. was: The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. After lock table shared explicit lock, lock database exclusive should fail. --- Key: HIVE-10984 URL: https://issues.apache.org/jira/browse/HIVE-10984 Project: Hive Issue Type: Bug Components: Locking Reporter: Aihua Xu Assignee: Aihua Xu The following statements will fail since tbl1 and its database are locked in shared mode, and the exclusive lock on the database fails as expected. {noformat} use db1; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} While the following similar statements will pass just because the current database is different. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} It seems both cases should fail. Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
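Both cases above point at the same missing check: before granting an exclusive lock on a database znode, the lock manager should look for existing lock nodes under the database's children. An illustrative ZooKeeper sketch follows; the znode layout and the lock-node prefix are assumptions, not the actual ZooKeeperHiveLockManager code.
{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class ChildLockCheckSketch {
  // e.g. dbPath = "/hive_zookeeper_namespace/db1"; returns true when any
  // table znode under the database already holds a lock node, in which case
  // an exclusive database lock request should be refused.
  public static boolean hasLockedChildren(ZooKeeper zk, String dbPath)
      throws KeeperException, InterruptedException {
    for (String child : zk.getChildren(dbPath, false)) {
      String childPath = dbPath + "/" + child;
      for (String node : zk.getChildren(childPath, false)) {
        if (node.startsWith("LOCK-")) { // lock-node naming is an assumption
          return true;
        }
      }
    }
    return false;
  }
}
{code}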
[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7193: --- Description: Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication, like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
was: Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2. <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> We need to include other LDAP properties as part of hive-LDAP authentication like below: a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAccountName={0} a list of valid user groups - group1,group2,group3 Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication, like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11026: Attachment: HIVE-11026.patch Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11026.patch Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588792#comment-14588792 ] Hive QA commented on HIVE-9736: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739895/HIVE-9736.8.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4275/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4275/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4275/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739895 - PreCommit-HIVE-TRUNK-Build StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Affects Versions: 1.2.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch, HIVE-9736.8.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588814#comment-14588814 ] Sushanth Sowmyan commented on HIVE-9736: Looks like we have a regression: org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition is failing while it shouldn't. This happened in the 9th May run as well. Error Message: expected:<1> but was:<0> Stacktrace: {noformat} java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.dropPartitionByOtherUser(TestStorageBasedMetastoreAuthorizationDrops.java:202) at org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition(TestStorageBasedMetastoreAuthorizationDrops.java:172) {noformat} [~mithun], if we can look at this and resolve this, we can get this into 1.2.1, but if not, then I'm afraid this will have to be deferred out of branch-1.2, and make it in 1.3/2.0 . StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Affects Versions: 1.2.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch, HIVE-9736.8.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588837#comment-14588837 ] Raj Sharma commented on HIVE-7292: -- Can I run the below command in Hive 1.1 or 1.2 to switch the engine from MapReduce to Spark? hive> set hive.execution.engine=spark; Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that those users can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. Design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10479) CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD
[ https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10479: --- Attachment: HIVE-10479.02.patch address [~jcamachorodriguez]'s comments CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD Key: HIVE-10479 URL: https://issues.apache.org/jira/browse/HIVE-10479 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10479.01.patch, HIVE-10479.02.patch, HIVE-10479.patch In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains an empty string and key is an empty string too, it assumes that aliases contains key. This will trigger incorrect PPD. To reproduce it, apply the HIVE-10455 patch and run cbo_subq_notin.q. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11027) Hive on tez: Bucket map joins fail when hashcode goes negative
[ https://issues.apache.org/jira/browse/HIVE-11027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-11027: --- Affects Version/s: 0.13 0.14.0 Hive on tez: Bucket map joins fail when hashcode goes negative -- Key: HIVE-11027 URL: https://issues.apache.org/jira/browse/HIVE-11027 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0, 1.0.0, 0.13 Reporter: Vikram Dixit K Assignee: Prasanth Jayachandran Seeing an issue when dynamic sort optimization is enabled while doing an insert into a bucketed table. We seem to be flipping the negative sign on the hashcode instead of taking the complement of it for routing the data correctly. This results in correctness issues in bucket map joins in hive on tez when the hash code goes negative. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
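The sign issue described above is the classic Java hash-routing pitfall: flipping the sign (or calling Math.abs()) still yields a negative number for Integer.MIN_VALUE, and a negative modulus routes the row to a nonexistent bucket, whereas masking the sign bit always lands in range. A standalone illustration, not the actual patch:
{code:java}
public class BucketRoutingSketch {
  static int wrongBucket(int hashCode, int numBuckets) {
    return Math.abs(hashCode) % numBuckets; // Math.abs(Integer.MIN_VALUE) < 0
  }

  static int correctBucket(int hashCode, int numBuckets) {
    return (hashCode & Integer.MAX_VALUE) % numBuckets; // always in [0, numBuckets)
  }

  public static void main(String[] args) {
    System.out.println(wrongBucket(Integer.MIN_VALUE, 31));   // prints a negative index
    System.out.println(correctBucket(Integer.MIN_VALUE, 31)); // prints a valid bucket
  }
}
{code}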
[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9736: --- Attachment: HIVE-9736.8.patch One more attempt to get this patch in - updating the patch slightly so as to not remove the dependency on java.util.Set in HadoopShims.java (since there is another function that now depends on it) Once the tests pass this time, I will get it in. StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Affects Versions: 1.2.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch, HIVE-9736.8.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reassigned HIVE-11026: --- Assignee: Ashutosh Chauhan Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10984) After lock table shared explicit lock, lock database exclusive should fail.
[ https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10984: Summary: After lock table shared explicit lock, lock database exclusive should fail. (was: Lock table explicit lock command doesn't lock the database object.) After lock table shared explicit lock, lock database exclusive should fail. --- Key: HIVE-10984 URL: https://issues.apache.org/jira/browse/HIVE-10984 Project: Hive Issue Type: Bug Components: Locking Reporter: Aihua Xu Assignee: Aihua Xu There is an issue in ZooKeeperHiveLockManager.java, in which when locking exclusively on a table, it doesn't lock the database object (which it does when the lock comes from a query). The current implementation of ZooKeeperHiveLockManager will lock the object and its parents, but won't check the children when it tries to acquire a lock on a certain object. This causes the following scenario, which should not be allowed but right now goes through. {noformat} use default; lock table db1.tbl1 shared; lock database db1 exclusive; {noformat} Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable failure cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
[ https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588641#comment-14588641 ] Hive QA commented on HIVE-11023: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739893/HIVE-11023.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4274/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4274/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4274/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739893 - PreCommit-HIVE-TRUNK-Build Disable directSQL if datanucleus.identifierFactory = datanucleus2 - Key: HIVE-11023 URL: https://issues.apache.org/jira/browse/HIVE-11023 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical Attachments: HIVE-11023.patch We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect. One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate line in the appropriate mapping table, and has a column called INTEGER_IDX, which holds the position in the list. Then, upon reading, it automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and when they do any sort of optimized reads, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>, and while IDX has 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,... and thus, any attempt to describe the table or fetch the schema for the table can come up mixed up in the table's native hashing order, rather than sorted by the index. This can then result in the schema ordering being different from the actual table. For example, if a user has a table (a:int, b:string, c:string), a describe on this may return (c:string, a:int, b:string), and thus, queries which are inserting after selecting from another table can have ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug.
This problem, however, can be far worse if there are no type problems - it is possible, for example, that if a, b, c were all strings, that insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad. We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing), or change directSql to support both (easier to code, but it increases the test-coverage matrix significantly and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
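The short-term mitigation amounts to one configuration check at metastore startup. A sketch of the guard (the two property names are real; the class and method are hypothetical):
{code}
import org.apache.hadoop.conf.Configuration;

public class DirectSqlGuard {
  // Keep directSql only when the identifier factory matches the datanucleus1
  // naming scheme that the hand-generated SQL strings assume.
  static boolean directSqlUsable(Configuration conf) {
    String idFactory = conf.get("datanucleus.identifierFactory", "datanucleus1");
    boolean tryDirectSql = conf.getBoolean("hive.metastore.try.direct.sql", true);
    return tryDirectSql && !"datanucleus2".equalsIgnoreCase(idFactory);
  }
}
{code}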
[jira] [Updated] (HIVE-11025) In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
[ https://issues.apache.org/jira/browse/HIVE-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11025: Attachment: HIVE-11025.patch In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Key: HIVE-11025 URL: https://issues.apache.org/jira/browse/HIVE-11025 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11025.patch Given data and the following query, {noformat} deptno empno bonus salary 30 7698 NULL 2850.0 30 7900 NULL 950.0 30 7844 0 1500.0 select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2; {noformat} It produces an incorrect result for the row in which bonus=0: 1900.0 1900.0 1766.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11025) In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
[ https://issues.apache.org/jira/browse/HIVE-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11025: Description: Given data and the following query, {noformat} deptno empno bonus salary 30 7698 NULL 2850.0 30 7900 NULL 950.0 30 7844 0 1500.0 select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2; {noformat} It produces an incorrect result for the row in which bonus=0: 1900.0 1900.0 1766.7 was: Given data and the following query, {noformat} deptno empno bonus salary 30 7698 NULL 2850.0 30 7900 NULL 950.0 30 7844 0 1500.0 select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2; {noformat} It produces an incorrect result for the row in which bonus=0: 1900.0 1900.0 1766.7 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Key: HIVE-11025 URL: https://issues.apache.org/jira/browse/HIVE-11025 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11025.patch Given data and the following query, {noformat} deptno empno bonus salary 30 7698 NULL 2850.0 30 7900 NULL 950.0 30 7844 0 1500.0 select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2; {noformat} It produces an incorrect result for the row in which bonus=0: 1900.0 1900.0 1766.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
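For a "range N preceding" frame, the boundary test compares each row's ORDER BY key against the current row's key minus N; a NULL key cannot participate in that arithmetic comparison and needs an explicit check. A minimal illustration of a null-safe test (simplified, not the PTF code itself):
{code}
import java.math.BigDecimal;

public class RangeBoundarySketch {
  // True when rowKey falls inside [currentKey - amt, currentKey].
  // Rows with NULL keys form their own peer group: NULL only matches NULL.
  static boolean inRangePreceding(BigDecimal currentKey, BigDecimal rowKey, BigDecimal amt) {
    if (currentKey == null || rowKey == null) {
      return currentKey == null && rowKey == null;
    }
    return rowKey.compareTo(currentKey.subtract(amt)) >= 0
        && rowKey.compareTo(currentKey) <= 0;
  }
}
{code}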
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588746#comment-14588746 ] Hive QA commented on HIVE-10999: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739930/HIVE-10999.1-spark.patch {color:red}ERROR:{color} -1 due to 604 failed/errored test(s), 7286 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct
[jira] [Commented] (HIVE-11006) improve logging wrt ACID module
[ https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588711#comment-14588711 ] Eugene Koifman commented on HIVE-11006: --- actually it did pick it up but there was a glitch http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4272/console improve logging wrt ACID module --- Key: HIVE-11006 URL: https://issues.apache.org/jira/browse/HIVE-11006 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11006.2.patch, HIVE-11006.patch especially around metastore DB operations (TxnHandler) which are retried or fail for some reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588593#comment-14588593 ] Raj Sharma commented on HIVE-7292: -- When will Spark be shipped with Hive as an option for the Hive engine, along with Tez and MapReduce? Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that those users can consolidate their backends. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. The design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588596#comment-14588596 ] Chao Sun commented on HIVE-7292: Hi Raj, as mentioned by Xuefu above, Hive on Spark is already available in Hive 1.1 and 1.2. Please check it out. Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that those users can consolidate their backends. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. The design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11006) improve logging wrt ACID module
[ https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11006: -- Attachment: HIVE-11006.2.patch attaching patch again - for some reason the build bot didn't pick it up improve logging wrt ACID module --- Key: HIVE-11006 URL: https://issues.apache.org/jira/browse/HIVE-11006 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11006.2.patch, HIVE-11006.patch especially around metastore DB operations (TxnHandler) which are retried or fail for some reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11018: Attachment: HIVE-11018.2.patch Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.2.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL
[ https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588952#comment-14588952 ] Hive QA commented on HIVE-11007: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739912/HIVE-11007.02.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8981 tests executed *Failed tests:* {noformat} TestContribCliDriver - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4276/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4276/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4276/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739912 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL - Key: HIVE-11007 URL: https://issues.apache.org/jira/browse/HIVE-11007 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11007.01.patch, HIVE-11007.02.patch In the dynamic partitioning case, for example, we are going to have TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than SEL2, which causes an error in the return path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588988#comment-14588988 ] Jason Dere commented on HIVE-11028: --- It looks like this is caused because TezCompiler invokes ConstantPropagate and this is removing some columns, but without a corresponding call to ColumnPruner to remove outputColumnNames from the join operator. Talking to [~jpullokkaran] and [~hagleitn], the use of ConstantPropagate in TezCompiler is to remove extra (and unnecessary) AND true predicates generated during dynamic partition pruning. One solution is to eliminate just those expressions (referred to in ConstantPropagate as short-cutting), as opposed to doing full constant folding. I'll try to add an option to ConstantPropagate where we can specify that we only want to perform expression short-cutting rather than full constant folding. Tez: table self join and join with another table fails with IndexOutOfBoundsException - Key: HIVE-11028 URL: https://issues.apache.org/jira/browse/HIVE-11028 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jason Dere Assignee: Jason Dere {noformat} create table tez_self_join1(id1 int, id2 string, id3 string); insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), (3,'ba','ba'); create table tez_self_join2(id1 int); insert into table tez_self_join2 values(1),(2),(3); explain select s.id2, s.id3 from ( select self1.id1, self1.id2, self1.id3 from tez_self_join1 self1 join tez_self_join1 self2 on self1.id2=self2.id3 ) s join tez_self_join2 on s.id1=tez_self_join2.id1 where s.id2='ab'; {noformat} fails with error: {noformat} 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:109) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313) at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:71) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeOp(CommonMergeJoinOperator.java:99) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
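The proposed "short-cutting only" mode would simplify just the boolean identities that dynamic partition pruning leaves behind ({{x AND true}}, {{x OR false}}) without folding arbitrary constant subexpressions, so no columns disappear from the join's output. A toy sketch of the idea on a miniature expression tree (all types here are hypothetical stand-ins for Hive's ExprNodeDesc):
{code}
import java.util.List;
import java.util.stream.Collectors;

public class ShortCutSketch {
  interface Expr {}
  record Const(Object value) implements Expr {}
  record Call(String op, List<Expr> args) implements Expr {}

  // Rewrite "x AND true" to "x"; leave everything else untouched, so full
  // constant folding (and the column removal it triggers) never happens.
  static Expr shortCut(Expr e) {
    if (e instanceof Call c && c.op().equals("and")) {
      List<Expr> kept = c.args().stream()
          .map(ShortCutSketch::shortCut)
          .filter(a -> !(a instanceof Const k && Boolean.TRUE.equals(k.value())))
          .collect(Collectors.toList());
      if (kept.isEmpty()) return new Const(Boolean.TRUE);
      if (kept.size() == 1) return kept.get(0);
      return new Call("and", kept);
    }
    return e;
  }
}
{code}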
[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10996: --- Affects Version/s: 2.0.0 1.3.0 Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13, which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results: {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff 1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context, but what led us to this issue was select count(distinct s) not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10996: --- Attachment: HIVE-10996.patch Triggering a QA run. Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13, which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results: {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff 1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context, but what led us to this issue was select count(distinct s) not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589152#comment-14589152 ] Rui Li commented on HIVE-7292: -- [~riomario] - Yes you can. You can follow this [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started] to see what else you need to do to run Hive on Spark. Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that those users can consolidate their backends. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. The design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589250#comment-14589250 ] Xuefu Zhang commented on HIVE-10999: Yeah. I built that jar from Spark branch-1.4, using make-distribution.sh, and renamed it. Do you know how to make a non-SNAPSHOT build from Spark? Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10707) CBO: debug logging OOMs
[ https://issues.apache.org/jira/browse/HIVE-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-10707: -- Assignee: Gopal V CBO: debug logging OOMs --- Key: HIVE-10707 URL: https://issues.apache.org/jira/browse/HIVE-10707 Project: Hive Issue Type: Bug Components: CBO Reporter: Gopal V Assignee: Gopal V Priority: Trivial {code} hive> source xcross.sql; OK Time taken: 0.837 seconds Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) at java.lang.StringBuilder.append(StringBuilder.java:136) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:111) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) {code} The query contains 360 join clauses, wrapped in a UNION ALL. Looks like {{genOpTree}} does {code} this.ctx.setCboInfo("Plan optimized by CBO."); this.ctx.setCboSucceeded(true); LOG.debug(newAST.dump()); } {code} and the debug logging OOMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
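The usual fix pattern for this class of OOM (a sketch; the attached patch may differ, and Hive's actual logger class may not be slf4j) is to never materialize the dump string unless debug logging is live:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuardedDumpSketch {
  private static final Logger LOG = LoggerFactory.getLogger(GuardedDumpSketch.class);

  static void logPlan(Object newAST) {
    // dump() builds one giant String for the whole AST; with 360 join
    // clauses that alone can exhaust the heap, so only build it on demand.
    if (LOG.isDebugEnabled()) {
      LOG.debug(String.valueOf(newAST)); // stands in for newAST.dump()
    }
  }
}
{code}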
[jira] [Commented] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589317#comment-14589317 ] Ashutosh Chauhan commented on HIVE-11026: - Test failure is unrelated. [~prasanth_j] can you take a look? Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11026.patch Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile
[ https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-3958: - Description: The analyze command allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but is good and fast enough for the use case. HIVE-3917 addresses it. 2. collect stats via partial file scan. It doesn't scan the full content of files but only part of it to get file metadata. Some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of HBase (Edit: That link should be https://cwiki.apache.org/confluence/display/Hive/RCFileCat.) This jira is targeted to address #2, more specifically the RCFile format. was: The analyze command allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but is good and fast enough for the use case. HIVE-3917 addresses it. 2. collect stats via partial file scan. It doesn't scan the full content of files but only part of it to get file metadata. Some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of HBase This jira is targeted to address #2, more specifically the RCFile format. support partial scan for analyze command - RCFile - Key: HIVE-3958 URL: https://issues.apache.org/jira/browse/HIVE-3958 Project: Hive Issue Type: Improvement Reporter: Gang Tim Liu Assignee: Gang Tim Liu Fix For: 0.11.0 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, HIVE-3958.patch.4, HIVE-3958.patch.5, HIVE-3958.patch.6 The analyze command allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but is good and fast enough for the use case. HIVE-3917 addresses it. 2. collect stats via partial file scan. It doesn't scan the full content of files but only part of it to get file metadata. Some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of HBase (Edit: That link should be https://cwiki.apache.org/confluence/display/Hive/RCFileCat.) This jira is targeted to address #2, more specifically the RCFile format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10707) CBO: debug logging OOMs
[ https://issues.apache.org/jira/browse/HIVE-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10707: --- Attachment: HIVE-10707.1.patch CBO: debug logging OOMs --- Key: HIVE-10707 URL: https://issues.apache.org/jira/browse/HIVE-10707 Project: Hive Issue Type: Bug Components: CBO Reporter: Gopal V Assignee: Gopal V Priority: Trivial Attachments: HIVE-10707.1.patch {code} hive> source xcross.sql; OK Time taken: 0.837 seconds Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) at java.lang.StringBuilder.append(StringBuilder.java:136) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:111) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) {code} The query contains 360 join clauses, wrapped in a UNION ALL. Looks like {{genOpTree}} does {code} this.ctx.setCboInfo("Plan optimized by CBO."); this.ctx.setCboSucceeded(true); LOG.debug(newAST.dump()); } {code} and the debug logging OOMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10707) CBO: debug logging OOMs
[ https://issues.apache.org/jira/browse/HIVE-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10707: --- Affects Version/s: 2.0.0 CBO: debug logging OOMs --- Key: HIVE-10707 URL: https://issues.apache.org/jira/browse/HIVE-10707 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Priority: Trivial Attachments: HIVE-10707.1.patch {code} hive> source xcross.sql; OK Time taken: 0.837 seconds Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) at java.lang.StringBuilder.append(StringBuilder.java:136) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:111) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) at org.apache.hadoop.hive.ql.parse.ASTNode.dump(ASTNode.java:119) {code} The query contains 360 join clauses, wrapped in a UNION ALL. Looks like {{genOpTree}} does {code} this.ctx.setCboInfo("Plan optimized by CBO."); this.ctx.setCboSucceeded(true); LOG.debug(newAST.dump()); } {code} and the debug logging OOMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589289#comment-14589289 ] Rui Li commented on HIVE-10999: --- I think you can use the [release tag|https://github.com/apache/spark/releases/tag/v1.4.0] to get a non-SNAPSHOT build. Also you should rename the dir to {{spark-1.4.0-bin-hadoop2-without-hive}} as I mentioned above. Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL
[ https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589320#comment-14589320 ] Pengcheng Xiong commented on HIVE-11007: The test failures are unrelated. [~ashutoshc] or [~jpullokkaran], could you please take a look? Thanks. CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL - Key: HIVE-11007 URL: https://issues.apache.org/jira/browse/HIVE-11007 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11007.01.patch, HIVE-11007.02.patch In the dynamic partitioning case, for example, we are going to have TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than SEL2, which causes an error in the return path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589337#comment-14589337 ] Prasanth Jayachandran commented on HIVE-11026: -- +1. Were you able to verify this on other OSes? Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11026.patch Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10479) CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD
[ https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589247#comment-14589247 ] Hive QA commented on HIVE-10479: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739980/HIVE-10479.02.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4280/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4280/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4280/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739980 - PreCommit-HIVE-TRUNK-Build CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD Key: HIVE-10479 URL: https://issues.apache.org/jira/browse/HIVE-10479 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10479.01.patch, HIVE-10479.02.patch, HIVE-10479.patch In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains an empty string and the key is an empty string too, it assumes that aliases contains the key. This triggers incorrect PPD. To reproduce it, apply HIVE-10455 and run cbo_subq_notin.q. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
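The faulty containment test is easy to reproduce in isolation: an alias set that holds the empty string will "match" a column whose table alias is also empty, and the predicate gets pushed where it doesn't belong. A simplified, self-contained illustration (not the OpProcFactory code):
{code}
import java.util.HashSet;
import java.util.Set;

public class EmptyAliasSketch {
  public static void main(String[] args) {
    Set<String> aliases = new HashSet<>();
    aliases.add("");   // empty tabAlias produced on the return path
    String key = "";   // lookup key for a column with no table alias
    // Prints "true", so the caller wrongly concludes the predicate belongs
    // to this operator and pushes it down incorrectly.
    System.out.println(aliases.contains(key));
  }
}
{code}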
[jira] [Commented] (HIVE-10974) Use Configuration::getRaw() for the Base64 data
[ https://issues.apache.org/jira/browse/HIVE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589259#comment-14589259 ] Gopal V commented on HIVE-10974: Committed to master, thanks [~sershe] Use Configuration::getRaw() for the Base64 data --- Key: HIVE-10974 URL: https://issues.apache.org/jira/browse/HIVE-10974 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-10974.1.patch Inspired by the Twitter HadoopSummit talk {code} if (HiveConf.getBoolVar(conf, ConfVars.HIVE_RPC_QUERY_PLAN)) { LOG.debug("Loading plan from string: " + path.toUri().getPath()); String planString = conf.get(path.toUri().getPath()); {code} Use getRaw() in other places where Base64 data is present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10974) Use Configuration::getRaw() for the Base64 data
[ https://issues.apache.org/jira/browse/HIVE-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10974: --- Affects Version/s: (was: 1.2.0) 2.0.0 Use Configuration::getRaw() for the Base64 data --- Key: HIVE-10974 URL: https://issues.apache.org/jira/browse/HIVE-10974 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-10974.1.patch Inspired by the Twitter HadoopSummit talk {code} if (HiveConf.getBoolVar(conf, ConfVars.HIVE_RPC_QUERY_PLAN)) { LOG.debug("Loading plan from string: " + path.toUri().getPath()); String planString = conf.get(path.toUri().getPath()); {code} Use getRaw() in other places where Base64 data is present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11026) Make vector_outer_join* test more robust
[ https://issues.apache.org/jira/browse/HIVE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589312#comment-14589312 ] Hive QA commented on HIVE-11026: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739981/HIVE-11026.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4281/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4281/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4281/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739981 - PreCommit-HIVE-TRUNK-Build Make vector_outer_join* test more robust Key: HIVE-11026 URL: https://issues.apache.org/jira/browse/HIVE-11026 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11026.patch Different file sizes on different OSes result in different Data Size in explain output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10479) CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD
[ https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589322#comment-14589322 ] Pengcheng Xiong commented on HIVE-10479: [~jcamachorodriguez], the test failures are unrelated, and they failed on the previous runs too. Could you please take a look? Thanks. CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD Key: HIVE-10479 URL: https://issues.apache.org/jira/browse/HIVE-10479 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10479.01.patch, HIVE-10479.02.patch, HIVE-10479.patch In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains an empty string and the key is an empty string too, it assumes that aliases contains the key. This triggers incorrect PPD. To reproduce it, apply HIVE-10455 and run cbo_subq_notin.q. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11031: - Attachment: HIVE-11031.patch ORC concatenation of old files can fail while merging column statistics --- Key: HIVE-11031 URL: https://issues.apache.org/jira/browse/HIVE-11031 Project: Hive Issue Type: Bug Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11031.patch Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types like decimal, date, timestamp, etc., but column statistics merging assumes statistics exist for these types and invokes merge. For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without doing an instanceof check. If the ORC file contains timestamp column statistics this will work; otherwise it throws a ClassCastException. Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
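The defensive version of the merge checks the runtime type before casting, since an old file's statistics object may simply not carry the newer fields. A sketch with stand-in types (the real ORC classes have richer APIs; treat this as an illustration, not the actual patch):
{code}
public class StatsMergeSketch {
  interface ColumnStatistics {}
  interface TimestampColumnStatistics extends ColumnStatistics {
    long getMinimum();
    long getMaximum();
  }

  // Merge "other" into a running [min, max] pair, but verify the type first:
  // old ORC files may lack timestamp statistics, and a blind cast would throw
  // a ClassCastException that the file merge operator then swallows.
  static void merge(long[] minMax, ColumnStatistics other) {
    if (!(other instanceof TimestampColumnStatistics)) {
      throw new IllegalArgumentException("Incompatible timestamp column statistics");
    }
    TimestampColumnStatistics ts = (TimestampColumnStatistics) other;
    minMax[0] = Math.min(minMax[0], ts.getMinimum());
    minMax[1] = Math.max(minMax[1], ts.getMaximum());
  }
}
{code}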
[jira] [Assigned] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-10999: -- Assignee: Rui Li (was: Xuefu Zhang) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11006) improve logging wrt ACID module
[ https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589176#comment-14589176 ] Hive QA commented on HIVE-11006: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739960/HIVE-11006.2.patch {color:green}SUCCESS:{color} +1 9008 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4279/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4279/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4279/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12739960 - PreCommit-HIVE-TRUNK-Build improve logging wrt ACID module --- Key: HIVE-11006 URL: https://issues.apache.org/jira/browse/HIVE-11006 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11006.2.patch, HIVE-11006.patch especially around metastore DB operations (TxnHandler) which are retried or fail for some reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589177#comment-14589177 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11018: -- +1 Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.2.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
[ https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589182#comment-14589182 ] Sushanth Sowmyan commented on HIVE-11023: - [~sershe], could you please review? Thanks! Disable directSQL if datanucleus.identifierFactory = datanucleus2 - Key: HIVE-11023 URL: https://issues.apache.org/jira/browse/HIVE-11023 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical Attachments: HIVE-11023.patch We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect. One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate line in the appropriate mapping table, and has a column called INTEGER_IDX, which holds the position in the list. Then, upon reading, it automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and when they do any sort of optimized reads, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>, and while IDX has 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,... and thus, any attempt to describe the table or fetch the schema for the table can come up mixed up in the table's native hashing order, rather than sorted by the index. This can then result in the schema ordering being different from the actual table. For example, if a user has a table (a:int, b:string, c:string), a describe on this may return (c:string, a:int, b:string), and thus, queries which are inserting after selecting from another table can have ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. This problem, however, can be far worse if there are no type problems - it is possible, for example, that if a, b, c were all strings, that insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad. We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing), or change directSql to support both (easier to code, but it increases the test-coverage matrix significantly and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-6384) Implement all Hive data types in Parquet
[ https://issues.apache.org/jira/browse/HIVE-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu resolved HIVE-6384. Resolution: Fixed Fix Version/s: 1.2.0 Resolved since all subtasks are resolved in 1.2.0. Implement all Hive data types in Parquet Key: HIVE-6384 URL: https://issues.apache.org/jira/browse/HIVE-6384 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: Ferdinand Xu Labels: Parquet Fix For: 1.2.0 Uber JIRA to track implementation of binary, timestamp, date, char, varchar, and decimal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7261) Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously
[ https://issues.apache.org/jira/browse/HIVE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589193#comment-14589193 ] Chengbing Liu commented on HIVE-7261: - HIVE-10971 solved the same problem; marking this as a duplicate. Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously Key: HIVE-7261 URL: https://issues.apache.org/jira/browse/HIVE-7261 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Environment: hive0.12 hadoop1.0.4 Reporter: Chris Chen 【Phenomenon】 The query results are not the same when hive.groupby.skewindata is set to true vs. false. 【my question】 I want to calculate count(*) and count(distinct) simultaneously, otherwise it will cost 2 MR jobs to calculate. But when I set hive.groupby.skewindata to true, the count(*) result should not be the same as the count(distinct), but the real result is the same, so it's wrong. And I found a difference in the query plans: the Reduce Operator Tree - Group By Operator - mode is mergepartial when skew is set to false and complete when skew is set to true. So I'm confused about the root cause of the error. 【sql】 select ds,appid,eventname,active,{color:red}count(distinct(guid)), count(*) {color}from eventinfo_tmp where ds='20140612' and length(eventname)1000 and eventname like '%alibaba%' group by ds,appid,eventname,active; 【other hive configuration excluding hive.groupby.skewindata】 hive.exec.compress.output=true hive.exec.compress.intermediate=true io.seqfile.compression.type=BLOCK mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec hive.map.aggr=true hive.stats.autogather=false hive.exec.scratchdir=/user/complat/tmp mapred.job.queue.name=complat hive.exec.mode.local.auto=false hive.exec.mode.local.auto.inputbytes.max=500 hive.exec.mode.local.auto.tasks.max=10 hive.exec.mode.local.auto.input.files.max=1000 hive.exec.dynamic.partition=true hive.exec.dynamic.partition.mode=nonstrict hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat mapred.max.split.size=1 mapred.min.split.size.per.node=1 mapred.min.split.size.per.rack=1 【result】 when hive.groupby.skewindata=true the result is: 20140612 8 alibaba 1 {color:red}87 147{color} when it=false the result is: 20140612 8 alibaba 1 {color:red}87 87{color} 【query plan】 ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME eventinfo_tmp))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL ds)) (TOK_SELEXPR (TOK_TABLE_OR_COL appid)) (TOK_SELEXPR (TOK_TABLE_OR_COL eventname)) (TOK_SELEXPR (TOK_TABLE_OR_COL active)) (TOK_SELEXPR (TOK_FUNCTIONDI count (TOK_TABLE_OR_COL guid))) (TOK_SELEXPR (TOK_FUNCTIONSTAR count))) (TOK_WHERE (and (and (= (TOK_TABLE_OR_COL ds) '20140612') ( (TOK_FUNCTION length (TOK_TABLE_OR_COL eventname)) 1000)) (like (TOK_TABLE_OR_COL eventname) '%tvvideo_setting%'))) (TOK_GROUPBY (TOK_TABLE_OR_COL ds) (TOK_TABLE_OR_COL appid) (TOK_TABLE_OR_COL eventname) (TOK_TABLE_OR_COL active STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: eventinfo_tmp TableScan alias: eventinfo_tmp Filter Operator predicate: expr: ((length(eventname) 1000) and (eventname like '%tvvideo_setting%')) type: boolean Select Operator expressions: expr: ds type: string expr: appid type: string expr: eventname type: string expr: active type: int expr:
guid type: string outputColumnNames: ds, appid, eventname, active, guid Group By Operator aggregations: expr: count(DISTINCT guid) expr: count() bucketGroup: false keys: expr: ds type: string expr: appid type: string expr: eventname type: string expr: active type: int expr: guid
[jira] [Resolved] (HIVE-7261) Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously
[ https://issues.apache.org/jira/browse/HIVE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu resolved HIVE-7261. - Resolution: Duplicate Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously Key: HIVE-7261 URL: https://issues.apache.org/jira/browse/HIVE-7261 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Environment: hive0.12 hadoop1.0.4 Reporter: Chris Chen 【Phenomenon】 The query results are not the same when hive.groupby.skewindata is set to true vs. false. 【my question】 I want to calculate count(*) and count(distinct) simultaneously, otherwise it will cost 2 MR jobs to calculate. But when I set hive.groupby.skewindata to true, the count(*) result should not be the same as the count(distinct), but the real result is the same, so it's wrong. And I found a difference in the query plans: the Reduce Operator Tree - Group By Operator - mode is mergepartial when skew is set to false and complete when skew is set to true. So I'm confused about the root cause of the error. 【sql】 select ds,appid,eventname,active,{color:red}count(distinct(guid)), count(*) {color}from eventinfo_tmp where ds='20140612' and length(eventname)1000 and eventname like '%alibaba%' group by ds,appid,eventname,active; 【other hive configuration excluding hive.groupby.skewindata】 hive.exec.compress.output=true hive.exec.compress.intermediate=true io.seqfile.compression.type=BLOCK mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec hive.map.aggr=true hive.stats.autogather=false hive.exec.scratchdir=/user/complat/tmp mapred.job.queue.name=complat hive.exec.mode.local.auto=false hive.exec.mode.local.auto.inputbytes.max=500 hive.exec.mode.local.auto.tasks.max=10 hive.exec.mode.local.auto.input.files.max=1000 hive.exec.dynamic.partition=true hive.exec.dynamic.partition.mode=nonstrict hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat mapred.max.split.size=1 mapred.min.split.size.per.node=1 mapred.min.split.size.per.rack=1 【result】 when hive.groupby.skewindata=true the result is: 20140612 8 alibaba 1 {color:red}87 147{color} when it=false the result is: 20140612 8 alibaba 1 {color:red}87 87{color} 【query plan】 ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME eventinfo_tmp))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL ds)) (TOK_SELEXPR (TOK_TABLE_OR_COL appid)) (TOK_SELEXPR (TOK_TABLE_OR_COL eventname)) (TOK_SELEXPR (TOK_TABLE_OR_COL active)) (TOK_SELEXPR (TOK_FUNCTIONDI count (TOK_TABLE_OR_COL guid))) (TOK_SELEXPR (TOK_FUNCTIONSTAR count))) (TOK_WHERE (and (and (= (TOK_TABLE_OR_COL ds) '20140612') ( (TOK_FUNCTION length (TOK_TABLE_OR_COL eventname)) 1000)) (like (TOK_TABLE_OR_COL eventname) '%tvvideo_setting%'))) (TOK_GROUPBY (TOK_TABLE_OR_COL ds) (TOK_TABLE_OR_COL appid) (TOK_TABLE_OR_COL eventname) (TOK_TABLE_OR_COL active STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: eventinfo_tmp TableScan alias: eventinfo_tmp Filter Operator predicate: expr: ((length(eventname) 1000) and (eventname like '%tvvideo_setting%')) type: boolean Select Operator expressions: expr: ds type: string expr: appid type: string expr: eventname type: string expr: active type: int expr: guid type: string outputColumnNames: ds, appid, eventname, active, guid Group By
Operator aggregations: expr: count(DISTINCT guid) expr: count() bucketGroup: false keys: expr: ds type: string expr: appid type: string expr: eventname type: string expr: active type: int expr: guid type: string mode: hash
[jira] [Updated] (HIVE-11033) BloomFilter index is not honored by ORC reader
[ https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yan updated HIVE-11033: - Description: There is a bug in the org.apache.hadoop.hive.ql.io.orc.ReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to simply pass it to sargApp.pickRowGroups(): {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} was: There is a bug in the org.apache.hadoop.hive.ql.io.orc.ReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The reason is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one from its parent class. Here is one way to fix it: {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} BloomFilter index is not honored by ORC reader -- Key: HIVE-11033 URL: https://issues.apache.org/jira/browse/HIVE-11033 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Allan Yan There is a bug in the org.apache.hadoop.hive.ql.io.orc.ReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class.
Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to simply pass it to sargApp.pickRowGroups(): {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} -- This
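For clarity, here is a hedged sketch of what the sharing proposed in the diff above amounts to (simplified; class and field names are stand-ins, not the exact Hive source): the SargApplier holds a reference to the reader's array instead of allocating a private copy, so the entries populated by readRowIndex() are the same objects consulted when picking row groups.
{code}
class BloomIndex {}  // stand-in for OrcProto.BloomFilterIndex

class SargApplierSketch {
  private final BloomIndex[] bloomFilterIndices;  // shared with the reader

  SargApplierSketch(BloomIndex[] readerOwnedIndices) {
    // before the fix: this.bloomFilterIndices = new BloomIndex[n];
    // (a private copy the reader never fills in)
    this.bloomFilterIndices = readerOwnedIndices;
  }

  boolean hasBloomFilter(int column) {
    // now sees exactly what the reader's readRowIndex() wrote
    return bloomFilterIndices[column] != null;
  }
}
{code}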
[jira] [Updated] (HIVE-11033) BloomFilter index is not honored by ORC reader
[ https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yan updated HIVE-11033: - Description: There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to the one defined in its parent class. {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} was: There is a bug in the org.apache.hadoop.hive.ql.io.orc.ReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object.
One solution is to simply pass it to sargApp.pickRowGroups(): {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} BloomFilter index is not honored by ORC reader -- Key: HIVE-11033 URL: https://issues.apache.org/jira/browse/HIVE-11033 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Allan Yan There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to the one defined in its parent class. {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177
[jira] [Updated] (HIVE-11033) BloomFilter index is not honored by ORC reader
[ https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yan updated HIVE-11033: - Description: There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to its parent counterpart. {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} was: There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to the one defined in its parent class.
{noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 178c177 < sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices); --- > sarg, options.getColumnNames(), strideRate, types, included.length); 204a204 > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; 673c673 < List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) { --- > List<OrcProto.Type> types, int includedCount) { 677c677 < this.bloomFilterIndices = bloomFilterIndices; --- > bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()]; {noformat} BloomFilter index is not honored by ORC reader -- Key: HIVE-11033 URL: https://issues.apache.org/jira/browse/HIVE-11033 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Allan Yan There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which causes the bloom filter index saved in the ORC file not to be used. The root cause is that the bloomFilterIndices variable defined in the SargApplier class supersedes the one defined in its parent class. Therefore, in ReaderImpl.pickRowGroups(): {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to its parent counterpart. {noformat} 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original 174d173 < bloomFilterIndices = new
[jira] [Commented] (HIVE-11034) Multiple join table producing different results
[ https://issues.apache.org/jira/browse/HIVE-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589214#comment-14589214 ] Srini Pindi commented on HIVE-11034: Please see attachments for test data and query info. Multiple join table producing different results --- Key: HIVE-11034 URL: https://issues.apache.org/jira/browse/HIVE-11034 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Environment: Linux 2.6.32-279.19.1.el6.x86_64 Reporter: Srini Pindi Priority: Critical Attachments: hive_issue.zip, steps_to_reproduce_.docx Joining one main table with other tables on different join columns returns wrong results in Hive. Changing the order of the joins between the main table and the other tables produces different results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11034) Multiple join table producing different results
[ https://issues.apache.org/jira/browse/HIVE-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srini Pindi updated HIVE-11034: --- Attachment: steps_to_reproduce_.docx hive_issue.zip Multiple join table producing different results --- Key: HIVE-11034 URL: https://issues.apache.org/jira/browse/HIVE-11034 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Environment: Linux 2.6.32-279.19.1.el6.x86_64 Reporter: Srini Pindi Priority: Critical Attachments: hive_issue.zip, steps_to_reproduce_.docx Joining one main table with other tables on different join columns returns wrong results in Hive. Changing the order of the joins between the main table and the other tables produces different results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11034) Joining multiple tables producing different results with different order of join
[ https://issues.apache.org/jira/browse/HIVE-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srini Pindi updated HIVE-11034: --- Summary: Joining multiple tables producing different results with different order of join (was: Multiple join table producing different results) Joining multiple tables producing different results with different order of join Key: HIVE-11034 URL: https://issues.apache.org/jira/browse/HIVE-11034 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Environment: Linux 2.6.32-279.19.1.el6.x86_64 Reporter: Srini Pindi Priority: Critical Attachments: hive_issue.zip, steps_to_reproduce_.docx Joining one main table with other tables on different join columns returns wrong results in Hive. Changing the order of the joins between the main table and the other tables produces different results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589223#comment-14589223 ] Rui Li commented on HIVE-10999: --- Hi [~xuefuz], the problem seems to be incorrect naming of the spark-bin tar we packed. We expect the decompressed dir to be {noformat}spark-${spark.version}-bin-hadoop2-without-hive{noformat} Previously we got {{spark-1.3.1-bin-hadoop2-without-hive}}, which was correct. But now we have {{spark-1.4.0-SNAPSHOT-bin-2.4.0}}. So during tests we can't locate spark-submit properly. Would you mind taking a look at how we packed the tar, especially why it's still a SNAPSHOT? Thanks. Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589224#comment-14589224 ] Chaoyu Tang commented on HIVE-7193: --- Thanks [~ngangam] for the patch. It looks good to me. Regarding the concern you had about whether the AtnProvider should be changed to be implemented as a singleton, I agree with you that you should not address it in this patch, for the following reasons: 1. The existing code does not implement AtnProvider as a singleton. Making such a change might have backward compatibility issues. For example, what if a user has already implemented and used a CustomAuthenticationProvider which is not designed as a singleton? 2. The patch only adds some additional reading and processing of HiveConf properties in the LdapAuthenticationProviderImpl constructor. Compared to LDAP authentication itself, its overhead should be trivial and it should not be a performance bottleneck. 3. In case it turns out the performance is not desirable due to AtnProvider instantiation, we might consider moving some static logic from the constructor to a static block to improve runtime performance, or open a separate JIRA to investigate the performance implications (including a singleton etc.). But this patch mainly focuses on the LDAP enhancement. 4. As for your concern that we don't know what the user-coded CustomAuthenticationProvider could do: even if you change the AuthenticationProviderFactory and allow it to be implemented as a singleton, like you said, we still have no control over how the user implements the singleton. In addition, the enhancement, including its new configuration properties, should be properly documented. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only the following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
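As an aside, point 3 above could look roughly like the following sketch (hypothetical names, not the Hive source): parsing that is identical for every request moves out of the constructor into a static block, so per-request construction of the provider stays cheap.
{code}
public class LdapProviderSketch {
  private static final String[] GROUP_FILTERS;

  static {
    // runs once per class load, not once per authentication request
    GROUP_FILTERS = System.getProperty("ldap.groupFilter", "member={0}").split(",");
  }

  public LdapProviderSketch() {
    // only per-request state is set up here; static config is already parsed
  }

  public boolean authenticate(String user, String password) {
    // placeholder check: a real provider would bind against the LDAP server
    return user != null && password != null && GROUP_FILTERS.length > 0;
  }
}
{code}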
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Fix Version/s: 2.0.0 1.3.0 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id 
is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Affects Version/s: 2.0.0 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id 
is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589081#comment-14589081 ] Alexander Pivovarov commented on HIVE-10841: Currently the patch is committed to https://github.com/apache/hive/commits/branch-1 https://github.com/apache/hive/commits/branch-1.0 https://github.com/apache/hive/commits/branch-1.2 https://github.com/apache/hive/commits/master I updated Fix Version/s field accordingly [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi
[jira] [Commented] (HIVE-11018) Turn on cbo in more q files
[ https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589080#comment-14589080 ] Hive QA commented on HIVE-11018: {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739953/HIVE-11018.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4277/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4277/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4277/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739953 - PreCommit-HIVE-TRUNK-Build Turn on cbo in more q files --- Key: HIVE-11018 URL: https://issues.apache.org/jira/browse/HIVE-11018 Project: Hive Issue Type: Task Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11018.1.patch, HIVE-11018.2.patch, HIVE-11018.patch There are a few tests in which cbo was turned off for various reasons. Those reasons don't exist anymore. For those tests, we should turn on cbo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --- Affects Version/s: (was: 2.0.0) [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator 
predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats:
[jira] [Updated] (HIVE-10994) Hive.moveFile should not fail on a no-op move
[ https://issues.apache.org/jira/browse/HIVE-10994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-10994: Fix Version/s: 2.0.0 Hive.moveFile should not fail on a no-op move - Key: HIVE-10994 URL: https://issues.apache.org/jira/browse/HIVE-10994 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 1.2.1, 2.0.0 Attachments: HIVE-10994.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587589#comment-14587589 ] Lefty Leverenz commented on HIVE-10841: --- Does branch-1.0 mean version 1.0.1 or is it the same as branch-1 (version 1.3.0)? Today I've seen three commits to refs/heads/branch-1.0 but they don't show Fix Version 1.0.1 on the jira (HIVE-10273, HIVE-10685, and HIVE-10841). Many other commits go to refs/heads/branch-1 so I'm confused. Perhaps we need more details in the wiki. * [Understanding Hive Branches | https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-UnderstandingHiveBranches] [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.2.1 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
[jira] [Updated] (HIVE-11004) PermGen OOM error in Hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-11004: Description: Periodically Hiveserver2 will become unresponsive and looking in the logs there is the following error: {noformat} 2:28:22.965 PM ERROR org.apache.hadoop.hive.ql.io.orc.OrcInputFormat Unexpected Exception java.lang.OutOfMemoryError: PermGen space 2:28:22.969 PM WARN org.apache.hive.service.cli.thrift.ThriftCLIService Error fetching results: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: serious problem at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343) at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250) at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656) at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451) at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655) at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:338) ... 13 more Caused by: java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:944) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:969) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:362) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:294) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445) ... 17 more Caused by: java.lang.OutOfMemoryError: PermGen space {noformat} There does not appear to be an obvious trigger for this (other than the fact that the error mentions ORC). If further details would be helpful in diagnosing the issue please let me know and I'll supply them.
was: Periodically Hiveserver2 will become unresponsive and looking in the logs there is the following error: 2:28:22.965 PM ERROR org.apache.hadoop.hive.ql.io.orc.OrcInputFormat Unexpected Exception java.lang.OutOfMemoryError: PermGen space 2:28:22.969 PM WARN org.apache.hive.service.cli.thrift.ThriftCLIService Error fetching results: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: serious problem at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343) at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250) at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656) at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451) at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at
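One common mitigation while such a leak is being diagnosed (an assumption on our part, not something stated in this ticket) is to raise the PermGen cap for the HiveServer2 JVM, e.g. via hive-env.sh on a Java 7-era deployment:
{noformat}
# assumption: Java 7-era JVM (PermGen was removed in Java 8); path and size
# are illustrative and should match the actual deployment
export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxPermSize=512m"
{noformat}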
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587618#comment-14587618 ] Daniel Haviv commented on HIVE-10165: - Hi, Are there any plans to merge this to the trunk or is it going to be available only as a patch? Thanks, Daniel Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 1.2.0 Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, mutate-system-overview.png h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587638#comment-14587638 ] Elliot West commented on HIVE-10165: My hope is that it'll be merged to trunk. Thanks.
Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 1.2.0 Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, mutate-system-overview.png
h3. Overview
I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts.
h3. Motivation
We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small compared with the number of records contained in a partition. This approach results in a number of operational issues:
* An excessive amount of write activity is required for small data changes.
* Downstream applications cannot robustly read these datasets while they are being updated.
* Due to the scale of the updates (hundreds of partitions), the scope for contention is high.
I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner.
h3. Benefits
* Enables the creation of large-scale dataset merge processes.
* Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly
[ https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587666#comment-14587666 ] Thiruvel Thirumoolan commented on HIVE-10815: - Thanks for pointing that out. Would it help to reuse that in the constructor or open()? Let HiveMetaStoreClient Choose MetaStore Randomly - Key: HIVE-10815 URL: https://issues.apache.org/jira/browse/HIVE-10815 Project: Hive Issue Type: Improvement Components: HiveServer2, Metastore Affects Versions: 1.2.0 Reporter: Nemon Lou Assignee: Nemon Lou Attachments: HIVE-10815.patch Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs when multiple metastores are configured. Choosing a MetaStore randomly would be good for load balancing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11019) Can't create an Avro table with uniontype column correctly
[ https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-11019: -- Assignee: Bing Li Can't create an Avro table with uniontype column correctly -- Key: HIVE-11019 URL: https://issues.apache.org/jira/browse/HIVE-11019 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Bing Li Assignee: Bing Li I tried the example in https://cwiki.apache.org/confluence/display/Hive/AvroSerDe and found that it can't create an Avro table with a uniontype column correctly:
hive> create table avro_union(union1 uniontype<FLOAT, BOOLEAN, STRING>) STORED AS AVRO;
OK
Time taken: 0.083 seconds
hive> describe avro_union;
OK
union1 uniontype<void,float,boolean,string>
Time taken: 0.058 seconds, Fetched: 1 row(s)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
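For reference, the union the DDL above declares can be constructed directly with the Avro Java API. The describe output suggests Hive's generated schema carries an extra null branch (surfaced as void in Hive) that the user never declared. A minimal sketch contrasting the two, using standard org.apache.avro.Schema calls; the class name is illustrative:
{code:java}
import org.apache.avro.Schema;
import java.util.Arrays;

public class UnionSchemaDemo {
    public static void main(String[] args) {
        // The union the DDL declares: float | boolean | string.
        Schema declared = Schema.createUnion(Arrays.asList(
                Schema.create(Schema.Type.FLOAT),
                Schema.create(Schema.Type.BOOLEAN),
                Schema.create(Schema.Type.STRING)));
        System.out.println(declared); // ["float","boolean","string"]

        // What the describe output suggests Hive actually generated:
        // a union with a leading null branch, shown as "void" by Hive.
        Schema generated = Schema.createUnion(Arrays.asList(
                Schema.create(Schema.Type.NULL),
                Schema.create(Schema.Type.FLOAT),
                Schema.create(Schema.Type.BOOLEAN),
                Schema.create(Schema.Type.STRING)));
        System.out.println(generated); // ["null","float","boolean","string"]
    }
}
{code}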
[jira] [Commented] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly
[ https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587743#comment-14587743 ] Nemon Lou commented on HIVE-10815: -- The mechanism in promoteRandomMetaStoreURI() has some limitations: if there are only two metastores, then the second one will always be promoted, making the order fixed again. Swapping the first metastore with a random one is reasonable when a retry is needed, and is better than randomly reordering all of the metastores before each reconnect. That is why I keep it and add a new random mechanism in the constructor. Here is the piece of code that does the promotion:
{code:java}
/**
 * Swaps the first element of the metastoreUris array with a random element from the
 * remainder of the array.
 */
private void promoteRandomMetaStoreURI() {
  if (metastoreUris.length <= 1) {
    return;
  }
  Random rng = new Random();
  int index = rng.nextInt(metastoreUris.length - 1) + 1;
  URI tmp = metastoreUris[0];
  metastoreUris[0] = metastoreUris[index];
  metastoreUris[index] = tmp;
}
{code}
Let HiveMetaStoreClient Choose MetaStore Randomly - Key: HIVE-10815 URL: https://issues.apache.org/jira/browse/HIVE-10815 Project: Hive Issue Type: Improvement Components: HiveServer2, Metastore Affects Versions: 1.2.0 Reporter: Nemon Lou Assignee: Nemon Lou Attachments: HIVE-10815.patch Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs when multiple metastores are configured. Choosing a MetaStore randomly would be good for load balancing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
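The "new random mechanism in the constructor" mentioned in the comment is not shown. A minimal sketch of what such a one-time randomization could look like; the class and method names are illustrative assumptions, not the actual HIVE-10815 patch:
{code:java}
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: shuffle the configured URIs once at client
// construction so each client starts from a random metastore, while
// promoteRandomMetaStoreURI() still handles re-ordering on retry.
public class MetaStoreUriShuffler {
    static URI[] shuffleInitialOrder(URI[] configuredUris) {
        // Clone first so the caller's configured order is left untouched.
        List<URI> uris = Arrays.asList(configuredUris.clone());
        Collections.shuffle(uris); // random starting order per client
        return uris.toArray(new URI[0]);
    }
}
{code}
Shuffling once in the constructor spreads fresh clients evenly across metastores, while the swap-on-retry above only changes order after a failure.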
[jira] [Commented] (HIVE-11005) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on the latest master
[ https://issues.apache.org/jira/browse/HIVE-11005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587897#comment-14587897 ] Jesus Camacho Rodriguez commented on HIVE-11005: These issues should be solved when HIVE-10533 goes in. CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on the latest master -- Key: HIVE-11005 URL: https://issues.apache.org/jira/browse/HIVE-11005 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Jesus Camacho Rodriguez Tests cbo_join.q and cbo_views.q fail on the return path. Part of the stack trace is:
{code}
2015-06-15 09:51:53,377 ERROR [main]: parse.CalcitePlanner (CalcitePlanner.java:genOPTree(282)) - CBO failed, skipping CBO.
java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
at com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80)
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveInsertExchange4JoinRule.onMatch(HiveInsertExchange4JoinRule.java:101)
at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:326)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:515)
at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:392)
at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:255)
at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:125)
at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207)
at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:888)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:771)
at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:876)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
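The exception message in the trace is Guava's standard Preconditions message for indexing into an empty ImmutableList, which the trace shows HiveInsertExchange4JoinRule.onMatch doing at line 101 (apparently on an empty key list). A two-line repro of the same failure mode, not Hive code, shown only to make the trace readable:
{code:java}
import com.google.common.collect.ImmutableList;

public class EmptyListRepro {
    public static void main(String[] args) {
        ImmutableList<String> keys = ImmutableList.of();
        // Throws java.lang.IndexOutOfBoundsException:
        //   index (0) must be less than size (0)
        keys.get(0);
    }
}
{code}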