[jira] [Commented] (HIVE-12080) Support auto type widening (int->bigint & float->double) for Parquet table

2015-11-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003682#comment-15003682
 ] 

Lefty Leverenz commented on HIVE-12080:
---

Does this need documentation in the wiki?

* [Parquet -- Versions and Limitations | 
https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-VersionsandLimitations]
* [Hive Data Types | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types]

> Support auto type widening (int->bigint & float->double) for Parquet table
> --
>
> Key: HIVE-12080
> URL: https://issues.apache.org/jira/browse/HIVE-12080
> Project: Hive
>  Issue Type: New Feature
>  Components: File Formats
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Fix For: 2.0.0
>
> Attachments: HIVE-12080.1.patch, HIVE-12080.2.patch, 
> HIVE-12080.3.patch, HIVE-12080.6.patch, HIVE-12080.7.patch
>
>
> Currently Hive+Parquet doesn't support it. It should include at least the 
> basic type promotions (short->int->bigint, float->double, etc.) that are 
> already supported for other file formats.
> There was a similar effort (HIVE-6784), but it was not committed. This JIRA 
> addresses the same problem in a different way, with little or no performance 
> impact.
>  
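
Note: a minimal sketch of the promotion rule named in the description, assuming only the two chains it lists (short->int->bigint and float->double). The table and the function name can_widen are hypothetical and are not taken from the Hive or Parquet code.

```python
# Hypothetical illustration; this is not Hive's or Parquet's actual code.
# Each chain lists types from narrowest to widest, as named in the issue.
WIDENING_CHAINS = [
    ["short", "int", "bigint"],
    ["float", "double"],
]


def can_widen(file_type, table_type):
    """Return True if a value written as file_type can be read as table_type
    without narrowing: both types sit in one chain and table_type is not
    narrower than file_type."""
    for chain in WIDENING_CHAINS:
        if file_type in chain and table_type in chain:
            return chain.index(file_type) <= chain.index(table_type)
    # Promotions across chains (for example int -> double) are not modelled.
    return False


print(can_widen("int", "bigint"))
print(can_widen("bigint", "int"))
```

The check is deliberately one-directional: reading a bigint file column as int would be a narrowing and is rejected.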



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12399) Native Vector MapJoin can encounter "Null key not expected in MapJoin" and "Unexpected NULL in map join small table" exceptions

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003804#comment-15003804
 ] 

Hive QA commented on HIVE-12399:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772058/HIVE-12399.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9782 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6023/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6023/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6023/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772058 - PreCommit-HIVE-TRUNK-Build

> Native Vector MapJoin can encounter  "Null key not expected in MapJoin" and 
> "Unexpected NULL in map join small table" exceptions
> 
>
> Key: HIVE-12399
> URL: https://issues.apache.org/jira/browse/HIVE-12399
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12399.01.patch
>
>
> Instead of throwing an exception, just filter out NULLs in the Native Vector 
> MapJoin operators.





[jira] [Commented] (HIVE-1841) datanucleus.fixedDatastore should be true in hive-default.xml

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003882#comment-15003882
 ] 

Hive QA commented on HIVE-1841:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772079/HIVE-1841.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 29 failed/errored test(s), 9114 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.initializationError
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.beeline.cli.TestHiveCli.testCmd
org.apache.hive.beeline.cli.TestHiveCli.testDatabaseOptions
org.apache.hive.beeline.cli.TestHiveCli.testErrOutput
org.apache.hive.beeline.cli.TestHiveCli.testHelp
org.apache.hive.beeline.cli.TestHiveCli.testInValidCmd
org.apache.hive.beeline.cli.TestHiveCli.testInvalidDatabaseOptions
org.apache.hive.beeline.cli.TestHiveCli.testInvalidOptions
org.apache.hive.beeline.cli.TestHiveCli.testInvalidOptions2
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB
org.apache.hive.beeline.cli.TestHiveCli.testSetHeaderValue
org.apache.hive.beeline.cli.TestHiveCli.testSetPromptValue
org.apache.hive.beeline.cli.TestHiveCli.testSourceCmd
org.apache.hive.beeline.cli.TestHiveCli.testSourceCmd2
org.apache.hive.beeline.cli.TestHiveCli.testSourceCmd3
org.apache.hive.beeline.cli.TestHiveCli.testSqlFromCmd
org.apache.hive.beeline.cli.TestHiveCli.testSqlFromCmdWithDBName
org.apache.hive.beeline.cli.TestHiveCli.testUseCurrentDB1
org.apache.hive.beeline.cli.TestHiveCli.testUseCurrentDB2
org.apache.hive.beeline.cli.TestHiveCli.testUseCurrentDB3
org.apache.hive.beeline.cli.TestHiveCli.testUseInvalidDB
org.apache.hive.beeline.cli.TestHiveCli.testVariables
org.apache.hive.beeline.cli.TestHiveCli.testVariablesForSource
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6025/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6025/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6025/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 29 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772079 - PreCommit-HIVE-TRUNK-Build

>  datanucleus.fixedDatastore should be true in hive-default.xml
> --
>
> Key: HIVE-1841
> URL: https://issues.apache.org/jira/browse/HIVE-1841
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Metastore
>Affects Versions: 0.6.0
>Reporter: Edward Capriolo
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-1841.1.patch.txt, HIVE-1841.patch
>
>
> Two datanucleus variables:
> {noformat}
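> <property>
>  <name>datanucleus.autoCreateSchema</name>
>  <value>false</value>
> </property>
> <property>
>  <name>datanucleus.fixedDatastore</name>
>  <value>true</value>
> </property>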
> {noformat}
> are dangerous.  We do want the schema to auto-create itself, but we do not 
> want the schema to auto-update itself. 
> Someone might accidentally point a trunk build at the wrong metastore and 
> unknowingly update it. I believe we should set this to false and possibly trap 
> exceptions stemming from Hive wanting to do any update. This way someone has 
> to actively acknowledge the update, either by setting this to true and then 
> starting up Hive, or by leaving it false, removing schema-modify privileges 
> from the user that Hive uses, and always doing the updates by hand.





[jira] [Commented] (HIVE-11525) Bucket pruning

2015-11-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003686#comment-15003686
 ] 

Lefty Leverenz commented on HIVE-11525:
---

Doc note:  This adds *hive.tez.bucket.pruning* to HiveConf.java, so it needs to 
be documented in the Tez section of Configuration Properties in the wiki.

* [Configuration Properties -- Tez | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez]

> Bucket pruning
> --
>
> Key: HIVE-11525
> URL: https://issues.apache.org/jira/browse/HIVE-11525
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>Reporter: Maciek Kocon
>Assignee: Gopal V
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11525.1.patch, HIVE-11525.2.patch, 
> HIVE-11525.3.patch, HIVE-11525.WIP.patch
>
>
> Logically and functionally, bucketing and partitioning are quite similar: 
> both provide a mechanism to segregate and separate a table's data based on 
> its content. Thanks to that, significant further optimisations such as 
> [partition] PRUNING or [bucket] MAP JOIN are possible.
> The difference seems to be imposed by design: PARTITIONing is 
> open/explicit while BUCKETing is discrete/implicit.
> Partitioning seems to be very common, if not a standard feature, in all 
> current RDBMS, while BUCKETING seems to be specific to Hive.
> In a way, BUCKETING could also be called "hashing" or simply "IMPLICIT 
> PARTITIONING".
> Although these two are recognised as separate features in Hive, there should 
> be nothing to prevent leveraging the same existing query/join optimisations 
> across the two.
> BUCKET pruning
> Enable an optimisation equivalent to partition PRUNING for queries on 
> BUCKETED tables.
> The simplest example is for queries like:
> "SELECT … FROM x WHERE colA=123123"
> to read only the relevant bucket file rather than all the bucket files that 
> belong to the table.
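
Note: a minimal sketch of the idea for the colA=123123 example above. It assumes an integer column whose hash is the value itself and a bucket chosen as the non-negative hash modulo the bucket count. The bucket count of 32, the file names and both function names are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch; the hash rule and file naming are assumptions.
INT_MAX = 0x7FFFFFFF


def bucket_for_int(value, num_buckets):
    # Assumes an integer hashes to itself; masking keeps the hash non-negative.
    return (value & INT_MAX) % num_buckets


def files_to_read(bucket_files, value):
    # Prune: keep only the file of the single bucket that can hold the value.
    return [bucket_files[bucket_for_int(value, len(bucket_files))]]


bucket_files = ["%06d_0" % i for i in range(32)]
print(files_to_read(bucket_files, 123123))
```

With an equality predicate, one file is scanned instead of all 32, which is the saving the description is asking for.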





[jira] [Commented] (HIVE-12391) SkewJoinOptimizer might not kick in if columns are renamed after TableScanOperator

2015-11-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003725#comment-15003725
 ] 

Lefty Leverenz commented on HIVE-12391:
---

Thanks [~jcamachorodriguez].

> SkewJoinOptimizer might not kick in if columns are renamed after 
> TableScanOperator
> --
>
> Key: HIVE-12391
> URL: https://issues.apache.org/jira/browse/HIVE-12391
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12391.patch
>
>
> SkewJoinOptimizer will not kick in if the columns are just renamed after the 
> TS e.g. by the creation of a derived table.
> To reproduce, consider the following example:
> {code}
> set hive.optimize.skewjoin.compiletime = true;
> CREATE TABLE T1(key STRING, val STRING)
> SKEWED BY (key) ON ((2)) STORED AS TEXTFILE;
> CREATE TABLE T2(key STRING, val STRING)
> SKEWED BY (key) ON ((3)) STORED AS TEXTFILE;
> {code}
> For this query, SkewJoinOptimizer kicks in:
> {code}
> SELECT a.*, b.*
> FROM T1 a JOIN T2 b
> ON a.key = b.key
> {code}
> For this one, it does not:
> {code}
> SELECT a.*, b.*
> FROM 
>   (SELECT key as k, val as v FROM T1) a
>   JOIN
>   (SELECT key as k, val as v FROM T2) b
> ON a.k = b.k;
> {code}
> The reason is that SkewJoinOptimizer does not backtrack the origin of the 
> column. Instead it just uses its name to know if it is produced by a certain 
> TS.
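
Note: a minimal model of the difference between a name-only check and backtracking a renamed column to its origin, using the T1 example above. The dictionaries and function names are hypothetical and do not correspond to Hive's operator classes.

```python
# Hypothetical model, not Hive's operator tree.
skewed_columns = {"T1": {"key"}, "T2": {"key"}}

# Derived table a: SELECT key AS k, val AS v FROM T1
renames_a = {"k": "key", "v": "val"}


def is_skewed_by_name(table, column):
    # Name-only check: an alias such as "k" never matches the skewed column.
    return column in skewed_columns[table]


def is_skewed_with_backtracking(table, column, renames):
    # Resolve the alias back to the column produced by the table scan first.
    origin = renames.get(column, column)
    return origin in skewed_columns[table]


print(is_skewed_by_name("T1", "k"))
print(is_skewed_with_backtracking("T1", "k", renames_a))
```

The first check misses the skew on the derived table, while the second finds it because the alias k is mapped back to key.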





[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-11-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-11587:
--
Labels: TODOC1.3  (was: TODOC2.0)

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
>  Labels: TODOC1.3
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch
>
>
> Due to the legacy of in-memory mapjoin and conservative planning, the memory 
> estimation code for the mapjoin hashtable is currently not very good. It 
> allocates the probe erring on the side of more memory and does not take the 
> data into account because, unlike the probe, the data is free to resize; so 
> it is better for performance to allocate a big probe and hope for the best 
> with regard to future data size. This is not true for the hybrid case.
> There's code to cap the initial allocation based on memory available 
> (memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to hashtable anymore (there used to be two 
> config settings, hashjoin size fraction by itself, or hashjoin size fraction 
> for group by case), so it never caps the memory anymore below 1 Gb. 
> Initial capacity is estimated from input key count, and in hybrid join cache 
> can exceed Java memory due to number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure "initialCapacity" argument from Hybrid case is correct given the 
> number of segments. See how it's calculated from keys for regular case; it 
> needs to be adjusted accordingly for hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is row count (assuming key per row, i.e. 
> maximum possible number of keys) divided by load factor, plus some very small 
> number to round up. That is for flat case. For hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in hybrid case.
> 3) Make sure that memory estimation for hybrid case also doesn't come up with 
> numbers that are too small, like 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap single write buffer size to 8-16 MB. The whole point of WBs is that 
> you should not allocate a large array in advance. Even if some estimate 
> passes 500 MB or 40 MB or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs - only allocate on write.
> 6) Change everywhere rounding up to power of two is used to rounding down, at 
> least for hybrid case (?)
> I wanted to put all of these items in single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already, feel free to link them to 
> this one.
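
Note: a minimal sketch of the arithmetic in items 1.5, 4 and 6 above. The load factor of 0.75, the 8 MB default cap and all function names are assumptions made for illustration; none of them is taken from the Hive code.

```python
import math


def max_probe_capacity(row_count, load_factor):
    # Item 1.5: at most one key per row, divided by the load factor, rounded up.
    return math.ceil(row_count / load_factor)


def round_down_to_power_of_two(n):
    # Item 6: round down instead of up. Requires n >= 1.
    return 1 << (n.bit_length() - 1)


def capped_write_buffer_size(estimate_bytes, cap_bytes=8 * 1024 * 1024):
    # Item 4: never allocate a single write buffer larger than the cap.
    return min(estimate_bytes, cap_bytes)


capacity = max_probe_capacity(1000000, 0.75)
print(capacity, round_down_to_power_of_two(capacity))
print(capped_write_buffer_size(500 * 1024 * 1024))
```

For one million rows the bound is 1333334 slots. Rounding down gives 1048576, where rounding up would give 2097152, which shows why the direction of rounding matters for memory use.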





[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-11-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003698#comment-15003698
 ] 

Lefty Leverenz commented on HIVE-11587:
---

Changing the doc label from TODOC2.0 to TODOC1.3.

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
>  Labels: TODOC2.0
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch
>
>
> Due to the legacy of in-memory mapjoin and conservative planning, the memory 
> estimation code for the mapjoin hashtable is currently not very good. It 
> allocates the probe erring on the side of more memory and does not take the 
> data into account because, unlike the probe, the data is free to resize; so 
> it is better for performance to allocate a big probe and hope for the best 
> with regard to future data size. This is not true for the hybrid case.
> There's code to cap the initial allocation based on memory available 
> (memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to hashtable anymore (there used to be two 
> config settings, hashjoin size fraction by itself, or hashjoin size fraction 
> for group by case), so it never caps the memory anymore below 1 Gb. 
> Initial capacity is estimated from input key count, and in hybrid join cache 
> can exceed Java memory due to number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure "initialCapacity" argument from Hybrid case is correct given the 
> number of segments. See how it's calculated from keys for regular case; it 
> needs to be adjusted accordingly for hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is row count (assuming key per row, i.e. 
> maximum possible number of keys) divided by load factor, plus some very small 
> number to round up. That is for flat case. For hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in hybrid case.
> 3) Make sure that memory estimation for hybrid case also doesn't come up with 
> numbers that are too small, like 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap single write buffer size to 8-16 MB. The whole point of WBs is that 
> you should not allocate a large array in advance. Even if some estimate 
> passes 500 MB or 40 MB or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs - only allocate on write.
> 6) Change everywhere rounding up to power of two is used to rounding down, at 
> least for hybrid case (?)
> I wanted to put all of these items in single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already, feel free to link them to 
> this one.





[jira] [Commented] (HIVE-12391) SkewJoinOptimizer might not kick in if columns are renamed after TableScanOperator

2015-11-13 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003722#comment-15003722
 ] 

Jesus Camacho Rodriguez commented on HIVE-12391:


[~leftylev], I just renamed it: I should not have used the abbreviation.

> SkewJoinOptimizer might not kick in if columns are renamed after 
> TableScanOperator
> --
>
> Key: HIVE-12391
> URL: https://issues.apache.org/jira/browse/HIVE-12391
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12391.patch
>
>
> SkewJoinOptimizer will not kick in if the columns are just renamed after the 
> TS e.g. by the creation of a derived table.
> To reproduce, consider the following example:
> {code}
> set hive.optimize.skewjoin.compiletime = true;
> CREATE TABLE T1(key STRING, val STRING)
> SKEWED BY (key) ON ((2)) STORED AS TEXTFILE;
> CREATE TABLE T2(key STRING, val STRING)
> SKEWED BY (key) ON ((3)) STORED AS TEXTFILE;
> {code}
> For this query, SkewJoinOptimizer kicks in:
> {code}
> SELECT a.*, b.*
> FROM T1 a JOIN T2 b
> ON a.key = b.key
> {code}
> For this one, it does not:
> {code}
> SELECT a.*, b.*
> FROM 
>   (SELECT key as k, val as v FROM T1) a
>   JOIN
>   (SELECT key as k, val as v FROM T2) b
> ON a.k = b.k;
> {code}
> The reason is that SkewJoinOptimizer does not backtrack the origin of the 
> column. Instead it just uses its name to know if it is produced by a certain 
> TS.





[jira] [Updated] (HIVE-12391) SkewJoinOptimizer might not kick in if columns are renamed after TableScanOperator

2015-11-13 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12391:
---
Summary: SkewJoinOptimizer might not kick in if columns are renamed after 
TableScanOperator  (was: SkewJoinOptimizer might not kick in if columns are 
renamed after TS)

> SkewJoinOptimizer might not kick in if columns are renamed after 
> TableScanOperator
> --
>
> Key: HIVE-12391
> URL: https://issues.apache.org/jira/browse/HIVE-12391
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12391.patch
>
>
> SkewJoinOptimizer will not kick in if the columns are just renamed after the 
> TS e.g. by the creation of a derived table.
> To reproduce, consider the following example:
> {code}
> set hive.optimize.skewjoin.compiletime = true;
> CREATE TABLE T1(key STRING, val STRING)
> SKEWED BY (key) ON ((2)) STORED AS TEXTFILE;
> CREATE TABLE T2(key STRING, val STRING)
> SKEWED BY (key) ON ((3)) STORED AS TEXTFILE;
> {code}
> For this query, SkewJoinOptimizer kicks in:
> {code}
> SELECT a.*, b.*
> FROM T1 a JOIN T2 b
> ON a.key = b.key
> {code}
> For this one, it does not:
> {code}
> SELECT a.*, b.*
> FROM 
>   (SELECT key as k, val as v FROM T1) a
>   JOIN
>   (SELECT key as k, val as v FROM T2) b
> ON a.k = b.k;
> {code}
> The reason is that SkewJoinOptimizer does not backtrack the origin of the 
> column. Instead it just uses its name to know if it is produced by a certain 
> TS.





[jira] [Commented] (HIVE-11120) Generic interface for file format validation

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003715#comment-15003715
 ] 

Hive QA commented on HIVE-11120:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772055/HIVE-11120.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9781 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6021/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6021/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6021/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772055 - PreCommit-HIVE-TRUNK-Build

> Generic interface for file format validation
> 
>
> Key: HIVE-11120
> URL: https://issues.apache.org/jira/browse/HIVE-11120
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11120.2.patch, HIVE-11120.3.patch, 
> HIVE-11120.4.patch, HIVE-11120.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-8?focusedCommentId=14602302=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14602302
> We need a generic interface to verify whether a specified file is in a valid 
> format, so that the LOAD DATA statement can perform a sanity check before 
> copying the file to its destination.





[jira] [Commented] (HIVE-12395) Turn off CBO for hive.support.special.characters.tablename tests until feature is complete

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003719#comment-15003719
 ] 

Hive QA commented on HIVE-12395:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772077/HIVE-12395.02.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6022/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6022/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6022/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6022/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 55cb43d HIVE-11525: Tez Bucket pruning (Gopal V, reviewed by 
Sergey Shelukhin)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 55cb43d HIVE-11525: Tez Bucket pruning (Gopal V, reviewed by 
Sergey Shelukhin)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772077 - PreCommit-HIVE-TRUNK-Build

> Turn off CBO for hive.support.special.characters.tablename tests until 
> feature is complete
> --
>
> Key: HIVE-12395
> URL: https://issues.apache.org/jira/browse/HIVE-12395
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12395.01.patch, HIVE-12395.02.patch
>
>
> Due to the recent stats issue found in HIVE-12381, we need to turn off CBO 
> for the hive.support.special.characters.tablename tests until the feature is 
> complete.





[jira] [Commented] (HIVE-11955) Add costing for join-groupby transpose rule

2015-11-13 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003769#comment-15003769
 ] 

Jesus Camacho Rodriguez commented on HIVE-11955:


+1, LGTM.

It makes sense to delegate the choice to the Planner, as other planner 
implementations (e.g. Volcano) might decide to continue firing rules on both 
alternative plans.

> Add costing for join-groupby transpose rule
> ---
>
> Key: HIVE-11955
> URL: https://issues.apache.org/jira/browse/HIVE-11955
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11955.patch
>
>
> Currently, it's config driven. It needs to be cost driven.





[jira] [Commented] (HIVE-12319) Remove HadoopShims::getHadoopConfNames()

2015-11-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004149#comment-15004149
 ] 

Ashutosh Chauhan commented on HIVE-12319:
-

I don't see much value in keeping them in HiveConf. These properties exist in 
Hadoop code. We should reference them directly from there, instead of 
redefining them in HiveConf.

> Remove HadoopShims::getHadoopConfNames()
> 
>
> Key: HIVE-12319
> URL: https://issues.apache.org/jira/browse/HIVE-12319
> Project: Hive
>  Issue Type: Improvement
>  Components: Shims
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12319.patch
>
>
> It was introduced in HIVE-6159. It has served its purpose now that we support 
> only the Hadoop 2.x line.





[jira] [Commented] (HIVE-12396) BucketingSortingReduceSinkOptimizer may still throw IOB exception for duplicate columns

2015-11-13 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004177#comment-15004177
 ] 

Jesus Camacho Rodriguez commented on HIVE-12396:


+1

> BucketingSortingReduceSinkOptimizer may still throw IOB exception for 
> duplicate columns
> ---
>
> Key: HIVE-12396
> URL: https://issues.apache.org/jira/browse/HIVE-12396
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12396.patch
>
>
> HIVE-12332 didn't fix the issue completely.





[jira] [Updated] (HIVE-8396) Hive CliDriver command splitting can be broken when comments are present

2015-11-13 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-8396:
--
Summary: Hive CliDriver command splitting can be broken when comments are 
present  (was: HIVE CliDriver command splitting can be broken when comments are 
present)

> Hive CliDriver command splitting can be broken when comments are present
> 
>
> Key: HIVE-8396
> URL: https://issues.apache.org/jira/browse/HIVE-8396
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Query Processor
>Affects Versions: 0.14.0
>Reporter: Sergey Shelukhin
>
> {noformat}
> -- SORT_QUERY_RESULTS
> set hive.cbo.enable=true;
> ... commands ...
> {noformat}
> causes
> {noformat}
> 2014-10-07 18:55:57,193 ERROR ql.Driver (SessionState.java:printError(825)) - 
> FAILED: ParseException line 2:4 missing KW_ROLE at 'hive' near 'hive'
> {noformat}
> If the comment is moved after the command it works.
> I noticed this earlier when I commented out parts of some random q file for 
> debugging purposes, and it started failing. This is annoying.
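
The failure above is consistent with the leading comment line staying attached to the first command when the script is split on ';'. A minimal sketch of one possible approach, assuming comment-only lines can simply be dropped before splitting. CommentAwareSplitter and splitCommands are hypothetical names, not CliDriver's actual code, and the sketch deliberately ignores ';' or '--' inside quoted strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: drops "--" comment lines, then splits on ';'.
final class CommentAwareSplitter {
    static List<String> splitCommands(String script) {
        StringBuilder withoutComments = new StringBuilder();
        for (String line : script.split("\n")) {
            // Skip lines that contain only a comment.
            if (!line.trim().startsWith("--")) {
                withoutComments.append(line).append('\n');
            }
        }
        List<String> commands = new ArrayList<>();
        for (String piece : withoutComments.toString().split(";")) {
            String command = piece.trim();
            if (!command.isEmpty()) {
                commands.add(command);
            }
        }
        return commands;
    }
}
```

With the script from the report, the first command then starts with "set" rather than with the comment text.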





[jira] [Commented] (HIVE-11110) Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, improve Filter selectivity estimation

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004100#comment-15004100
 ] 

Hive QA commented on HIVE-11110:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772100/HIVE-11110.23.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9782 tests 
executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketpruning1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6027/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6027/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6027/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772100 - PreCommit-HIVE-TRUNK-Build

> Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, 
> improve Filter selectivity estimation
> 
>
> Key: HIVE-11110
> URL: https://issues.apache.org/jira/browse/HIVE-11110
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-11110-10.patch, HIVE-11110-11.patch, 
> HIVE-11110-12.patch, HIVE-11110-branch-1.2.patch, HIVE-11110.1.patch, 
> HIVE-11110.13.patch, HIVE-11110.14.patch, HIVE-11110.15.patch, 
> HIVE-11110.16.patch, HIVE-11110.17.patch, HIVE-11110.18.patch, 
> HIVE-11110.19.patch, HIVE-11110.2.patch, HIVE-11110.20.patch, 
> HIVE-11110.21.patch, HIVE-11110.22.patch, HIVE-11110.23.patch, 
> HIVE-11110.4.patch, HIVE-11110.5.patch, HIVE-11110.6.patch, 
> HIVE-11110.7.patch, HIVE-11110.8.patch, HIVE-11110.9.patch, 
> HIVE-11110.91.patch, HIVE-11110.92.patch, HIVE-11110.patch
>
>
> Query
> {code}
> select  count(*)
>  from store_sales
>  ,store_returns
>  ,date_dim d1
>  ,date_dim d2
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = ss_sold_date_sk
>and ss_customer_sk = sr_customer_sk
>and ss_item_sk = sr_item_sk
>and ss_ticket_number = sr_ticket_number
>and sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
> {code}
> The store_sales table is partitioned on ss_sold_date_sk, which is also used 
> in a join clause. The join clause should add a filter “filterExpr: 
> ss_sold_date_sk is not null”, which should get pushed to the MetaStore when 
> fetching the stats. Currently this is not done in CBO planning, which results 
> in the stats from __HIVE_DEFAULT_PARTITION__ being fetched and considered in 
> the optimization phase. In particular, this increases the NDV for the join 
> columns and may result in wrong planning.
> Including HiveJoinAddNotNullRule in the optimization phase solves this issue.
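
The reasoning above relies on the fact that an inner equi-join never matches rows whose join key is NULL, so an "is not null" predicate on each join key can be added without changing the result. A minimal sketch of deriving such predicates from a list of join key columns. JoinNotNullSketch and notNullFilters are hypothetical names used only for illustration; this is not how HiveJoinAddNotNullRule is implemented:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Hive or Calcite.
final class JoinNotNullSketch {
    // Derives one "<column> is not null" predicate per distinct join key,
    // preserving the order in which the keys first appear.
    static List<String> notNullFilters(List<String> joinKeyColumns) {
        List<String> filters = new ArrayList<>();
        for (String column : new LinkedHashSet<>(joinKeyColumns)) {
            filters.add(column + " is not null");
        }
        return filters;
    }
}
```

For the query in the description, the key ss_sold_date_sk would yield the filter quoted above, which is what lets the default partition be excluded when stats are fetched.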





[jira] [Commented] (HIVE-12407) Check fetch property to determine if a SortLimit contains a limit operation

2015-11-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004163#comment-15004163
 ] 

Ashutosh Chauhan commented on HIVE-12407:
-

+1 LGTM

> Check fetch property to determine if a SortLimit contains a limit operation
> ---
>
> Key: HIVE-12407
> URL: https://issues.apache.org/jira/browse/HIVE-12407
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12407.patch
>
>
> Now that Calcite 1.5 went in, sometimes we end up with Sort and Limit 
> operations in the same operator. limitRelNode in HiveCalciteUtil should check 
> the fetch property of the SortLimit operator to determine if an operator is a 
> Limit.
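
A minimal sketch of the check described above, assuming a combined operator that carries both sort keys and an optional fetch count. SortLimitNode and isLimit are hypothetical stand-ins, not the Calcite or Hive classes; the point is only that the presence of a fetch value, rather than the operator type, decides whether a limit is involved:

```java
import java.util.List;

final class SortLimitSketch {
    // Hypothetical stand-in for an operator that can sort, limit, or do both.
    static final class SortLimitNode {
        final List<String> sortKeys; // empty when the operator does not sort
        final Integer fetch;         // null when the operator has no limit

        SortLimitNode(List<String> sortKeys, Integer fetch) {
            this.sortKeys = sortKeys;
            this.fetch = fetch;
        }
    }

    // An operator counts as a limit only if its fetch property is set.
    static boolean isLimit(SortLimitNode node) {
        return node.fetch != null;
    }
}
```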





[jira] [Commented] (HIVE-12404) Orc ppd throws exception if types don't match

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004262#comment-15004262
 ] 

Hive QA commented on HIVE-12404:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772123/HIVE-12404.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9783 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.ql.io.sarg.TestSearchArgumentImpl.testBadLiteral
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6028/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6028/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6028/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772123 - PreCommit-HIVE-TRUNK-Build

> Orc ppd throws exception if types don't match
> -
>
> Key: HIVE-12404
> URL: https://issues.apache.org/jira/browse/HIVE-12404
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12404.patch
>
>
> When the types of a constant value and a column don't match, Hive currently 
> throws an exception.
> {code}
> java.lang.IllegalArgumentException: Wrong value class java.lang.Integer for 
> BOOLEAN.LESS_THAN_EQUALS leaf
> at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.<init>(SearchArgumentImpl.java:63)
> at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$BuilderImpl.lessThanEquals(SearchArgumentImpl.java:304)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:277)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:326)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:386)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.addChildren(ConvertAstToSearchArg.java:331)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:370)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.addChildren(ConvertAstToSearchArg.java:331)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:366)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.<init>(ConvertAstToSearchArg.java:68)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.create(ConvertAstToSearchArg.java:417)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createFromConf(ConvertAstToSearchArg.java:436)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.<init>(OrcInputFormat.java:484)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1121)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1207)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:369)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:481)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:160)
> {code}





[jira] [Commented] (HIVE-12404) Orc ppd throws exception if types don't match

2015-11-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004173#comment-15004173
 ] 

Ashutosh Chauhan commented on HIVE-12404:
-

That's a valid argument. However, given that Hive has historically permitted 
these kinds of operations, not supporting them for a particular file format will 
surprise users. Further, ORC already supports them as well; it throws an 
exception only when a particular optimization is enabled. 
IMHO, in the presence of such questionable semantics, ORC should be defensive: 
it should turn off ppd (which is what this patch does) and let the upper layer 
evaluate the condition.
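
A minimal sketch of the defensive behaviour argued for above, assuming a leaf builder that rejects a literal whose class does not fit the column type. DefensivePushdownSketch, buildLeaf and tryBuildLeaf are hypothetical names and do not reproduce the patch; an empty result stands for "no predicate pushed down, the upper layer evaluates the condition":

```java
import java.util.Optional;

// Hypothetical illustration, not the ORC or Hive sarg classes.
final class DefensivePushdownSketch {
    enum ColumnType { BOOLEAN, LONG, STRING }

    // Builds a textual leaf; rejects a (non-null) literal that does not fit the column type.
    static String buildLeaf(String column, ColumnType type, Object literal) {
        boolean fits;
        switch (type) {
            case BOOLEAN: fits = literal instanceof Boolean; break;
            case LONG:    fits = literal instanceof Long;    break;
            default:      fits = literal instanceof String;  break;
        }
        if (!fits) {
            throw new IllegalArgumentException("Wrong value class "
                + literal.getClass().getName() + " for " + type + ".LESS_THAN_EQUALS leaf");
        }
        return column + " <= " + literal;
    }

    // Defensive wrapper: a type mismatch disables pushdown instead of failing the query.
    static Optional<String> tryBuildLeaf(String column, ColumnType type, Object literal) {
        try {
            return Optional.of(buildLeaf(column, type, literal));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```

The design choice is that pushdown is an optimization, so losing it costs performance but not correctness.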

> Orc ppd throws exception if types don't match
> -
>
> Key: HIVE-12404
> URL: https://issues.apache.org/jira/browse/HIVE-12404
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12404.patch
>
>
> When the types of a constant value and a column don't match, Hive currently 
> throws an exception.
> {code}
> java.lang.IllegalArgumentException: Wrong value class java.lang.Integer for 
> BOOLEAN.LESS_THAN_EQUALS leaf
> at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.<init>(SearchArgumentImpl.java:63)
> at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$BuilderImpl.lessThanEquals(SearchArgumentImpl.java:304)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:277)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:326)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:386)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.addChildren(ConvertAstToSearchArg.java:331)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:370)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.addChildren(ConvertAstToSearchArg.java:331)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:366)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.<init>(ConvertAstToSearchArg.java:68)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.create(ConvertAstToSearchArg.java:417)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createFromConf(ConvertAstToSearchArg.java:436)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.<init>(OrcInputFormat.java:484)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1121)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1207)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:369)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:481)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:160)
> {code}





[jira] [Updated] (HIVE-12407) Check fetch property to determine if a SortLimit contains a limit operation

2015-11-13 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12407:
---
Attachment: HIVE-12407.patch

> Check fetch property to determine if a SortLimit contains a limit operation
> ---
>
> Key: HIVE-12407
> URL: https://issues.apache.org/jira/browse/HIVE-12407
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12407.patch
>
>
> Now that Calcite 1.5 went in, sometimes we end up with Sort and Limit 
> operations in the same operator. limitRelNode in HiveCalciteUtil should check 
> the fetch property of the SortLimit operator to determine if an operator is a 
> Limit.





[jira] [Updated] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11488:

Attachment: HIVE-11488.2.patch

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log the sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 
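
A minimal sketch of the per-thread inheritance described above, using the JDK's InheritableThreadLocal as a stand-in for the logging framework's MDC (an assumption made to keep the example dependency-free; it does not call log4j). SessionContextSketch and its methods are hypothetical names:

```java
// Hypothetical illustration of a per-thread session id that child threads inherit.
final class SessionContextSketch {
    private static final InheritableThreadLocal<String> SESSION_ID =
        new InheritableThreadLocal<>();

    // Set on the thread that handles the session.
    static void setSessionId(String id) {
        SESSION_ID.set(id);
    }

    // Prefixes a log message with the session id visible to the current thread.
    static String format(String message) {
        return "[" + SESSION_ID.get() + "] " + message;
    }

    // Formats the message on a newly created child thread to show the inherited value.
    static String formatInChildThread(String message) {
        String[] result = new String[1];
        Thread child = new Thread(() -> result[0] = format(message));
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

Note that the value is copied when the child thread is created, so threads that already exist in a pool would not pick up a session id set later.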





[jira] [Updated] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11488:

Attachment: (was: HIVE-11488.2.patch)

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log the sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 





[jira] [Updated] (HIVE-8396) HIVE CliDriver command splitting can be broken when comments are present

2015-11-13 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-8396:
--
Affects Version/s: 0.14.0

> HIVE CliDriver command splitting can be broken when comments are present
> 
>
> Key: HIVE-8396
> URL: https://issues.apache.org/jira/browse/HIVE-8396
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Sergey Shelukhin
>
> {noformat}
> -- SORT_QUERY_RESULTS
> set hive.cbo.enable=true;
> ... commands ...
> {noformat}
> causes
> {noformat}
> 2014-10-07 18:55:57,193 ERROR ql.Driver (SessionState.java:printError(825)) - 
> FAILED: ParseException line 2:4 missing KW_ROLE at 'hive' near 'hive'
> {noformat}
> If the comment is moved after the command it works.
> I noticed this earlier when I commented out parts of some random q file for 
> debugging purposes, and it started failing. This is annoying.





[jira] [Commented] (HIVE-12378) Exception on HBaseSerDe.serialize binary field

2015-11-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004314#comment-15004314
 ] 

Jimmy Xiang commented on HIVE-12378:


Binary field can't be null? What will happen with insert into table testhbaseb 
values(2, NULL)?

> Exception on HBaseSerDe.serialize binary field
> --
>
> Key: HIVE-12378
> URL: https://issues.apache.org/jira/browse/HIVE-12378
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler, Serializers/Deserializers
>Affects Versions: 1.0.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12378.1.patch
>
>
> An issue was reproduced with the binary typed HBase columns in Hive:
> It works fine as below:
> CREATE TABLE test9 (key int, val string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:val#b"
> );
> insert into test9 values(1,"hello");
> But when string type is changed to binary as:
> CREATE TABLE test2 (key int, val binary)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:val#b"
> );
> insert into table test2 values(1, 'hello');
> The following exception is thrown:
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"tmp_values_col1":"1","tmp_values_col2":"hello"}
> ...
> Caused by: java.lang.RuntimeException: Hive internal error.
> at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:322)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118)
> at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282)
> ... 16 more
> We should support hive binary type column for hbase.





[jira] [Updated] (HIVE-8396) HIVE CliDriver command splitting can be broken when comments are present

2015-11-13 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-8396:
--
Component/s: Query Processor
 Parser

> HIVE CliDriver command splitting can be broken when comments are present
> 
>
> Key: HIVE-8396
> URL: https://issues.apache.org/jira/browse/HIVE-8396
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Query Processor
>Affects Versions: 0.14.0
>Reporter: Sergey Shelukhin
>
> {noformat}
> -- SORT_QUERY_RESULTS
> set hive.cbo.enable=true;
> ... commands ...
> {noformat}
> causes
> {noformat}
> 2014-10-07 18:55:57,193 ERROR ql.Driver (SessionState.java:printError(825)) - 
> FAILED: ParseException line 2:4 missing KW_ROLE at 'hive' near 'hive'
> {noformat}
> If the comment is moved after the command it works.
> I noticed this earlier when I commented out parts of some random q file for 
> debugging purposes, and it started failing. This is annoying.





[jira] [Commented] (HIVE-12402) Split hive.root.logger separately to make it compatible with log4j1.x

2015-11-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004135#comment-15004135
 ] 

Ashutosh Chauhan commented on HIVE-12402:
-

Instead of a direct reference to log4j::Level, it's better to use the slf4j API 
to configure logging for log4j. That's the whole point of using a facade like 
slf4j.

> Split hive.root.logger separately to make it compatible with log4j1.x
> -
>
> Key: HIVE-12402
> URL: https://issues.apache.org/jira/browse/HIVE-12402
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12402.patch
>
>
> With the new Log4j2.x, specifying the logger name and log level together will 
> not work.
> With the old logger, the following works:
> --hiveconf hive.root.logger=DEBUG,console
> But with the new logger, we should specify the logger and level separately:
> --hiveconf hive.root.logger=console --hiveconf hive.log.level=DEBUG
> We can make this change internally for users still using the old configs.





[jira] [Commented] (HIVE-12378) Exception on HBaseSerDe.serialize binary field

2015-11-13 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004409#comment-15004409
 ] 

Yongzhi Chen commented on HIVE-12378:
-

Binary cannot be null. This is consistent with other data types for Hive HBase 
tables. For example, if I try insert into test9 values (5, NULL), where test9's 
second column is string (or the same on test1, whose second column is int), I 
get a similar exception:
{noformat}
URL:
  
http://ychencdh57-1.vpc.cloudera.com:8088/taskdetails.jsp?jobid=job_1447108763205_0022=task_1447108763205_0022_m_00
-
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row {"tmp_values_col1":"5","tmp_values_col2":null}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {"tmp_values_col1":"5","tmp_values_col2":null}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: java.lang.IllegalArgumentException: No columns to insert
at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1561)
at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.validatePut(BufferedMutatorImpl.java:147)
at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:134)
at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1105)
at 
org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:146)
at 
org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat$MyRecordWriter.write(HiveHBaseTableOutputFormat.java:117)
at 
org.apache.hadoop.hive.ql.io.HivePassThroughRecordWriter.write(HivePassThroughRecordWriter.java:40)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:695)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 9 more

{noformat}

The following code was added because I want LazyBioBinary to be consistent with 
LazyBinary. The LazyBinary.init method calls super.init(bytes, start, length), 
which is LazyObject.init and contains the same code as the following:
{noformat}
if (bytes == null) {
  throw new RuntimeException("bytes cannot be null!");
}
this.isNull = false;
{noformat}






> Exception on HBaseSerDe.serialize binary field
> --
>
> Key: HIVE-12378
> URL: https://issues.apache.org/jira/browse/HIVE-12378
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler, Serializers/Deserializers
>Affects Versions: 1.0.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12378.1.patch
>
>
> An issue was reproduced with the binary typed HBase columns in Hive:
> It works fine as below:
> CREATE TABLE test9 (key int, val string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:val#b"
> );
> insert into test9 values(1,"hello");
> But when string type is changed to binary as:
> CREATE TABLE test2 (key int, val binary)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:val#b"
> );
> insert into table test2 values(1, 'hello');
> The following exception is thrown:
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row 

[jira] [Commented] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004525#comment-15004525
 ] 

Szehon Ho commented on HIVE-11488:
--

OK, that makes sense. Maybe add a comment that it's for the HiveCLI case?

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log the sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 





[jira] [Commented] (HIVE-12319) Remove HadoopShims::getHadoopConfNames()

2015-11-13 Thread Aleksei Statkevich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004361#comment-15004361
 ] 

Aleksei Statkevich commented on HIVE-12319:
---

Before the change these config names with default values were already present 
in HiveConf. Do you think they should be removed from HiveConf as well?

> Remove HadoopShims::getHadoopConfNames()
> 
>
> Key: HIVE-12319
> URL: https://issues.apache.org/jira/browse/HIVE-12319
> Project: Hive
>  Issue Type: Improvement
>  Components: Shims
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12319.patch
>
>
> It was introduced in HIVE-6159. It has served its purpose now that we support 
> only the Hadoop 2.x line.





[jira] [Commented] (HIVE-12402) Split hive.root.logger separately to make it compatible with log4j1.x

2015-11-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004471#comment-15004471
 ] 

Sergey Shelukhin commented on HIVE-12402:
-

It doesn't look like it will correctly deal with hive.root.logger=INFO, which 
IIRC used to be legal. Otherwise looks good.

> Split hive.root.logger separately to make it compatible with log4j1.x
> -
>
> Key: HIVE-12402
> URL: https://issues.apache.org/jira/browse/HIVE-12402
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12402.patch
>
>
> With new Log4j2.x specifying logger name and log level together will not work.
> With old logger following will work
> --hiveconf hive.root.logger=DEBUG,console
> But with new logger we should specify logger and level separately
> --hiveconf hive.root.logger=console --hiveconf hive.log.level=DEBUG
> We can do this change internally for users still using the old configs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004528#comment-15004528
 ] 

Aihua Xu commented on HIVE-11488:
-

Yeah. Good idea. We can remove that when we really remove HiveCLI.

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004497#comment-15004497
 ] 

Aihua Xu commented on HIVE-11488:
-

1. It will log sessionId and queryId for all operations since I added it to 
the Operation base class.
2. Yeah. I didn't know that property. That would be the right place to add. 
Will update that.
3. You are right. We don't need that any more. I will remove that.


> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004524#comment-15004524
 ] 

Aihua Xu commented on HIVE-11488:
-

Actually, during compile() we may still need to keep the queryId generation 
code: if you use Hive CLI, you don't go through Session, so no queryId would be 
created there. Although Hive CLI is deprecated, it is better to keep it for now.

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004533#comment-15004533
 ] 

Szehon Ho commented on HIVE-11488:
--

OK thanks.

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12319) Remove HadoopShims::getHadoopConfNames()

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004609#comment-15004609
 ] 

Hive QA commented on HIVE-12319:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772126/HIVE-12319.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9745 tests executed
*Failed tests:*
{noformat}
TestMiniLlapCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6029/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6029/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6029/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772126 - PreCommit-HIVE-TRUNK-Build

> Remove HadoopShims::getHadoopConfNames()
> 
>
> Key: HIVE-12319
> URL: https://issues.apache.org/jira/browse/HIVE-12319
> Project: Hive
>  Issue Type: Improvement
>  Components: Shims
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12319.patch
>
>
> It was introduced in HIVE-6159. It has served its purpose now that we support 
> only the Hadoop 2.x line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004364#comment-15004364
 ] 

Szehon Ho commented on HIVE-11488:
--

Hi Aihua, this looks very useful for supportability.

Couple of comments/questions:  

1. Should we log sessionId for more operations than just SQLOperation?  Should 
we log it on base Operation?  I understand maybe only SQLOperation has queryId.
2. Instead of adding hive.query.id manually to the whitelist in the tests, can 
we just add it to the default, like HiveConf.sqlStdAuthSafeVarNames?
3. Is there any need now to even generate queryId in the compile phase, if 
you are generating it at the beginning of the session?

Thanks

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-1841) datanucleus.fixedDatastore should be true in hive-default.xml

2015-11-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004426#comment-15004426
 ] 

Sushanth Sowmyan commented on HIVE-1841:


+1.

While it is important that users be able to quickly try out hive, SchemaTool 
helps bridge that gap. I think it's a good time to disable autocreate schema 
and enable fixedDatastore.

>  datanucleus.fixedDatastore should be true in hive-default.xml
> --
>
> Key: HIVE-1841
> URL: https://issues.apache.org/jira/browse/HIVE-1841
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Metastore
>Affects Versions: 0.6.0
>Reporter: Edward Capriolo
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-1841.1.patch.txt, HIVE-1841.patch
>
>
> Two datanucleus variables:
> {noformat}
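> <property>
>  <name>datanucleus.autoCreateSchema</name>
>  <value>false</value>
> </property>
> <property>
>  <name>datanucleus.fixedDatastore</name>
>  <value>true</value>
> </property>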
> {noformat}
> are dangerous.  We do want the schema to auto-create itself, but we do not 
> want the schema to auto update itself. 
> Someone might accidentally point a trunk at the wrong meta-store and 
> unknowingly update. I believe we should set this to false and possibly trap 
> exceptions stemming from hive wanting to do any update. This way someone has 
> to actively acknowledge the update, by setting this to true and then starting 
> up hive, or leaving it false, removing schema-modify privileges for the user 
> that hive uses, and doing the updates by hand. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12402) Split hive.root.logger separately to make it compatible with log4j1.x

2015-11-13 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004560#comment-15004560
 ] 

Prasanth Jayachandran commented on HIVE-12402:
--

I tried that first but unfortunately there is no Level in slf4j. It just 
provides the API/SPI, and Level seems to be implementation-specific.

> Split hive.root.logger separately to make it compatible with log4j1.x
> -
>
> Key: HIVE-12402
> URL: https://issues.apache.org/jira/browse/HIVE-12402
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12402.patch
>
>
> With new Log4j2.x specifying logger name and log level together will not work.
> With old logger following will work
> --hiveconf hive.root.logger=DEBUG,console
> But with new logger we should specify logger and level separately
> --hiveconf hive.root.logger=console --hiveconf hive.log.level=DEBUG
> We can do this change internally for users still using the old configs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11488:

Attachment: HIVE-11488.3.patch

Attach the new patch addressing the comments.

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.3.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12319) Remove HadoopShims::getHadoopConfNames()

2015-11-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1500#comment-1500
 ] 

Ashutosh Chauhan commented on HIVE-12319:
-

Yes.
E.g., we should directly reference FileInputFormat.INPUT_DIR wherever we use 
those names inside Hive. Having a property with the same name redefined in 
Hive is confusing.

> Remove HadoopShims::getHadoopConfNames()
> 
>
> Key: HIVE-12319
> URL: https://issues.apache.org/jira/browse/HIVE-12319
> Project: Hive
>  Issue Type: Improvement
>  Components: Shims
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12319.patch
>
>
> It was introduced in HIVE-6159. It has served its purpose now that we support 
> only the Hadoop 2.x line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11775) Implement limit push down through union all in CBO

2015-11-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11775:
---
Attachment: HIVE-11775.02.patch

> Implement limit push down through union all in CBO
> --
>
> Key: HIVE-11775
> URL: https://issues.apache.org/jira/browse/HIVE-11775
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11775.01.patch, HIVE-11775.02.patch
>
>
> Enlightened by HIVE-11684 (Kudos to [~jcamachorodriguez]), we can actually 
> push limit down through union all, which reduces the intermediate number of 
> rows in union branches. 
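
As a rough illustration of the idea (not the CBO rule itself), the sketch below uses in-memory lists as stand-ins for the union branches and shows that limiting each branch before the union, then limiting again, returns the same rows as limiting only after the union. The class and method names are hypothetical, and the fixed list order is an assumption; a SQL UNION ALL without ORDER BY guarantees no particular order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical demo; lists stand in for the union branches. */
final class LimitPushdownDemo {
    /** LIMIT n applied only on top of the union. */
    static List<Integer> limitAfterUnion(List<Integer> a, List<Integer> b, int n) {
        return Stream.concat(a.stream(), b.stream())
            .limit(n)
            .collect(Collectors.toList());
    }

    /** LIMIT n pushed into each branch, and kept on top of the union as well. */
    static List<Integer> limitPushedDown(List<Integer> a, List<Integer> b, int n) {
        return Stream.concat(a.stream().limit(n), b.stream().limit(n))
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> b = Arrays.asList(10, 20, 30);
        System.out.println(limitAfterUnion(a, b, 3));
        System.out.println(limitPushedDown(a, b, 3));
    }
}
```

Each branch feeds at most n rows into the union, which is the reduction in intermediate rows the description refers to; the outer limit is still needed because the branches together may produce up to 2n rows.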



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11718) JDBC ResultSet.setFetchSize(0) returns no results

2015-11-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11718:

Fix Version/s: 1.3.0

> JDBC ResultSet.setFetchSize(0) returns no results
> -
>
> Key: HIVE-11718
> URL: https://issues.apache.org/jira/browse/HIVE-11718
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.1
>Reporter: Son Nguyen
>Assignee: Aleksei Statkevich
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11718-branch-1.patch, HIVE-11718.patch
>
>
> Hi,
> According to the JDBC documentation, the driver should ignore 
> setFetchSize(0), but the Hive JDBC driver returns no results.
> Our product uses setFetchSize to fine-tune performance; sometimes we would 
> like to call setFetchSize(0) and leave it up to the driver to make its best 
> guess of the fetch size.
> Thanks
> Son Nguyen
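
For reference, a minimal sketch of the behavior the reporter expects, where a fetch size of 0 is treated as "no hint" and the current value is kept. This is not the code from the attached patch; the class FetchSizeHint and its method are hypothetical, and IllegalArgumentException stands in for the SQLException a real driver would throw for a negative value.

```java
/** Hypothetical helper; not the actual Hive JDBC driver code. */
final class FetchSizeHint {
    /**
     * Resolves the fetch size to use after a setFetchSize(requested) call.
     * A value of 0 means the hint is ignored and the current size is kept.
     */
    static int resolve(int requested, int current) {
        if (requested < 0) {
            // A real driver would raise SQLException here.
            throw new IllegalArgumentException("Fetch size must be >= 0");
        }
        return requested == 0 ? current : requested;
    }

    public static void main(String[] args) {
        System.out.println(resolve(0, 50));
        System.out.println(resolve(200, 50));
    }
}
```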



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11718) JDBC ResultSet.setFetchSize(0) returns no results

2015-11-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004510#comment-15004510
 ] 

Sergey Shelukhin commented on HIVE-11718:
-

Committed to branch-1

> JDBC ResultSet.setFetchSize(0) returns no results
> -
>
> Key: HIVE-11718
> URL: https://issues.apache.org/jira/browse/HIVE-11718
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.1
>Reporter: Son Nguyen
>Assignee: Aleksei Statkevich
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11718-branch-1.patch, HIVE-11718.patch
>
>
> Hi,
> According to the JDBC documentation, the driver should ignore 
> setFetchSize(0), but the Hive JDBC driver returns no results.
> Our product uses setFetchSize to fine-tune performance; sometimes we would 
> like to call setFetchSize(0) and leave it up to the driver to make its best 
> guess of the fetch size.
> Thanks
> Son Nguyen



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004526#comment-15004526
 ] 

Aihua Xu commented on HIVE-11488:
-

queryId actually applies to all operations. Explicit registration is needed 
because we are executing SQLOperation asynchronously in another thread (from 
the thread pool). In that case, you need to register/unregister in that thread.
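
A minimal sketch of why the explicit registration matters, assuming a plain ThreadLocal as a stand-in for the per-thread logging context (this is not the log4j MDC API and not code from the patch; all names are hypothetical). A value set on the submitting thread is not visible in a pooled worker thread unless the task registers it there and removes it afterwards.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo of per-thread context with a thread pool. */
final class QueryContextDemo {
    // Stand-in for a per-thread logging context.
    static final ThreadLocal<String> QUERY_ID = new ThreadLocal<>();

    /** Wraps a task so it registers the queryId in the worker thread and unregisters it afterwards. */
    static <T> Callable<T> withQueryId(String queryId, Callable<T> task) {
        return () -> {
            QUERY_ID.set(queryId);
            try {
                return task.call();
            } finally {
                QUERY_ID.remove(); // pooled threads are reused, so clean up
            }
        };
    }

    /** Returns what the worker thread sees {without registration, with registration}. */
    static String[] run(String queryId) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        QUERY_ID.set(queryId); // set on the submitting thread only
        try {
            Callable<String> readContext = () -> QUERY_ID.get();
            String unregistered = pool.submit(readContext).get();
            String registered = pool.submit(withQueryId(queryId, readContext)).get();
            return new String[] {unregistered, registered};
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            QUERY_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = run("query-1");
        System.out.println("without registration: " + seen[0]);
        System.out.println("with registration: " + seen[1]);
    }
}
```

Whether a real logging context is inherited by new threads depends on the logging library and its configuration, so treat the plain ThreadLocal here as the conservative case.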

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.patch
>
>
> Session is critical for a multi-user system like Hive. Currently Hive doesn't 
> log sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Currently, Hive already has the sessionId saved in SessionState and also 
> there is another sessionId in SessionHandle (Seems not used and I'm still 
> looking to understand it). Generally we should have one sessionId from the 
> beginning in the client side and server side. Seems we have some work on that 
> side first.
> The sessionId then can be added to log4j supported mapped diagnostic context 
> (MDC) and can be configured to output to log file through the log4j property. 
> MDC is per thread, so we need to add sessionId to the HS2 main thread and 
> then it will be inherited by the child threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-1841) datanucleus.fixedDatastore should be true in hive-default.xml

2015-11-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1841:
---
Attachment: HIVE-1841.2.patch

Fixed test failures.

>  datanucleus.fixedDatastore should be true in hive-default.xml
> --
>
> Key: HIVE-1841
> URL: https://issues.apache.org/jira/browse/HIVE-1841
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Metastore
>Affects Versions: 0.6.0
>Reporter: Edward Capriolo
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-1841.1.patch.txt, HIVE-1841.2.patch, HIVE-1841.patch
>
>
> Two datanucleus variables:
> {noformat}
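> <property>
>  <name>datanucleus.autoCreateSchema</name>
>  <value>false</value>
> </property>
> <property>
>  <name>datanucleus.fixedDatastore</name>
>  <value>true</value>
> </property>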
> {noformat}
> are dangerous.  We do want the schema to auto-create itself, but we do not 
> want the schema to auto update itself. 
> Someone might accidentally point a trunk at the wrong meta-store and 
> unknowingly update. I believe we should set this to false and possibly trap 
> exceptions stemming from hive wanting to do any update. This way someone has 
> to actively acknowledge the update, by setting this to true and then starting 
> up hive, or leaving it false, removing schema-modify privileges for the user 
> that hive uses, and doing the updates by hand. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-11-13 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11684:
---
Attachment: HIVE-11684.19.patch

HIVE-11684.19.patch addresses [~jpullokkaran]'s comments.

I also regenerated a few q files. Although the optimization is disabled by 
default, the changes in the q files are due to the additions to the metadata 
providers (HiveRelMdSelectivity and HiveRelMdRowCount) concerning left/right 
outer joins.

> Implement limit pushdown through outer join in CBO
> --
>
> Key: HIVE-11684
> URL: https://issues.apache.org/jira/browse/HIVE-11684
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11684.01.patch, HIVE-11684.02.patch, 
> HIVE-11684.03.patch, HIVE-11684.04.patch, HIVE-11684.05.patch, 
> HIVE-11684.07.patch, HIVE-11684.08.patch, HIVE-11684.09.patch, 
> HIVE-11684.10.patch, HIVE-11684.11.patch, HIVE-11684.12.patch, 
> HIVE-11684.12.patch, HIVE-11684.14.patch, HIVE-11684.15.patch, 
> HIVE-11684.16.patch, HIVE-11684.17.patch, HIVE-11684.18.patch, 
> HIVE-11684.19.patch, HIVE-11684.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12405) Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()

2015-11-13 Thread Aleksei Statkevich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004365#comment-15004365
 ] 

Aleksei Statkevich commented on HIVE-12405:
---

I already checked. In branch-1 splits are not sorted, so there's no comparator 
and no such bug as a result.

> Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()
> ---
>
> Key: HIVE-12405
> URL: https://issues.apache.org/jira/browse/HIVE-12405
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aleksei Statkevich
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12405.patch
>
>
> "compare()" method in HiveSplitGenerator.InputSplitComparator has the 
> following condition on line 281 which is always false and is most likely a 
> typo:
> {code}
> if (startPos1 > startPos1) {
> {code}
> As a result, in certain conditions splits might be sorted in incorrect order.
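
For illustration, a minimal sketch of a comparator that avoids this kind of self-comparison typo by delegating to Long.compare. The Split class and the path-then-offset ordering are assumptions made for the example; the real InputSplitComparator may compare other fields, and this is not the code from the attached patch.

```java
import java.util.Comparator;

/** Hypothetical demo; not the HiveSplitGenerator code. */
final class SplitOrderDemo {
    /** Stand-in for a file split: a path plus a start offset. */
    static final class Split {
        final String path;
        final long start;

        Split(String path, long start) {
            this.path = path;
            this.start = start;
        }
    }

    /** Orders by path first, then by start offset of the two different splits. */
    static final Comparator<Split> BY_PATH_THEN_START = (s1, s2) -> {
        int byPath = s1.path.compareTo(s2.path);
        if (byPath != 0) {
            return byPath;
        }
        // Compares s1 against s2, never a value against itself.
        return Long.compare(s1.start, s2.start);
    };
}
```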



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12402) Split hive.root.logger separately to make it compatible with log4j1.x

2015-11-13 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004645#comment-15004645
 ] 

Prasanth Jayachandran commented on HIVE-12402:
--

I tried branch-1 with just hive.root.logger=DEBUG but it doesn't seem to be 
doing anything. No debug logs in console or hive.log location. It accepts it 
but doesn't log anywhere. IMO, we shouldn't encourage that usage. 

> Split hive.root.logger separately to make it compatible with log4j1.x
> -
>
> Key: HIVE-12402
> URL: https://issues.apache.org/jira/browse/HIVE-12402
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12402.patch
>
>
> With new Log4j2.x specifying logger name and log level together will not work.
> With old logger following will work
> --hiveconf hive.root.logger=DEBUG,console
> But with new logger we should specify logger and level separately
> --hiveconf hive.root.logger=console --hiveconf hive.log.level=DEBUG
> We can do this change internally for users still using the old configs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12175) Upgrade Kryo version to 3.0.x

2015-11-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12175:
-
Attachment: HIVE-12175.3.patch

Rebased .3 after slf4j commit

> Upgrade Kryo version to 3.0.x
> -
>
> Key: HIVE-12175
> URL: https://issues.apache.org/jira/browse/HIVE-12175
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12175.1.patch, HIVE-12175.2.patch, 
> HIVE-12175.3.patch, HIVE-12175.3.patch
>
>
> Current version of kryo (2.22) has some issue (refer exception below and in 
> HIVE-12174) with serializing ArrayLists generated using Arrays.asList(). We 
> need to either replace all occurrences of  Arrays.asList() or change the 
> current StdInstantiatorStrategy. This issue is fixed in later versions and 
> kryo community recommends using DefaultInstantiatorStrategy with fallback to 
> StdInstantiatorStrategy. More discussion about this issue is here 
> https://github.com/EsotericSoftware/kryo/issues/216. Alternatively, custom 
> serialization/deserialization class can be provided for Arrays.asList.
> Also, kryo 3.0 introduced unsafe based serialization which claims to have 
> much better performance for certain types of serialization. 
> Exception:
> {code}
> Caused by: java.lang.NullPointerException
>   at java.util.Arrays$ArrayList.size(Arrays.java:2847)
>   at java.util.AbstractList.add(AbstractList.java:108)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   ... 57 more
> {code}
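
As a small illustration of the first option mentioned above (replacing Arrays.asList() results), the sketch below shows that Arrays.asList() returns the private java.util.Arrays$ArrayList class seen in the stack trace, and that copying into a java.util.ArrayList gives an ordinary growable list. It uses only the JDK, not Kryo; the class AsListCopyDemo and its helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo; JDK only, no Kryo involved. */
final class AsListCopyDemo {
    /** Copies any list into a plain java.util.ArrayList. */
    static <T> List<T> toPlainArrayList(List<T> list) {
        return new ArrayList<>(list);
    }

    public static void main(String[] args) {
        List<String> fixedSize = Arrays.asList("a", "b");
        List<String> copy = toPlainArrayList(fixedSize);
        // Prints the runtime classes of the two lists.
        System.out.println(fixedSize.getClass().getName());
        System.out.println(copy.getClass().getName());
        copy.add("c"); // works on the copy; add() is unsupported on the asList() view
    }
}
```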



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12411) Remove counter based stats collection mechanism

2015-11-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12411:
---
Attachment: HIVE-12411.01.patch

> Remove counter based stats collection mechanism
> ---
>
> Key: HIVE-12411
> URL: https://issues.apache.org/jira/browse/HIVE-12411
> Project: Hive
>  Issue Type: Task
>  Components: Statistics
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12411.01.patch
>
>
> Following HIVE-12005 and HIVE-12164, we have removed the jdbc and hbase stats 
> collection mechanisms. Now we are targeting the counter based stats collection 
> mechanism. The main reasons are as follows: (1) counter based stats have a 
> limitation on the length of the counter itself; if it is too long, MD5 will 
> be applied. (2) When there are a large number of partitions and columns, we 
> need to create a large number of counters in memory. This will put a heavy 
> load on the M/R AM or Tez AM etc. FS based stats will do a better job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11120) Generic interface for file format validation

2015-11-13 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004696#comment-15004696
 ] 

Prasanth Jayachandran commented on HIVE-11120:
--

The test failures are unrelated to this patch. It's happening for other patches 
as well.

> Generic interface for file format validation
> 
>
> Key: HIVE-11120
> URL: https://issues.apache.org/jira/browse/HIVE-11120
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11120-branch-1.patch, HIVE-11120.2.patch, 
> HIVE-11120.3.patch, HIVE-11120.4.patch, HIVE-11120.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-8?focusedCommentId=14602302=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14602302
> We need a generic interface to verify whether a specified file is in a valid 
> format, so that the LOAD DATA statement can perform a sanity check before 
> copying the file to its destination.





[jira] [Commented] (HIVE-11120) Generic interface for file format validation

2015-11-13 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004698#comment-15004698
 ] 

Prasanth Jayachandran commented on HIVE-11120:
--

Committed to branch-1 and master. Thanks [~xuefuz] for the review!

> Generic interface for file format validation
> 
>
> Key: HIVE-11120
> URL: https://issues.apache.org/jira/browse/HIVE-11120
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11120-branch-1.patch, HIVE-11120.2.patch, 
> HIVE-11120.3.patch, HIVE-11120.4.patch, HIVE-11120.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-8?focusedCommentId=14602302=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14602302
> We need a generic interface to verify whether a specified file is in a valid 
> format, so that the LOAD DATA statement can perform a sanity check before 
> copying the file to its destination.





[jira] [Comment Edited] (HIVE-12175) Upgrade Kryo version to 3.0.x

2015-11-13 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004656#comment-15004656
 ] 

Prasanth Jayachandran edited comment on HIVE-12175 at 11/13/15 8:23 PM:


Rebased .3 after slf4j commit


was (Author: prasanth_j):
Rebased .3 after sl4j commit

> Upgrade Kryo version to 3.0.x
> -
>
> Key: HIVE-12175
> URL: https://issues.apache.org/jira/browse/HIVE-12175
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12175.1.patch, HIVE-12175.2.patch, 
> HIVE-12175.3.patch, HIVE-12175.3.patch
>
>
> The current version of Kryo (2.22) has an issue (see the exception below and 
> HIVE-12174) with serializing lists created by Arrays.asList(). We need to 
> either replace all occurrences of Arrays.asList() or change the current 
> StdInstantiatorStrategy. This issue is fixed in later versions, and the Kryo 
> community recommends using DefaultInstantiatorStrategy with a fallback to 
> StdInstantiatorStrategy. More discussion about this issue is here: 
> https://github.com/EsotericSoftware/kryo/issues/216. Alternatively, a custom 
> serialization/deserialization class can be provided for Arrays.asList.
> Also, Kryo 3.0 introduced unsafe-based serialization, which claims to have 
> much better performance for certain types of serialization.
> Exception:
> {code}
> Caused by: java.lang.NullPointerException
>   at java.util.Arrays$ArrayList.size(Arrays.java:2847)
>   at java.util.AbstractList.add(AbstractList.java:108)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   ... 57 more
> {code}
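
The first workaround mentioned in the description (replacing Arrays.asList()) can be sketched with the standard library alone. Arrays.asList() returns a private nested class, java.util.Arrays$ArrayList, while copying into java.util.ArrayList yields an ordinary, growable list. The class and method names below are hypothetical, and the sketch deliberately does not call any Kryo API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AsListCopySketch {
    /** Copies the fixed-size Arrays.asList view into a plain java.util.ArrayList. */
    @SafeVarargs
    static <T> List<T> toPlainList(T... items) {
        return new ArrayList<>(Arrays.asList(items));
    }
}
```

The trade-off is an extra copy at every call site, which is presumably why the description also considers changing the instantiator strategy instead.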





[jira] [Updated] (HIVE-11120) Generic interface for file format validation

2015-11-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11120:
-
Attachment: HIVE-11120-branch-1.patch

Attaching branch-1 patch

> Generic interface for file format validation
> 
>
> Key: HIVE-11120
> URL: https://issues.apache.org/jira/browse/HIVE-11120
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11120-branch-1.patch, HIVE-11120.2.patch, 
> HIVE-11120.3.patch, HIVE-11120.4.patch, HIVE-11120.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-8?focusedCommentId=14602302=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14602302
> We need a generic interface to verify whether a specified file is in a valid 
> format, so that the LOAD DATA statement can perform a sanity check before 
> copying the file to its destination.





[jira] [Commented] (HIVE-12405) Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004746#comment-15004746
 ] 

Hive QA commented on HIVE-12405:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772134/HIVE-12405.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9781 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6030/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6030/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6030/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772134 - PreCommit-HIVE-TRUNK-Build

> Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()
> ---
>
> Key: HIVE-12405
> URL: https://issues.apache.org/jira/browse/HIVE-12405
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aleksei Statkevich
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12405.patch
>
>
> "compare()" method in HiveSplitGenerator.InputSplitComparator has the 
> following condition on line 281 which is always false and is most likely a 
> typo:
> {code}
> if (startPos1 > startPos1) {
> {code}
> As a result, in certain conditions splits might be sorted in incorrect order.
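
For illustration, here is a minimal sketch of a comparator that avoids this class of typo by delegating to Long.compare rather than hand-written relational checks. The Split class and the "path first, then start offset" ordering are assumptions made for the example, not the actual HiveSplitGenerator code.

```java
import java.util.Comparator;

final class SplitStartComparatorSketch {
    /** Hypothetical stand-in for a file split: a path plus a start offset. */
    static final class Split {
        final String path;
        final long start;

        Split(String path, long start) {
            this.path = path;
            this.start = start;
        }
    }

    /** Orders by path, then by start offset; both operands appear in every comparison. */
    static final Comparator<Split> BY_PATH_THEN_START = (a, b) -> {
        int byPath = a.path.compareTo(b.path);
        if (byPath != 0) {
            return byPath;
        }
        return Long.compare(a.start, b.start);
    };
}
```

A condition like startPos1 > startPos1 compares a value with itself, so it can never be true; Long.compare(a.start, b.start) makes it harder to lose track of which operand is which.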





[jira] [Commented] (HIVE-12378) Exception on HBaseSerDe.serialize binary field

2015-11-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004801#comment-15004801
 ] 

Jimmy Xiang commented on HIVE-12378:


Cool. Thanks for the explanation. +1

> Exception on HBaseSerDe.serialize binary field
> --
>
> Key: HIVE-12378
> URL: https://issues.apache.org/jira/browse/HIVE-12378
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler, Serializers/Deserializers
>Affects Versions: 1.0.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12378.1.patch
>
>
> An issue was reproduced with the binary typed HBase columns in Hive:
> It works fine as below:
> CREATE TABLE test9 (key int, val string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:val#b"
> );
> insert into test9 values(1,"hello");
> But when string type is changed to binary as:
> CREATE TABLE test2 (key int, val binary)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:val#b"
> );
> insert into table test2 values(1, 'hello');
> The following exception is thrown:
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"tmp_values_col1":"1","tmp_values_col2":"hello"}
> ...
> Caused by: java.lang.RuntimeException: Hive internal error.
> at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:322)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118)
> at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282)
> ... 16 more
> We should support hive binary type column for hbase.





[jira] [Commented] (HIVE-12402) Split hive.root.logger separately to make it compatible with log4j1.x

2015-11-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004810#comment-15004810
 ] 

Sergey Shelukhin commented on HIVE-12402:
-

+1

> Split hive.root.logger separately to make it compatible with log4j1.x
> -
>
> Key: HIVE-12402
> URL: https://issues.apache.org/jira/browse/HIVE-12402
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12402.patch
>
>
> With the new Log4j 2.x, specifying the logger name and log level together 
> does not work.
> With the old logger, the following works:
> --hiveconf hive.root.logger=DEBUG,console
> With the new logger, the logger and level must be specified separately:
> --hiveconf hive.root.logger=console --hiveconf hive.log.level=DEBUG
> We can do this conversion internally for users still using the old configs.
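
The internal conversion mentioned in the last sentence could look roughly like the sketch below, which splits an old-style "LEVEL,logger" value into its two parts. The class and method names are hypothetical, and this is not the code from the attached patch.

```java
final class RootLoggerSplitSketch {
    /**
     * Splits an old-style value such as "DEBUG,console".
     * Returns {loggerName, level}; level is null when the value contains no comma.
     */
    static String[] split(String rootLogger) {
        int comma = rootLogger.indexOf(',');
        if (comma < 0) {
            // New-style value: only the logger name is present.
            return new String[] { rootLogger.trim(), null };
        }
        String level = rootLogger.substring(0, comma).trim();
        String logger = rootLogger.substring(comma + 1).trim();
        return new String[] { logger, level };
    }
}
```

With this, "DEBUG,console" maps to logger "console" and level "DEBUG", while a bare "console" passes through with no level so a separately configured hive.log.level can apply.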





[jira] [Updated] (HIVE-12341) LLAP: add security to daemon protocol endpoint (excluding shuffle)

2015-11-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12341:

Attachment: HIVE-12341.patch

With this patch, I can run a query not involving a shuffle successfully; I can 
also run a query with shuffle but it cannot write to a local FS, and after 
chmod 777 it cannot run because my cluster doesn't have native Hadoop set up, 
so I didn't bother with that much.
Sid told me that currently LLAP shuffle doesn't use tokens at all, so it does 
appear to get to the point where the request is being processed. I assume 
shuffle security would be added there and other shuffle issues resolved, in  
HIVE-12397.

[~gopalv] [~sseth] can you review?

> LLAP: add security to daemon protocol endpoint (excluding shuffle)
> --
>
> Key: HIVE-12341
> URL: https://issues.apache.org/jira/browse/HIVE-12341
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12341.WIP.nogen.patch, HIVE-12341.WIP.patch, 
> HIVE-12341.patch
>
>






[jira] [Comment Edited] (HIVE-12341) LLAP: add security to daemon protocol endpoint (excluding shuffle)

2015-11-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004875#comment-15004875
 ] 

Sergey Shelukhin edited comment on HIVE-12341 at 11/13/15 11:12 PM:


With this patch, I can run a query not involving a shuffle successfully; I can 
also run a query with shuffle but it cannot write to a local FS, and after 
chmod 777 it cannot run because my cluster doesn't have native Hadoop set up, 
so I didn't bother with that much.
Sid told me that currently LLAP shuffle doesn't use tokens at all, so it does 
appear to get to the point where the request is being processed. I assume 
shuffle security would be added there and other shuffle issues resolved, in  
HIVE-12397.

[~gopalv] [~sseth] can you review? RB w/o the generated code at 
https://reviews.apache.org/r/40315/


was (Author: sershe):
With this patch, I can run a query not involving a shuffle successfully; I can 
also run a query with shuffle but it cannot write to a local FS, and after 
chmod 777 it cannot run because my cluster doesn't have native Hadoop set up, 
so I didn't bother with that much.
Sid told me that currently LLAP shuffle doesn't use tokens at all, so it does 
appear to get to the point where the request is being processed. I assume 
shuffle security would be added there and other shuffle issues resolved, in  
HIVE-12397.

[~gopalv] [~sseth] can you review?

> LLAP: add security to daemon protocol endpoint (excluding shuffle)
> --
>
> Key: HIVE-12341
> URL: https://issues.apache.org/jira/browse/HIVE-12341
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12341.WIP.nogen.patch, HIVE-12341.WIP.patch, 
> HIVE-12341.patch
>
>






[jira] [Commented] (HIVE-11488) Add sessionId and queryId info to HS2 log

2015-11-13 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004620#comment-15004620
 ] 

Szehon Ho commented on HIVE-11488:
--

Looks ok to me, +1

> Add sessionId and queryId info to HS2 log
> -
>
> Key: HIVE-11488
> URL: https://issues.apache.org/jira/browse/HIVE-11488
> Project: Hive
>  Issue Type: New Feature
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11488.2.patch, HIVE-11488.3.patch, HIVE-11488.patch
>
>
> Sessions are critical for a multi-user system like Hive. Currently Hive doesn't 
> log the sessionId to the log file, which sometimes makes debugging and analysis 
> difficult when multiple activities are going on at the same time and the logs 
> from different sessions are mixed together.
> Hive already has the sessionId saved in SessionState, and there is another 
> sessionId in SessionHandle (it seems unused, and I'm still looking to 
> understand it). Generally we should have one sessionId from the beginning on 
> both the client side and the server side. It seems we have some work to do on 
> that side first.
> The sessionId can then be added to log4j's mapped diagnostic context (MDC) 
> and configured for output to the log file through the log4j properties. 
> MDC is per thread, so we need to add the sessionId to the HS2 main thread, and 
> then it will be inherited by the child threads.
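
The per-thread inheritance described above can be illustrated with the standard library's InheritableThreadLocal, used here purely as a stand-in for a logging framework's MDC. The class and field names are hypothetical, and no log4j API is called.

```java
final class SessionContextSketch {
    // Stand-in for an MDC entry: a child thread receives the parent's value when it is created.
    static final InheritableThreadLocal<String> SESSION_ID = new InheritableThreadLocal<>();

    /** Starts a child thread and returns the session id that the child observed. */
    static String readInChildThread() {
        final String[] seen = new String[1];
        Thread child = new Thread(() -> seen[0] = SESSION_ID.get());
        child.start();
        try {
            child.join(); // join() makes the child's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
        return seen[0];
    }
}
```

Note that inheritance happens when a thread is created. Threads reused from a pool keep whatever value they inherited at creation, so pooled workers would need the id set explicitly for each task.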





[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004738#comment-15004738
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

It looks like many q.out files changed due to "Offset" being added. In any 
case, I think there's no need to output the offset 0, because that is the 
implied default for LIMIT. If you return null from the method, the explain 
setting will not be output.

The rest looks good to me, except for some spacing inconsistent with the 
surrounding file (e.g. ==2 w/o spaces when processing TOK_LIMIT, etc).

[~jcamachorodriguez] can you take a look at Calcite related changes, would this 
work with Calcite?



> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance
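
As a reference for the intended semantics, MySQL-style LIMIT X,Y skips the first X rows and then returns at most Y rows. The sketch below expresses that on an in-memory list; the class and method names are hypothetical, and it says nothing about how Hive would implement the operator.

```java
import java.util.List;
import java.util.stream.Collectors;

final class OffsetLimitSketch {
    /** Returns at most `limit` rows after skipping the first `offset` rows. */
    static <T> List<T> page(List<T> rows, long offset, long limit) {
        return rows.stream()
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

For example, page(rows, 20, 10) corresponds to LIMIT 20,10, which is the third page when pages hold ten rows each.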





[jira] [Updated] (HIVE-11358) LLAP: move LlapConfiguration into HiveConf

2015-11-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11358:

Assignee: Sergey Shelukhin
Target Version/s: 2.0.0

> LLAP: move LlapConfiguration into HiveConf
> --
>
> Key: HIVE-11358
> URL: https://issues.apache.org/jira/browse/HIVE-11358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> Hive uses HiveConf for configuration. LlapConfiguration should be replaced 
> with parameters in HiveConf





[jira] [Updated] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-11-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11981:

Attachment: HIVE-11981.0991.patch

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, ORC Schema Evolution Issues.docx
>
>
> High-priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns, and other kinds of schema 
> evolution were not pursued due to lack of importance and lack of time. Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns to ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).





[jira] [Updated] (HIVE-11358) LLAP: move LlapConfiguration into HiveConf

2015-11-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11358:

Attachment: HIVE-11358.patch

Patch to move the parameters. [~sseth] can you review?

> LLAP: move LlapConfiguration into HiveConf
> --
>
> Key: HIVE-11358
> URL: https://issues.apache.org/jira/browse/HIVE-11358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11358.patch
>
>
> Hive uses HiveConf for configuration. LlapConfiguration should be replaced 
> with parameters in HiveConf





[jira] [Updated] (HIVE-12271) Add metrics around HS2 query execution and job submission for Hive

2015-11-13 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-12271:
-
Attachment: HIVE-12271.patch

Attaching first cut.

Rb: [https://reviews.apache.org/r/40318/|https://reviews.apache.org/r/40318/]

> Add metrics around HS2 query execution and job submission for Hive 
> ---
>
> Key: HIVE-12271
> URL: https://issues.apache.org/jira/browse/HIVE-12271
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Szehon Ho
> Attachments: HIVE-12271.patch
>
>
> We should add more metrics around query execution. Specifically:
> * Number of in-use worker threads
> * Number of in-use async threads
> * Number of queries waiting for compilation
> * Stats for query planning / compilation time
> * Stats for total job submission time
> * Others?





[jira] [Commented] (HIVE-12407) Check fetch property to determine if a SortLimit contains a limit operation

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005141#comment-15005141
 ] 

Hive QA commented on HIVE-12407:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772220/HIVE-12407.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9768 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6032/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6032/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6032/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772220 - PreCommit-HIVE-TRUNK-Build

> Check fetch property to determine if a SortLimit contains a limit operation
> ---
>
> Key: HIVE-12407
> URL: https://issues.apache.org/jira/browse/HIVE-12407
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12407.patch
>
>
> Now that Calcite 1.5 went in, sometimes we end up with Sort and Limit 
> operations in the same operator. limitRelNode in HiveCalciteUtil should check 
> the fetch property of the SortLimit operator to determine if an operator is a 
> Limit.





[jira] [Updated] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-11-13 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-12184:
-
Attachment: HIVE-12184.6.patch

I am attaching a new patch. This fix rearranges the grammar rule for DESCRIBE 
table.
With the prior version of the patch, the syntax was 
[DB.]TABLE [COLUMN] [PARTITION_SPEC]

With the new patch, it is 
[DB.]TABLE [PARTITION_SPEC] [COLUMN]



> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.3.patch, 
> HIVE-12184.4.patch, HIVE-12184.5.patch, HIVE-12184.6.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> 0: jdbc:hive2://localhost:1/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> +-----------+------------+----------+--+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+--+
> | i         | int        |          |
> +-----------+------------+----------+--+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:1/default> use foo;
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}





[jira] [Commented] (HIVE-12402) Split hive.root.logger separately to make it compatible with log4j1.x

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004975#comment-15004975
 ] 

Hive QA commented on HIVE-12402:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772135/HIVE-12402.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9767 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_partition_diff_num_cols.q-vectorization_10.q-orc_merge9.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6031/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6031/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6031/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772135 - PreCommit-HIVE-TRUNK-Build

> Split hive.root.logger separately to make it compatible with log4j1.x
> -
>
> Key: HIVE-12402
> URL: https://issues.apache.org/jira/browse/HIVE-12402
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12402.patch
>
>
> With the new Log4j 2.x, specifying the logger name and log level together 
> does not work.
> With the old logger, the following works:
> --hiveconf hive.root.logger=DEBUG,console
> With the new logger, the logger and level must be specified separately:
> --hiveconf hive.root.logger=console --hiveconf hive.log.level=DEBUG
> We can do this conversion internally for users still using the old configs.





[jira] [Updated] (HIVE-12413) Default mode for hive.mapred.mode should be strict

2015-11-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12413:

Attachment: HIVE-12413.patch

> Default mode for hive.mapred.mode should be strict
> --
>
> Key: HIVE-12413
> URL: https://issues.apache.org/jira/browse/HIVE-12413
> Project: Hive
>  Issue Type: Task
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12413.patch
>
>
> Non-strict mode allows some questionable semantics and questionable 
> operations. It's better that the user makes a conscious choice to enable such 
> behavior.





[jira] [Commented] (HIVE-12319) Remove HadoopShims::getHadoopConfNames()

2015-11-13 Thread Aleksei Statkevich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005155#comment-15005155
 ] 

Aleksei Statkevich commented on HIVE-12319:
---

Makes sense. I'll make this change.

> Remove HadoopShims::getHadoopConfNames()
> 
>
> Key: HIVE-12319
> URL: https://issues.apache.org/jira/browse/HIVE-12319
> Project: Hive
>  Issue Type: Improvement
>  Components: Shims
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12319.patch
>
>
> It was introduced in HIVE-6159. It has served its purpose now that we support 
> only the Hadoop 2.x line.





[jira] [Commented] (HIVE-12404) Orc ppd throws exception if types don't match

2015-11-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005181#comment-15005181
 ] 

Gopal V commented on HIVE-12404:


The semantics need a fix; this is handled for some of the numeric types in 
TypeCheckProcFactory.

{code}
// Try to infer the type of the constant only if there are two
// nodes, one of them is column and the other is numeric const
{code}

That's the case which needs to inject an inference for Boolean.

bq. such questionable semantics ORC should be defensive and should turn off ppd

The value class equality is complex when you do something like {{cbyte > 0}} or 
{{cbigint > 0}}.

This approach might be defensive, but to handle it optimally this patch needs 
to handle all the implicit inferences implemented for HIVE-10286.

> Orc ppd throws exception if types don't match
> -
>
> Key: HIVE-12404
> URL: https://issues.apache.org/jira/browse/HIVE-12404
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12404.patch
>
>
> When the types of the constant value and the column don't match, Hive 
> currently throws an exception.
> {code}
> java.lang.IllegalArgumentException: Wrong value class java.lang.Integer for 
> BOOLEAN.LESS_THAN_EQUALS leaf
> at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.(SearchArgumentImpl.java:63)
> at 
> org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$BuilderImpl.lessThanEquals(SearchArgumentImpl.java:304)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:277)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:326)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:386)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.addChildren(ConvertAstToSearchArg.java:331)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:370)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.addChildren(ConvertAstToSearchArg.java:331)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:366)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.<init>(ConvertAstToSearchArg.java:68)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.create(ConvertAstToSearchArg.java:417)
> at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createFromConf(ConvertAstToSearchArg.java:436)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.<init>(OrcInputFormat.java:484)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1121)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1207)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:369)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:481)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:160)
> {code}





[jira] [Commented] (HIVE-12175) Upgrade Kryo version to 3.0.x

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003809#comment-15003809
 ] 

Hive QA commented on HIVE-12175:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772075/HIVE-12175.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6024/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6024/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6024/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6024/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   34d4276..c259669  branch-1   -> origin/branch-1
   55cb43d..d2fd006  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 55cb43d HIVE-11525: Tez Bucket pruning (Gopal V, reviewed by 
Sergey Shelukhin)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at d2fd006 HIVE-12391: SkewJoinOptimizer might not kick in if 
columns are renamed after TableScanOperator (Jesus Camacho Rodriguez, reviewed 
by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772075 - PreCommit-HIVE-TRUNK-Build

> Upgrade Kryo version to 3.0.x
> -
>
> Key: HIVE-12175
> URL: https://issues.apache.org/jira/browse/HIVE-12175
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12175.1.patch, HIVE-12175.2.patch, 
> HIVE-12175.3.patch
>
>
> The current version of Kryo (2.22) has an issue (see the exception below and 
> HIVE-12174) with serializing ArrayLists generated using Arrays.asList(). We 
> need to either replace all occurrences of Arrays.asList() or change the 
> current StdInstantiatorStrategy. This issue is fixed in later versions, and the 
> Kryo community recommends using DefaultInstantiatorStrategy with a fallback to 
> StdInstantiatorStrategy. More discussion of this issue is at 
> https://github.com/EsotericSoftware/kryo/issues/216. Alternatively, a custom 
> serialization/deserialization class can be provided for Arrays.asList.
> Also, Kryo 3.0 introduced unsafe-based serialization, which claims to have 
> much better performance for certain types of serialization. 
> Exception:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
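
One of the options named in the description above is to replace Arrays.asList() call sites. A minimal sketch of what that could look like, using only the JDK: the list returned by Arrays.asList() is an instance of the private class java.util.Arrays$ArrayList, which has no no-arg constructor, whereas a copy into java.util.ArrayList does. The helper name toPlainArrayList is hypothetical, and this is not taken from any of the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Hive or Kryo.
final class AsListCopy {

    // Copies any list into a plain java.util.ArrayList, which has a public
    // no-arg constructor, unlike the private class returned by Arrays.asList().
    static <T> List<T> toPlainArrayList(List<T> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        List<String> fixedSize = Arrays.asList("a", "b");
        List<String> plain = toPlainArrayList(fixedSize);

        // The two lists hold equal elements but have different runtime classes.
        System.out.println(fixedSize.getClass().getName());
        System.out.println(plain.getClass().getName());
        System.out.println(fixedSize.equals(plain));
    }
}
```

The copy costs one extra allocation per call site, which is why upgrading Kryo and changing the instantiator strategy may be the less invasive choice.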

[jira] [Commented] (HIVE-12399) Native Vector MapJoin can encounter "Null key not expected in MapJoin" and "Unexpected NULL in map join small table" exceptions

2015-11-13 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003816#comment-15003816
 ] 

Matt McCline commented on HIVE-12399:
-

Test failures look unrelated.

> Native Vector MapJoin can encounter  "Null key not expected in MapJoin" and 
> "Unexpected NULL in map join small table" exceptions
> 
>
> Key: HIVE-12399
> URL: https://issues.apache.org/jira/browse/HIVE-12399
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12399.01.patch
>
>
> Instead of throwing an exception, just filter out NULLs in the Native Vector 
> MapJoin operators.





[jira] [Commented] (HIVE-12378) Exception on HBaseSerDe.serialize binary field

2015-11-13 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004013#comment-15004013
 ] 

Yongzhi Chen commented on HIVE-12378:
-

[~jxiang], [~csun], could you review the code? Thanks.

> Exception on HBaseSerDe.serialize binary field
> --
>
> Key: HIVE-12378
> URL: https://issues.apache.org/jira/browse/HIVE-12378
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler, Serializers/Deserializers
>Affects Versions: 1.0.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12378.1.patch
>
>
> An issue was reproduced with binary-typed HBase columns in Hive.
> The following works fine:
> CREATE TABLE test9 (key int, val string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:val#b"
> );
> insert into test9 values(1,"hello");
> But when the string type is changed to binary:
> CREATE TABLE test2 (key int, val binary)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:val#b"
> );
> insert into table test2 values(1, 'hello');
> The following exception is thrown:
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"tmp_values_col1":"1","tmp_values_col2":"hello"}
> ...
> Caused by: java.lang.RuntimeException: Hive internal error.
> at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:322)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118)
> at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282)
> ... 16 more
> We should support Hive binary type columns for HBase.





[jira] [Assigned] (HIVE-12406) HIVE-9500 introduced incompatible change to LazySimpleSerDe public interface

2015-11-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-12406:
---

Assignee: Aihua Xu

> HIVE-9500 introduced incompatible change to LazySimpleSerDe public interface
> 
>
> Key: HIVE-12406
> URL: https://issues.apache.org/jira/browse/HIVE-12406
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.0
>Reporter: Lenni Kuff
>Assignee: Aihua Xu
>Priority: Blocker
>
> In the process of fixing HIVE-9500, an incompatibility was introduced that 
> will break third-party code that relies on LazySimpleSerDe. In HIVE-9500, the 
> nested class SerDeParameters was removed and the method 
> LazySimpleSerDe.initSerdeParams was also removed. They were replaced by a 
> standalone class LazySerDeParameters.
> Since this has already been released, I don't think we should revert the 
> change since that would mean breaking compatibility again. Instead, the best 
> approach would be to support both interfaces, if possible. 
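
A minimal sketch of what supporting both interfaces could look like, assuming the old nested type can be kept as a deprecated subclass of the new standalone class. All names here (NewSerDeParameters, LegacySerDe, OldParameters, initOldParameters) are hypothetical simplified stand-ins, not the real Hive classes, and the "field.delim" property is used only for illustration.

```java
import java.util.Properties;

// Hypothetical stand-ins; these are simplified and are not the real Hive classes.

// The new standalone parameters class.
class NewSerDeParameters {
    private final String separator;

    NewSerDeParameters(Properties tbl) {
        // Illustrative property name and default.
        this.separator = tbl.getProperty("field.delim", ",");
    }

    String getSeparator() {
        return separator;
    }
}

class LegacySerDe {

    // Old nested type kept as a thin subclass so that existing third-party
    // references to it still compile.
    @Deprecated
    static class OldParameters extends NewSerDeParameters {
        OldParameters(Properties tbl) {
            super(tbl);
        }
    }

    // Old factory method kept; all real work happens in the new class.
    @Deprecated
    static OldParameters initOldParameters(Properties tbl) {
        return new OldParameters(tbl);
    }
}

final class CompatDemo {
    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty("field.delim", "|");

        // Old-style caller keeps working.
        LegacySerDe.OldParameters legacy = LegacySerDe.initOldParameters(tbl);
        // New-style caller sees the same object through the new type.
        NewSerDeParameters current = legacy;

        System.out.println(legacy.getSeparator());
        System.out.println(current.getSeparator());
    }
}
```

Because the old type extends the new one, code written against either interface can share the same instance, and the deprecated shim can be removed in a later major release.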





[jira] [Commented] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook

2015-11-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003994#comment-15003994
 ] 

Hive QA commented on HIVE-12367:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772091/HIVE-12367.002.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 9783 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dbtxnmgr_nodblock
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dbtxnmgr_nodbunlock
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_lockneg_query_tbl_in_locked_db
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_lockneg_try_db_lock_conflict
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_lockneg_try_drop_locked_db
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_lockneg_try_lock_db_in_use
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6026/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6026/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6026/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772091 - PreCommit-HIVE-TRUNK-Build

> Lock/unlock database should add current database to inputs and outputs of 
> authz hook
> 
>
> Key: HIVE-12367
> URL: https://issues.apache.org/jira/browse/HIVE-12367
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-12367.001.patch, HIVE-12367.002.patch
>
>



