[jira] [Commented] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches

2014-06-13 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030321#comment-14030321
 ] 

Remus Rusanu commented on HIVE-7105:


Can you share the rb link?

 Enable ReduceRecordProcessor to generate VectorizedRowBatches
 -

 Key: HIVE-7105
 URL: https://issues.apache.org/jira/browse/HIVE-7105
 Project: Hive
  Issue Type: Bug
  Components: Tez, Vectorization
Reporter: Rajesh Balamohan
Assignee: Gopal V
 Fix For: 0.14.0

 Attachments: HIVE-7105.1.patch, HIVE-7105.2.patch


 Currently, ReduceRecordProcessor sends one key/value pair at a time to its 
 operator pipeline.  It would be beneficial to send a VectorizedRowBatch to 
 downstream operators. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)

2014-06-13 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030328#comment-14030328
 ] 

Szehon Ho commented on HIVE-7220:
-

OK, never mind about this patch.

 Empty dir in external table causes issue (root_dir_external_table.q failure)
 

 Key: HIVE-7220
 URL: https://issues.apache.org/jira/browse/HIVE-7220
 Project: Hive
  Issue Type: Bug
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7220.patch


 While looking at root_dir_external_table.q failure, which is doing a query on 
 an external table located at root ('/'), I noticed that latest Hadoop2 
 CombineFileInputFormat returns splits representing empty directories (like 
 '/Users'), which leads to failure in Hive's CombineFileRecordReader as it 
 tries to open the directory for processing.
 Tried with an external table in a normal HDFS directory, and it also returns 
 the same error.  Looks like a real bug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7005) MiniTez tests have non-deterministic explain plans

2014-06-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7005:
-

Attachment: HIVE-7005.1.patch

I believe the problem was that the FileSinkOperators were kept in a HashSet in 
Tez. I've run the tests a few times with the patch and didn't get any 
non-deterministic output.

 MiniTez tests have non-deterministic explain plans
 --

 Key: HIVE-7005
 URL: https://issues.apache.org/jira/browse/HIVE-7005
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Gunther Hagleitner
 Attachments: HIVE-7005.1.patch


 TestMiniTezCliDriver has a few test failures where there is a diff in the 
 explain plan generated. According to Vikram, the plan generated is correct, 
 but the plan can be generated in a couple of different ways and so sometimes 
 the plan will not diff against the expected output. We should probably come 
 up with a way to validate this explain plan in a reproducible way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7005) MiniTez tests have non-deterministic explain plans

2014-06-13 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030335#comment-14030335
 ] 

Gunther Hagleitner commented on HIVE-7005:
--

rb: https://reviews.apache.org/r/22547

 MiniTez tests have non-deterministic explain plans
 --

 Key: HIVE-7005
 URL: https://issues.apache.org/jira/browse/HIVE-7005
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Gunther Hagleitner
 Attachments: HIVE-7005.1.patch


 TestMiniTezCliDriver has a few test failures where there is a diff in the 
 explain plan generated. According to Vikram, the plan generated is correct, 
 but the plan can be generated in a couple of different ways and so sometimes 
 the plan will not diff against the expected output. We should probably come 
 up with a way to validate this explain plan in a reproducible way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7005) MiniTez tests have non-deterministic explain plans

2014-06-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7005:
-

Status: Patch Available  (was: Open)

 MiniTez tests have non-deterministic explain plans
 --

 Key: HIVE-7005
 URL: https://issues.apache.org/jira/browse/HIVE-7005
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Gunther Hagleitner
 Attachments: HIVE-7005.1.patch


 TestMiniTezCliDriver has a few test failures where there is a diff in the 
 explain plan generated. According to Vikram, the plan generated is correct, 
 but the plan can be generated in a couple of different ways and so sometimes 
 the plan will not diff against the expected output. We should probably come 
 up with a way to validate this explain plan in a reproducible way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030340#comment-14030340
 ] 

Lefty Leverenz commented on HIVE-7158:
--

Does the design doc need guidance about this (or is it time to add Tez 
documentation to the user docs)?

* [Hive on Tez | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez]

At a minimum, Configuration Properties needs to document these parameters:

* new parameter:  hive.tez.auto.reducer.parallelism
* new parameter:  hive.tez.max.partition.factor
* new parameter:  hive.tez.min.partition.factor
* new default for [hive.exec.reducers.bytes.per.reducer | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.bytes.per.reducer]
 (with version information)
* new default for [hive.exec.reducers.max | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.max]
 (with version information)
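
For the wiki examples, a minimal HiveQL sketch of setting these per session 
(values are illustrative assumptions, not the documented defaults):

{code:sql}
-- turn on Tez auto reducer parallelism for this session
set hive.tez.auto.reducer.parallelism=true;
-- scaling factors bounding the max/min reducer counts Tez may choose
set hive.tez.max.partition.factor=2.0;
set hive.tez.min.partition.factor=0.25;
{code}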

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
 HIVE-7158.4.patch, HIVE-7158.5.patch


 Tez can optionally sample data from a fraction of the tasks of a vertex and 
 use that information to choose the number of downstream tasks for any given 
 scatter-gather edge.
 Hive estimates the count of reducers by looking at stats and estimates for 
 each operator in the operator pipeline leading up to the reducer. However, if 
 this estimate turns out to be too large, Tez can rein in the resources used 
 to compute the reducer.
 It does so by combining partitions of the upstream vertex. It cannot, 
 however, add reducers at this stage.
 I'm proposing to let users specify whether they want to use auto-parallelism 
 or not. If they do, there will be scaling factors to determine the max and min 
 reducers Tez can choose from. We will then partition by max reducers, letting 
 Tez sample and rein in the count, down to the specified min.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7158:
-

Labels: TODOC14  (was: )

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
 HIVE-7158.4.patch, HIVE-7158.5.patch


 Tez can optionally sample data from a fraction of the tasks of a vertex and 
 use that information to choose the number of downstream tasks for any given 
 scatter-gather edge.
 Hive estimates the count of reducers by looking at stats and estimates for 
 each operator in the operator pipeline leading up to the reducer. However, if 
 this estimate turns out to be too large, Tez can rein in the resources used 
 to compute the reducer.
 It does so by combining partitions of the upstream vertex. It cannot, 
 however, add reducers at this stage.
 I'm proposing to let users specify whether they want to use auto-parallelism 
 or not. If they do, there will be scaling factors to determine the max and min 
 reducers Tez can choose from. We will then partition by max reducers, letting 
 Tez sample and rein in the count, down to the specified min.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)

2014-06-13 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7220:


Status: Open  (was: Patch Available)

Cancelling for now, unless there's interest in having a workaround in Hive.  It 
will not be necessary to pursue if MAPREDUCE-5756 is fixed.

 Empty dir in external table causes issue (root_dir_external_table.q failure)
 

 Key: HIVE-7220
 URL: https://issues.apache.org/jira/browse/HIVE-7220
 Project: Hive
  Issue Type: Bug
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7220.patch


 While looking at root_dir_external_table.q failure, which is doing a query on 
 an external table located at root ('/'), I noticed that latest Hadoop2 
 CombineFileInputFormat returns splits representing empty directories (like 
 '/Users'), which leads to failure in Hive's CombineFileRecordReader as it 
 tries to open the directory for processing.
 Tried with an external table in a normal HDFS directory, and it also returns 
 the same error.  Looks like a real bug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.

2014-06-13 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030351#comment-14030351
 ] 

Ravi Prakash commented on HIVE-7100:


Purge is an acceptable option for us. Thanks
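
For cross-reference, a minimal HiveQL sketch of the purge form discussed here 
(table name hypothetical; the final committed syntax may differ):

{code:sql}
-- delete the table's files immediately instead of moving them to the trash
DROP TABLE page_views PURGE;
{code}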

 Users of hive should be able to specify skipTrash when dropping tables.
 ---

 Key: HIVE-7100
 URL: https://issues.apache.org/jira/browse/HIVE-7100
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Ravi Prakash
Assignee: Jayesh
 Attachments: HIVE-7100.patch


 Users of our clusters are often running up against their quota limits because 
 of Hive tables. When they drop tables, they then have to manually delete the 
 files from HDFS using skipTrash. This is cumbersome and unnecessary. We 
 should enable users to skipTrash directly when dropping tables.
 We should also be able to provide this functionality without polluting the 
 SQL syntax.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers

2014-06-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030365#comment-14030365
 ] 

Sushanth Sowmyan commented on HIVE-7209:


Looks good to me. +1.

 allow metastore authorization api calls to be restricted to certain invokers
 

 Key: HIVE-7209
 URL: https://issues.apache.org/jira/browse/HIVE-7209
 Project: Hive
  Issue Type: Bug
  Components: Authentication, Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch


 Any user who has direct access to metastore can make metastore api calls that 
 modify the authorization policy. 
 The users who can make direct metastore api calls in a secure cluster 
 configuration are usually the 'cluster insiders' such as Pig and MR users, 
 who are not (securely) covered by the metastore based authorization policy. 
 But it makes sense to disallow access from such users as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030369#comment-14030369
 ] 

Sushanth Sowmyan commented on HIVE-6584:


Teng, I'd be interested to see how your patch winds up. 

If you mean that, at runtime, the HBaseStorageHandler decides to deputize a 
subclass of itself to do the work, then that might work. But if you mean that 
your approach would lead to the user having to create a separate table (kind of 
like a view) that associates with a snapshot, then, speaking from the Hive side, 
I think I would prefer having only one SH to deal with, and having it decide 
what to do with various set parameters, as opposed to creating separate Hive 
tables with a different SH. That way, using the same Hive table definition, a 
query could decide whether or not to use a snapshot.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Documentation Policy

2014-06-13 Thread Lefty Leverenz
One more question:  what should we do after the documentation is done for a
JIRA ticket?

(a) Just remove the TODOC## label.
(b) Replace TODOC## with docdone (no caps, no version number).
(c) Add a docdone label but keep TODOC##.
(d) Something else.


-- Lefty


On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com wrote:

 Thank you guys! This is great work.


 On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com 
 kulkarni.swar...@gmail.com wrote:

  Going through the issues, I think overall Lefty did an awesome job
 catching
  and documenting most of them in time. Following are some of the 0.13 and
  0.14 ones I found which either have no documentation or have outdated
  documentation, and probably need it to be consumable. Contributors, feel
  free to remove the label if you disagree.
 
  *TODOC13:*
 
 
 https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)
 
  *TODOC14:*
 
 
 https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)
 
  I'll continue digging through the queue going backwards to 0.12 and 0.11
  and see if I find similar stuff there as well.
 
 
 
  On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com 
  kulkarni.swar...@gmail.com wrote:
 
Feel free to label such jiras with this keyword and ask the
  contributors
   for more information if you need any.
  
   Cool. I'll start chugging through the queue today adding labels as apt.
  
  
   On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com
   wrote:
  
Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
   Sounds good to me.
  
   --
   CONFIDENTIALITY NOTICE
   NOTICE: This message is intended for the use of the individual or
 entity
   to
   which it is addressed and may contain information that is
 confidential,
   privileged and exempt from disclosure under applicable law. If the
  reader
   of this message is not the intended recipient, you are hereby notified
   that
   any printing, copying, dissemination, distribution, disclosure or
   forwarding of this communication is strictly prohibited. If you have
   received this communication in error, please contact the sender
   immediately
   and delete it from your system. Thank You.
  
  
  
  
   --
   Swarnim
  
 
 
 
  --
  Swarnim
 



[jira] [Commented] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030391#comment-14030391
 ] 

Lefty Leverenz commented on HIVE-7022:
--

No user doc needed, right?

 Replace BinaryWritable with BytesWritable in Parquet serde
 --

 Key: HIVE-7022
 URL: https://issues.apache.org/jira/browse/HIVE-7022
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.14.0

 Attachments: HIVE-7022.patch


 Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from 
 Parquet data. However, the existing Hadoop class BytesWritable already does 
 that, and BinaryWritable offers no advantage. On the other hand, 
 BinaryWritable has a confusing getString() method which, if misused, can 
 cause unexpected results. The proposal here is to replace it with Hadoop's 
 BytesWritable.
 The issue was identified in HIVE-6367; this JIRA serves as a follow-up. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7159:
-

Status: Patch Available  (was: Open)

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.
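
 As a concrete HiveQL sketch of the rewrite (table and column names are 
 hypothetical):
 {code:sql}
 -- original query
 SELECT * FROM A JOIN B ON A.x = B.y;
 -- after the rewrite: not-null filters pushed to both join sources
 SELECT *
 FROM (SELECT * FROM A WHERE x IS NOT NULL) A
 JOIN (SELECT * FROM B WHERE y IS NOT NULL) B ON A.x = B.y;
 {code}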



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7159:
-

Attachment: HIVE-7159.3.patch

+golden file updates

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7155) WebHCat controller job exceeds container memory limit

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7155:
-

Labels: TODOC14  (was: )

 WebHCat controller job exceeds container memory limit
 -

 Key: HIVE-7155
 URL: https://issues.apache.org/jira/browse/HIVE-7155
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: shanyu zhao
Assignee: shanyu zhao
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7155.1.patch, HIVE-7155.2.patch, HIVE-7155.patch


 Submitting a Hive query on a large table via WebHCat results in failure because 
 the WebHCat controller job is killed by Yarn since it exceeds the memory 
 limit (set by mapreduce.map.memory.mb, which defaults to 1GB):
 {code}
  INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from 
 Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and 
 LogTimestamp <= '2014-03-01 01:00:00';
 {code}
 We could increase mapreduce.map.memory.mb to solve this problem, but that 
 changes the setting system-wide.
 We need to provide a WebHCat configuration to override 
 mapreduce.map.memory.mb when submitting the controller job.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7155) WebHCat controller job exceeds container memory limit

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030413#comment-14030413
 ] 

Lefty Leverenz commented on HIVE-7155:
--

Need to document *templeton.mapper.memory.mb* in the wiki with version 
information (0.14.0):

* [WebHCat Configuration:  Configuration Variables | 
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-ConfigurationVariables]

 WebHCat controller job exceeds container memory limit
 -

 Key: HIVE-7155
 URL: https://issues.apache.org/jira/browse/HIVE-7155
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: shanyu zhao
Assignee: shanyu zhao
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7155.1.patch, HIVE-7155.2.patch, HIVE-7155.patch


 Submitting a Hive query on a large table via WebHCat results in failure because 
 the WebHCat controller job is killed by Yarn since it exceeds the memory 
 limit (set by mapreduce.map.memory.mb, which defaults to 1GB):
 {code}
  INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from 
 Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and 
 LogTimestamp <= '2014-03-01 01:00:00';
 {code}
 We could increase mapreduce.map.memory.mb to solve this problem, but that 
 changes the setting system-wide.
 We need to provide a WebHCat configuration to override 
 mapreduce.map.memory.mb when submitting the controller job.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7224) Set incremental printing to true by default in Beeline

2014-06-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030414#comment-14030414
 ] 

Hive QA commented on HIVE-7224:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650083/HIVE-7224.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5535 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.beeline.TestBeeLineWithArgs.testNullEmpty
org.apache.hive.beeline.TestBeeLineWithArgs.testNullEmptyCmdArg
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/453/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/453/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-453/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650083

 Set incremental printing to true by default in Beeline
 --

 Key: HIVE-7224
 URL: https://issues.apache.org/jira/browse/HIVE-7224
 Project: Hive
  Issue Type: Bug
  Components: Clients, JDBC
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7224.1.patch


 See HIVE-7221.
 By default beeline tries to buffer the entire output relation before printing 
 it on stdout. This can cause OOM when the output relation is large. However, 
 beeline has the option of incremental prints. We should make that the 
 default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6473:
-

Labels: TODOC14  (was: )

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
 HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, 
 HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7136:
-

Labels: TODOC14  (was: )

 Allow Hive to read hive scripts from any of the supported file systems in 
 hadoop eco-system
 ---

 Key: HIVE-7136
 URL: https://issues.apache.org/jira/browse/HIVE-7136
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.13.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7136.01.patch, HIVE-7136.patch


 The current Hive CLI assumes that the source file (hive script) is always on 
 the local file system. This patch implements support for reading source files 
 from other file systems in the hadoop eco-system (hdfs, s3, etc.) as well, 
 keeping the default behavior intact: reading from the default (local) 
 filesystem when no scheme is provided in the URL for the source file.
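
 A minimal sketch of the behavior (paths are hypothetical); the same should 
 apply to scripts passed via hive -f:
 {code:sql}
 -- no scheme: read from the local file system, as before
 source /tmp/init.hql;
 -- with this change, scripts can also live on any supported Hadoop file system
 source hdfs:///user/hive/scripts/init.hql;
 {code}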



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7159:
-

Attachment: HIVE-7159.4.patch

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7159:
-

Status: Patch Available  (was: Open)

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7159:
-

Status: Open  (was: Patch Available)

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7143) Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, fval/lval)

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030424#comment-14030424
 ] 

Lefty Leverenz commented on HIVE-7143:
--

What user doc does this need?

* [Language Manual -- Windowing and Analytics Functions | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics]

 Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, 
 fval/lval)
 -

 Key: HIVE-7143
 URL: https://issues.apache.org/jira/browse/HIVE-7143
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-7143.1.patch, HIVE-7143.3.patch


 Provides streaming implementations for the above functions.
 Min/Max is based on the algorithm by Daniel Lemire: 
 http://www.archipel.uqam.ca/309/1/webmaximinalgo.pdf
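
 For the doc, a sketch of the kind of windowed query this streaming support 
 targets (table and column names are hypothetical):
 {code:sql}
 SELECT ts, val,
        MIN(val) OVER (PARTITION BY id ORDER BY ts
                       ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS min_last_10,
        LAG(val, 1) OVER (PARTITION BY id ORDER BY ts) AS prev_val,
        FIRST_VALUE(val) OVER (PARTITION BY id ORDER BY ts) AS first_val
 FROM events;
 {code}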



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7168:
-

Labels: TODOC14  (was: )
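
A minimal HiveQL sketch of the syntax difference this change covers (table and 
column names are hypothetical):

{code:sql}
-- previously, every column had to be named explicitly
ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS c1, c2, c3;
-- with this change, omitting the list collects stats for all columns
ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS;
{code}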

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7119) Extended ACL's should be inherited if warehouse perm inheritance enabled

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7119:
-

Labels: TODOC14  (was: )

 Extended ACL's should be inherited if warehouse perm inheritance enabled
 

 Key: HIVE-7119
 URL: https://issues.apache.org/jira/browse/HIVE-7119
 Project: Hive
  Issue Type: Bug
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7119.2.patch, HIVE-7119.3.patch, HIVE-7119.4.patch, 
 HIVE-7119.patch


 HDFS recently came out with support for extended ACLs, i.e. permissions for a 
 specific group/user in addition to the general owner/group/other permissions.
 Hive permission inheritance should inherit those as well, if the user has 
 set them at any point in the warehouse directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6586) Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6586:
-

Labels: TODOC14  (was: )

 Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)
 ---

 Key: HIVE-6586
 URL: https://issues.apache.org/jira/browse/HIVE-6586
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Lefty Leverenz
  Labels: TODOC14

 HIVE-6037 puts the definitions of configuration parameters into the 
 HiveConf.java file, but several recent jiras for release 0.13.0 introduce new 
 parameters that aren't in HiveConf.java yet and some parameter definitions 
 need to be altered for 0.13.0.  This jira will patch HiveConf.java after 
 HIVE-6037 gets committed.
 Also, four typos patched in HIVE-6582 need to be fixed in the new 
 HiveConf.java.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7050:
-

Labels: TODOC14  (was: )

 Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
 -

 Key: HIVE-7050
 URL: https://issues.apache.org/jira/browse/HIVE-7050
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, 
 HIVE-7050.4.patch, HIVE-7050.5.patch, HIVE-7050.6.patch


 There is currently no way to display column-level stats from the Hive CLI. It 
 would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE.
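
 A minimal HiveQL sketch of the intended usage (table and column names are 
 hypothetical; the exact output layout is defined by the patch):
 {code:sql}
 -- gather column-level stats first, then display them
 ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS;
 DESCRIBE FORMATTED t c1;
 {code}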



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7062) Support Streaming mode in Windowing

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7062:
-

Labels: TODOC14  (was: )

 Support Streaming mode in Windowing
 ---

 Key: HIVE-7062
 URL: https://issues.apache.org/jira/browse/HIVE-7062
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7062.1.patch, HIVE-7062.4.patch, HIVE-7062.5.patch, 
 HIVE-7062.6.patch


 1. Have the Windowing Table Function support streaming mode.
 2. Have special handling for Ranking UDAFs.
 3. Have special handling for Sum/Avg for fixed-size windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5072) [WebHCat]Enable directly invoke Sqoop job through Templeton

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-5072:
-

Labels: TODOC14  (was: )

 [WebHCat]Enable directly invoke Sqoop job through Templeton
 ---

 Key: HIVE-5072
 URL: https://issues.apache.org/jira/browse/HIVE-5072
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Shuaishuai Nie
Assignee: Shuaishuai Nie
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-5072.1.patch, HIVE-5072.2.patch, HIVE-5072.3.patch, 
 HIVE-5072.4.patch, HIVE-5072.5.patch, Templeton-Sqoop-Action.pdf


 Right now it is hard to invoke a Sqoop job through Templeton. The only way is 
 to use the classpath jar generated by a Sqoop job and use the jar delegator in 
 Templeton. We should implement a Sqoop delegator to enable directly invoking a 
 Sqoop job through Templeton.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6940) [WebHCat]Update documentation for Templeton-Sqoop action

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6940:
-

Labels: TODOC14  (was: )

 [WebHCat]Update documentation for Templeton-Sqoop action
 

 Key: HIVE-6940
 URL: https://issues.apache.org/jira/browse/HIVE-6940
 Project: Hive
  Issue Type: Bug
  Components: Documentation, WebHCat
Affects Versions: 0.14.0
Reporter: Shuaishuai Nie
  Labels: TODOC14

 WebHCat documentation needs to be updated based on the new feature introduced 
 in HIVE-5072.
 Here are some examples using the endpoint templeton/v1/sqoop.
 Example 1 (passing the Sqoop command directly):
 curl -s -d command="import --connect 
 jdbc:sqlserver://localhost:4033;databaseName=SqoopDB;user=hadoop;password=password
  --table mytable --target-dir user/hadoop/importtable" -d 
 statusdir=sqoop.output 
 'http://localhost:50111/templeton/v1/sqoop?user.name=hadoop'
 Example 2 (passing a source file which contains the Sqoop command):
 curl -s -d optionsfile=/sqoopcommand/command0.txt -d 
 statusdir=sqoop.output 
 'http://localhost:50111/templeton/v1/sqoop?user.name=hadoop'
 Example 3 (using --options-file in the middle of the Sqoop command, to enable 
 reusing part of the Sqoop command such as the connection string):
 curl -s -d files=/sqoopcommand/command1.txt,/sqoopcommand/command2.txt -d 
 command="import --options-file command1.txt --options-file command2.txt" -d 
 statusdir=sqoop.output 
 'http://localhost:50111/templeton/v1/sqoop?user.name=hadoop'
 Also, to pass a JDBC driver jar, users can use the -libjars 
 generic option in the Sqoop command. This is functionality provided by 
 Sqoop.
 The following parameters can be passed to the endpoint:
 command 
 (Sqoop command string to run)
 optionsfile 
 (Options file containing the Sqoop command to run; each section of the 
 Sqoop command separated by a space should be a single line in the options file)
 files 
 (Comma-separated files to be copied to the map reduce cluster)
 statusdir 
 (A directory where WebHCat will write the status of the Sqoop job. If 
 provided, it is the caller’s responsibility to remove this directory when 
 done)
 callback 
 (Define a URL to be called upon job completion. You may embed a specific job 
 ID into the URL using $jobId. This tag will be replaced in the callback URL 
 with the job’s job ID.)
 enablelog
 (When set to true, WebHCat will upload the job log to statusdir. statusdir 
 must be defined when this is enabled)
 All of the above parameters are optional, but users have to provide either 
 command or optionsfile in the request.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5072) [WebHCat]Enable directly invoke Sqoop job through Templeton

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030434#comment-14030434
 ] 

Lefty Leverenz commented on HIVE-5072:
--

Doc jira for this feature is HIVE-6940.

 [WebHCat]Enable directly invoke Sqoop job through Templeton
 ---

 Key: HIVE-5072
 URL: https://issues.apache.org/jira/browse/HIVE-5072
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Shuaishuai Nie
Assignee: Shuaishuai Nie
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-5072.1.patch, HIVE-5072.2.patch, HIVE-5072.3.patch, 
 HIVE-5072.4.patch, HIVE-5072.5.patch, Templeton-Sqoop-Action.pdf


 Right now it is hard to invoke a Sqoop job through Templeton. The only way is 
 to use the classpath jar generated by a Sqoop job and use the jar delegator in 
 Templeton. We should implement a Sqoop delegator to enable directly invoking a 
 Sqoop job through Templeton.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7227) Configuration parameters without descriptions

2014-06-13 Thread Lefty Leverenz (JIRA)
Lefty Leverenz created HIVE-7227:


 Summary: Configuration parameters without descriptions
 Key: HIVE-7227
 URL: https://issues.apache.org/jira/browse/HIVE-7227
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Reporter: Lefty Leverenz


More than 50 configuration parameters lack descriptions in 
hive-default.xml.template (or in HiveConf.java, after HIVE-6037 gets 
committed).  They are listed by release number in the first comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7227) Configuration parameters without descriptions

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030445#comment-14030445
 ] 

Lefty Leverenz commented on HIVE-7227:
--

Here's a list (possibly incomplete) of 51 Hive configuration parameters that 
don't have descriptions in hive-default.xml.template.  Parameters created after 
Hive 0.13.0 are not covered here.

_Release 1 or 2_ 
hive.exec.submitviachild
hive.metastore.metadb.dir
hive.jar.path
hive.aux.jars.path
hive.table.name
hive.partition.name
hive.alias

_Release 3_ 
hive.cli.errors.ignore

_Release 4_ 
hive.added.files.path
hive.added.jars.path

_Release 5_ 
hive.intermediate.compression.codec
hive.intermediate.compression.type
hive.added.archives.path

_Release 6_ 
hive.metastore.archive.intermediate.archived
hive.metastore.archive.intermediate.extracted
hive.mapred.partitioner
hive.exec.script.trust
hive.hadoop.supports.splittable.combineinputformat

_Release 7_ 
hive.lockmgr.zookeeper.default.partition.name
hive.metastore.fs.handler.class
hive.query.result.fileformat 
hive.hashtable.initialCapacity
hive.hashtable.loadfactor
hive.debug.localtask
hive.lock.manager
hive.outerjoin.supports.filters
hive.semantic.analyzer.hook

_Release 8_ 
hive.exec.job.debug.timeout
hive.exec.tasklog.debug.timeout
hive.merge.rcfile.block.level
hive.merge.input.format.block.level
hive.merge.current.job.has.dynamic.partitions
hive.stats.collect.rawdatasize

_Release 8.1_ 
hive.optimize.metadataonly

_Release 9_ 

_Release 10_ 

_Release 11_ 
hive.exec.rcfile.use.sync.cache
hive.stats.key.prefix  _(internal)_

_Release 12_ 
hive.scratch.dir.permission
datanucleus.fixedDatastore
datanucleus.rdbms.useLegacyNativeValueStrategy
hive.optimize.sampling.orderby  _(internal?)_
hive.optimize.sampling.orderby.number
hive.optimize.sampling.orderby.percent
hive.server2.authentication.ldap.Domain
hive.server2.session.hook
hive.typecheck.on.insert

_Release 13_ 
hive.metastore.expression.proxy
hive.txn.manager
hive.stageid.rearrange
hive.explain.dependency.append.tasktype
hive.compute.splits.in.am  _(comment in HiveConf.java can be used as 
description)_
hive.rpc.query.plan  _(comment in HiveConf.java can be used as description)_

 Configuration parameters without descriptions
 -

 Key: HIVE-7227
 URL: https://issues.apache.org/jira/browse/HIVE-7227
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Reporter: Lefty Leverenz

 More than 50 configuration parameters lack descriptions in 
 hive-default.xml.template (or in HiveConf.java, after HIVE-6037 gets 
 committed).  They are listed by release number in the first comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: 49 config params without descriptions

2014-06-13 Thread Lefty Leverenz
This list of Hive configuration parameters without descriptions has been
transferred to HIVE-7227 https://issues.apache.org/jira/browse/HIVE-7227.

-- Lefty


On Tue, Apr 22, 2014 at 2:58 AM, Lefty Leverenz leftylever...@gmail.com
wrote:

 Found two more from HIVE-5522
 https://issues.apache.org/jira/browse/HIVE-5522 (also HIVE-6098
 https://issues.apache.org/jira/browse/HIVE-6098, Merge Tez branch into
 trunk) so the current total is 51 configs that don't have descriptions in
 0.13.0:

 *Release 13 *

 hive.compute.splits.in.am

 hive.rpc.query.plan


 But these both have comments in HiveConf.java that can be used as
 descriptions, although they aren't included in hive-default.xml.template.
  I missed them because I was working from the patch for HIVE-6037
 https://issues.apache.org/jira/browse/HIVE-6037 and Navis had used the
 HiveConf comments for descriptions.  (That means there could be more
 parameters missing from the 0.13.0 template file.)



 -- Lefty


 On Mon, Apr 14, 2014 at 1:53 AM, Lefty Leverenz leftylever...@gmail.com
 wrote:

 Here's a list of 49 configuration parameters in RC0 (and trunk) that
 don't have descriptions in hive-default.xml.template:


 *Release 1 or 2 *

 hive.exec.submitviachild

 hive.metastore.metadb.dir

 hive.jar.path

 hive.aux.jars.path

 hive.table.name

 hive.partition.name

 hive.alias


 *Release 3 *

 hive.cli.errors.ignore


 *Release 4 *

 hive.added.files.path

 hive.added.jars.path


 *Release 5 *

 hive.intermediate.compression.codec

 hive.intermediate.compression.type

 hive.added.archives.path


 *Release 6 *

 hive.metastore.archive.intermediate.archived

 hive.metastore.archive.intermediate.extracted

 hive.mapred.partitioner

 hive.exec.script.trust

 hive.hadoop.supports.splittable.combineinputformat


 *Release 7 *

 hive.lockmgr.zookeeper.default.partition.name

 hive.metastore.fs.handler.class

 hive.query.result.fileformat

 hive.hashtable.initialCapacity

 hive.hashtable.loadfactor

 hive.debug.localtask

 hive.lock.manager

 hive.outerjoin.supports.filters

 hive.semantic.analyzer.hook


 *Release 8 *

 hive.exec.job.debug.timeout

 hive.exec.tasklog.debug.timeout

 hive.merge.rcfile.block.level

 hive.merge.input.format.block.level

 hive.merge.current.job.has.dynamic.partitions

 hive.stats.collect.rawdatasize


 *Release 8.1 *

 hive.optimize.metadataonly


 *Release 9 *


 *Release 10 *


 *Release 11 *

 hive.exec.rcfile.use.sync.cache

 hive.stats.key.prefix --- *internal*


 *Release 12 *

 hive.scratch.dir.permission

 datanucleus.fixedDatastore

 datanucleus.rdbms.useLegacyNativeValueStrategy

 hive.optimize.sampling.orderby --- *internal?*

 hive.optimize.sampling.orderby.number

 hive.optimize.sampling.orderby.percent

 hive.server2.authentication.ldap.Domain

 hive.server2.session.hook

 hive.typecheck.on.insert


 *Release 13 *

 hive.metastore.expression.proxy

 hive.txn.manager

 hive.stageid.rearrange

 hive.explain.dependency.append.tasktype



 What's the best way to deal with these?

 1. Ignore them (or identify those that can be ignored).
 2. Add some descriptions in Hive 0.13.0 RC1.
 3. Deal with them after HIVE-6037
    https://issues.apache.org/jira/browse/HIVE-6037 gets committed.
    - Try to cover all of them by Hive 0.14.0:
      - Put the list in a JIRA and create a common HiveConf.java patch,
        which can be appended until release 0.14.0 is ready.
      - Accumulate descriptions in JIRA comments, then create a patch
        from the comments.
    - Deal with them as soon as possible:
      - Put the list in an umbrella JIRA and use sub-task JIRAs to add
        descriptions individually or in small groups.
 4. Deal with them in the wiki, then patch HiveConf.java before
    release 0.14.0.
 5. [Your idea goes here.]


 -- Lefty





[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030448#comment-14030448
 ] 

Lefty Leverenz commented on HIVE-6037:
--

HIVE-7227 lists 51 parameters in releases up to 0.13 that don't have 
descriptions.

 Synchronize HiveConf with hive-default.xml.template and support show conf
 -

 Key: HIVE-6037
 URL: https://issues.apache.org/jira/browse/HIVE-6037
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Navis
Assignee: Navis
Priority: Minor
 Fix For: 0.14.0

 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, 
 HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, 
 HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, 
 HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.2.patch.txt, 
 HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, 
 HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, 
 HIVE-6037.patch


 see HIVE-5879



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6586) Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030451#comment-14030451
 ] 

Lefty Leverenz commented on HIVE-6586:
--

See HIVE-7227 for a list of parameters that don't have descriptions yet.

 Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)
 ---

 Key: HIVE-6586
 URL: https://issues.apache.org/jira/browse/HIVE-6586
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Lefty Leverenz
  Labels: TODOC14

 HIVE-6037 puts the definitions of configuration parameters into the 
 HiveConf.java file, but several recent jiras for release 0.13.0 introduce new 
 parameters that aren't in HiveConf.java yet and some parameter definitions 
 need to be altered for 0.13.0.  This jira will patch HiveConf.java after 
 HIVE-6037 gets committed.
 Also, four typos patched in HIVE-6582 need to be fixed in the new 
 HiveConf.java.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6430:
-

Labels: TODOC14  (was: )

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, 
 HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, 
 HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, 
 HIVE-6430.14.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 
 for row) can take several hundred bytes, which is ridiculous. I am reducing 
 the size of MJKey and MJRowContainer in other jiras, but in general we don't 
 need to have a Java hash table there.  We can either use a primitive-friendly 
 hashtable like the one from HPPC (Apache-licensed), or some variation, to map 
 primitive keys to a single-row storage structure without an object per row 
 (similar to vectorization).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030467#comment-14030467
 ] 

Lefty Leverenz commented on HIVE-6430:
--

The configuration parameters *hive.mapjoin.optimized.hashtable* and 
*hive.mapjoin.optimized.hashtable.wbsize* need to be documented in the wiki for 
release 0.14.0.

* [Hive Configuration Properties | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties]
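
A minimal HiveQL sketch of the session-level settings to document (values are 
illustrative assumptions):

{code:sql}
-- use the memory-efficient MapJoin hash table implementation
set hive.mapjoin.optimized.hashtable=true;
-- size in bytes of each write buffer the optimized hash table allocates
set hive.mapjoin.optimized.hashtable.wbsize=10485760;
{code}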

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, 
 HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, 
 HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, 
 HIVE-6430.14.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 
 for row) can take several hundred bytes, which is ridiculous. I am reducing 
 the size of MJKey and MJRowContainer in other jiras, but in general we don't 
 need to have a Java hash table there.  We can either use a primitive-friendly 
 hashtable like the one from HPPC (Apache-licensed), or some variation, to map 
 primitive keys to a single-row storage structure without an object per row 
 (similar to vectorization).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6187:
-

Labels: TODOC14  (was: )

 Add test to verify that DESCRIBE TABLE works with quoted table names
 

 Key: HIVE-6187
 URL: https://issues.apache.org/jira/browse/HIVE-6187
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Andy Mok
Assignee: Carl Steinbach
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6187.1.patch


 Backticks around tables named after special keywords, such as items, allow us 
 to create, drop, and alter the table. For example
 {code:sql}
 CREATE TABLE foo.`items` (bar INT);
 DROP TABLE foo.`items`;
 ALTER TABLE `items` RENAME TO `items_`;
 {code}
 However, we cannot call
 {code:sql}
 DESCRIBE foo.`items`;
 DESCRIBE `items`;
 {code}
 The DESCRIBE query does not permit backticks to surround table names. The 
 error returned is
 {code:sql}
 FAILED: SemanticException [Error 10001]: Table not found `items`
 {code} 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030498#comment-14030498
 ] 

Lefty Leverenz commented on HIVE-6187:
--

This fix should be documented in the wiki for 0.14.0.

* [Language Manual -- DDL -- Describe | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe]

 Add test to verify that DESCRIBE TABLE works with quoted table names
 

 Key: HIVE-6187
 URL: https://issues.apache.org/jira/browse/HIVE-6187
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Andy Mok
Assignee: Carl Steinbach
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6187.1.patch


 Backticks around tables named after special keywords, such as items, allow us 
 to create, drop, and alter the table. For example
 {code:sql}
 CREATE TABLE foo.`items` (bar INT);
 DROP TABLE foo.`items`;
 ALTER TABLE `items` RENAME TO `items_`;
 {code}
 However, we cannot call
 {code:sql}
 DESCRIBE foo.`items`;
 DESCRIBE `items`;
 {code}
 The DESCRIBE query does not permit backticks to surround table names. The 
 error returned is
 {code:sql}
 FAILED: SemanticException [Error 10001]: Table not found `items`
 {code} 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6411) Support more generic way of using composite key for HBaseHandler

2014-06-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030506#comment-14030506
 ] 

Lefty Leverenz commented on HIVE-6411:
--

The release note says this should be documented at the Hive-HBase Integration 
page, which is in the Design Docs:

* [Design Docs -- Completed:  Hive HBase Integration | 
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration]

 Support more generic way of using composite key for HBaseHandler
 

 Key: HIVE-6411
 URL: https://issues.apache.org/jira/browse/HIVE-6411
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6411.1.patch.txt, HIVE-6411.10.patch.txt, 
 HIVE-6411.11.patch.txt, HIVE-6411.2.patch.txt, HIVE-6411.3.patch.txt, 
 HIVE-6411.4.patch.txt, HIVE-6411.5.patch.txt, HIVE-6411.6.patch.txt, 
 HIVE-6411.7.patch.txt, HIVE-6411.8.patch.txt, HIVE-6411.9.patch.txt


 HIVE-2599 introduced using a custom object for the row key. But it forces 
 key objects to extend HBaseCompositeKey, which is in turn an extension of 
 LazyStruct. If the user provides a proper Object and OI, we can replace the 
 internal key and keyOI with those. 
 The initial implementation is based on a factory interface.
 {code}
 public interface HBaseKeyFactory {
   void init(SerDeParameters parameters, Properties properties) throws SerDeException;
   ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
   LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
 }
 {code}
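 Independent of the factory plumbing, the underlying idea is that a composite 
 row key is just bytes the key object knows how to parse; a plain-Java sketch 
 of such packing/parsing (hypothetical "region|date" format, for illustration 
 only):
 {code}
 public final class CompositeRowKey {
   private static final char SEP = '|';

   // Pack the two logical key parts into one HBase row key.
   public static byte[] pack(String region, int yyyymmdd) {
     return (region + SEP + yyyymmdd)
         .getBytes(java.nio.charset.StandardCharsets.UTF_8);
   }

   // Split a row key back into its logical parts.
   public static String[] unpack(byte[] rowKey) {
     String s = new String(rowKey, java.nio.charset.StandardCharsets.UTF_8);
     int i = s.indexOf(SEP);
     return new String[] { s.substring(0, i), s.substring(i + 1) };
   }
 }
 {code}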



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6411) Support more generic way of using composite key for HBaseHandler

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6411:
-

Labels: TODOC14  (was: )

 Support more generic way of using composite key for HBaseHandler
 

 Key: HIVE-6411
 URL: https://issues.apache.org/jira/browse/HIVE-6411
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6411.1.patch.txt, HIVE-6411.10.patch.txt, 
 HIVE-6411.11.patch.txt, HIVE-6411.2.patch.txt, HIVE-6411.3.patch.txt, 
 HIVE-6411.4.patch.txt, HIVE-6411.5.patch.txt, HIVE-6411.6.patch.txt, 
 HIVE-6411.7.patch.txt, HIVE-6411.8.patch.txt, HIVE-6411.9.patch.txt


 HIVE-2599 introduced using a custom object for the row key. But it forces 
 key objects to extend HBaseCompositeKey, which is in turn an extension of 
 LazyStruct. If the user provides a proper Object and OI, we can replace the 
 internal key and keyOI with those. 
 The initial implementation is based on a factory interface.
 {code}
 public interface HBaseKeyFactory {
   void init(SerDeParameters parameters, Properties properties) throws SerDeException;
   ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
   LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6500) Stats collection via filesystem

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6500:
-

Labels: TODOC14  (was: )

 Stats collection via filesystem
 ---

 Key: HIVE-6500
 URL: https://issues.apache.org/jira/browse/HIVE-6500
 Project: Hive
  Issue Type: New Feature
  Components: Statistics
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
  Labels: TODOC14
 Fix For: 0.13.0

 Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch


 Recently, support for stats gathering via counters was [added | 
 https://issues.apache.org/jira/browse/HIVE-4632]. Although it is useful, it 
 has the following issues:
 * [Length of counter group name is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
 * [Length of counter name is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
 * [Number of distinct counter groups are limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
 * [Number of distinct counters are limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
 These limits are configurable, but setting them to higher values implies an 
 increased memory load on the AM and the job history server.
 Whether these limits make sense is [debatable | 
 https://issues.apache.org/jira/browse/MAPREDUCE-5680], but it is desirable 
 that Hive not depend on the counter feature of the framework, so that we can 
 evolve stats collection without relying on framework support. 
 Filesystem-based stats collection is a step in that direction.
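 With the filesystem publisher in place, selecting it should be a one-liner 
 (assuming it is exposed through hive.stats.dbclass):
 {noformat}
 set hive.stats.dbclass=fs;
 {noformat}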



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7099) Add Decimal datatype support for Windowing

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7099:
-

Labels: TODOC14  (was: )

 Add Decimal datatype support for Windowing
 --

 Key: HIVE-7099
 URL: https://issues.apache.org/jira/browse/HIVE-7099
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7099.1.patch, HIVE-7099.2.patch


 Decimal datatype is not handled by Windowing



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7061) sql std auth - insert queries without overwrite should not require delete privileges

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7061:
-

Labels: TODOC14  (was: )

 sql std auth - insert queries without overwrite should not require delete 
 privileges
 

 Key: HIVE-7061
 URL: https://issues.apache.org/jira/browse/HIVE-7061
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7061.1.patch, HIVE-7061.2.patch, HIVE-7061.3.patch


 Insert queries can do the equivalent of a delete-and-insert of all rows of a 
 table or partition if the overwrite keyword is used. As a result, the DELETE 
 privilege is applicable to such queries.
 However, SQL standard authorization currently requires the DELETE privilege 
 even for queries that don't use the overwrite keyword.
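 To illustrate the distinction (table names are hypothetical):
 {code:sql}
 -- Appends rows; the INSERT privilege should suffice:
 INSERT INTO TABLE t SELECT * FROM src;
 -- Replaces all existing rows; requiring DELETE on top of INSERT makes sense:
 INSERT OVERWRITE TABLE t SELECT * FROM src;
 {code}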



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6367) Implement Decimal in ParquetSerde

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6367:
-

Labels: Parquet TODOC14  (was: Parquet)

 Implement Decimal in ParquetSerde
 -

 Key: HIVE-6367
 URL: https://issues.apache.org/jira/browse/HIVE-6367
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Brock Noland
Assignee: Xuefu Zhang
  Labels: Parquet, TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6367.patch, dec.parq


 Some code in the Parquet SerDe deals with decimal and other code does not. 
 For example, in ETypeConverter we convert Decimal to double (which is 
 invalid), whereas in DataWritableWriter and other locations we throw an 
 exception if decimal is used.
 This JIRA is to implement decimal support.
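 Once decimal support is in, a declaration like this should work end to end 
 (illustrative table name):
 {code:sql}
 CREATE TABLE dec_parq (d DECIMAL(10,2)) STORED AS PARQUET;
 {code}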



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5961) Add explain authorize for checking privileges

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-5961:
-

Labels: TODOC14  (was: )

 Add explain authorize for checking privileges
 -

 Key: HIVE-5961
 URL: https://issues.apache.org/jira/browse/HIVE-5961
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Navis
Assignee: Navis
Priority: Trivial
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-5961.1.patch.txt, HIVE-5961.2.patch.txt, 
 HIVE-5961.3.patch.txt, HIVE-5961.4.patch.txt, HIVE-5961.5.patch.txt, 
 HIVE-5961.6.patch.txt


 For easy checking of the privileges needed for a query:
 {noformat}
 explain authorize select * from src join srcpart
 INPUTS: 
   default@srcpart
   default@srcpart@ds=2008-04-08/hr=11
   default@srcpart@ds=2008-04-08/hr=12
   default@srcpart@ds=2008-04-09/hr=11
   default@srcpart@ds=2008-04-09/hr=12
   default@src
 OUTPUTS: 
   
 file:/home/navis/apache/oss-hive/itests/qtest/target/tmp/localscratchdir/hive_2013-12-04_21-57-53_748_5323811717799107868-1/-mr-1
 CURRENT_USER: 
   hive_test_user
 OPERATION: 
   QUERY
 AUTHORIZATION_FAILURES: 
   No privilege 'Select' found for inputs { database:default, table:srcpart, 
 columnName:key}
   No privilege 'Select' found for inputs { database:default, table:src, 
 columnName:key}
   No privilege 'Select' found for inputs { database:default, table:src, 
 columnName:key}
 {noformat}
 Hopefully useful for debugging authorization, which is in progress in 
 HIVE-5837.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6928) Beeline should not chop off describe extended results by default

2014-06-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030560#comment-14030560
 ] 

Hive QA commented on HIVE-6928:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650101/HIVE-6928.2.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5610 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/454/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/454/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-454/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650101

 Beeline should not chop off describe extended results by default
 --

 Key: HIVE-6928
 URL: https://issues.apache.org/jira/browse/HIVE-6928
 Project: Hive
  Issue Type: Bug
  Components: CLI
Reporter: Szehon Ho
Assignee: Chinna Rao Lalam
 Attachments: HIVE-6928.1.patch, HIVE-6928.2.patch, HIVE-6928.patch


 By default, Beeline truncates long results based on the console width, for example:
 {code}
 +-+--+
 |  col_name   |   
|
 +-+--+
 | pat_id  | string
|
 | score   | float 
|
 | acutes  | float 
|
 | |   
|
 | Detailed Table Information  | Table(tableName:refills, dbName:default, 
 owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto |
 +-+--+
 5 rows selected (0.4 seconds)
 {code}
 This can be changed with !outputformat, but the default should behave 
 better, to give the first-time Beeline user a better experience.
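 As a workaround in the meantime, Beeline's own output-format command avoids 
 the truncation (vertical is one of several formats it supports):
 {noformat}
 !outputformat vertical
 {noformat}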



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7065:
-

Labels: TODOC14  (was: )

 Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
 -

 Key: HIVE-7065
 URL: https://issues.apache.org/jira/browse/HIVE-7065
 Project: Hive
  Issue Type: Bug
  Components: Tez, WebHCat
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch


 WebHCat config has templeton.hive.properties to specify Hive config 
 properties that need to be passed to the Hive client on the node executing 
 a job submitted through WebHCat (a Hive query, for example).
 This should include hive.execution.engine.
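 For example, in webhcat-site.xml the property takes a comma-separated list 
 of key=value pairs (the metastore URI below is only a placeholder):
 {code}
 <property>
   <name>templeton.hive.properties</name>
   <value>hive.metastore.uris=thrift://metastore-host:9083,hive.execution.engine=tez</value>
 </property>
 {code}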



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6122) Implement show grant on resource

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6122:
-

Labels: TODOC13  (was: )

 Implement show grant on resource
 --

 Key: HIVE-6122
 URL: https://issues.apache.org/jira/browse/HIVE-6122
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC13
 Fix For: 0.13.0

 Attachments: HIVE-6122.1.patch.txt, HIVE-6122.2.patch.txt, 
 HIVE-6122.3.patch.txt, HIVE-6122.4.patch, HIVE-6122.4.patch, 
 HIVE-6122.5.patch, HIVE-6122.6.patch


 Currently, Hive shows the privileges owned by a principal. The reverse API 
 is also needed: showing all principals that hold privileges on a resource. 
 {noformat}
 show grant user hive_test_user on database default;
 show grant user hive_test_user on table dummy;
 show grant user hive_test_user on all;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set

2014-06-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030624#comment-14030624
 ] 

Xuefu Zhang commented on HIVE-7200:
---

The result looks good. Could you update RB with your latest patch?

 Beeline output displays column heading even if --showHeader=false is set
 

 Key: HIVE-7200
 URL: https://issues.apache.org/jira/browse/HIVE-7200
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch


 A few minor/cosmetic issues with the Beeline CLI:
 1) The tool prints the column headers despite --showHeader being set to 
 false. This property only seems to affect the subsequent header information 
 that gets printed based on the value of the headerInterval property 
 (default value is 100).
 2) When showHeader is true and headerInterval > 0, the header after the 
 first interval gets printed after headerInterval - 1 rows. The code seems 
 to count the initial header as a row, if you will.
 3) The table footer (the line that closes the table) does not get printed 
 if showHeader is false. I think the table should get closed irrespective 
 of whether it prints the header or not.
 {code}
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 | f|
 | T|
 | F|
 | 0|
 | 1|
 +--+
 6 rows selected (3.998 seconds)
 0: jdbc:hive2://localhost:1 !set headerInterval 2
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 +--+
 | val  |
 +--+
 | f|
 | T|
 +--+
 | val  |
 +--+
 | F|
 | 0|
 +--+
 | val  |
 +--+
 | 1|
 +--+
 6 rows selected (0.691 seconds)
 0: jdbc:hive2://localhost:1 !set showHeader false
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 | f|
 | T|
 | F|
 | 0|
 | 1|
 6 rows selected (1.728 seconds)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6928) Beeline should not chop off describe extended results by default

2014-06-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030630#comment-14030630
 ] 

Xuefu Zhang commented on HIVE-6928:
---

[~chinnalalam] could you please update RB with your latest patch? Thanks.

 Beeline should not chop off describe extended results by default
 --

 Key: HIVE-6928
 URL: https://issues.apache.org/jira/browse/HIVE-6928
 Project: Hive
  Issue Type: Bug
  Components: CLI
Reporter: Szehon Ho
Assignee: Chinna Rao Lalam
 Attachments: HIVE-6928.1.patch, HIVE-6928.2.patch, HIVE-6928.patch


 By default, Beeline truncates long results based on the console width, for example:
 {code}
 +-+--+
 |  col_name   |   
|
 +-+--+
 | pat_id  | string
|
 | score   | float 
|
 | acutes  | float 
|
 | |   
|
 | Detailed Table Information  | Table(tableName:refills, dbName:default, 
 owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto |
 +-+--+
 5 rows selected (0.4 seconds)
 {code}
 This can be changed with !outputformat, but the default should behave 
 better, to give the first-time Beeline user a better experience.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6394) Implement Timestamp in ParquetSerde

2014-06-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030675#comment-14030675
 ] 

Hive QA commented on HIVE-6394:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650102/HIVE-6394.7.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5613 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/455/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/455/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-455/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650102

 Implement Timestamp in ParquetSerde
 ---

 Key: HIVE-6394
 URL: https://issues.apache.org/jira/browse/HIVE-6394
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Jarek Jarcec Cecho
Assignee: Szehon Ho
  Labels: Parquet
 Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, 
 HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.6.patch, HIVE-6394.7.patch, 
 HIVE-6394.patch


 This JIRA is to implement timestamp support in Parquet SerDe.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Questions about Hive authorization under HDFS permission

2014-06-13 Thread Apple Wang
Hi, all

I have enabled Hive authorization in my testing cluster. I use the user hive 
to create the database hivedb and grant the create privilege on hivedb to 
the user root.

But I run into the following problem: root cannot create a table in hivedb 
even though it has the create privilege.

FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
org.apache.hadoop.security.AccessControlException Permission denied: user=root, 
access=WRITE, inode=/tmp/user/hive/warehouse/hivedb.db:hive:hadoop:drwxr-xr-x
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:158)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5499)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5481)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5455)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3455)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3425)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3397)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:724)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:502)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48089)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)


It is obvious that the hivedb.db directory in HDFS is not writable by other 
users. So how is Hive authorization supposed to work under HDFS permissions?

PS: if I create a table as user hive and grant the update privilege to user 
root, the same error occurs when I load data into the table as root.

Look forward to your reply!

Thanks
Alex



[jira] [Created] (HIVE-7228) StreamPrinter should be joined to calling thread

2014-06-13 Thread Pankit Thapar (JIRA)
Pankit Thapar created HIVE-7228:
---

 Summary: StreamPrinter should be joined to calling thread 
 Key: HIVE-7228
 URL: https://issues.apache.org/jira/browse/HIVE-7228
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Priority: Minor


ISSUE:
The StreamPrinter class connects an input stream (attached to the output) of 
a process with the output stream of a session (CliSessionState/SessionState 
class).
It acts as a pipe between the two and transfers data from the input stream 
to the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD.

In some of the current usages of this class, I noticed that the calling 
threads do not wait for the transfer operation to complete. That is, the 
calling thread does not join the StreamPrinter threads.
The calling thread moves forward assuming that the respective output stream 
already has the data it needs. That is not always a safe assumption, since 
the StreamPrinter thread may not have finished by the time the calling 
thread expects it to.

FIX:
To ensure that the calling thread waits for the StreamPrinter threads to 
complete, the StreamPrinter threads are joined to the calling thread.

Please note that without the fix, TestCliDriverMethods#testRun failed 
sometimes (roughly 1 in 30 times). The test does not fail with this fix.
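
A minimal sketch of the pattern the fix applies (generic pipe thread; names 
are illustrative, this is not Hive's StreamPrinter itself):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class PipeJoinExample {
  public static void main(String[] args) throws Exception {
    Process proc = new ProcessBuilder("echo", "hello").start();
    final InputStream in = proc.getInputStream();
    final OutputStream out = System.out;

    // The transfer operation runs in a separate thread, as in StreamPrinter.
    Thread printer = new Thread(new Runnable() {
      public void run() {
        byte[] buf = new byte[4096];
        int n;
        try {
          while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
          }
          out.flush();
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    });
    printer.start();
    proc.waitFor();   // wait for the process itself
    printer.join();   // the fix: also wait for the transfer thread to drain
  }
}
{code}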



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7201) Fix TestHiveConf#testConfProperties test case

2014-06-13 Thread Pankit Thapar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankit Thapar updated HIVE-7201:


Status: Patch Available  (was: Open)

 Fix TestHiveConf#testConfProperties test case
 -

 Key: HIVE-7201
 URL: https://issues.apache.org/jira/browse/HIVE-7201
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7201-1.patch, HIVE-7201-2.patch, 
 HIVE-7201.03.patch, HIVE-7201.patch


 CHANGE 1:
 TEST CASE:
 The intention of TestHiveConf#testConfProperties() is to verify that HiveConf 
 properties are set with the expected priority.
 Each HiveConf object is initialized as follows:
 1) Hadoop configuration properties are applied.
 2) ConfVar properties with non-null values are overlaid.
 3) hive-site.xml properties are overlaid.
 ISSUE:
 The mapreduce-related configurations are loaded by JobConf, not by 
 Configuration.
 The current test tries to get configuration properties like 
 HADOOPNUMREDUCERS (mapred.job.reduces)
 from the Configuration class. But these mapreduce-related properties are 
 loaded by the JobConf class from mapred-default.xml.
 DETAILS:
 LINE 63: checkHadoopConf(ConfVars.HADOOPNUMREDUCERS.varname, "1"); -- fails
 because:
 private void checkHadoopConf(String name, String expectedHadoopVal) {
   Assert.assertEquals(expectedHadoopVal, new Configuration().get(name));
   // The second parameter is null, since it is the JobConf class, not the
   // Configuration class, that initializes the mapred-default values.
 }
 The code that loads mapreduce resources is in ConfigUtil, and JobConf makes 
 a call like this (in a static block):
 public class JobConf extends Configuration {
 
   private static final Log LOG = LogFactory.getLog(JobConf.class);
   static {
     ConfigUtil.loadResources(); // loads mapreduce-related resources
                                 // (mapred-default.xml etc.)
   }
   ...
 }
 Please note that the test case assertion works fine if the HiveConf() 
 constructor is called before this assertion, since HiveConf() triggers 
 JobConf(), which sets the default values of the properties pertaining to 
 mapreduce.
 This is why there won't be any failures if testHiveSitePath() is run before 
 testConfProperties(), as that would load the mapreduce properties into the 
 config properties.
 FIX:
 Instead of using a Configuration object, we can use a JobConf object to get 
 the default values used by hadoop/mapreduce.
 CHANGE 2:
 In TestHiveConf#testHiveSitePath(), the static method getHiveSiteLocation() 
 should be called statically instead of via an object instance.
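 In a nutshell (sketch fragment; assumes mapred-default.xml sets 
 mapred.job.reduces to 1, per the analysis above):
 {code}
 // Configuration alone does not trigger loading of mapred-default.xml here:
 String viaConf = new Configuration().get("mapred.job.reduces");   // null
 // JobConf's static initializer calls ConfigUtil.loadResources(), so:
 String viaJobConf = new JobConf().get("mapred.job.reduces");      // "1"
 {code}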



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7228) StreamPrinter should be joined to calling thread

2014-06-13 Thread Pankit Thapar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankit Thapar updated HIVE-7228:


Attachment: HIVE-7228.patch

Added join() to usages of StreamPrinter

 StreamPrinter should be joined to calling thread 
 -

 Key: HIVE-7228
 URL: https://issues.apache.org/jira/browse/HIVE-7228
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7228.patch


 ISSUE:
 The StreamPrinter class connects an input stream (attached to the output) of 
 a process with the output stream of a session (CliSessionState/SessionState 
 class).
 It acts as a pipe between the two and transfers data from the input stream 
 to the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD.
 In some of the current usages of this class, I noticed that the calling 
 threads do not wait for the transfer operation to complete. That is, the 
 calling thread does not join the StreamPrinter threads.
 The calling thread moves forward assuming that the respective output stream 
 already has the data it needs. That is not always a safe assumption, since 
 the StreamPrinter thread may not have finished by the time the calling 
 thread expects it to.
 FIX:
 To ensure that the calling thread waits for the StreamPrinter threads to 
 complete, the StreamPrinter threads are joined to the calling thread.
 Please note that without the fix, TestCliDriverMethods#testRun failed 
 sometimes (roughly 1 in 30 times). The test does not fail with this fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7228) StreamPrinter should be joined to calling thread

2014-06-13 Thread Pankit Thapar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankit Thapar updated HIVE-7228:


Status: Patch Available  (was: Open)

 StreamPrinter should be joined to calling thread 
 -

 Key: HIVE-7228
 URL: https://issues.apache.org/jira/browse/HIVE-7228
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7228.patch


 ISSUE:
 The StreamPrinter class connects an input stream (attached to the output) of 
 a process with the output stream of a session (CliSessionState/SessionState 
 class).
 It acts as a pipe between the two and transfers data from the input stream 
 to the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD.
 In some of the current usages of this class, I noticed that the calling 
 threads do not wait for the transfer operation to complete. That is, the 
 calling thread does not join the StreamPrinter threads.
 The calling thread moves forward assuming that the respective output stream 
 already has the data it needs. That is not always a safe assumption, since 
 the StreamPrinter thread may not have finished by the time the calling 
 thread expects it to.
 FIX:
 To ensure that the calling thread waits for the StreamPrinter threads to 
 complete, the StreamPrinter threads are joined to the calling thread.
 Please note that without the fix, TestCliDriverMethods#testRun failed 
 sometimes (roughly 1 in 30 times). The test does not fail with this fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7201) Fix TestHiveConf#testConfProperties test case

2014-06-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7201:
---

Assignee: Pankit Thapar

 Fix TestHiveConf#testConfProperties test case
 -

 Key: HIVE-7201
 URL: https://issues.apache.org/jira/browse/HIVE-7201
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Assignee: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7201-1.patch, HIVE-7201-2.patch, 
 HIVE-7201.03.patch, HIVE-7201.patch


 CHANGE 1:
 TEST CASE:
 The intention of TestHiveConf#testConfProperties() is to verify that HiveConf 
 properties are set with the expected priority.
 Each HiveConf object is initialized as follows:
 1) Hadoop configuration properties are applied.
 2) ConfVar properties with non-null values are overlaid.
 3) hive-site.xml properties are overlaid.
 ISSUE:
 The mapreduce-related configurations are loaded by JobConf, not by 
 Configuration.
 The current test tries to get configuration properties like 
 HADOOPNUMREDUCERS (mapred.job.reduces)
 from the Configuration class. But these mapreduce-related properties are 
 loaded by the JobConf class from mapred-default.xml.
 DETAILS:
 LINE 63: checkHadoopConf(ConfVars.HADOOPNUMREDUCERS.varname, "1"); -- fails
 because:
 private void checkHadoopConf(String name, String expectedHadoopVal) {
   Assert.assertEquals(expectedHadoopVal, new Configuration().get(name));
   // The second parameter is null, since it is the JobConf class, not the
   // Configuration class, that initializes the mapred-default values.
 }
 The code that loads mapreduce resources is in ConfigUtil, and JobConf makes 
 a call like this (in a static block):
 public class JobConf extends Configuration {
 
   private static final Log LOG = LogFactory.getLog(JobConf.class);
   static {
     ConfigUtil.loadResources(); // loads mapreduce-related resources
                                 // (mapred-default.xml etc.)
   }
   ...
 }
 Please note that the test case assertion works fine if the HiveConf() 
 constructor is called before this assertion, since HiveConf() triggers 
 JobConf(), which sets the default values of the properties pertaining to 
 mapreduce.
 This is why there won't be any failures if testHiveSitePath() is run before 
 testConfProperties(), as that would load the mapreduce properties into the 
 config properties.
 FIX:
 Instead of using a Configuration object, we can use a JobConf object to get 
 the default values used by hadoop/mapreduce.
 CHANGE 2:
 In TestHiveConf#testHiveSitePath(), the static method getHiveSiteLocation() 
 should be called statically instead of via an object instance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7201) Fix TestHiveConf#testConfProperties test case

2014-06-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030766#comment-14030766
 ] 

Ashutosh Chauhan commented on HIVE-7201:


+1

 Fix TestHiveConf#testConfProperties test case
 -

 Key: HIVE-7201
 URL: https://issues.apache.org/jira/browse/HIVE-7201
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Assignee: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7201-1.patch, HIVE-7201-2.patch, 
 HIVE-7201.03.patch, HIVE-7201.patch


 CHANGE 1:
 TEST CASE:
 The intention of TestHiveConf#testConfProperties() is to verify that HiveConf 
 properties are set with the expected priority.
 Each HiveConf object is initialized as follows:
 1) Hadoop configuration properties are applied.
 2) ConfVar properties with non-null values are overlaid.
 3) hive-site.xml properties are overlaid.
 ISSUE:
 The mapreduce-related configurations are loaded by JobConf, not by 
 Configuration.
 The current test tries to get configuration properties like 
 HADOOPNUMREDUCERS (mapred.job.reduces)
 from the Configuration class. But these mapreduce-related properties are 
 loaded by the JobConf class from mapred-default.xml.
 DETAILS:
 LINE 63: checkHadoopConf(ConfVars.HADOOPNUMREDUCERS.varname, "1"); -- fails
 because:
 private void checkHadoopConf(String name, String expectedHadoopVal) {
   Assert.assertEquals(expectedHadoopVal, new Configuration().get(name));
   // The second parameter is null, since it is the JobConf class, not the
   // Configuration class, that initializes the mapred-default values.
 }
 The code that loads mapreduce resources is in ConfigUtil, and JobConf makes 
 a call like this (in a static block):
 public class JobConf extends Configuration {
 
   private static final Log LOG = LogFactory.getLog(JobConf.class);
   static {
     ConfigUtil.loadResources(); // loads mapreduce-related resources
                                 // (mapred-default.xml etc.)
   }
   ...
 }
 Please note that the test case assertion works fine if the HiveConf() 
 constructor is called before this assertion, since HiveConf() triggers 
 JobConf(), which sets the default values of the properties pertaining to 
 mapreduce.
 This is why there won't be any failures if testHiveSitePath() is run before 
 testConfProperties(), as that would load the mapreduce properties into the 
 config properties.
 FIX:
 Instead of using a Configuration object, we can use a JobConf object to get 
 the default values used by hadoop/mapreduce.
 CHANGE 2:
 In TestHiveConf#testHiveSitePath(), the static method getHiveSiteLocation() 
 should be called statically instead of via an object instance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Disk out of space error

2014-06-13 Thread Chen, Yanlin
Hi,

One of my jobs keeps failing with FSError: java.io.IOException: No space left 
on device, with some tasks failing with 
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid 
local directory for output/file.out at ... on host 
node72-142.prod-aws.eadpdata.ea.com
OR org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any 
valid local directory for 
attempt_201405211957_566618_m_01_0/intermediate.34 at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
 at ...

The nodes that failed the tasks don't look that full, and the stats for this 
job are attached below.
The job does a self inner join in the subquery and then some aggregation.

Does anybody know why the job fails on a space issue while we still have some 
space?
And is there any way to optimize the query itself besides cleaning up space?

Thanks a lot!


SET mapred.max.split.size=134217728;
SET mapred.min.split.size.per.node=1;
SET mapred.min.split.size.per.rack=1;

CREATE EXTERNAL TABLE IF NOT EXISTS mpst.score_per_min_v2
(
game_name STRING,
hosted_platform STRING,
s_kit STRING,
vehicle STRING,
score_amt FLOAT,
min_spent FLOAT,
score_per_min FLOAT
)
PARTITIONED BY (load_datetime STRING)
STORED AS RCFILE
LOCATION '/hive/warehouse/mpst/score_per_min_v2';

INSERT OVERWRITE TABLE score_per_min_v2 PARTITION(load_datetime='2014-06-09 
23-58-00')
SELECT game_name, hosted_platform,
CASE WHEN s_kit IS NOT NULL THEN s_kit ELSE "NA" END AS s_kit,
vehicle,
SUM(score_amt),
SUM(time_duration/60) AS min_spent,
CASE WHEN SUM(time_duration/60)=0 THEN 0.0 ELSE 
round(SUM(score_amt)/SUM(time_duration/60),2) END AS score_per_min
FROM
(
SELECT
c.round_guid AS round_guid,
c.persona_id AS persona_id,
c.player_id AS player_id,
c.round_start_datetime AS round_start_datetime,
c.s_kit AS s_kit,
c.vehicle AS vehicle,
a.round_time AS start_time,
c.round_time AS end_time,
(c.round_time - a.round_time) AS time_duration,
c.score_amt,
c.hosted_platform,
c.game_name
FROM
mpst.spm_stg_v2 c
INNER JOIN
mpst.spm_stg_v2 a
ON
a.dt= '2014-06-10' AND c.dt = '2014-06-10' AND a.dt = c.dt AND a.service = 
c.service AND a.hour = c.hour
AND a.round_guid = c.round_guid AND a.player_id = c.player_id AND 
a.hosted_platform = c.hosted_platform AND a.persona_id = c.persona_id AND 
a.player_id = c.player_id AND a.round_start_datetime = c.round_start_datetime 
AND a.rank = (c.rank - 1)
) x
GROUP BY game_name, hosted_platform, s_kit, vehicle;


Map-Reduce Framework counters (Map / Reduce / Total):

Map output materialized bytes       173,033,990,918 / 0 / 173,033,990,918
Map input records                   555,343,308 / 0 / 555,343,308
Reduce shuffle bytes                0 / 173,033,990,918 / 173,033,990,918
Spilled Records                     4,188,988,304 / 1,350,009,594 / 5,538,997,898
Map output bytes                    169,705,718,344 / 0 / 169,705,718,344
Total committed heap usage (bytes)  3,002,007,552 / 553,385,984 / 3,555,393,536
CPU time spent (ms)                 26,347,260 / 10,932,050 / 37,279,310
Map input bytes                     1,275,536,063 / 0 / 1,275,536,063
SPLIT_RAW_BYTES                     13,493 / 0 / 13,493
Combine input records               0 / 0 / 0
Reduce input records                0 / 1,110,686,616 / 1,110,686,616
Reduce input groups                 0 / 1,110,686,616 / 1,110,686,616
Combine output records              0 / 0 / 0
Physical memory (bytes) snapshot    3,628,310,528 / 493,240,320 / 4,121,550,848
Reduce output records               0 / 0 / 0
Virtual memory (bytes) snapshot     21,354,807,296 / 4,420,263,936 / 25,775,071,232
Map output records                  1,110,686,616 / 0 / 1,110,686,616



Regards,

Y. Chen
--- Perspiration never betray you ---




[jira] [Updated] (HIVE-7228) StreamPrinter should be joined to calling thread

2014-06-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7228:
---

Assignee: Pankit Thapar

 StreamPrinter should be joined to calling thread 
 -

 Key: HIVE-7228
 URL: https://issues.apache.org/jira/browse/HIVE-7228
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Assignee: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7228.patch


 ISSUE:
 The StreamPrinter class connects an input stream (attached to the output) of 
 a process with the output stream of a session (CliSessionState/SessionState 
 class).
 It acts as a pipe between the two and transfers data from the input stream 
 to the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD.
 In some of the current usages of this class, I noticed that the calling 
 threads do not wait for the transfer operation to complete. That is, the 
 calling thread does not join the StreamPrinter threads.
 The calling thread moves forward assuming that the respective output stream 
 already has the data it needs. That is not always a safe assumption, since 
 the StreamPrinter thread may not have finished by the time the calling 
 thread expects it to.
 FIX:
 To ensure that the calling thread waits for the StreamPrinter threads to 
 complete, the StreamPrinter threads are joined to the calling thread.
 Please note that without the fix, TestCliDriverMethods#testRun failed 
 sometimes (roughly 1 in 30 times). The test does not fail with this fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7228) StreamPrinter should be joined to calling thread

2014-06-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030771#comment-14030771
 ] 

Ashutosh Chauhan commented on HIVE-7228:


+1

 StreamPrinter should be joined to calling thread 
 -

 Key: HIVE-7228
 URL: https://issues.apache.org/jira/browse/HIVE-7228
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Assignee: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7228.patch


 ISSUE:
 The StreamPrinter class connects an input stream (attached to the output) of 
 a process with the output stream of a session (CliSessionState/SessionState 
 class).
 It acts as a pipe between the two and transfers data from the input stream 
 to the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD.
 In some of the current usages of this class, I noticed that the calling 
 threads do not wait for the transfer operation to complete. That is, the 
 calling thread does not join the StreamPrinter threads.
 The calling thread moves forward assuming that the respective output stream 
 already has the data it needs. That is not always a safe assumption, since 
 the StreamPrinter thread may not have finished by the time the calling 
 thread expects it to.
 FIX:
 To ensure that the calling thread waits for the StreamPrinter threads to 
 complete, the StreamPrinter threads are joined to the calling thread.
 Please note that without the fix, TestCliDriverMethods#testRun failed 
 sometimes (roughly 1 in 30 times). The test does not fail with this fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set

2014-06-13 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030784#comment-14030784
 ] 

Naveen Gangam commented on HIVE-7200:
-

Done. The review has been updated with the latest diff.

 Beeline output displays column heading even if --showHeader=false is set
 

 Key: HIVE-7200
 URL: https://issues.apache.org/jira/browse/HIVE-7200
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch


 A few minor/cosmetic issues with the Beeline CLI:
 1) The tool prints the column headers despite --showHeader being set to 
 false. This property only seems to affect the subsequent header information 
 that gets printed based on the value of the headerInterval property 
 (default value is 100).
 2) When showHeader is true and headerInterval > 0, the header after the 
 first interval gets printed after headerInterval - 1 rows. The code seems 
 to count the initial header as a row, if you will.
 3) The table footer (the line that closes the table) does not get printed 
 if showHeader is false. I think the table should get closed irrespective 
 of whether it prints the header or not.
 {code}
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 | f|
 | T|
 | F|
 | 0|
 | 1|
 +--+
 6 rows selected (3.998 seconds)
 0: jdbc:hive2://localhost:1 !set headerInterval 2
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 +--+
 | val  |
 +--+
 | f|
 | T|
 +--+
 | val  |
 +--+
 | F|
 | 0|
 +--+
 | val  |
 +--+
 | 1|
 +--+
 6 rows selected (0.691 seconds)
 0: jdbc:hive2://localhost:1 !set showHeader false
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 | f|
 | T|
 | F|
 | 0|
 | 1|
 6 rows selected (1.728 seconds)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()

2014-06-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7182:
---

Assignee: steve, Oh
  Status: Open  (was: Patch Available)

Patch fails to compile.

 ResultSet is not closed in JDBCStatsPublisher#init()
 

 Key: HIVE-7182
 URL: https://issues.apache.org/jira/browse/HIVE-7182
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: steve, Oh
Priority: Minor
 Attachments: HIVE-7182.1.patch, HIVE-7182.patch


 {code}
 ResultSet rs = dbm.getTables(null, null, 
 JDBCStatsUtils.getStatTableName(), null);
 boolean tblExists = rs.next();
 {code}
 rs is not closed upon return from init().
 If stmt.executeUpdate() throws an exception, stmt.close() would be skipped; 
 the close() call should be placed in a finally block.
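 A sketch of the corrected pattern (fragment; rs and stmt as in the snippet 
 above, creationQuery is a hypothetical stand-in for the statement that 
 init() runs):
 {code}
 try {
   boolean tblExists = rs.next();
   // ... use tblExists ...
 } finally {
   rs.close();     // always release the ResultSet
 }
 try {
   stmt.executeUpdate(creationQuery);
 } finally {
   stmt.close();   // reached even if executeUpdate() throws
 }
 {code}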



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()

2014-06-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7183:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
 Assignee: SUYEON LEE
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Suyeon!

 Size of partColumnGrants should be checked in ObjectStore#removeRole()
 --

 Key: HIVE-7183
 URL: https://issues.apache.org/jira/browse/HIVE-7183
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: SUYEON LEE
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7183.patch


 Here is the related code:
 {code}
 List<MPartitionColumnPrivilege> partColumnGrants =
     listPrincipalAllPartitionColumnGrants(mRol.getRoleName(), PrincipalType.ROLE);
 if (tblColumnGrants.size() > 0) {
   pm.deletePersistentAll(partColumnGrants);
 {code}
 The size of tblColumnGrants is currently checked.
 The size of partColumnGrants should be checked instead.
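 The fix is then simply (sketch):
 {code}
 if (partColumnGrants.size() > 0) {
   pm.deletePersistentAll(partColumnGrants);
 }
 {code}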



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7216) Hive Query Failure on Hive 0.10.0

2014-06-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030793#comment-14030793
 ] 

Ashutosh Chauhan commented on HIVE-7216:


{{org.apache.hive.hcatalog.data.JsonSerDe}} is a JSON SerDe shipped with Hive 
and supported by the project. Please switch to using that.
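
For example (jar path illustrative; columns abbreviated from the table 
definition below):

{code:sql}
ADD JAR /path/to/hive-hcatalog-core.jar;
CREATE EXTERNAL TABLE ulf_raw_json (transactionid STRING, payload STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/home/bhaumik/input';
{code}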

 Hive Query Failure on Hive 0.10.0
 -

 Key: HIVE-7216
 URL: https://issues.apache.org/jira/browse/HIVE-7216
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
 Environment: hadoop 0.20.0, hive 0.10.0, Ubuntu 13.04 LTS
Reporter: Suddhasatwa Bhaumik
 Attachments: HadoopTaskDetails.html


 Hello,
 I have created a table and a view in hive as below:
 ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar;
 CREATE EXTERNAL TABLE IF NOT EXISTS ulf_raw (
transactionid STRING,
externaltraceid STRING,
externalreferenceid STRING,
usecaseid STRING,
timestampin STRING,
timestampout STRING,
component STRING,
destination STRING,
callerid STRING,
service STRING,
logpoint STRING,
requestin STRING,
status STRING,
errorcode STRING,
error STRING,
servername STRING,
inboundrequestip STRING,
inboundrequestport STRING,
outboundurl STRING,
messagesize STRING,
jmsdestination STRING,
msisdn STRING,
countrycode STRING,
acr STRING,
imei STRING,
imsi STRING,
iccid STRING,
email STRING,
payload STRING
 )
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 WITH SERDEPROPERTIES ( "mapping.transactionid" = "transaction-id", 
 "mapping.timestampin" = "timestamp-in" )
 LOCATION '/home/bhaumik/input';
 ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar;
 create view IF NOT EXISTS parse_soap_payload
 as
 select
 transactionid,
 component,
 logpoint,
 g.service as service,
 case g.service
 when 'createHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()')
 when 'retrieveHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()')
 when 'updateHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()')
 end as opcoNodeId
 ,
 case g.service
 when 'createHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoId\']/text()')
 when 'retrieveHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoId\']/text()')
 when 'updateHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoId\']/text()')
 end as opcoId
 ,
 case g.service
 when 'createHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()')
 when 'retrieveHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()')
 when 'updateHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()')
 end as partnerParentNodeId
 ,
 case g.service
 when 'createHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerId\']/text()')
 when 'retrieveHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerId\']/text()')
 when 'updateHierarchyNode' then
 xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerId\']/text()')
 end as partnerId
 from ulf_raw g;
 When I run the Hive query select * from parse_soap_payload; it fails with 
 the attached error. 
 I only have the json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar file in 
 the Hadoop lib and Hive lib folders. Please advise whether other JAR files 
 are required to be added here and, if so, where I can download them.
 Thanks,
 Suddhasatwa



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead

2014-06-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030807#comment-14030807
 ] 

Sergey Shelukhin commented on HIVE-6430:


They are already documented in the config template, as far as I recall. 
Should we have that copied to the wiki automatically somehow?

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, 
 HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, 
 HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, 
 HIVE-6430.14.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 
 for row) can take several hundred bytes, which is ridiculous. I am reducing 
 the size of MJKey and MJRowContainer in other jiras, but in general we don't 
 need a Java hash table there.  We can either use a primitive-friendly 
 hashtable like the one from HPPC (Apache-licensed), or some variation, to map 
 primitive keys to a single-row storage structure without an object per row 
 (similar to vectorization).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive

2014-06-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5771:
---

Attachment: HIVE-5771.12.patch

updated .q.out files.

 Constant propagation optimizer for Hive
 ---

 Key: HIVE-5771
 URL: https://issues.apache.org/jira/browse/HIVE-5771
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ted Xu
Assignee: Ted Xu
 Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
 HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, 
 HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, 
 HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, 
 HIVE-5771.patch.javaonly


 Currently there is no constant folding/propagation optimizer; all expressions
 are evaluated at runtime.
 HIVE-2470 did a great job of evaluating constants in the UDF initialization
 phase; however, it is still a runtime evaluation and it doesn't propagate
 constants from a subquery outward.
 Introducing such an optimizer may reduce I/O and accelerate processing.
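 To make the idea concrete, here is a minimal sketch of the folding step on a
 toy expression tree (hypothetical classes, not Hive's ExprNodeDesc): a
 predicate like c = 1 + 2 becomes c = 3 at plan time, and propagation can then
 carry the folded constant into enclosing operators.
 {code}
 // Minimal sketch of constant folding on a toy expression tree
 // (hypothetical classes, not Hive's ExprNodeDesc): fold children
 // first, then evaluate any operator whose inputs are all constants.
 abstract class Expr {}
 class Const extends Expr { final int value; Const(int v) { value = v; } }
 class Col extends Expr { final String name; Col(String n) { name = n; } }
 class Plus extends Expr {
   final Expr left, right;
   Plus(Expr l, Expr r) { left = l; right = r; }
 }

 class ConstantFolder {
   static Expr fold(Expr e) {
     if (e instanceof Plus) {
       Plus p = (Plus) e;
       Expr l = fold(p.left), r = fold(p.right);
       if (l instanceof Const && r instanceof Const) {
         // evaluated once at plan time instead of per row at runtime
         return new Const(((Const) l).value + ((Const) r).value);
       }
       return new Plus(l, r);
     }
     return e; // constants and column refs already fold to themselves
   }
 }
 {code}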



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive

2014-06-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5771:
---

Status: Patch Available  (was: Open)

 Constant propagation optimizer for Hive
 ---

 Key: HIVE-5771
 URL: https://issues.apache.org/jira/browse/HIVE-5771
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ted Xu
Assignee: Ted Xu
 Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
 HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, 
 HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, 
 HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, 
 HIVE-5771.patch.javaonly


 Currently there is no constant folding/propagation optimizer; all expressions
 are evaluated at runtime.
 HIVE-2470 did a great job of evaluating constants in the UDF initialization
 phase; however, it is still a runtime evaluation and it doesn't propagate
 constants from a subquery outward.
 Introducing such an optimizer may reduce I/O and accelerate processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive

2014-06-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5771:
---

Status: Open  (was: Patch Available)

 Constant propagation optimizer for Hive
 ---

 Key: HIVE-5771
 URL: https://issues.apache.org/jira/browse/HIVE-5771
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ted Xu
Assignee: Ted Xu
 Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
 HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, 
 HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, 
 HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, 
 HIVE-5771.patch.javaonly


 Currently there is no constant folding/propagation optimizer; all expressions
 are evaluated at runtime.
 HIVE-2470 did a great job of evaluating constants in the UDF initialization
 phase; however, it is still a runtime evaluation and it doesn't propagate
 constants from a subquery outward.
 Introducing such an optimizer may reduce I/O and accelerate processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive

2014-06-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030841#comment-14030841
 ] 

Ashutosh Chauhan commented on HIVE-5771:


Test subquery_in.q failed with exception:
{code}
java.lang.Exception: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to 
deserialize reduce input key from x1x128x0x0x1 with properties 
{columns=reducesinkkey0,reducesinkkey1, 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
 serialization.sort.order=++, columns.types=int,int}
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to 
deserialize reduce input key from x1x128x0x0x1 with properties 
{columns=reducesinkkey0,reducesinkkey1, 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
 serialization.sort.order=++, columns.types=int,int}
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
Error: Unable to deserialize reduce input key from x1x128x0x0x1 with properties 
{columns=reducesinkkey0,reducesinkkey1, 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
 serialization.sort.order=++, columns.types=int,int}
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:222)
... 9 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
at 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:191)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:220)
... 9 more
Caused by: java.io.EOFException
at 
org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
at 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:201)
at 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
... 10 more

{code}
subquery_views.q is failing with the following exception
{code}
java.lang.Exception: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to 
deserialize reduce input key from  with properties {columns=reducesinkkey0, 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
 serialization.sort.order=+, columns.types=string}
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to 
deserialize reduce input key from  with properties {columns=reducesinkkey0, 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
 serialization.sort.order=+, columns.types=string}
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
Error: Unable to 

[jira] [Commented] (HIVE-7005) MiniTez tests have non-deterministic explain plans

2014-06-13 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030863#comment-14030863
 ] 

Jason Dere commented on HIVE-7005:
--

+1 if tests pass

 MiniTez tests have non-deterministic explain plans
 --

 Key: HIVE-7005
 URL: https://issues.apache.org/jira/browse/HIVE-7005
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Gunther Hagleitner
 Attachments: HIVE-7005.1.patch


 TestMiniTezCliDriver has a few test failures where there is a diff in the 
 explain plan generated. According to Vikram, the plan generated is correct, 
 but the plan can be generated in a couple of different ways and so sometimes 
 the plan will not diff against the expected output. We should probably come 
 up with a way to validate this explain plan in a reproducible way.
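 As a general illustration of one common source of such drift (an assumption
 about the cause, not a statement about the patch): iterating an order-free
 collection while printing the plan yields run-to-run differences, while an
 insertion-ordered collection does not.
 {code}
 // Hypothetical illustration only: HashSet iteration order is
 // unspecified, so plan text derived from it can differ across runs,
 // while LinkedHashSet preserves insertion order.
 import java.util.HashSet;
 import java.util.LinkedHashSet;
 import java.util.Set;

 public class DeterministicOrder {
   public static void main(String[] args) {
     Set<String> unordered = new HashSet<String>();     // order may vary
     Set<String> ordered = new LinkedHashSet<String>(); // insertion order
     for (String op : new String[] {"TS_0", "FIL_1", "SEL_2", "FS_3"}) {
       unordered.add(op);
       ordered.add(op);
     }
     System.out.println(unordered); // ordering not guaranteed
     System.out.println(ordered);   // always [TS_0, FIL_1, SEL_2, FS_3]
   }
 }
 {code}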



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurrent user jobs to run

2014-06-13 Thread Ivan Mitic


 On June 12, 2014, 9:03 p.m., Eugene Koifman wrote:
  hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java,
   line 352
  https://reviews.apache.org/r/22329/diff/2/?file=607831#file607831line352
 
  Is there a reason 
  org.apache.hadoop.util.ClassUtil.findContainingJar(Class<?> clazz) won't 
  work?

ClassUtil is declared as a private interface so I don't think we should take a 
dependency on it. Besides this, there was another problem where I wanted to 
match the file name to hive-shims. This is to avoid accidentally picking up 
hive-exec.jar which also contains shim classes and is a 15MB jar (not sure if 
hive-exec including shims is intentional or by accident though). 
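
For reference, a hedged sketch of the lookup being described (hypothetical
helper, not the actual patch): find the jar serving a class through its
classloader, and accept it only if the file name looks like a hive-shims jar,
so hive-exec.jar is not picked up by accident.
{code}
// Hedged sketch, not the actual patch: locate the jar that serves the
// class via the classloader, then match the file name to hive-shims.
import java.net.URL;

public final class ShimJarLocator {
  public static String findShimJar(Class<?> clazz) {
    String resource = clazz.getName().replace('.', '/') + ".class";
    URL url = clazz.getClassLoader().getResource(resource);
    if (url == null || !"jar".equals(url.getProtocol())) {
      return null;
    }
    // jar URLs look like jar:file:/path/hive-shims-x.y.jar!/pkg/Cls.class
    String path = url.getPath();
    String jar = path.substring(path.indexOf(':') + 1, path.indexOf('!'));
    return jar.contains("hive-shims") ? jar : null;
  }
}
{code}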


- Ivan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22329/#review45536
---


On June 12, 2014, 12:04 a.m., Ivan Mitic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22329/
 ---
 
 (Updated June 12, 2014, 12:04 a.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Approach in the patch is similar to what Oozie does to handle this situation. 
 Specifically, all child map jobs get tagged with the launcher MR job id. On 
 launcher task restart, launcher queries RM for the list of jobs that have the 
 tag and kills them. After that it moves on to start the same child job again. 
  Again, similarly to what Oozie does, a new templeton.job.launch.time property 
  is introduced that captures the launcher job submit timestamp and is later 
  used to reduce the search window when the RM is queried. 
 
 To validate the patch, you will need to add webhcat shim jars to 
 templeton.libjars as now webhcat launcher also has a dependency on hadoop 
 shims. 
 
 I have noticed that in case of the SqoopDelegator webhcat currently does not 
 set the MR delegation token when optionsFile flag is used. This also creates 
 the problem in this scenario. This looks like something that should be 
 handled via a separate Jira.
 
 
 Diffs
 -
 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java
  23b1c4f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java
  41b1dc5 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java
  04a5c6f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java
  04e061d 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java
  adcd917 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java
  a6355a6 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java
  556ee62 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java
  fff4b68 
   
 hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java
  8b46d38 
   shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java 
 d3552c1 
   shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 
 5a728b2 
   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
 299e918 
 
 Diff: https://reviews.apache.org/r/22329/diff/
 
 
 Testing
 ---
 
 I have validated that MR, Pig and Hive jobs do get tagged appropriately. I 
 have also validated that previous child jobs do get killed on RM 
 failover/task failure.
 
 
 Thanks,
 
 Ivan Mitic
 




Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurrent user jobs to run

2014-06-13 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22329/#review45631
---

Ship it!


Ship It!

- Eugene Koifman


On June 12, 2014, 12:04 a.m., Ivan Mitic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22329/
 ---
 
 (Updated June 12, 2014, 12:04 a.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Approach in the patch is similar to what Oozie does to handle this situation. 
 Specifically, all child map jobs get tagged with the launcher MR job id. On 
 launcher task restart, launcher queries RM for the list of jobs that have the 
 tag and kills them. After that it moves on to start the same child job again. 
  Again, similarly to what Oozie does, a new templeton.job.launch.time property 
  is introduced that captures the launcher job submit timestamp and is later 
  used to reduce the search window when the RM is queried. 
 
 To validate the patch, you will need to add webhcat shim jars to 
 templeton.libjars as now webhcat launcher also has a dependency on hadoop 
 shims. 
 
 I have noticed that in case of the SqoopDelegator webhcat currently does not 
 set the MR delegation token when optionsFile flag is used. This also creates 
 the problem in this scenario. This looks like something that should be 
 handled via a separate Jira.
 
 
 Diffs
 -
 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java
  23b1c4f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java
  41b1dc5 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java
  04a5c6f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java
  04e061d 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java
  adcd917 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java
  a6355a6 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java
  556ee62 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java
  fff4b68 
   
 hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java
  8b46d38 
   shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java 
 d3552c1 
   shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 
 5a728b2 
   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
 299e918 
 
 Diff: https://reviews.apache.org/r/22329/diff/
 
 
 Testing
 ---
 
 I have validated that MR, Pig and Hive jobs do get tagged appropriately. I 
 have also validated that previous child jobs do get killed on RM 
 failover/task failure.
 
 
 Thanks,
 
 Ivan Mitic
 




Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurrent user jobs to run

2014-06-13 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22329/#review45630
---

Ship it!


Ship It!

- Eugene Koifman


On June 12, 2014, 12:04 a.m., Ivan Mitic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22329/
 ---
 
 (Updated June 12, 2014, 12:04 a.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Approach in the patch is similar to what Oozie does to handle this situation. 
 Specifically, all child map jobs get tagged with the launcher MR job id. On 
 launcher task restart, launcher queries RM for the list of jobs that have the 
 tag and kills them. After that it moves on to start the same child job again. 
  Again, similarly to what Oozie does, a new templeton.job.launch.time property 
  is introduced that captures the launcher job submit timestamp and is later 
  used to reduce the search window when the RM is queried. 
 
 To validate the patch, you will need to add webhcat shim jars to 
 templeton.libjars as now webhcat launcher also has a dependency on hadoop 
 shims. 
 
 I have noticed that in case of the SqoopDelegator webhcat currently does not 
 set the MR delegation token when optionsFile flag is used. This also creates 
 the problem in this scenario. This looks like something that should be 
 handled via a separate Jira.
 
 
 Diffs
 -
 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java
  23b1c4f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java
  41b1dc5 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java
  04a5c6f 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java
  04e061d 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java
  adcd917 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java
  a6355a6 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java
  556ee62 
   
 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java
  fff4b68 
   
 hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java
  8b46d38 
   shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java 
 d3552c1 
   shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 
 5a728b2 
   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
 299e918 
 
 Diff: https://reviews.apache.org/r/22329/diff/
 
 
 Testing
 ---
 
 I have validated that MR, Pig and Hive jobs do get tagged appropriately. I 
 have also validated that previous child jobs do get killed on RM 
 failover/task failure.
 
 
 Thanks,
 
 Ivan Mitic
 




[jira] [Commented] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run

2014-06-13 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030933#comment-14030933
 ] 

Eugene Koifman commented on HIVE-7190:
--

+1

 WebHCat launcher task failure can cause two concurrent user jobs to run
 --

 Key: HIVE-7190
 URL: https://issues.apache.org/jira/browse/HIVE-7190
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: Ivan Mitic
 Attachments: HIVE-7190.2.patch, HIVE-7190.3.patch, HIVE-7190.patch


 Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs
 are 1-map jobs (single-task jobs) which kick off the actual user job and
 monitor it until it finishes. Given that the launcher is a task, like any
 other MR task, it has a retry policy in case it fails (due to a task crash,
 tasktracker/nodemanager crash, machine-level outage, etc.). Further, when the
 launcher task is retried, it will launch the same user job again; *however*,
 the user job from the previous attempt is already running. What this means is
 that we can have two identical user jobs running in parallel.
 In case of MRv2, there will be an MRAppMaster and the launcher task, which 
 are subject to failure. In case any of the two fails, another instance of a 
 user job will be launched again in parallel. 
 Above situation is already a bug.
 Now going further to RM HA, what RM does on failover/restart is that it kills 
 all containers, and it restarts all applications. This means that if our 
 customer had 10 jobs on the cluster (this is 10 launcher jobs and 10 user 
 jobs), on RM failover, all 20 jobs will be restarted, and launcher jobs will 
 queue user jobs again. There are two issues with this design:
 1. Job outputs could *possibly* be corrupted (it would be useful to analyze
 this scenario further and confirm this statement).
 2. Cluster resources are spent redundantly on duplicate jobs.
 To address the issue at least on Yarn (Hadoop 2.0) clusters, webhcat should 
 do the same thing Oozie does in this scenario, and that is to tag all its 
 child jobs with an id, and kill those jobs on task restart before they are 
 kicked off again.
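 As a rough sketch of that tag-and-kill step on Yarn (illustrative names and
 an assumed Hadoop 2.4+ application-tag API, not the patch's actual code):
 {code}
 // Rough sketch of the kill step (illustrative class, assumes Hadoop
 // 2.4+ application tags): list Yarn applications carrying the
 // launcher's tag, bound the search by the recorded submit time, and
 // kill the survivors before resubmitting the child job.
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.yarn.api.records.ApplicationReport;
 import org.apache.hadoop.yarn.client.api.YarnClient;
 import org.apache.hadoop.yarn.exceptions.YarnException;

 public class OrphanJobKiller {
   public static void killTaggedChildren(Configuration conf, String launcherTag,
       long launchTimeMs) throws IOException, YarnException {
     YarnClient yarn = YarnClient.createYarnClient();
     yarn.init(conf);
     yarn.start();
     try {
       for (ApplicationReport app : yarn.getApplications()) {
         if (app.getApplicationTags().contains(launcherTag)
             && app.getStartTime() >= launchTimeMs) {
           // child of a previous launcher attempt: kill before relaunching
           yarn.killApplication(app.getApplicationId());
         }
       }
     } finally {
       yarn.stop();
     }
   }
 }
 {code}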



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set

2014-06-13 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-7200:


Attachment: HIVE-7200.3.patch

Fixed a code style issue in this revision of the patch. 

 Beeline output displays column heading even if --showHeader=false is set
 

 Key: HIVE-7200
 URL: https://issues.apache.org/jira/browse/HIVE-7200
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch, HIVE-7200.3.patch


 A few minor/cosmetic issues with the beeline CLI.
 1) The tool prints the column headers despite --showHeader being set to
 false. This property only seems to affect the subsequent headers that get
 printed based on the value of the headerInterval property (default value
 100).
 2) When showHeader is true and headerInterval > 0, the header after the
 first interval gets printed after headerInterval - 1 rows. The code seems
 to count the initial header as a row, if you will.
 3) The table footer (the line that closes the table) does not get printed if
 showHeader is false. I think the table should get closed irrespective of
 whether the header is printed.
 {code}
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 | f|
 | T|
 | F|
 | 0|
 | 1|
 +--+
 6 rows selected (3.998 seconds)
 0: jdbc:hive2://localhost:1 !set headerInterval 2
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 +--+
 | val  |
 +--+
 | f|
 | T|
 +--+
 | val  |
 +--+
 | F|
 | 0|
 +--+
 | val  |
 +--+
 | 1|
 +--+
 6 rows selected (0.691 seconds)
 0: jdbc:hive2://localhost:1 !set showHeader false
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 | f|
 | T|
 | F|
 | 0|
 | 1|
 6 rows selected (1.728 seconds)
 {code}
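 A minimal sketch of the corrected printing rules the three items above imply
 (hypothetical code, not BeeLine's): the header is printed only when
 showHeader is true, it repeats after every full headerInterval rows, and the
 closing line is printed regardless of the header setting.
 {code}
 // Hypothetical code, not BeeLine's: header only when showHeader is
 // true, a full headerInterval of rows between repeated headers, and
 // the closing line printed in every case.
 import java.util.Arrays;
 import java.util.List;

 public class HeaderDemo {
   public static void print(List<String> rows, boolean showHeader,
       int headerInterval) {
     int sinceHeader = 0;
     if (showHeader) {
       System.out.println("| val  |");   // initial header is not a row
     }
     for (String row : rows) {
       if (showHeader && headerInterval > 0 && sinceHeader == headerInterval) {
         System.out.println("| val  |"); // repeated header
         sinceHeader = 0;
       }
       System.out.println("| " + row + "    |");
       sinceHeader++;
     }
     System.out.println("+------+");     // table is closed either way
   }

   public static void main(String[] args) {
     print(Arrays.asList("t", "f", "T", "F", "0", "1"), false, 2);
   }
 }
 {code}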



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set

2014-06-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030957#comment-14030957
 ] 

Xuefu Zhang commented on HIVE-7200:
---

+1

 Beeline output displays column heading even if --showHeader=false is set
 

 Key: HIVE-7200
 URL: https://issues.apache.org/jira/browse/HIVE-7200
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch, HIVE-7200.3.patch


 A few minor/cosmetic issues with the beeline CLI.
 1) The tool prints the column headers despite --showHeader being set to
 false. This property only seems to affect the subsequent headers that get
 printed based on the value of the headerInterval property (default value
 100).
 2) When showHeader is true and headerInterval > 0, the header after the
 first interval gets printed after headerInterval - 1 rows. The code seems
 to count the initial header as a row, if you will.
 3) The table footer (the line that closes the table) does not get printed if
 showHeader is false. I think the table should get closed irrespective of
 whether the header is printed.
 {code}
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 | f|
 | T|
 | F|
 | 0|
 | 1|
 +--+
 6 rows selected (3.998 seconds)
 0: jdbc:hive2://localhost:1 !set headerInterval 2
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 +--+
 | val  |
 +--+
 | f|
 | T|
 +--+
 | val  |
 +--+
 | F|
 | 0|
 +--+
 | val  |
 +--+
 | 1|
 +--+
 6 rows selected (0.691 seconds)
 0: jdbc:hive2://localhost:1 !set showHeader false
 0: jdbc:hive2://localhost:1 select * from stringvals;
 +--+
 | val  |
 +--+
 | t|
 | f|
 | T|
 | F|
 | 0|
 | 1|
 6 rows selected (1.728 seconds)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7226) Windowing Streaming mode causes NPE for empty partitions

2014-06-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030964#comment-14030964
 ] 

Hive QA commented on HIVE-7226:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650117/HIVE-7226.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5535 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/456/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/456/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-456/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650117

 Windowing Streaming mode causes NPE for empty partitions
 

 Key: HIVE-7226
 URL: https://issues.apache.org/jira/browse/HIVE-7226
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7226.1.patch


 The change in HIVE-7062 doesn't handle empty partitions properly:
 StreamingState is not correctly initialized for an empty partition.
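 As an illustration of this bug class (hypothetical code, not Hive's PTF
 operator): if the state is created only when the first row arrives, an empty
 partition leaves it null and the finish step throws an NPE.
 {code}
 // Hypothetical illustration, not Hive's PTF code: state created only
 // on the first row is never created for an empty partition.
 import java.util.Iterator;

 public class EmptyPartitionDemo {
   static class StreamingState { int rowCount; }

   static int processBuggy(Iterator<Integer> rows) {
     StreamingState ss = null;
     while (rows.hasNext()) {
       if (ss == null) ss = new StreamingState(); // skipped when empty
       rows.next();
       ss.rowCount++;
     }
     return ss.rowCount;                          // NPE for empty partitions
   }

   static int processFixed(Iterator<Integer> rows) {
     StreamingState ss = new StreamingState();    // initialize unconditionally
     while (rows.hasNext()) {
       rows.next();
       ss.rowCount++;
     }
     return ss.rowCount;                          // 0 for empty partitions
   }
 }
 {code}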



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Documentation Policy

2014-06-13 Thread Szehon Ho
Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and confuses
the state of a JIRA, so it's probably best to remove it.

The idea of docdone is to query what docs got produced and need review?
It might be nice to have a tag for that, to easily signal to the contributor
or interested parties to take a look.

On a side note, I already find very helpful your JIRA comments with links
to doc-wikis, both to inform the contributor and just as reference for
others.  Thanks again for the great work.


On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz leftylever...@gmail.com
wrote:

 One more question:  what should we do after the documentation is done for a
 JIRA ticket?

 (a) Just remove the TODOC## label.
 (b) Replace TODOC## with docdone (no caps, no version number).
 (c) Add a docdone label but keep TODOC##.
 (d) Something else.


 -- Lefty


 On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com wrote:

  Thank you guys! This is great work.
 
 
  On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com 
  kulkarni.swar...@gmail.com wrote:
 
   Going through the issues, I think overall Lefty did an awesome job
   catching and documenting most of them in time. Following are some of the
   0.13 and 0.14 ones which I found that either do not have documentation or
   have an outdated one and probably need one to be consumable. Contributors,
   feel free to remove the label if you disagree.
  
   *TODOC13:*
  
  
 
 https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)
  
   *TODOC14:*
  
  
 
 https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)
  
   I'll continue digging through the queue going backwards to 0.12 and 0.11
   and see if I find similar stuff there as well.
  
  
  
   On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com 
   kulkarni.swar...@gmail.com wrote:
  
  Feel free to label such jiras with this keyword and ask the contributors
  for more information if you need any.

 Cool. I'll start chugging through the queue today adding labels as apt.

 On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com
 wrote:

  Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
 Sounds good to me.
   
   
   
   
   
--
Swarnim
   
  
  
  
   --
   Swarnim
  
 



[jira] [Updated] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads

2014-06-13 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7210:
-

Attachment: HIVE-7210.1.patch

Patch to prevent getSplits() from removing cached plans from other queries. 
Talked to Gunther and he said he can eliminate the call to clear the cached 
plan from getSplits() altogether, so this may not be the final fix.
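
As a sketch of that idea (hypothetical code, not the attached patch): key
cached plans by their per-query plan path and have getSplits() remove only the
calling query's entry, instead of clearing a map shared by every Driver
thread in the process.
{code}
// Hypothetical code, not the patch: a plan cache keyed by the
// per-query plan path.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PlanCache {
  private static final Map<String, Object> WORK_MAP =
      new ConcurrentHashMap<String, Object>();

  // Unsafe under concurrency: wipes plans belonging to other queries,
  // which later surfaces as "No plan file found" and an NPE.
  public static void clearAll() {
    WORK_MAP.clear();
  }

  // Safe: each query removes only the plan it owns.
  public static void clearFor(String planPath) {
    WORK_MAP.remove(planPath);
  }
}
{code}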

 NPE with No plan file found when running Driver instances on multiple 
 threads
 ---

 Key: HIVE-7210
 URL: https://issues.apache.org/jira/browse/HIVE-7210
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Gunther Hagleitner
 Attachments: HIVE-7210.1.patch


 Informatica has a multithreaded application running multiple instances of 
 CLIDriver.  When running concurrent queries they sometimes hit the following 
 error:
 {noformat}
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO 
 org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: 
 hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO 
 org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area 
 /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR 
 org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 
 'java.lang.NullPointerException(null)'
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
 at 
 org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
 at 
 org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
 at 
 org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
 at 
 org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
 at 
 org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
 at 
 org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at 
 com.informatica.platform.dtm.executor.hive.impl.AbstractHiveDriverBaseImpl.run(AbstractHiveDriverBaseImpl.java:86)
 at 
 com.informatica.platform.dtm.executor.hive.MHiveDriver.executeQuery(MHiveDriver.java:126)
 at 
 com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeQuery(HiveTaskHandlerImpl.java:358)
 at 
 com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeScript(HiveTaskHandlerImpl.java:247)
 at 
 com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeMainScript(HiveTaskHandlerImpl.java:194)
 at 
 com.informatica.platform.ldtm.executor.common.workflow.taskhandler.impl.BaseTaskHandlerImpl.run(BaseTaskHandlerImpl.java:126)
 at 
 

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-13 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.4.patch

Rebased onto trunk and fixed two broken hbase tests.

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots.
 This allows an MR job to consume a stable, read-only view of an HBase table
 directly off of HDFS. Bypassing the online region server API provides a nice
 performance boost for the full scan. HBASE-10642 is backporting that feature
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.
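 For context, a sketch of the underlying HBase API such an input format would
 wrap (snapshot name and restore directory below are placeholders):
 {code}
 // Sketch of the underlying HBase API (HBASE-8369); the job then scans
 // snapshot files directly off HDFS instead of going through the
 // region servers. The restore dir must be on the same filesystem as
 // the HBase root dir.
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
 import org.apache.hadoop.mapreduce.Job;

 public class SnapshotScanJob {
   public static Job configure() throws Exception {
     Job job = Job.getInstance(HBaseConfiguration.create(), "snapshot-scan");
     TableSnapshotInputFormat.setInput(job, "my_snapshot",
         new Path("/tmp/snapshot-restore"));
     job.setInputFormatClass(TableSnapshotInputFormat.class);
     return job;
   }
 }
 {code}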



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7226) Windowing Streaming mode causes NPE for empty partitions

2014-06-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7226:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Harish!

 Windowing Streaming mode causes NPE for empty partitions
 

 Key: HIVE-7226
 URL: https://issues.apache.org/jira/browse/HIVE-7226
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-7226.1.patch


 The change in HIVE-7062 doesn't handle empty partitions properly:
 StreamingState is not correctly initialized for an empty partition.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads

2014-06-13 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031059#comment-14031059
 ] 

Gunther Hagleitner commented on HIVE-7210:
--

Thanks [~jdere]. My plan was to do this purely in HiveSplitGen for Tez. But I 
think Vikram re-introduced a path that doesn't go through HiveSplitGen (rather 
- I broke something, he fixed it by adding that path back in). [~vikram.dixit] 
- can you confirm that?

If that's the case the patch you uploaded is probably the best fix.

 NPE with No plan file found when running Driver instances on multiple 
 threads
 ---

 Key: HIVE-7210
 URL: https://issues.apache.org/jira/browse/HIVE-7210
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Gunther Hagleitner
 Attachments: HIVE-7210.1.patch


 Informatica has a multithreaded application running multiple instances of 
 CLIDriver.  When running concurrent queries they sometimes hit the following 
 error:
 {noformat}
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO 
 org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: 
 hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO 
 org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area 
 /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR 
 org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 
 'java.lang.NullPointerException(null)'
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
 at 
 org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
 at 
 org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
 at 
 org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
 at 
 org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
 at 
 org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
 at 
 org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at 
 com.informatica.platform.dtm.executor.hive.impl.AbstractHiveDriverBaseImpl.run(AbstractHiveDriverBaseImpl.java:86)
 at 
 com.informatica.platform.dtm.executor.hive.MHiveDriver.executeQuery(MHiveDriver.java:126)
 at 
 com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeQuery(HiveTaskHandlerImpl.java:358)
 at 
 com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeScript(HiveTaskHandlerImpl.java:247)
 at 
 com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeMainScript(HiveTaskHandlerImpl.java:194)
 at 
 

[jira] [Updated] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers

2014-06-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7209:


Labels: TODOC14  (was: )

 allow metastore authorization api calls to be restricted to certain invokers
 

 Key: HIVE-7209
 URL: https://issues.apache.org/jira/browse/HIVE-7209
 Project: Hive
  Issue Type: Bug
  Components: Authentication, Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
  Labels: TODOC14
 Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch


 Any user who has direct access to metastore can make metastore api calls that 
 modify the authorization policy. 
 The users who can make direct metastore api calls in a secure cluster 
 configuration are usually the 'cluster insiders' such as Pig and MR users, 
 who are not (securely) covered by the metastore based authorization policy. 
 But it makes sense to disallow access from such users as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers

2014-06-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7209:


Release Note: 
With this change, the hive.security.metastore.authorization.manager
configuration parameter allows you to specify more than one authorization
manager class (comma separated).

This patch introduces a new authorization manager for use under this
configuration -
org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly.
It disallows authorization API calls from being invoked in a remote
metastore.
HiveServer2 can be configured to use an embedded metastore, which allows it
to invoke the metastore authorization API. The Hive CLI and any other remote
metastore users are denied authorization when they try to make authorization
API calls. This restricts use of the authorization API to the privileged
HiveServer2 process.
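
For illustration, the resulting setting might look like this (pairing with
StorageBasedAuthorizationProvider is just an example, and the property would
normally live in hive-site.xml rather than code):
{code}
// Illustrative only: an example pairing of a storage-based authorizer
// with the guard that rejects authorization API calls arriving at a
// remote metastore.
import org.apache.hadoop.conf.Configuration;

public class MetastoreAuthzConfig {
  public static Configuration configure(Configuration conf) {
    // Comma-separated list of authorization manager classes.
    conf.set("hive.security.metastore.authorization.manager",
        "org.apache.hadoop.hive.ql.security.authorization."
            + "StorageBasedAuthorizationProvider,"
            + "org.apache.hadoop.hive.ql.security.authorization."
            + "MetaStoreAuthzAPIAuthorizerEmbedOnly");
    return conf;
  }
}
{code}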



 allow metastore authorization api calls to be restricted to certain invokers
 

 Key: HIVE-7209
 URL: https://issues.apache.org/jira/browse/HIVE-7209
 Project: Hive
  Issue Type: Bug
  Components: Authentication, Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
  Labels: TODOC14
 Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch


 Any user who has direct access to metastore can make metastore api calls that 
 modify the authorization policy. 
 The users who can make direct metastore api calls in a secure cluster 
 configuration are usually the 'cluster insiders' such as Pig and MR users, 
 who are not (securely) covered by the metastore based authorization policy. 
 But it makes sense to disallow access from such users as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers

2014-06-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7209:


Attachment: HIVE-7209.4.patch

HIVE-7209.4.patch - also updates hive-default.xml.template to mention that
more than one metastore authorization manager class can be specified under
hive.security.metastore.authorization.manager.


 allow metastore authorization api calls to be restricted to certain invokers
 

 Key: HIVE-7209
 URL: https://issues.apache.org/jira/browse/HIVE-7209
 Project: Hive
  Issue Type: Bug
  Components: Authentication, Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
  Labels: TODOC14
 Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch, 
 HIVE-7209.4.patch


 Any user who has direct access to metastore can make metastore api calls that 
 modify the authorization policy. 
 The users who can make direct metastore api calls in a secure cluster 
 configuration are usually the 'cluster insiders' such as Pig and MR users, 
 who are not (securely) covered by the metastore based authorization policy. 
 But it makes sense to disallow access from such users as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Documentation Policy

2014-06-13 Thread kulkarni.swar...@gmail.com
+1 on deleting the TODOC tag as I think it's assumed by default that once
an enhancement is done, it will be doc'ed. We may consider adding an
additional docdone tag but I think we can instead just wait for a +1 from
the contributor that the documentation is satisfactory (and assume an
implicit +1 for no reply) before deleting the TODOC tag.


On Fri, Jun 13, 2014 at 1:32 PM, Szehon Ho sze...@cloudera.com wrote:

 Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and confuses
 the state of a JIRA, so it's probably best to remove it.

 The idea of docdone is to query what docs got produced and need review?
 It might be nice to have a tag for that, to easily signal to the contributor
 or interested parties to take a look.

 On a side note, I already find very helpful your JIRA comments with links
 to doc-wikis, both to inform the contributor and just as reference for
 others.  Thanks again for the great work.


 On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz leftylever...@gmail.com
 wrote:

  One more question:  what should we do after the documentation is done
 for a
  JIRA ticket?
 
  (a) Just remove the TODOC## label.
  (b) Replace TODOC## with docdone (no caps, no version number).
  (c) Add a docdone label but keep TODOC##.
  (d) Something else.
 
 
  -- Lefty
 
 
  On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com
 wrote:
 
   Thank you guys! This is great work.
  
  
   On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com 
   kulkarni.swar...@gmail.com wrote:
  
 Going through the issues, I think overall Lefty did an awesome job catching
 and documenting most of them in time. Following are some of the 0.13 and
 0.14 ones which I found that either do not have documentation or have an
 outdated one and probably need one to be consumable. Contributors, feel
 free to remove the label if you disagree.
   
*TODOC13:*
   
   
  
 
 https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)
   
*TODOC14:*
   
   
  
 
 https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)
   
 I'll continue digging through the queue going backwards to 0.12 and 0.11
 and see if I find similar stuff there as well.
   
   
   
On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com 
kulkarni.swar...@gmail.com wrote:
   
  Feel free to label such jiras with this keyword and ask the contributors
  for more information if you need any.

 Cool. I'll start chugging through the queue today adding labels as apt.

 On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com
 wrote:

  Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
 Sounds good to me.





 --
 Swarnim

   
   
   
--
Swarnim
   
  
 




-- 
Swarnim


[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer

2014-06-13 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031188#comment-14031188
 ] 

Carl Steinbach commented on HIVE-7094:
--

[~davidzchen]: +1. Can you please attach a new version of the patch to trigger 
testing? If everything passes I will commit.

 Separate out static/dynamic partitioning code in FileRecordWriterContainer
 --

 Key: HIVE-7094
 URL: https://issues.apache.org/jira/browse/HIVE-7094
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7094.1.patch


 There are two major places in FileRecordWriterContainer that have the {{if 
 (dynamicPartitioning)}} condition: the constructor and write().
 This is the approach that I am taking:
 # Move the DP and SP code into two subclasses: 
 DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer.
 # Make FileRecordWriterContainer an abstract class that contains the common 
 code for both implementations. For write(), FileRecordWriterContainer will 
 call an abstract method that will provide the local RecordWriter, 
 ObjectInspector, SerDe, and OutputJobInfo.
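 A skeleton of that shape (signatures illustrative, not the actual patch):
 {code}
 // Skeleton only: common write() logic stays in the abstract base;
 // each subclass supplies the per-record writer, the only part that
 // differs between static and dynamic partitioning.
 import java.io.IOException;
 import java.util.HashMap;
 import java.util.Map;

 abstract class FileRecordWriterContainer {
   interface LocalWriter { void write(Object row) throws IOException; }

   public void write(Object row) throws IOException {
     getLocalWriter(row).write(row);   // common path for both modes
   }

   protected abstract LocalWriter getLocalWriter(Object row) throws IOException;
 }

 // Static partitioning: a single writer for the whole container.
 class StaticFileRecordWriterContainer extends FileRecordWriterContainer {
   private final LocalWriter writer =
       row -> System.out.println("static: " + row);
   protected LocalWriter getLocalWriter(Object row) { return writer; }
 }

 // Dynamic partitioning: one writer per observed partition value.
 class DynamicFileRecordWriterContainer extends FileRecordWriterContainer {
   private final Map<String, LocalWriter> writers = new HashMap<>();
   protected LocalWriter getLocalWriter(Object row) {
     String part = String.valueOf(row);   // stand-in for the partition keys
     return writers.computeIfAbsent(part,
         p -> r -> System.out.println("dynamic[" + p + "]: " + r));
   }
 }
 {code}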



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer

2014-06-13 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7094:
-

Attachment: HIVE-7094.3.patch

 Separate out static/dynamic partitioning code in FileRecordWriterContainer
 --

 Key: HIVE-7094
 URL: https://issues.apache.org/jira/browse/HIVE-7094
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch


 There are two major places in FileRecordWriterContainer that have the {{if 
 (dynamicPartitioning)}} condition: the constructor and write().
 This is the approach that I am taking:
 # Move the DP and SP code into two subclasses: 
 DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer.
 # Make FileRecordWriterContainer an abstract class that contains the common 
 code for both implementations. For write(), FileRecordWriterContainer will 
 call an abstract method that will provide the local RecordWriter, 
 ObjectInspector, SerDe, and OutputJobInfo.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-13 Thread David Chen (JIRA)
David Chen created HIVE-7230:


 Summary: Add Eclipse formatter file for Hive coding conventions
 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen


Eclipse's formatter is a convenient way to clean up formatting for Java code. 
Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer

2014-06-13 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031197#comment-14031197
 ] 

David Chen commented on HIVE-7094:
--

Thanks, [~cwsteinbach]! I have addressed the remaining formatting issues using 
the Eclipse formatter and uploaded a new patch.

 Separate out static/dynamic partitioning code in FileRecordWriterContainer
 --

 Key: HIVE-7094
 URL: https://issues.apache.org/jira/browse/HIVE-7094
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch


 There are two major places in FileRecordWriterContainer that have the {{if 
 (dynamicPartitioning)}} condition: the constructor and write().
 This is the approach that I am taking:
 # Move the DP and SP code into two subclasses: 
 DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer.
 # Make FileRecordWriterContainer an abstract class that contains the common 
 code for both implementations. For write(), FileRecordWriterContainer will 
 call an abstract method that will provide the local RecordWriter, 
 ObjectInspector, SerDe, and OutputJobInfo.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 22590: HIVE-7230: Add Eclipse formatter file.

2014-06-13 Thread David Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22590/
---

Review request for hive.


Bugs: HIVE-7230
https://issues.apache.org/jira/browse/HIVE-7230


Repository: hive-git


Description
---

HIVE-7230: Add Eclipse formatter file.


Diffs
-

  eclipse-styles.xml PRE-CREATION 

Diff: https://reviews.apache.org/r/22590/diff/


Testing
---

Manual


Thanks,

David Chen



[jira] [Updated] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads

2014-06-13 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7210:
-

Status: Patch Available  (was: Open)

 NPE with No plan file found when running Driver instances on multiple 
 threads
 ---

 Key: HIVE-7210
 URL: https://issues.apache.org/jira/browse/HIVE-7210
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Gunther Hagleitner
 Attachments: HIVE-7210.1.patch


 Informatica has a multithreaded application running multiple instances of 
 CLIDriver.  When running concurrent queries they sometimes hit the following 
 error:
 {noformat}
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO 
 org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: 
 hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO 
 org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area 
 /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR 
 org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 
 'java.lang.NullPointerException(null)'
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
 at 
 org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
 at 
 org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
 at 
 org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
 at 
 org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
 at 
 org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
 at 
 org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at 
 com.informatica.platform.dtm.executor.hive.impl.AbstractHiveDriverBaseImpl.run(AbstractHiveDriverBaseImpl.java:86)
 at 
 com.informatica.platform.dtm.executor.hive.MHiveDriver.executeQuery(MHiveDriver.java:126)
 at 
 com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeQuery(HiveTaskHandlerImpl.java:358)
 at 
 com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeScript(HiveTaskHandlerImpl.java:247)
 at 
 com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeMainScript(HiveTaskHandlerImpl.java:194)
 at 
 com.informatica.platform.ldtm.executor.common.workflow.taskhandler.impl.BaseTaskHandlerImpl.run(BaseTaskHandlerImpl.java:126)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 

[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-13 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031205#comment-14031205
 ] 

David Chen commented on HIVE-7230:
--

I took the Hadoop Eclipse formatter file 
(https://github.com/cloudera/blog-eclipse) and adapted it for Hive's coding 
style, namely changing the line lengths from 80 to 100 characters.

RB: https://reviews.apache.org/r/22590/
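
For reference, the 80-to-100 change amounts to a couple of settings in the
Eclipse formatter profile XML. The excerpt below only illustrates the file
format; the profile name and the exact settings in the attached file are
assumptions:
{noformat}
<profiles version="12">
  <profile kind="CodeFormatterProfile" name="Hive" version="12">
    <!-- Raise the maximum line width from Hadoop's 80 to Hive's 100 columns. -->
    <setting id="org.eclipse.jdt.core.formatter.lineSplit" value="100"/>
    <setting id="org.eclipse.jdt.core.formatter.comment.line_length" value="100"/>
  </profile>
</profiles>
{noformat}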

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-13 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7230:
-

Attachment: HIVE-7230.1.patch

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-13 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7230:
-

Status: Patch Available  (was: Open)

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Jenkins permissions, and auto-trigger help

2014-06-13 Thread Brock Noland
+ dev

Good call, yep that will need to be configured.

Brock

On Fri, Jun 13, 2014 at 10:29 AM, Szehon Ho sze...@cloudera.com wrote:

 I was studying this a bit more; I believe the MiniTezCliDriver tests are
 hitting the 2-hour timeout, since the error code is 124.  The framework is
 running all of them in one call, so I'll try to chunk the tests into batches
 like the other q-tests, roughly as sketched below.
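
  (Batching would mean invoking the driver with an explicit q-file subset per
  run, roughly like the following; the file names here are placeholders:

  mvn test -B -Phadoop-2 -Dtest=TestMiniTezCliDriver \
      -Dqfile=tez_dml.q,tez_joins_explain.q,mrr.q

  one such invocation per batch, so a single hang can't eat the whole 2 hours.)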

 I'll try to take a look next week at this.

 Thanks
 Szehon


 On Mon, Jun 9, 2014 at 1:13 PM, Szehon Ho sze...@cloudera.com wrote:

 It looks like a JVM OOM crash during the MiniTezCliDriver tests, or it's
 otherwise crashing.  The 407 log has failures, but the 408 log is cut off.


 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-407/failed/TestMiniTezCliDriver/maven-test.txt

 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/maven-test.txt

 MAVEN_OPTS is already set to -Xmx2g -XX:MaxPermSize=256M.  Do you
 guys know of any such issues?

 Thanks,
 Szehon



 On Sun, Jun 8, 2014 at 12:05 PM, Brock Noland br...@cloudera.com wrote:

 Looks like it's failing to generate test output:


 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/


 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/TestMiniTezCliDriver.txt

 exiting with 124 here:

 + wait 21961
 + timeout 2h mvn -B -o test 
 -Dmaven.repo.local=/home/hiveptest//ip-10-31-188-232-hiveptest-2/maven 
 -Phadoop-2 -Phadoop-2 -Dtest=TestMiniTezCliDriver
 + ret=124
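
 (Exit status 124 is GNU timeout's way of reporting that it had to kill the
 command when the time limit expired, e.g.:

 $ timeout 1s sleep 5; echo $?
 124

 so this run was killed at the 2-hour limit rather than failing on its own.)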





 On Sun, Jun 8, 2014 at 11:25 AM, Ashutosh Chauhan hashut...@apache.org
 wrote:

 Build #407 ran MiniTezCliDriver
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/407/testReport/org.apache.hadoop.hive.cli/

 but Build #408 didn't
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/408/testReport/org.apache.hadoop.hive.cli/


 On Sat, Jun 7, 2014 at 12:25 PM, Szehon Ho sze...@cloudera.com wrote:

 Sounds like there's randomness, either in the PTest test parser or in the
 maven test itself.  In the history now, it's running between 5633-5707 tests,
 which is similar to your range.


 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/testReport/history/

 I didn't see any in the history without MiniTezCliDriver; can you point me
 to a build number if you see one?  If nobody else knows immediately, I can dig
 deeper into it next week to try to find out.


 On Sat, Jun 7, 2014 at 9:00 AM, Ashutosh Chauhan hashut...@apache.org
  wrote:

 I noticed that the PTest2 framework runs a different number of tests on
 various runs, e.g., on yesterday's runs I saw it run 5585 and 5510 tests on
 subsequent runs. In particular, it seems it's running the MiniTezCliDriver tests
 in only half the runs. Has anyone observed this?



