[jira] [Updated] (HIVE-6960) Set Hive pom to use Hadoop-2.4

2014-04-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6960:
---

Status: Open  (was: Patch Available)

TestJdbcwithKDC looks like a genuine failure. [~jdere], can you take a look?

 Set Hive pom to use Hadoop-2.4
 --

 Key: HIVE-6960
 URL: https://issues.apache.org/jira/browse/HIVE-6960
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-6960.1.patch


 A number of the hadoop-2 unit test failures are due to HADOOP-10425, fixed in
 Hadoop 2.4.  Perhaps we should move on to that version.
 - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup3
 - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4
 - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
 - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_symlink_text_input_format
 - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
 - org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead

2014-04-27 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982207#comment-13982207
 ] 

Lefty Leverenz commented on HIVE-6430:
--

This adds *hive.mapjoin.optimized.hashtable* and 
*hive.mapjoin.optimized.hashtable.wbsize* to HiveConf.java.  They both need 
descriptions -- I assume wb means write buffer.

The descriptions can go in HiveConf comments or a release note for now, or you 
can patch hive-default.xml.template and I'll add a comment on HIVE-6586 (for 
HIVE-6037, Synchronize HiveConf with hive-default.xml.template and support show 
conf).

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, 
 HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2
 for the row) can take several hundred bytes, which is ridiculous. I am reducing
 the size of MJKey and MJRowContainer in other jiras, but in general we don't
 need a Java hash table there.  We can either use a primitive-friendly
 hashtable like the one from HPPC (Apache-licensed), or some variation, to map
 primitive keys to a single-row storage structure without an object per row
 (similar to vectorization).
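To make the idea concrete, here is a minimal, hypothetical sketch of a primitive-friendly, HPPC-style open-addressing map from a long key to an int row index, with no boxing and no per-entry node objects. The names and layout are illustrative, not Hive's actual implementation.

```java
public class PrimitiveLongMap {
    // Sketch of an HPPC-style primitive hashtable: parallel flat arrays,
    // open addressing with linear probing, no boxed keys, no Entry objects.
    private final long[] keys;
    private final int[] values;
    private final boolean[] used;

    public PrimitiveLongMap(int capacity) {
        keys = new long[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    // Find the slot for a key: hash, then probe linearly past collisions.
    private int slot(long key) {
        int i = (int) (((key ^ (key >>> 32)) & 0x7fffffffL) % keys.length);
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    public void put(long key, int value) { // assumes the table is never full
        int i = slot(key);
        used[i] = true;
        keys[i] = key;
        values[i] = value;
    }

    public int get(long key, int missing) {
        int i = slot(key);
        return (used[i] && keys[i] == key) ? values[i] : missing;
    }
}
```

Compared with a java.util.HashMap<Long, Integer>, each entry here costs roughly 13 bytes of flat array data instead of a boxed key plus a node object.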





Re: What is the minimal required version of Hadoop for Hive 0.13.0?

2014-04-27 Thread Lefty Leverenz
This needs to be documented somewhere.

-- Lefty


On Wed, Apr 23, 2014 at 11:45 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 LOL. I thought I was the last 0.20.2 hold out.


 On Wed, Apr 23, 2014 at 4:01 PM, Thejas Nair the...@hortonworks.com
 wrote:

  There is a jira for the hadoop 1.0.x compatibility issue.
  https://issues.apache.org/jira/browse/HIVE-6962
  I have suggested a possible workaround there. There is no patch for it
 yet.
 
  I am planning to propose a 0.13.1 release, primarily for the issues
  around use of Oracle as metastore database, and one in SQL standard
  authorization (HIVE-6945, HIVE-6919).
  We can also include a patch to get Hive working with older versions of
  Hadoop, especially 1.x versions.
 
  Hive build and tests are currently run against hadoop 1.2.1 and 2.3.0
  versions (as you can see in pom.xml). But I don't believe there was a
  conscious decision to have 1.2.1 as the *minimum* required version.
 
 
  On Wed, Apr 23, 2014 at 7:20 AM, David Gayou david.ga...@kxen.com
 wrote:
   I actually have pretty much the same issue with Hadoop 1.1.2
  
   There is a jira issue opened here :
   https://issues.apache.org/jira/browse/HIVE-6962
   with a link to the issue that created our problem.
  
   A quick search in the release notes seems to indicate that the unset
   method appeared in Hadoop 1.2.1
  
   Is it now the minimal required version ?
   If not, will there be a Hive 0.13.1 for older hadoop?
  
   Regards,
  
   David
  
  
   On Wed, Apr 23, 2014 at 4:00 PM, Dmitry Vasilenko dvasi...@gmail.com
   wrote:
  
  
   Hive 0.12.0 (and previous versions) worked with Hadoop 0.20.x,
 0.23.x.y,
   1.x.y, 2.x.y.
  
   Hive 0.13.0 did not work with Hadoop 0.20.x out of the box, and to make
   it work I had to patch the Hadoop installation and add a Path(URL)
   constructor and the Configuration.unset() method.
  
   After that the basic functionality seems to be working.
  
   Both issues originate from the
 org.apache.hadoop.hive.ql.exec.Utilities
  
   I know that Hadoop 0.20.x is old, but some of us still have to work
   with that version. So does anyone know the minimal required version
   of Hadoop for Hive 0.13.0?
  
   Thanks
   Dmitry Vasilenko
  
  
 



[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly

2014-04-27 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6979:
-

Attachment: HIVE-6979.2.patch

Addressed [~ashutoshc]'s review comments. [~ashutoshc] I fixed the recent test 
failures. Can you please take a look at the changes in RB?

 Hadoop-2 test failures related to quick stats not being populated correctly
 ---

 Key: HIVE-6979
 URL: https://issues.apache.org/jira/browse/HIVE-6979
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch


 The test failures currently reported by Hive QA running on hadoop-2
 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570)
 are related to a difference in how the Hadoop FileSystem.globStatus() API
 behaves. For a directory structure like the one below
 {code}
 dir1/file1
 dir1/file2
 {code}
 a two-level path pattern like dir1/*/* returns both files on Hadoop 1.x
 but an empty result on Hadoop 2.x (in fact it reports no such file
 or directory and returns an empty file status array). Hadoop 2.x appears
 compliant with Linux behaviour (ls dir1/*/*), but Hadoop 1.x is not.
 As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are
 populated incorrectly, causing diffs in qfile tests between hadoop-1 and hadoop-2.
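The Hadoop-2 semantics can be illustrated without Hadoop itself: Java's NIO glob matcher behaves like `ls` here, in that `*` does not cross directory boundaries. This is a minimal sketch assuming POSIX-style paths, not Hive or Hadoop code.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobSemantics {
    // Illustration only: NIO glob matching mirrors the ls / Hadoop 2.x
    // semantics, where * matches exactly one path component.
    public static boolean matchesTwoLevel(String relativePath) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:dir1/*/*");
        return matcher.matches(Paths.get(relativePath));
    }

    public static void main(String[] args) {
        // dir1/file1 is only one level deep, so dir1/*/* does not match it,
        // matching the Hadoop 2.x "empty result" behaviour described above.
        System.out.println(matchesTwoLevel("dir1/file1"));     // false
        System.out.println(matchesTwoLevel("dir1/sub/file1")); // true
    }
}
```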





[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly

2014-04-27 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6979:
-

Status: Patch Available  (was: Open)

 Hadoop-2 test failures related to quick stats not being populated correctly
 ---

 Key: HIVE-6979
 URL: https://issues.apache.org/jira/browse/HIVE-6979
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch


 The test failures currently reported by Hive QA running on hadoop-2
 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570)
 are related to a difference in how the Hadoop FileSystem.globStatus() API
 behaves. For a directory structure like the one below
 {code}
 dir1/file1
 dir1/file2
 {code}
 a two-level path pattern like dir1/*/* returns both files on Hadoop 1.x
 but an empty result on Hadoop 2.x (in fact it reports no such file
 or directory and returns an empty file status array). Hadoop 2.x appears
 compliant with Linux behaviour (ls dir1/*/*), but Hadoop 1.x is not.
 As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are
 populated incorrectly, causing diffs in qfile tests between hadoop-1 and hadoop-2.





[jira] [Commented] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly

2014-04-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982305#comment-13982305
 ] 

Hive QA commented on HIVE-6979:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12642117/HIVE-6979.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5419 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/59/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/59/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12642117

 Hadoop-2 test failures related to quick stats not being populated correctly
 ---

 Key: HIVE-6979
 URL: https://issues.apache.org/jira/browse/HIVE-6979
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch


 The test failures currently reported by Hive QA running on hadoop-2
 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570)
 are related to a difference in how the Hadoop FileSystem.globStatus() API
 behaves. For a directory structure like the one below
 {code}
 dir1/file1
 dir1/file2
 {code}
 a two-level path pattern like dir1/*/* returns both files on Hadoop 1.x
 but an empty result on Hadoop 2.x (in fact it reports no such file
 or directory and returns an empty file status array). Hadoop 2.x appears
 compliant with Linux behaviour (ls dir1/*/*), but Hadoop 1.x is not.
 As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are
 populated incorrectly, causing diffs in qfile tests between hadoop-1 and hadoop-2.





[jira] [Updated] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator

2014-04-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6901:
--

Attachment: HIVE-6901.5.patch

 Explain plan doesn't show operator tree for the fetch operator
 --

 Key: HIVE-6901
 URL: https://issues.apache.org/jira/browse/HIVE-6901
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor
 Attachments: HIVE-6901.1.patch, HIVE-6901.2.patch, HIVE-6901.3.patch, 
 HIVE-6901.4.patch, HIVE-6901.5.patch, HIVE-6901.patch


 Explaining a simple select query that involves an MR phase doesn't show the
 processor tree for the fetch operator.
 {code}
 hive explain select d from test;
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Map Operator Tree:
 ...
   Stage: Stage-0
 Fetch Operator
   limit: -1
 {code}
 It would be nice if the operator tree is shown even if there is only one node.
 Please note that in local execution, the operator tree is complete:
 {code}
 hive explain select * from test;
 OK
 STAGE DEPENDENCIES:
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 TableScan
   alias: test
   Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column 
 stats: NONE
   Select Operator
 expressions: d (type: int)
 outputColumnNames: _col0
 Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE 
 Column stats: NONE
 ListSink
 {code}





[jira] [Commented] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator

2014-04-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982400#comment-13982400
 ] 

Hive QA commented on HIVE-6901:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12642131/HIVE-6901.5.patch

{color:red}ERROR:{color} -1 due to 55 failed/errored test(s), 5484 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullformatCTAS
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_alter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tblproperties
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_symlink_text_input_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_column_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unset_table_view_property
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_unset_table_property
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/60/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/60/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing 

[jira] [Updated] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-27 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-6785:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk! Thank you for the contribution!

Thank you Szehon for the review!

 query fails when partitioned table's table level serde is ParquetHiveSerDe 
 and partition level serde is of different SerDe
 --

 Key: HIVE-6785
 URL: https://issues.apache.org/jira/browse/HIVE-6785
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Tongjie Chen
Assignee: Tongjie Chen
 Fix For: 0.14.0

 Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt, 
 HIVE-6785.3.patch


 When a Hive table's SerDe is ParquetHiveSerDe while some partitions use a
 different SerDe, AND the table has string column[s], Hive generates a confusing
 error message:
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector
 This is confusing because timestamp is mentioned even though it is not used
 by the table. The reason is that when the table and partition SerDes differ,
 Hive tries to convert between the object inspectors of the two SerDes.
 ParquetHiveSerDe's object inspector for the string type is ParquetStringInspector
 (newly introduced), which is neither a subclass of WritableStringObjectInspector
 nor of JavaStringObjectInspector, the types ObjectInspectorConverters expects for
 string-category object inspectors. There is no break statement in the STRING case,
 so the following TIMESTAMP case is executed, generating the confusing error message.
 See also the following parquet issue:
 https://github.com/Parquet/parquet-mr/issues/324
 The fix is relatively easy: make ParquetStringInspector a subclass of
 JavaStringObjectInspector instead of AbstractPrimitiveJavaObjectInspector.
 But because the constructor of JavaStringObjectInspector is package scope
 rather than public or protected, we would need to move ParquetStringInspector
 into the same package as JavaStringObjectInspector.
 Also, ArrayWritableObjectInspector's setStructFieldData needs to accept
 List data, since the corresponding setStructFieldData and create methods
 return a list. This is also needed when the table SerDe is ParquetHiveSerDe and
 the partition SerDe is something else.
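The missing-break failure mode described above can be sketched generically. The names below are illustrative, not Hive's actual converter code: a STRING whose inspector is unrecognized falls through into the TIMESTAMP branch, which is where the misleading timestamp cast error would come from.

```java
public class FallThroughSketch {
    // Hypothetical sketch: if the STRING branch doesn't recognize the
    // inspector, the missing break lets control fall into the TIMESTAMP
    // branch, so a string column ends up with a timestamp converter.
    static String pickConverter(String category, boolean knownStringInspector) {
        String result = "none";
        switch (category) {
            case "STRING":
                if (knownStringInspector) {
                    result = "string converter";
                    break;
                }
                // fall through (the bug): unrecognized string inspector
            case "TIMESTAMP":
                result = "timestamp converter";
                break;
            default:
                result = "other converter";
        }
        return result;
    }

    public static void main(String[] args) {
        // A ParquetStringInspector-like case is "unknown", so STRING
        // lands in the TIMESTAMP branch.
        System.out.println(pickConverter("STRING", false)); // timestamp converter
        System.out.println(pickConverter("STRING", true));  // string converter
    }
}
```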





[jira] [Updated] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-27 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-6785:
---

Assignee: Tongjie Chen

 query fails when partitioned table's table level serde is ParquetHiveSerDe 
 and partition level serde is of different SerDe
 --

 Key: HIVE-6785
 URL: https://issues.apache.org/jira/browse/HIVE-6785
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Tongjie Chen
Assignee: Tongjie Chen
 Fix For: 0.14.0

 Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt, 
 HIVE-6785.3.patch


 When a Hive table's SerDe is ParquetHiveSerDe while some partitions use a
 different SerDe, AND the table has string column[s], Hive generates a confusing
 error message:
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector
 This is confusing because timestamp is mentioned even though it is not used
 by the table. The reason is that when the table and partition SerDes differ,
 Hive tries to convert between the object inspectors of the two SerDes.
 ParquetHiveSerDe's object inspector for the string type is ParquetStringInspector
 (newly introduced), which is neither a subclass of WritableStringObjectInspector
 nor of JavaStringObjectInspector, the types ObjectInspectorConverters expects for
 string-category object inspectors. There is no break statement in the STRING case,
 so the following TIMESTAMP case is executed, generating the confusing error message.
 See also the following parquet issue:
 https://github.com/Parquet/parquet-mr/issues/324
 The fix is relatively easy: make ParquetStringInspector a subclass of
 JavaStringObjectInspector instead of AbstractPrimitiveJavaObjectInspector.
 But because the constructor of JavaStringObjectInspector is package scope
 rather than public or protected, we would need to move ParquetStringInspector
 into the same package as JavaStringObjectInspector.
 Also, ArrayWritableObjectInspector's setStructFieldData needs to accept
 List data, since the corresponding setStructFieldData and create methods
 return a list. This is also needed when the table SerDe is ParquetHiveSerDe and
 the partition SerDe is something else.





[jira] [Commented] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly

2014-04-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982429#comment-13982429
 ] 

Ashutosh Chauhan commented on HIVE-6979:


+1

 Hadoop-2 test failures related to quick stats not being populated correctly
 ---

 Key: HIVE-6979
 URL: https://issues.apache.org/jira/browse/HIVE-6979
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch


 The test failures currently reported by Hive QA running on hadoop-2
 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570)
 are related to a difference in how the Hadoop FileSystem.globStatus() API
 behaves. For a directory structure like the one below
 {code}
 dir1/file1
 dir1/file2
 {code}
 a two-level path pattern like dir1/*/* returns both files on Hadoop 1.x
 but an empty result on Hadoop 2.x (in fact it reports no such file
 or directory and returns an empty file status array). Hadoop 2.x appears
 compliant with Linux behaviour (ls dir1/*/*), but Hadoop 1.x is not.
 As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are
 populated incorrectly, causing diffs in qfile tests between hadoop-1 and hadoop-2.





[jira] [Comment Edited] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly

2014-04-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982429#comment-13982429
 ] 

Ashutosh Chauhan edited comment on HIVE-6979 at 4/27/14 7:10 PM:
-

union_remove_25 wasn't expected to fail. Prasanth, can you take a look?


was (Author: ashutoshc):
+1

 Hadoop-2 test failures related to quick stats not being populated correctly
 ---

 Key: HIVE-6979
 URL: https://issues.apache.org/jira/browse/HIVE-6979
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch


 The test failures currently reported by Hive QA running on hadoop-2
 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570)
 are related to a difference in how the Hadoop FileSystem.globStatus() API
 behaves. For a directory structure like the one below
 {code}
 dir1/file1
 dir1/file2
 {code}
 a two-level path pattern like dir1/*/* returns both files on Hadoop 1.x
 but an empty result on Hadoop 2.x (in fact it reports no such file
 or directory and returns an empty file status array). Hadoop 2.x appears
 compliant with Linux behaviour (ls dir1/*/*), but Hadoop 1.x is not.
 As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are
 populated incorrectly, causing diffs in qfile tests between hadoop-1 and hadoop-2.





Delete old website

2014-04-27 Thread Brock Noland
Hi,

The old website content is located here:
https://svn.apache.org/repos/asf/hive/site/ while the new CMS based website
is here https://svn.apache.org/repos/asf/hive/cms/. Instructions on how to
edit the website are here:
https://cwiki.apache.org/confluence/display/Hive/How+to+edit+the+website

I think we should delete the old website content since it has confused
folks.

Brock


Re: Delete old website

2014-04-27 Thread Ashutosh Chauhan
+1


On Sun, Apr 27, 2014 at 12:33 PM, Brock Noland br...@cloudera.com wrote:

 Hi,

 The old website content is located here:
 https://svn.apache.org/repos/asf/hive/site/ while the new CMS based
 website
 is here https://svn.apache.org/repos/asf/hive/cms/. Instructions on how to
 edit the website are here:
 https://cwiki.apache.org/confluence/display/Hive/How+to+edit+the+website

 I think we should delete the old website content since it has confused
 folks.

 Brock



Re: Delete old website

2014-04-27 Thread Lefty Leverenz
+1

I was about to ask where to find the new menu, then found it under
templates here:
https://svn.apache.org/repos/asf/hive/cms/trunk/templates/sidenav.mdtext.


-- Lefty


On Sun, Apr 27, 2014 at 3:41 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 +1


 On Sun, Apr 27, 2014 at 12:33 PM, Brock Noland br...@cloudera.com wrote:

  Hi,
 
  The old website content is located here:
  https://svn.apache.org/repos/asf/hive/site/ while the new CMS based
  website
  is here https://svn.apache.org/repos/asf/hive/cms/. Instructions on how
 to
  edit the website are here:
  https://cwiki.apache.org/confluence/display/Hive/How+to+edit+the+website
 
  I think we should delete the old website content since it has confused
  folks.
 
  Brock
 



[jira] [Commented] (HIVE-2853) Add pre event listeners to metastore

2014-04-27 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982480#comment-13982480
 ] 

Lefty Leverenz commented on HIVE-2853:
--

For the record:  This added *hive.metastore.pre.event.listeners* to 
HiveConf.java.


 Add pre event listeners to metastore
 

 Key: HIVE-2853
 URL: https://issues.apache.org/jira/browse/HIVE-2853
 Project: Hive
  Issue Type: Improvement
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.9.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2853.D2175.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2853.D2175.2.patch


 Currently there are event listeners in the metastore which run after the
 completion of a method.  It would be useful to have similar hooks which run
 before the metastore method is executed.  These could be used to make
 validation of names, locations, etc. customizable.





[jira] [Updated] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator

2014-04-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6901:
--

Attachment: HIVE-6901.6.patch

 Explain plan doesn't show operator tree for the fetch operator
 --

 Key: HIVE-6901
 URL: https://issues.apache.org/jira/browse/HIVE-6901
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor
 Attachments: HIVE-6901.1.patch, HIVE-6901.2.patch, HIVE-6901.3.patch, 
 HIVE-6901.4.patch, HIVE-6901.5.patch, HIVE-6901.6.patch, HIVE-6901.patch


 Explaining a simple select query that involves an MR phase doesn't show the
 processor tree for the fetch operator.
 {code}
 hive explain select d from test;
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Map Operator Tree:
 ...
   Stage: Stage-0
 Fetch Operator
   limit: -1
 {code}
 It would be nice if the operator tree is shown even if there is only one node.
 Please note that in local execution, the operator tree is complete:
 {code}
 hive explain select * from test;
 OK
 STAGE DEPENDENCIES:
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 TableScan
   alias: test
   Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column 
 stats: NONE
   Select Operator
 expressions: d (type: int)
 outputColumnNames: _col0
 Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE 
 Column stats: NONE
 ListSink
 {code}





[jira] [Created] (HIVE-6980) Drop table by using direct sql

2014-04-27 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-6980:
--

 Summary: Drop table by using direct sql
 Key: HIVE-6980
 URL: https://issues.apache.org/jira/browse/HIVE-6980
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang


Dropping a table which has lots of partitions is slow. Even after applying the 
patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 

The fix comes in two parts:
1. use direct SQL to query the partitions' protect mode;
the current implementation needs to transfer the Partition objects to the client and 
check the protect mode for each partition. I'd like to move this part of the logic 
to the metastore. The check will be done by direct SQL (if direct SQL is disabled, 
execute the same logic in the ObjectStore);

2. use direct SQL to drop the partitions of a table;
there may be two solutions here:
1. add DELETE CASCADE to the schema. In this way we only need to delete 
entries from the partitions table using direct SQL. May need to change 
datanucleus.deletionPolicy = DataNucleus. 
2. clean up the dependent tables by issuing DELETE statements. This also needs 
datanucleus.query.sql.allowAll to be turned on.

Both of the above solutions should be able to fix the problem. The DELETE CASCADE 
approach has to change schemas and prepare upgrade scripts. The second solution adds 
maintenance cost if new tables are added in future releases.

Please advise. 
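Option 1 above (DELETE CASCADE) can be sketched with an in-memory SQLite database; the table and column names here are illustrative, not the real metastore schema:

```python
# Sketch: with ON DELETE CASCADE declared on the dependent table, a single
# direct-SQL DELETE on the parent partitions table cleans up its child rows,
# avoiding a per-partition round trip through the ORM layer.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK enforcement by default
conn.execute("CREATE TABLE partitions (part_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE partition_params (
        part_id INTEGER REFERENCES partitions(part_id) ON DELETE CASCADE,
        key TEXT, value TEXT)""")
conn.execute("INSERT INTO partitions VALUES (1)")
conn.execute("INSERT INTO partition_params VALUES (1, 'numFiles', '3')")

# One direct-SQL statement removes the partition and its dependent rows.
conn.execute("DELETE FROM partitions WHERE part_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM partition_params").fetchone()[0]
print(remaining)
```

This illustrates the trade-off named above: the cascade moves cleanup into the schema (requiring upgrade scripts), whereas option 2 would keep the schema unchanged but require one hand-written DELETE per dependent table.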







[jira] [Commented] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator

2014-04-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982734#comment-13982734
 ] 

Hive QA commented on HIVE-6901:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12642168/HIVE-6901.6.patch

{color:red}ERROR:{color} -1 due to 46 failed/errored test(s), 5419 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullformatCTAS
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_alter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tblproperties
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_symlink_text_input_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_column_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unset_table_view_property
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_unset_table_property
org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/61/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/61/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 46 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12642168

 Explain plan doesn't show operator tree for the fetch operator
 --

 Key: HIVE-6901
 URL: https://issues.apache.org/jira/browse/HIVE-6901
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Xuefu Zhang

[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly

2014-04-27 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6979:
-

Attachment: HIVE-6979.3.patch

union_remove_25 was failing as it wasn't able to find the partition, so I increased 
the limit count so that the specific partition value is always present. Updated the 
stats_partscan_1_23 test as well. Other tests seem to pass in my local setup 
(Mac OS X).

 Hadoop-2 test failures related to quick stats not being populated correctly
 ---

 Key: HIVE-6979
 URL: https://issues.apache.org/jira/browse/HIVE-6979
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch, HIVE-6979.3.patch


 The test failures that are currently reported by Hive QA running on hadoop-2 
 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570)
 are related to a difference in the way the hadoop FileSystem.globStatus() API 
 behaves. For a directory structure like below:
 {code}
 dir1/file1
 dir1/file2
 {code}
 a two-level path pattern like dir1/*/* will return both files in hadoop 1.x 
 but will return an empty result in hadoop 2.x (in fact it will say no such file 
 or directory and return an empty file status array). Hadoop 2.x seems to be 
 compliant with Linux behaviour (ls dir1/*/*) but hadoop 1.x is not.
 As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are 
 populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.
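The glob-depth behaviour attributed here to Hadoop 2.x (and to the shell) can be reproduced with Python's glob module, which also follows shell semantics; the paths are illustrative:

```python
# Sketch: for files at dir1/file1 and dir1/file2, a one-level pattern matches
# both files, while a two-level pattern dir1/*/* matches nothing, because there
# are no subdirectories under dir1 to satisfy the middle wildcard.
import glob
import os
import tempfile

root = tempfile.mkdtemp()
d = os.path.join(root, "dir1")
os.makedirs(d)
for name in ("file1", "file2"):
    open(os.path.join(d, name), "w").close()

one_level = glob.glob(os.path.join(root, "dir1", "*"))      # both files
two_level = glob.glob(os.path.join(root, "dir1", "*", "*"))  # empty

print(len(one_level))  # 2
print(len(two_level))  # 0
```

Under this reading, Hadoop 1.x's behaviour of returning both files for dir1/*/* was the anomaly, and code that relied on it (such as the quick-stats path counting) breaks on Hadoop 2.x.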





[jira] [Updated] (HIVE-6469) skipTrash option in hive command line

2014-04-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6469:
--

Fix Version/s: (was: 0.12.1)

 skipTrash option in hive command line
 -

 Key: HIVE-6469
 URL: https://issues.apache.org/jira/browse/HIVE-6469
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.12.0
Reporter: Jayesh
Assignee: Jayesh
 Fix For: 0.14.0

 Attachments: HIVE-6469.1.patch, HIVE-6469.2.patch, HIVE-6469.3.patch, 
 HIVE-6469.patch


 The hive drop table command deletes the data from the HDFS warehouse and puts it 
 into Trash.
 Currently there is no way to provide a flag telling the warehouse to skip the 
 trash while deleting table data.
 This ticket is to add a skipTrash feature to the hive command line, which looks 
 as follows: 
 hive -e "drop table skipTrash testTable"
 This would be a good feature to add, so that users can specify when not to put 
 data into the trash directory and thus not fill HDFS space, instead of relying 
 on trash interval and policy configuration to take care of disk-filling issues.
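The space-usage argument can be illustrated with a small hypothetical sketch of the two drop behaviours, using plain local filesystem operations rather than Hive's actual trash implementation:

```python
# Sketch: a default drop moves table data into a trash directory (space is
# still consumed until the trash is purged), while a skipTrash drop removes
# the data outright. All paths are illustrative.
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
table_dir = os.path.join(root, "warehouse", "testTable")
trash_dir = os.path.join(root, ".Trash", "Current")
os.makedirs(table_dir)
os.makedirs(trash_dir)
open(os.path.join(table_dir, "part-00000"), "w").close()

def drop_table(path, trash, skip_trash=False):
    if skip_trash:
        shutil.rmtree(path)       # space freed immediately
    else:
        shutil.move(path, trash)  # data lingers in trash until purged

drop_table(table_dir, trash_dir)  # default drop: data lands in the trash
in_trash = os.path.exists(os.path.join(trash_dir, "testTable", "part-00000"))
print("in trash:", in_trash)

# skipTrash-style drop on the trashed copy: nothing is left behind.
drop_table(os.path.join(trash_dir, "testTable"), trash_dir, skip_trash=True)
print("trash emptied:", not os.path.exists(os.path.join(trash_dir, "testTable")))
```

The point of the feature request is exactly this difference: without a skipTrash option, freeing the space depends on the trash purge interval and policy configuration.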





[jira] [Updated] (HIVE-6469) skipTrash option in hive command line

2014-04-27 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6469:
--

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Jayesh for the contribution.

 skipTrash option in hive command line
 -

 Key: HIVE-6469
 URL: https://issues.apache.org/jira/browse/HIVE-6469
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.12.0
Reporter: Jayesh
Assignee: Jayesh
 Fix For: 0.12.1, 0.14.0

 Attachments: HIVE-6469.1.patch, HIVE-6469.2.patch, HIVE-6469.3.patch, 
 HIVE-6469.patch


 The hive drop table command deletes the data from the HDFS warehouse and puts it 
 into Trash.
 Currently there is no way to provide a flag telling the warehouse to skip the 
 trash while deleting table data.
 This ticket is to add a skipTrash feature to the hive command line, which looks 
 as follows: 
 hive -e "drop table skipTrash testTable"
 This would be a good feature to add, so that users can specify when not to put 
 data into the trash directory and thus not fill HDFS space, instead of relying 
 on trash interval and policy configuration to take care of disk-filling issues.





Re: Apache Hive 0.13.1

2014-04-27 Thread amareshwarisr .
I would like to have https://issues.apache.org/jira/browse/HIVE-6953 also
included as part of 0.13.1. I still could not figure out the reason for the
test failures; when all tests are run, the CompactorTests fail.


Thanks
Amareshwari


On Sun, Apr 27, 2014 at 2:49 AM, Sushanth Sowmyan khorg...@gmail.com wrote:

 Added.

 If others have difficulty editing the page (I can't figure out how to change
 editing privileges, but it seems to indicate that others can edit), I'll
 accept replies to this thread as well and can add it in.
  On Apr 25, 2014 6:25 PM, Sergey Shelukhin ser...@hortonworks.com
 wrote:

  I don't have access to edit this page (or cannot figure out the UI).
  Username sershe.
  Can you add
  HIVE-6961 : Drop partitions treats partition columns as strings (area -
  metastore)
 
 
  On Fri, Apr 25, 2014 at 4:20 PM, Sushanth Sowmyan khorg...@gmail.com
  wrote:
 
   I've created the following wiki link :
  
  
 
 https://cwiki.apache.org/confluence/display/Hive/Hive+0.13.1+Release+tracking
  
   People should be able to request additional jiras by adding it to the
   list. I think it might make sense to halt addition of requests to the
   list 3 days before the RC is cut, so as to prevent an endless-tail
   scenario, unless the bug in question is a breaking severe issue,
   where, yes, after discussion, we can vote to add it to the list. That
   also gives us time to run a full suite of tests on a stable build
   before we cut the RC.
  
   I propose that the first RC (RC0) be built on Monday May 5th at 6pm
   PDT, and the jira list on the wiki be closed to open/easy additions at
   6pm PDT on Friday May 2nd.
  
  
   On Fri, Apr 25, 2014 at 2:40 PM, Gunther Hagleitner
   ghagleit...@hortonworks.com wrote:
Sorry - HIVE-6824 isn't needed. Just the other 3. My bad.
   
Thanks,
Gunther.
   
   
On Fri, Apr 25, 2014 at 2:10 PM, Gunther Hagleitner 
ghagleit...@hortonworks.com wrote:
   
I'd like to request to include these Tez fixes:
   
HIVE-6824, HIVE-6826, HIVE-6828, HIVE-6898
   
Thanks,
Gunther.
   
   
On Fri, Apr 25, 2014 at 11:59 AM, Sushanth Sowmyan 
  khorg...@gmail.com
   wrote:
   
 True, I was counting two weeks from today, but 0.13 has already been
 out for a week. I'm amenable to having an RC1 out on May 5th. If any
 further issues appear that block, then we can deal with them in an
 RC2/etc modification to that.
   
On Fri, Apr 25, 2014 at 11:45 AM, Thejas Nair 
  the...@hortonworks.com
wrote:
 On Fri, Apr 25, 2014 at 11:33 AM, Sushanth Sowmyan 
   khorg...@gmail.com
wrote:


  I think it's important to get a bugfix/stabilization release
  reasonably quickly, but it's also important to give people a little
  time to try out 0.13, discover/report bugs and fix them. So I think
  about two weeks is a good point? And instead of releasing an RC on a
  friday, I'm thinking of pushing it out to Monday - does 12th May sound
  good to everyone?


  I think we can aim for an earlier date. Most of these issues seem to be
  already committed to trunk or have patches available. So the remaining
  ones also might get committed to trunk by early next week. How about
  shooting for May 5th (Monday)?
  By then 0.13 would also have been out for 2 weeks. If we have any new
  critical bug reported that needs a fix, we can hold off on the RC for a
  few days.
  What do you think?

 Thanks,
 Thejas

 --
 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or
entity to
 which it is addressed and may contain information that is
   confidential,
 privileged and exempt from disclosure under applicable law. If
 the
reader
 of this message is not the intended recipient, you are hereby
   notified
that
 any printing, copying, dissemination, distribution, disclosure or
 forwarding of this communication is strictly prohibited. If you
  have
 received this communication in error, please contact the sender
immediately
 and delete it from your system. Thank You.
   
   
   
   
  
 
[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly

2014-04-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6979:
---

Status: Patch Available  (was: Open)

 Hadoop-2 test failures related to quick stats not being populated correctly
 ---

 Key: HIVE-6979
 URL: https://issues.apache.org/jira/browse/HIVE-6979
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch, HIVE-6979.3.patch


 The test failures that are currently reported by Hive QA running on hadoop-2 
 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570)
 are related to a difference in the way the hadoop FileSystem.globStatus() API 
 behaves. For a directory structure like below:
 {code}
 dir1/file1
 dir1/file2
 {code}
 a two-level path pattern like dir1/*/* will return both files in hadoop 1.x 
 but will return an empty result in hadoop 2.x (in fact it will say no such file 
 or directory and return an empty file status array). Hadoop 2.x seems to be 
 compliant with Linux behaviour (ls dir1/*/*) but hadoop 1.x is not.
 As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are 
 populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.





[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly

2014-04-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6979:
---

Status: Open  (was: Patch Available)

 Hadoop-2 test failures related to quick stats not being populated correctly
 ---

 Key: HIVE-6979
 URL: https://issues.apache.org/jira/browse/HIVE-6979
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch, HIVE-6979.3.patch


 The test failures that are currently reported by Hive QA running on hadoop-2 
 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570)
 are related to a difference in the way the hadoop FileSystem.globStatus() API 
 behaves. For a directory structure like below:
 {code}
 dir1/file1
 dir1/file2
 {code}
 a two-level path pattern like dir1/*/* will return both files in hadoop 1.x 
 but will return an empty result in hadoop 2.x (in fact it will say no such file 
 or directory and return an empty file status array). Hadoop 2.x seems to be 
 compliant with Linux behaviour (ls dir1/*/*) but hadoop 1.x is not.
 As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are 
 populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.





[jira] [Commented] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly

2014-04-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982761#comment-13982761
 ] 

Ashutosh Chauhan commented on HIVE-6979:


+1

 Hadoop-2 test failures related to quick stats not being populated correctly
 ---

 Key: HIVE-6979
 URL: https://issues.apache.org/jira/browse/HIVE-6979
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch, HIVE-6979.3.patch


 The test failures that are currently reported by Hive QA running on hadoop-2 
 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570)
 are related to a difference in the way the hadoop FileSystem.globStatus() API 
 behaves. For a directory structure like below:
 {code}
 dir1/file1
 dir1/file2
 {code}
 a two-level path pattern like dir1/*/* will return both files in hadoop 1.x 
 but will return an empty result in hadoop 2.x (in fact it will say no such file 
 or directory and return an empty file status array). Hadoop 2.x seems to be 
 compliant with Linux behaviour (ls dir1/*/*) but hadoop 1.x is not.
 As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are 
 populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.





[jira] [Commented] (HIVE-6980) Drop table by using direct sql

2014-04-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982763#comment-13982763
 ] 

Ashutosh Chauhan commented on HIVE-6980:


Can you try with the patch of HIVE-6809? That should help.

 Drop table by using direct sql
 --

 Key: HIVE-6980
 URL: https://issues.apache.org/jira/browse/HIVE-6980
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang

 Dropping a table which has lots of partitions is slow. Even after applying the 
 patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
 The fix comes in two parts:
 1. use direct SQL to query the partitions' protect mode;
 the current implementation needs to transfer the Partition objects to the client 
 and check the protect mode for each partition. I'd like to move this part of 
 the logic to the metastore. The check will be done by direct SQL (if direct SQL 
 is disabled, execute the same logic in the ObjectStore);
 2. use direct SQL to drop the partitions of a table;
 there may be two solutions here:
 1. add DELETE CASCADE to the schema. In this way we only need to delete 
 entries from the partitions table using direct SQL. May need to change 
 datanucleus.deletionPolicy = DataNucleus. 
 2. clean up the dependent tables by issuing DELETE statements. This also needs 
 datanucleus.query.sql.allowAll to be turned on.
 Both of the above solutions should be able to fix the problem. The DELETE CASCADE 
 approach has to change schemas and prepare upgrade scripts. The second solution 
 adds maintenance cost if new tables are added in future releases.
 Please advise. 





[jira] [Commented] (HIVE-6469) skipTrash option in hive command line

2014-04-27 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982769#comment-13982769
 ] 

Lefty Leverenz commented on HIVE-6469:
--

What user doc does this need, besides adding *hive.warehouse.data.skipTrash* to 
the Configuration Properties wikidoc?  If it's just an administrative config, 
it could be mentioned in the Admin Manual's section on Configuration Variables 
(link below).  But if it's also for users, it should be mentioned in the DDL 
doc's Drop Table section, which includes this sentence and further discussion: 
"The data is actually moved to the .Trash/Current directory if Trash is 
configured."

* [Admin Manual -- Configuration Variables 
|https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfigurationVariables]
* [Language Manual -- DDL -- Drop Table 
|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropTable]

 skipTrash option in hive command line
 -

 Key: HIVE-6469
 URL: https://issues.apache.org/jira/browse/HIVE-6469
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.12.0
Reporter: Jayesh
Assignee: Jayesh
 Fix For: 0.14.0

 Attachments: HIVE-6469.1.patch, HIVE-6469.2.patch, HIVE-6469.3.patch, 
 HIVE-6469.patch


 The hive drop table command deletes the data from the HDFS warehouse and puts it 
 into Trash.
 Currently there is no way to provide a flag telling the warehouse to skip the 
 trash while deleting table data.
 This ticket is to add a skipTrash feature to the hive command line, which looks 
 as follows: 
 hive -e "drop table skipTrash testTable"
 This would be a good feature to add, so that users can specify when not to put 
 data into the trash directory and thus not fill HDFS space, instead of relying 
 on trash interval and policy configuration to take care of disk-filling issues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)