[jira] [Updated] (HIVE-7923) populate stats for test tables

2014-09-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7923:
---
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Pengcheng!

> populate stats for test tables
> --
>
> Key: HIVE-7923
> URL: https://issues.apache.org/jira/browse/HIVE-7923
> Project: Hive
>  Issue Type: Improvement
>Reporter: pengcheng xiong
>Assignee: pengcheng xiong
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-7923.1.patch, HIVE-7923.2.patch, HIVE-7923.3.patch, 
> HIVE-7923.4.patch, HIVE-7923.5.patch, HIVE-7923.6.patch
>
>
> Current q_test only generates tables (e.g., src) but does not compute 
> stats for them. All the test cases will fail in CBO because CBO depends 
> on the stats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125252#comment-14125252
 ] 

Rui Li commented on HIVE-8017:
--

This patch changes the RDD key type to HiveKey after the map/reduce functions 
have been applied. The original input RDD's key type remains BytesWritable.

> Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
> Branch]
> ---
>
> Key: HIVE-8017
> URL: https://issues.apache.org/jira/browse/HIVE-8017
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8017-spark.patch
>
>
> HiveKey should be used as the key type because it holds the hash code for 
> partitioning. While BytesWritable serves partitioning well for simple cases, 
> we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
> bucketed table, etc.
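The distinction matters because Spark hash-partitions a pair RDD by the key's {{hashCode}}. A minimal, dependency-free sketch of why a key that carries a precomputed Hive hash partitions differently from a raw-bytes key (class and method names here are illustrative, not Hive's or Spark's actual API):

```java
import java.util.Arrays;

// Sketch: a key carrying a precomputed hash, mimicking what HiveKey
// does for Hive (join/bucket hashes differ from the raw bytes' hash).
public class HashCarryingKey {
    private final byte[] bytes;
    private final int hiveHash; // set by Hive's partition/bucket logic

    public HashCarryingKey(byte[] bytes, int hiveHash) {
        this.bytes = bytes;
        this.hiveHash = hiveHash;
    }

    // A BytesWritable-style key would hash the raw bytes instead.
    @Override
    public int hashCode() {
        return hiveHash;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof HashCarryingKey
                && Arrays.equals(bytes, ((HashCarryingKey) o).bytes);
    }

    // Hash partitioning as Spark's default HashPartitioner computes it:
    // hashCode modulo numPartitions, adjusted to be non-negative.
    public static int partition(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }
}
```

With this scheme, two rows that Hive's bucketing logic assigns the same hash land in the same partition regardless of their byte representation.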





[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-07 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Status: Patch Available  (was: Open)

> Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
> Branch]
> ---
>
> Key: HIVE-8017
> URL: https://issues.apache.org/jira/browse/HIVE-8017
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8017-spark.patch
>
>
> HiveKey should be used as the key type because it holds the hash code for 
> partitioning. While BytesWritable serves partitioning well for simple cases, 
> we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
> bucketed table, etc.





[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

2014-09-07 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8017:
-
Attachment: HIVE-8017-spark.patch

> Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
> Branch]
> ---
>
> Key: HIVE-8017
> URL: https://issues.apache.org/jira/browse/HIVE-8017
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-8017-spark.patch
>
>
> HiveKey should be used as the key type because it holds the hash code for 
> partitioning. While BytesWritable serves partitioning well for simple cases, 
> we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, 
> bucketed table, etc.





[jira] [Commented] (HIVE-2390) Expand support for union types

2014-09-07 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125239#comment-14125239
 ] 

Suma Shivaprasad commented on HIVE-2390:


[~amareshwari] Yes, the test case failure is unrelated to the patch.

> Expand support for union types
> --
>
> Key: HIVE-2390
> URL: https://issues.apache.org/jira/browse/HIVE-2390
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Jakob Homan
>Assignee: Suma Shivaprasad
>  Labels: uniontype
> Fix For: 0.14.0
>
> Attachments: HIVE-2390.1.patch, HIVE-2390.patch
>
>
> When the union type was introduced, full support for it wasn't provided.  For 
> instance, when working with a union that gets passed to LazyBinarySerde: 
> {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184)
> {noformat}
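The usual way to extend a binary serde to unions is to write the branch tag first and then the branch's payload, so the reader can dispatch on the tag. A minimal sketch of that layout (illustrative only, not the committed LazyBinarySerDe change):

```java
// Sketch of tag-prefixed union serialization: 1 tag byte identifying
// which branch of the union is present, then the branch's payload
// (a 4-byte big-endian int branch in this toy example).
public class UnionSketch {
    public static byte[] serializeUnion(byte tag, int value) {
        byte[] out = new byte[5];
        out[0] = tag;                      // which union branch was written
        out[1] = (byte) (value >>> 24);    // payload, big-endian
        out[2] = (byte) (value >>> 16);
        out[3] = (byte) (value >>> 8);
        out[4] = (byte) value;
        return out;
    }

    // The reader inspects buf[0] to pick the branch, then decodes it.
    public static int readPayload(byte[] buf) {
        return ((buf[1] & 0xFF) << 24) | ((buf[2] & 0xFF) << 16)
             | ((buf[3] & 0xFF) << 8) | (buf[4] & 0xFF);
    }
}
```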





Re: hive unit test report question

2014-09-07 Thread Nick Dimiduk
IMHO, it would be better to wire up the integration suite via the failsafe
plugin (the integration-test counterpart of surefire) and link the modules
correctly. This is on (admittedly, near the bottom of) my todo list. See
also the HBase poms for an example.

-n

On Saturday, September 6, 2014, wzc  wrote:

> hi all:
> I would like to create a Jenkins job to run both the Hive unit tests and
> the integration tests. Right now it seems that I have to execute multiple
> Maven goals in different poms:
>
> mvn clean install  surefire-report:report -Daggregate=true   -Phadoop-2
> > cd itests
> > mvn clean install  surefire-report:report -Daggregate=true   -Phadoop-2
>
>
> I would like to use one Maven Jenkins job, and right now I can't figure out
> how to configure the job properly to execute Maven goals in different poms
> (maybe I can add a post-build step that executes another shell?). Each Hive
> ptest2 job can run all tests, and I would like to know the configuration it uses.
>
> Any help is appreciated.
>
> Thanks.
>
>
>
>
>
>
>
> 2014-01-14 14:05 GMT+08:00 Shanyu Zhao  >:
>
> > Thanks guys for your help!
> >
> > I found Eugene's comments are particularly helpful. With
> > "-Daggregate=true" I now can see an aggregated unit test results.
> >
> > Btw, I didn't mean to run itests, I just want to run all "unit tests". I
> > think in the FAQ they made it clear that itests are disconnected from the
> > top level pom.xml.
> >
> > Shanyu
> >
> > -Original Message-
> > From: Eugene Koifman [mailto:ekoif...@hortonworks.com ]
> > Sent: Monday, January 13, 2014 4:06 PM
> > To: dev@hive.apache.org 
> > Subject: Re: hive unit test report question
> >
> > I think you want to add
> > -Daggregate=true
> > you should then have target/site/surefire-report.html in the module where
> > you ran the command
> >
> >
> >
> > On Mon, Jan 13, 2014 at 2:54 PM, Szehon Ho  > wrote:
> >
> > > Hi Shanyu,
> > >
> > > Are you running in /itests?  The unit tests are in there, and are not
> > > run if you are running from the root.
> > >
> > > Thanks
> > > Szehon
> > >
> > >
> > > On Mon, Jan 13, 2014 at 1:59 PM, Shanyu Zhao  >
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I was trying to build hive trunk, run all unit tests and generate
> > > reports,
> > > > but I'm not sure what's the correct command line. I was using:
> > > > mvn clean install -Phadoop-2 -DskipTests mvn test
> > > > surefire-report:report -Phadoop-2 But the reports in the root folder
> > > > and several other projects (such as
> > > > metastore) are empty with no test results. And I couldn't find a
> > > > summary page for all unit tests.
> > > >
> > > > I was trying to avoid "mvn site" because it seems to take forever to
> > > > finish. Am I using the correct commands? How can I get a report like
> > > > the one in the precommit report:
> > > >
> > > http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/827/testRep
> > > ort/
> > > > ?
> > > >
> > > > I really appreciate your help!
> > > >
> > > > Shanyu
> > > >
> > >
> >
>


[jira] [Commented] (HIVE-1363) 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes

2014-09-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125229#comment-14125229
 ] 

Lefty Leverenz commented on HIVE-1363:
--

_If_ this needs to be documented in the wiki, here's where it goes (but maybe 
it's just a bug fix that doesn't need user documentation):

* [Language Manual -- DDL -- Show Table/Partition Extended | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowTable/PartitionExtended]

> 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes
> --
>
> Key: HIVE-1363
> URL: https://issues.apache.org/jira/browse/HIVE-1363
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Chaoyu Tang
> Fix For: 0.14.0
>
> Attachments: HIVE-1363.1.patch, HIVE-1363.2.patch, HIVE-1363.patch
>
>
> {code}
> hive> SHOW TABLE EXTENDED LIKE pokes;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> hive> SHOW TABLE EXTENDED LIKE "p*";
> FAILED: Error in metadata: MetaException(message:Got exception: 
> javax.jdo.JDOUserException ')' expected at character 54 in "database.name == 
> dbName && ( tableName.matches("(?i)"p.*""))")
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> hive> SHOW TABLE EXTENDED LIKE 'p*';
> OK
> hive> SHOW TABLE EXTENDED LIKE `p*`;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> {code}
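The failing cases above come down to the LIKE pattern reaching the JDO filter with its quotes still attached; the fix amounts to stripping one pair of matching surrounding quotes before building the filter. A sketch of such normalization (illustrative, not the committed patch):

```java
// Sketch: strip one pair of matching surrounding quotes (', ", or `)
// from a SHOW TABLE EXTENDED LIKE pattern before it is used to build
// the metastore filter. Unquoted patterns pass through unchanged.
public class QuoteStrip {
    public static String stripQuotes(String pattern) {
        if (pattern != null && pattern.length() >= 2) {
            char first = pattern.charAt(0);
            char last = pattern.charAt(pattern.length() - 1);
            if (first == last && (first == '\'' || first == '"' || first == '`')) {
                return pattern.substring(1, pattern.length() - 1);
            }
        }
        return pattern;
    }
}
```

With this in place, `"p*"`, `'p*'`, and `` `p*` `` all reduce to the same pattern `p*` before matching.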





[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125218#comment-14125218
 ] 

Hive QA commented on HIVE-7946:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667115/HIVE-7946.4.patch

{color:red}ERROR:{color} -1 due to 325 failed/errored test(s),  tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_view_sqlstd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_cast
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_func1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_genericudaf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_union_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_distinct_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explode_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_distinct_samekey
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_id1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_id2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1
org.apache.hadoop.hive.c

[jira] [Updated] (HIVE-8015) Merge from trunk (3) [Spark Branch]

2014-09-07 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8015:
---
Fix Version/s: (was: 0.14.0)
   spark-branch

> Merge from trunk (3) [Spark Branch]
> ---
>
> Key: HIVE-8015
> URL: https://issues.apache.org/jira/browse/HIVE-8015
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: spark-branch
>
>






[jira] [Updated] (HIVE-8015) Merge from trunk (3) [Spark Branch]

2014-09-07 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8015:
---
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

> Merge from trunk (3) [Spark Branch]
> ---
>
> Key: HIVE-8015
> URL: https://issues.apache.org/jira/browse/HIVE-8015
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.14.0
>
>






[jira] [Commented] (HIVE-8015) Merge from trunk (3) [Spark Branch]

2014-09-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125211#comment-14125211
 ] 

Brock Noland commented on HIVE-8015:


I found the issue with TestMiniTezCliDriver.

> Merge from trunk (3) [Spark Branch]
> ---
>
> Key: HIVE-8015
> URL: https://issues.apache.org/jira/browse/HIVE-8015
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Brock Noland
>






[jira] [Commented] (HIVE-2390) Expand support for union types

2014-09-07 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125210#comment-14125210
 ] 

Amareshwari Sriramadasu commented on HIVE-2390:
---

+1 Changes look fine to me.

[~suma.shivaprasad], the test failure seems unrelated to me. Can you look into 
it and confirm?

> Expand support for union types
> --
>
> Key: HIVE-2390
> URL: https://issues.apache.org/jira/browse/HIVE-2390
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Jakob Homan
>Assignee: Suma Shivaprasad
>  Labels: uniontype
> Fix For: 0.14.0
>
> Attachments: HIVE-2390.1.patch, HIVE-2390.patch
>
>
> When the union type was introduced, full support for it wasn't provided.  For 
> instance, when working with a union that gets passed to LazyBinarySerde: 
> {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184)
> {noformat}





[jira] [Updated] (HIVE-1363) 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes

2014-09-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-1363:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks to Chaoyu for the contribution.

> 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes
> --
>
> Key: HIVE-1363
> URL: https://issues.apache.org/jira/browse/HIVE-1363
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Chaoyu Tang
> Fix For: 0.14.0
>
> Attachments: HIVE-1363.1.patch, HIVE-1363.2.patch, HIVE-1363.patch
>
>
> {code}
> hive> SHOW TABLE EXTENDED LIKE pokes;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> hive> SHOW TABLE EXTENDED LIKE "p*";
> FAILED: Error in metadata: MetaException(message:Got exception: 
> javax.jdo.JDOUserException ')' expected at character 54 in "database.name == 
> dbName && ( tableName.matches("(?i)"p.*""))")
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> hive> SHOW TABLE EXTENDED LIKE 'p*';
> OK
> hive> SHOW TABLE EXTENDED LIKE `p*`;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> {code}





[jira] [Updated] (HIVE-1363) 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes

2014-09-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-1363:
--
Component/s: Query Processor

> 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes
> --
>
> Key: HIVE-1363
> URL: https://issues.apache.org/jira/browse/HIVE-1363
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Chaoyu Tang
> Fix For: 0.14.0
>
> Attachments: HIVE-1363.1.patch, HIVE-1363.2.patch, HIVE-1363.patch
>
>
> {code}
> hive> SHOW TABLE EXTENDED LIKE pokes;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> hive> SHOW TABLE EXTENDED LIKE "p*";
> FAILED: Error in metadata: MetaException(message:Got exception: 
> javax.jdo.JDOUserException ')' expected at character 54 in "database.name == 
> dbName && ( tableName.matches("(?i)"p.*""))")
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> hive> SHOW TABLE EXTENDED LIKE 'p*';
> OK
> hive> SHOW TABLE EXTENDED LIKE `p*`;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> {code}





[jira] [Updated] (HIVE-1363) 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes

2014-09-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-1363:
--
Affects Version/s: (was: 0.14.0)

> 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes
> --
>
> Key: HIVE-1363
> URL: https://issues.apache.org/jira/browse/HIVE-1363
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Chaoyu Tang
> Fix For: 0.14.0
>
> Attachments: HIVE-1363.1.patch, HIVE-1363.2.patch, HIVE-1363.patch
>
>
> {code}
> hive> SHOW TABLE EXTENDED LIKE pokes;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> hive> SHOW TABLE EXTENDED LIKE "p*";
> FAILED: Error in metadata: MetaException(message:Got exception: 
> javax.jdo.JDOUserException ')' expected at character 54 in "database.name == 
> dbName && ( tableName.matches("(?i)"p.*""))")
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> hive> SHOW TABLE EXTENDED LIKE 'p*';
> OK
> hive> SHOW TABLE EXTENDED LIKE `p*`;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> {code}





[jira] [Updated] (HIVE-8008) NPE while reading null decimal value

2014-09-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8008:
--
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks to Chao for the contribution.

> NPE while reading null decimal value
> 
>
> Key: HIVE-8008
> URL: https://issues.apache.org/jira/browse/HIVE-8008
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Chao
>Assignee: Chao
> Fix For: 0.14.0
>
> Attachments: HIVE-8008.2.patch, HIVE-8008.3.patch, HIVE-8008.4.patch, 
> HIVE-8008.patch
>
>
> Say you have this table {{dec_test}}:
> {code}
> dec   decimal(10,0)   
> {code}
> If the table has a row that is 99.5, and if we do
> {code}
> select * from dec_test;
> {code}
> it will crash with NPE:
> {code}
> 2014-09-05 14:08:56,023 ERROR [main]: CliDriver 
> (SessionState.java:printError(545)) - Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
>   ... 12 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
>   at 
> org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
>   at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39)
>   at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
>   ... 19 more
> {code}
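The root cause in the trace is a serializer that assumes a non-null primitive ({{LazyUtils.writePrimitiveUTF8}} dereferences the value), while a decimal that cannot be represented at the declared precision/scale comes back as null. The usual fix is a null guard that emits the serde's null sequence instead of dereferencing. A minimal sketch (illustrative names, not the committed patch):

```java
import java.math.BigDecimal;

// Sketch: render a possibly-null decimal as its string form, or the
// serde's null sequence (Hive's LazySimpleSerDe default is "\N") when
// the value is null, instead of NPE-ing on dereference.
public class NullSafeSerialize {
    public static String writeDecimal(BigDecimal value, String nullSequence) {
        return value == null ? nullSequence : value.toPlainString();
    }
}
```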





[jira] [Updated] (HIVE-8008) NPE while reading null decimal value

2014-09-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8008:
--
  Component/s: Types
Affects Version/s: 0.13.0
   0.13.1

> NPE while reading null decimal value
> 
>
> Key: HIVE-8008
> URL: https://issues.apache.org/jira/browse/HIVE-8008
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Chao
>Assignee: Chao
> Fix For: 0.14.0
>
> Attachments: HIVE-8008.2.patch, HIVE-8008.3.patch, HIVE-8008.4.patch, 
> HIVE-8008.patch
>
>
> Say you have this table {{dec_test}}:
> {code}
> dec   decimal(10,0)   
> {code}
> If the table has a row that is 99.5, and if we do
> {code}
> select * from dec_test;
> {code}
> it will crash with NPE:
> {code}
> 2014-09-05 14:08:56,023 ERROR [main]: CliDriver 
> (SessionState.java:printError(545)) - Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
>   ... 12 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
>   at 
> org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
>   at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39)
>   at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
>   ... 19 more
> {code}
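The failure mode in this trace is a serializer dereferencing a decimal value that came back null from the reader. The snippet below is a hedged, self-contained illustration of the pattern and its fix; the class and method names are hypothetical stand-ins, not Hive's actual LazyUtils/LazyHiveDecimal code, and it assumes (for illustration only) that the lazy reader nulls out values whose scale exceeds the declared scale instead of rounding:

```java
import java.math.BigDecimal;

// Hypothetical stand-ins illustrating the NPE pattern; not Hive's real serde code.
class DecimalField {
    // Mimics a reader that yields null when the text's scale exceeds the
    // declared scale (assumption for illustration): "99.5" as decimal(10,0) -> null.
    static BigDecimal parse(String text, int declaredScale) {
        BigDecimal d = new BigDecimal(text);
        return d.scale() > declaredScale ? null : d;
    }

    // Buggy shape: throws NullPointerException when the reader produced null.
    static String serializeUnsafe(BigDecimal value) {
        return value.toPlainString();
    }

    // Fixed shape: emit the serde's null sequence (e.g. "\N") instead of
    // dereferencing a null value.
    static String serializeSafe(BigDecimal value, String nullSequence) {
        return value == null ? nullSequence : value.toPlainString();
    }
}
```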



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-09-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125153#comment-14125153
 ] 

Lefty Leverenz commented on HIVE-7223:
--

The Admin manual has Thrift Hive Server, and the user docs have a pathetic 
little section in Hive Client.

* [Administrator Docs -- Thrift Hive Server | 
https://cwiki.apache.org/confluence/display/Hive/HiveServer]
* [User Docs -- Hive Client -- Thrift | 
https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Thrift]

> Support generic PartitionSpecs in Metastore partition-functions
> ---
>
> Key: HIVE-7223
> URL: https://issues.apache.org/jira/browse/HIVE-7223
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 0.14.0
>
> Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, 
> HIVE-7223.4.patch, HIVE-7223.5.patch
>
>
> Currently, the functions in the HiveMetaStore API that handle multiple 
> partitions do so using {{List<Partition>}}. E.g. 
> {code}
> public List<Partition> listPartitions(String db_name, String tbl_name, short 
> max_parts);
> public List<Partition> listPartitionsByFilter(String db_name, String 
> tbl_name, String filter, short max_parts);
> public int add_partitions(List<Partition> new_parts);
> {code}
> Partition objects are fairly heavyweight, since each Partition carries its 
> own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
> thousands of partitions take so long to have their partitions listed that the 
> client times out with default hive.metastore.client.socket.timeout. There is 
> the additional expense of serializing and deserializing metadata for large 
> sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
> should help in this regard.
> In a date-partitioned table, all sub-partitions for a particular date are 
> *likely* (but not expected) to have:
> # The same base directory (e.g. {{/feeds/search/20140601/}})
> # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
> # The same SerDe/StorageHandler/IOFormat classes
> # Sorting/Bucketing/SkewInfo settings
> In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
> represent the partition-list (for a date) in a more condensed form: a list of 
> LighterPartition instances, all sharing a common StorageDescriptor whose 
> location points to the root directory. 
> We can go one better for the {{add_partitions()}} case: When adding all 
> partitions for a given date, the “normal” case affords us the ability to 
> specify the top-level date-directory, where sub-partitions can be inferred 
> from the HDFS directory-path.
> These extensions are hard to introduce at the metastore-level, since 
> partition-functions explicitly specify {{List<Partition>}} arguments. I 
> wonder if a {{PartitionSpec}} interface might help:
> {code}
> public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
> ; 
> public int add_partitions( PartitionSpec new_parts ) throws … ;
> {code}
> where the PartitionSpec looks like:
> {code}
> public interface PartitionSpec {
> public List<Partition> getPartitions();
> public List<String> getPartNames();
> public Iterator<Partition> getPartitionIter();
> public Iterator<String> getPartNameIter();
> }
> {code}
> For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
> {{PartitionSpec}}, store a top-level directory, and return Partition 
> instances from sub-directory names, while storing a single StorageDescriptor 
> for all of them.
> Similarly, list_partitions() could return a {{List<PartitionSpec>}}, where each 
> PartitionSpec corresponds to a set of partitions that can share a 
> StorageDescriptor.
> By exposing iterator semantics, neither the client nor the metastore need 
> instantiate all partitions at once. That should help with memory requirements.
> In case no smart grouping is possible, we could just fall back on a 
> {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
> than status quo.
> PartitionSpec abstracts away how a set of partitions may be represented. A 
> tighter representation allows us to communicate metadata for a larger number 
> of Partitions, with less Thrift traffic.
> Given that Thrift doesn’t support polymorphism, we’d have to implement the 
> PartitionSpec as a Thrift Union of supported implementations. (We could 
> convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
> sub-class.)
> Thoughts?
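The fallback case described above can be sketched concretely. This is a minimal, hedged illustration of the proposed interface plus a {{DefaultPartitionSpec}} that merely composes a {{List<Partition>}}; the {{Partition}} stand-in here is deliberately simplified (the real metastore class carries a StorageDescriptor, values, parameters, etc.):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the metastore Partition object.
class Partition {
    private final String name;
    Partition(String name) { this.name = name; }
    String getName() { return name; }
}

// The proposed abstraction: how a set of partitions is represented is hidden
// behind this interface, so tighter encodings can be swapped in later.
interface PartitionSpec {
    List<Partition> getPartitions();
    List<String> getPartNames();
    Iterator<Partition> getPartitionIter();
    Iterator<String> getPartNameIter();
}

// Fallback implementation that simply composes a List<Partition> -- no worse
// than the status quo when no smart grouping is possible.
class DefaultPartitionSpec implements PartitionSpec {
    private final List<Partition> partitions;
    DefaultPartitionSpec(List<Partition> partitions) { this.partitions = partitions; }
    public List<Partition> getPartitions() { return partitions; }
    public List<String> getPartNames() {
        List<String> names = new ArrayList<>();
        for (Partition p : partitions) { names.add(p.getName()); }
        return names;
    }
    public Iterator<Partition> getPartitionIter() { return partitions.iterator(); }
    public Iterator<String> getPartNameIter() { return getPartNames().iterator(); }
}
```

An {{HDFSDirBasedPartitionSpec}} would implement the same interface while storing only a root directory and a single shared StorageDescriptor.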



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-7818) Support boolean PPD for ORC

2014-09-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned HIVE-7818:


Assignee: Daniel Dai

> Support boolean PPD for ORC
> ---
>
> Key: HIVE-7818
> URL: https://issues.apache.org/jira/browse/HIVE-7818
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-7818.1.patch
>
>
> Currently ORC does collect stats for boolean fields. However, the boolean 
> stats are not range-based; instead, ORC collects a count of true records. 
> RecordReaderImpl.evaluatePredicate currently only handles range-based stats; 
> we need to improve it to handle the boolean stats as well.
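The count-based evaluation described above can be sketched as follows. This is a hedged illustration, not the actual patch: {{TruthValue}} mirrors ORC's SearchArgument truth values in spirit but is simplified, and null handling is ignored here:

```java
// Hedged sketch of boolean predicate pushdown over count-based column stats.
// Simplified version of ORC's SearchArgument truth values (nulls ignored).
enum TruthValue { YES, NO, YES_NO }

class BooleanColumnStats {
    final long rowCount;
    final long trueCount; // ORC records how many values are true, not a min/max range
    BooleanColumnStats(long rowCount, long trueCount) {
        this.rowCount = rowCount;
        this.trueCount = trueCount;
    }
}

class BooleanPpd {
    // Evaluate the predicate (col = literal) against the stats of one
    // stripe/row group; NO means the reader may skip it entirely.
    static TruthValue evaluateEquals(BooleanColumnStats stats, boolean literal) {
        long matching = literal ? stats.trueCount : stats.rowCount - stats.trueCount;
        if (matching == 0) return TruthValue.NO;               // no row can match: skip
        if (matching == stats.rowCount) return TruthValue.YES; // every row matches
        return TruthValue.YES_NO;                              // must read and check
    }
}
```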



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7818) Support boolean PPD for ORC

2014-09-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-7818:
-
Attachment: HIVE-7818.1.patch

> Support boolean PPD for ORC
> ---
>
> Key: HIVE-7818
> URL: https://issues.apache.org/jira/browse/HIVE-7818
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
> Attachments: HIVE-7818.1.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-09-07 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125148#comment-14125148
 ] 

Alan Gates commented on HIVE-7223:
--

Is there a section of the wiki just on the Thrift interface?  In my opinion we 
should be encouraging users to come through the CLI, HS2, or WebHCat, not 
directly through Thrift, since that bypasses a lot of security.

> Support generic PartitionSpecs in Metastore partition-functions
> ---
>
> Key: HIVE-7223
> URL: https://issues.apache.org/jira/browse/HIVE-7223
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 0.14.0
>
> Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, 
> HIVE-7223.4.patch, HIVE-7223.5.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-09-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125147#comment-14125147
 ] 

Lefty Leverenz commented on HIVE-6847:
--

Doc note:  This changes the default value and description for 
*hive.exec.scratchdir*, which is already documented in the wiki (in Configuring 
Hive as well as Configuration Properties).  It also gives a description for 
*hive.scratch.dir.permission*, which is not documented in the wiki yet.  No 
change is made to *hive.downloaded.resources.dir*, it's just removed in one 
place and added elsewhere.

Here are all the places in the wiki that *hive.exec.scratchdir* is documented:

* [AdminManual Configuration -- Configuring Hive | 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive]
* [AdminManual Configuration -- Temporary Folders | 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-TemporaryFolders]
* [AdminManual Configuration -- Configuration Variables | 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfigurationVariables]
* [Configuration Properties -- hive.exec.scratchdir | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.scratchdir]
* example of using hive.exec.scratchdir (2nd bullet):  [Hive CLI -- Examples | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-Examples]

> Improve / fix bugs in Hive scratch dir setup
> 
>
> Key: HIVE-6847
> URL: https://issues.apache.org/jira/browse/HIVE-6847
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vaibhav Gumashta
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-6847.1.patch, HIVE-6847.10.patch, 
> HIVE-6847.11.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch, 
> HIVE-6847.5.patch, HIVE-6847.6.patch, HIVE-6847.7.patch, HIVE-6847.8.patch, 
> HIVE-6847.9.patch
>
>
> Currently, the Hive server creates the scratch directory and changes its 
> permissions to 777; however, this is not great with respect to security. We 
> need to create user-specific scratch directories instead. Also refer to the 
> first iteration of the patch on HIVE-6782 for the approach.
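The fix direction described above (per-user scratch directories instead of a shared 777 root) can be sketched with plain java.nio; this is only an illustration under that assumption — Hive itself goes through the Hadoop FileSystem API, and the names here are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

class ScratchDirs {
    // Create <root>/<user> with 700 permissions, so one user's session files
    // are private to that user rather than living under a world-writable root.
    static Path createUserScratchDir(Path root, String user) throws IOException {
        Path userDir = root.resolve(user);
        Files.createDirectories(userDir);
        Files.setPosixFilePermissions(userDir,
                PosixFilePermissions.fromString("rwx------"));
        return userDir;
    }
}
```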



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-09-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7223:
-
Fix Version/s: 0.14.0

> Support generic PartitionSpecs in Metastore partition-functions
> ---
>
> Key: HIVE-7223
> URL: https://issues.apache.org/jira/browse/HIVE-7223
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 0.14.0
>
> Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, 
> HIVE-7223.4.patch, HIVE-7223.5.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8016) CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code cleanup

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125141#comment-14125141
 ] 

Hive QA commented on HIVE-8016:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667114/HIVE-8016.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/686/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/686/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-686/

Messages:
{noformat}
 This message was trimmed, see log for full details 
Reverted 'ql/src/test/results/clientpositive/groupby5_map.q.out'
Reverted 'ql/src/test/results/clientpositive/order.q.out'
Reverted 'ql/src/test/results/clientpositive/num_op_type_conv.q.out'
Reverted 'ql/src/test/results/clientpositive/ppd_join3.q.out'
Reverted 'ql/src/test/results/clientpositive/skewjoin.q.out'
Reverted 'ql/src/test/results/clientpositive/mapjoin_distinct.q.out'
Reverted 'ql/src/test/results/clientpositive/vector_coalesce.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_15.q.out'
Reverted 'ql/src/test/results/clientpositive/input17.q.out'
Reverted 'ql/src/test/results/clientpositive/groupby1.q.out'
Reverted 'ql/src/test/results/clientpositive/udf_case.q.out'
Reverted 'ql/src/test/results/clientpositive/udf_inline.q.out'
Reverted 'ql/src/test/results/clientpositive/alias_casted_column.q.out'
Reverted 'ql/src/test/results/clientpositive/mapreduce7.q.out'
Reverted 'ql/src/test/results/clientpositive/subquery_views.q.out'
Reverted 'ql/src/test/results/clientpositive/udf_explode.q.out'
Reverted 'ql/src/test/results/clientpositive/udf_abs.q.out'
Reverted 'ql/src/test/results/clientpositive/ppd_udtf.q.out'
Reverted 'ql/src/test/results/clientpositive/groupby2_map.q.out'
Reverted 'ql/src/test/results/clientpositive/input26.q.out'
Reverted 'ql/src/test/results/clientpositive/rand_partitionpruner2.q.out'
Reverted 'ql/src/test/results/clientpositive/optimize_nullscan.q.out'
Reverted 'ql/src/test/results/clientpositive/constprog_dp.q.out'
Reverted 'ql/src/test/results/clientpositive/input12.q.out'
Reverted 'ql/src/test/results/clientpositive/input35.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_join_without_localtask.q.out'
Reverted 'ql/src/test/results/clientpositive/subquery_multiinsert.q.out'
Reverted 'ql/src/test/results/clientpositive/orc_merge1.q.out'
Reverted 'ql/src/test/results/clientpositive/mapreduce2.q.out'
Reverted 'ql/src/test/results/clientpositive/index_auto_self_join.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_join16.q.out'
Reverted 'ql/src/test/results/clientpositive/bucket3.q.out'
Reverted 'ql/src/test/results/clientpositive/merge2.q.out'
Reverted 'ql/src/test/results/clientpositive/input_lazyserde.q.out'
Reverted 'ql/src/test/results/clientpositive/groupby_ppr_multi_distinct.q.out'
Reverted 'ql/src/test/results/clientpositive/udf_count.q.out'
Reverted 'ql/src/test/results/clientpositive/input9.q.out'
Reverted 'ql/src/test/results/clientpositive/groupby_duplicate_key.q.out'
Reverted 'ql/src/test/results/clientpositive/union17.q.out'
Reverted 'ql/src/test/results/clientpositive/subq_where_serialization.q.out'
Reverted 'ql/src/test/results/clientpositive/input30.q.out'
Reverted 'ql/src/test/results/clientpositive/udf_locate.q.out'
Reverted 'ql/src/test/results/clientpositive/join18.q.out'
Reverted 'ql/src/test/results/clientpositive/nullgroup4.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_join11.q.out'
Reverted 'ql/src/test/results/clientpositive/subq2.q.out'
Reverted 'ql/src/test/results/clientpositive/parallel.q.out'
Reverted 'ql/src/test/results/clientpositive/udf_instr.q.out'
Reverted 'ql/src/test/results/clientpositive/union26.q.out'
Reverted 'ql/src/test/results/clientpositive/fetch_aggregation.q.out'
Reverted 'ql/src/test/results/clientpositive/ppd1.q.out'
Reverted 'ql/src/test/results/clientpositive/groupby8_map_skew.q.out'
Reverted 'ql/src/test/results/clientpositive/join27.q.out'
Reverted 'ql/src/test/results/clientpositive/cross_join.q.out'
Reverted 'ql/src/test/results/clientpositive/ppd_random.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_join20.q.out'
Reverted 'ql/src/test/results/clientpositive/union12.q.out'
Reverted 'ql/src/test/results/clientpositive/join13.q.out'
Reverted 'ql/src/test/results/clientpositive/input_part5.q.out'
Reverted 'ql/src/test/results/clientpositive/insert_into2.q.out'
Reverted 'ql/src/test/results/clientpositive/groupby12.q.out'
Reverted 'ql/src/test/results/clientpositive/union21.q.out'
Reverted 'ql/src/test/results/clientpositive/join22.q.out'
Reverted 'ql/src/test/results/clientpositive/nonblock_op_deduplicate.q.out'
Reverted 'ql/src/test/results/client

[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Status: Patch Available  (was: Open)

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.4.patch, HIVE-7946.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Attachment: HIVE-7946.4.patch

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.4.patch, HIVE-7946.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-09-07 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6847:
-
Labels: TODOC14  (was: )

> Improve / fix bugs in Hive scratch dir setup
> 
>
> Key: HIVE-6847
> URL: https://issues.apache.org/jira/browse/HIVE-6847
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vaibhav Gumashta
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-6847.1.patch, HIVE-6847.10.patch, 
> HIVE-6847.11.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch, 
> HIVE-6847.5.patch, HIVE-6847.6.patch, HIVE-6847.7.patch, HIVE-6847.8.patch, 
> HIVE-6847.9.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8015) Merge from trunk (3) [Spark Branch]

2014-09-07 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8015:
---
Attachment: (was: HIVE-8015.1-spark.patch)

> Merge from trunk (3) [Spark Branch]
> ---
>
> Key: HIVE-8015
> URL: https://issues.apache.org/jira/browse/HIVE-8015
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Brock Noland
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8016) CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code cleanup

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8016:
-
Status: Patch Available  (was: Open)

> CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code 
> cleanup
> -
>
> Key: HIVE-8016
> URL: https://issues.apache.org/jira/browse/HIVE-8016
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-8016.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8016) CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code cleanup

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8016:
-
Attachment: HIVE-8016.1.patch

> CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code 
> cleanup
> -
>
> Key: HIVE-8016
> URL: https://issues.apache.org/jira/browse/HIVE-8016
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-8016.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-09-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125116#comment-14125116
 ] 

Lefty Leverenz commented on HIVE-7223:
--

Does this need to be documented in the wiki?

(Also, Fix Version/s should be 0.14.0.)

> Support generic PartitionSpecs in Metastore partition-functions
> ---
>
> Key: HIVE-7223
> URL: https://issues.apache.org/jira/browse/HIVE-7223
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, 
> HIVE-7223.4.patch, HIVE-7223.5.patch
>
>
> Currently, the functions in the HiveMetaStore API that handle multiple 
> partitions do so using List. E.g. 
> {code}
> public List listPartitions(String db_name, String tbl_name, short 
> max_parts);
> public List listPartitionsByFilter(String db_name, String 
> tbl_name, String filter, short max_parts);
> public int add_partitions(List new_parts);
> {code}
> Partition objects are fairly heavyweight, since each Partition carries its 
> own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
> thousands of partitions take so long to have their partitions listed that the 
> client times out with default hive.metastore.client.socket.timeout. There is 
> the additional expense of serializing and deserializing metadata for large 
> sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
> should help in this regard.
> In a date-partitioned table, all sub-partitions for a particular date are 
> *likely* (but not expected) to have:
> # The same base directory (e.g. {{/feeds/search/20140601/}})
> # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
> # The same SerDe/StorageHandler/IOFormat classes
> # Sorting/Bucketing/SkewInfo settings
> In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
> represent the partition-list (for a date) in a more condensed form: a list of 
> LighterPartition instances, all sharing a common StorageDescriptor whose 
> location points to the root directory. 
> We can go one better for the {{add_partitions()}} case: When adding all 
> partitions for a given date, the “normal” case affords us the ability to 
> specify the top-level date-directory, where sub-partitions can be inferred 
> from the HDFS directory-path.
> These extensions are hard to introduce at the metastore-level, since 
> partition-functions explicitly specify {{List<Partition>}} arguments. I 
> wonder if a {{PartitionSpec}} interface might help:
> {code}
> public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
> ; 
> public int add_partitions( PartitionSpec new_parts ) throws … ;
> {code}
> where the PartitionSpec looks like:
> {code}
> public interface PartitionSpec {
> public List<Partition> getPartitions();
> public List<String> getPartNames();
> public Iterator<Partition> getPartitionIter();
> public Iterator<String> getPartNameIter();
> }
> {code}
> For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
> {{PartitionSpec}}, store a top-level directory, and return Partition 
> instances from sub-directory names, while storing a single StorageDescriptor 
> for all of them.
> Similarly, list_partitions() could return a List<PartitionSpec>, where each 
> PartitionSpec corresponds to a set of partitions that can share a 
> StorageDescriptor.
> By exposing iterator semantics, neither the client nor the metastore need 
> instantiate all partitions at once. That should help with memory requirements.
> In case no smart grouping is possible, we could just fall back on a 
> {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
> than status quo.
> PartitionSpec abstracts away how a set of partitions may be represented. A 
> tighter representation allows us to communicate metadata for a larger number 
> of Partitions, with less Thrift traffic.
> Given that Thrift doesn’t support polymorphism, we’d have to implement the 
> PartitionSpec as a Thrift Union of supported implementations. (We could 
> convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
> sub-class.)
> Thoughts?
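The proposed interface can be sketched concretely as follows. This is a hypothetical illustration only (the eventual Thrift-backed API may differ), and `Partition` is stubbed as a plain value class rather than the real `org.apache.hadoop.hive.metastore.api.Partition`:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

// Stub standing in for org.apache.hadoop.hive.metastore.api.Partition.
class Partition {
    final String name;
    Partition(String name) { this.name = name; }
}

interface PartitionSpec {
    List<Partition> getPartitions();
    List<String> getPartNames();
    Iterator<Partition> getPartitionIter();
    Iterator<String> getPartNameIter();
}

// Fallback when no smart grouping is possible: simply composes List<Partition>,
// which is no worse than the status quo.
class DefaultPartitionSpec implements PartitionSpec {
    private final List<Partition> partitions;
    DefaultPartitionSpec(List<Partition> partitions) { this.partitions = partitions; }
    public List<Partition> getPartitions() { return partitions; }
    public List<String> getPartNames() {
        return partitions.stream().map(p -> p.name).collect(Collectors.toList());
    }
    public Iterator<Partition> getPartitionIter() { return partitions.iterator(); }
    public Iterator<String> getPartNameIter() { return getPartNames().iterator(); }
}

public class PartitionSpecSketch {
    public static void main(String[] args) {
        PartitionSpec spec = new DefaultPartitionSpec(
            Arrays.asList(new Partition("dt=20140601/region=US"),
                          new Partition("dt=20140601/region=UK")));
        System.out.println(spec.getPartNames());
    }
}
```

An `HDFSDirBasedPartitionSpec` would implement the same interface but hold only a root directory and a single shared StorageDescriptor, materializing `Partition` instances lazily from sub-directory names via the iterator methods.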



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7976) Merge tez branch into trunk (tez 0.5.0)

2014-09-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125112#comment-14125112
 ] 

Lefty Leverenz commented on HIVE-7976:
--

Does this need any user or administrator documentation, other than the three 
parameters for dynamic partition pruning?  Should the design doc be updated?

* [design doc:  Hive on Tez | 
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez]

> Merge tez branch into trunk (tez 0.5.0)
> ---
>
> Key: HIVE-7976
> URL: https://issues.apache.org/jira/browse/HIVE-7976
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gopal V
> Fix For: 0.14.0
>
> Attachments: HIVE-7976.1.patch, HIVE-7976.2.patch, HIVE-7976.3.patch
>
>
> Tez 0.5.0 release is available now. 
> (https://repository.apache.org/content/repositories/releases/org/apache/tez/tez-api/0.5.0/)
> In Tez 0.5.0 a lot of APIs have changed, and we were doing dev against these 
> APIs in the tez branch until they became stable and available.
> [~gopalv] has been driving a lot of the API changes necessary, but [~sseth], 
> [~rajesh.balamohan], [~vikram.dixit] and myself have chimed in as well.
> One new feature (dynamic partition pruning, HIVE-7826) has also been parked 
> on this branch because of dependencies to APIs first released in 0.5.0.
> This ticket is to merge the tez branch back to trunk. I'll post patches for 
> review and for the unit tests to run, but once the required +1s are there the 
> goal is to merge to keep the history.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7976) Merge tez branch into trunk (tez 0.5.0)

2014-09-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125110#comment-14125110
 ] 

Lefty Leverenz commented on HIVE-7976:
--

See HIVE-7826 comment about parameter typo:  
*hive.tez.dynamic.parition.pruning.max.data.size* (parition).

* [HIVE-7826 comment | 
https://issues.apache.org/jira/browse/HIVE-7826?focusedCommentId=14125109&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14125109]

> Merge tez branch into trunk (tez 0.5.0)
> ---
>
> Key: HIVE-7976
> URL: https://issues.apache.org/jira/browse/HIVE-7976
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gopal V
> Fix For: 0.14.0
>
> Attachments: HIVE-7976.1.patch, HIVE-7976.2.patch, HIVE-7976.3.patch
>
>
> Tez 0.5.0 release is available now. 
> (https://repository.apache.org/content/repositories/releases/org/apache/tez/tez-api/0.5.0/)
> In Tez 0.5.0 a lot of APIs have changed, and we were doing dev against these 
> APIs in the tez branch until they became stable and available.
> [~gopalv] has been driving a lot of the API changes necessary, but [~sseth], 
> [~rajesh.balamohan], [~vikram.dixit] and myself have chimed in as well.
> One new feature (dynamic partition pruning, HIVE-7826) has also been parked 
> on this branch because of dependencies to APIs first released in 0.5.0.
> This ticket is to merge the tez branch back to trunk. I'll post patches for 
> review and for the unit tests to run, but once the required +1s are there the 
> goal is to merge to keep the history.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez

2014-09-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125109#comment-14125109
 ] 

Lefty Leverenz commented on HIVE-7826:
--

Typo alert:  *hive.tez.dynamic.parition.pruning.max.data.size* is misspelled 
(parition) here and in HIVE-7976 (merge Tez branch).  It's even misspelled in 
the description and the doc comment above.  So much for eagle eyes, sigh.

Does this need a new JIRA ticket or can it be fixed in HIVE-6586 (various 
HiveConf.java fixes)?  The string 
"hive.tez.dynamic.parition.pruning.max.data.size" only occurs once in each 
patch -- this one and the Tez merge.

> Dynamic partition pruning on Tez
> 
>
> Key: HIVE-7826
> URL: https://issues.apache.org/jira/browse/HIVE-7826
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>  Labels: TODOC14, tez
> Fix For: 0.14.0
>
> Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
> HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch
>
>
> It's natural in a star schema to map one or more dimensions to partition 
> columns. Time or location are likely candidates. 
> It can also be useful to compute the partitions one would like to scan via 
> a subquery (where p in select ... from ...).
> The resulting joins in Hive require a full table scan of the large table 
> though, because partition pruning takes place before the corresponding values 
> are known.
> On Tez it's relatively straightforward to send the values needed to prune to 
> the application master - where splits are generated and tasks are submitted. 
> Using these values we can strip out any unneeded partitions dynamically, 
> while the query is running.
> The approach is straightforward:
> - Insert synthetic conditions for each join representing "x in (keys of other 
> side in join)"
> - These conditions will be pushed as far down as possible
> - If the condition hits a table scan and the column involved is a partition 
> column:
>- Setup Operator to send key events to AM
> - else:
>- Remove synthetic predicate
> Add these properties:
> ||Property||Default Value||
> |{{hive.tez.dynamic.partition.pruning}}|true|
> |{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
> |{{hive.tez.dynamic.parition.pruning.max.data.size}}|100*1024*1024L|
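The pruning step described above boils down to an intersection: once the AM has collected the join-key values from the other side of the join, any partition whose partition-column value is absent from that set is dropped before splits are generated. A minimal sketch with hypothetical names (the real logic lives in Hive's Tez split generation, not in this form):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class DynamicPruneSketch {
    // Keep only partitions whose partition-column value appears among the
    // key values the other side of the join sent to the AM.
    static List<String> prune(Map<String, String> partitionToValue, Set<String> joinKeys) {
        return partitionToValue.entrySet().stream()
            .filter(e -> joinKeys.contains(e.getValue()))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("dt=2014-09-01", "2014-09-01");
        parts.put("dt=2014-09-02", "2014-09-02");
        parts.put("dt=2014-09-03", "2014-09-03");
        // Key values arriving from the dimension-side scan:
        Set<String> keys = new HashSet<>(Arrays.asList("2014-09-02"));
        System.out.println(prune(parts, keys));
    }
}
```

Without the runtime key values, all three partitions would have to be scanned; with them, only the matching partition survives.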



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7923) populate stats for test tables

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125103#comment-14125103
 ] 

Hive QA commented on HIVE-7923:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667108/HIVE-7923.6.patch

{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 6185 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.ql.parse.TestParse.testParse_case_sensitivity
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input3
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input7
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input9
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testsequencefile
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join3
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample3
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample7
org.apache.hadoop.hive.ql.parse.TestParse.testParse_subq
org.apache.hadoop.hive.ql.parse.TestParse.testParse_union
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/685/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/685/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-685/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12667108

> populate stats for test tables
> --
>
> Key: HIVE-7923
> URL: https://issues.apache.org/jira/browse/HIVE-7923
> Project: Hive
>  Issue Type: Improvement
>Reporter: pengcheng xiong
>Assignee: pengcheng xiong
>Priority: Minor
> Attachments: HIVE-7923.1.patch, HIVE-7923.2.patch, HIVE-7923.3.patch, 
> HIVE-7923.4.patch, HIVE-7923.5.patch, HIVE-7923.6.patch
>
>
> Currently q_test only generates tables (e.g., src) but does not populate 
> stats. All the test cases will fail in CBO because CBO depends on the 
> stats. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4725) templeton.hadoop.queue.name property should be documented

2014-09-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125096#comment-14125096
 ] 

Lefty Leverenz commented on HIVE-4725:
--

Whew, this has been waiting a long long time.  I'll get to it right after 
HIVE-6586 (HiveConf.java fixes), which is needed for the 0.14.0 release.

Thanks for reminding me, [~eugene.koifman].

> templeton.hadoop.queue.name property should be documented
> -
>
> Key: HIVE-4725
> URL: https://issues.apache.org/jira/browse/HIVE-4725
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation, WebHCat
>Affects Versions: 0.12.0
>Reporter: Eugene Koifman
>Assignee: Lefty Leverenz
>  Labels: TODOC12
>
> This is to track that changes in HIVE-4679 get documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125088#comment-14125088
 ] 

Laljo John Pullokkaran commented on HIVE-7946:
--

Lars,
 For the next few days we are working to get all the unit tests to pass.
After that, more code cleanup/reorg will follow.

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7923) populate stats for test tables

2014-09-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7923:
---
Attachment: HIVE-7923.6.patch

> populate stats for test tables
> --
>
> Key: HIVE-7923
> URL: https://issues.apache.org/jira/browse/HIVE-7923
> Project: Hive
>  Issue Type: Improvement
>Reporter: pengcheng xiong
>Assignee: pengcheng xiong
>Priority: Minor
> Attachments: HIVE-7923.1.patch, HIVE-7923.2.patch, HIVE-7923.3.patch, 
> HIVE-7923.4.patch, HIVE-7923.5.patch, HIVE-7923.6.patch
>
>
> Currently q_test only generates tables (e.g., src) but does not populate 
> stats. All the test cases will fail in CBO because CBO depends on the 
> stats. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7923) populate stats for test tables

2014-09-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7923:
---
Status: Patch Available  (was: Open)

> populate stats for test tables
> --
>
> Key: HIVE-7923
> URL: https://issues.apache.org/jira/browse/HIVE-7923
> Project: Hive
>  Issue Type: Improvement
>Reporter: pengcheng xiong
>Assignee: pengcheng xiong
>Priority: Minor
> Attachments: HIVE-7923.1.patch, HIVE-7923.2.patch, HIVE-7923.3.patch, 
> HIVE-7923.4.patch, HIVE-7923.5.patch, HIVE-7923.6.patch
>
>
> Currently q_test only generates tables (e.g., src) but does not populate 
> stats. All the test cases will fail in CBO because CBO depends on the 
> stats. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6147) Support avro data stored in HBase columns

2014-09-07 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-6147:
---
Attachment: HIVE-6147.6.patch.txt

Attaching refactored patch for testing.

> Support avro data stored in HBase columns
> -
>
> Key: HIVE-6147
> URL: https://issues.apache.org/jira/browse/HIVE-6147
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, 
> HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, 
> HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data 
> types in columns. It would be nice to be able to store and query Avro objects 
> in HBase columns by making them visible as structs to Hive. This will allow 
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7991) Incorrect calculation of number of rows in JoinStatsRule.process results in overflow

2014-09-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7991:
-
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Gunther and Mostafa for the review.

> Incorrect calculation of number of rows in JoinStatsRule.process results in 
> overflow
> 
>
> Key: HIVE-7991
> URL: https://issues.apache.org/jira/browse/HIVE-7991
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-7991.1.patch, HIVE-7991.2.patch
>
>
> This loop results in adding the parent twice in case of a 3-way join of 
> store_sales x date_dim x store
> {code}
>  for (int pos = 0; pos < parents.size(); pos++) {
> ReduceSinkOperator parent = (ReduceSinkOperator) 
> jop.getParentOperators().get(pos);
> Statistics parentStats = parent.getStatistics();
> List<ExprNodeDesc> keyExprs = parent.getConf().getKeyCols();
> // Parent RS may have column statistics from multiple parents.
> // Populate table alias to row count map, this will be used later 
> to
> // scale down/up column statistics based on new row count
> // NOTE: JOIN with UNION as parent of RS will not have table alias
> // propagated properly. UNION operator does not propagate the 
> table
> // alias of subqueries properly to expression nodes. Hence 
> union20.q
> // will have wrong number of rows.
> Set<String> tableAliases = 
> StatsUtils.getAllTableAlias(parent.getColumnExprMap());
> for (String tabAlias : tableAliases) {
>   rowCountParents.put(tabAlias, parentStats.getNumRows());
> }
> {code}
> In the first join we have rowCountParents with {store_sales=120464862, 
> date_dim=36524} which is correct.
> For the second join result rowCountParents ends up with {store=212, 
> store_sales=120464862, date_dim=120464862} where it should be {store=212, 
> store_sales=120464862, date_dim=36524}.
> The result of this is that computeNewRowCount ends up multiplying the row 
> count of store_sales by store_sales, which makes the number of rows very high 
> and eventually overflows.
> Plan snippet : 
> {code}
>Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (((ss_sold_date_sk is not null and ss_store_sk 
> is not null) and ss_item_sk is not null) and ss_sold_date BETWEEN 
> '1999-06-01' AND '2000-05-31') (type: boolean)
>   Statistics: Num rows: 110339135 Data size: 4817453454 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((ss_sold_date_sk is not null and ss_store_sk 
> is not null) and ss_item_sk is not null) (type: boolean)
> Statistics: Num rows: 107740258 Data size: 2124353556 
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {ss_sold_date_sk} {ss_item_sk} {ss_store_sk} 
> {ss_quantity} {ss_sales_price} {ss_sold_date}
> 1 {d_date_sk} {d_month_seq} {d_year} {d_moy} {d_qoy}
>   keys:
> 0 ss_sold_date_sk (type: int)
> 1 d_date_sk (type: int)
>   outputColumnNames: _col0, _col2, _col7, _col10, _col13, 
> _col23, _col27, _col30, _col33, _col35, _col37
>   input vertices:
> 1 Map 6
>   Statistics: Num rows: 120464862 Data size: 26984129088 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> condition expressions:
>   0 {_col0} {_col2} {_col7} {_col10} {_col13} 
> {_col23} {_col27} {_col30} {_col33} {_col35} {_col37}
>   1 {s_store_sk} {s_store_id}
> keys:
>   0 _col7 (type: int)
>   1 s_store_sk (type: int)
> outputColumnNames: _col0, _col2, _col7, _col10, 
> _col13, _col23, _col27, _col30, _col33, _col35, _col37, _col58, _col59
> input vertices:
>   1 Map 5
> Statistics: Num rows: 17886
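The bookkeeping error described above can be reproduced with a plain map. This is a toy illustration of the overwrite, not Hive code: after the second join, every table alias reachable through the first join's ReduceSink receives the first join's output row count, clobbering date_dim's correct count:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RowCountOverwriteSketch {
    // Reproduces the rowCountParents bookkeeping for the 3-way join
    // store_sales x date_dim x store described in the issue.
    static Map<String, Long> afterSecondJoin() {
        Map<String, Long> rowCountParents = new HashMap<>();

        // Join 1: store_sales x date_dim -- each side contributes its own count.
        rowCountParents.put("store_sales", 120_464_862L);
        rowCountParents.put("date_dim", 36_524L);

        // Join 2: the ReduceSink above the first join carries BOTH aliases,
        // so the loop writes the *joined* row count for each alias it sees,
        // clobbering date_dim's correct count of 36,524.
        long join1OutputRows = 120_464_862L;
        for (String alias : Arrays.asList("store_sales", "date_dim")) {
            rowCountParents.put(alias, join1OutputRows);
        }
        rowCountParents.put("store", 212L);
        return rowCountParents;
    }

    public static void main(String[] args) {
        // date_dim should still map to 36524; the overwrite leaves 120464862,
        // which later multiplies into the overflow.
        System.out.println(afterSecondJoin());
    }
}
```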

[jira] [Updated] (HIVE-7990) With fetch column stats disabled number of elements in grouping set is not taken into account

2014-09-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7990:
-
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Gunther for review.

> With fetch column stats disabled number of elements in grouping set is not 
> taken into account
> -
>
> Key: HIVE-7990
> URL: https://issues.apache.org/jira/browse/HIVE-7990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
> Fix For: 0.14.0
>
> Attachments: HIVE-7990.1.patch, HIVE-7990.2.patch, HIVE-7990.3.patch
>
>
> For queries with rollup and cube the number of rows calculation in 
> GroupByStatsRule should be multiplied by number of elements in grouping set.
> A side effect of this defect is that reducers will underestimate the data 
> size and end up with a small number of tasks, which negatively affects query 
> runtime. 
>  
> {code}
> // since we do not know if hash-aggregation will be enabled or 
> disabled
> // at runtime we will assume that map-side group by does not do 
> any
> // reduction.hence no group by rule will be applied
> // map-side grouping set present. if grouping set is present then
> // multiply the number of rows by number of elements in grouping 
> set
> if (gop.getConf().isGroupingSetsPresent()) {
>   int multiplier = gop.getConf().getListGroupingSets().size();
>   // take into account the map-side parallelism as well, default 
> is 1
>   multiplier *= mapSideParallelism;
>   newNumRows = multiplier * stats.getNumRows();
>   long dataSize = multiplier * stats.getDataSize();
>   stats.setNumRows(newNumRows);
>   stats.setDataSize(dataSize);
>   for (ColStatistics cs : colStats) {
> if (cs != null) {
>   long oldNumNulls = cs.getNumNulls();
>   long newNumNulls = multiplier * oldNumNulls;
>   cs.setNumNulls(newNumNulls);
> }
>   }
> } else {
>   // map side no grouping set
>   newNumRows = stats.getNumRows() * mapSideParallelism;
>   updateStats(stats, newNumRows, true);
> }
>   
> {code}
> Query 
> {code}
> select  *
> from (select i_category
> ,i_class
> ,i_brand
> ,i_product_name
> ,d_year
> ,d_qoy
> ,d_moy
> ,s_store_id
> ,sumsales
> ,rank() over (partition by i_category order by sumsales desc) rk
>   from (select i_category
>   ,i_class
>   ,i_brand
>   ,i_product_name
>   ,d_year
>   ,d_qoy
>   ,d_moy
>   ,s_store_id
>   ,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
> from store_sales
> ,date_dim
> ,store
> ,item
>where  store_sales.ss_sold_date_sk=date_dim.d_date_sk
>   and store_sales.ss_item_sk=item.i_item_sk
>   and store_sales.ss_store_sk = store.s_store_sk
>   and d_month_seq between 1193 and 1193+11
>   and ss_sold_date between '1999-06-01' and '2000-05-31'
>group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, 
> d_moy,s_store_id with rollup)dw1) dw2
> where rk <= 100
> order by i_category
> ,i_class
> ,i_brand
> ,i_product_name
> ,d_year
> ,d_qoy
> ,d_moy
> ,s_store_id
> ,sumsales
> ,rk
> limit 100
> {code}
> Plan generated , note the data size for Map 1
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 
> (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
>   DagName: mmokhtar_20140903154848_7cf1519f-e95c-47ab-9f10-6d2130cd5734:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (((ss_sold_date_sk is not null and ss_store_sk 
> is not null) and ss_item_sk is not null) and ss_sold_date BETWEEN 
> '1999-06-01' AND '2000-05-31') (type: boolean)
>   Statistics: Num rows: 110339135 Data 
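The intended arithmetic is simple: a rollup over k grouping columns produces a grouping set with k+1 elements, and the map-side row and data-size estimates must be scaled by that multiplier (times the map-side parallelism). A hedged sketch of just the scaling step, with hypothetical names:

```java
public class GroupingSetScaleSketch {
    // numGroupingSets: elements in the grouping set
    // (a rollup over k columns yields k + 1 grouping sets).
    // Since we don't know if map-side hash aggregation will reduce rows,
    // assume no reduction and scale the raw estimate.
    static long scaledRows(long numRows, int numGroupingSets, int mapSideParallelism) {
        return (long) numGroupingSets * mapSideParallelism * numRows;
    }

    public static void main(String[] args) {
        // The query above rolls up over 8 columns -> 9 grouping sets,
        // so Map 1's 110,339,135 input rows scale accordingly.
        System.out.println(scaledRows(110_339_135L, 9, 1));
    }
}
```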

[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125000#comment-14125000
 ] 

Lars Francke commented on HIVE-7946:


Thanks for addressing all those issues. It's a bit hard to tell whether you're 
looking for proper reviews or still working through "internal" issues. Could 
you maybe give me/us a quick update on your plans/expectations around this 
issue?

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8016) CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code cleanup

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8016:
-
Attachment: (was: HIVE-8016.patch)

> CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code 
> cleanup
> -
>
> Key: HIVE-8016
> URL: https://issues.apache.org/jira/browse/HIVE-8016
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8016) CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code cleanup

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8016:
-
Attachment: (was: HIVE-8016.1.patch)

> CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code 
> cleanup
> -
>
> Key: HIVE-8016
> URL: https://issues.apache.org/jira/browse/HIVE-8016
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-8016.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8016) CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code cleanup

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8016:
-
Status: Open  (was: Patch Available)

> CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code 
> cleanup
> -
>
> Key: HIVE-8016
> URL: https://issues.apache.org/jira/browse/HIVE-8016
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-8016.1.patch, HIVE-8016.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Attachment: (was: HIVE-7946.4.patch)

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Status: Open  (was: Patch Available)

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.4.patch, HIVE-7946.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124955#comment-14124955
 ] 

Hive QA commented on HIVE-7946:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667088/HIVE-7946.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/684/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/684/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-684/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-exec ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-exec ---
[INFO] Compiling 1882 source files to 
/data/hive-ptest/working/apache-svn-trunk-source/ql/target/classes
[INFO] -
[WARNING] COMPILATION WARNING : 
[INFO] -
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Query.java:
 Some input files use unchecked or unsafe operations.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Query.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 4 warnings 
[INFO] -
[INFO] -
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:[117,49]
 cannot find symbol
  symbol:   class HiveTraitsUtil
  location: package org.apache.hadoop.hive.ql.optimizer.optiq
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveTableScanRel.java:[23,49]
 cannot find symbol
  symbol:   class HiveTraitsUtil
  location: package org.apache.hadoop.hive.ql.optimizer.optiq
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveFilterRel.java:[20,49]
 cannot find symbol
  symbol:   class HiveTraitsUtil
  location: package org.apache.hadoop.hive.ql.optimizer.optiq
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveAggregateRel.java:[23,49]
 cannot find symbol
  symbol:   class HiveTraitsUtil
  location: package org.apache.hadoop.hive.ql.optimizer.optiq
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveProjectRel.java:[27,49]
 cannot find symbol
  symbol:   class HiveTraitsUtil
  location: package org.apache.hadoop.hive.ql.optimizer.optiq
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveSortRel.java:[20,49]
 cannot find symbol
  symbol:   class HiveTraitsUtil
  location: package org.apache.hadoop.hive.ql.optimizer.optiq
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveLimitRel.java:[22,49]
 cannot find symbol
  symbol:   class HiveTraitsUtil
  location: package org.apache.hadoop.hive.ql.optimizer.optiq
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveJoinRel.java:[23,49]
 cannot find symbol
  symbol:   class HiveTraitsUtil
  location: package org.apache.hadoop.hive.ql.optimizer.optiq
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:[12139,46]
 cannot find symbol
  symbol:   variable HiveTraitsUtil
  location: class 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.OptiqBasedPlanner
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/

[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Status: Patch Available  (was: Open)

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.4.patch, HIVE-7946.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Attachment: (was: HIVE-7946.4.patch)

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.4.patch, HIVE-7946.patch
>
>






[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Attachment: HIVE-7946.4.patch

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.4.patch, HIVE-7946.patch
>
>






[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Status: Open  (was: Patch Available)

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.4.patch, HIVE-7946.patch
>
>






[jira] [Commented] (HIVE-7935) Support dynamic service discovery for HiveServer2

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124910#comment-14124910
 ] 

Hive QA commented on HIVE-7935:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667077/HIVE-7935.2.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/683/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/683/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-683/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ 
hive-hcatalog-it-unit ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
hive-hcatalog-it-unit ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-hcatalog-it-unit ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ 
hive-hcatalog-it-unit ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/target/tmp/conf
 [copy] Copying 7 files to 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-hcatalog-it-unit ---
[INFO] Compiling 6 source files to 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/target/test-classes
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/hbase/ManyMiniCluster.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/hbase/ManyMiniCluster.java:
 Recompile with -Xlint:deprecation for details.
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
hive-hcatalog-it-unit ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-hcatalog-it-unit ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/target/hive-hcatalog-it-unit-0.14.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-hcatalog-it-unit ---
[INFO] 
[INFO] --- maven-jar-plugin:2.2:test-jar (default) @ hive-hcatalog-it-unit ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/target/hive-hcatalog-it-unit-0.14.0-SNAPSHOT-tests.jar
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ 
hive-hcatalog-it-unit ---
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/target/hive-hcatalog-it-unit-0.14.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-hcatalog-it-unit/0.14.0-SNAPSHOT/hive-hcatalog-it-unit-0.14.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/pom.xml 
to 
/data/hive-ptest/working/maven/org/apache/hive/hive-hcatalog-it-unit/0.14.0-SNAPSHOT/hive-hcatalog-it-unit-0.14.0-SNAPSHOT.pom
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/itests/hcatalog-unit/target/hive-hcatalog-it-unit-0.14.0-SNAPSHOT-tests.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-hcatalog-it-unit/0.14.0-SNAPSHOT/hive-hcatalog-it-unit-0.14.0-SNAPSHOT-tests.jar
[INFO] 
[INFO] 
[INFO] Building Hive Integration - Testing Utilities 0.14.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-it-util ---
[INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/itests/util 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[I

Re: Review Request 25245: Support dynamic service discovery for HiveServer2

2014-09-07 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25245/
---

(Updated Sept. 7, 2014, 1:35 p.m.)


Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair.


Bugs: HIVE-7935
https://issues.apache.org/jira/browse/HIVE-7935


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-7935


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a0a5f54 
  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 
  jdbc/src/java/org/apache/hive/jdbc/HiveDriver.java 6e248d6 
  jdbc/src/java/org/apache/hive/jdbc/JdbcUriParseException.java PRE-CREATION 
  jdbc/src/java/org/apache/hive/jdbc/Utils.java 58339bf 
  jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientException.java 
PRE-CREATION 
  jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
 0919d2f 
  ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java
 59294b1 
  service/src/java/org/apache/hive/service/cli/CLIService.java a0bc905 
  service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
f5a8f27 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
0b5ef12 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
11d25cc 
  
service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 
2b80adc 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
443c371 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 
4067106 
  service/src/java/org/apache/hive/service/server/HiveServer2.java 124996c 
  
service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java
 66fc1fc 

Diff: https://reviews.apache.org/r/25245/diff/


Testing (updated)
---

Manual testing.


Thanks,

Vaibhav Gumashta



[jira] [Commented] (HIVE-7935) Support dynamic service discovery for HiveServer2

2014-09-07 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124893#comment-14124893
 ] 

Vaibhav Gumashta commented on HIVE-7935:


[~thejas] [~alangates] Latest patch addresses review comments. Thanks!

> Support dynamic service discovery for HiveServer2
> -
>
> Key: HIVE-7935
> URL: https://issues.apache.org/jira/browse/HIVE-7935
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, JDBC
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
> Attachments: HIVE-7935.1.patch, HIVE-7935.2.patch
>
>
> To support Rolling Upgrade / HA, we need a mechanism by which a JDBC client 
> can dynamically resolve a HiveServer2 instance to connect to.
> *High Level Design:* 
> Whether dynamic service discovery is supported can be configured by setting 
> HIVE_SERVER2_SUPPORT_DYNAMIC_SERVICE_DISCOVERY. ZooKeeper is used to support 
> this.
> * When an instance of HiveServer2 comes up, it adds itself as a znode to 
> ZooKeeper under a configurable namespace (HIVE_SERVER2_ZOOKEEPER_NAMESPACE).
> * A JDBC/ODBC client now specifies the ZooKeeper ensemble in its connection 
> string, instead of pointing to a specific HiveServer2 instance. The JDBC 
> driver uses the ZooKeeper ensemble to pick an instance of HiveServer2 to 
> connect to for the entire session.
> * When an instance is removed from ZooKeeper, existing client sessions 
> continue until completion; when the last client session completes, the 
> instance shuts down.
> * All new client connections pick one of the available HiveServer2 URIs from 
> ZooKeeper.
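The high-level flow described above (register on startup, clients pick from the live set, deregister on removal) can be modeled without a real ensemble. In this sketch an in-memory map stands in for the ZooKeeper namespace; the class and method names are made up for illustration and are not Hive's actual code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Toy model of the design above; all names here are illustrative assumptions.
public class DiscoveryModel {
    // znode name -> "host:port" payload, standing in for the configured namespace
    private final Map<String, String> namespace = new LinkedHashMap<>();
    private int sequence = 0;

    // A HiveServer2 instance "adds itself as a znode" on startup.
    void register(String hostPort) {
        namespace.put("serverUri=" + hostPort + ";seq=" + (sequence++), hostPort);
    }

    // An instance is removed; in the real design, existing sessions drain first.
    void deregister(String hostPort) {
        namespace.values().remove(hostPort);
    }

    // A new client connection picks one of the available URIs at random.
    String pickServer(Random random) {
        List<String> uris = new ArrayList<>(namespace.values());
        if (uris.isEmpty()) {
            throw new IllegalStateException("no HiveServer2 instances registered");
        }
        return uris.get(random.nextInt(uris.size()));
    }

    public static void main(String[] args) {
        DiscoveryModel zk = new DiscoveryModel();
        zk.register("hs2-a:10000");
        zk.register("hs2-b:10000");
        zk.deregister("hs2-a:10000");
        System.out.println(zk.pickServer(new Random())); // hs2-b:10000 (only instance left)
    }
}
```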





[jira] [Updated] (HIVE-7935) Support dynamic service discovery for HiveServer2

2014-09-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7935:
---
Attachment: HIVE-7935.2.patch

> Support dynamic service discovery for HiveServer2
> -
>
> Key: HIVE-7935
> URL: https://issues.apache.org/jira/browse/HIVE-7935
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, JDBC
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
> Attachments: HIVE-7935.1.patch, HIVE-7935.2.patch
>
>
> To support Rolling Upgrade / HA, we need a mechanism by which a JDBC client 
> can dynamically resolve a HiveServer2 instance to connect to.
> *High Level Design:* 
> Whether dynamic service discovery is supported can be configured by setting 
> HIVE_SERVER2_SUPPORT_DYNAMIC_SERVICE_DISCOVERY. ZooKeeper is used to support 
> this.
> * When an instance of HiveServer2 comes up, it adds itself as a znode to 
> ZooKeeper under a configurable namespace (HIVE_SERVER2_ZOOKEEPER_NAMESPACE).
> * A JDBC/ODBC client now specifies the ZooKeeper ensemble in its connection 
> string, instead of pointing to a specific HiveServer2 instance. The JDBC 
> driver uses the ZooKeeper ensemble to pick an instance of HiveServer2 to 
> connect to for the entire session.
> * When an instance is removed from ZooKeeper, existing client sessions 
> continue until completion; when the last client session completes, the 
> instance shuts down.
> * All new client connections pick one of the available HiveServer2 URIs from 
> ZooKeeper.





[jira] [Updated] (HIVE-7935) Support dynamic service discovery for HiveServer2

2014-09-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7935:
---
Status: Patch Available  (was: Open)

> Support dynamic service discovery for HiveServer2
> -
>
> Key: HIVE-7935
> URL: https://issues.apache.org/jira/browse/HIVE-7935
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, JDBC
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
> Attachments: HIVE-7935.1.patch, HIVE-7935.2.patch
>
>
> To support Rolling Upgrade / HA, we need a mechanism by which a JDBC client 
> can dynamically resolve a HiveServer2 instance to connect to.
> *High Level Design:* 
> Whether dynamic service discovery is supported can be configured by setting 
> HIVE_SERVER2_SUPPORT_DYNAMIC_SERVICE_DISCOVERY. ZooKeeper is used to support 
> this.
> * When an instance of HiveServer2 comes up, it adds itself as a znode to 
> ZooKeeper under a configurable namespace (HIVE_SERVER2_ZOOKEEPER_NAMESPACE).
> * A JDBC/ODBC client now specifies the ZooKeeper ensemble in its connection 
> string, instead of pointing to a specific HiveServer2 instance. The JDBC 
> driver uses the ZooKeeper ensemble to pick an instance of HiveServer2 to 
> connect to for the entire session.
> * When an instance is removed from ZooKeeper, existing client sessions 
> continue until completion; when the last client session completes, the 
> instance shuts down.
> * All new client connections pick one of the available HiveServer2 URIs from 
> ZooKeeper.





Re: Review Request 25245: Support dynamic service discovery for HiveServer2

2014-09-07 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25245/
---

(Updated Sept. 7, 2014, 1:29 p.m.)


Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair.


Bugs: HIVE-7935
https://issues.apache.org/jira/browse/HIVE-7935


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-7935


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a0a5f54 
  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 
  jdbc/src/java/org/apache/hive/jdbc/HiveDriver.java 6e248d6 
  jdbc/src/java/org/apache/hive/jdbc/JdbcUriParseException.java PRE-CREATION 
  jdbc/src/java/org/apache/hive/jdbc/Utils.java 58339bf 
  jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientException.java 
PRE-CREATION 
  jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
 0919d2f 
  ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java
 59294b1 
  service/src/java/org/apache/hive/service/cli/CLIService.java a0bc905 
  service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
f5a8f27 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
0b5ef12 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
11d25cc 
  
service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 
2b80adc 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
443c371 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 
4067106 
  service/src/java/org/apache/hive/service/server/HiveServer2.java 124996c 
  
service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java
 66fc1fc 

Diff: https://reviews.apache.org/r/25245/diff/


Testing
---

Manual testing + test cases.


Thanks,

Vaibhav Gumashta



Re: Review Request 25245: Support dynamic service discovery for HiveServer2

2014-09-07 Thread Vaibhav Gumashta


> On Sept. 3, 2014, 7:15 a.m., Thejas Nair wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java, line 84
> > 
> >
> > How big are the zookeeper jars ? If we use zookeeper in this class, I 
> > believe zookeeper jars will always be needed for jdbc driver.
> > It would be better to have the zookeeper service discovery code in a 
> > separate util class. That way we will need zookeeper jars only if this mode 
> > is used.

ZooKeeper jars are about 780KB. I was wondering which is the better option:
1. If dynamic service discovery using ZK is supported, require users to add the 
ZooKeeper jar(s) to their classpath. Or
2. The JDBC jar you get is indifferent to this config and always works.

Personally, I am leaning towards the simpler client deployment; otherwise option 
1 adds one more knob that needs to be supported and kept updated via 
documentation. Thoughts?


- Vaibhav


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25245/#review52139
---


On Sept. 2, 2014, 10:05 a.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25245/
> ---
> 
> (Updated Sept. 2, 2014, 10:05 a.m.)
> 
> 
> Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair.
> 
> 
> Bugs: HIVE-7935
> https://issues.apache.org/jira/browse/HIVE-7935
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-7935
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9 
>   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 
>   jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
>  46044d0 
>   ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java
>  59294b1 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 08ed2e7 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> bc0a02c 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> d573592 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java
>  37b05fc 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 027931e 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 
> c380b69 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 0864dfb 
>   
> service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java
>  66fc1fc 
> 
> Diff: https://reviews.apache.org/r/25245/diff/
> 
> 
> Testing
> ---
> 
> Manual testing + test cases.
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



Re: Review Request 25245: Support dynamic service discovery for HiveServer2

2014-09-07 Thread Vaibhav Gumashta


> On Sept. 3, 2014, 6:56 a.m., Thejas Nair wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java, line 142
> > 
> >
> > I think we can still make use of the java URI class for parameter 
> > parsing by just parsing the hostname portion first. Custom parsing of 
> > params in this mode can introduce bugs or inconsistencies.
> > 
> > The JdbcConnectionParams can be expanded to give a list of hosts.
> > The Utils.parseURL can first extract and substitute the multiple 
> > hostnames (if any), and then use the regular java URI parsing.
> > We can then validate whether the current discovery mode supports 
> > multiple hosts after parsing.

We can't convert to a URI directly unless we first resolve the authority part of 
the URI string to a valid value (from host1:port1,host2:port2,host3:port3 to a 
single host:port). For that resolution to happen, we need to determine from the 
URI string whether SERVICE_DISCOVERY_MODE = zooKeeper, and if so get a host from 
ZooKeeper. But I'll move the logic to Utils#parseURL. Let me know if you think 
otherwise.
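The two-step parse discussed here — extract the multi-host authority first, then hand a single resolved host:port back to java.net.URI — might look roughly like this. The class and method names are assumptions for illustration, not Hive's actual code:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two-step parse; names are illustrative assumptions.
public class MultiHostJdbcUrl {
    private static final String PREFIX = "jdbc:hive2://";

    // Find where the authority (host list) ends: at the first path,
    // session-var, query, or fragment delimiter.
    private static int authorityEnd(String rest) {
        int end = rest.length();
        for (char c : new char[] {'/', ';', '?', '#'}) {
            int i = rest.indexOf(c);
            if (i >= 0 && i < end) end = i;
        }
        return end;
    }

    // Step 1: pull the comma-separated host:port entries out of the URL.
    static List<String> extractHosts(String url) {
        String rest = url.substring(PREFIX.length());
        return Arrays.asList(rest.substring(0, authorityEnd(rest)).split(","));
    }

    // Step 2: substitute one resolved host:port so java.net.URI can do the
    // remaining path/param parsing as usual.
    static URI withResolvedHost(String url, String hostPort) throws URISyntaxException {
        String rest = url.substring(PREFIX.length());
        return new URI("hive2://" + hostPort + rest.substring(authorityEnd(rest)));
    }

    public static void main(String[] args) throws URISyntaxException {
        String url = "jdbc:hive2://h1:10000,h2:10000,h3:10000/db;serviceDiscoveryMode=zooKeeper";
        System.out.println(extractHosts(url));                            // [h1:10000, h2:10000, h3:10000]
        System.out.println(withResolvedHost(url, "h2:10000").getHost());  // h2
    }
}
```

This keeps custom parsing confined to the authority section, so parameter handling stays consistent with the standard URI rules.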


> On Sept. 3, 2014, 6:56 a.m., Thejas Nair wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java, line 110
> > 
> >
> > I think we are likely to have people wanting to implement other modes 
> > of dynamically picking the HS2 host. For example, you could simply have 
> > multiple HS2 hostnames in a URL (instead of zookeeper hosts). Or people 
> > might decide to store the hostnames in another place instead of zookeeper.
> > 
> > So I think instead of making this param a boolean, it is better to have 
> > the value as "none" (default) or "zookeeper".
> > 
> > Maybe change the param name also to "service.discovery.mode" ?

Actually, I see one more problem here. The JDBC URL, 
jdbc:hive2://<host>:<port>/dbName;sess_var_list?hive_conf_list#hive_var_list, 
contains hive_conf_list & hive_var_list, which are used to set the corresponding 
values on the server side (for this connection) while opening a client session. 
As you have pointed out earlier, we haven't done a great job of keeping 
"session only" configs in the sess_var_list (e.g. by specifying 
hive.server2.thrift.http.path, the client is actually trying to point to the 
corresponding path on the server, not trying to set its value). I'll 
create a follow-up jira to do that cleanup. And I agree, we can keep variable 
names short in sess_var_list and probably use a camelCase convention to keep 
the intent clear.


- Vaibhav


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25245/#review52116
---


On Sept. 2, 2014, 10:05 a.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25245/
> ---
> 
> (Updated Sept. 2, 2014, 10:05 a.m.)
> 
> 
> Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair.
> 
> 
> Bugs: HIVE-7935
> https://issues.apache.org/jira/browse/HIVE-7935
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-7935
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9 
>   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 
>   jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
>  46044d0 
>   ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java
>  59294b1 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 08ed2e7 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> bc0a02c 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> d573592 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java
>  37b05fc 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 027931e 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 
> c380b69 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 0864dfb 
>   
> service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java
>  66fc1fc 
> 
> Diff: https://reviews.apache.org/r/25245/diff/
> 
> 
> Testing
> ---
> 
> Manual testing + test cases.
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>

[jira] [Resolved] (HIVE-265) Implement COPY TO syntax

2014-09-07 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke resolved HIVE-265.
---
Resolution: Won't Fix

> Implement COPY TO syntax
> 
>
> Key: HIVE-265
> URL: https://issues.apache.org/jira/browse/HIVE-265
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Jeff Hammerbacher
>
> Postgres has a nice syntax for moving data from a file into a table, and vice 
> versa. It's similar to Hive's "INSERT INTO ... SELECT ..." syntax, but may be 
> easier to use for Postgres users. In particular, I like the ability to COPY 
> TO with delimiters specified. This would allow for easy CSV export.
> Docs: http://www.postgresql.org/docs/current/interactive/sql-copy.html





[jira] [Commented] (HIVE-265) Implement COPY TO syntax

2014-09-07 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124876#comment-14124876
 ] 

Lars Francke commented on HIVE-265:
---

I think this is supported using the extended INSERT syntax as of Hive 0.11: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries

I'll close this for now but feel free to reopen if you think that functionality 
doesn't cover your request.
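For reference, the delimiter-controlled export that Postgres COPY TO provides can be expressed with the extended INSERT syntax Lars mentions. A hedged sketch, where the table name and directory are made up for illustration:

```sql
-- Hypothetical example of the extended INSERT syntax (Hive 0.11+);
-- the table name and target directory are made up.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/my_table_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM my_table;
```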

> Implement COPY TO syntax
> 
>
> Key: HIVE-265
> URL: https://issues.apache.org/jira/browse/HIVE-265
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Jeff Hammerbacher
>
> Postgres has a nice syntax for moving data from a file into a table, and vice 
> versa. It's similar to Hive's "INSERT INTO ... SELECT ..." syntax, but may be 
> easier to use for Postgres users. In particular, I like the ability to COPY 
> TO with delimiters specified. This would allow for easy CSV export.
> Docs: http://www.postgresql.org/docs/current/interactive/sql-copy.html





[jira] [Commented] (HIVE-275) split log output from "hive -e ''" to something other than stdout

2014-09-07 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124874#comment-14124874
 ] 

Lars Francke commented on HIVE-275:
---

As the old Hive CLI has been deprecated, I think we can close this. Feel free to 
reopen against Beeline if necessary.

> split log output from "hive -e ''" to something other than stdout
> 
>
> Key: HIVE-275
> URL: https://issues.apache.org/jira/browse/HIVE-275
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients
>Reporter: S. Alex Smith
>Priority: Minor
>
> A command like: hive -e 'select * from my_table' produces output like:
> Hive history file=/tmp/asmith/hive_job_log_asmith_200902051423_1455103524.txt
> OK
> 1   [4, 4]
> 1   [0, 1]
> 0   [0, 0]
> 0   [1, 0]
> Time taken: 2.413 seconds
> all of which goes to stdout.  The non-data messages can be removed using 
> '-s', but it would be nice to have a way to instead redirect them to (for 
> example) stderr.





[jira] [Resolved] (HIVE-275) split log output from "hive -e ''" to something other than stdout

2014-09-07 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke resolved HIVE-275.
---
Resolution: Won't Fix

> split log output from "hive -e ''" to something other than stdout
> 
>
> Key: HIVE-275
> URL: https://issues.apache.org/jira/browse/HIVE-275
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients
>Reporter: S. Alex Smith
>Priority: Minor
>
> A command like: hive -e 'select * from my_table' produces output like:
> Hive history file=/tmp/asmith/hive_job_log_asmith_200902051423_1455103524.txt
> OK
> 1   [4, 4]
> 1   [0, 1]
> 0   [0, 0]
> 0   [1, 0]
> Time taken: 2.413 seconds
> all of which goes to stdout.  The non-data messages can be removed using 
> '-s', but it would be nice to have a way to instead redirect them to (for 
> example) stderr.





[jira] [Commented] (HIVE-7990) With fetch column stats disabled number of elements in grouping set is not taken into account

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124866#comment-14124866
 ] 

Hive QA commented on HIVE-7990:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667073/HIVE-7990.3.patch

{color:green}SUCCESS:{color} +1 6184 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/682/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/682/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-682/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12667073

> With fetch column stats disabled number of elements in grouping set is not 
> taken into account
> -
>
> Key: HIVE-7990
> URL: https://issues.apache.org/jira/browse/HIVE-7990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
> Attachments: HIVE-7990.1.patch, HIVE-7990.2.patch, HIVE-7990.3.patch
>
>
> For queries with rollup and cube, the number-of-rows calculation in 
> GroupByStatsRule should be multiplied by the number of elements in the grouping set.
> A side effect of this defect is that reducers will underestimate the data size 
> and end up with a small number of tasks, which negatively affects query runtime. 
>  
> {code}
> // since we do not know if hash-aggregation will be enabled or 
> disabled
> // at runtime we will assume that map-side group by does not do 
> any
> // reduction.hence no group by rule will be applied
> // map-side grouping set present. if grouping set is present then
> // multiply the number of rows by number of elements in grouping 
> set
> if (gop.getConf().isGroupingSetsPresent()) {
>   int multiplier = gop.getConf().getListGroupingSets().size();
>   // take into account the map-side parallelism as well, default 
> is 1
>   multiplier *= mapSideParallelism;
>   newNumRows = multiplier * stats.getNumRows();
>   long dataSize = multiplier * stats.getDataSize();
>   stats.setNumRows(newNumRows);
>   stats.setDataSize(dataSize);
>   for (ColStatistics cs : colStats) {
> if (cs != null) {
>   long oldNumNulls = cs.getNumNulls();
>   long newNumNulls = multiplier * oldNumNulls;
>   cs.setNumNulls(newNumNulls);
> }
>   }
> } else {
>   // map side no grouping set
>   newNumRows = stats.getNumRows() * mapSideParallelism;
>   updateStats(stats, newNumRows, true);
> }
>   
> {code}
> Query 
> {code}
> select  *
> from (select i_category
> ,i_class
> ,i_brand
> ,i_product_name
> ,d_year
> ,d_qoy
> ,d_moy
> ,s_store_id
> ,sumsales
> ,rank() over (partition by i_category order by sumsales desc) rk
>   from (select i_category
>   ,i_class
>   ,i_brand
>   ,i_product_name
>   ,d_year
>   ,d_qoy
>   ,d_moy
>   ,s_store_id
>   ,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
> from store_sales
> ,date_dim
> ,store
> ,item
>where  store_sales.ss_sold_date_sk=date_dim.d_date_sk
>   and store_sales.ss_item_sk=item.i_item_sk
>   and store_sales.ss_store_sk = store.s_store_sk
>   and d_month_seq between 1193 and 1193+11
>   and ss_sold_date between '1999-06-01' and '2000-05-31'
>group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, 
> d_moy,s_store_id with rollup)dw1) dw2
> where rk <= 100
> order by i_category
> ,i_class
> ,i_brand
> ,i_product_name
> ,d_year
> ,d_qoy
> ,d_moy
> ,s_store_id
> ,sumsales
> ,rk
> limit 100
> {code}
> Plan generated; note the data size for Map 1
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Ed
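[Editorial note: the stats adjustment described in HIVE-7990 above can be sketched in isolation. This is a minimal, illustrative Java sketch of the arithmetic, not Hive's actual GroupByStatsRule code; the class and method names are invented for the example.]

```java
// Illustrative sketch of the HIVE-7990 stats adjustment: when a map-side
// grouping set is present, the row estimate is scaled by the number of
// grouping-set elements times the map-side parallelism. Names here are
// hypothetical, not Hive's real API.
public class GroupingSetStatsSketch {
    /** Scaled row count when a grouping set of the given size is present. */
    static long scaledNumRows(long numRows, int groupingSetSize, int mapSideParallelism) {
        long multiplier = (long) groupingSetSize * mapSideParallelism;
        return multiplier * numRows;
    }

    public static void main(String[] args) {
        // A rollup over 8 columns produces 9 grouping-set elements (k + 1),
        // so the 110,339,135-row scan above scales up 9x.
        System.out.println(scaledNumRows(110_339_135L, 9, 1)); // 993052215
    }
}
```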

[jira] [Commented] (HIVE-6425) Unable to create external table with 3000+ columns

2014-09-07 Thread Michalis Kongtongk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124864#comment-14124864
 ] 

Michalis Kongtongk commented on HIVE-6425:
--

Hive metastore "SERDE_PARAMS.PARAM_VALUE" column is limited to [varchar 
4000|https://github.com/apache/hive/blob/trunk/metastore/scripts/upgrade/mysql/hive-schema-0.14.0.mysql.sql#L455]

> Unable to create external table with 3000+ columns
> --
>
> Key: HIVE-6425
> URL: https://issues.apache.org/jira/browse/HIVE-6425
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
> Environment: Linux, CDH 4.2.0
>Reporter: Anurag
>  Labels: patch
> Attachments: Hive_Script.txt
>
>
> While creating an external table in Hive to a table in HBase with 3000+ 
> columns, Hive shows up an error:
> FAILED: Error in metadata: 
> MetaException(message:javax.jdo.JDODataStoreException: Put request failed : 
> INSERT INTO "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES 
> (?,?,?)
> NestedThrowables:
> org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO 
> "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES (?,?,?) )
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
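[Editorial note: a minimal sketch of why the INSERT above fails, assuming the varchar(4000) limit noted in the comment; this is illustrative code, not metastore code, and the names are invented for the example.]

```java
// Illustrative sketch of the HIVE-6425 limit: the MySQL metastore schema
// declares SERDE_PARAMS.PARAM_VALUE as varchar(4000), so a serde parameter
// that serializes 3000+ HBase column mappings exceeds the column width and
// the INSERT fails. Not metastore code; names are hypothetical.
public class ParamValueLimit {
    static final int PARAM_VALUE_MAX = 4000; // varchar(4000) in hive-schema-0.14.0.mysql.sql

    /** Whether a serialized serde parameter fits the metastore column. */
    static boolean fits(String paramValue) {
        return paramValue.length() <= PARAM_VALUE_MAX;
    }

    public static void main(String[] args) {
        // ~3000 columns worth of "cf:cN," mapping easily exceeds 4000 chars.
        StringBuilder mapping = new StringBuilder();
        for (int i = 0; i < 3000; i++) mapping.append("cf:c").append(i).append(',');
        System.out.println(fits(mapping.toString())); // false
    }
}
```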





[jira] [Commented] (HIVE-7991) Incorrect calculation of number of rows in JoinStatsRule.process results in overflow

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124852#comment-14124852
 ] 

Hive QA commented on HIVE-7991:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667072/HIVE-7991.2.patch

{color:green}SUCCESS:{color} +1 6184 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/681/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/681/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-681/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12667072

> Incorrect calculation of number of rows in JoinStatsRule.process results in 
> overflow
> 
>
> Key: HIVE-7991
> URL: https://issues.apache.org/jira/browse/HIVE-7991
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
>Priority: Minor
> Attachments: HIVE-7991.1.patch, HIVE-7991.2.patch
>
>
> This loop results in adding the parent twice in case of a 3-way join of 
> store_sales x date_dim x store
> {code}
>  for (int pos = 0; pos < parents.size(); pos++) {
> ReduceSinkOperator parent = (ReduceSinkOperator) 
> jop.getParentOperators().get(pos);
> Statistics parentStats = parent.getStatistics();
> List<ExprNodeDesc> keyExprs = parent.getConf().getKeyCols();
> // Parent RS may have column statistics from multiple parents.
> // Populate table alias to row count map, this will be used later 
> to
> // scale down/up column statistics based on new row count
> // NOTE: JOIN with UNION as parent of RS will not have table alias
> // propagated properly. UNION operator does not propagate the 
> table
> // alias of subqueries properly to expression nodes. Hence 
> union20.q
> // will have wrong number of rows.
> Set<String> tableAliases = 
> StatsUtils.getAllTableAlias(parent.getColumnExprMap());
> for (String tabAlias : tableAliases) {
>   rowCountParents.put(tabAlias, parentStats.getNumRows());
> }
> {code}
> In the first join we have rowCountParents with {store_sales=120464862, 
> date_dim=36524} which is correct.
> For the second join result rowCountParents ends up with {store=212, 
> store_sales=120464862, date_dim=120464862} where it should be {store=212, 
> store_sales=120464862, date_dim=36524}.
> The result of this is that computeNewRowCount ends up multiplying the row 
> count of store_sales x store_sales, which makes the number of rows really 
> high and eventually overflows.
> Plan snippet : 
> {code}
>Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (((ss_sold_date_sk is not null and ss_store_sk 
> is not null) and ss_item_sk is not null) and ss_sold_date BETWEEN 
> '1999-06-01' AND '2000-05-31') (type: boolean)
>   Statistics: Num rows: 110339135 Data size: 4817453454 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((ss_sold_date_sk is not null and ss_store_sk 
> is not null) and ss_item_sk is not null) (type: boolean)
> Statistics: Num rows: 107740258 Data size: 2124353556 
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {ss_sold_date_sk} {ss_item_sk} {ss_store_sk} 
> {ss_quantity} {ss_sales_price} {ss_sold_date}
> 1 {d_date_sk} {d_month_seq} {d_year} {d_moy} {d_qoy}
>   keys:
> 0 ss_sold_date_sk (type: int)
> 1 d_date_sk (type: int)
>   outputColumnNames: _col0, _col2, _col7, _col10, _col13, 
> _col23, _col27, _col30, _col33, _col35, _col37
>   input vertices:
> 1 Map 6
>   Statistics: Num rows: 120464862 Data size: 26984129088 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Map Join Operator
> condition map:
> 

[jira] [Updated] (HIVE-7990) With fetch column stats disabled number of elements in grouping set is not taken into account

2014-09-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7990:
-
Attachment: HIVE-7990.3.patch

Fixed failing test.

> With fetch column stats disabled number of elements in grouping set is not 
> taken into account
> -
>
> Key: HIVE-7990
> URL: https://issues.apache.org/jira/browse/HIVE-7990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
> Attachments: HIVE-7990.1.patch, HIVE-7990.2.patch, HIVE-7990.3.patch
>
>
> For queries with rollup and cube, the number-of-rows calculation in 
> GroupByStatsRule should be multiplied by the number of elements in the grouping set.
> A side effect of this defect is that reducers will underestimate the data size 
> and end up with a small number of tasks, which negatively affects query runtime. 
>  
> {code}
> // since we do not know if hash-aggregation will be enabled or 
> disabled
> // at runtime we will assume that map-side group by does not do 
> any
> // reduction.hence no group by rule will be applied
> // map-side grouping set present. if grouping set is present then
> // multiply the number of rows by number of elements in grouping 
> set
> if (gop.getConf().isGroupingSetsPresent()) {
>   int multiplier = gop.getConf().getListGroupingSets().size();
>   // take into account the map-side parallelism as well, default 
> is 1
>   multiplier *= mapSideParallelism;
>   newNumRows = multiplier * stats.getNumRows();
>   long dataSize = multiplier * stats.getDataSize();
>   stats.setNumRows(newNumRows);
>   stats.setDataSize(dataSize);
>   for (ColStatistics cs : colStats) {
> if (cs != null) {
>   long oldNumNulls = cs.getNumNulls();
>   long newNumNulls = multiplier * oldNumNulls;
>   cs.setNumNulls(newNumNulls);
> }
>   }
> } else {
>   // map side no grouping set
>   newNumRows = stats.getNumRows() * mapSideParallelism;
>   updateStats(stats, newNumRows, true);
> }
>   
> {code}
> Query 
> {code}
> select  *
> from (select i_category
> ,i_class
> ,i_brand
> ,i_product_name
> ,d_year
> ,d_qoy
> ,d_moy
> ,s_store_id
> ,sumsales
> ,rank() over (partition by i_category order by sumsales desc) rk
>   from (select i_category
>   ,i_class
>   ,i_brand
>   ,i_product_name
>   ,d_year
>   ,d_qoy
>   ,d_moy
>   ,s_store_id
>   ,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
> from store_sales
> ,date_dim
> ,store
> ,item
>where  store_sales.ss_sold_date_sk=date_dim.d_date_sk
>   and store_sales.ss_item_sk=item.i_item_sk
>   and store_sales.ss_store_sk = store.s_store_sk
>   and d_month_seq between 1193 and 1193+11
>   and ss_sold_date between '1999-06-01' and '2000-05-31'
>group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, 
> d_moy,s_store_id with rollup)dw1) dw2
> where rk <= 100
> order by i_category
> ,i_class
> ,i_brand
> ,i_product_name
> ,d_year
> ,d_qoy
> ,d_moy
> ,s_store_id
> ,sumsales
> ,rk
> limit 100
> {code}
> Plan generated; note the data size for Map 1
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 
> (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
>   DagName: mmokhtar_20140903154848_7cf1519f-e95c-47ab-9f10-6d2130cd5734:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (((ss_sold_date_sk is not null and ss_store_sk 
> is not null) and ss_item_sk is not null) and ss_sold_date BETWEEN 
> '1999-06-01' AND '2000-05-31') (type: boolean)
>   Statistics: Num rows: 110339135 Data size: 4817453454 Basic 
> stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate

[jira] [Updated] (HIVE-7991) Incorrect calculation of number of rows in JoinStatsRule.process results in overflow

2014-09-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7991:
-
Attachment: HIVE-7991.2.patch

Fixed failing test.

> Incorrect calculation of number of rows in JoinStatsRule.process results in 
> overflow
> 
>
> Key: HIVE-7991
> URL: https://issues.apache.org/jira/browse/HIVE-7991
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
>Priority: Minor
> Attachments: HIVE-7991.1.patch, HIVE-7991.2.patch
>
>
> This loop results in adding the parent twice in case of a 3-way join of 
> store_sales x date_dim x store
> {code}
>  for (int pos = 0; pos < parents.size(); pos++) {
> ReduceSinkOperator parent = (ReduceSinkOperator) 
> jop.getParentOperators().get(pos);
> Statistics parentStats = parent.getStatistics();
> List<ExprNodeDesc> keyExprs = parent.getConf().getKeyCols();
> // Parent RS may have column statistics from multiple parents.
> // Populate table alias to row count map, this will be used later 
> to
> // scale down/up column statistics based on new row count
> // NOTE: JOIN with UNION as parent of RS will not have table alias
> // propagated properly. UNION operator does not propagate the 
> table
> // alias of subqueries properly to expression nodes. Hence 
> union20.q
> // will have wrong number of rows.
> Set<String> tableAliases = 
> StatsUtils.getAllTableAlias(parent.getColumnExprMap());
> for (String tabAlias : tableAliases) {
>   rowCountParents.put(tabAlias, parentStats.getNumRows());
> }
> {code}
> In the first join we have rowCountParents with {store_sales=120464862, 
> date_dim=36524} which is correct.
> For the second join result rowCountParents ends up with {store=212, 
> store_sales=120464862, date_dim=120464862} where it should be {store=212, 
> store_sales=120464862, date_dim=36524}.
> The result of this is that computeNewRowCount ends up multiplying the row 
> count of store_sales x store_sales, which makes the number of rows really 
> high and eventually overflows.
> Plan snippet : 
> {code}
>Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (((ss_sold_date_sk is not null and ss_store_sk 
> is not null) and ss_item_sk is not null) and ss_sold_date BETWEEN 
> '1999-06-01' AND '2000-05-31') (type: boolean)
>   Statistics: Num rows: 110339135 Data size: 4817453454 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((ss_sold_date_sk is not null and ss_store_sk 
> is not null) and ss_item_sk is not null) (type: boolean)
> Statistics: Num rows: 107740258 Data size: 2124353556 
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {ss_sold_date_sk} {ss_item_sk} {ss_store_sk} 
> {ss_quantity} {ss_sales_price} {ss_sold_date}
> 1 {d_date_sk} {d_month_seq} {d_year} {d_moy} {d_qoy}
>   keys:
> 0 ss_sold_date_sk (type: int)
> 1 d_date_sk (type: int)
>   outputColumnNames: _col0, _col2, _col7, _col10, _col13, 
> _col23, _col27, _col30, _col33, _col35, _col37
>   input vertices:
> 1 Map 6
>   Statistics: Num rows: 120464862 Data size: 26984129088 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> condition expressions:
>   0 {_col0} {_col2} {_col7} {_col10} {_col13} 
> {_col23} {_col27} {_col30} {_col33} {_col35} {_col37}
>   1 {s_store_sk} {s_store_id}
> keys:
>   0 _col7 (type: int)
>   1 s_store_sk (type: int)
> outputColumnNames: _col0, _col2, _col7, _col10, 
> _col13, _col23, _col27, _col30, _col33, _col35, _col37, _col58, _col59
> input vertices:
>   1 Map 5
> Statistics: Num rows: 17886616227069518 Data size: 
> 5866810122478801920 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
>   
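[Editorial note: the aliasing problem described in HIVE-7991 above can be sketched in isolation. This is an illustrative Java sketch of the fix direction (record each alias's row count only once), not the committed patch; the helper name is invented for the example.]

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the HIVE-7991 problem: in a 3-way join, revisiting
// a parent overwrites the row count already recorded for an alias (date_dim
// picks up store_sales' joined count). Recording each alias only on first
// sight keeps the first join's counts intact. Not the committed patch.
public class RowCountParentsSketch {
    /** Record an alias's row count only if it has not been seen before. */
    static void record(Map<String, Long> rowCountParents, String alias, long rows) {
        rowCountParents.putIfAbsent(alias, rows);
    }

    public static void main(String[] args) {
        Map<String, Long> rowCountParents = new HashMap<>();
        // First join: store_sales x date_dim -- counts are correct here.
        record(rowCountParents, "store_sales", 120_464_862L);
        record(rowCountParents, "date_dim", 36_524L);
        // Second join adds store; the buggy loop revisits the earlier aliases
        // with the joined row count. putIfAbsent leaves them unchanged, giving
        // {store=212, store_sales=120464862, date_dim=36524} as expected.
        record(rowCountParents, "date_dim", 120_464_862L);
        record(rowCountParents, "store_sales", 120_464_862L);
        record(rowCountParents, "store", 212L);
        System.out.println(rowCountParents.get("date_dim")); // 36524
    }
}
```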

[jira] [Updated] (HIVE-7992) StatsRulesProcFactory should gracefully handle overflows

2014-09-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7992:
-
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Harish for the review.

> StatsRulesProcFactory should gracefully handle overflows
> 
>
> Key: HIVE-7992
> URL: https://issues.apache.org/jira/browse/HIVE-7992
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Prasanth J
> Fix For: 0.14.0
>
> Attachments: HIVE-7992.1.patch
>
>
> When StatsRulesProcFactory overflows, it sets the data size to 0 and, as a 
> result, the vertex will ask for a single task. This results in a fairly 
> slow-running query; most likely the overflow is caused by a higher than 
> usual number of rows.
> The class should detect an overflow and set a flag when one occurs; in that 
> case StatsRulesProcFactory should request the maximum number of tasks for 
> the vertex.
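[Editorial note: a minimal sketch of the graceful-overflow idea described above, saturating the product instead of letting it wrap. This mirrors the intent of the fix; it is not the committed HIVE-7992 code, and the names are invented for the example.]

```java
// Illustrative sketch of graceful overflow handling per HIVE-7992: instead
// of letting a row/data-size product wrap (and later collapse to 0 rows and
// a single task), saturate at Long.MAX_VALUE so the planner can request
// maximum parallelism. Not the committed code; names are hypothetical.
public class SaturatingStats {
    /** Multiply two stats values, saturating at Long.MAX_VALUE on overflow. */
    static long safeMult(long a, long b) {
        try {
            return Math.multiplyExact(a, b); // throws ArithmeticException on overflow
        } catch (ArithmeticException e) {
            return Long.MAX_VALUE;           // treat overflow as "as large as possible"
        }
    }

    public static void main(String[] args) {
        System.out.println(safeMult(3, 4)); // 12
        System.out.println(safeMult(Long.MAX_VALUE, 2) == Long.MAX_VALUE); // true
    }
}
```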





[jira] [Commented] (HIVE-8016) CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code cleanup

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124821#comment-14124821
 ] 

Hive QA commented on HIVE-8016:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667070/HIVE-8016.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/680/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/680/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-680/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-680/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1622982.

At revision 1622982.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12667070

> CBO: PPD to honor hive Join Cond, Casting fixes, Add annotations for IF, Code 
> cleanup
> -
>
> Key: HIVE-8016
> URL: https://issues.apache.org/jira/browse/HIVE-8016
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-8016.1.patch, HIVE-8016.patch
>
>






[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124820#comment-14124820
 ] 

Hive QA commented on HIVE-7946:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667068/HIVE-7946.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/679/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/679/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-679/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-679/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'ql/src/test/results/clientpositive/show_tables.q.out'
Reverted 'ql/src/test/results/clientpositive/describe_table_json.q.out'
Reverted 'ql/src/test/queries/clientpositive/show_tables.q'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen service/target 
contrib/target serde/target beeline/target odbc/target cli/target 
ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1622982.

At revision 1622982.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
patch:  malformed patch at line 14: diff --git a/data/conf/hive-site.xml 
b/data/conf/hive-site.xml

patch:  malformed patch at line 14: diff --git a/data/conf/hive-site.xml 
b/data/conf/hive-site.xml

patch:  malformed patch at line 14: diff --git a/data/conf/hive-site.xml 
b/data/conf/hive-site.xml

The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12667068

> CBO: Merge CBO changes to Trunk
> ---
>
> Key: HIVE-7946
> URL: https://issues.apache.org/jira/browse/HIVE-7946
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, 
> HIVE-7946.4.patch, HIVE-7946.patch
>
>





[jira] [Commented] (HIVE-1363) 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes

2014-09-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124819#comment-14124819
 ] 

Hive QA commented on HIVE-1363:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12667057/HIVE-1363.2.patch

{color:green}SUCCESS:{color} +1 6184 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/678/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/678/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-678/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12667057

> 'SHOW TABLE EXTENDED LIKE' command does not strip single/double quotes
> --
>
> Key: HIVE-1363
> URL: https://issues.apache.org/jira/browse/HIVE-1363
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0, 0.14.0
>Reporter: Carl Steinbach
>Assignee: Chaoyu Tang
> Fix For: 0.14.0
>
> Attachments: HIVE-1363.1.patch, HIVE-1363.2.patch, HIVE-1363.patch
>
>
> {code}
> hive> SHOW TABLE EXTENDED LIKE pokes;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> hive> SHOW TABLE EXTENDED LIKE "p*";
> FAILED: Error in metadata: MetaException(message:Got exception: 
> javax.jdo.JDOUserException ')' expected at character 54 in "database.name == 
> dbName && ( tableName.matches("(?i)"p.*""))")
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> hive> SHOW TABLE EXTENDED LIKE 'p*';
> OK
> hive> SHOW TABLE EXTENDED LIKE `p*`;
> OK
> tableName:pokes
> owner:carl
> location:hdfs://localhost/user/hive/warehouse/pokes
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 num}
> partitioned:false
> partitionColumns:
> totalNumberFiles:0
> totalFileSize:0
> maxFileSize:0
> minFileSize:0
> lastAccessTime:0
> lastUpdateTime:1274517075221
> {code}


