Review Request 25571: Stats collection for columns fails on a partitioned table with null values in partitioning column
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25571/ --- Review request for hive. Bugs: HIVE-8062 https://issues.apache.org/jira/browse/HIVE-8062 Repository: hive-git Description --- Stats collection for columns fails on a partitioned table with null values in partitioning column Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 176a593 ql/src/test/queries/clientpositive/stats_only_null.q b47bc48 ql/src/test/results/clientpositive/stats_only_null.q.out 063da37 Diff: https://reviews.apache.org/r/25571/diff/ Testing --- Added new test. Thanks, Ashutosh Chauhan
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-8017: - Attachment: HIVE-8017.5-spark.patch Update the golden file for union_remove_25 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
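The point about the key carrying its own hash code can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins written for this note, not the real HiveKey or BytesWritable. They only show how a key that hashes its raw bytes and a key that carries a hash code chosen by the producer (for example, one computed from the bucketing or join columns only) can land in different partitions under ordinary hash partitioning.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a plain byte-array key: the hash is derived from all bytes. */
final class PlainBytesKey {
    private final byte[] bytes;

    PlainBytesKey(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof PlainBytesKey
                && Arrays.equals(bytes, ((PlainBytesKey) other).bytes);
    }
}

/** Hypothetical stand-in for a key that carries a hash code set by whoever built it. */
final class HashCarryingKey {
    private final byte[] bytes;
    private final int carriedHash;

    HashCarryingKey(byte[] bytes, int carriedHash) {
        this.bytes = bytes.clone();
        this.carriedHash = carriedHash;
    }

    @Override
    public int hashCode() {
        // The partitioner sees the carried value, not a hash of the bytes.
        return carriedHash;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof HashCarryingKey
                && carriedHash == ((HashCarryingKey) other).carriedHash
                && Arrays.equals(bytes, ((HashCarryingKey) other).bytes);
    }
}

final class PartitionSketch {
    /** Ordinary hash partitioning: non-negative hash modulo the partition count. */
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Two different serialized rows that share the same (assumed) bucketing hash.
        HashCarryingKey first = new HashCarryingKey(new byte[] {1, 2, 3, 4}, 7);
        HashCarryingKey second = new HashCarryingKey(new byte[] {9, 9, 9, 9}, 7);
        System.out.println(partitionFor(first, 4) == partitionFor(second, 4));
    }
}
```

With the carried hash, rows that must meet in the same partition do so regardless of how the rest of the key bytes differ; a key hashed over all of its bytes gives no such guarantee.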
Query regarding Hive configuration on windows OS and make connection through asp.net
Hi, I want to configure Hive 0.13.1 on my Windows 7 machine, but some errors appear during configuration. I have already installed hadoop-2.5.0 and the cygwin64 terminal on my machine, and both are working fine. There is no specific blog or post available on the internet about configuring Hive on Windows. So I need your help, or some steps, for configuring Hive 0.13.1 on a Windows machine and for making an ODBC connection between Hive and an asp.net application, for both queries and data cubes. Please suggest the steps. Thanks, Kapil Khare Team Lead Helm360 Phone: +91-120-499 3300 Mobile: +91- 9718012939 A-16, Sector 16 | Noida, UP, India 201 301 kkh...@helm360.com | www.helm360.com
[jira] [Created] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR
Harish Butani created HIVE-8069: --- Summary: CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR Key: HIVE-8069 URL: https://issues.apache.org/jira/browse/HIVE-8069 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Harish Butani -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR
[ https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-8069: Attachment: HIVE-8069.1.patch CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR -- Key: HIVE-8069 URL: https://issues.apache.org/jira/browse/HIVE-8069 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-8069.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8067) set default table permissions for table owner to have all privileges
[ https://issues.apache.org/jira/browse/HIVE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131152#comment-14131152 ] Lefty Leverenz commented on HIVE-8067: -- Agreed, and the description should list all the possible values. Patch 1 adds defaults INSERT, SELECT, UPDATE, and DELETE but the old description says 'An example like select,drop will grant select and drop privilege to the owner of the table' and the wiki lists DROP instead of DELETE. Is the wiki list complete and accurate? * ALL, ALTER, UPDATE, CREATE (irrelevant here), DROP, INDEX (not implemented), LOCK, SELECT, SHOW_DATABASE (irrelevant here) * [Hive Default Authorization (Legacy Mode) -- Privileges | https://cwiki.apache.org/confluence/display/Hive/Hive+Default+Authorization+-+Legacy+Mode#HiveDefaultAuthorization-LegacyMode-Privileges] set default table permissions for table owner to have all privileges Key: HIVE-8067 URL: https://issues.apache.org/jira/browse/HIVE-8067 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8067.1.patch When tables are created without SQL standards based authorization being enabled, the table owner does not have any privileges on the table. It makes sense to set the default privileges to be compatible with SQL standard mode's expected default privileges for the owner of the table, instead of setting no privileges at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR
[ https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131153#comment-14131153 ] Laljo John Pullokkaran commented on HIVE-8069: -- +1 CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR -- Key: HIVE-8069 URL: https://issues.apache.org/jira/browse/HIVE-8069 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-8069.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file
Bing Li created HIVE-8070: - Summary: TestHWIServer failed due to wrong references to war and properties file Key: HIVE-8070 URL: https://issues.apache.org/jira/browse/HIVE-8070 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.13.1 Reporter: Bing Li Assignee: Bing Li Fix For: 0.14.0 In testServerInit() method of that test class, it's still using build.properties to retrieve the version # for the war file name -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8052) Vectorization: min() on TimeStamp datatype fails with error Vector aggregate not implemented: min for type: TIMESTAMP
[ https://issues.apache.org/jira/browse/HIVE-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131157#comment-14131157 ] Hive QA commented on HIVE-8052: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668198/HIVE-8052.02.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6197 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/753/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/753/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-753/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668198 Vectorization: min() on TimeStamp datatype fails with error Vector aggregate not implemented: min for type: TIMESTAMP --- Key: HIVE-8052 URL: https://issues.apache.org/jira/browse/HIVE-8052 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8052.01.patch, HIVE-8052.02.patch Changes in HIVE-5760 to make explicit when timestamp and date can be vectorized as Long were accidentally too strict for min, max, count, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file
[ https://issues.apache.org/jira/browse/HIVE-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131159#comment-14131159 ] Bing Li commented on HIVE-8070: --- This JIRA is blocked by HIVE-7233 TestHWIServer failed due to wrong references to war and properties file --- Key: HIVE-8070 URL: https://issues.apache.org/jira/browse/HIVE-8070 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.13.1 Reporter: Bing Li Assignee: Bing Li Fix For: 0.14.0 In testServerInit() method of that test class, it's still using build.properties to retrieve the version # for the war file name -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7868) AvroSerDe error handling could be improved
[ https://issues.apache.org/jira/browse/HIVE-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-7868: --- Attachment: (was: HIVE-7868.patch) AvroSerDe error handling could be improved -- Key: HIVE-7868 URL: https://issues.apache.org/jira/browse/HIVE-7868 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Ferdinand Xu When an Avro schema is invalid, AvroSerDe returns an error message instead of throwing an exception. This is described in {{AvroSerdeUtils.determineSchemaOrReturnErrorSchema}}: {noformat} /** * Attempt to determine the schema via the usual means, but do not throw * an exception if we fail. Instead, signal failure via a special * schema. This is used because Hive calls init on the serde during * any call, including calls to update the serde properties, meaning * if the serde is in a bad state, there is no way to update that state. */ {noformat} I believe we should find a way to provide a better experience to our users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7868) AvroSerDe error handling could be improved
[ https://issues.apache.org/jira/browse/HIVE-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-7868: --- Attachment: HIVE-7868.1.patch AvroSerDe error handling could be improved -- Key: HIVE-7868 URL: https://issues.apache.org/jira/browse/HIVE-7868 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-7868.1.patch When an Avro schema is invalid, AvroSerDe returns an error message instead of throwing an exception. This is described in {{AvroSerdeUtils.determineSchemaOrReturnErrorSchema}}: {noformat} /** * Attempt to determine the schema via the usual means, but do not throw * an exception if we fail. Instead, signal failure via a special * schema. This is used because Hive calls init on the serde during * any call, including calls to update the serde properties, meaning * if the serde is in a bad state, there is no way to update that state. */ {noformat} I believe we should find a way to provide a better experience to our users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
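The trade-off described in the quoted Javadoc can be sketched as follows. This is an illustration under stated assumptions, not the code in AvroSerdeUtils: schemas are represented as plain strings, and the names `ERROR_SCHEMA`, `parseOrThrow` and `determineOrReturnSentinel` are hypothetical. It contrasts signalling failure through a special placeholder value with throwing an exception.

```java
final class SchemaResolutionSketch {
    /** Hypothetical placeholder returned when no usable schema can be determined. */
    static final String ERROR_SCHEMA = "{\"type\":\"error-placeholder\"}";

    /** Strict variant: fails loudly, so the caller sees the problem immediately. */
    static String parseOrThrow(String schemaText) {
        if (schemaText == null || schemaText.trim().isEmpty()) {
            throw new IllegalArgumentException("schema text is missing or empty");
        }
        return schemaText.trim();
    }

    /**
     * Lenient variant: never throws, so an object initialised with a bad schema
     * can still be constructed and have its properties corrected later.
     */
    static String determineOrReturnSentinel(String schemaText) {
        try {
            return parseOrThrow(schemaText);
        } catch (IllegalArgumentException e) {
            return ERROR_SCHEMA;
        }
    }

    public static void main(String[] args) {
        System.out.println(determineOrReturnSentinel("  "));
        System.out.println(determineOrReturnSentinel("{\"type\":\"string\"}"));
    }
}
```

The lenient variant keeps initialisation from failing, at the cost that the user only learns about the invalid schema when the placeholder surfaces later; that delayed feedback is the experience this issue asks to improve.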
[jira] [Commented] (HIVE-8062) Stats collection for columns fails on a partitioned table with null values in partitioning column
[ https://issues.apache.org/jira/browse/HIVE-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131169#comment-14131169 ] Gunther Hagleitner commented on HIVE-8062: -- LGTM +1 Stats collection for columns fails on a partitioned table with null values in partitioning column - Key: HIVE-8062 URL: https://issues.apache.org/jira/browse/HIVE-8062 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.14.0 Reporter: Deepesh Khandelwal Assignee: Ashutosh Chauhan Attachments: HIVE-8062.patch Steps to reproduce: 1. Create a data file abc.txt with the following contents: {noformat} a,1 b, {noformat} 2. Use the Hive CLI to create and load the partitioned table: {noformat} hive> create table abc(a string, b int); OK Time taken: 0.272 seconds hive> load data local inpath 'abc.txt' into table abc; Loading data to table default.abc Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0] OK Time taken: 0.463 seconds hive> create table abc1(a string) partitioned by (b int); OK Time taken: 0.098 seconds hive> set hive.exec.dynamic.partition.mode=nonstrict; hive> insert overwrite table abc1 partition (b) select a, b from abc; Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff Total jobs = 1 Launching Job 1 out of 1 Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063) Map 1: -/- Reducer 2: 0/1 Map 1: 0/1 Reducer 2: 0/1 Map 1: 0(+1)/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 0(+1)/1 Map 1: 1/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 1/1 Status: Finished successfully Loading data to table default.abc1 partition (b=null) Loading partition {b=__HIVE_DEFAULT_PARTITION__} Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, numRows=2, totalSize=7, rawDataSize=5] OK Time taken: 7.49 seconds {noformat} 3. Now run the analyze statistics command for columns: {noformat} hive> analyze table abc1 partition (b) compute statistics for columns; Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee Total jobs = 1 Launching Job 1 out of 1 Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063) Map 1: 0(+1)/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 0(+1)/1 Map 1: 1/1 Reducer 2: 1/1 Status: Finished successfully FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask {noformat} The analyze statistics for columns fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
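The log in step 2 shows the null partition value being stored under the name __HIVE_DEFAULT_PARTITION__. The sketch below illustrates that substitution only; it is not the code in ColumnStatsTask and not the fix in HIVE-8062.patch. The class and method names are hypothetical, and treating an empty string like null is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class DefaultPartitionSketch {
    /** Partition name taken from the log output in the report. */
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    /**
     * Renders a partition spec as "col=value/col=value", substituting the
     * default partition name for null or empty values (an assumption here).
     */
    static String partitionName(Map<String, String> partitionSpec) {
        StringJoiner joiner = new StringJoiner("/");
        for (Map.Entry<String, String> entry : partitionSpec.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.isEmpty()) {
                value = DEFAULT_PARTITION_NAME;
            }
            joiner.add(entry.getKey() + "=" + value);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("b", null); // the null partitioning value from the reproduction steps
        System.out.println(partitionName(spec));
    }
}
```

Any code that looks up a partition by its raw column value rather than by this substituted name has to handle the null case explicitly, which is the kind of path the report exercises.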
[jira] [Work started] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file
[ https://issues.apache.org/jira/browse/HIVE-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8070 started by Bing Li. - TestHWIServer failed due to wrong references to war and properties file --- Key: HIVE-8070 URL: https://issues.apache.org/jira/browse/HIVE-8070 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.13.1 Reporter: Bing Li Assignee: Bing Li Fix For: 0.14.0 Attachments: HIVE-8070.1.patch In testServerInit() method of that test class, it's still using build.properties to retrieve the version # for the war file name -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file
[ https://issues.apache.org/jira/browse/HIVE-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-8070: -- Status: Patch Available (was: In Progress) The patch is generated for trunk TestHWIServer failed due to wrong references to war and properties file --- Key: HIVE-8070 URL: https://issues.apache.org/jira/browse/HIVE-8070 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.13.1 Reporter: Bing Li Assignee: Bing Li Fix For: 0.14.0 Attachments: HIVE-8070.1.patch In testServerInit() method of that test class, it's still using build.properties to retrieve the version # for the war file name -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8070) TestHWIServer failed due to wrong references to war and properties file
[ https://issues.apache.org/jira/browse/HIVE-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-8070: -- Attachment: HIVE-8070.1.patch TestHWIServer failed due to wrong references to war and properties file --- Key: HIVE-8070 URL: https://issues.apache.org/jira/browse/HIVE-8070 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.13.1 Reporter: Bing Li Assignee: Bing Li Fix For: 0.14.0 Attachments: HIVE-8070.1.patch In testServerInit() method of that test class, it's still using build.properties to retrieve the version # for the war file name -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8062) Stats collection for columns fails on a partitioned table with null values in partitioning column
[ https://issues.apache.org/jira/browse/HIVE-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131174#comment-14131174 ] Pengcheng Xiong commented on HIVE-8062: --- +1 Stats collection for columns fails on a partitioned table with null values in partitioning column - Key: HIVE-8062 URL: https://issues.apache.org/jira/browse/HIVE-8062 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.14.0 Reporter: Deepesh Khandelwal Assignee: Ashutosh Chauhan Attachments: HIVE-8062.patch Steps to reproduce: 1. Create a data file abc.txt with the following contents: {noformat} a,1 b, {noformat} 2. Use the Hive CLI to create and load the partitioned table: {noformat} hive> create table abc(a string, b int); OK Time taken: 0.272 seconds hive> load data local inpath 'abc.txt' into table abc; Loading data to table default.abc Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0] OK Time taken: 0.463 seconds hive> create table abc1(a string) partitioned by (b int); OK Time taken: 0.098 seconds hive> set hive.exec.dynamic.partition.mode=nonstrict; hive> insert overwrite table abc1 partition (b) select a, b from abc; Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff Total jobs = 1 Launching Job 1 out of 1 Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063) Map 1: -/- Reducer 2: 0/1 Map 1: 0/1 Reducer 2: 0/1 Map 1: 0(+1)/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 0(+1)/1 Map 1: 1/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 1/1 Status: Finished successfully Loading data to table default.abc1 partition (b=null) Loading partition {b=__HIVE_DEFAULT_PARTITION__} Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, numRows=2, totalSize=7, rawDataSize=5] OK Time taken: 7.49 seconds {noformat} 3. Now run the analyze statistics command for columns: {noformat} hive> analyze table abc1 partition (b) compute statistics for columns; Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee Total jobs = 1 Launching Job 1 out of 1 Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063) Map 1: 0(+1)/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 0(+1)/1 Map 1: 1/1 Reducer 2: 1/1 Status: Finished successfully FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask {noformat} The analyze statistics for columns fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7981) alias of compound aggregation functions fails in having clause
[ https://issues.apache.org/jira/browse/HIVE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7981: Attachment: HIVE-7981.2.patch.txt alias of compound aggregation functions fails in having clause -- Key: HIVE-7981 URL: https://issues.apache.org/jira/browse/HIVE-7981 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: eyal gruss Assignee: Navis Priority: Minor Attachments: HIVE-7981.1.patch.txt, HIVE-7981.2.patch.txt hive select max(time)-min(time) as span from mytable group by name having span0; FAILED: SemanticException [Error 10025]: Line 1:92 Expression not in GROUP BY key '0' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8071) hive shell tries to write hive-exec.jar for each run
Rajesh Balamohan created HIVE-8071: -- Summary: hive shell tries to write hive-exec.jar for each run Key: HIVE-8071 URL: https://issues.apache.org/jira/browse/HIVE-8071 Project: Hive Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan For every run of the hive CLI there is a delay for the shell startup: 14/07/31 23:07:19 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS 14/07/31 23:07:19 INFO tez.DagUtils: Hive jar directory is hdfs://mac-10:8020/user/gopal/apps/2014-Jul-31/hive/ 14/07/31 23:07:19 INFO tez.DagUtils: Localizing resource because it does not exist: file:/home/gopal/tez-autobuild/dist/hive/lib/hive-exec-0.14.0-SNAPSHOT.jar to dest: hdfs://mac-10:8020/user/gopal/apps/2014-Jul-31/hive/hive-exec-0.14.0-SNAPSHOTde1f82f0b5561d3db9e3080dfb2897210a3bda4ca5e7b14e881e381115837fd8.jar 14/07/31 23:07:19 INFO tez.DagUtils: Looks like another thread is writing the same file will wait. 14/07/31 23:07:19 INFO tez.DagUtils: Number of wait attempts: 5. Wait interval: 5000 14/07/31 23:07:19 INFO tez.DagUtils: Resource modification time: 1406870512963 14/07/31 23:07:20 INFO tez.TezSessionState: Opening new Tez Session (id: 02d6b558-44cc-4182-b2f2-6a37ffdd25d2, scratch dir: hdfs://mac-10:8020/tmp/hive-gopal/_tez_session_dir/02d6b558-44cc-4182-b2f2-6a37ffdd25d2) Traced this to a method which does PRIVATE LRs - this is marked as PRIVATE even if it is from a common install dir. {code} public LocalResource localizeResource(Path src, Path dest, Configuration conf) throws IOException { return createLocalResource(destFS, dest, LocalResourceType.FILE, LocalResourceVisibility.PRIVATE); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
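One way to picture the observation about the common install dir is the sketch below. It is not the change proposed for this issue (the report does not state one) and it does not use the YARN API: the `Visibility` enum is a local, hypothetical mirror of the PUBLIC, PRIVATE and APPLICATION levels, and `visibilityFor` and the example paths are invented for illustration. A real PUBLIC resource would also have to be readable by all users, which this sketch does not check.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class VisibilitySketch {
    /** Hypothetical local mirror of the three resource visibility levels. */
    enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    /**
     * Picks a visibility for a resource: anything under the shared install
     * directory is treated as shareable across users, everything else stays
     * private to the submitting user. Permission checks are deliberately omitted.
     */
    static Visibility visibilityFor(Path resource, Path sharedInstallDir) {
        Path normalizedResource = resource.toAbsolutePath().normalize();
        Path normalizedShared = sharedInstallDir.toAbsolutePath().normalize();
        return normalizedResource.startsWith(normalizedShared)
                ? Visibility.PUBLIC
                : Visibility.PRIVATE;
    }

    public static void main(String[] args) {
        Path shared = Paths.get("/opt/hive");
        System.out.println(visibilityFor(Paths.get("/opt/hive/lib/hive-exec.jar"), shared));
        System.out.println(visibilityFor(Paths.get("/home/someuser/udf.jar"), shared));
    }
}
```

The design question the report raises is whether a jar that every session uses should be localized once and shared, rather than per user.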
[jira] [Work started] (HIVE-7858) Parquet compression should be configurable via table property
[ https://issues.apache.org/jira/browse/HIVE-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-7858 started by Ferdinand Xu. -- Parquet compression should be configurable via table property - Key: HIVE-7858 URL: https://issues.apache.org/jira/browse/HIVE-7858 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Ferdinand Xu ORC supports the orc.compress table property: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC {noformat} create table Addresses ( name string, street string, city string, state string, zip int ) stored as orc tblproperties ("orc.compress"="NONE"); {noformat} I think it'd be great to support the same for Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query
[ https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5690: Attachment: HIVE-5690.12.patch.txt Another rebasing on trunk Support subquery for single sourced multi query --- Key: HIVE-5690 URL: https://issues.apache.org/jira/browse/HIVE-5690 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D13791.1.patch, HIVE-5690.10.patch.txt, HIVE-5690.11.patch.txt, HIVE-5690.12.patch.txt, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt Single sourced multi (insert) query is very useful for various ETL processes but it does not allow subqueries included. For example, {noformat} explain from src insert overwrite table x1 select * from (select distinct key,value) b order by key insert overwrite table x2 select * from (select distinct key,value) c order by value; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131197#comment-14131197 ] Navis commented on HIVE-8040: - I've changed ExitException to RuntimeException and confirmed test passed. [~mithun], could you check this? Commit for HIVE-7925 breaks hadoop-1 build -- Key: HIVE-8040 URL: https://issues.apache.org/jira/browse/HIVE-8040 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Xuefu Zhang Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-metastore: Compilation failure [ERROR] /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37] package org.apache.commons.math3.stat does not exist [ERROR] - [Help 1] {code} Missing pom file changes maybe? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131201#comment-14131201 ] Hive QA commented on HIVE-8017: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668276/HIVE-8017.5-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6343 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/125/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/125/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-125/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668276 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. 
While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7163) ReduceRecordProcessor adds null values to getShuffleInputs - Causing NPE
[ https://issues.apache.org/jira/browse/HIVE-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131203#comment-14131203 ] Prasanth J commented on HIVE-7163: -- [~rajesh.balamohan] do you have a reproducible test case for this? ReduceRecordProcessor adds null values to getShuffleInputs - Causing NPE Key: HIVE-7163 URL: https://issues.apache.org/jira/browse/HIVE-7163 Project: Hive Issue Type: Bug Reporter: Rajesh Balamohan Labels: tez 2014-06-01 22:35:55,435 ERROR TezChild org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.init(InputReadyTracker.java:111) at org.apache.tez.runtime.InputReadyTracker.waitForAllInputsReady(InputReadyTracker.java:90) at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAllInputsReady(TezProcessorContextImpl.java:109) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:198) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:173) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:173) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1125) at java.util.Collections$SetFromMap.add(Collections.java:3903) at java.util.AbstractCollection.addAll(AbstractCollection.java:334) at org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.init(InputReadyTracker.java:102) ... 17 more ReduceRecordProcessor.getShuffleInputs() adds null values to ShuffleInputs. This is passed on to org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor: getShuffleInputs : null, null Environment: Latest codebase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Timeline for release of Hive 0.14
Hi, I'd really appreciate it if HIVE-5690 could be included, as it is becoming harder and harder to rebase. The other 79 patches assigned to me can be put on hold. Thanks, Navis 2014-09-11 19:54 GMT+09:00 Vaibhav Gumashta vgumas...@hortonworks.com: Hi Vikram, Can we also add: https://issues.apache.org/jira/browse/HIVE-6799 https://issues.apache.org/jira/browse/HIVE-7935 to the list. Thanks, --Vaibhav On Wed, Sep 10, 2014 at 12:18 AM, Satish Mittal satish.mit...@inmobi.com wrote: Hi, Can you please include HIVE-7892 (Thrift Set type not working with Hive) as well? It is under code review. Regards, Satish On Tue, Sep 9, 2014 at 2:10 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: Please include https://issues.apache.org/jira/browse/HIVE-7694 as well. It is currently under review by Amareshwari and should be done in the next couple of days. Thanks Suma On Mon, Sep 8, 2014 at 5:44 PM, Alan Gates ga...@hortonworks.com wrote: I'll review that. I just need the time to test it against mysql, oracle, and hopefully sqlserver. But I think we can do this post branch if we need to, as it's a bug fix rather than a feature. Alan. Damien Carol dca...@blitzbs.com September 8, 2014 at 3:19 Same request for https://issues.apache.org/jira/browse/HIVE-7689 I already provided a patch, re-based it many times and I'm waiting for a review. Regards, On 08/09/2014 12:08, amareshwarisr . wrote: amareshwarisr . amareshw...@gmail.com September 8, 2014 at 3:08 Would like to include https://issues.apache.org/jira/browse/HIVE-2390 and https://issues.apache.org/jira/browse/HIVE-7936 . I can review and merge them. Thanks Amareshwari Vikram Dixit vik...@hortonworks.com September 5, 2014 at 17:53 Hi Folks, I am going to start consolidating the items mentioned in this list and create a wiki page to track it. I will wait till the end of next week to create the branch taking into account Ashutosh's request. Thanks Vikram.
On Fri, Sep 5, 2014 at 5:39 PM, Ashutosh Chauhan hashut...@apache.org wrote:

Vikram,

Some of us are working on stabilizing the cbo branch and trying to get it merged into trunk. We feel we are close. May I request to defer cutting the branch for a few more days? Folks interested in this can track our progress here: https://issues.apache.org/jira/browse/HIVE-7946

Thanks,
Ashutosh

On Fri, Aug 22, 2014 at 4:09 PM, Lars Francke lars.fran...@gmail.com wrote:

Thank you for volunteering to do the release. I think a 0.14 release is a good idea. I have a couple of issues I'd like to get in too:

* Either HIVE-7107[0] (Fix an issue in the HiveServer1 JDBC driver) or HIVE-6977[1] (Delete HiveServer1). The former needs a review, the latter a patch.
* HIVE-6123[2] Checkstyle in Maven needs a review.

HIVE-7622[3] and HIVE-7543[4] are waiting for any reviews or comments on my previous thread[5]. I'd still appreciate any helpers for reviews or even just comments. I'd feel very sad if I had done all that work for nothing. Hoping this thread gives me a wider audience. Both patches fix up issues that should have been caught in earlier reviews, as they are almost all Checkstyle or other style violations, but they make for huge patches.
I could also create hundreds of small issues or stop doing these things entirely.

[0] https://issues.apache.org/jira/browse/HIVE-7107
[1] https://issues.apache.org/jira/browse/HIVE-6977
[2] https://issues.apache.org/jira/browse/HIVE-6123
[3] https://issues.apache.org/jira/browse/HIVE-7622
[4] https://issues.apache.org/jira/browse/HIVE-7543

On Fri, Aug 22, 2014 at 11:01 PM, John Pullokkaran
[jira] [Updated] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups
[ https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-7156:
    Attachment: HIVE-7156.4.patch

Fixes one last test failure.

Group-By operator stat-annotation only uses distinct approx to generate rollups
---
Key: HIVE-7156
URL: https://issues.apache.org/jira/browse/HIVE-7156
Project: Hive
Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Prasanth J
Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, HIVE-7156.4.patch

The stats annotation for a group-by only annotates the reduce-side row-count with the distinct values. The map-side gets the row-count as the rows output instead of distinct * parallelism, while the reducer side gets the correct parallelism.

{code}
hive> explain select distinct L_SHIPDATE from lineitem;

  Vertices:
    Map 1
      Map Operator Tree:
        TableScan
          alias: lineitem
          Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: l_shipdate (type: string)
            outputColumnNames: l_shipdate
            Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
            Group By Operator
              keys: l_shipdate (type: string)
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
      Execution mode: vectorized
    Reducer 2
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: _col0 (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
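To make the arithmetic in the description concrete: a map-side hash aggregation can emit at most one row per distinct key per map task, and never more rows than it reads. A minimal sketch of that bound follows. The method name, the cap by input rows, and the parallelism figure are assumptions for illustration; this is not the code in the attached patch.

```java
// Illustrative only: an upper bound for map-side group-by output rows.
final class GroupByRowEstimate {

  // min(inputRows, distinctValues * parallelism), saturating on overflow.
  static long mapSideRows(long inputRows, long distinctValues, long parallelism) {
    long perTaskBound;
    try {
      perTaskBound = Math.multiplyExact(distinctValues, parallelism);
    } catch (ArithmeticException overflow) {
      perTaskBound = Long.MAX_VALUE;
    }
    return Math.min(inputRows, perTaskBound);
  }

  public static void main(String[] args) {
    // 1955 distinct ship dates and 589709 input rows come from the plan above;
    // 10 map tasks is a made-up figure.
    System.out.println(mapSideRows(589709L, 1955L, 10L)); // 19550
    System.out.println(mapSideRows(1000L, 1955L, 10L));   // 1000
  }
}
```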
[jira] [Commented] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups
[ https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131218#comment-14131218 ]

Prasanth J commented on HIVE-7156:

[~hagleitn]/[~gopalv]/[~rhbutani] can someone plz take a look at this patch? The patch has mostly test file changes. The code changes are small.

Group-By operator stat-annotation only uses distinct approx to generate rollups
---
Key: HIVE-7156
URL: https://issues.apache.org/jira/browse/HIVE-7156
Project: Hive
Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Prasanth J
Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, HIVE-7156.4.patch

The stats annotation for a group-by only annotates the reduce-side row-count with the distinct values. The map-side gets the row-count as the rows output instead of distinct * parallelism, while the reducer side gets the correct parallelism.

{code}
hive> explain select distinct L_SHIPDATE from lineitem;

  Vertices:
    Map 1
      Map Operator Tree:
        TableScan
          alias: lineitem
          Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: l_shipdate (type: string)
            outputColumnNames: l_shipdate
            Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
            Group By Operator
              keys: l_shipdate (type: string)
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
      Execution mode: vectorized
    Reducer 2
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: _col0 (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 25575: HIVE-7615: Beeline should have an option for user to see the query progress
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25575/
---

Review request for hive.

Repository: hive-git

Description
---
When executing a query in Beeline, the user should have an option to see the progress through the outputs. Beeline could use the API introduced in HIVE-4629 to get and display the logs to the client.

Diffs
-
  beeline/pom.xml 45fa02b
  beeline/src/java/org/apache/hive/beeline/Commands.java a92d69f
  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java e1d44ec
  itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java ae128a9
  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 2cbf58c

Diff: https://reviews.apache.org/r/25575/diff/

Testing
---
UT passed.

Thanks,
Dong Chen
[jira] [Updated] (HIVE-7615) Beeline should have an option for user to see the query progress
[ https://issues.apache.org/jira/browse/HIVE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Chen updated HIVE-7615:
    Status: Patch Available (was: Open)

Beeline should have an option for user to see the query progress
Key: HIVE-7615
URL: https://issues.apache.org/jira/browse/HIVE-7615
Project: Hive
Issue Type: Improvement
Components: CLI
Reporter: Dong Chen
Assignee: Dong Chen

When executing a query in Beeline, the user should have an option to see the progress through the outputs. Beeline could use the API introduced in HIVE-4629 to get and display the logs to the client.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131221#comment-14131221 ]

Lefty Leverenz commented on HIVE-7826:

Typo fixed: HIVE-8018. The parameter name is now *hive.tez.dynamic.partition.pruning.max.data.size* for release 0.14.0.

Dynamic partition pruning on Tez
Key: HIVE-7826
URL: https://issues.apache.org/jira/browse/HIVE-7826
Project: Hive
Issue Type: Bug
Components: Tez
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Labels: TODOC14, tez
Fix For: 0.14.0
Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch

It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates. It can also be useful to compute the partitions one would like to scan via a subquery (where p in select ... from ...).

The resulting joins in Hive require a full table scan of the large table though, because partition pruning takes place before the corresponding values are known.

On Tez it's relatively straightforward to send the values needed to prune to the application master, where splits are generated and tasks are submitted. Using these values we can strip out any unneeded partitions dynamically, while the query is running.

The approach is straightforward:
- Insert synthetic conditions for each join representing x in (keys of other side in join)
- These conditions will be pushed as far down as possible
- If the condition hits a table scan and the column involved is a partition column:
   - Setup Operator to send key events to AM
- else:
   - Remove synthetic predicate

Add these properties:

||Property||Default Value||
|{{hive.tez.dynamic.partition.pruning}}|true|
|{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
|{{hive.tez.dynamic.parition.pruning.max.data.size}}|100*1024*1024L|

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
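The pruning step described above can be sketched in a few lines: once the join keys from the small side are known, only partitions whose key appears in that set need to be scanned. The sketch below is a simplification under stated assumptions. The class, the method, and the maxKeys parameter are hypothetical, and falling back to a full scan when too many keys arrive is only a guess at what the size limits in the property table are for; it is not Hive's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of dynamic partition pruning; not Hive's implementation.
final class PartitionPruningSketch {

  // Keeps only partitions whose key appears among the join keys collected
  // from the other side of the join. If more than maxKeys keys were collected
  // (a stand-in for a size limit), pruning is skipped and all partitions are kept.
  static List<String> prune(List<String> partitions, Set<String> joinKeys, int maxKeys) {
    if (joinKeys.size() > maxKeys) {
      return new ArrayList<>(partitions);
    }
    List<String> kept = new ArrayList<>();
    for (String partition : partitions) {
      if (joinKeys.contains(partition)) {
        kept.add(partition);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> partitions = Arrays.asList("2014-09-10", "2014-09-11", "2014-09-12");
    Set<String> keys = new HashSet<>(Arrays.asList("2014-09-11"));
    System.out.println(prune(partitions, keys, 100)); // [2014-09-11]
    System.out.println(prune(partitions, keys, 0));   // [2014-09-10, 2014-09-11, 2014-09-12]
  }
}
```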
[jira] [Commented] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups
[ https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131223#comment-14131223 ]

Gopal V commented on HIVE-7156:

I just re-ran my test with this patch applied, and the group-by still increases the row-count?

{code}
hive> explain select distinct L_SHIPDATE from lineitem;

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: gopal_20140912004141_1f23f948-7852-4882-9f3e-1810904988b8:1
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: lineitem
              Statistics: Num rows: 589709 Data size: 4833087637230 Basic stats: COMPLETE Column stats: COMPLETE
              Select Operator
                expressions: l_shipdate (type: string)
                outputColumnNames: l_shipdate
                Statistics: Num rows: 589709 Data size: 4833087637230 Basic stats: COMPLETE Column stats: COMPLETE
                Group By Operator
                  keys: l_shipdate (type: string)
                  mode: hash
                  outputColumnNames: _col0
                  Statistics: Num rows: 113279805705920 Data size: 10648301736356480 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: string)
                    Statistics: Num rows: 113279805705920 Data size: 10648301736356480 Basic stats: COMPLETE Column stats: COMPLETE
          Execution mode: vectorized
{code}

Group-By operator stat-annotation only uses distinct approx to generate rollups
---
Key: HIVE-7156
URL: https://issues.apache.org/jira/browse/HIVE-7156
Project: Hive
Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Prasanth J
Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, HIVE-7156.4.patch

The stats annotation for a group-by only annotates the reduce-side row-count with the distinct values. The map-side gets the row-count as the rows output instead of distinct * parallelism, while the reducer side gets the correct parallelism.

{code}
hive> explain select distinct L_SHIPDATE from lineitem;

  Vertices:
    Map 1
      Map Operator Tree:
        TableScan
          alias: lineitem
          Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: l_shipdate (type: string)
            outputColumnNames: l_shipdate
            Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
            Group By Operator
              keys: l_shipdate (type: string)
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
      Execution mode: vectorized
    Reducer 2
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: _col0 (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8041) Hadoop-2 build is broken with JDK6
[ https://issues.apache.org/jira/browse/HIVE-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis resolved HIVE-8041.
    Resolution: Fixed
    Fix Version/s: 0.14.0

Committed to trunk.

Hadoop-2 build is broken with JDK6
--
Key: HIVE-8041
URL: https://issues.apache.org/jira/browse/HIVE-8041
Project: Hive
Issue Type: Bug
Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
Assignee: Navis
Fix For: 0.14.0
Attachments: HIVE-8041.1.patch.txt

{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-exec: Compilation failure
[ERROR] /home/xzhang/apache/hive7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java:[81,1] illegal start of expression
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-7704:
    Attachment: HIVE-7704.11.patch

Rebased this patch against trunk.

Create tez task for fast file merging
-
Key: HIVE-7704
URL: https://issues.apache.org/jira/browse/HIVE-7704
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Attachments: HIVE-7704.1.patch, HIVE-7704.10.patch, HIVE-7704.11.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch, HIVE-7704.9.patch

Currently Tez falls back to an MR task for the merge file task. It would be beneficial to convert the merge file tasks to Tez tasks to make use of the performance gains from Tez.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups
[ https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131229#comment-14131229 ]

Prasanth J commented on HIVE-7156:

Thanks [~gopalv] for looking into it. I will check what is going on and will report back.

Group-By operator stat-annotation only uses distinct approx to generate rollups
---
Key: HIVE-7156
URL: https://issues.apache.org/jira/browse/HIVE-7156
Project: Hive
Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Prasanth J
Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, HIVE-7156.4.patch

The stats annotation for a group-by only annotates the reduce-side row-count with the distinct values. The map-side gets the row-count as the rows output instead of distinct * parallelism, while the reducer side gets the correct parallelism.

{code}
hive> explain select distinct L_SHIPDATE from lineitem;

  Vertices:
    Map 1
      Map Operator Tree:
        TableScan
          alias: lineitem
          Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: l_shipdate (type: string)
            outputColumnNames: l_shipdate
            Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
            Group By Operator
              keys: l_shipdate (type: string)
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
      Execution mode: vectorized
    Reducer 2
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: _col0 (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query
[ https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131232#comment-14131232 ] Hive QA commented on HIVE-7733: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668234/HIVE-7733.4.patch.txt {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6198 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.testStatsAfterCompactionPartTbl org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/755/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/755/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-755/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12668234

Ambiguous column reference error on query
-
Key: HIVE-7733
URL: https://issues.apache.org/jira/browse/HIVE-7733
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
Assignee: Navis
Attachments: HIVE-7733.1.patch.txt, HIVE-7733.2.patch.txt, HIVE-7733.3.patch.txt, HIVE-7733.4.patch.txt

{noformat}
CREATE TABLE agg1 (
  col0 INT,
  col1 STRING,
  col2 DOUBLE
);

explain
SELECT single_use_subq11.a1 AS a1,
       single_use_subq11.a2 AS a2
FROM   (SELECT Sum(agg1.col2) AS a1
        FROM   agg1
        GROUP  BY agg1.col0) single_use_subq12
       JOIN (SELECT alias.a2 AS a0,
                    alias.a1 AS a1,
                    alias.a1 AS a2
             FROM   (SELECT agg1.col1 AS a0,
                            '42'      AS a1,
                            agg1.col0 AS a2
                     FROM   agg1
                     UNION ALL
                     SELECT agg1.col1 AS a0,
                            '41'      AS a1,
                            agg1.col0 AS a2
                     FROM   agg1) alias
             GROUP  BY alias.a2,
                       alias.a1) single_use_subq11
         ON ( single_use_subq11.a0 = single_use_subq11.a0 );
{noformat}

Gets the following error:

FAILED: SemanticException [Error 10007]: Ambiguous column reference a2

Looks like this query had been working in 0.12 but started failing with this error in 0.13.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131233#comment-14131233 ] Hive QA commented on HIVE-7946: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668238/HIVE-7946.6.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/756/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/756/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-756/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-756/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/test/results/clientnegative/ambiguous_col.q.out' Reverted 'ql/src/test/results/clientnegative/ambiguous_col0.q.out' Reverted 'ql/src/test/results/clientnegative/ambiguous_col1.q.out' Reverted 'ql/src/test/results/clientnegative/ambiguous_col2.q.out' Reverted 'ql/src/test/results/clientpositive/ambiguous_col.q.out' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/results/clientpositive/complex_alias.q.out ql/src/test/queries/clientpositive/complex_alias.q + svn update Uql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java Fetching external item into 
'hcatalog/src/test/e2e/harness' Updated external to revision 1624472. Updated to revision 1624472. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12668238 CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch,
[jira] [Commented] (HIVE-8036) PTest SSH Options
[ https://issues.apache.org/jira/browse/HIVE-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131245#comment-14131245 ]

Lefty Leverenz commented on HIVE-8036:

Should these options be documented in the wiki?

PTest SSH Options
-
Key: HIVE-8036
URL: https://issues.apache.org/jira/browse/HIVE-8036
Project: Hive
Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Fix For: 0.14.0
Attachments: HIVE-8036.patch

I'd like to be able to specify the following options:

{noformat}
StrictHostKeyChecking no
ConnectionAttempts 3
ServerAliveInterval 1
{noformat}

as a config param in the ptest config file, as opposed to depending on them being set in the environment.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query
[ https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131246#comment-14131246 ]

Navis commented on HIVE-7733:

[~ashutoshc] The uniqueness of column names should be mandatory for a subquery. testCliDriver_ambiguous_col should fail under this assumption, but it succeeds in an exceptional case (select-star in the top-level query). Should we allow this?

Ambiguous column reference error on query
-
Key: HIVE-7733
URL: https://issues.apache.org/jira/browse/HIVE-7733
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
Assignee: Navis
Attachments: HIVE-7733.1.patch.txt, HIVE-7733.2.patch.txt, HIVE-7733.3.patch.txt, HIVE-7733.4.patch.txt

{noformat}
CREATE TABLE agg1 (
  col0 INT,
  col1 STRING,
  col2 DOUBLE
);

explain
SELECT single_use_subq11.a1 AS a1,
       single_use_subq11.a2 AS a2
FROM   (SELECT Sum(agg1.col2) AS a1
        FROM   agg1
        GROUP  BY agg1.col0) single_use_subq12
       JOIN (SELECT alias.a2 AS a0,
                    alias.a1 AS a1,
                    alias.a1 AS a2
             FROM   (SELECT agg1.col1 AS a0,
                            '42'      AS a1,
                            agg1.col0 AS a2
                     FROM   agg1
                     UNION ALL
                     SELECT agg1.col1 AS a0,
                            '41'      AS a1,
                            agg1.col0 AS a2
                     FROM   agg1) alias
             GROUP  BY alias.a2,
                       alias.a1) single_use_subq11
         ON ( single_use_subq11.a0 = single_use_subq11.a0 );
{noformat}

Gets the following error:

FAILED: SemanticException [Error 10007]: Ambiguous column reference a2

Looks like this query had been working in 0.12 but started failing with this error in 0.13.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7733) Ambiguous column reference error on query
[ https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131246#comment-14131246 ]

Navis edited comment on HIVE-7733 at 9/12/14 8:11 AM:

[~ashutoshc] The uniqueness of column names should be mandatory for a subquery. testCliDriver_ambiguous_col should fail under this assumption, but it succeeds in an exceptional case (select-star in the top-level query). Should we allow this?

was (Author: navis):
[~ashutoshc] The uniqueness of columns name should be mandatory for sub query. testCliDriver_ambiguous_col should be failed in this assumption but it's succeeded in exceptional way (select-star in top-level query). Should we allow this?

Ambiguous column reference error on query
-
Key: HIVE-7733
URL: https://issues.apache.org/jira/browse/HIVE-7733
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
Assignee: Navis
Attachments: HIVE-7733.1.patch.txt, HIVE-7733.2.patch.txt, HIVE-7733.3.patch.txt, HIVE-7733.4.patch.txt

{noformat}
CREATE TABLE agg1 (
  col0 INT,
  col1 STRING,
  col2 DOUBLE
);

explain
SELECT single_use_subq11.a1 AS a1,
       single_use_subq11.a2 AS a2
FROM   (SELECT Sum(agg1.col2) AS a1
        FROM   agg1
        GROUP  BY agg1.col0) single_use_subq12
       JOIN (SELECT alias.a2 AS a0,
                    alias.a1 AS a1,
                    alias.a1 AS a2
             FROM   (SELECT agg1.col1 AS a0,
                            '42'      AS a1,
                            agg1.col0 AS a2
                     FROM   agg1
                     UNION ALL
                     SELECT agg1.col1 AS a0,
                            '41'      AS a1,
                            agg1.col0 AS a2
                     FROM   agg1) alias
             GROUP  BY alias.a2,
                       alias.a1) single_use_subq11
         ON ( single_use_subq11.a0 = single_use_subq11.a0 );
{noformat}

Gets the following error:

FAILED: SemanticException [Error 10007]: Ambiguous column reference a2

Looks like this query had been working in 0.12 but started failing with this error in 0.13.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8042) Optionally allow move tasks to run in parallel
[ https://issues.apache.org/jira/browse/HIVE-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131261#comment-14131261 ] Lefty Leverenz commented on HIVE-8042: -- Should the description of *hive.exec.parallel* be revised in HiveConf.java, since its behavior is changing in 0.14.0? (Sorry I didn't chime in earlier.) Alternatively, the wiki could provide information about this change: * [hive.exec.parallel | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.parallel] Optionally allow move tasks to run in parallel -- Key: HIVE-8042 URL: https://issues.apache.org/jira/browse/HIVE-8042 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.14.0 Attachments: HIVE-8042.1.patch, HIVE-8042.2.patch, HIVE-8042.3.patch hive.exec.parallel allows one to run different stages of a query in parallel. However that applies only to map-reduce tasks. When using large multi insert queries there are many MoveTasks that are all executed in sequence on the client. There's no real reason for that - they could be run in parallel as well (i.e.: the stage graph captures the dependencies and knows which tasks can happen in parallel). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
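The idea in the description can be sketched with a plain thread pool: tasks that do not depend on one another are submitted together, and the client waits for all of them. The class, the runAll helper, and the task names are hypothetical; Hive's stage graph and MoveTask are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: running independent "move" steps concurrently.
final class ParallelMoves {

  // Runs independent tasks on a fixed pool; results come back in submission order.
  static List<String> runAll(List<Callable<String>> tasks, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<String> results = new ArrayList<>();
      for (Future<String> done : pool.invokeAll(tasks)) {
        results.add(done.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  // Stand-in for a move: it only reports which target it would write.
  static Callable<String> move(String target) {
    return () -> "moved " + target;
  }

  public static void main(String[] args) {
    List<Callable<String>> moves = new ArrayList<>();
    moves.add(move("t1"));
    moves.add(move("t2"));
    System.out.println(runAll(moves, 2)); // [moved t1, moved t2]
  }
}
```

Tasks that do depend on each other would still have to be ordered by the stage graph; the pool only helps for the independent ones.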
[jira] [Updated] (HIVE-7859) Tune zlib compression in ORC to account for the encoding strategy
[ https://issues.apache.org/jira/browse/HIVE-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-7859:
    Labels: TODOC14 (was: )

Tune zlib compression in ORC to account for the encoding strategy
-
Key: HIVE-7859
URL: https://issues.apache.org/jira/browse/HIVE-7859
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Gopal V
Assignee: Gopal V
Labels: TODOC14
Fix For: 0.14.0
Attachments: HIVE-7859.1.patch, HIVE-7859.2.patch, HIVE-7859.3.patch

Currently ORC Zlib is slow because several compression strategies that ZLib uses are already done by ORC itself (dictionary, RLE, bit-packing).

We need to pick between Z_FILTERED, Z_HUFFMAN_ONLY, Z_RLE, Z_FIXED and Z_DEFAULT_STRATEGY according to column stream type. For instance, an RLE_V2 stream could use Z_FILTERED compression without invoking the rest of the strategies. The string streams can use Z_FIXED compression strategies, and so on.

The core constraint is to retain compatibility with the default decompressor, so that these are automatically backward compatible.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
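The compatibility point in the description can be checked with the JDK's own zlib binding: the strategy only changes how the compressor searches, and the output is still a standard zlib stream that a default Inflater reads. A minimal sketch follows. Note that java.util.zip.Deflater exposes only DEFAULT_STRATEGY, FILTERED and HUFFMAN_ONLY (not Z_RLE or Z_FIXED), and the class and method names here are hypothetical rather than taken from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative only: different deflate strategies, one default decompressor.
final class ZlibStrategySketch {

  // Compresses data with the given Deflater strategy constant.
  static byte[] compress(byte[] data, int strategy) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
    deflater.setStrategy(strategy);
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buffer);
      out.write(buffer, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  // Decompresses with a default Inflater, which needs no strategy setting.
  static byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buffer);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // truncated or unsupported input; stop rather than spin
        }
        out.write(buffer, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] data = "aaaaabbbbbcccccaaaaabbbbbccccc".getBytes(StandardCharsets.UTF_8);
    int[] strategies = {Deflater.DEFAULT_STRATEGY, Deflater.FILTERED, Deflater.HUFFMAN_ONLY};
    for (int strategy : strategies) {
      byte[] restored = decompress(compress(data, strategy));
      System.out.println(Arrays.equals(data, restored)); // true for each strategy
    }
  }
}
```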
[jira] [Commented] (HIVE-7859) Tune zlib compression in ORC to account for the encoding strategy
[ https://issues.apache.org/jira/browse/HIVE-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131271#comment-14131271 ] Lefty Leverenz commented on HIVE-7859: -- Doc note: This adds configuration parameter *hive.exec.orc.compression.strategy* to HiveConf.java, so it needs to be documented in the wiki: * [Configuration Properties -- ORC File Format | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat] Tune zlib compression in ORC to account for the encoding strategy - Key: HIVE-7859 URL: https://issues.apache.org/jira/browse/HIVE-7859 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Gopal V Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7859.1.patch, HIVE-7859.2.patch, HIVE-7859.3.patch Currently ORC Zlib is slow because several of the compression strategies that ZLib uses are already applied by ORC itself (dictionary, RLE, bit-packing). We need to pick between Z_FILTERED, Z_HUFFMAN_ONLY, Z_RLE, Z_FIXED and Z_DEFAULT_STRATEGY according to column stream type. For instance, an RLE_V2 stream could use Z_FILTERED compression without invoking the rest of the strategies. The string streams can use Z_FIXED compression strategies and so on. The core limitation is to retain compatibility with the default decompressor, so that these are automatically backward compatible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7615) Beeline should have an option for user to see the query progress
[ https://issues.apache.org/jira/browse/HIVE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-7615: Attachment: HIVE-7615.patch [~cartershanklin], [~brocknoland], [~thejas] Thanks very much for looking at this JIRA and giving ideas on it. I have uploaded a patch. Since HIVE-4629 is in trunk, I used that API to fetch logs and it works fine. Review board: https://reviews.apache.org/r/25575/ The patch mainly contains: 1. Add a JDBC-layer API, which uses the thrift method FetchResult(), to get operation (query) logs. Also add a QueryStatus to keep the query state in JDBC, since the execute() method is blocking and the get-log method needs to be synced up with it from another thread. 2. Beeline uses the added JDBC API to fetch logs and show them in the console. It uses the existing 'silent' option to choose whether to show logs, since Beeline seems to have many options already. If a separate option like showProgress is preferable, please let me know. Beeline should have an option for user to see the query progress Key: HIVE-7615 URL: https://issues.apache.org/jira/browse/HIVE-7615 Project: Hive Issue Type: Improvement Components: CLI Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-7615.patch When executing a query in Beeline, the user should have an option to see the progress through the output. Beeline could use the API introduced in HIVE-4629 to get and display the logs to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7977) Avoid creating serde for partitions if possible in FetchTask
[ https://issues.apache.org/jira/browse/HIVE-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7977: Attachment: HIVE-7977.3.patch.txt Fixed NPE in TestJdbcDriver2, but cannot reproduce the failure of file_with_header_footer.q Avoid creating serde for partitions if possible in FetchTask Key: HIVE-7977 URL: https://issues.apache.org/jira/browse/HIVE-7977 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7977.1.patch.txt, HIVE-7977.2.patch.txt, HIVE-7977.3.patch.txt Currently, FetchTask creates a SerDe instance thrice for each partition, which can be avoided if it is the same as the table SerDe. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7615) Beeline should have an option for user to see the query progress
[ https://issues.apache.org/jira/browse/HIVE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131309#comment-14131309 ] Dong Chen commented on HIVE-7615: - Another thing I planned to put in this patch was Beeline filtering the fetched logs and only showing a simple percentage progress. When filtering, a log4j pattern is used to derive the needed class and message info. (MR job progress info is mainly in class ql.Driver and ql.exec.Task) This needs some work: Beeline does not know the pattern HS2 is using, and Beeline would gain two jar dependencies, “chainsaw” and “apache-log4j-extras”, to parse the fetched logs. Any thoughts on it? Do you think giving the user an option to show the progress or not is enough, or is it preferred to make this filter work? Beeline should have an option for user to see the query progress Key: HIVE-7615 URL: https://issues.apache.org/jira/browse/HIVE-7615 Project: Hive Issue Type: Improvement Components: CLI Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-7615.patch When executing a query in Beeline, the user should have an option to see the progress through the output. Beeline could use the API introduced in HIVE-4629 to get and display the logs to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131330#comment-14131330 ] Satish Mittal commented on HIVE-8040: - Applied 2nd patch and ran 'mvn clean install -DskipTests -Phadoop-1'. Now it failed at: {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project hive-exec: Compilation failure: Compilation failure: [ERROR] /home/satish/work/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionEdge.java:[29,27] cannot find symbol [ERROR] symbol : class DataInputByteBuffer [ERROR] location: package org.apache.hadoop.io [ERROR] /home/satish/work/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionEdge.java:[73,4] cannot find symbol [ERROR] symbol : class DataInputByteBuffer [ERROR] location: class org.apache.hadoop.hive.ql.exec.tez.CustomPartitionEdge [ERROR] /home/satish/work/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionEdge.java:[73,35] cannot find symbol [ERROR] symbol : class DataInputByteBuffer [ERROR] location: class org.apache.hadoop.hive.ql.exec.tez.CustomPartitionEdge {noformat} Commit for HIVE-7925 breaks hadoop-1 build -- Key: HIVE-8040 URL: https://issues.apache.org/jira/browse/HIVE-8040 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Xuefu Zhang Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-metastore: Compilation failure [ERROR] /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37] package org.apache.commons.math3.stat does not exist [ERROR] - [Help 1] {code} Missing pom file changes maybe? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7777) add CSV support for Serde
[ https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131445#comment-14131445 ] Hive QA commented on HIVE-: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668248/HIVE-.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6201 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/758/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/758/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-758/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668248 add CSV support for Serde - Key: HIVE- URL: https://issues.apache.org/jira/browse/HIVE- Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.patch, csv-serde-master.zip There is no official support for csvSerde for hive while there is an open source project in github(https://github.com/ogrodnek/csv-serde). CSV is of high frequency in use as a data format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-712) Cleanup build scripts for Hive
[ https://issues.apache.org/jira/browse/HIVE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke resolved HIVE-712. --- Resolution: Fixed Assignee: (was: Ashish Thusoo) Fixed as part of the Mavenization Cleanup build scripts for Hive -- Key: HIVE-712 URL: https://issues.apache.org/jira/browse/HIVE-712 Project: Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 0.3.0 Reporter: Ashish Thusoo Priority: Minor The build scripts for hive have a lot of duplication in build-common.xml and the individual build.xml for the different modules. We need to simplify this as well as remove any duplications etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-721) Integration with HadoopDB
[ https://issues.apache.org/jira/browse/HIVE-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131455#comment-14131455 ] Lars Francke commented on HIVE-721: --- There's not much development on HadoopDB and there's Tez and Spark now. Do you plan to work on this? Otherwise I suggest closing it. Integration with HadoopDB - Key: HIVE-721 URL: https://issues.apache.org/jira/browse/HIVE-721 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.4.0 Reporter: Azza Abouzeid Priority: Minor Original Estimate: 2h Remaining Estimate: 2h The HadoopDB project integrates Hadoop with single node databases, which provide a high performance data layer for analytical queries over structured data. HadoopDB's SMS (SQL-to-MapReduce-to-SQL) component uses Hive's SemanticAnalyzer to convert SQL to MapReduce plans. After plan generation, we recreate SQL from the lower plan operators and push the SQL into database layer maintaining the upper layers of the plan, that can't be pushed into the single node databases, intact. For more information on this process, please read the HadoopDB paper (http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf) and browse the source code if you feel like it (more specifically the SQLQueryGenerator class) at http://sourceforge.net/projects/hadoopdb/. HadoopDB is a natural system level extension of Hive's goal of providing a simple SQL interface for large-scale data processing. A simple patch that integrates Hive with HadoopDB's SMS could be found here: http://hadoopdb.svn.sourceforge.net/viewvc/hadoopdb/trunk/Patches/hive-sms.patch?view=log In addition to the semantic analyzer post-processing, we modified certain areas to allow paths to be associated with databases to allow the recreation of the operator tree from the map.input.file configuration. 
Instead of FileInputSplit --- we set up an interface Pathable, to allow any inputsplit that implements pathable to return a dummy path equivalent to the map.input.file path. Instead of the post semantic analysis function call to the SQLQueryGenerator class, you could also use hooks. One such suggestion provided by a HadoopDB user is found here http://sourceforge.net/tracker/index.php?func=detailaid=2829253group_id=269559atid=1146689. We would really appreciate your help in better integrating Hive and HadoopDB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2438) add trademark attributions to Hive homepage
[ https://issues.apache.org/jira/browse/HIVE-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131458#comment-14131458 ] Lars Francke commented on HIVE-2438: Looking at the homepage and the linked document I think this is fixed and this issue can be closed. add trademark attributions to Hive homepage --- Key: HIVE-2438 URL: https://issues.apache.org/jira/browse/HIVE-2438 Project: Hive Issue Type: Sub-task Reporter: John Sichi Assignee: Carl Steinbach http://www.apache.org/foundation/marks/pmcs.html#attributions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2437) update project website navigation links
[ https://issues.apache.org/jira/browse/HIVE-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131459#comment-14131459 ] Lars Francke commented on HIVE-2437: I think this has been fixed and the issue can be closed unless I'm missing something. update project website navigation links --- Key: HIVE-2437 URL: https://issues.apache.org/jira/browse/HIVE-2437 Project: Hive Issue Type: Sub-task Reporter: John Sichi Assignee: Carl Steinbach http://www.apache.org/foundation/marks/pmcs.html#navigation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25245: Support dynamic service discovery for HiveServer2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25245/ --- (Updated Sept. 12, 2014, 12:30 p.m.) Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair. Bugs: HIVE-7935 https://issues.apache.org/jira/browse/HIVE-7935 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-7935 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5d2e6b0 itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java ae128a9 jdbc/pom.xml 1ad13a7 jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 jdbc/src/java/org/apache/hive/jdbc/HiveDriver.java 6e248d6 jdbc/src/java/org/apache/hive/jdbc/JdbcUriParseException.java PRE-CREATION jdbc/src/java/org/apache/hive/jdbc/Utils.java 58339bf jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientException.java PRE-CREATION jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 0919d2f ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java 59294b1 service/src/java/org/apache/hive/service/cli/CLIService.java a0bc905 service/src/java/org/apache/hive/service/cli/operation/OperationManager.java f5a8f27 service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java b0bb8be service/src/java/org/apache/hive/service/cli/session/SessionManager.java 11d25cc service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 2b80adc service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 443c371 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 4067106 service/src/java/org/apache/hive/service/server/HiveServer2.java 124996c service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java 66fc1fc Diff: 
https://reviews.apache.org/r/25245/diff/ Testing --- Manual testing. Thanks, Vaibhav Gumashta
[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7946: - Status: Open (was: Patch Available) CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.6.patch, HIVE-7946.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7946: - Status: Patch Available (was: Open) CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.6.patch, HIVE-7946.7.patch, HIVE-7946.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7946: - Attachment: HIVE-7946.7.patch CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.6.patch, HIVE-7946.7.patch, HIVE-7946.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8061) improve the partition col stats update speed
[ https://issues.apache.org/jira/browse/HIVE-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131510#comment-14131510 ] Hive QA commented on HIVE-8061: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668256/HIVE-8061.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6197 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl_dp org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/759/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/759/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-759/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668256 improve the partition col stats update speed Key: HIVE-8061 URL: https://issues.apache.org/jira/browse/HIVE-8061 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Minor Attachments: HIVE-8061.1.patch, HIVE-8061.2.patch We previously worked hard towards faster stats updates for the columns of a table partition in HIVE-7736 and HIVE-7876. Although there was some improvement, the result was only correct in the first run; later runs produced duplicate column stats. Thanks to Eugene Koifman's comments, we fixed this in HIVE-7944 by reverting the patch. This JIRA ticket is another attempt of mine to improve the speed.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7788) Generate plans for insert, update, and delete
[ https://issues.apache.org/jira/browse/HIVE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7788: - Status: Open (was: Patch Available) Canceling patch as I need to fix the failing tests. Generate plans for insert, update, and delete - Key: HIVE-7788 URL: https://issues.apache.org/jira/browse/HIVE-7788 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7788.2.patch, HIVE-7788.3.patch, HIVE-7788.WIP.patch, HIVE-7788.patch Insert plans need to be generated differently for ACID tables, plus we need to be able to generate plans in the semantic analyzer for update and delete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7788) Generate plans for insert, update, and delete
[ https://issues.apache.org/jira/browse/HIVE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7788: - Attachment: HIVE-7788.4.patch Backed out the change where I used isAssignableFrom instead of equals to check whether an output format was an acid output format, as it incorrectly said all output formats were acid. Generate plans for insert, update, and delete - Key: HIVE-7788 URL: https://issues.apache.org/jira/browse/HIVE-7788 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7788.2.patch, HIVE-7788.3.patch, HIVE-7788.4.patch, HIVE-7788.WIP.patch, HIVE-7788.patch Insert plans need to be generated differently for ACID tables, plus we need to be able to generate plans in the semantic analyzer for update and delete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
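For readers weighing the two checks mentioned in the comment above, the following sketch contrasts an exact class comparison with Class.isAssignableFrom(). The types here (AcidFormat, OrcLikeFormat, and so on) are hypothetical stand-ins, not Hive's AcidOutputFormat hierarchy, and the sketch does not attempt to explain why the change in the patch reported every output format as acid; the message does not say.

```java
// Hypothetical types for illustration only; these are not Hive classes.
public class AcidCheckSketch {

    interface AcidFormat {}                                   // stand-in for an "acid" marker interface
    static class PlainFormat {}                               // does not implement AcidFormat
    static class OrcLikeFormat implements AcidFormat {}       // implements it directly
    static class CustomAcidFormat extends OrcLikeFormat {}    // inherits it through a superclass

    // Exact comparison: true only for the one class named here.
    static boolean isAcidByEquals(Class<?> format) {
        return format.equals(OrcLikeFormat.class);
    }

    // Type-hierarchy check: true for AcidFormat itself and anything that implements or extends it.
    static boolean isAcidByAssignable(Class<?> format) {
        return AcidFormat.class.isAssignableFrom(format);
    }

    public static void main(String[] args) {
        System.out.println(isAcidByEquals(CustomAcidFormat.class));
        System.out.println(isAcidByAssignable(CustomAcidFormat.class));
        System.out.println(isAcidByAssignable(PlainFormat.class));
    }
}
```

The receiver of isAssignableFrom() is the supertype being tested for and the argument is the candidate class; swapping the two asks a different question, so the direction of the call is worth checking whenever such a test gives surprising results.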
[jira] [Updated] (HIVE-7788) Generate plans for insert, update, and delete
[ https://issues.apache.org/jira/browse/HIVE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7788: - Status: Patch Available (was: Open) Generate plans for insert, update, and delete - Key: HIVE-7788 URL: https://issues.apache.org/jira/browse/HIVE-7788 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7788.2.patch, HIVE-7788.3.patch, HIVE-7788.4.patch, HIVE-7788.WIP.patch, HIVE-7788.patch Insert plans need to be generated differently for ACID tables, plus we need to be able to generate plans in the semantic analyzer for update and delete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131603#comment-14131603 ] Xuefu Zhang commented on HIVE-8017: --- {quote} do you think we need a JIRA to track this difference so we can find the cause when we have time {quote} Yes, please. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
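The description's point, that a key which carries its own hash code lets the partitioner follow Hive's bucketing or join logic instead of a hash of the raw bytes, can be sketched as below. HashCarryingKey and partitionFor are hypothetical names for illustration; this is not HiveKey's implementation or Spark's partitioner.

```java
import java.util.Arrays;

// Hypothetical sketch; HashCarryingKey is not the real HiveKey class.
public class KeyHashSketch {

    // A byte-array key whose hash code is assigned by the producer, not derived from the bytes.
    static final class HashCarryingKey {
        private final byte[] bytes;
        private final int hash;

        HashCarryingKey(byte[] bytes, int hash) {
            this.bytes = bytes.clone();
            this.hash = hash;
        }

        @Override
        public int hashCode() {
            return hash; // the partitioner sees the assigned value
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof HashCarryingKey)) {
                return false;
            }
            HashCarryingKey that = (HashCarryingKey) other;
            return hash == that.hash && Arrays.equals(bytes, that.bytes);
        }
    }

    // A plain hash partitioner: masks the sign bit, then takes the remainder.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Two keys with different serialized bytes but the same assigned hash land in the same partition, which a hash computed from the bytes alone would not guarantee.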
Re: Review Request 25414: HIVE-7788 Generate plans for insert, update, and delete
On Sept. 9, 2014, 9:33 p.m., Eugene Koifman wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 11872 https://reviews.apache.org/r/25414/diff/1/?file=682029#file682029line11872 does this work if of implements a subclass of AcidOutputFormat? Perhaps Class.isAssignableFrom() is a safer choice Alan Gates wrote: Changed. Actually, I had to back this out. Making this change made it so that it said all output formats were acid. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/#review52655 --- On Sept. 11, 2014, 2:17 p.m., Alan Gates wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25414/ --- (Updated Sept. 11, 2014, 2:17 p.m.) Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jason Dere, and Thejas Nair. Bugs: HIVE-7788 https://issues.apache.org/jira/browse/HIVE-7788 Repository: hive-git Description --- This patch adds plan generation and modifies some of the exec operators to make insert/value, update, and delete work. The patch is large, but about 2/3 of it is tests.
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5d2e6b0 data/conf/tez/hive-site.xml 0b3877c itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 1a84024 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 9807497 itests/src/test/resources/testconfiguration.properties 99049ca metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f1697bb ql/src/java/org/apache/hadoop/hive/ql/Context.java 7fcbe3c ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9953919 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 4246d68 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java f018ca0 ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java e3bc3b1 ql/src/java/org/apache/hadoop/hive/ql/hooks/WriteEntity.java 7f1d71b ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java b1c4441 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 8354ad9 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 32d2f7a ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2b1a345 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 4acafba ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java 96a5d78 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 5c711cf ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 5195748 ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 911ac8a ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 496f6a6 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 3e3926e ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java ad91b0f ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 2dbf1c8 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 6dce30c ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 5695f35 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 5164b16 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 789c780 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 63ecb8d ql/src/test/org/apache/hadoop/hive/ql/parse/TestUpdateDeleteSemanticAnalyzer.java PRE-CREATION ql/src/test/queries/clientnegative/acid_overwrite.q PRE-CREATION ql/src/test/queries/clientnegative/delete_not_acid.q PRE-CREATION ql/src/test/queries/clientnegative/update_not_acid.q PRE-CREATION ql/src/test/queries/clientnegative/update_partition_col.q PRE-CREATION ql/src/test/queries/clientpositive/delete_all_non_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_all_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_orig_table.q PRE-CREATION ql/src/test/queries/clientpositive/delete_tmp_table.q PRE-CREATION ql/src/test/queries/clientpositive/delete_where_no_match.q PRE-CREATION ql/src/test/queries/clientpositive/delete_where_non_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/delete_where_partitioned.q PRE-CREATION
[jira] [Comment Edited] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131603#comment-14131603 ] Xuefu Zhang edited comment on HIVE-8017 at 9/12/14 2:30 PM: {quote} do you think we need a JIRA to track this difference so we can find the cause when we have time {quote} Yes, please. I will commit this patch shortly. was (Author: xuefuz): {quote} do you think we need a JIRA to track this difference so we can find the cause when we have time {quote} Yes, please. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR
[ https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-8069: Attachment: HIVE-8069.1.patch previous patch file was wrong CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR -- Key: HIVE-8069 URL: https://issues.apache.org/jira/browse/HIVE-8069 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-8069.1.patch, HIVE-8069.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR
[ https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-8069: Fix Version/s: 0.14.0 CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR -- Key: HIVE-8069 URL: https://issues.apache.org/jira/browse/HIVE-8069 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8069.1.patch, HIVE-8069.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8069) CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR
[ https://issues.apache.org/jira/browse/HIVE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani resolved HIVE-8069. - Resolution: Fixed Committed to CBO branch [~jpullokkaran] thanks for reviewing CBO: RowResolver after SubQuery predicate handling should be reset to outer query block RR -- Key: HIVE-8069 URL: https://issues.apache.org/jira/browse/HIVE-8069 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-8069.1.patch, HIVE-8069.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7325) Support non-constant expressions for MAP type indices.
[ https://issues.apache.org/jira/browse/HIVE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131614#comment-14131614 ] Hive QA commented on HIVE-7325: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668272/HIVE-7325.3.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6195 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_map_index {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/760/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/760/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-760/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668272 Support non-constant expressions for MAP type indices. 
-- Key: HIVE-7325 URL: https://issues.apache.org/jira/browse/HIVE-7325 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mala Chikka Kempanna Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7325.1.patch.txt, HIVE-7325.2.patch.txt, HIVE-7325.3.patch.txt
Here is my sample:
{code}
CREATE TABLE RECORD(RecordID string, BatchDate string, Country string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,D:BatchDate,D:Country")
TBLPROPERTIES ("hbase.table.name" = "RECORD");

CREATE TABLE KEY_RECORD(KeyValue String, RecordId map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, K:")
TBLPROPERTIES ("hbase.table.name" = "KEY_RECORD");
{code}
The following join statement doesn't work.
{code}
SELECT a.*, b.* from KEY_RECORD a join RECORD b WHERE a.RecordId[b.RecordID] is not null;
{code}
FAILED: SemanticException 2:16 Non-constant expression for map indexes not supported. Error encountered near token 'RecordID'
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131648#comment-14131648 ] Pankit Thapar commented on HIVE-8038: - Hi, Can you please take a look at this cr : https://reviews.apache.org/r/25521/ Thanks, Pankit Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.patch
What is the Current Logic
==
1. Get the file blocks from FileSystem.getFileBlockLocations(), which returns an array of BlockLocation.
2. In SplitGenerator.createSplit(), check whether the split spans one block or multiple blocks.
3. If the split spans just one block, use the array index (index = offset/blockSize) to get the corresponding host having the blockLocation.
4. If the split spans multiple blocks, get all hosts that have at least 80% of the max of total data in split hosted by any host.
5. Add the split to a list of splits.
Issue with Current Logic
=
Dependency on the FileSystem API’s logic for block location calculations. It returns an array, and we need to rely on FileSystem to make all blocks the same size if we want to directly access a block from the array.
What is the Fix
=
1a. Get the file blocks from FileSystem.getFileBlockLocations(), which returns an array of BlockLocation.
1b. Convert the array into a TreeMap<offset, BlockLocation> and return it through getLocationsWithOffSet().
2. In SplitGenerator.createSplit(), check whether the split spans one block or multiple blocks.
3. If the split spans just one block, use TreeMap.floorEntry(key) to get the entry with the greatest key less than or equal to the split's offset, and get the corresponding host.
4a. If the split spans multiple blocks, get a submap containing all entries whose blockLocations cover the range from offset to offset + length.
4b. Get all hosts that have at least 80% of the max of total data in split hosted by any host.
5. Add the split to a list of splits.
What are the major changes in logic
==
1. Store BlockLocations in a Map instead of an array.
2. Call SHIMS.getLocationsWithOffSet() instead of getLocations().
3. The one-block case is checked by if(offset + length <= start.getOffset() + start.getLength()) instead of if((offset % blockSize) + length <= blockSize).
What is the effect on Complexity (Big O)
=
1. We add an O(n) loop to build a TreeMap from an array, but it's a one-time cost and is not incurred for each split.
2. In the one-block case, we can get the block in O(log n) worst case, which was O(1) before.
3. Getting the submap is O(log n).
4. In the multiple-block case, building the list of hosts is O(m), which was O(n), with m < n: previously we were iterating over all the block locations, but now we iterate only over the blocks that belong to the range of offsets we need.
What are the benefits of the change
==
1. With this fix, we do not depend on the blockLocations returned by FileSystem to figure out the block corresponding to the offset and blockSize.
2. Also, block lengths are not necessarily the same for all blocks on all FileSystems.
3. Previously we were using blockSize for the one-block case and block.length for the multiple-block case, which is no longer so. We figure out the block from its actual length and offset.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
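The lookup described in the fix above can be sketched in plain Java. This is a minimal illustration, not the code in HIVE-8038.patch: `Block` is a hypothetical stand-in for Hadoop's BlockLocation, and the method names `index`, `blockAt`, `fitsInOneBlock` and `blocksInRange` are invented for the example. Only the `java.util.TreeMap` calls (`floorEntry`, `floorKey`, `subMap`) are real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for BlockLocation: just an offset, a length and one host.
final class Block {
    final long offset;
    final long length;
    final String host;

    Block(long offset, long length, String host) {
        this.offset = offset;
        this.length = length;
        this.host = host;
    }
}

class BlockLookupSketch {
    // Step 1b: index the blocks by their starting offset (done once per file).
    static NavigableMap<Long, Block> index(List<Block> blocks) {
        NavigableMap<Long, Block> byOffset = new TreeMap<>();
        for (Block b : blocks) {
            byOffset.put(b.offset, b);
        }
        return byOffset;
    }

    // Step 3: the block holding `offset` is the entry with the greatest key <= offset.
    static Block blockAt(NavigableMap<Long, Block> byOffset, long offset) {
        Map.Entry<Long, Block> entry = byOffset.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }

    // One-block check, using the block's own offset and length rather than a fixed blockSize.
    static boolean fitsInOneBlock(NavigableMap<Long, Block> byOffset, long offset, long length) {
        Block start = blockAt(byOffset, offset);
        return start != null && offset + length <= start.offset + start.length;
    }

    // Step 4a: all blocks overlapping [offset, offset + length).
    static List<Block> blocksInRange(NavigableMap<Long, Block> byOffset, long offset, long length) {
        Long first = byOffset.floorKey(offset);
        if (first == null) {
            first = offset; // offset lies before the first block
        }
        return new ArrayList<>(byOffset.subMap(first, true, offset + length, false).values());
    }

    public static void main(String[] args) {
        // Blocks of unequal size, which offset / blockSize indexing could not handle.
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block(0, 100, "host1"));
        blocks.add(new Block(100, 50, "host2"));
        blocks.add(new Block(150, 100, "host3"));
        NavigableMap<Long, Block> byOffset = index(blocks);

        System.out.println(blockAt(byOffset, 120).host);
        System.out.println(fitsInOneBlock(byOffset, 90, 20));
        System.out.println(blocksInRange(byOffset, 90, 70).size());
    }
}
```

The sketch deliberately uses blocks of different lengths, since that is the case the array-index approach gets wrong.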
[jira] [Commented] (HIVE-8062) Stats collection for columns fails on a partitioned table with null values in partitioning column
[ https://issues.apache.org/jira/browse/HIVE-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131701#comment-14131701 ] Hive QA commented on HIVE-8062: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668275/HIVE-8062.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6197 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.parse.TestParse.testParse_union org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/761/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/761/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-761/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668275 Stats collection for columns fails on a partitioned table with null values in partitioning column - Key: HIVE-8062 URL: https://issues.apache.org/jira/browse/HIVE-8062 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.14.0 Reporter: Deepesh Khandelwal Assignee: Ashutosh Chauhan Attachments: HIVE-8062.patch Steps to reproduce: 1. Create a data file abc.txt with the following contents: {noformat} a,1 b, {noformat} 2. 
Use the Hive CLI to create and load the partitioned table:
{noformat}
hive> create table abc(a string, b int);
OK
Time taken: 0.272 seconds
hive> load data local inpath 'abc.txt' into table abc;
Loading data to table default.abc
Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0]
OK
Time taken: 0.463 seconds
hive> create table abc1(a string) partitioned by (b int);
OK
Time taken: 0.098 seconds
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table abc1 partition (b) select a, b from abc;
Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0(+1)/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0(+1)/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
Loading data to table default.abc1 partition (b=null)
Loading partition {b=__HIVE_DEFAULT_PARTITION__}
Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, numRows=2, totalSize=7, rawDataSize=5]
OK
Time taken: 7.49 seconds
{noformat}
3. Now run the analyze statistics command for columns:
{noformat}
hive> analyze table abc1 partition (b) compute statistics for columns;
Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063)
Map 1: 0(+1)/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0(+1)/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
{noformat}
The analyze statistics for columns fails.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8017: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Patch committed to spark branch. Thanks to Rui for the contribution. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
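Why a key that carries its own hash code matters for partitioning can be shown with a small sketch. Everything here is hypothetical: `HashedKey` is not Hive's HiveKey, and the assumption that two keys may differ in their serialized bytes (for example by a trailing tag byte) while sharing the hash of their partitioning columns is ours, made only for illustration.

```java
import java.util.Arrays;

// Hypothetical key type: serialized bytes plus an explicitly supplied hash code.
final class HashedKey {
    final byte[] bytes;
    final int hash;

    HashedKey(byte[] bytes, int hash) {
        this.bytes = bytes.clone();
        this.hash = hash;
    }

    // The partitioner sees the supplied hash, not a hash of the raw bytes.
    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof HashedKey)) {
            return false;
        }
        HashedKey that = (HashedKey) other;
        return hash == that.hash && Arrays.equals(bytes, that.bytes);
    }
}

class KeyPartitionSketch {
    // Plain hash partitioning: non-negative hash modulo the number of partitions.
    static int partitionOf(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Same logical key (assumed hash 42), different trailing byte in the serialized form.
        HashedKey left = new HashedKey(new byte[] {1, 2, 0}, 42);
        HashedKey right = new HashedKey(new byte[] {1, 2, 1}, 42);

        // Partitioning on the carried hash keeps both rows together.
        System.out.println(partitionOf(left.hashCode(), 4) + " " + partitionOf(right.hashCode(), 4));

        // Partitioning on a content-based hash of the bytes may separate them.
        System.out.println(partitionOf(Arrays.hashCode(left.bytes), 4) + " "
            + partitionOf(Arrays.hashCode(right.bytes), 4));
    }
}
```

With a content-based hash the two rows can land in different partitions, which is the failure mode for joins and bucketed tables that the issue describes.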
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8017: -- Labels: Spark-M1 (was: ) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch, HIVE-8017.4-spark.patch, HIVE-8017.5-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8042) Optionally allow move tasks to run in parallel
[ https://issues.apache.org/jira/browse/HIVE-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131721#comment-14131721 ] Xuefu Zhang commented on HIVE-8042: --- HiveConf.java actually doesn't say anything about task types. It uses a general term job, which seems good even after this patch. Wiki of course can supply more info. Optionally allow move tasks to run in parallel -- Key: HIVE-8042 URL: https://issues.apache.org/jira/browse/HIVE-8042 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.14.0 Attachments: HIVE-8042.1.patch, HIVE-8042.2.patch, HIVE-8042.3.patch hive.exec.parallel allows one to run different stages of a query in parallel. However that applies only to map-reduce tasks. When using large multi insert queries there are many MoveTasks that are all executed in sequence on the client. There's no real reason for that - they could be run in parallel as well (i.e.: the stage graph captures the dependencies and knows which tasks can happen in parallel). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
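The idea of running independent move tasks concurrently can be sketched with the standard `java.util.concurrent` API. This is only an illustration under stated assumptions: the class `ParallelMoveSketch` and the method `runIndependent` are hypothetical, each move is modelled as a `Callable<String>`, and the caller is assumed to have already used the stage graph to pick tasks with no dependencies on each other. It is not how the HIVE-8042 patch is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelMoveSketch {
    // Runs mutually independent moves on a fixed-size pool and returns their results in input order.
    static List<String> runIndependent(List<Callable<String>> moves, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed and preserves the input order.
            for (Future<String> done : pool.invokeAll(moves)) {
                results.add(done.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for moves", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a move failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> moves = new ArrayList<>();
        moves.add(() -> "moved partition a");
        moves.add(() -> "moved partition b");
        System.out.println(runIndependent(moves, 2));
    }
}
```

Tasks that do depend on one another would still have to be submitted in separate rounds, one round per level of the stage graph.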
[jira] [Updated] (HIVE-8072) TesParse_union is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8072: --- Status: Patch Available (was: Open) TesParse_union is failing on trunk -- Key: HIVE-8072 URL: https://issues.apache.org/jira/browse/HIVE-8072 Project: Hive Issue Type: Task Components: Tests Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8072.patch Needs golden file update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8072) TesParse_union is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8072: --- Attachment: HIVE-8072.patch TesParse_union is failing on trunk -- Key: HIVE-8072 URL: https://issues.apache.org/jira/browse/HIVE-8072 Project: Hive Issue Type: Task Components: Tests Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8072.patch Needs golden file update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8072) TesParse_union is failing on trunk
Ashutosh Chauhan created HIVE-8072: -- Summary: TesParse_union is failing on trunk Key: HIVE-8072 URL: https://issues.apache.org/jira/browse/HIVE-8072 Project: Hive Issue Type: Task Components: Tests Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8072.patch Needs golden file update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-7812: Resolution: Fixed Fix Version/s: 0.14.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is ACID format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-2438) add trademark attributions to Hive homepage
[ https://issues.apache.org/jira/browse/HIVE-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland resolved HIVE-2438. Resolution: Fixed Good call, I fixed this when creating the new site. add trademark attributions to Hive homepage --- Key: HIVE-2438 URL: https://issues.apache.org/jira/browse/HIVE-2438 Project: Hive Issue Type: Sub-task Reporter: John Sichi Assignee: Carl Steinbach http://www.apache.org/foundation/marks/pmcs.html#attributions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8036) PTest SSH Options
[ https://issues.apache.org/jira/browse/HIVE-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131761#comment-14131761 ] Brock Noland commented on HIVE-8036: Not today. The docs we have today are not about ptest but about using the infra ptest creates. I need to go ahead and create some basic user docs for ptest and then we can start docing this stuff. PTest SSH Options - Key: HIVE-8036 URL: https://issues.apache.org/jira/browse/HIVE-8036 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-8036.patch I'd like to be able to specify the following options: {noformat} StrictHostKeyChecking no ConnectionAttempts 3 ServerAliveInterval 1 {noformat} as a config param in the ptest config file as opposed to depending on them set in the env. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8073) Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]
Xuefu Zhang created HIVE-8073: - Summary: Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch] Key: HIVE-8073 URL: https://issues.apache.org/jira/browse/HIVE-8073 Project: Hive Issue Type: Task Components: Spark Reporter: Xuefu Zhang I have seen some optimizations done in the logical plan that are not applicable, such as in HIVE-8054. We should go thru all those optimizations to identify any others. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8062) Stats collection for columns fails on a partitioned table with null values in partitioning column
[ https://issues.apache.org/jira/browse/HIVE-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8062: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) TestParse_union is failing on trunk too. Created HIVE-8072 for it. Committed this one to trunk. Stats collection for columns fails on a partitioned table with null values in partitioning column - Key: HIVE-8062 URL: https://issues.apache.org/jira/browse/HIVE-8062 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.14.0 Reporter: Deepesh Khandelwal Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-8062.patch
Steps to reproduce:
1. Create a data file abc.txt with the following contents:
{noformat}
a,1
b,
{noformat}
2. Use the Hive CLI to create and load the partitioned table:
{noformat}
hive> create table abc(a string, b int);
OK
Time taken: 0.272 seconds
hive> load data local inpath 'abc.txt' into table abc;
Loading data to table default.abc
Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0]
OK
Time taken: 0.463 seconds
hive> create table abc1(a string) partitioned by (b int);
OK
Time taken: 0.098 seconds
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table abc1 partition (b) select a, b from abc;
Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0(+1)/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0(+1)/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
Loading data to table default.abc1 partition (b=null)
Loading partition {b=__HIVE_DEFAULT_PARTITION__}
Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, numRows=2, totalSize=7, rawDataSize=5]
OK
Time taken: 7.49 seconds
{noformat}
3. Now run the analyze statistics command for columns:
{noformat}
hive> analyze table abc1 partition (b) compute statistics for columns;
Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063)
Map 1: 0(+1)/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0(+1)/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
{noformat}
The analyze statistics for columns fails.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reopened HIVE-8040: - Commit for HIVE-7925 breaks hadoop-1 build -- Key: HIVE-8040 URL: https://issues.apache.org/jira/browse/HIVE-8040 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Xuefu Zhang Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-metastore: Compilation failure [ERROR] /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37] package org.apache.commons.math3.stat does not exist [ERROR] - [Help 1] {code} Missing pom file changes maybe? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-8040: Attachment: HIVE-8040.patch With this patch on trunk, the hadoop-1 profile compiles. Commit for HIVE-7925 breaks hadoop-1 build -- Key: HIVE-8040 URL: https://issues.apache.org/jira/browse/HIVE-8040 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Xuefu Zhang Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt, HIVE-8040.patch {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-metastore: Compilation failure [ERROR] /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37] package org.apache.commons.math3.stat does not exist [ERROR] - [Help 1] {code} Missing pom file changes maybe? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7981) alias of compound aggregation functions fails in having clause
[ https://issues.apache.org/jira/browse/HIVE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131802#comment-14131802 ] Hive QA commented on HIVE-7981: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668287/HIVE-7981.2.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6198 tests executed *Failed tests:* {noformat} org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/762/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/762/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-762/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668287 alias of compound aggregation functions fails in having clause -- Key: HIVE-7981 URL: https://issues.apache.org/jira/browse/HIVE-7981 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: eyal gruss Assignee: Navis Priority: Minor Attachments: HIVE-7981.1.patch.txt, HIVE-7981.2.patch.txt
hive> select max(time)-min(time) as span from mytable group by name having span>0;
FAILED: SemanticException [Error 10025]: Line 1:92 Expression not in GROUP BY key '0'
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8040: --- Status: Patch Available (was: Reopened) LGTM. +1 Copying class from hadoop to hive seems ok since there is no dependency of that class .cc: [~hagleitn] Commit for HIVE-7925 breaks hadoop-1 build -- Key: HIVE-8040 URL: https://issues.apache.org/jira/browse/HIVE-8040 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Xuefu Zhang Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt, HIVE-8040.patch {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-metastore: Compilation failure [ERROR] /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37] package org.apache.commons.math3.stat does not exist [ERROR] - [Help 1] {code} Missing pom file changes maybe? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7981) alias of compound aggregation functions fails in having clause
[ https://issues.apache.org/jira/browse/HIVE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131812#comment-14131812 ] Ashutosh Chauhan commented on HIVE-7981: [~navis] Can you create RB entry for this? alias of compound aggregation functions fails in having clause -- Key: HIVE-7981 URL: https://issues.apache.org/jira/browse/HIVE-7981 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: eyal gruss Assignee: Navis Priority: Minor Attachments: HIVE-7981.1.patch.txt, HIVE-7981.2.patch.txt
hive> select max(time)-min(time) as span from mytable group by name having span>0;
FAILED: SemanticException [Error 10025]: Line 1:92 Expression not in GROUP BY key '0'
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups
[ https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7156: - Status: Open (was: Patch Available) Group-By operator stat-annotation only uses distinct approx to generate rollups --- Key: HIVE-7156 URL: https://issues.apache.org/jira/browse/HIVE-7156 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Prasanth J Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch, HIVE-7156.4.patch
The stats annotation for a group-by only annotates the reduce-side row-count with the distinct values. The map-side gets the row-count as the rows output instead of distinct * parallelism, while the reducer side gets the correct parallelism.
{code}
hive> explain select distinct L_SHIPDATE from lineitem;
Vertices:
  Map 1
    Map Operator Tree:
      TableScan
        alias: lineitem
        Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
        Select Operator
          expressions: l_shipdate (type: string)
          outputColumnNames: l_shipdate
          Statistics: Num rows: 589709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
          Group By Operator
            keys: l_shipdate (type: string)
            mode: hash
            outputColumnNames: _col0
            Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 589709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
    Execution mode: vectorized
  Reducer 2
    Reduce Operator Tree:
      Group By Operator
        keys: KEY._col0 (type: string)
        mode: mergepartial
        outputColumnNames: _col0
        Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
        Select Operator
          expressions: _col0 (type: string)
          outputColumnNames: _col0
          Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
{code}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
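The "distinct * parallelism" estimate mentioned in HIVE-7156 can be worked through numerically. The sketch below is a guess at the intended arithmetic, not the patch: the class and method names are hypothetical, the parallelism values are made up, and capping the estimate at the input row count is our own assumption (a hash aggregation cannot emit more rows than it reads). The row count and distinct count are the ones quoted in the plan.

```java
class GroupByRowEstimateSketch {
    // Each map task can emit at most one row per distinct key, so the map-side
    // output is bounded by distinctKeys * parallelism (and, by assumption, by inputRows).
    static long mapSideRows(long inputRows, long distinctKeys, int parallelism) {
        return Math.min(inputRows, distinctKeys * parallelism);
    }

    public static void main(String[] args) {
        long inputRows = 589709L;   // "Num rows" on the map side of the quoted plan
        long distinctKeys = 1955L;  // "Num rows" on the reduce side of the quoted plan

        // Hypothetical parallelism of 100 map tasks.
        System.out.println(mapSideRows(inputRows, distinctKeys, 100));

        // With very high parallelism the estimate is capped at the input row count.
        System.out.println(mapSideRows(inputRows, distinctKeys, 1000));
    }
}
```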
[jira] [Updated] (HIVE-8040) Commit for HIVE-7925 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-8040: Resolution: Fixed Status: Resolved (was: Patch Available) Given that this is fixing the build, with Ashutosh's review I've committed it. Commit for HIVE-7925 breaks hadoop-1 build -- Key: HIVE-8040 URL: https://issues.apache.org/jira/browse/HIVE-8040 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Xuefu Zhang Attachments: HIVE-8040.1.patch.txt, HIVE-8040.2.patch.txt, HIVE-8040.patch {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-metastore: Compilation failure [ERROR] /home/xzhang/apache/hive7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:[45,37] package org.apache.commons.math3.stat does not exist [ERROR] - [Help 1] {code} Missing pom file changes maybe? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7704: - Attachment: HIVE-7704.12.patch HIVE-7859 changes ORC file sizes which causes diffs in 2 test files. Updated them in this patch. Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.10.patch, HIVE-7704.11.patch, HIVE-7704.12.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch, HIVE-7704.9.patch Currently tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to tez tasks to make use of the performance gains from tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7788) Generate plans for insert, update, and delete
[ https://issues.apache.org/jira/browse/HIVE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131874#comment-14131874 ] Thejas M Nair commented on HIVE-7788: - Alan, can you please upload the updated patch to review board? Generate plans for insert, update, and delete - Key: HIVE-7788 URL: https://issues.apache.org/jira/browse/HIVE-7788 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7788.2.patch, HIVE-7788.3.patch, HIVE-7788.4.patch, HIVE-7788.WIP.patch, HIVE-7788.patch Insert plans need to be generated differently for ACID tables, plus we need to be able to generate plans in the semantic analyzer for update and delete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8074) Merge spark into trunk 9/12/2014
Brock Noland created HIVE-8074: -- Summary: Merge spark into trunk 9/12/2014 Key: HIVE-8074 URL: https://issues.apache.org/jira/browse/HIVE-8074 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8076) Test Failure input23
Laljo John Pullokkaran created HIVE-8076: Summary: Test Failure input23 Key: HIVE-8076 URL: https://issues.apache.org/jira/browse/HIVE-8076 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8075) Test limit_pushdown failure
Laljo John Pullokkaran created HIVE-8075: Summary: Test limit_pushdown failure Key: HIVE-8075 URL: https://issues.apache.org/jira/browse/HIVE-8075 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8076) CBO Trunk Merge: Test Failure input23
[ https://issues.apache.org/jira/browse/HIVE-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-8076: - Summary: CBO Trunk Merge: Test Failure input23 (was: Test Failure input23) CBO Trunk Merge: Test Failure input23 - Key: HIVE-8076 URL: https://issues.apache.org/jira/browse/HIVE-8076 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8075) CBO Trunk Merge: Test limit_pushdown failure
[ https://issues.apache.org/jira/browse/HIVE-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-8075: - Summary: CBO Trunk Merge: Test limit_pushdown failure (was: Test limit_pushdown failure) CBO Trunk Merge: Test limit_pushdown failure Key: HIVE-8075 URL: https://issues.apache.org/jira/browse/HIVE-8075 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8077) Test Failure vectorization_7
Laljo John Pullokkaran created HIVE-8077:

Summary: Test Failure vectorization_7
Key: HIVE-8077
URL: https://issues.apache.org/jira/browse/HIVE-8077
Project: Hive
Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
[jira] [Updated] (HIVE-8077) CBO Trunk Merge: Test Failure vectorization_7
[ https://issues.apache.org/jira/browse/HIVE-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laljo John Pullokkaran updated HIVE-8077:
Summary: CBO Trunk Merge: Test Failure vectorization_7 (was: Test Failure vectorization_7)

CBO Trunk Merge: Test Failure vectorization_7
Key: HIVE-8077
URL: https://issues.apache.org/jira/browse/HIVE-8077
Project: Hive
Issue Type: Sub-task
Components: CBO
Reporter: Laljo John Pullokkaran
[jira] [Assigned] (HIVE-8077) CBO Trunk Merge: Test Failure vectorization_7
[ https://issues.apache.org/jira/browse/HIVE-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laljo John Pullokkaran reassigned HIVE-8077:
Assignee: Laljo John Pullokkaran

CBO Trunk Merge: Test Failure vectorization_7
Key: HIVE-8077
URL: https://issues.apache.org/jira/browse/HIVE-8077
Project: Hive
Issue Type: Sub-task
Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
[jira] [Commented] (HIVE-7325) Support non-constant expressions for MAP type indices.
[ https://issues.apache.org/jira/browse/HIVE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131911#comment-14131911 ]

Jason Dere commented on HIVE-7325:

This looks good. +1 if you remove invalid_map_index.q since it looks like that test is no longer valid with your changes to map index.

Support non-constant expressions for MAP type indices.
Key: HIVE-7325
URL: https://issues.apache.org/jira/browse/HIVE-7325
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Mala Chikka Kempanna
Assignee: Navis
Fix For: 0.14.0
Attachments: HIVE-7325.1.patch.txt, HIVE-7325.2.patch.txt, HIVE-7325.3.patch.txt

Here is my sample:
{code}
CREATE TABLE RECORD(RecordID string, BatchDate string, Country string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,D:BatchDate,D:Country")
TBLPROPERTIES ("hbase.table.name" = "RECORD");

CREATE TABLE KEY_RECORD(KeyValue String, RecordId map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, K:")
TBLPROPERTIES ("hbase.table.name" = "KEY_RECORD");
{code}
The following join statement doesn't work.
{code}
SELECT a.*, b.* from KEY_RECORD a join RECORD b
WHERE a.RecordId[b.RecordID] is not null;
{code}
FAILED: SemanticException 2:16 Non-constant expression for map indexes not supported. Error encountered near token 'RecordID'
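To make the intent of the failing query concrete, here is a minimal Python sketch of what `a.RecordId[b.RecordID] is not null` asks for: the map index comes from a column of the other joined row rather than from a constant. This only illustrates the semantics and says nothing about how Hive implements it. The function name `join_on_map_key` and the dictionary-based row representation are hypothetical, and the sketch assumes that indexing a map with a missing key (or indexing a NULL map) yields NULL.

```python
def join_on_map_key(key_records, records):
    """Pair every KEY_RECORD row with every RECORD row whose RecordID
    is a key of the first row's RecordId map with a non-null value.

    Rows are plain dicts here (an assumption made for illustration).
    """
    result = []
    for a in key_records:
        # Treat a NULL map like an empty one: every lookup yields None.
        record_map = a["RecordId"] or {}
        for b in records:
            # Non-constant map index: the key is taken from the other row.
            if record_map.get(b["RecordID"]) is not None:
                result.append((a, b))
    return result
```

A row pair survives only when the map lookup produces a value, which is the filter the WHERE clause in the report expresses.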
[jira] [Updated] (HIVE-8061) improve the partition col stats update speed
[ https://issues.apache.org/jira/browse/HIVE-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-8061:
Status: Open (was: Patch Available)

improve the partition col stats update speed
Key: HIVE-8061
URL: https://issues.apache.org/jira/browse/HIVE-8061
Project: Hive
Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
Attachments: HIVE-8061.1.patch, HIVE-8061.2.patch, HIVE-8061.3.patch

We previously worked toward faster stats updates for the columns of a table partition in HIVE-7736 and HIVE-7876. Although there was some improvement, the result is only correct on the first run; later runs produce duplicate column stats. Thanks to Eugene Koifman for his comments. We fixed this in HIVE-7944 by reverting the patch. This JIRA ticket is another attempt to improve the speed.
Re: Review Request 25557: improve the speed of col stats update speed
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25557/ ---

(Updated Sept. 12, 2014, 6:47 p.m.)

Review request for hive.

Changes
---
Address the case-insensitivity problem.

Repository: hive-git

Description
---
Major improvements:
(1) All partition stats updates/inserts are now done in one transaction.
(2) Rather than issuing one update query per column per partition (total queries = #col * #part), we now use 1 query to delete everything and then 1 query to insert everything. The transaction ensures that this happens with ACID guarantees.

Diffs (updated)
---
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9df6656
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 33745e4
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 5a8591a
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 637a39a
metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 5c5ed7f
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5905efe
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 88b0791
ql/src/test/queries/clientpositive/analyze_tbl_part.q 9040bd4
ql/src/test/results/clientpositive/analyze_tbl_part.q.out 40b926c

Diff: https://reviews.apache.org/r/25557/diff/

Testing
---

Thanks,
pengcheng xiong
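The delete-then-insert batching described in this review can be sketched roughly as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not the code in the patch: the table `part_col_stats`, its columns, and the function `replace_partition_col_stats` are all hypothetical names chosen for the example, and the real metastore schema is not shown in the review.

```python
import sqlite3


def replace_partition_col_stats(conn, table_name, stats_rows):
    """Replace column stats for a set of partitions with one DELETE and
    one multi-row INSERT, both inside a single transaction.

    stats_rows: list of (partition_name, column_name, num_nulls) tuples.
    The part_col_stats table and its columns are hypothetical.
    """
    if not stats_rows:
        return
    partitions = sorted({row[0] for row in stats_rows})
    part_placeholders = ", ".join("?" for _ in partitions)
    row_placeholders = ", ".join("(?, ?, ?, ?)" for _ in stats_rows)
    insert_params = []
    for partition_name, column_name, num_nulls in stats_rows:
        insert_params.extend([table_name, partition_name, column_name, num_nulls])

    # "with conn" commits if both statements succeed and rolls back if
    # either one raises, so readers never see a half-applied update.
    with conn:
        conn.execute(
            "DELETE FROM part_col_stats "
            f"WHERE table_name = ? AND partition_name IN ({part_placeholders})",
            [table_name, *partitions],
        )
        conn.execute(
            "INSERT INTO part_col_stats "
            "(table_name, partition_name, column_name, num_nulls) "
            f"VALUES {row_placeholders}",
            insert_params,
        )
```

Because the old rows are deleted before the new ones are inserted, calling the function twice with the same input leaves one row per column per partition, which avoids the duplicate-stats problem of repeated runs. The statement count stays at two regardless of #col * #part, though a production version would need to respect the database's limit on bound parameters per statement.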