[jira] [Created] (HIVE-8000) Hive query plan suboptimal on select from partitioned table with partition id as filters
Chu Tong created HIVE-8000:
--

Summary: Hive query plan suboptimal on select from partitioned table with partition id as filters
Key: HIVE-8000
URL: https://issues.apache.org/jira/browse/HIVE-8000
Project: Hive
Issue Type: Bug
Components: Metastore
Reporter: Chu Tong
Priority: Minor

When issuing "SELECT * FROM test where id = 100 OR id = 101 OR id = 102" on table test with a large number of partitions, most of the load on the metastore comes from this query:

select "PARTITIONS"."PART_ID" from "PARTITIONS"
  inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = @P0
  inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = @P1
  inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0
where ( ( (((case when "TBLS"."TBL_NAME" = @P2 and "DBS"."NAME" = @P3 then cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = @P4)
  or ((case when "TBLS"."TBL_NAME" = @P5 and "DBS"."NAME" = @P6 then cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = @P7))
  or ((case when "TBLS"."TBL_NAME" = @P8 and "DBS"."NAME" = @P9 then cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = @P10)) )
',N'test',N'default',N'test',N'default',25,N'test',N'default',20,N'test',N'default',21

The query plan shows that several index scans can be turned into seeks by pushing the filter on PART_KEY_VAL down early, that is, by adding (FILTER0.PART_KEY_VAL in (@P4,@P7,@P10)) before the case statements.

The resulting query becomes:

select "PARTITIONS"."PART_ID" from "PARTITIONS"
  inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = @P0
  inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = @P1
  inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0
where ( (("TBLS"."TBL_NAME" = @P2 and "DBS"."NAME" = @P3 and cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) = @P4)
  or ("TBLS"."TBL_NAME" = @P5 and "DBS"."NAME" = @P6 and cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) = @P7)) )

-- 
This message was sent by Atlassian JIRA (v6.3.4#6332)
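To make the proposed rewrite easier to check, here is a minimal sketch that runs the three predicate shapes (the original CASE form, the flattened AND form, and the CASE form with an extra IN filter pushed in front) against a toy copy of the metastore tables in SQLite. Everything beyond the table and column names quoted in the issue is an assumption: the toy data, the aliases P/T/D/F, the helper `run`, and the use of CAST(... AS INTEGER) in place of decimal(21,0). It only shows that the shapes return the same partitions on this data; it says nothing about the SQL Server plans themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER);
CREATE TABLE PARTITION_KEY_VALS (PART_ID INTEGER, PART_KEY_VAL TEXT, INTEGER_IDX INTEGER);
INSERT INTO DBS VALUES (1, 'default');
INSERT INTO TBLS VALUES (1, 1, 'test');
""")
# Hypothetical data: 200 partitions whose key value equals the partition id.
for part_id in range(1, 201):
    conn.execute("INSERT INTO PARTITIONS VALUES (?, 1)", (part_id,))
    conn.execute("INSERT INTO PARTITION_KEY_VALS VALUES (?, ?, 0)", (part_id, str(part_id)))

JOINS = """
SELECT P.PART_ID FROM PARTITIONS P
JOIN TBLS T ON P.TBL_ID = T.TBL_ID AND T.TBL_NAME = :tbl
JOIN DBS D ON T.DB_ID = D.DB_ID AND D.NAME = :db
JOIN PARTITION_KEY_VALS F ON F.PART_ID = P.PART_ID AND F.INTEGER_IDX = 0
"""
# Original shape: the comparison is wrapped in a CASE expression.
CASE_TERM = ("((CASE WHEN T.TBL_NAME = :tbl AND D.NAME = :db "
             "THEN CAST(F.PART_KEY_VAL AS INTEGER) ELSE NULL END) = :v{i})")
# Flattened shape: plain AND conditions, no CASE.
FLAT_TERM = "(T.TBL_NAME = :tbl AND D.NAME = :db AND CAST(F.PART_KEY_VAL AS INTEGER) = :v{i})"

params = {"tbl": "test", "db": "default",
          "v0": 100, "v1": 101, "v2": 102,
          "s0": "100", "s1": "101", "s2": "102"}

def run(where: str) -> list:
    """Run the join with the given WHERE clause and return sorted PART_IDs."""
    sql = JOINS + " WHERE " + where + " ORDER BY P.PART_ID"
    return [row[0] for row in conn.execute(sql, params)]

case_where = " OR ".join(CASE_TERM.format(i=i) for i in range(3))
flat_where = " OR ".join(FLAT_TERM.format(i=i) for i in range(3))
# Pushed-down shape: a cheap IN filter on the raw string column comes first.
# Assumption: key values are stored in canonical form ('100', not '0100').
pushed_where = "F.PART_KEY_VAL IN (:s0, :s1, :s2) AND (" + case_where + ")"

case_ids = run(case_where)
flat_ids = run(flat_where)
pushed_ids = run(pushed_where)
print(case_ids, flat_ids, pushed_ids)
```

The IN filter compares the stored strings directly, so it is only equivalent to the cast-based comparison when the partition key values are stored without padding or alternate spellings.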
[jira] [Updated] (HIVE-7999) Hive metastore query too long when select * on table with large number of partitions
[ https://issues.apache.org/jira/browse/HIVE-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-7999:
---
Description:
When selecting * from a Hive table with a large number of partitions, a query like this:

SELECT PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, ... SERDES.SLIB from PARTITIONS LEFT OUTER JOIN SDS ON PARTITIONS.SD_ID = SDS.SD_ID LEFT OUTER JOIN SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (1,2,3,4 ... 1000 ...

is generated and executed on the metastore. Because the query lists every partition, SQL DB is unable to compile/execute such a long query, which causes the whole query to fail.

was:
When selecting * from a Hive table with a large number of partitions, a query like this:

SELECT PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, ... SERDES.SLIB from PARTITIONS LEFT OUTER JOIN SDS ON PARTITIONS.SD_ID = SDS.SD_ID LEFT OUTER JOIN SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (1,2,3,4 ... 1000 ...

is generated and executed on the metastore. Because the query lists every partition, SQL server is unable to compile/execute such a long query, which causes the whole query to fail.

> Hive metastore query too long when select * on table with large number of partitions
> -
>
> Key: HIVE-7999
> URL: https://issues.apache.org/jira/browse/HIVE-7999
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Chu Tong
[jira] [Updated] (HIVE-7999) Hive metastore query too long when select * on table with large number of partitions
[ https://issues.apache.org/jira/browse/HIVE-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-7999:
---
Description:
When selecting * from a Hive table with a large number of partitions, a query like this:

SELECT PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, ... SERDES.SLIB from PARTITIONS LEFT OUTER JOIN SDS ON PARTITIONS.SD_ID = SDS.SD_ID LEFT OUTER JOIN SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (1,2,3,4 ... 1000 ...

is generated and executed on the metastore. Because the query lists every partition, SQL server is unable to compile/execute such a long query, which causes the whole query to fail.

was:
When selecting * from a Hive table with a large number of partitions, a query like this:

SELECT PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, ... SERDES.SLIB from PARTITIONS LEFT OUTER JOIN SDS ON PARTITIONS.SD_ID = SDS.SD_ID LEFT OUTER JOIN SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (1,2,3,4 ... 1000 ...

is generated and executed on the metastore. Because the query lists every partition, SQL DB is unable to compile/execute such a long query, which causes the whole query to fail.

> Hive metastore query too long when select * on table with large number of partitions
> -
>
> Key: HIVE-7999
> URL: https://issues.apache.org/jira/browse/HIVE-7999
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Chu Tong
[jira] [Created] (HIVE-7999) Hive metastore query too long when select * on table with large number of partitions
Chu Tong created HIVE-7999:
--

Summary: Hive metastore query too long when select * on table with large number of partitions
Key: HIVE-7999
URL: https://issues.apache.org/jira/browse/HIVE-7999
Project: Hive
Issue Type: Bug
Components: Metastore
Reporter: Chu Tong

When selecting * from a Hive table with a large number of partitions, a query like this:

SELECT PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, ... SERDES.SLIB from PARTITIONS LEFT OUTER JOIN SDS ON PARTITIONS.SD_ID = SDS.SD_ID LEFT OUTER JOIN SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (1,2,3,4 ... 1000 ...

is generated and executed on the metastore. Because the query lists every partition, SQL server is unable to compile/execute such a long query, which causes the whole query to fail.
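One common way around an oversized IN list is to split the partition ids into fixed-size batches and issue one parameterized query per batch. The sketch below illustrates that idea only; the issue does not say how (or whether) Hive addresses this, and the names `batches`, `build_queries`, and the batch size of 1000 are hypothetical. The column list is limited to the columns the issue actually names, since the rest are elided there.

```python
from typing import Iterator, List, Sequence, Tuple

# Only the columns named in the issue; the original query elides the others.
BASE_QUERY = (
    "SELECT PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, "
    "PARTITIONS.CREATE_TIME, SERDES.SLIB "
    "from PARTITIONS "
    "LEFT OUTER JOIN SDS ON PARTITIONS.SD_ID = SDS.SD_ID "
    "LEFT OUTER JOIN SERDES ON SDS.SERDE_ID = SERDES.SERDE_ID"
)

def batches(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of ids, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def build_queries(part_ids: Sequence[int],
                  batch_size: int = 1000) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return one (sql, params) pair per batch instead of one giant statement."""
    queries = []
    for chunk in batches(part_ids, batch_size):
        placeholders = ", ".join("?" * len(chunk))
        sql = BASE_QUERY + " where PARTITIONS.PART_ID in (" + placeholders + ")"
        queries.append((sql, tuple(chunk)))
    return queries

queries = build_queries(list(range(1, 2501)), batch_size=1000)
print(len(queries), [len(params) for _, params in queries])
```

Each statement then stays bounded in length regardless of how many partitions the table has, at the cost of several round trips to the metastore database.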
[jira] [Created] (HIVE-7961) metastore schema improvement for adding partition to Hive table
Chu Tong created HIVE-7961:
--

Summary: metastore schema improvement for adding partition to Hive table
Key: HIVE-7961
URL: https://issues.apache.org/jira/browse/HIVE-7961
Project: Hive
Issue Type: Bug
Components: Metastore
Reporter: Chu Tong
Priority: Minor

One of the performance bottlenecks when adding a partition to a Hive table, and the query that takes most of the time in this process, is:

SELECT A0.PART_NAME FROM PARTITIONS A0 LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID = B0.TBL_ID LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID WHERE B0.TBL_NAME = @P0 AND C0."NAME" = @P1 AND A0.PART_NAME = @P2

This query joins the partition table with the table table and the database table in the Hive metastore, and it becomes slow when these tables are large. A viable way to optimize this is to de-normalize the partition table.
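As a rough illustration of what de-normalizing the partition table could look like, the sketch below keeps copies of the database and table names on each PARTITIONS row so the lookup needs no joins. The extra columns DB_NAME and TBL_NAME, the index name, and the sample rows are all hypothetical; the issue does not specify a concrete schema. SQLite is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
-- DB_NAME and TBL_NAME are hypothetical denormalized copies.
CREATE TABLE PARTITIONS (
    PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT,
    DB_NAME TEXT, TBL_NAME TEXT);
CREATE INDEX PARTITIONS_DENORM_IDX ON PARTITIONS (DB_NAME, TBL_NAME, PART_NAME);
INSERT INTO DBS VALUES (1, 'default');
INSERT INTO TBLS VALUES (1, 1, 'test'), (2, 1, 'other');
INSERT INTO PARTITIONS VALUES
    (1, 1, 'id=100', 'default', 'test'),
    (2, 1, 'id=101', 'default', 'test'),
    (3, 2, 'id=100', 'default', 'other');
""")

# The join-based lookup quoted in the issue, with positional parameters.
JOIN_QUERY = """
SELECT A0.PART_NAME FROM PARTITIONS A0
LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID = B0.TBL_ID
LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID
WHERE B0.TBL_NAME = ? AND C0."NAME" = ? AND A0.PART_NAME = ?
"""
# The same lookup against the denormalized columns: a single-table probe.
DENORM_QUERY = """
SELECT PART_NAME FROM PARTITIONS
WHERE DB_NAME = ? AND TBL_NAME = ? AND PART_NAME = ?
"""

joined = conn.execute(JOIN_QUERY, ("test", "default", "id=100")).fetchall()
denorm = conn.execute(DENORM_QUERY, ("default", "test", "id=100")).fetchall()
print(joined, denorm)
```

The trade-off is the usual one for denormalization: the copied names have to be kept in sync whenever a table or database is renamed.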
[jira] [Updated] (HIVE-7906) Missing Index on Hive metastore query
[ https://issues.apache.org/jira/browse/HIVE-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-7906:
---
Attachment: HIVE-456.patch.txt

> Missing Index on Hive metastore query
> -
>
> Key: HIVE-7906
> URL: https://issues.apache.org/jira/browse/HIVE-7906
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.13.1
> Reporter: Chu Tong
> Attachments: HIVE-456.patch.txt
>
>
> For a SELECT statement on a table with a large number of partitions, the query in the Word document below causes major performance degradation. This change adds the missing index to turn the index scan into a seek.

-- 
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7906) Missing Index on Hive metastore query
[ https://issues.apache.org/jira/browse/HIVE-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-7906:
---
Description: For a SELECT statement on a table with a large number of partitions on Windows Azure DB, the query in the Word document below causes major performance degradation. This change adds the missing index to turn the index scan into a seek.
(was: For a SELECT statement on a table with a large number of partitions, the query in the Word document below causes major performance degradation. This change adds the missing index to turn the index scan into a seek.)

> Missing Index on Hive metastore query
> -
>
> Key: HIVE-7906
> URL: https://issues.apache.org/jira/browse/HIVE-7906
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.13.1
> Reporter: Chu Tong
> Attachments: HIVE-456.patch.txt
[jira] [Updated] (HIVE-7906) Missing Index on Hive metastore query
[ https://issues.apache.org/jira/browse/HIVE-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-7906:
---
Status: Patch Available (was: Open)

> Missing Index on Hive metastore query
> -
>
> Key: HIVE-7906
> URL: https://issues.apache.org/jira/browse/HIVE-7906
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.13.1
> Reporter: Chu Tong
> Attachments: HIVE-456.patch.txt
>
>
> For a SELECT statement on a table with a large number of partitions on Windows Azure DB, the query in the Word document below causes major performance degradation. This change adds the missing index to turn the index scan into a seek.
[jira] [Created] (HIVE-7906) Missing Index on Hive metastore query
Chu Tong created HIVE-7906:
--

Summary: Missing Index on Hive metastore query
Key: HIVE-7906
URL: https://issues.apache.org/jira/browse/HIVE-7906
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.13.1
Reporter: Chu Tong

For a SELECT statement on a table with a large number of partitions, the query in the Word document below causes major performance degradation. This change adds the missing index to turn the index scan into a seek.
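The scan-versus-seek difference can be reproduced in miniature with SQLite's EXPLAIN QUERY PLAN. This is a sketch under heavy assumptions: the issue does not say which table or columns the missing index covers, so the PARTITION_KEY_VALS table, the query, and the index name `PKV_VAL_IDX` below are purely illustrative, and SQLite's planner is not the same as the one in SQL Server or Azure DB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PARTITION_KEY_VALS ("
    "PART_ID INTEGER, PART_KEY_VAL TEXT, INTEGER_IDX INTEGER)"
)

# Hypothetical lookup; the real query is in the attachment, not in the issue text.
QUERY = (
    "SELECT PART_ID FROM PARTITION_KEY_VALS "
    "WHERE PART_KEY_VAL = ? AND INTEGER_IDX = 0"
)

def plan(sql: str, params: tuple) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

# Without a usable index the planner has to read the whole table.
before = plan(QUERY, ("100",))

# Hypothetical index on the filtered columns.
conn.execute(
    "CREATE INDEX PKV_VAL_IDX ON PARTITION_KEY_VALS (PART_KEY_VAL, INTEGER_IDX)"
)

# With the index the planner can search (seek) instead of scanning.
after = plan(QUERY, ("100",))

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text differs between SQLite versions, so the useful signal is only whether the plan says SCAN or SEARCH.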
[jira] [Updated] (HIVE-4801) output deprecation warning for hive.mapred.map.tasks.speculative.execution
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4801:
---
Description: Output deprecation warning for hive.mapred.map.tasks.speculative.execution, users are encouraged to use
(was: Hive does not honor the hive.mapred.map.tasks.speculative.execution parameter when it comes to configuring Hadoop jobs.)

> output deprecation warning for hive.mapred.map.tasks.speculative.execution
> --
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
> Attachments: HIVE-4801.patch, HIVE-4801.patch

-- 
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4801) output deprecation warning for hive.mapred.map.tasks.speculative.execution
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4801:
---
Status: Patch Available (was: Open)

> output deprecation warning for hive.mapred.map.tasks.speculative.execution
> --
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
> Attachments: HIVE-4801.patch, HIVE-4801.patch
>
>
> Output deprecation warning for hive.mapred.map.tasks.speculative.execution, users are encouraged to use mapred.reduce.tasks.speculative.execution
[jira] [Updated] (HIVE-4801) output deprecation warning for hive.mapred.map.tasks.speculative.execution
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4801:
---
Description: Output deprecation warning for hive.mapred.map.tasks.speculative.execution, users are encouraged to use mapred.reduce.tasks.speculative.execution
(was: Output deprecation warning for hive.mapred.map.tasks.speculative.execution, users are encouraged to use )

> output deprecation warning for hive.mapred.map.tasks.speculative.execution
> --
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
> Attachments: HIVE-4801.patch, HIVE-4801.patch
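The behaviour described in this issue, warning when a deprecated key is set and pointing at a replacement, can be sketched generically as below. This is not the attached patch and not Hive's or Hadoop's configuration API; `DEPRECATED_KEYS` and `set_conf` are hypothetical names, and the key-to-replacement pair is copied verbatim from the description above.

```python
import warnings
from typing import Dict

# Deprecated key -> suggested replacement, as stated in the issue description.
DEPRECATED_KEYS: Dict[str, str] = {
    "hive.mapred.map.tasks.speculative.execution":
        "mapred.reduce.tasks.speculative.execution",
}

def set_conf(conf: Dict[str, str], key: str, value: str) -> None:
    """Store a setting, emitting a warning first if the key is deprecated."""
    replacement = DEPRECATED_KEYS.get(key)
    if replacement is not None:
        warnings.warn(
            f"{key} is deprecated; use {replacement} instead",
            DeprecationWarning,
            stacklevel=2,
        )
    # The value is still stored so existing scripts keep working.
    conf[key] = value

conf: Dict[str, str] = {}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    set_conf(conf, "hive.mapred.map.tasks.speculative.execution", "false")
    set_conf(conf, "hive.exec.parallel", "true")

print([str(w.message) for w in caught])
```

Warning without rejecting the value keeps the change backward compatible, which matches the request in the comments to at least warn users who set the key.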
[jira] [Updated] (HIVE-4801) output deprecation warning for hive.mapred.map.tasks.speculative.execution
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4801:
---
Summary: output deprecation warning for hive.mapred.map.tasks.speculative.execution
(was: hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs)

> output deprecation warning for hive.mapred.map.tasks.speculative.execution
> --
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
> Attachments: HIVE-4801.patch, HIVE-4801.patch
>
>
> Hive does not honor the hive.mapred.map.tasks.speculative.execution parameter when it comes to configuring Hadoop jobs.
[jira] [Updated] (HIVE-4801) output deprecation warning for hive.mapred.map.tasks.speculative.execution
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4801:
---
Attachment: HIVE-4801.patch

> output deprecation warning for hive.mapred.map.tasks.speculative.execution
> --
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
> Attachments: HIVE-4801.patch, HIVE-4801.patch
>
>
> Output deprecation warning for hive.mapred.map.tasks.speculative.execution, users are encouraged to use mapred.reduce.tasks.speculative.execution
[jira] [Commented] (HIVE-4801) hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702725#comment-13702725 ]

Chu Tong commented on HIVE-4801:

So we throw a deprecation warning for hive.mapred.reduce.tasks.speculative.execution and do nothing for hive.mapred.map.tasks.speculative.execution?

> hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
> 
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
> Attachments: HIVE-4801.patch
>
>
> Hive does not honor the hive.mapred.map.tasks.speculative.execution parameter when it comes to configuring Hadoop jobs.
[jira] [Commented] (HIVE-4801) hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702694#comment-13702694 ]

Chu Tong commented on HIVE-4801:

I think people who use hive.mapred.reduce.tasks.speculative.execution will intuitively try hive.mapred.map.tasks.speculative.execution as well. If we do not want to add this, can we at least show a warning to users when they try to set hive.mapred.map.tasks.speculative.execution?

> hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
> 
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
> Attachments: HIVE-4801.patch
>
>
> Hive does not honor the hive.mapred.map.tasks.speculative.execution parameter when it comes to configuring Hadoop jobs.
[jira] [Updated] (HIVE-4801) hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4801:
---
Attachment: HIVE-4801.patch

> hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
> 
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
> Attachments: HIVE-4801.patch
>
>
> Hive does not honor the hive.mapred.map.tasks.speculative.execution parameter when it comes to configuring Hadoop jobs.
[jira] [Updated] (HIVE-4801) hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4801:
---
Status: Patch Available (was: Open)

> hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
> 
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
> Attachments: HIVE-4801.patch
>
>
> Hive does not honor the hive.mapred.map.tasks.speculative.execution parameter when it comes to configuring Hadoop jobs.
[jira] [Updated] (HIVE-4801) hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4801:
---
Description: Hive does not honor the hive.mapred.map.tasks.speculative.execution parameter when it comes to configuring Hadoop jobs.
(was: Hive does not honor )

> hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
> 
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
[jira] [Updated] (HIVE-4801) hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
[ https://issues.apache.org/jira/browse/HIVE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4801:
---
Description: Hive does not honor

> hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
> 
>
> Key: HIVE-4801
> URL: https://issues.apache.org/jira/browse/HIVE-4801
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.10.0
> Reporter: Chu Tong
> Assignee: Chu Tong
[jira] [Created] (HIVE-4801) hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
Chu Tong created HIVE-4801:
--

Summary: hive.mapred.map.tasks.speculative.execution is not used to configure Hadoop jobs
Key: HIVE-4801
URL: https://issues.apache.org/jira/browse/HIVE-4801
Project: Hive
Issue Type: Bug
Components: Configuration
Affects Versions: 0.10.0
Reporter: Chu Tong
Assignee: Chu Tong
[jira] [Commented] (HIVE-4403) Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
[ https://issues.apache.org/jira/browse/HIVE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674023#comment-13674023 ]

Chu Tong commented on HIVE-4403:

I took a look at the Hudson build log, and it seems the cause is a build failure rather than a test failure due to the newly added functionality/bug fix.

> Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
> 
>
> Key: HIVE-4403
> URL: https://issues.apache.org/jira/browse/HIVE-4403
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.10.0, 0.11.0
> Reporter: Mark Grover
> Assignee: Chu Tong
> Fix For: 0.12.0
>
> Attachments: HIVE-4403.patch, HIVE-4403.patch
>
>
> While working on BIGTOP-885, I saw that Hive was giving a bunch of warnings related to overriding final parameters in job.conf. This was on a pseudo distributed cluster. FWIW, I didn't see this happen on a fully-distributed cluster. Perhaps Hive's job.conf is overriding some final parameters it shouldn't.
> Here is what the warnings looked like:
> {code}
> 2013-04-19 14:20:32,304 WARN [main] conf.Configuration (Configuration.java:loadProperty(2032)) - file:/tmp/root/hive_2013-04-19_14-20-30_159_5701876916688815815/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
> 2013-04-19 14:20:32,367 WARN [main] conf.Configuration (Configuration.java:loadProperty(2032)) - file:/tmp/root/hive_2013-04-19_14-20-30_159_5701876916688815815/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
> {code}
> To reproduce, run a query like:
> {code}
> CREATE TABLE u_data (
>   userid INT,
>   movieid INT,
>   rating INT,
>   unixtime STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
> {code}
> Load some data into u_data, here is some sample data:
> https://github.com/apache/bigtop/blob/master/bigtop-tests/test-artifacts/hive/src/main/resources/seed_data_files/ml-data/u.data
> Run a simple query on that data (on YARN/MR2)
> {code}
> INSERT OVERWRITE DIRECTORY '/tmp/count'
> SELECT COUNT(1) FROM u_data
> {code}
[jira] [Commented] (HIVE-4403) Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
[ https://issues.apache.org/jira/browse/HIVE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673166#comment-13673166 ]

Chu Tong commented on HIVE-4403:

no problem, thank you for reviewing it [~ashutoshgupt...@gmail.com]

> Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
> 
>
> Key: HIVE-4403
> URL: https://issues.apache.org/jira/browse/HIVE-4403
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.10.0, 0.11.0
> Reporter: Mark Grover
> Assignee: Chu Tong
> Fix For: 0.12.0
>
> Attachments: HIVE-4403.patch, HIVE-4403.patch
[jira] [Commented] (HIVE-4403) Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
[ https://issues.apache.org/jira/browse/HIVE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672688#comment-13672688 ]

Chu Tong commented on HIVE-4403:

[~mgrover], I searched the Facebook Phabricator and could not find this bug there. Do I need to create it there first and then upload the patch to it? Thanks a lot.

> Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
> 
>
> Key: HIVE-4403
> URL: https://issues.apache.org/jira/browse/HIVE-4403
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.10.0
> Reporter: Mark Grover
> Assignee: Chu Tong
> Attachments: HIVE-4403.patch, HIVE-4403.patch
[jira] [Updated] (HIVE-4403) Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
[ https://issues.apache.org/jira/browse/HIVE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4403:
---
Attachment: HIVE-4403.patch

Sync the changes with trunk

> Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
> 
>
> Key: HIVE-4403
> URL: https://issues.apache.org/jira/browse/HIVE-4403
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.10.0
> Reporter: Mark Grover
> Assignee: Chu Tong
> Attachments: HIVE-4403.patch, HIVE-4403.patch
[jira] [Commented] (HIVE-4403) Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
[ https://issues.apache.org/jira/browse/HIVE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659782#comment-13659782 ]

Chu Tong commented on HIVE-4403:

Can someone please help me to review this code? Thanks
[jira] [Updated] (HIVE-4403) Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
[ https://issues.apache.org/jira/browse/HIVE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4403:
---
    Status: Patch Available (was: Open)

The warnings reported in this bug appear because the Hive client tries to override Hadoop final parameters defined in mapred-default.xml. In more detail, the Hive configuration inherits from the Hadoop configuration, and Hive currently overrides Hadoop parameters as follows:
1) Create a default Hadoop configuration (this contains all default Hadoop parameters, including the ones defined as final).
2) Overlay the parameters Hive wants to override on the configuration created in 1).
3) Overlay the configuration generated in 2) over the default Hadoop parameters it inherits (these again contain all default Hadoop parameters, including the ones defined as final).
Because the configuration generated in 2) contains all the default Hadoop parameters, including the final ones, the warnings are logged when 3) runs.

The fix is to create an empty Hadoop configuration in 1) instead of a default one, so that 2) overlays the Hive parameters on an empty configuration. In 3), the configuration from 2) still overrides whichever default Hadoop parameters it is meant to override, but no warning is logged because 2) no longer contains the default Hadoop parameters.

I have tested this through different code paths:
1) keep everything as default
2) define overriding parameters in hive-site.xml
3) define overriding parameters in the Hive client shell
All of these cases work well.
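To make the three steps above concrete, here is a minimal sketch in Python. It is a toy model, not Hadoop's `Configuration` class and not the attached patch: `ToyConf`, `build_job_props`, `load_job`, and the property values are all hypothetical names and numbers chosen for illustration, and the model assumes a warning is logged whenever an overlaid configuration mentions a final key, even with an unchanged value. In Hadoop's Java API an empty starting configuration can be obtained with the `Configuration(boolean loadDefaults)` constructor; whether the patch does it that way is an assumption, since the patch is not reproduced in this thread.

```python
# Toy model only: these names are hypothetical and this is NOT Hadoop's
# Configuration API. Values are illustrative, not copied from mapred-default.xml.
HADOOP_DEFAULTS = {
    "mapreduce.job.end-notification.max.attempts": "5",
    "mapreduce.job.end-notification.max.retry.interval": "5000",
    "mapreduce.job.reduces": "1",
}
FINAL_KEYS = {
    "mapreduce.job.end-notification.max.attempts",
    "mapreduce.job.end-notification.max.retry.interval",
}


class ToyConf:
    """Holds properties and refuses to overwrite keys marked final."""

    def __init__(self, props, final_keys):
        self.props = dict(props)
        self.final_keys = set(final_keys)
        self.warnings = []

    def overlay(self, other_props, source):
        """Apply other_props on top; warn for every final key it mentions.

        Assumption: the warning fires even when the value is unchanged.
        """
        for key, value in other_props.items():
            if key in self.final_keys:
                self.warnings.append(
                    f"{source}: an attempt to override final parameter: {key}; Ignoring."
                )
            else:
                self.props[key] = value


def build_job_props(hive_overrides, start_from_defaults):
    """Steps 1 and 2: pick the starting configuration, then add Hive's overrides."""
    job_props = dict(HADOOP_DEFAULTS) if start_from_defaults else {}
    job_props.update(hive_overrides)
    return job_props


def load_job(hive_overrides, start_from_defaults):
    """Step 3: overlay the job properties on the inherited Hadoop defaults."""
    inherited = ToyConf(HADOOP_DEFAULTS, FINAL_KEYS)
    job_props = build_job_props(hive_overrides, start_from_defaults)
    inherited.overlay(job_props, "jobconf.xml")
    return inherited


if __name__ == "__main__":
    overrides = {"mapreduce.job.reduces": "4"}
    for label, from_defaults in (("current", True), ("proposed", False)):
        conf = load_job(overrides, from_defaults)
        print(label, "->", len(conf.warnings), "warning(s)")
        for line in conf.warnings:
            print("   WARN", line)
```

In this model both variants end up with the same effective properties; the only difference is that starting from an empty configuration never re-submits the final keys, so nothing is logged.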
[jira] [Updated] (HIVE-4403) Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
[ https://issues.apache.org/jira/browse/HIVE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated HIVE-4403:
---
    Attachment: HIVE-4403.patch