[jira] [Created] (HIVE-10235) Loop optimization for SIMD in ColumnDivideColumn.txt
Chengxiang Li created HIVE-10235: Summary: Loop optimization for SIMD in ColumnDivideColumn.txt Key: HIVE-10235 URL: https://issues.apache.org/jira/browse/HIVE-10235 Project: Hive Issue Type: Sub-task Components: Vectorization Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor
Found two loops that could be optimized for the packed instruction set during execution.
1. hasDivBy0 depends on the result of the previous iteration, which prevents the loop from being executed with vectorized instructions.
{code:java}
for(int i = 0; i != n; i++) {
  OperandType2 denom = vector2[i];
  outputVector[i] = vector1[0] OperatorSymbol denom;
  hasDivBy0 = hasDivBy0 || (denom == 0);
}
{code}
2. Same as HIVE-10180, the vector2[0] reference prevents the JVM from optimizing the loop into the packed instruction set.
{code:java}
for(int i = 0; i != n; i++) {
  outputVector[i] = vector1[i] OperatorSymbol vector2[0];
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
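For illustration, here is one way the two template loops above could be restructured so each loop body is a pure element-wise operation the JIT's auto-vectorizer can handle. This is only a sketch: it uses concrete double-typed stand-ins for the template's OperandType/OperatorSymbol placeholders, and splitting the hasDivBy0 scan into its own loop is my assumption, not necessarily the patch that was committed.

```java
// Hypothetical rewrite of the ColumnDivideColumn loops: the arithmetic loop
// has no cross-iteration dependency, and the divide-by-zero scan is a plain
// reduction using the non-short-circuit | operator.
public class DivideLoopSketch {
    public static boolean divide(double[] vector1, double[] vector2,
                                 double[] outputVector, int n) {
        // Loop 1: pure element-wise divide, vectorizable on its own.
        for (int i = 0; i != n; i++) {
            outputVector[i] = vector1[i] / vector2[i];
        }
        // Loop 2: the zero scan, kept separate so loop 1 stays clean.
        boolean hasDivBy0 = false;
        for (int i = 0; i != n; i++) {
            hasDivBy0 = hasDivBy0 | (vector2[i] == 0);
        }
        return hasDivBy0;
    }
}
```

Note that for doubles a division by zero yields Infinity rather than an exception, so deferring the zero check to a second pass does not change the output values.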
[jira] [Created] (HIVE-10236) LLAP: Certain errors are not reported to the AM when a fragment fails
Siddharth Seth created HIVE-10236: - Summary: LLAP: Certain errors are not reported to the AM when a fragment fails Key: HIVE-10236 URL: https://issues.apache.org/jira/browse/HIVE-10236 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 32918: HIVE-10180 Loop optimization for SIMD in ColumnArithmeticColumn.txt
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32918/ --- (Updated April 7, 2015, 7:24 a.m.) Review request for hive. Changes --- Mark variables as final. Bugs: HIVE-10180 https://issues.apache.org/jira/browse/HIVE-10180 Repository: hive Description --- The JVM is quite strict about the code shape it will execute with SIMD instructions. Take a loop in DoubleColAddDoubleColumn.java for example:
for (int i = 0; i != n; i++) {
  outputVector[i] = vector1[0] + vector2[i];
}
The vector1[0] reference prevents the JVM from executing this part of the code with vectorized instructions; we need to assign vector1[0] to a variable outside the loop and use that variable inside the loop. Diffs (updated) - trunk/ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt 1671736 Diff: https://reviews.apache.org/r/32918/diff/ Testing --- Thanks, chengxiang li
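A minimal sketch of the hoisting described above, using assumed names (the real change lives in the ColumnArithmeticColumn.txt template, which generates classes such as DoubleColAddDoubleColumn):

```java
// Sketch of the HIVE-10180 pattern: hoist the vector1[0] scalar into a final
// local before the loop, leaving a loop body the JIT's superword optimizer
// can turn into packed SIMD instructions.
public class AddLoopSketch {
    public static void scalarAdd(double[] vector1, double[] vector2,
                                 double[] outputVector, int n) {
        final double value = vector1[0];  // hoisted out of the loop
        for (int i = 0; i != n; i++) {
            outputVector[i] = value + vector2[i];
        }
    }
}
```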
[jira] [Created] (HIVE-10237) create external table, location path contains space ,like '/user/hive/warehouse/custom.db/uigs_kmap '
xiaowei wang created HIVE-10237: --- Summary: create external table, location path contains space ,like '/user/hive/warehouse/custom.db/uigs_kmap ' Key: HIVE-10237 URL: https://issues.apache.org/jira/browse/HIVE-10237 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.1 Environment: Hadoop 2.3.0-cdh5.0.0 hive 0.13.1 Reporter: xiaowei wang When I wanted to create an external table and give the table a location, I wrote a wrong location path, /user/hive/warehouse/custom.db/uigs_kmap , which contains a space at the end of the path. I thought Hive would trim the space from the location, but it does not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
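For illustration only, a trailing space could be stripped with an ordinary String.trim() before the location is stored. This is a hypothetical normalization step, not current Hive behavior (the report is precisely that Hive does not trim):

```java
// Hypothetical helper: normalize a LOCATION path before use, so a copy-pasted
// trailing space does not silently point the table at a different HDFS path.
public class LocationTrimSketch {
    public static String normalizeLocation(String location) {
        return location.trim();
    }
}
```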
Re: Review Request 32920: HIVE-10189: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32920/ --- (Updated April 7, 2015, 6:06 a.m.) Review request for hive and chengxiang li. Summary (updated) - HIVE-10189: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization Repository: hive-git Description --- Add microbenchmark tool to show performance improvement by JMH Diffs - itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java PRE-CREATION Diff: https://reviews.apache.org/r/32920/diff/ Testing --- Thanks, cheng xu
Review Request 32920: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32920/ --- Review request for hive and chengxiang li. Repository: hive-git Description --- Add microbenchmark tool to show performance improvement by JMH Diffs - itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java PRE-CREATION Diff: https://reviews.apache.org/r/32920/diff/ Testing --- Thanks, cheng xu
Re: ORC separate project
Hey guys, Good discussion here. One point of order, I feel like this should be a [DISCUSS] thread. Some folks filter on that specific text as it's quite standard in Apache to use that subject prefix for big issues like this one. Brock On Fri, Apr 3, 2015 at 3:59 PM, Thejas Nair thejas.n...@gmail.com wrote: On Fri, Apr 3, 2015 at 1:25 PM, Lefty Leverenz leftylever...@gmail.com wrote: Hive users who wished to use ORC would obviously need to pull in ORC artifacts in addition to Hive. What would happen with Hive features that (currently) only work with ORC? Would they be extended to work with other file formats and stay in Hive? What about future features -- would they have to work with multiple file formats from the get-go? The storage-api module proposed above would lead to clearer storage interfaces in hive. That will in turn help to implement such features using other storage including parquet, hbase etc. The result of this work will not automatically make those features work with other storage formats, somebody would need to do that. Whether future features would work for all formats would depend on whether the new feature needs new functionality to be supported by the storage layer. If the feature needs new storage functionality, I would expect new interfaces to be defined in hive, and then implemented by the storage engines that want to support that feature. This will not negatively impact experience of users with respect to ORC or other storage formats. The way we package parquet in hive, we can package ORC as well. In fact, users would more easily be able to upgrade their version of ORC being used, as releases can happen independently of each other.
Re: ORC separate project
Is there a way to change this to a DISCUSS thread? Or could everything be copied into a new thread? Or just start a new thread with a reference to this one? -- Lefty On Tue, Apr 7, 2015 at 2:26 AM, Brock Noland br...@apache.org wrote: Hey guys, Good discussion here. One point of order, I feel like this should be a [DISCUSS] thread. Some folks filter on that specific text as it's quite standard in Apache to use that subject prefix for big issues like this one. Brock On Fri, Apr 3, 2015 at 3:59 PM, Thejas Nair thejas.n...@gmail.com wrote: On Fri, Apr 3, 2015 at 1:25 PM, Lefty Leverenz leftylever...@gmail.com wrote: Hive users who wished to use ORC would obviously need to pull in ORC artifacts in addition to Hive. What would happen with Hive features that (currently) only work with ORC? Would they be extended to work with other file formats and stay in Hive? What about future features -- would they have to work with multiple file formats from the get-go? The storage-api module proposed above would lead to clearer storage interfaces in hive. That will in turn help to implement such features using other storage including parquet, hbase etc. The result of this work will not automatically make those features work with other storage formats, somebody would need to do that. Whether future features would work for all formats would depend on whether the new feature needs new functionality to be supported by the storage layer. If the feature needs new storage functionality, I would expect new interfaces to be defined in hive, and then implemented by the storage engines that want to support that feature. This will not negatively impact experience of users with respect to ORC or other storage formats. The way we package parquet in hive, we can package ORC as well. In fact, users would more easily be able to upgrade their version of ORC being used, as releases can happen independently of each other.
Re: Review Request 32920: HIVE-10189: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32920/#review79136 --- itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java https://reviews.apache.org/r/32920/#comment128267 The benchmark looks good; my only concern is how we could expand this benchmark to other expressions. - chengxiang li On April 7, 2015, 6:06 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32920/ --- (Updated April 7, 2015, 6:06 a.m.) Review request for hive and chengxiang li. Repository: hive-git Description --- Add microbenchmark tool to show performance improvement by JMH Diffs - itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java PRE-CREATION Diff: https://reviews.apache.org/r/32920/diff/ Testing --- Thanks, cheng xu
[jira] [Created] (HIVE-10238) Loop optimization for SIMD in IfExprColumnColumn.txt
Chengxiang Li created HIVE-10238: Summary: Loop optimization for SIMD in IfExprColumnColumn.txt Key: HIVE-10238 URL: https://issues.apache.org/jira/browse/HIVE-10238 Project: Hive Issue Type: Sub-task Components: Vectorization Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Jitendra Nath Pandey Priority: Minor
The ?: operator in the following loop cannot be vectorized; we may transform it into a mathematical expression.
{code:java}
for(int j = 0; j != n; j++) {
  int i = sel[j];
  outputVector[i] = (vector1[i] == 1 ? vector2[i] : vector3[i]);
  outputIsNull[i] = (vector1[i] == 1 ? arg2ColVector.isNull[i] : arg3ColVector.isNull[i]);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
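One possible mathematical rewrite of the conditional above, assuming the condition column holds only 0/1 values (as a vectorized boolean column does). The names and the flag*a + (1-flag)*b formulation are my sketch, not the committed fix:

```java
// Branch-free select: when f is 1 the first term survives, when f is 0 the
// second term survives, so no per-element branch remains in the loop body.
public class IfExprSketch {
    public static void select(long[] flags, long[] vector2, long[] vector3,
                              long[] outputVector, int n) {
        for (int i = 0; i != n; i++) {
            long f = flags[i];  // assumed to be 0 or 1
            outputVector[i] = f * vector2[i] + (1 - f) * vector3[i];
        }
    }
}
```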
Re: Re: hive 0.14 on some platform return some not NULL value as NULL
I used hive 0.14 with the hive 0.10 metastore server. The problem is fixed. Now hive 0.14 returns the correct result. r7raul1...@163.com From: r7raul1...@163.com Date: 2015-04-07 10:34 To: dev CC: thejas.nair Subject: Re: Re: hive 0.14 on some platform return some not NULL value as NULL I found a difference in the logs: In hive 0.14 DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: columnNames=[date_id, chanl_id, sessn_id, gu_id, prov_id, city_id, landing_page_type_id, landing_track_time, landing_url, nav_refer_tracker_id, nav_refer_page_type_id, nav_refer_page_value, nav_refer_link_position, nav_tracker_id, nav_page_categ_id, nav_page_type_id, nav_page_value, nav_srce_type, internal_keyword, internal_result_sum, pltfm_id, app_vers, nav_link_position, nav_button_position, nav_track_time, nav_next_tracker_id, sessn_last_time, sessn_pv, detl_tracker_id, detl_page_type_id, detl_page_value, detl_pm_id, detl_link_position, detl_position_track_id, cart_tracker_id, cart_page_type_id, cart_page_value, cart_link_postion, cart_button_position, cart_position_track_id, cart_prod_id, ordr_tracker_id, ordr_page_type_id, ordr_code, updt_time, cart_pm_id, brand_code, categ_type, os, end_user_id, add_cart_flag, navgation_page_flag, nav_page_url, detl_button_position, manul_flag, manul_track_date, nav_refer_tpa, nav_refer_tpa_id, nav_refer_tpc, nav_refer_tpi, nav_refer_tcs, nav_refer_tcsa, nav_refer_tcdt, nav_refer_tcd, nav_refer_tci, nav_refer_postn_type, nav_tpa_id, nav_tpa, nav_tpc, nav_tpi, nav_tcs, nav_tcsa, nav_tcdt, nav_tcd, nav_tci, nav_postn_type, detl_tpa_id, detl_tpa, detl_tpc, detl_tpi, detl_tcs, detl_tcsa, detl_tcdt, detl_tcd, detl_tci, detl_postn_type, cart_tpa_id, cart_tpa, cart_tpc, cart_tpi, cart_tcs, cart_tcsa, cart_tcdt, cart_tcd, cart_tci, cart_postn_type] columnTypes=[string, bigint, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, int, 
string, string, string, string, string, string, int, string, string, string, bigint, string, string, string, string, string, string, string, string, bigint, string, string, string, string, bigint, string, int, string, string, string, int, string, string, int, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string] separator=[[B@e50bca4] nullstring=\N lastColumnTakesRest=false In hive 0.10 DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: columnNames=[date_id, chanl_id, sessn_id, gu_id, prov_id, city_id, landing_page_type_id, landing_track_time, landing_url, nav_refer_tracker_id, nav_refer_page_type_id, nav_refer_page_value, nav_refer_link_position, nav_tracker_id, nav_page_categ_id, nav_page_type_id, nav_page_value, nav_srce_type, internal_keyword, internal_result_sum, pltfm_id, app_vers, nav_link_position, nav_button_position, nav_track_time, nav_next_tracker_id, sessn_last_time, sessn_pv, detl_tracker_id, detl_page_type_id, detl_page_value, detl_pm_id, detl_link_position, detl_position_track_id, cart_tracker_id, cart_page_type_id, cart_page_value, cart_link_postion, cart_button_position, cart_position_track_id, cart_prod_id, ordr_tracker_id, ordr_page_type_id, ordr_code, updt_time, cart_pm_id, brand_code, categ_type, os, end_user_id, add_cart_flag, navgation_page_flag, nav_page_url, detl_button_position, manul_flag, manul_track_date, nav_refer_tpa, nav_refer_tpa_id, nav_refer_tpc, nav_refer_tpi, nav_refer_tcs, nav_refer_tcsa, nav_refer_tcdt, nav_refer_tcd, nav_refer_tci, nav_refer_postn_type, nav_tpa_id, nav_tpa, nav_tpc, nav_tpi, nav_tcs, nav_tcsa, nav_tcdt, nav_tcd, nav_tci, nav_postn_type, detl_tpa_id, detl_tpa, detl_tpc, detl_tpi, detl_tcs, 
detl_tcsa, detl_tcdt, detl_tcd, detl_tci, detl_postn_type, cart_tpa_id, cart_tpa, cart_tpc, cart_tpi, cart_tcs, cart_tcsa, cart_tcdt, cart_tcd, cart_tci, cart_postn_type, sessn_chanl_id, gu_sec_flg, detl_refer_page_type_id, detl_refer_page_value, detl_event_id, nav_refer_intrn_reslt_sum, nav_intrn_reslt_sum, nav_refer_intrn_kw, nav_intrn_kw, detl_track_time, cart_track_time] columnTypes=[string, bigint, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, int, string, string, string, string, string, string, int, string, string, string, bigint, string, string, string, string, string, string, string, string, bigint, string, string, string, string, bigint, string, int, string, string, string, int, string, string, int, string, string, string, string, string, string, string, string, string, string, string, string, string, string,
[jira] [Created] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL
Naveen Gangam created HIVE-10239: Summary: Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL Key: HIVE-10239 URL: https://issues.apache.org/jira/browse/HIVE-10239 Project: Hive Issue Type: Improvement Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Need to create DB-implementation specific scripts to use the framework introduced in HIVE-9800 to have any metastore schema changes tested across all supported databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] hive pull request: Update HiveDatabaseMetaData.java change the ide...
GitHub user Jeffrio opened a pull request: https://github.com/apache/hive/pull/31 Update HiveDatabaseMetaData.java change the identifierQuoteString according to this jira https://issues.apache.org/jira/browse/HIVE-6013 Hive uses the backtick as the quote string, so I think the getIdentifierQuoteString() function should return the backtick rather than the space. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Jeffrio/hive patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/31.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #31 commit 5ac637c83615aa389db49ce169c0df0461619c63 Author: Jeffrio corej...@163.com Date: 2015-04-07T16:35:11Z Update HiveDatabaseMetaData.java change the identifierQuoteString according to this jira https://issues.apache.org/jira/browse/HIVE-6013 Hive uses the backtick as the quote string, so I think the getIdentifierQuoteString() function should return the backtick rather than the space. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
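To illustrate why the returned value matters: generic JDBC tools wrap identifiers in whatever getIdentifierQuoteString() returns, and the JDBC convention is that a single space means identifier quoting is not supported. The helper below is a hypothetical illustration, not Hive or JDBC API:

```java
// Sketch of how a JDBC client would use the metadata value: quote with the
// reported string, unless it is the JDBC "quoting not supported" space.
public class QuoteSketch {
    public static String quoteIdentifier(String name, String quoteString) {
        // Per java.sql.DatabaseMetaData, a single space means no quoting support.
        if (" ".equals(quoteString)) {
            return name;
        }
        return quoteString + name + quoteString;
    }
}
```

With a backtick the client emits `` `uigs_kmap` ``-style identifiers; with a space it emits them unquoted.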
Re: Review Request 32809: Disallow create table with dot/colon in column name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32809/#review79229 --- Ship it! Ship It! - John Pullokkaran On April 7, 2015, 6:18 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32809/ --- (Updated April 7, 2015, 6:18 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- Since we don't allow users to query column names with dot in the middle such as emp.no, don't allow users to create tables with such columns that cannot be queried. Fix the documentation to reflect this fix. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java PRE-CREATION Diff: https://reviews.apache.org/r/32809/diff/ Testing --- Thanks, pengcheng xiong
[jira] [Created] (HIVE-10241) ACID: drop table doesn't acquire any locks
Eugene Koifman created HIVE-10241: - Summary: ACID: drop table doesn't acquire any locks Key: HIVE-10241 URL: https://issues.apache.org/jira/browse/HIVE-10241 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman with Hive configured to use DbTxnManager, in DbTxnManager.acquireLocks() both plan.getInputs() and plan.getOutputs() are empty when drop table foo is executed and thus no locks are acquired. We should be acquiring X locks to make sure any readers of this table don't get data wiped out while read is in progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 32809: Disallow create table with dot/colon in column name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32809/ --- (Updated April 7, 2015, 6:18 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Changes --- thanks for Swarnim Kulkarni's comments. I tried to answer and address them. Repository: hive-git Description --- Since we don't allow users to query column names with dot in the middle such as emp.no, don't allow users to create tables with such columns that cannot be queried. Fix the documentation to reflect this fix. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java PRE-CREATION Diff: https://reviews.apache.org/r/32809/diff/ Testing --- Thanks, pengcheng xiong
Re: Review Request 32370: HIVE-10040
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/#review79228 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java https://reviews.apache.org/r/32370/#comment128449 As we discussed: 1. Move the supported JoinAlgorithm to the subclass (i.e. the target exec engine) 2. Move cost computation to the subclass/target exec engine 3. The logic here should consult the target exec engine for supported algorithms, iterate through them, and find the cheapest one without actually knowing anything about the algorithm itself. - John Pullokkaran On April 6, 2015, 9:30 p.m., Jesús Camacho Rodríguez wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/ --- (Updated April 6, 2015, 9:30 p.m.) Review request for hive and John Pullokkaran. Bugs: HIVE-10040 https://issues.apache.org/jira/browse/HIVE-10040 Repository: hive-git Description --- CBO (Calcite Return Path): Pluggable cost modules [CBO branch] Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 
3e45a3fbed3265b126a3ff9b6ffe44bee24453ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java fcf09a5de0e318c6fb69664a8dd618f2d9ae84e5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6 Diff: https://reviews.apache.org/r/32370/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
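A rough sketch of the structure suggested in the review comment above: the cost model iterates over whatever join algorithms the target execution engine reports as supported and picks the cheapest, knowing nothing about the algorithms themselves. All names here are hypothetical, not the HiveCostModel API:

```java
import java.util.List;

// Hypothetical shape of a pluggable cost model: the engine subclass supplies
// the supported algorithms and their costs; the generic logic only compares.
public class CostModelSketch {
    static class JoinAlgorithm {
        final String name;
        final double cost;  // engine-specific cost, computed by the subclass
        JoinAlgorithm(String name, double cost) {
            this.name = name;
            this.cost = cost;
        }
    }

    // Generic selection loop: no knowledge of any algorithm's internals.
    static JoinAlgorithm pickCheapest(List<JoinAlgorithm> supported) {
        JoinAlgorithm best = null;
        for (JoinAlgorithm algo : supported) {
            if (best == null || algo.cost < best.cost) {
                best = algo;
            }
        }
        return best;
    }
}
```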
[jira] [Created] (HIVE-10242) ACID: insert overwrite prevents create table command
Eugene Koifman created HIVE-10242: - Summary: ACID: insert overwrite prevents create table command Key: HIVE-10242 URL: https://issues.apache.org/jira/browse/HIVE-10242 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman 1. insert overwrite table DB.T1 select ... from T2: this takes an X lock on DB.T1 and an S lock on T2. The X lock makes sense because we don't want anyone reading T1 while it's overwritten. The S lock on T2 prevents it from being dropped while the query is in progress. 2. create table DB.T3: takes an S lock on DB. This S lock gets blocked by the X lock on T1. The S lock prevents the DB from being dropped while create table is executed. If the insert statement is long running, this blocks DDL ops on the same database. This is a usability issue. There is no good reason why an X lock on a table within a DB and an S lock on the DB should be in conflict. (This is different from a situation where an X lock is on a partition and an S lock is on the table to which this partition belongs. There it makes sense. Basically there is no SQL way to address all tables in a DB, but you can easily refer to all partitions of a table.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 32809: Disallow create table with dot/colon in column name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32809/#review79203 --- ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g https://reviews.apache.org/r/32809/#comment128410 Please address Swarnism's comments - John Pullokkaran On April 3, 2015, 6:24 a.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32809/ --- (Updated April 3, 2015, 6:24 a.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- Since we don't allow users to query column names with dot in the middle such as emp.no, don't allow users to create tables with such columns that cannot be queried. Fix the documentation to reflect this fix. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java PRE-CREATION Diff: https://reviews.apache.org/r/32809/diff/ Testing --- Thanks, pengcheng xiong
Re: Review Request 32809: Disallow create table with dot/colon in column name
On April 7, 2015, 4:45 a.m., Swarnim Kulkarni wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g, line 632 https://reviews.apache.org/r/32809/diff/1/?file=914560#file914560line632 If you choose to use a String.contains, this could as well be a character array. Thanks for your comment. But I assume that char is enough for my purpose. On April 7, 2015, 4:45 a.m., Swarnim Kulkarni wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g, line 634 https://reviews.apache.org/r/32809/diff/1/?file=914560#file914560line634 Why not simply use string.contains here? The contains method is implemented using a call to indexOf, so they are essentially the same. public boolean contains(CharSequence s) { return indexOf(s.toString()) > -1; } But in my case, I just would like to check if a string contains a char, rather than a CharSequence. Thus, I think indexOf would be better. On April 7, 2015, 4:45 a.m., Swarnim Kulkarni wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g, line 635 https://reviews.apache.org/r/32809/diff/1/?file=914560#file914560line635 This and the following line can be simplified as return input.indexOf(c); The purpose of the function is to test whether a string contains a char. The actual index is only used to check if it is there, the detailed position information is not needed. That is to say, a boolean return value is enough for my purpose. - pengcheng --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32809/#review79121 --- On April 3, 2015, 6:24 a.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32809/ --- (Updated April 3, 2015, 6:24 a.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- Since we don't allow users to query column names with dot in the middle such as emp.no, don't allow users to create tables with such columns that cannot be queried. 
Fix the documentation to reflect this fix. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java PRE-CREATION Diff: https://reviews.apache.org/r/32809/diff/ Testing --- Thanks, pengcheng xiong
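The point under discussion reduces to a one-line helper: String.contains takes a CharSequence and delegates to indexOf(String), whereas indexOf(int) works directly on a single character, and a boolean wrapper is all the parser check needs. containsChar below is an illustrative name, not the patch's code:

```java
// Illustrates the indexOf-vs-contains discussion: a boolean check for a
// single character, discarding the position that indexOf returns.
public class ContainsCharSketch {
    public static boolean containsChar(String input, char c) {
        return input.indexOf(c) != -1;
    }
}
```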
Re: Review Request 32370: HIVE-10040
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/ --- (Updated April 7, 2015, 7:12 p.m.) Review request for hive and John Pullokkaran. Changes --- Address John's comments. Bugs: HIVE-10040 https://issues.apache.org/jira/browse/HIVE-10040 Repository: hive-git Description --- CBO (Calcite Return Path): Pluggable cost modules [CBO branch] Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c8e9b52258eb209535a2bbfe512cf2d04178cb4b ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 
4984683c3c8c6c0378a22e21fd6d961f3901f25c ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdMemory.java 207f402013b5c5b2d4ada5493122427ddce9270d ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdParallelism.java 95c2be50c07f0ed6da373425e60f1185cb2cfe2b ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6 Diff: https://reviews.apache.org/r/32370/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
Re: Review Request 32406: Add another level of explain for RDBMS audience
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32406/#review79199 --- Ship it! Ship It! - John Pullokkaran On April 7, 2015, 12:42 a.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32406/ --- (Updated April 7, 2015, 12:42 a.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- Current Hive Explain (default) is targeted at MR Audience. We need a new level of explain plan to be targeted at RDBMS audience. The explain requires these: 1) The focus needs to be on what part of the query is being executed rather than internals of the engines 2) There needs to be a clearly readable tree of operations 3) Examples - Table scan should mention the table being scanned, the Sarg, the size of table and expected cardinality after the Sarg'ed read. The join should mention the table being joined with and the join condition. The aggregate should mention the columns in the group-by. 
Diffs - common/pom.xml 5b0e78c common/src/java/org/apache/hadoop/hive/common/jsonexplain/JsonParser.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/jsonexplain/JsonParserFactory.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Attr.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Connection.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Op.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Stage.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/TezJsonParser.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Vertex.java PRE-CREATION common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cc16c38 itests/src/test/resources/testconfiguration.properties 288270e ql/src/java/org/apache/hadoop/hive/ql/Context.java 0f7da53 ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 149f911 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileWork.java e572338 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanWork.java 095afd4 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateWork.java 092f627 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CalciteSemanticException.java a71cd35 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java 29134a4 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java 5c0616e ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java 8c3587e ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTablePartMergeFilesDesc.java eaf3dc4 ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java c8bf7dc ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java 38b6d96 
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1f6d53d ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 9834fc8 ql/src/java/org/apache/hadoop/hive/ql/plan/AlterDatabaseDesc.java e45bc26 ql/src/java/org/apache/hadoop/hive/ql/plan/AlterIndexDesc.java db2cf7f ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableDesc.java 24cf1da ql/src/java/org/apache/hadoop/hive/ql/plan/ArchiveWork.java 9fb5c8b ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 6ab75a7 ql/src/java/org/apache/hadoop/hive/ql/plan/BucketMapJoinContext.java f436bc0 ql/src/java/org/apache/hadoop/hive/ql/plan/CollectDesc.java 588e14d ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsDesc.java a44c8e8 ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java d644155 ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsWork.java 3cae727 ql/src/java/org/apache/hadoop/hive/ql/plan/CommonMergeJoinDesc.java 2354139 ql/src/java/org/apache/hadoop/hive/ql/plan/CopyWork.java 3353384 ql/src/java/org/apache/hadoop/hive/ql/plan/CreateDatabaseDesc.java a6b52aa ql/src/java/org/apache/hadoop/hive/ql/plan/CreateFunctionDesc.java dce5ece ql/src/java/org/apache/hadoop/hive/ql/plan/CreateMacroDesc.java 3c5a723 ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 8cadb96 ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableLikeDesc.java 3dad4ab ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java dd76a82
[jira] [Created] (HIVE-10240) Patch HIVE-9473 breaks KERBEROS
Olaf Flebbe created HIVE-10240: -- Summary: Patch HIVE-9473 breaks KERBEROS Key: HIVE-10240 URL: https://issues.apache.org/jira/browse/HIVE-10240 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 1.0.0 Reporter: Olaf Flebbe Fix For: 1.0.1 The patch from HIVE-9473 introduces a regression: HiveServer2 no longer starts properly for our config (more or less the Bigtop environment): SQL std auth enabled, enableDoAs disabled, Tez enabled, Kerberos enabled. The problem seems to be that the Kerberos ticket is not yet present when HiveServer2 first tries to access HDFS. When HIVE-9473 is reverted, obtaining the ticket is one of the first things HiveServer2 does. Posting the startup of vanilla hive-1.0.0 and the startup of a hive-1.0.0 with this commit reverted, where HiveServer2 starts correctly. {code} commit 35582c2065a6b90b003a656bdb3b0ff08b0c35b9 Author: Thejas Nair the...@apache.org Date: Fri Jan 30 00:05:50 2015 + HIVE-9473 : sql std auth should disallow built-in udfs that allow any java methods to be called (Thejas Nair, reviewed by Jason Dere) git-svn-id: https://svn.apache.org/repos/asf/hive/branches/branch-1.0@1655891 13f79535-47bb-0310-9956-ffa450edef68 {code} 
Startup of vanilla hive-1.0.0 hive-server2 {code} STARTUP_MSG: build = git://os2-debian80/net/os2-debian80/fs1/olaf/bigtop/output/hive/hive-1.0.0 -r 813996292c9f966109f990127ddd5673cf813125; compiled by 'olaf' on Tue Apr 7 09:33:01 CEST 2015 / 2015-04-07 10:23:52,579 INFO [main]: server.HiveServer2 (HiveServer2.java:startHiveServer2(292)) - Starting HiveServer2 2015-04-07 10:23:53,104 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(556)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 2015-04-07 10:23:53,135 INFO [main]: metastore.ObjectStore (ObjectStore.java:initialize(264)) - ObjectStore, initialize called 2015-04-07 10:23:54,775 INFO [main]: metastore.ObjectStore (ObjectStore.java:getPMF(345)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes=Table,StorageDescriptor,SerDeInfo,Pa rtition,Database,Type,FieldSchema,Order 2015-04-07 10:23:56,953 INFO [main]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:init(132)) - Using direct SQL, underlying DB is DERBY 2015-04-07 10:23:56,954 INFO [main]: metastore.ObjectStore (ObjectStore.java:setConf(247)) - Initialized ObjectStore 2015-04-07 10:23:57,275 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(630)) - Added admin role in metastore 2015-04-07 10:23:57,276 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(639)) - Added public role in metastore 2015-04-07 10:23:58,241 WARN [main]: ipc.Client (Client.java:run(675)) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2015-04-07 10:23:58,248 WARN [main]: ipc.Client (Client.java:run(675)) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials 
provided (Mechanism level: Failed to find any Kerberos tgt)] 2015-04-07 10:23:58,249 INFO [main]: retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(140)) - Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over node2.proto.bsi.de/192.168.100.22:8020 after 1 fail over attempts. Trying to fail over immediately. java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: node2.proto.bsi.de/192.168.100.22; destination host is: node2.proto.bsi.de:8020; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at
Re: Review Request 32809: Disallow create table with dot/colon in column name
On April 7, 2015, 4:45 a.m., Swarnim Kulkarni wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g, line 635 https://reviews.apache.org/r/32809/diff/1/?file=914560#file914560line635 This and the following line can be simplified as return input.indexOf(c); pengcheng xiong wrote: The purpose of the function is to test whether a string contains a char. The actual index is only used to check if it is there, the detailed position information is not needed. That is to say, a boolean return value is enough for my purpose. Yup. I think your logic is correct. Just having a return input.indexOf(c); serves the same purpose but is simpler. :) - Swarnim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32809/#review79121 --- On April 7, 2015, 6:18 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32809/ --- (Updated April 7, 2015, 6:18 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- Since we don't allow users to query column names with dot in the middle such as emp.no, don't allow users to create tables with such columns that cannot be queried. Fix the documentation to reflect this fix. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java PRE-CREATION Diff: https://reviews.apache.org/r/32809/diff/ Testing --- Thanks, pengcheng xiong
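The simplification Swarnim suggests can be sketched as follows. `containsChar` is a hypothetical stand-in for the helper discussed in HiveParser.g; since the helper returns a boolean, `indexOf`'s result needs a comparison against -1 rather than being returned directly:

```java
public class ContainsCharSketch {
    // Hypothetical stand-in for the HiveParser.g helper: only membership
    // matters, so String.indexOf can replace a manual character scan.
    static boolean containsChar(String input, char c) {
        // indexOf returns -1 when the character is absent.
        return input.indexOf(c) != -1;
    }

    public static void main(String[] args) {
        System.out.println(containsChar("emp.no", '.')); // true: dot present
        System.out.println(containsChar("empno", '.'));  // false
    }
}
```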
Re: Review Request 32370: HIVE-10040
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/#review79235 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java https://reviews.apache.org/r/32370/#comment128458 This could be in subclass; this will make HiveCostModel opaque to a specific exec engine's algorithms. - John Pullokkaran On April 7, 2015, 7:12 p.m., Jesús Camacho Rodríguez wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/ --- (Updated April 7, 2015, 7:12 p.m.) Review request for hive and John Pullokkaran. Bugs: HIVE-10040 https://issues.apache.org/jira/browse/HIVE-10040 Repository: hive-git Description --- CBO (Calcite Return Path): Pluggable cost modules [CBO branch] Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 
5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c8e9b52258eb209535a2bbfe512cf2d04178cb4b ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdMemory.java 207f402013b5c5b2d4ada5493122427ddce9270d ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdParallelism.java 95c2be50c07f0ed6da373425e60f1185cb2cfe2b ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6 Diff: https://reviews.apache.org/r/32370/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
Re: Review Request 32370: HIVE-10040
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/#review79253 --- Ship it! Ship It! - John Pullokkaran On April 7, 2015, 7:18 p.m., Jesús Camacho Rodríguez wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/ --- (Updated April 7, 2015, 7:18 p.m.) Review request for hive and John Pullokkaran. Bugs: HIVE-10040 https://issues.apache.org/jira/browse/HIVE-10040 Repository: hive-git Description --- CBO (Calcite Return Path): Pluggable cost modules [CBO branch] Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c8e9b52258eb209535a2bbfe512cf2d04178cb4b ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdMemory.java 207f402013b5c5b2d4ada5493122427ddce9270d ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdParallelism.java 95c2be50c07f0ed6da373425e60f1185cb2cfe2b ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6 Diff: https://reviews.apache.org/r/32370/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
Can anyone review HIVE-9864 Create UDF jsonpath which support full JsonPath syntax
Hi everyone, Can anyone review HIVE-9864 Create UDF jsonpath which supports full JsonPath syntax? It uses the Jayway JsonPath 2.0.0 library to resolve JsonPath expressions: https://github.com/jayway/JsonPath The new UDF jsonpath supports the full JsonPath syntax, in contrast to the old get_json_object UDF, which supports only a limited subset of JsonPath. https://issues.apache.org/jira/browse/HIVE-9864 https://reviews.apache.org/r/32387/diff/# Thank you Alex
[jira] [Created] (HIVE-10243) Introduce JoinAlgorithm Interface
Laljo John Pullokkaran created HIVE-10243: - Summary: Introduce JoinAlgorithm Interface Key: HIVE-10243 URL: https://issues.apache.org/jira/browse/HIVE-10243 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 32920: HIVE-10189: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32920/#review79241 --- itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java https://reviews.apache.org/r/32920/#comment128472 I believe you meant to check for 'Double', right? itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java https://reviews.apache.org/r/32920/#comment128475 Did you mean to check for 'Long' here? - Sergio Pena On April 7, 2015, 6:06 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32920/ --- (Updated April 7, 2015, 6:06 a.m.) Review request for hive and chengxiang li. Repository: hive-git Description --- Add microbenchmark tool to show performance improvement by JMH Diffs - itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java PRE-CREATION Diff: https://reviews.apache.org/r/32920/diff/ Testing --- Thanks, cheng xu
Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported
On April 7, 2015, 4:19 a.m., Swarnim Kulkarni wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java, line 1347 https://reviews.apache.org/r/32901/diff/1/?file=918312#file918312line1347 Nit: Else not needed. The new patch will look a bit different, so this will not be needed. On April 7, 2015, 4:19 a.m., Swarnim Kulkarni wrote: ql/src/test/results/clientpositive/compute_stats_date.q.out, line 110 https://reviews.apache.org/r/32901/diff/1/?file=918314#file918314line110 Getting rid of the tabs here would be nice. This is query output generated by Hive; the tabs are expected here as column output delimiters. On April 7, 2015, 4:19 a.m., Swarnim Kulkarni wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java, line 1341 https://reviews.apache.org/r/32901/diff/1/?file=918312#file918312line1341 Would it be a little safer here to assert that parameters has at least 2 values in it so that we do not fail with an ArrayIndexOutOfBoundsException? GenericUDAFComputeStats.getEvaluator(), which instantiates the various StatsEvaluators, is already doing the initial checking of the array size. The other StatsEvaluators for the other types are relying on this as well. - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32901/#review79119 --- On April 6, 2015, 9:01 p.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32901/ --- (Updated April 6, 2015, 9:01 p.m.) Review request for hive, Ashutosh Chauhan and Prasanth_J. Bugs: HIVE-10226 https://issues.apache.org/jira/browse/HIVE-10226 Repository: hive-git Description --- Re-use the long stats for Date column stats, using the days since epoch value as the long value. 
Diffs - metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32901/diff/ Testing --- Thanks, Jason Dere
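The "days since epoch" encoding the patch relies on can be illustrated with java.time. This is a sketch only: the actual patch converts Hive's own Date values inside GenericUDAFComputeStats rather than using LocalDate:

```java
import java.time.LocalDate;

public class DateStatsSketch {
    // Encode a date as a long so the existing long-column statistics
    // (min/max/NDV) can be reused for Date columns, as described above.
    static long toDaysSinceEpoch(LocalDate d) {
        return d.toEpochDay(); // days since 1970-01-01
    }

    public static void main(String[] args) {
        System.out.println(toDaysSinceEpoch(LocalDate.of(1970, 1, 1))); // 0
        System.out.println(toDaysSinceEpoch(LocalDate.of(1970, 1, 2))); // 1
    }
}
```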
Re: Review Request 32370: HIVE-10040
On April 7, 2015, 8:38 p.m., John Pullokkaran wrote: Ship It! Remaining review comment will be addressed in HIVE-10243. - John --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/#review79253 --- On April 7, 2015, 7:18 p.m., Jesús Camacho Rodríguez wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/ --- (Updated April 7, 2015, 7:18 p.m.) Review request for hive and John Pullokkaran. Bugs: HIVE-10040 https://issues.apache.org/jira/browse/HIVE-10040 Repository: hive-git Description --- CBO (Calcite Return Path): Pluggable cost modules [CBO branch] Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 
6c215c96190f0fcebe063b15c2763c49ebf1faaf ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c8e9b52258eb209535a2bbfe512cf2d04178cb4b ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdMemory.java 207f402013b5c5b2d4ada5493122427ddce9270d ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdParallelism.java 95c2be50c07f0ed6da373425e60f1185cb2cfe2b ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6 Diff: https://reviews.apache.org/r/32370/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
Re: Review Request 32370: HIVE-10040
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32370/ --- (Updated April 7, 2015, 7:18 p.m.) Review request for hive and John Pullokkaran. Changes --- Added override annotation that was missing. Bugs: HIVE-10040 https://issues.apache.org/jira/browse/HIVE-10040 Repository: hive-git Description --- CBO (Calcite Return Path): Pluggable cost modules [CBO branch] Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c8e9b52258eb209535a2bbfe512cf2d04178cb4b 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdMemory.java 207f402013b5c5b2d4ada5493122427ddce9270d ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdParallelism.java 95c2be50c07f0ed6da373425e60f1185cb2cfe2b ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6 Diff: https://reviews.apache.org/r/32370/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
[jira] [Created] (HIVE-10246) [CBO] Table alias should be stored with Scan object, instead of Table object
Ashutosh Chauhan created HIVE-10246: --- Summary: [CBO] Table alias should be stored with Scan object, instead of Table object Key: HIVE-10246 URL: https://issues.apache.org/jira/browse/HIVE-10246 Project: Hive Issue Type: Improvement Components: CBO, Diagnosability, Query Planning Affects Versions: cbo-branch Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan
Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32901/ --- (Updated April 7, 2015, 10:05 p.m.) Review request for hive, Ashutosh Chauhan and Prasanth_J. Changes --- Check for null values in Date/Decimal versions of MetaDataFormatUtils.convertToString() Bugs: HIVE-10226 https://issues.apache.org/jira/browse/HIVE-10226 Repository: hive-git Description --- Re-use the long stats for Date column stats, using the days since epoch value as the long value. Diffs (updated) - metastore/if/hive_metastore.thrift 57bce0c metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1666dc3 metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java bce9f0f ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b85282c ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java 1662696 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32901/diff/ Testing --- Thanks, Jason Dere
Review Request 32941: HIVE-10122
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32941/ --- Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- see JIRA Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java e34ce53 Diff: https://reviews.apache.org/r/32941/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Created] (HIVE-10245) LLAP: Make use of the timed version of getDagStatus in TezJobMonitor
Siddharth Seth created HIVE-10245: - Summary: LLAP: Make use of the timed version of getDagStatus in TezJobMonitor Key: HIVE-10245 URL: https://issues.apache.org/jira/browse/HIVE-10245 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Version of HIVE-10157 for the LLAP branch since this already works with a branch based on tez 0.7.
Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32901/ --- (Updated April 7, 2015, 9:48 p.m.) Review request for hive, Ashutosh Chauhan and Prasanth_J. Changes --- Created new DateColumnStatsData in hive_metastore.thrift, which will be used for the date stats. Also updated describe formatter/UpdateStatsTask to handle Date column stats. I've removed the generated Thrift code from diff to make this more readable. Bugs: HIVE-10226 https://issues.apache.org/jira/browse/HIVE-10226 Repository: hive-git Description --- Re-use the long stats for Date column stats, using the days since epoch value as the long value. Diffs (updated) - metastore/if/hive_metastore.thrift 57bce0c metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1666dc3 metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java bce9f0f ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b85282c ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java 1662696 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32901/diff/ Testing --- Thanks, Jason Dere
[jira] [Created] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled
Mostafa Mokhtar created HIVE-10244: -- Summary: Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled Key: HIVE-10244 URL: https://issues.apache.org/jira/browse/HIVE-10244 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Query {code} set hive.vectorized.execution.reduce.enabled=true; with ssr as (select s_store_id as store_id, sum(ss_ext_sales_price) as sales, sum(coalesce(sr_return_amt, 0)) as returns, sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit from store_sales left outer join store_returns on (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), date_dim, store, item, promotion where ss_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ss_store_sk = s_store_sk and ss_item_sk = i_item_sk and i_current_price > 50 and ss_promo_sk = p_promo_sk and p_channel_tv = 'N' group by s_store_id) , csr as (select cp_catalog_page_id as catalog_page_id, sum(cs_ext_sales_price) as sales, sum(coalesce(cr_return_amount, 0)) as returns, sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit from catalog_sales left outer join catalog_returns on (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), date_dim, catalog_page, item, promotion where cs_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and cs_catalog_page_sk = cp_catalog_page_sk and cs_item_sk = i_item_sk and i_current_price > 50 and cs_promo_sk = p_promo_sk and p_channel_tv = 'N' group by cp_catalog_page_id) , wsr as (select web_site_id, sum(ws_ext_sales_price) as sales, sum(coalesce(wr_return_amt, 0)) as returns, sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit from web_sales left outer join web_returns on (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), date_dim, web_site, item, promotion where ws_sold_date_sk = d_date_sk and d_date 
between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ws_web_site_sk = web_site_sk and ws_item_sk = i_item_sk and i_current_price > 50 and ws_promo_sk = p_promo_sk and p_channel_tv = 'N' group by web_site_id) select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit from (select 'store channel' as channel , concat('store', store_id) as id , sales , returns , profit from ssr union all select 'catalog channel' as channel , concat('catalog_page', catalog_page_id) as id , sales , returns , profit from csr union all select 'web channel' as channel , concat('web_site', web_site_id) as id , sales , returns , profit from wsr ) x group by channel, id with rollup order by channel ,id limit 100 {code} Exception {code} Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8 \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7 \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at
[jira] [Created] (HIVE-10247) [Refactor] Move Noop TableFunctionEvaluator to contrib/ module
Ashutosh Chauhan created HIVE-10247: --- Summary: [Refactor] Move Noop TableFunctionEvaluator to contrib/ module Key: HIVE-10247 URL: https://issues.apache.org/jira/browse/HIVE-10247 Project: Hive Issue Type: Task Components: UDF Reporter: Ashutosh Chauhan See comments from [HIVE-9073 |https://issues.apache.org/jira/browse/HIVE-9073?focusedCommentId=14481894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14481894] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31178/ --- (Updated April 8, 2015, 12:40 a.m.) Review request for hive and Ashutosh Chauhan. Changes --- Address test failures. Repository: hive-git Description --- The discrepancy is because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition, and is calculated as “select max(NUM_DISTINCTS) from PART_COL_STATS”. This is problematic for columns like ticket number which naturally increase along with the partition date column ss_sold_date_sk. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2b8280e data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01 metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java ba27f10 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 75005aa metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION Diff: https://reviews.apache.org/r/31178/diff/ Testing --- Thanks, pengcheng xiong
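To see why `max(NUM_DISTINCTS)` misbehaves for such columns, here is a toy model (a sketch for illustration only, not Hive code): a column like a ticket number takes disjoint value ranges in each date partition, so the table-level NDV is close to the sum of the per-partition NDVs, while the max badly underestimates it.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: per-partition NDV vs. table-level NDV for a column whose
// values are disjoint across partitions (e.g. a naturally increasing
// ticket number partitioned by a date column).
public class NdvEstimateDemo {

    // The estimate under question: max over per-partition NDVs.
    static long maxEstimate(int partitions, int rowsPerPartition) {
        long max = 0;
        for (int p = 0; p < partitions; p++) {
            Set<Integer> local = new HashSet<>();
            for (int r = 0; r < rowsPerPartition; r++) {
                local.add(p * rowsPerPartition + r); // disjoint value ranges
            }
            max = Math.max(max, local.size());
        }
        return max;
    }

    // Actual table-level NDV, computed over all partitions at once.
    static long actualNdv(int partitions, int rowsPerPartition) {
        Set<Integer> global = new HashSet<>();
        for (int p = 0; p < partitions; p++) {
            for (int r = 0; r < rowsPerPartition; r++) {
                global.add(p * rowsPerPartition + r);
            }
        }
        return global.size();
    }

    public static void main(String[] args) {
        // max(NUM_DISTINCTS) reports 1000; the true NDV is 10000.
        System.out.println("max estimate: " + maxEstimate(10, 1000));
        System.out.println("actual NDV:   " + actualNdv(10, 1000));
    }
}
```

With 10 partitions of 1000 disjoint values each, the max-based estimate is off by a factor equal to the partition count, which is the discrepancy the review addresses.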
[jira] [Created] (HIVE-10248) LLAP: Fix merge conflicts related to HIVE-10067
Prasanth Jayachandran created HIVE-10248: Summary: LLAP: Fix merge conflicts related to HIVE-10067 Key: HIVE-10248 URL: https://issues.apache.org/jira/browse/HIVE-10248 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Some changes related to HIVE-10067 were lost in the recent trunk-to-llap merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10249) ACID: show locks should show who the lock is waiting for
Eugene Koifman created HIVE-10249: - Summary: ACID: show locks should show who the lock is waiting for Key: HIVE-10249 URL: https://issues.apache.org/jira/browse/HIVE-10249 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Instead of just showing state WAITING, we should include what the lock is waiting for. This will make diagnostics easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10250) Optimize AuthorizationPreEventListener to reuse TableWrapper objects
Mithun Radhakrishnan created HIVE-10250: --- Summary: Optimize AuthorizationPreEventListener to reuse TableWrapper objects Key: HIVE-10250 URL: https://issues.apache.org/jira/browse/HIVE-10250 Project: Hive Issue Type: Bug Components: Authorization Reporter: Mithun Radhakrishnan Here's the {{PartitionWrapper}} class in {{AuthorizationPreEventListener}}: {code:java|title=AuthorizationPreEventListener.java} public static class PartitionWrapper extends org.apache.hadoop.hive.ql.metadata.Partition { ... public PartitionWrapper(org.apache.hadoop.hive.metastore.api.Partition mapiPart, PreEventContext context) throws ... { Partition wrapperApiPart = mapiPart.deepCopy(); Table t = context.getHandler().get_table_core( mapiPart.getDbName(), mapiPart.getTableName()); ... } {code} {{PreAddPartitionEvent}} (and soon, {{PreDropPartitionEvent}}) correspond not just to a single partition, but to an entire set of partitions added atomically. When the event is authorized, {{HMSHandler.get_table_core()}} will be called once for every partition in the Event instance. Since we already make the assumption that the partition-sets correspond to a single table, we might as well make a single call. I'll have a patch for this shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
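A minimal sketch of the proposed optimization, using hypothetical types (the {{TableResolver}} interface below stands in for {{HMSHandler.get_table_core()}}; none of these names are from the actual Hive code): the table is resolved once per event and reused for every partition wrapper built from that event.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: cache the table lookup so that an event carrying a set of
// partitions of the same table triggers one metastore call, not one
// call per partition.
public class TableLookupCache {

    // Hypothetical stand-in for HMSHandler.get_table_core().
    interface TableResolver {
        Object resolve(String db, String table);
    }

    private final TableResolver resolver;
    private final Map<String, Object> cache = new HashMap<>();
    int resolverCalls = 0; // visible for the demo below

    TableLookupCache(TableResolver resolver) {
        this.resolver = resolver;
    }

    // Returns the cached table object, resolving it only on first use.
    Object getTable(String db, String table) {
        String key = db + "." + table;
        return cache.computeIfAbsent(key, k -> {
            resolverCalls++;
            return resolver.resolve(db, table);
        });
    }

    public static void main(String[] args) {
        TableLookupCache cache = new TableLookupCache((db, t) -> new Object());
        // A PreAddPartitionEvent with many partitions of one table:
        List<String> partitions = List.of("ds=1", "ds=2", "ds=3", "ds=4");
        for (String p : partitions) {
            cache.getTable("default", "store_sales");
        }
        System.out.println("resolver calls: " + cache.resolverCalls);
    }
}
```

Since the partition-set in one event is assumed to belong to a single table, the cache collapses N lookups into one.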
[jira] [Created] (HIVE-10252) Make PPD work for Parquet in row group level
Dong Chen created HIVE-10252: Summary: Make PPD work for Parquet in row group level Key: HIVE-10252 URL: https://issues.apache.org/jira/browse/HIVE-10252 Project: Hive Issue Type: Sub-task Reporter: Dong Chen Assignee: Dong Chen In Hive, predicate pushdown figures out the search condition in HQL, serializes it, and pushes it to the file format. ORC can use the predicate to filter stripes. Similarly, Parquet should use the statistics saved in each row group to filter out row groups that do not match, but this does not work today. In {{ParquetRecordReaderWrapper}}, Hive gets splits with all row groups (client side) and pushes the filter to Parquet for further processing (Parquet side). But in {{ParquetRecordReader.initializeInternalReader()}}, if the splits have already been selected on the client side, Parquet will not apply the filter again. We should make the behavior consistent in Hive. Maybe we could get the splits, filter them, and then pass them to Parquet; this means using the client-side strategy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
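As a rough illustration of the client-side strategy (hypothetical types only; the real implementation would work with Parquet's block metadata and Hive's SearchArgument), row groups whose min/max statistics cannot satisfy the predicate are dropped before the splits are handed to Parquet:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of client-side row-group elimination for an equality predicate.
// RowGroup is a minimal stand-in for a row group's column statistics.
public class RowGroupFilterDemo {

    static class RowGroup {
        final long min, max;
        RowGroup(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // Keep only row groups whose [min, max] range can contain the value
    // of a predicate "col = value"; the rest are skipped without reading.
    static List<RowGroup> selectRowGroups(List<RowGroup> groups, long value) {
        List<RowGroup> selected = new ArrayList<>();
        for (RowGroup g : groups) {
            if (value >= g.min && value <= g.max) {
                selected.add(g);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<RowGroup> groups = List.of(
            new RowGroup(0, 99), new RowGroup(100, 199), new RowGroup(200, 299));
        // Only the middle row group can contain the value 150.
        System.out.println("selected: " + selectRowGroups(groups, 150).size());
    }
}
```

Doing this once on the client side keeps the behavior consistent regardless of whether the Parquet-side reader applies the filter again.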
[jira] [Created] (HIVE-10254) Parquet PPD support DECIMAL
Dong Chen created HIVE-10254: Summary: Parquet PPD support DECIMAL Key: HIVE-10254 URL: https://issues.apache.org/jira/browse/HIVE-10254 Project: Hive Issue Type: Sub-task Reporter: Dong Chen Assignee: Dong Chen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10253) Parquet PPD support DATE
Dong Chen created HIVE-10253: Summary: Parquet PPD support DATE Key: HIVE-10253 URL: https://issues.apache.org/jira/browse/HIVE-10253 Project: Hive Issue Type: Sub-task Reporter: Dong Chen Assignee: Dong Chen Hive should handle the DATE data type when generating and pushing the predicate to Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10255) Parquet PPD support TIMESTAMP
Dong Chen created HIVE-10255: Summary: Parquet PPD support TIMESTAMP Key: HIVE-10255 URL: https://issues.apache.org/jira/browse/HIVE-10255 Project: Hive Issue Type: Sub-task Reporter: Dong Chen Assignee: Dong Chen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10256) Eliminate row groups based on the block statistics in Parquet
Dong Chen created HIVE-10256: Summary: Eliminate row groups based on the block statistics in Parquet Key: HIVE-10256 URL: https://issues.apache.org/jira/browse/HIVE-10256 Project: Hive Issue Type: Sub-task Reporter: Dong Chen Assignee: Dong Chen In Parquet PPD, row groups that do not match the predicate should be eliminated. See {{TestOrcSplitElimination}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10257) Ensure Parquet Hive has null optimization
Dong Chen created HIVE-10257: Summary: Ensure Parquet Hive has null optimization Key: HIVE-10257 URL: https://issues.apache.org/jira/browse/HIVE-10257 Project: Hive Issue Type: Sub-task Reporter: Dong Chen Assignee: Dong Chen In Parquet statistics, a boolean value {{hasNonNullValue}} is kept for each column chunk. Hive could use this value to skip a column, avoid null-checking logic, and speed up vectorization as in HIVE-4478 (which is not completed yet). In this JIRA we could verify whether this null optimization works and make changes if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
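A sketch of the idea (hypothetical method names, not the actual Parquet or Hive API): when the statistics guarantee a column chunk has no nulls, the reader can take a branch-free inner loop, which is also the loop shape the JIT can vectorize.

```java
// Sketch: use a "no nulls" statistic to choose between a tight inner loop
// and a per-row null-checking loop when aggregating a column.
public class NullSkipDemo {

    static long sumColumn(long[] values, boolean[] isNull, boolean hasNoNulls) {
        long sum = 0;
        if (hasNoNulls) {
            // Statistics guarantee no nulls: the null-check branch disappears.
            for (long v : values) {
                sum += v;
            }
        } else {
            // Fallback: check the null mask for every row.
            for (int i = 0; i < values.length; i++) {
                if (!isNull[i]) {
                    sum += values[i];
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4};
        boolean[] nulls = {false, true, false, false};
        System.out.println(sumColumn(values, nulls, true));  // all rows counted
        System.out.println(sumColumn(values, nulls, false)); // null row skipped
    }
}
```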
[jira] [Created] (HIVE-10258) LLAP: orc_llap test fails again
Sergey Shelukhin created HIVE-10258: --- Summary: LLAP: orc_llap test fails again Key: HIVE-10258 URL: https://issues.apache.org/jira/browse/HIVE-10258 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Prasanth Jayachandran {noformat} Caused by: java.io.IOException: java.io.IOException: java.io.IOException: Corruption in ORC data encountered. To skip reading corrupted data, set hive.exec.orc.skip.corrupt.data to true{noformat} llap_partitioned passes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10251) HIVE-9664 makes hive depend on ivysettings.xml
Sushanth Sowmyan created HIVE-10251: --- Summary: HIVE-9664 makes hive depend on ivysettings.xml Key: HIVE-10251 URL: https://issues.apache.org/jira/browse/HIVE-10251 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan HIVE-9664 makes Hive depend on the existence of ivysettings.xml; if it is not present, Hive throws an NPE when instantiating a CLISessionState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: ORC separate project
If I understood Allen's #2 comment, we are moving existing ORC code out of Hive and making it a separate project, which I definitely missed. Since the existing Hive PMC has governance over the code, I would expect that to still be the case even after the spinoff. Obviously the proposal doesn't reflect this. Thanks, Xuefu On Fri, Apr 3, 2015 at 12:51 PM, Alan Gates alanfga...@gmail.com wrote: A couple of points: 1) ORC isn't going into the incubator. The proposal before the board is for it to go straight to TLP. There's no graduation to depend on. 2) As currently proposed, Hive would not depend on ORC to build. Hive users who wished to use ORC would obviously need to pull in ORC artifacts in addition to Hive. Given this, I don't think it makes any sense to fork ORC and have it in both places. This actually seems the worst outcome, as the two will inevitably diverge. Alan. Xuefu Zhang xzh...@cloudera.com April 3, 2015 at 6:41 I actually have a different thought to share along the same line. ORC is not a subproject in Hive. I'm not sure it's the best we can do, performing surgery on Hive in order to make ORC a TLP. Not only may this bring instability to Hive, but it also makes Hive depend on an incubating project. Not every project graduates (though I do wish ORC success as a TLP); some of them fail. Instead, I like the idea of forking Hive ORC as a TLP while Hive keeps whatever it has. This way, the new project can do whatever it wants, and the Hive community probably doesn't care and has no say in it. Once ORC as a TLP graduates, the Hive community can decide whether to go along with it and, if so, how to integrate with it. I think this will settle the current controversy, help ORC proceed faster as a TLP, and leave the decision to the near future. Thanks, Xuefu Szehon Ho sze...@cloudera.com April 2, 2015 at 23:54 I also agree with this goal. As such, I think we should first see the proposal (JIRA?)
for the storage-api refactoring and other related work of Orc separating as a TLP before the actual separation happens, to make sure the separation is not done in a way that takes us further from this goal. It may very well be that this refactoring moves us closer to the goal, but seeing the proposal first would give a lot of clarity. Thanks Szehon On Thu, Apr 2, 2015 at 10:20 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Edward Capriolo edlinuxg...@gmail.com April 2, 2015 at 22:20 To reiterate, one thing I want to avoid is having Hive rely on code that sits in several tiny silos across Apache projects, or Apache-licensed but not ASF projects. Hive is a mature TLP with a large number of committers, and it would not be a good situation if work often got bottlenecked because changes had to be made across two projects simultaneously to commit a feature, especially if the two projects do not share the same committer list. I think if it could be done perfectly, things like ORC, Parquet, whatever would be provided-scope dependencies, meaning the project can be built without a particular piece but as a whole the project still works. (That might be easier said than done :) Nick Dimiduk ndimi...@gmail.com April 1, 2015 at 11:51 I think the storage-api would be very helpful for HBase integration as well. Owen O'Malley omal...@apache.org April 1, 2015 at 11:22 What I'd like to see here is well-defined interfaces in Hive so that any storage format that wants can implement them. Hopefully that means things like interfaces and utility classes for acid, sargs, and vectorization move into this new Hive module storage-api. Then Orc, Parquet, etc. can depend on this module without needing to pull in all of Hive. Then Hive contributors would only be forced to make changes in Orc when they want to implement something in Orc. Agreed.
The goal of the new module is to keep a clean separation between the code for ORC and Hive, so that vectorization, sargs, and acid are kept in Hive and are not moved to or duplicated in the ORC project. .. Owen
Re: ORC separate project
Actually not so -- a spin-off project would have its own PMC, and the Hive PMC wouldn't have any say-so. Of course, there would be some overlap between the two PMCs. (I'm not even sure the PMC has governance of code, technically. That might belong to the committers or the development community. Well, the PMC does vote on release candidates, so that's a kind of governance. But the community is supposed to decide on major issues.) Anyway, under the Apache license, nobody needs permission from the PMC to grab some code and use it for another purpose. -- Lefty