Hive-trunk-hadoop2 - Build # 600 - Still Failing
Changes for Build #567
  [hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having clause (Harish Butani via Ashutosh Chauhan)
Changes for Build #568
  [xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced parenthesises (reviewed by Ashutosh)
Changes for Build #569
Changes for Build #570
  [rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the absence of any column statistics (Prasanth Jayachandran via Harish Butani)
  [hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE4002 (Navis via Ashutosh Chauhan)
Changes for Build #571
  [navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.)
  [navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu Zhang via Navis)
  [navis] HIVE-4518 : Missing file (HiveFatalException)
  [navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and Jason Dere via Navis)
Changes for Build #572
  [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland)
Changes for Build #573
  [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis)
  [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan)
  [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland)
  [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland)
  [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan)
  [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan)
  [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair)
  [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland)
Changes for Build #574
  [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland)
Changes for Build #575
  [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu)
  [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu)
Changes for Build #576
Changes for Build #577
Changes for Build #578
Changes for Build #579
  [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair)
  [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar)
Changes for Build #580
  [ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string arguments (Teddy Choi via Eric Hanson)
Changes for Build #581
  [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani)
Changes for Build #582
  [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland)
Changes for Build #583
  [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad)
  [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson)
Changes for Build #584
  [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair)
  [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson)
  [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan)
  [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu)
  [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland)
  [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis)
  [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan)
Changes for Build #585
  [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis)
  [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock)
Changes for Build #586
  [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan)
  [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis)
  [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani)
  [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani)
Changes for Build #587
  [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis)
  [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas)
  [hashutosh] HIVE-5909 : locate and instr
Hive-trunk-h0.21 - Build # 2501 - Still Failing
Changes for Build #2468
  [hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having clause (Harish Butani via Ashutosh Chauhan)
Changes for Build #2469
  [xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced parenthesises (reviewed by Ashutosh)
Changes for Build #2470
Changes for Build #2471
  [rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the absence of any column statistics (Prasanth Jayachandran via Harish Butani)
  [hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE4002 (Navis via Ashutosh Chauhan)
Changes for Build #2472
  [navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.)
  [navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu Zhang via Navis)
  [navis] HIVE-4518 : Missing file (HiveFatalException)
  [navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and Jason Dere via Navis)
Changes for Build #2473
  [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland)
Changes for Build #2474
  [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis)
  [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan)
  [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland)
  [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland)
  [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan)
  [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan)
  [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair)
  [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland)
Changes for Build #2475
  [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland)
Changes for Build #2476
  [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu)
  [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu)
Changes for Build #2477
Changes for Build #2478
Changes for Build #2479
Changes for Build #2480
  [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair)
  [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar)
Changes for Build #2481
Changes for Build #2482
  [ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string arguments (Teddy Choi via Eric Hanson)
Changes for Build #2483
  [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani)
Changes for Build #2484
  [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland)
Changes for Build #2485
  [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad)
  [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson)
Changes for Build #2486
  [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson)
  [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan)
  [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu)
  [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland)
  [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis)
  [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan)
Changes for Build #2487
  [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis)
  [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock)
  [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair)
Changes for Build #2488
  [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan)
  [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis)
  [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani)
  [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani)
Changes for Build #2489
  [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis)
  [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and
[jira] [Updated] (HIVE-5878) Hive standard avg UDAF returns double as the return type for some exact input types
[ https://issues.apache.org/jira/browse/HIVE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-5878:
Hadoop Flags: Incompatible change

Hive standard avg UDAF returns double as the return type for some exact input types
Key: HIVE-5878
URL: https://issues.apache.org/jira/browse/HIVE-5878
Project: Hive
Issue Type: Bug
Components: Types, UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Attachments: HIVE-5878.1.patch, HIVE-5878.patch

For a standard (non-partial) avg result, Hive currently returns double as the result type:
{code}
hive> desc test;
OK
d    int    None
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> explain select avg(`d`) from test;
...
  Reduce Operator Tree:
    Group By Operator
      aggregations:
            expr: avg(VALUE._col0)
      bucketGroup: false
      mode: mergepartial
      outputColumnNames: _col0
      Select Operator
        expressions:
              expr: _col0
              type: double
{code}
However, exact input types, including integers and decimal, should yield an exact result type. Here is what MySQL does:
{code}
mysql> desc test;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| i     | int(11)      | YES  |     | NULL    |       |
| b     | tinyint(1)   | YES  |     | NULL    |       |
| d     | double       | YES  |     | NULL    |       |
| s     | varchar(5)   | YES  |     | NULL    |       |
| dd    | decimal(5,2) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
mysql> create table test62 as select avg(i) from test;
mysql> desc test62;
+--------+---------------+------+-----+---------+-------+
| Field  | Type          | Null | Key | Default | Extra |
+--------+---------------+------+-----+---------+-------+
| avg(i) | decimal(14,4) | YES  |     | NULL    |       |
+--------+---------------+------+-----+---------+-------+
1 row in set (0.00 sec)
{code}

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
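Why an exact result type matters can be illustrated outside Hive. The following is a minimal Java sketch, not Hive's avg UDAF: `ExactAvg`, `avgAsDouble`, and `avgAsDecimal` are hypothetical names, and the scale of 4 is chosen only to mirror the `decimal(14,4)` that MySQL reports above. A double accumulator silently rounds large integers, while a decimal accumulator keeps the sum exact and rounds only the final quotient.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration: averaging exact (integer) inputs as double vs. as a decimal.
class ExactAvg {

    // Accumulates in double, like a double-typed avg; returns null for no rows (SQL semantics).
    static Double avgAsDouble(long[] values) {
        if (values.length == 0) {
            return null;
        }
        double sum = 0.0;
        for (long v : values) {
            sum += v; // each long is widened to double, which may round it
        }
        return sum / values.length;
    }

    // Accumulates exactly in BigDecimal and rounds only the quotient to `scale` digits.
    static BigDecimal avgAsDecimal(long[] values, int scale) {
        if (values.length == 0) {
            return null;
        }
        BigDecimal sum = BigDecimal.ZERO;
        for (long v : values) {
            sum = sum.add(BigDecimal.valueOf(v));
        }
        return sum.divide(BigDecimal.valueOf(values.length), scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        long big = 9007199254740993L; // 2^53 + 1, not representable as a double
        long[] rows = {big, big};
        System.out.println(avgAsDouble(rows));                     // 9.007199254740992E15
        System.out.println(avgAsDecimal(rows, 4).toPlainString()); // 9007199254740993.0000
        System.out.println(avgAsDecimal(new long[] {1, 2, 2}, 4)); // 1.6667
    }
}
```

The decimal path costs an arbitrary-precision addition per row, which is the usual trade-off for returning an exact type.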
[jira] [Resolved] (HIVE-5053) Let user override the parallelism of each tez task
[ https://issues.apache.org/jira/browse/HIVE-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner resolved HIVE-5053.
Resolution: Won't Fix

Not that important after all. Hive uses stats to automatically assign reducers across all stages. Also, with Tez, having too many reducers isn't really that critical. And finally, one can use the old setting to fix the number of reducers for all stages at once.

Let user override the parallelism of each tez task
Key: HIVE-5053
URL: https://issues.apache.org/jira/browse/HIVE-5053
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Fix For: tez-branch

We need to come up with a way to let the user choose the parallelism for each vertex in the graph. We're numbering the vertices in the graph, so we could use those numbers to let the user specify the parallelism. Another way would be to introduce hints in the SQL query itself, but that's a lot more complicated for little added value.
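The resolution's point, that reducer counts are already derived from statistics and can be pinned globally, can be made concrete with a small sketch. This is an assumption-laden Java illustration, not code from the Tez branch: it divides an estimated input size by a per-reducer byte target, clamps the result to [1, max], and lets a single global override win, which is presumably the "old setting" mentioned above. `ReducerEstimate` and its parameters are hypothetical; the `cf.` comments only point at the Hive/MapReduce settings that play similar roles.

```java
// Hypothetical sketch of stats-driven reducer estimation with one global override.
class ReducerEstimate {

    /**
     * @param totalInputBytes estimated input size taken from statistics
     * @param bytesPerReducer target data volume per reducer (cf. hive.exec.reducers.bytes.per.reducer)
     * @param maxReducers     upper bound on reducers (cf. hive.exec.reducers.max)
     * @param fixedReducers   global override; values <= 0 mean "not set" (cf. mapred.reduce.tasks)
     */
    static int estimateReducers(long totalInputBytes, long bytesPerReducer,
                                int maxReducers, int fixedReducers) {
        if (fixedReducers > 0) {
            return fixedReducers; // one setting fixes the count for every stage
        }
        // Ceiling division: enough reducers that none gets more than bytesPerReducer.
        long reducers = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        reducers = Math.max(1, reducers);
        return (int) Math.min(maxReducers, reducers);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        System.out.println(estimateReducers(10 * gb, gb, 999, -1)); // 10
        System.out.println(estimateReducers(0, gb, 999, -1));       // 1
        System.out.println(estimateReducers(10 * gb, gb, 999, 3));  // 3
    }
}
```

A per-vertex override, as originally proposed, would replace the single `fixedReducers` argument with a lookup keyed by vertex number.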
[jira] [Resolved] (HIVE-6019) Tez: Analyze command fails with dbclass=counter
[ https://issues.apache.org/jira/browse/HIVE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner resolved HIVE-6019.
Resolution: Fixed

Committed to branch.

Tez: Analyze command fails with dbclass=counter
Key: HIVE-6019
URL: https://issues.apache.org/jira/browse/HIVE-6019
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Fix For: tez-branch
Attachments: HIVE-6019.1.patch

Tez falls back to MR if no column stats are requested. However, it still uses CounterStatsAggregatorTez to aggregate.
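The bug described here is a mismatch between the engine that runs the analyze job (MR, after the fallback) and the aggregator chosen for it (the Tez one). A minimal Java sketch of the invariant a fix needs, choosing the aggregator from the engine that actually runs rather than the configured one, is below. `AggregatorChoice`, `Engine`, `effectiveEngine`, and `chooseAggregator` are hypothetical; only the two aggregator class names come from this thread, and the sketch does not claim to reproduce the committed patch.

```java
// Hypothetical sketch: pick the stats aggregator from the engine that actually runs the job.
class AggregatorChoice {

    enum Engine { MR, TEZ }

    // Assumption taken from the issue text: with no column stats requested, Tez falls back to MR.
    static Engine effectiveEngine(Engine configured, boolean columnStatsRequested) {
        if (configured == Engine.TEZ && !columnStatsRequested) {
            return Engine.MR;
        }
        return configured;
    }

    // Returns the aggregator class name; deciding from `configured` instead reproduces the bug.
    static String chooseAggregator(Engine configured, boolean columnStatsRequested) {
        Engine actual = effectiveEngine(configured, columnStatsRequested);
        return actual == Engine.TEZ ? "CounterStatsAggregatorTez" : "CounterStatsAggregator";
    }

    public static void main(String[] args) {
        System.out.println(chooseAggregator(Engine.TEZ, false)); // CounterStatsAggregator
        System.out.println(chooseAggregator(Engine.TEZ, true));  // CounterStatsAggregatorTez
    }
}
```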
[jira] [Commented] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
[ https://issues.apache.org/jira/browse/HIVE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848712#comment-13848712 ]

Gunther Hagleitner commented on HIVE-5065:

part-1 contains some 'proper' unit tests for Tez classes.

Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
Key: HIVE-5065
URL: https://issues.apache.org/jira/browse/HIVE-5065
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Priority: Blocker
Fix For: tez-branch
Attachments: HIVE-5065-part-1.1.patch
[jira] [Updated] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
[ https://issues.apache.org/jira/browse/HIVE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-5065:
Attachment: HIVE-5065-part-1.1.patch

Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
Key: HIVE-5065
URL: https://issues.apache.org/jira/browse/HIVE-5065
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Priority: Blocker
Fix For: tez-branch
Attachments: HIVE-5065-part-1.1.patch
[jira] [Assigned] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
[ https://issues.apache.org/jira/browse/HIVE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner reassigned HIVE-5065:
Assignee: Gunther Hagleitner

Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
Key: HIVE-5065
URL: https://issues.apache.org/jira/browse/HIVE-5065
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Blocker
Fix For: tez-branch
Attachments: HIVE-5065-part-1.1.patch
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5936:
Status: Patch Available (was: Open)

analyze command failing to collect stats with counter mechanism
Key: HIVE-5936
URL: https://issues.apache.org/jira/browse/HIVE-5936
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.13.0
Reporter: Ashutosh Chauhan
Assignee: Navis
Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.11.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt

With the counter mechanism, the MR job succeeds, but StatsTask on the client fails with an NPE.
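The message only reports the NPE; it does not say where it occurs. As a hedged illustration of the defensive handling that counter-based aggregation needs, the Java sketch below models job counters as a plain nested `Map` (hypothetical; this is not Hadoop's `Counters` API or Hive's `CounterStatsAggregator`) and sums one statistic across every counter group whose name starts with a key prefix. The prefix format and the `numRows` stat name in the usage are illustrative assumptions. Missing counters yield `null` for the caller to report, instead of an unboxing NPE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: sum one stat across counter groups that share a key prefix.
class CounterAggregation {

    /**
     * @param counters  model of job counters: group name -> (counter name -> value)
     * @param keyPrefix prefix identifying the table/partition being analyzed
     * @param statType  counter to read, e.g. a row count
     * @return the total, or null if counters are unavailable (caller reports a clear error)
     */
    static Long aggregate(Map<String, Map<String, Long>> counters, String keyPrefix, String statType) {
        if (counters == null) {
            return null;
        }
        long total = 0L;
        for (Map.Entry<String, Map<String, Long>> group : counters.entrySet()) {
            if (group.getValue() == null || !group.getKey().startsWith(keyPrefix)) {
                continue;
            }
            Long value = group.getValue().get(statType); // may be absent: never unbox blindly
            if (value != null) {
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> counters = new HashMap<>();
        Map<String, Long> part1 = new HashMap<>();
        part1.put("numRows", 10L);
        Map<String, Long> part2 = new HashMap<>();
        part2.put("numRows", 5L);
        counters.put("default.t/ds=1/", part1);
        counters.put("default.t/ds=2/", part2);
        counters.put("default.other/", new HashMap<>());
        System.out.println(aggregate(counters, "default.t/", "numRows")); // 15
        System.out.println(aggregate(null, "default.t/", "numRows"));     // null
    }
}
```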
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5936:
Attachment: HIVE-5936.11.patch.txt

analyze command failing to collect stats with counter mechanism
Key: HIVE-5936
URL: https://issues.apache.org/jira/browse/HIVE-5936
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.13.0
Reporter: Ashutosh Chauhan
Assignee: Navis
Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.11.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt

With the counter mechanism, the MR job succeeds, but StatsTask on the client fails with an NPE.
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5936:
Status: Open (was: Patch Available)

analyze command failing to collect stats with counter mechanism
Key: HIVE-5936
URL: https://issues.apache.org/jira/browse/HIVE-5936
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.13.0
Reporter: Ashutosh Chauhan
Assignee: Navis
Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.11.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt

With the counter mechanism, the MR job succeeds, but StatsTask on the client fails with an NPE.
Re: Review Request 15993: analyze command failing to collect stats with counter mechanism
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15993/
-----------------------------------------------------------

(Updated Dec. 16, 2013, 1:27 a.m.)

Review request for hive and Ashutosh Chauhan.

Changes
-------
Fixed TestMTQueries failures (stats aggregation failed because it tried to open concurrent Derby connections).

Bugs: HIVE-5936
    https://issues.apache.org/jira/browse/HIVE-5936

Repository: hive-git

Description
-------
With the counter mechanism, the MR job succeeds, but StatsTask on the client fails with an NPE.

Diffs (updated)
-------
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java a9c3136
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestMTQueries.java 378de03
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java c16e82d
  itests/util/src/main/java/org/apache/hadoop/hive/ql/stats/DummyStatsAggregator.java 19f88ee
  itests/util/src/main/java/org/apache/hadoop/hive/ql/stats/KeyVerifyingStatsAggregator.java 8fa5c3e
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 3deed45
  metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java c43145b
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java cbc3cd2
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 7c61c72
  ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java a2ecc80
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 46d88ce
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 19f7d79
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanMapper.java 7e701f4
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java cca8481
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java af729e6
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java fdc0d1a
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 17e6aad
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ace1df9
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticException.java e3dca57
  ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java 0dd0b03
  ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregator.java fa430eb
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java 661d648
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java 8ae32f0
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java fb5f50e
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 8c23b87
  ql/src/test/queries/clientpositive/stats_counter.q 20769e4
  ql/src/test/queries/clientpositive/stats_noscan_2.q a19d01b
  ql/src/test/results/clientnegative/stats_aggregator_error_1.q.out 9a6e38f
  ql/src/test/results/clientnegative/stats_aggregator_error_2.q.out 2ba99b1
  ql/src/test/results/clientnegative/stats_publisher_error_2.q.out 5284672
  ql/src/test/results/clientpositive/stats_aggregator_error_1.q.out 5735c4f
  ql/src/test/results/clientpositive/stats_counter.q.out f15d8c5
  ql/src/test/results/clientpositive/stats_noscan_1.q.out 5aa6607
  ql/src/test/results/clientpositive/stats_noscan_2.q.out e55fa94
  ql/src/test/results/clientpositive/stats_publisher_error_1.q.out a122b83
  ql/src/test/results/clientpositive/truncate_column.q.out a247c4a

Diff: https://reviews.apache.org/r/15993/diff/

Testing
-------

Thanks,

Navis Ryu
[jira] [Updated] (HIVE-5276) Skip redundant string encoding/decoding for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5276:
Status: Open (was: Patch Available)

Skip redundant string encoding/decoding for hiveserver2
Key: HIVE-5276
URL: https://issues.apache.org/jira/browse/HIVE-5276
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-5276.3.patch.txt, HIVE-5276.4.patch.txt, HIVE-5276.5.patch.txt, HIVE-5276.6.patch.txt, HIVE-5276.7.patch.txt, HIVE-5276.8.patch.txt

Currently, HiveServer2 acquires rows as strings (the format used for CLI output), converts them back into rows, and finally converts those rows into the final format. This is inefficient and memory-consuming.
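The inefficiency described here is a double conversion. As a hedged illustration (the `RowPath` class is hypothetical; this is not HiveServer2 code), the Java sketch below contrasts formatting a typed row as a tab-delimited, CLI-style line and parsing it back by column type with handing the typed values straight to the next stage. It deliberately ignores NULLs and values that contain tabs; the point is the extra string building, splitting, and parsing per row that the direct path avoids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: string round trip vs. passing typed row values through directly.
class RowPath {

    // Old-style path: typed row -> tab-delimited CLI-style line -> split -> re-parse by type.
    static List<Object> viaString(Object[] row, String[] types) {
        StringBuilder line = new StringBuilder();          // extra copy 1: the formatted line
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                line.append('\t');
            }
            line.append(row[i]);
        }
        String[] fields = line.toString().split("\t", -1); // extra copy 2: one String per field
        List<Object> decoded = new ArrayList<>(fields.length);
        for (int i = 0; i < fields.length; i++) {
            switch (types[i]) {                             // extra work: parse text back into values
                case "int":
                    decoded.add(Integer.valueOf(fields[i]));
                    break;
                case "double":
                    decoded.add(Double.valueOf(fields[i]));
                    break;
                default:
                    decoded.add(fields[i]);
            }
        }
        return decoded;
    }

    // Direct path: the typed values go straight to the final serializer.
    static List<Object> direct(Object[] row) {
        return new ArrayList<>(Arrays.asList(row));
    }
}
```

Both paths produce the same values for simple rows such as `{42, 2.5, "x"}`; only the first allocates and parses intermediate strings.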
[jira] [Updated] (HIVE-5276) Skip redundant string encoding/decoding for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5276:
Attachment: HIVE-5276.8.patch.txt

Skip redundant string encoding/decoding for hiveserver2
Key: HIVE-5276
URL: https://issues.apache.org/jira/browse/HIVE-5276
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-5276.3.patch.txt, HIVE-5276.4.patch.txt, HIVE-5276.5.patch.txt, HIVE-5276.6.patch.txt, HIVE-5276.7.patch.txt, HIVE-5276.8.patch.txt

Currently, HiveServer2 acquires rows as strings (the format used for CLI output), converts them back into rows, and finally converts those rows into the final format. This is inefficient and memory-consuming.
[jira] [Updated] (HIVE-5276) Skip redundant string encoding/decoding for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5276:
Attachment: (was: HIVE-5276.8.patch.txt)

Skip redundant string encoding/decoding for hiveserver2
Key: HIVE-5276
URL: https://issues.apache.org/jira/browse/HIVE-5276
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-5276.3.patch.txt, HIVE-5276.4.patch.txt, HIVE-5276.5.patch.txt, HIVE-5276.6.patch.txt, HIVE-5276.7.patch.txt

Currently, HiveServer2 acquires rows as strings (the format used for CLI output), converts them back into rows, and finally converts those rows into the final format. This is inefficient and memory-consuming.
[jira] [Commented] (HIVE-5276) Skip redundant string encoding/decoding for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848742#comment-13848742 ]

Navis commented on HIVE-5276:

I forgot that I had already rebased this. HIVE-5276.7.patch.txt is the final patch. I'll commit it shortly.

Skip redundant string encoding/decoding for hiveserver2
Key: HIVE-5276
URL: https://issues.apache.org/jira/browse/HIVE-5276
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-5276.3.patch.txt, HIVE-5276.4.patch.txt, HIVE-5276.5.patch.txt, HIVE-5276.6.patch.txt, HIVE-5276.7.patch.txt

Currently, HiveServer2 acquires rows as strings (the format used for CLI output), converts them back into rows, and finally converts those rows into the final format. This is inefficient and memory-consuming.
[jira] [Updated] (HIVE-5276) Skip redundant string encoding/decoding for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5276:
Status: Patch Available (was: In Progress)

Skip redundant string encoding/decoding for hiveserver2
Key: HIVE-5276
URL: https://issues.apache.org/jira/browse/HIVE-5276
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-5276.3.patch.txt, HIVE-5276.4.patch.txt, HIVE-5276.5.patch.txt, HIVE-5276.6.patch.txt, HIVE-5276.7.patch.txt

Currently, HiveServer2 acquires rows as strings (the format used for CLI output), converts them back into rows, and finally converts those rows into the final format. This is inefficient and memory-consuming.
[jira] [Work started] (HIVE-5276) Skip redundant string encoding/decoding for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-5276 started by Navis.

Skip redundant string encoding/decoding for hiveserver2
Key: HIVE-5276
URL: https://issues.apache.org/jira/browse/HIVE-5276
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-5276.3.patch.txt, HIVE-5276.4.patch.txt, HIVE-5276.5.patch.txt, HIVE-5276.6.patch.txt, HIVE-5276.7.patch.txt

Currently, HiveServer2 acquires rows as strings (the format used for CLI output), converts them back into rows, and finally converts those rows into the final format. This is inefficient and memory-consuming.
[jira] [Updated] (HIVE-5276) Skip redundant string encoding/decoding for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5276:
Resolution: Fixed
Fix Version/s: 0.13.0
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks, Carl, for the review.

Skip redundant string encoding/decoding for hiveserver2
Key: HIVE-5276
URL: https://issues.apache.org/jira/browse/HIVE-5276
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
Fix For: 0.13.0
Attachments: HIVE-5276.3.patch.txt, HIVE-5276.4.patch.txt, HIVE-5276.5.patch.txt, HIVE-5276.6.patch.txt, HIVE-5276.7.patch.txt

Currently, HiveServer2 acquires rows as strings (the format used for CLI output), converts them back into rows, and finally converts those rows into the final format. This is inefficient and memory-consuming.
[jira] [Updated] (HIVE-4256) JDBC2 HiveConnection does not use the specified database
[ https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anandha L Ranganathan updated HIVE-4256:
Attachment: HIVE-4256.5.patch

JDBC2 HiveConnection does not use the specified database
Key: HIVE-4256
URL: https://issues.apache.org/jira/browse/HIVE-4256
Project: Hive
Issue Type: Bug
Components: HiveServer2, JDBC
Affects Versions: 0.11.0
Reporter: Chris Drome
Assignee: Anandha L Ranganathan
Attachments: HIVE-4256.1.patch, HIVE-4256.2.patch, HIVE-4256.3.patch, HIVE-4256.4.patch, HIVE-4256.5.patch, HIVE-4256.patch

HiveConnection ignores the database specified in the connection string when configuring the connection.
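Honoring the database in the connection string starts with extracting it. The Java sketch below assumes the HiveServer2 URL layout `jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>` and falls back to `default` when no database is given. `HiveUrlDb` and `databaseFromUrl` are hypothetical names, not HiveConnection's actual parsing code. Once the name is known, a connection could, for example, switch to it with `USE <dbName>` before returning; whether the attached patches do it that way is not stated in this message.

```java
// Hypothetical sketch: extract the database name from a HiveServer2 JDBC URL.
class HiveUrlDb {

    private static final String PREFIX = "jdbc:hive2://";

    // Returns the database segment of the URL, or "default" when none is given.
    static String databaseFromUrl(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a HiveServer2 URL: " + url);
        }
        String rest = url.substring(PREFIX.length());
        // Session settings (;), Hive conf (?) and Hive vars (#) follow the database name.
        int end = rest.length();
        for (char c : new char[] {';', '?', '#'}) {
            int idx = rest.indexOf(c);
            if (idx >= 0 && idx < end) {
                end = idx;
            }
        }
        String hostAndPath = rest.substring(0, end);
        int slash = hostAndPath.indexOf('/');
        if (slash < 0 || slash == hostAndPath.length() - 1) {
            return "default"; // no path segment, or an empty one
        }
        return hostAndPath.substring(slash + 1);
    }

    public static void main(String[] args) {
        System.out.println(databaseFromUrl("jdbc:hive2://localhost:10000/mydb;user=foo")); // mydb
        System.out.println(databaseFromUrl("jdbc:hive2://localhost:10000"));              // default
    }
}
```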
[jira] [Commented] (HIVE-4256) JDBC2 HiveConnection does not use the specified database
[ https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848754#comment-13848754 ]

Anandha L Ranganathan commented on HIVE-4256:

[~prasad mu], [~thejas]: I modified the code and moved the test case to TestJdbcMiniHS2.java.

JDBC2 HiveConnection does not use the specified database
Key: HIVE-4256
URL: https://issues.apache.org/jira/browse/HIVE-4256
Project: Hive
Issue Type: Bug
Components: HiveServer2, JDBC
Affects Versions: 0.11.0
Reporter: Chris Drome
Assignee: Anandha L Ranganathan
Attachments: HIVE-4256.1.patch, HIVE-4256.2.patch, HIVE-4256.3.patch, HIVE-4256.4.patch, HIVE-4256.5.patch, HIVE-4256.patch

HiveConnection ignores the database specified in the connection string when configuring the connection.
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3746:
Attachment: HIVE-3746.2.patch.txt

Fix HS2 ResultSet Serialization Performance Regression
Key: HIVE-3746
URL: https://issues.apache.org/jira/browse/HIVE-3746
Project: Hive
Issue Type: Sub-task
Components: HiveServer2, Server Infrastructure
Reporter: Carl Steinbach
Assignee: Navis
Labels: HiveServer2
Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt
Re: Review Request 16076: Fix HS2 ResultSet Serialization Performance Regression
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16076/
-----------------------------------------------------------

(Updated Dec. 16, 2013, 2:04 a.m.)

Review request for hive.

Changes
-------
Rebased to trunk.

Bugs: HIVE-3746
    https://issues.apache.org/jira/browse/HIVE-3746

Repository: hive-git

Description
-------
serialize result set in columnar format

Diffs (updated)
-------
  itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2.java eb08628
  jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java b02f374
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 061337d
  service/if/TCLIService.thrift 62a9730
  service/src/gen/thrift/gen-cpp/TCLIService_types.h 853bb4c
  service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 7ab1310
  service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryValue.java PRE-CREATION
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolValue.java c7495ee
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteValue.java 23d9693
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TColumn.java 497cc01
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TColumnValue.java 44da2cd
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleValue.java d215736
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TExecuteStatementReq.java ea656ac
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java 1cb5147
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Value.java bb5ae96
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Value.java 059408b
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Value.java 9a941cc
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TOpenSessionReq.java 8ab8297
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TOpenSessionResp.java 688f790
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java 0b6772c
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java db2262d
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 81c2f16
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringValue.java af7a109
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java 251f86a
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeQualifiers.java 393
  service/src/gen/thrift/gen-py/TCLIService/ttypes.py 185ea5b
  service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb c94acbf
  service/src/java/org/apache/hive/service/cli/ColumnValue.java cf2b3d9
  service/src/java/org/apache/hive/service/cli/Row.java 9e419e9
  service/src/java/org/apache/hive/service/cli/RowSet.java dce506d
  service/src/java/org/apache/hive/service/cli/TableSchema.java 155f529
  service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java 70cabe3
  service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 8d09d1c
  service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java e3b161a
  service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java f413116
  service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java d168d5e
  service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java c8cce08
  service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java a923199
  service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java e5bfd92
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 296f8b3

Diff: https://reviews.apache.org/r/16076/diff/

Testing
-------
Thanks, Navis Ryu
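The review describes the change only as "serialize result set in columnar format". As a rough illustration of what that means, here is a minimal Java sketch. It assumes a result batch is held as a list of `Object[]` rows. The class `ColumnarSketch` and its methods are hypothetical. They are not the Thrift-generated `TRowSet`/`TColumn` classes listed in the diff, and they are not the patch's actual code. The sketch pivots rows into per-column lists and packs an INT column into a primitive array plus a null mask. Avoiding a separate value object for every cell is one plausible reason a columnar layout can be cheaper to serialize.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of row-to-column pivoting for a result batch.
// Not the HIVE-3746 patch and not the Thrift-generated TRowSet/TColumn API.
final class ColumnarSketch {
    private ColumnarSketch() {}

    /** Pivots equal-width rows into one list of values per column. */
    static List<List<Object>> toColumns(List<Object[]> rows, int numColumns) {
        List<List<Object>> columns = new ArrayList<>(numColumns);
        for (int c = 0; c < numColumns; c++) {
            columns.add(new ArrayList<>(rows.size()));
        }
        for (Object[] row : rows) {
            if (row.length != numColumns) {
                throw new IllegalArgumentException(
                        "row width " + row.length + " != " + numColumns);
            }
            for (int c = 0; c < numColumns; c++) {
                columns.get(c).add(row[c]);
            }
        }
        return columns;
    }

    /** Packs an INT column into a primitive array; null positions go into {@code nulls}. */
    static int[] packInts(List<Object> column, BitSet nulls) {
        int[] values = new int[column.size()];
        for (int i = 0; i < values.length; i++) {
            Object v = column.get(i);
            if (v == null) {
                nulls.set(i); // the slot keeps the placeholder value 0
            } else {
                values[i] = (Integer) v;
            }
        }
        return values;
    }

    /** Inverse pivot, e.g. for a client that exposes rows one at a time. */
    static List<Object[]> toRows(List<List<Object>> columns) {
        int numRows = columns.isEmpty() ? 0 : columns.get(0).size();
        List<Object[]> rows = new ArrayList<>(numRows);
        for (int r = 0; r < numRows; r++) {
            Object[] row = new Object[columns.size()];
            for (int c = 0; c < columns.size(); c++) {
                row[c] = columns.get(c).get(r);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

A JDBC client receiving a columnar batch would still pivot back, as `toRows` does, because `java.sql.ResultSet` is consumed one row at a time.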
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3746: Status: Open (was: Patch Available) Fix HS2 ResultSet Serialization Performance Regression -- Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: HiveServer2, Server Infrastructure Reporter: Carl Steinbach Assignee: Navis Labels: HiveServer2 Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3746: Status: Patch Available (was: Open) Rebased to trunk Fix HS2 ResultSet Serialization Performance Regression -- Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: HiveServer2, Server Infrastructure Reporter: Carl Steinbach Assignee: Navis Labels: HiveServer2 Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt
[jira] [Commented] (HIVE-5879) Fix spelling errors in hive-default.xml
[ https://issues.apache.org/jira/browse/HIVE-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848758#comment-13848758 ] Navis commented on HIVE-5879: - +1 Besides that, I've made a patch that pulls the descriptions and default values from hive-default.xml.template into HiveConf.ConfVars (and found that many configurations are missing descriptions). I think generating hive-default.xml.template from HiveConf.ConfVars might be a better option (which means including large blocks of text in HiveConf). Any opinions? Fix spelling errors in hive-default.xml --- Key: HIVE-5879 URL: https://issues.apache.org/jira/browse/HIVE-5879 Project: Hive Issue Type: Improvement Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Lefty Leverenz Priority: Trivial Labels: documentation Fix For: 0.13.0 Attachments: HIVE-5879.2.patch.txt, HIVE-5879.patch See https://issues.apache.org/jira/browse/HIVE-5400?focusedCommentId=13830626page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13830626
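Navis's suggestion is to make HiveConf.ConfVars the single source of truth and generate hive-default.xml.template from it. Below is a minimal Java sketch of that idea. It assumes each enum constant carries a name, a default value and a description. `ConfVarSketch` is a hypothetical stand-in, not the real `HiveConf.ConfVars`, and only two illustrative properties are included.

```java
// Hypothetical sketch: a ConfVars-like enum that is the single source of
// names, defaults and descriptions, from which the XML template is generated.
// This is NOT the real org.apache.hadoop.hive.conf.HiveConf.ConfVars.
enum ConfVarSketch {
    EXEC_PARALLEL("hive.exec.parallel", "false",
            "Whether to execute jobs in parallel"),
    EXEC_PARALLEL_THREAD_NUMBER("hive.exec.parallel.thread.number", "8",
            "How many jobs at most can be executed in parallel");

    final String varname;
    final String defaultVal;
    final String description;

    ConfVarSketch(String varname, String defaultVal, String description) {
        this.varname = varname;
        this.defaultVal = defaultVal;
        this.description = description;
    }

    /** Minimal XML escaping for element text ('&' must be replaced first). */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders every constant as a property element of a Hadoop-style configuration file. */
    static String toTemplate() {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (ConfVarSketch v : values()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escape(v.varname)).append("</name>\n")
              .append("    <value>").append(escape(v.defaultVal)).append("</value>\n")
              .append("    <description>").append(escape(v.description))
              .append("</description>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }
}
```

With this layout, a missing description shows up in one place (the enum) instead of drifting between two files.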
[jira] [Commented] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848768#comment-13848768 ] Hive QA commented on HIVE-5936: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618835/HIVE-5936.11.patch.txt {color:green}SUCCESS:{color} +1 4785 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/643/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/643/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618835 analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.11.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With the counter mechanism, the MR job succeeds, but StatsTask on the client fails with an NPE.
[jira] [Commented] (HIVE-4256) JDBC2 HiveConnection does not use the specified database
[ https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848770#comment-13848770 ] Hive QA commented on HIVE-4256: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618842/HIVE-4256.5.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/644/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/644/console Messages: {noformat} This message was trimmed, see log for full details [INFO] Excluding org.apache.httpcomponents:httpclient:jar:4.2.5 from the shaded jar. [INFO] Excluding org.apache.httpcomponents:httpcore:jar:4.2.4 from the shaded jar. [INFO] Excluding org.apache.zookeeper:zookeeper:jar:3.4.5 from the shaded jar. [INFO] Excluding jline:jline:jar:0.9.94 from the shaded jar. [INFO] Excluding org.codehaus.groovy:groovy-all:jar:2.1.6 from the shaded jar. [INFO] Including org.codehaus.jackson:jackson-core-asl:jar:1.9.2 in the shaded jar. [INFO] Including org.codehaus.jackson:jackson-mapper-asl:jar:1.9.2 in the shaded jar. [INFO] Excluding org.datanucleus:datanucleus-core:jar:3.2.2 from the shaded jar. [INFO] Including com.google.guava:guava:jar:11.0.2 in the shaded jar. [INFO] Excluding com.google.code.findbugs:jsr305:jar:1.3.9 from the shaded jar. [INFO] Including com.google.protobuf:protobuf-java:jar:2.5.0 in the shaded jar. [INFO] Including com.googlecode.javaewah:JavaEWAH:jar:0.3.2 in the shaded jar. [INFO] Including org.iq80.snappy:snappy:jar:0.2 in the shaded jar. [INFO] Including org.json:json:jar:20090211 in the shaded jar. [INFO] Excluding stax:stax-api:jar:1.0.1 from the shaded jar. [INFO] Excluding org.apache.hadoop:hadoop-core:jar:1.2.1 from the shaded jar. [INFO] Excluding xmlenc:xmlenc:jar:0.52 from the shaded jar. [INFO] Excluding com.sun.jersey:jersey-core:jar:1.14 from the shaded jar. 
[INFO] Excluding com.sun.jersey:jersey-json:jar:1.14 from the shaded jar. [INFO] Excluding org.codehaus.jettison:jettison:jar:1.1 from the shaded jar. [INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.3-1 from the shaded jar. [INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.2 from the shaded jar. [INFO] Excluding javax.xml.stream:stax-api:jar:1.0-2 from the shaded jar. [INFO] Excluding javax.activation:activation:jar:1.1 from the shaded jar. [INFO] Excluding org.codehaus.jackson:jackson-jaxrs:jar:1.9.2 from the shaded jar. [INFO] Excluding org.codehaus.jackson:jackson-xc:jar:1.9.2 from the shaded jar. [INFO] Excluding com.sun.jersey:jersey-server:jar:1.14 from the shaded jar. [INFO] Excluding asm:asm:jar:3.1 from the shaded jar. [INFO] Excluding org.apache.commons:commons-math:jar:2.1 from the shaded jar. [INFO] Excluding commons-configuration:commons-configuration:jar:1.6 from the shaded jar. [INFO] Excluding commons-digester:commons-digester:jar:1.8 from the shaded jar. [INFO] Excluding commons-beanutils:commons-beanutils:jar:1.7.0 from the shaded jar. [INFO] Excluding commons-beanutils:commons-beanutils-core:jar:1.8.0 from the shaded jar. [INFO] Excluding commons-net:commons-net:jar:1.4.1 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jetty:jar:6.1.26 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jetty-util:jar:6.1.26 from the shaded jar. [INFO] Excluding tomcat:jasper-runtime:jar:5.5.12 from the shaded jar. [INFO] Excluding tomcat:jasper-compiler:jar:5.5.12 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jsp-api-2.1:jar:6.1.14 from the shaded jar. [INFO] Excluding org.mortbay.jetty:servlet-api-2.5:jar:6.1.14 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jsp-2.1:jar:6.1.14 from the shaded jar. [INFO] Excluding ant:ant:jar:1.6.5 from the shaded jar. [INFO] Excluding commons-el:commons-el:jar:1.0 from the shaded jar. [INFO] Excluding net.java.dev.jets3t:jets3t:jar:0.6.1 from the shaded jar. 
[INFO] Excluding hsqldb:hsqldb:jar:1.8.0.10 from the shaded jar. [INFO] Excluding oro:oro:jar:2.0.8 from the shaded jar. [INFO] Excluding org.eclipse.jdt:core:jar:3.1.1 from the shaded jar. [INFO] Excluding org.slf4j:slf4j-api:jar:1.7.5 from the shaded jar. [INFO] Excluding org.slf4j:slf4j-log4j12:jar:1.7.5 from the shaded jar. [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT.jar with /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-exec --- [INFO] Installing
[jira] [Commented] (HIVE-4256) JDBC2 HiveConnection does not use the specified database
[ https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848789#comment-13848789 ] Anandha L Ranganathan commented on HIVE-4256: - Uploaded the wrong patch. JDBC2 HiveConnection does not use the specified database Key: HIVE-4256 URL: https://issues.apache.org/jira/browse/HIVE-4256 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Chris Drome Assignee: Anandha L Ranganathan Attachments: HIVE-4256.1.patch, HIVE-4256.2.patch, HIVE-4256.3.patch, HIVE-4256.4.patch, HIVE-4256.5.patch, HIVE-4256.6.patch, HIVE-4256.patch HiveConnection ignores the database specified in the connection string when configuring the connection.
[jira] [Updated] (HIVE-4256) JDBC2 HiveConnection does not use the specified database
[ https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anandha L Ranganathan updated HIVE-4256: Attachment: HIVE-4256.6.patch JDBC2 HiveConnection does not use the specified database Key: HIVE-4256 URL: https://issues.apache.org/jira/browse/HIVE-4256 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Chris Drome Assignee: Anandha L Ranganathan Attachments: HIVE-4256.1.patch, HIVE-4256.2.patch, HIVE-4256.3.patch, HIVE-4256.4.patch, HIVE-4256.5.patch, HIVE-4256.6.patch, HIVE-4256.patch HiveConnection ignores the database specified in the connection string when configuring the connection.
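One way to picture a fix for HIVE-4256 is to pull the database name out of the connection URL so the driver can switch to it, for example with a `use <db>` statement, after opening the session. The Java sketch below assumes the HiveServer2 URL shape `jdbc:hive2://host:port/db;sessionVars?hiveConfs#hiveVars`. It falls back to `default` when no database is given. `HiveUrlDbSketch` is hypothetical and is not the code in any of the attached patches.

```java
// Hypothetical sketch, not the HiveConnection code from the HIVE-4256 patches.
// Assumes URLs of the form jdbc:hive2://host:port/db;sessionVars?hiveConfs#hiveVars
final class HiveUrlDbSketch {
    private static final String PREFIX = "jdbc:hive2://";

    private HiveUrlDbSketch() {}

    /** Returns the database in the URL path, or "default" when none is given. */
    static String databaseOf(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a jdbc:hive2 URL: " + url);
        }
        String rest = url.substring(PREFIX.length());
        // Drop the session-variable (;), Hive-conf (?) and Hive-variable (#) sections.
        int end = rest.length();
        for (char sep : new char[] {';', '?', '#'}) {
            int i = rest.indexOf(sep);
            if (i >= 0 && i < end) {
                end = i;
            }
        }
        rest = rest.substring(0, end);
        int slash = rest.indexOf('/');
        String db = slash < 0 ? "" : rest.substring(slash + 1);
        return db.isEmpty() ? "default" : db;
    }

    /** Statement a driver could run right after the session opens. */
    static String useStatement(String url) {
        return "use " + databaseOf(url);
    }
}
```

A production driver would also need to validate or quote the name before placing it in a statement.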
[jira] [Commented] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848792#comment-13848792 ] Andrey Klochkov commented on HIVE-4216: --- I believe I'm hitting the same issue in HDP 2.0, which is based on a heavily patched Hive 0.12. The stack trace is the same, and I have profiled it down to {{Hadoop23Shims.newTaskAttemptContext()}} creating a {{TaskAttemptID}} instance with an empty type field. I can't verify whether this fixes the issue, because HDP sources are not available, and from the Apache sources it's not possible to build a Hive 0.12 package that works with HBase 0.96 (due to changes in the HBase module structure). I suppose the fix should be replacing {{new TaskAttemptID()}} with something like {{TaskAttemptID.forName(conf.get(MRJobConfig.TASK_ATTEMPT_ID))}} in {{Hadoop23Shims.newTaskAttemptContext()}}. TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely Key: HIVE-4216 URL: https://issues.apache.org/jira/browse/HIVE-4216 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.9.0 Reporter: Viraj Bhat After upgrading to Hadoop 23 and HBase 0.94.5 compiled for Hadoop 23, TestHBaseMinimrCliDriver fails after performing the following steps: Update hbase_bulk.m with the following properties set mapreduce.totalorderpartitioner.naturalorder=false; set mapreduce.totalorderpartitioner.path=/tmp/hbpartition.lst; Otherwise I keep seeing a "_partition.lst not found" exception in the mappers, even though set total.order.partitioner.path=/tmp/hbpartition.lst is set. When the test runs, the 3 reducer phase of the second query fails with the following error, but the MiniMRCluster keeps spinning up new reducers and the test hangs indefinitely.
{code} insert overwrite table hbsort select distinct value, case when key=103 then cast(null as string) else key end, case when key=103 then '' else cast(key+1 as string) end from src cluster by value; {code} The stack trace I see in the syslog for the Node Manager is the following: == 13-03-20 16:26:48,942 FATAL [IPC Server handler 17 on 55996] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363821864968_0003_r_02_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) ... 
7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:525) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.mapreduce.TaskID$CharTaskTypeMaps.getRepresentingCharacter(TaskID.java:265) at org.apache.hadoop.mapreduce.TaskID.appendTo(TaskID.java:153) at org.apache.hadoop.mapreduce.TaskAttemptID.appendTo(TaskAttemptID.java:119) at
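The innermost NPE comes from `TaskID$CharTaskTypeMaps.getRepresentingCharacter`. That is consistent with Andrey's diagnosis that the attempt ID was created without a task type. The simplified Java model below shows the mechanism under that assumption. `AttemptIdSketch` is hypothetical and is not Hadoop's `TaskAttemptID`. In the model, formatting an ID built with a null type fails: the type-to-character lookup returns `null`, and unboxing it to `char` throws. An ID parsed from a full attempt string carries its type, which is the idea behind the suggested `TaskAttemptID.forName(...)` call. Only the `m` and `r` task types are modeled.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of the failure mode; these are NOT Hadoop's
// org.apache.hadoop.mapreduce.TaskAttemptID / TaskID classes.
final class AttemptIdSketch {
    enum TaskType { MAP, REDUCE } // other task types omitted

    private static final Map<TaskType, Character> TYPE_CHARS =
            new EnumMap<>(TaskType.class);
    static {
        TYPE_CHARS.put(TaskType.MAP, 'm');
        TYPE_CHARS.put(TaskType.REDUCE, 'r');
    }

    final String jtIdentifier;
    final int jobId;
    final TaskType type;
    final int taskId;
    final int attempt;

    /** Analogue of a no-arg constructor: the task type is left null. */
    AttemptIdSketch() {
        this("", 0, null, 0, 0);
    }

    AttemptIdSketch(String jtIdentifier, int jobId, TaskType type, int taskId, int attempt) {
        this.jtIdentifier = jtIdentifier;
        this.jobId = jobId;
        this.type = type;
        this.taskId = taskId;
        this.attempt = attempt;
    }

    /** Parses "attempt_<jt>_<job>_<m|r>_<task>_<attempt>". */
    static AttemptIdSketch forName(String s) {
        String[] parts = s.split("_");
        if (parts.length != 6 || !"attempt".equals(parts[0])) {
            throw new IllegalArgumentException("bad attempt id: " + s);
        }
        TaskType type;
        switch (parts[3]) {
            case "m": type = TaskType.MAP; break;
            case "r": type = TaskType.REDUCE; break;
            default: throw new IllegalArgumentException("unknown task type: " + parts[3]);
        }
        return new AttemptIdSketch(parts[1], Integer.parseInt(parts[2]), type,
                Integer.parseInt(parts[4]), Integer.parseInt(parts[5]));
    }

    @Override
    public String toString() {
        // EnumMap.get(null) returns null; unboxing that into a char throws the NPE.
        char typeChar = TYPE_CHARS.get(type);
        return String.format("attempt_%s_%04d_%c_%06d_%d",
                jtIdentifier, jobId, typeChar, taskId, attempt);
    }
}
```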
[jira] [Commented] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848797#comment-13848797 ] Hive QA commented on HIVE-3746: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618843/HIVE-3746.2.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4785 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/645/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/645/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12618843 Fix HS2 ResultSet Serialization Performance Regression -- Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: HiveServer2, Server Infrastructure Reporter: Carl Steinbach Assignee: Navis Labels: HiveServer2 Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt
Hive-trunk-h0.21 - Build # 2502 - Still Failing
Changes for Build #2471 [rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the absence of any column statistics (Prasanth Jayachandran via Harish Butani) [hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE4002 (Navis via Ashutosh Chauhan) Changes for Build #2472 [navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.) [navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu Zhang via Navis) [navis] HIVE-4518 : Missing file (HiveFatalException) [navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and Jason Dere via Navis) Changes for Build #2473 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for 
Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #2487 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) Changes for Build #2488 [hashutosh] HIVE-5935 : hive.query.string is not 
provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) Changes for Build #2489 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3746: Attachment: HIVE-3746.3.patch.txt Fix HS2 ResultSet Serialization Performance Regression -- Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: HiveServer2, Server Infrastructure Reporter: Carl Steinbach Assignee: Navis Labels: HiveServer2 Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt, HIVE-3746.3.patch.txt
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3746: Release Note: Merged HIVE-5269 to exploit binary type in thrift Status: Patch Available (was: Open) Fix HS2 ResultSet Serialization Performance Regression -- Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: HiveServer2, Server Infrastructure Reporter: Carl Steinbach Assignee: Navis Labels: HiveServer2 Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt, HIVE-3746.3.patch.txt
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3746: Status: Open (was: Patch Available) Fix HS2 ResultSet Serialization Performance Regression -- Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: HiveServer2, Server Infrastructure Reporter: Carl Steinbach Assignee: Navis Labels: HiveServer2 Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt, HIVE-3746.3.patch.txt
[jira] [Commented] (HIVE-6036) A test case for embedded beeline - with URL jdbc:hive2:///default
[ https://issues.apache.org/jira/browse/HIVE-6036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848818#comment-13848818 ] Anandha L Ranganathan commented on HIVE-6036: - [~prasadm] Attached a test case for embedded beeline. A test case for embedded beeline - with URL jdbc:hive2:///default --- Key: HIVE-6036 URL: https://issues.apache.org/jira/browse/HIVE-6036 Project: Hive Issue Type: Bug Reporter: Anandha L Ranganathan Assignee: Anandha L Ranganathan Attachments: HIVE-6036.patch A test case for embedded beeline would have been helpful, i.e., with the URL jdbc:hive2:///default. This URL causes beeline (via the JDBC driver) to invoke embedded Hive.
[jira] [Updated] (HIVE-6036) A test case for embedded beeline - with URL jdbc:hive2:///default
[ https://issues.apache.org/jira/browse/HIVE-6036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anandha L Ranganathan updated HIVE-6036: Status: Patch Available (was: Open) A test case for embedded beeline - with URL jdbc:hive2:///default --- Key: HIVE-6036 URL: https://issues.apache.org/jira/browse/HIVE-6036 Project: Hive Issue Type: Bug Reporter: Anandha L Ranganathan Assignee: Anandha L Ranganathan Attachments: HIVE-6036.patch A test case for embedded beeline would have been helpful, i.e., with the URL jdbc:hive2:///default. This URL causes beeline (via the JDBC driver) to invoke embedded Hive.
[jira] [Updated] (HIVE-6036) A test case for embedded beeline - with URL jdbc:hive2:///default
[ https://issues.apache.org/jira/browse/HIVE-6036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anandha L Ranganathan updated HIVE-6036: Attachment: HIVE-6036.patch A test case for embedded beeline - with URL jdbc:hive2:///default --- Key: HIVE-6036 URL: https://issues.apache.org/jira/browse/HIVE-6036 Project: Hive Issue Type: Bug Reporter: Anandha L Ranganathan Assignee: Anandha L Ranganathan Attachments: HIVE-6036.patch A test case for embedded beeline would have been helpful, i.e., with the URL jdbc:hive2:///default. This URL causes beeline (via the JDBC driver) to invoke embedded Hive.
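The embedded case depends on the URL having an empty host:port section, as in `jdbc:hive2:///default`. Below is a minimal Java sketch of that check. It assumes only the `jdbc:hive2://` prefix convention. `EmbeddedUrlSketch` is hypothetical and is not the Hive JDBC driver's actual parsing code.

```java
// Hypothetical sketch, not the Hive JDBC driver's parsing code: treats a
// jdbc:hive2:// URL with an empty host:port section as an embedded connection.
final class EmbeddedUrlSketch {
    private static final String PREFIX = "jdbc:hive2://";

    private EmbeddedUrlSketch() {}

    static boolean isEmbedded(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a jdbc:hive2 URL: " + url);
        }
        String rest = url.substring(PREFIX.length());
        // The host:port section ends at the first '/', ';', '?' or '#'.
        int end = rest.length();
        for (char sep : new char[] {'/', ';', '?', '#'}) {
            int i = rest.indexOf(sep);
            if (i >= 0 && i < end) {
                end = i;
            }
        }
        return rest.substring(0, end).isEmpty();
    }
}
```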
Hive-trunk-hadoop2 - Build # 601 - Still Failing
Changes for Build #570 [rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the absence of any column statistics (Prasanth Jayachandran via Harish Butani) [hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE4002 (Navis via Ashutosh Chauhan) Changes for Build #571 [navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.) [navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu Zhang via Navis) [navis] HIVE-4518 : Missing file (HiveFatalException) [navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and Jason Dere via Navis) Changes for Build #572 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #573 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #574 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #575 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #576 Changes for Build #577 Changes for Build #578 
Changes for Build #579 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #580 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string arguments (Teddy Choi via Eric Hanson) Changes for Build #581 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #582 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #583 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #584 [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #585 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) Changes for Build #586 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) 
[navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) Changes for Build #587 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey
[jira] [Commented] (HIVE-4256) JDBC2 HiveConnection does not use the specified database
[ https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848819#comment-13848819 ] Hive QA commented on HIVE-4256: --- Overall: +1 all checks pass. Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618849/HIVE-4256.6.patch SUCCESS: +1 4786 tests passed. Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/646/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/646/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase This message is automatically generated. ATTACHMENT ID: 12618849 JDBC2 HiveConnection does not use the specified database Key: HIVE-4256 URL: https://issues.apache.org/jira/browse/HIVE-4256 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Chris Drome Assignee: Anandha L Ranganathan Attachments: HIVE-4256.1.patch, HIVE-4256.2.patch, HIVE-4256.3.patch, HIVE-4256.4.patch, HIVE-4256.5.patch, HIVE-4256.6.patch, HIVE-4256.patch HiveConnection ignores the database specified in the connection string when configuring the connection. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
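The issue only states that the database in the connection string is ignored. As a rough illustration of what honouring it involves, here is a minimal Java sketch that pulls the database name out of a `jdbc:hive2://host:port/db` URL so a client could run `use <db>` right after opening the session. The class `HiveUrlSketch`, its fields, and `useStatement()` are hypothetical and are not the Hive JDBC driver's API; the sketch assumes session variables start at `;` and simply cuts the database name at `;`, `?` or `#`. An empty host, as in `jdbc:hive2:///default`, is reported as embedded mode.

```java
/**
 * Hypothetical sketch, not the Hive JDBC driver's implementation: splits a
 * HiveServer2 URL of the form jdbc:hive2://host:port/db;k=v into its parts.
 */
final class HiveUrlSketch {
    static final String PREFIX = "jdbc:hive2://";

    final String host;     // empty when the URL names no host (embedded mode)
    final int port;        // -1 when the URL names no port
    final String database; // "default" when the URL names no database

    private HiveUrlSketch(String host, int port, String database) {
        this.host = host;
        this.port = port;
        this.database = database;
    }

    /** Index of the first character of s that appears in chars, or s.length() if none. */
    private static int indexOfAny(String s, String chars) {
        for (int i = 0; i < s.length(); i++) {
            if (chars.indexOf(s.charAt(i)) >= 0) {
                return i;
            }
        }
        return s.length();
    }

    static HiveUrlSketch parse(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a hive2 URL: " + url);
        }
        String rest = url.substring(PREFIX.length());
        // The authority (host:port) ends at the first '/', ';', '?' or '#'.
        int authEnd = indexOfAny(rest, "/;?#");
        String authority = rest.substring(0, authEnd);

        String database = "default";
        if (authEnd < rest.length() && rest.charAt(authEnd) == '/') {
            String afterSlash = rest.substring(authEnd + 1);
            // The database name ends where session variables or conf begin.
            String db = afterSlash.substring(0, indexOfAny(afterSlash, ";?#"));
            if (!db.isEmpty()) {
                database = db;
            }
        }

        String host = authority;
        int port = -1;
        int colon = authority.lastIndexOf(':');
        if (colon >= 0) {
            host = authority.substring(0, colon);
            port = Integer.parseInt(authority.substring(colon + 1));
        }
        return new HiveUrlSketch(host, port, database);
    }

    boolean isEmbedded() {
        return host.isEmpty();
    }

    /** Statement a client could run right after opening the session. */
    String useStatement() {
        return "use " + database;
    }

    public static void main(String[] args) {
        HiveUrlSketch u = parse("jdbc:hive2://localhost:10000/test_db");
        System.out.println(u.host + " " + u.port + " " + u.useStatement());
    }
}
```

Issuing the `use` statement on the client side is only one option; the same database name could equally be sent to the server when the session is opened.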
[jira] [Resolved] (HIVE-4977) HS2: support an alternate resultset serialization format between client and server
[ https://issues.apache.org/jira/browse/HIVE-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-4977. -- Resolution: Duplicate Resolving this as a duplicate of HIVE-3746. HS2: support an alternate resultset serialization format between client and server -- Key: HIVE-4977 URL: https://issues.apache.org/jira/browse/HIVE-4977 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0, 0.11.0, 0.12.0 Reporter: Chris Drome Assignee: Chris Drome The current serialization protocol between client and server, as defined in cli_service.thrift, results in a 2x (or greater) throughput degradation compared to HS1. The initial proposal is to introduce the HS1 serialization protocol as a negotiable alternative.
[jira] [Updated] (HIVE-2818) Create table should check privilege of target database, not default database
[ https://issues.apache.org/jira/browse/HIVE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-2818: Summary: Create table should check privilege of target database, not default database (was: Create table checks the current database privilege) Create table should check privilege of target database, not default database Key: HIVE-2818 URL: https://issues.apache.org/jira/browse/HIVE-2818 Project: Hive Issue Type: Bug Components: Authorization, Security Affects Versions: 0.7.1 Reporter: Benyi Wang Assignee: Navis Attachments: HIVE-2818.1.patch.txt Hive seems to check the current database to determine the privileges of a statement, even when a fully qualified name such as 'database.table' is used:
{code}
hive> set hive.security.authorization.enabled=true;
hive> create database test_db;
hive> grant all on database test_db to user test_user;
hive> revoke all on database default from test_user;
hive> use default;
hive> create table test_db.new_table (id int);
Authorization failed:No privilege 'Create' found for outputs { database:default}. Use show grant to get more details.
hive> use test_db;
hive> create table test_db.new_table (id int);
{code}
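The new summary implies a simple rule: the database whose privileges are checked for CREATE TABLE should come from the qualified table name when one is given, and fall back to the current database only for an unqualified name. A minimal Java sketch of that rule follows; the class, the methods, and the map-of-grants model are hypothetical simplifications and do not reflect how Hive's authorization code is actually structured.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the behaviour the issue asks for; the names are
 * illustrative and do not come from Hive's authorization code.
 */
final class TargetDatabaseSketch {
    /** "db.table" names its own database; a bare "table" uses the current one. */
    static String targetDatabase(String tableName, String currentDatabase) {
        int dot = tableName.indexOf('.');
        return dot >= 0 ? tableName.substring(0, dot) : currentDatabase;
    }

    /** CREATE TABLE is allowed if the target database grants ALL or CREATE. */
    static boolean canCreateTable(Map<String, Set<String>> grantsByDatabase,
                                  String tableName, String currentDatabase) {
        Set<String> privileges = grantsByDatabase.getOrDefault(
                targetDatabase(tableName, currentDatabase), Collections.<String>emptySet());
        return privileges.contains("ALL") || privileges.contains("CREATE");
    }

    public static void main(String[] args) {
        // Mirrors the report: test_user holds ALL on test_db and nothing on default.
        Map<String, Set<String>> grants = new HashMap<>();
        grants.put("test_db", new HashSet<>(Arrays.asList("ALL")));
        // The session is in "default", but the table is qualified with test_db.
        System.out.println(canCreateTable(grants, "test_db.new_table", "default")); // true
        System.out.println(canCreateTable(grants, "new_table", "default"));         // false
    }
}
```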
[jira] [Resolved] (HIVE-5972) Hiveserver2 is much slower than hiveserver1
[ https://issues.apache.org/jira/browse/HIVE-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-5972. -- Resolution: Duplicate Resolving this as a duplicate of HIVE-3746. Hiveserver2 is much slower than hiveserver1 --- Key: HIVE-5972 URL: https://issues.apache.org/jira/browse/HIVE-5972 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0 Reporter: a bc Priority: Critical We are building an MS SQL cube through a linked server that connects to hiveserver with Cloudera's ODBC driver. There are two test results: 1. hiveserver1 running on 2 CPUs, 8G mem took about 8 hours; 2. hiveserver2 running on 4 CPUs, 16G mem took about 13 hours and 27 min (it never succeeded on a machine with 2 CPUs, 8G mem). In both cases, almost all CPUs are busy while building the cube. But I cannot understand why hiveserver2 is much slower than hiveserver1: according to the docs, HS2 supports concurrency, so shouldn't it be faster than HS1? Thanks. CDH4.3 on CentOS6.
[jira] [Updated] (HIVE-2818) Create table should check privilege of target database, not default database
[ https://issues.apache.org/jira/browse/HIVE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-2818: Attachment: HIVE-2818.2.patch.txt Create table should check privilege of target database, not default database Key: HIVE-2818 URL: https://issues.apache.org/jira/browse/HIVE-2818 Project: Hive Issue Type: Bug Components: Authorization, Security Affects Versions: 0.7.1 Reporter: Benyi Wang Assignee: Navis Attachments: HIVE-2818.1.patch.txt, HIVE-2818.2.patch.txt
[jira] [Updated] (HIVE-2818) Create table should check privilege of target database, not default database
[ https://issues.apache.org/jira/browse/HIVE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-2818: Status: Patch Available (was: Open) Initial rebased patch. Running tests.
Re: Review Request 16076: Fix HS2 ResultSet Serialization Performance Regression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16076/#review30434 --- service/if/TCLIService.thrift https://reviews.apache.org/r/16076/#comment58226 Need to add HIVE_CLI_SERVICE_PROTOCOL_V5 and update existing references to HIVE_CLI_SERVICE_PROTOCOL_V5. service/if/TCLIService.thrift https://reviews.apache.org/r/16076/#comment58229 Please add TBoolColumn, TByteColumn, etc., instead of redefining the existing T*Value structs. service/if/TCLIService.thrift https://reviews.apache.org/r/16076/#comment58228 These changes break compatibility with older HiveServer2 clients. Instead, I think we want to make it possible for a client to pick between the existing serialization format and the new column-oriented serialization format by setting a new optional field in TFetchResultsReq. service/if/TCLIService.thrift https://reviews.apache.org/r/16076/#comment58230 We aren't using TColumn right now, so it should be OK to redefine the contents of this struct. Also, it may be worth trying to save a bit of space by moving the binary nulls field outside of the union of the individual T*Column structs, e.g.:
  union TColumn {
    1: TBoolColumn boolColumn
    ...
    7: TStringColumn stringColumn
  }

  struct TNullableColumn {
    1: TColumn column
    2: binary nulls
  }
service/if/TCLIService.thrift https://reviews.apache.org/r/16076/#comment58233 Please add a new struct TColumnSet instead of modifying TRowSet. service/if/TCLIService.thrift https://reviews.apache.org/r/16076/#comment58232 Please add:
  4: optional TResultSetType tResultSetType
where TResultSetType is an enum that defaults to ROW_ORIENTED_RESULTSET. service/if/TCLIService.thrift https://reviews.apache.org/r/16076/#comment58231 Please add:
  4: optional TColumnSet columnResults
- Carl Steinbach On Dec. 16, 2013, 2:04 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16076/ --- (Updated Dec. 16, 2013, 2:04 a.m.) Review request for hive.
Bugs: HIVE-3746 https://issues.apache.org/jira/browse/HIVE-3746 Repository: hive-git Description --- serialize result set in columnar format Diffs -
  itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2.java eb08628
  jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java b02f374
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 061337d
  service/if/TCLIService.thrift 62a9730
  service/src/gen/thrift/gen-cpp/TCLIService_types.h 853bb4c
  service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 7ab1310
  service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java 1c44789
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryValue.java PRE-CREATION
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolValue.java c7495ee
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteValue.java 23d9693
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TColumn.java 497cc01
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TColumnValue.java 44da2cd
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleValue.java d215736
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TExecuteStatementReq.java ea656ac
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java 1cb5147
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Value.java bb5ae96
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Value.java 059408b
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Value.java 9a941cc
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TOpenSessionReq.java 8ab8297
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TOpenSessionResp.java 688f790
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java 0b6772c
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java db2262d
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java 81c2f16
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringValue.java af7a109
  service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java ff5e54d
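To make the column-oriented layout discussed in the review a little more concrete, here is a minimal Java sketch that converts one nullable INT column of a row-oriented result into a dense value array plus a packed `nulls` bitmap, loosely following the proposed `TNullableColumn { 1: TColumn column 2: binary nulls }`. The class name and the bit order (bit `i % 8` of byte `i / 8` marks row `i` as null) are assumptions for illustration only, not what the patch defines.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: one nullable INT column stored column-wise as a dense
 * value array plus a null bitmap, loosely following the proposed TNullableColumn.
 */
final class NullableIntColumnSketch {
    final int[] values; // one slot per row; 0 where the row is null
    final byte[] nulls; // bit (i % 8) of byte (i / 8) is set when row i is null

    private NullableIntColumnSketch(int[] values, byte[] nulls) {
        this.values = values;
        this.nulls = nulls;
    }

    /** Converts the row-oriented values of a single column. */
    static NullableIntColumnSketch fromRows(List<Integer> column) {
        int n = column.size();
        int[] values = new int[n];
        byte[] nulls = new byte[(n + 7) / 8];
        for (int i = 0; i < n; i++) {
            Integer v = column.get(i);
            if (v == null) {
                nulls[i / 8] |= (byte) (1 << (i % 8));
            } else {
                values[i] = v;
            }
        }
        return new NullableIntColumnSketch(values, nulls);
    }

    boolean isNull(int row) {
        return (nulls[row / 8] & (1 << (row % 8))) != 0;
    }

    Integer get(int row) {
        return isNull(row) ? null : values[row];
    }

    public static void main(String[] args) {
        NullableIntColumnSketch col = fromRows(Arrays.asList(1, null, 3));
        System.out.println(Arrays.toString(col.values) + " nulls=" + col.nulls[0]);
        // [1, 0, 3] nulls=2
    }
}
```

Compared with wrapping every cell in its own value struct, the packed bitmap spends one bit per row on null tracking, which is the kind of space saving the review comment is after.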
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-3746: - Status: Open (was: Patch Available) [~navis] Thanks for working on this! I added some comments on RB. My main concern with the current patch is that it breaks backward compatibility with older clients. Fix HS2 ResultSet Serialization Performance Regression -- Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: HiveServer2, Server Infrastructure Reporter: Carl Steinbach Assignee: Navis Labels: HiveServer2 Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt, HIVE-3746.3.patch.txt
[jira] [Commented] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848849#comment-13848849 ] George Chow commented on HIVE-3746: --- My team noticed the breaking change too, but the protocol version indicator at least makes it possible to detect it.
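One way to act on the protocol version indicator is to let the negotiated version decide which result-set layout the server sends, so older clients keep receiving rows. A minimal Java sketch, assuming the two sides settle on the lower of their versions and that the columnar layout is tied to a new protocol version (numbered 5 here only because the review mentions `HIVE_CLI_SERVICE_PROTOCOL_V5`). The enum, constant, and methods are hypothetical. The review also proposes an explicit optional `TResultSetType` field instead; this sketch shows only the version-based variant.

```java
/**
 * Hypothetical sketch: pick the result-set layout from the protocol version
 * both sides understand. Names and version numbers are illustrative only.
 */
final class ProtocolNegotiationSketch {
    enum ResultSetLayout { ROW_ORIENTED, COLUMN_ORIENTED }

    // Assumption: the first protocol version whose clients can read the
    // column-oriented layout.
    static final int FIRST_COLUMNAR_VERSION = 5;

    /** Each side only speaks versions up to its own, so use the smaller one. */
    static int negotiate(int clientVersion, int serverVersion) {
        return Math.min(clientVersion, serverVersion);
    }

    /** Older clients keep getting rows; only newer ones get columns. */
    static ResultSetLayout layoutFor(int negotiatedVersion) {
        return negotiatedVersion >= FIRST_COLUMNAR_VERSION
                ? ResultSetLayout.COLUMN_ORIENTED
                : ResultSetLayout.ROW_ORIENTED;
    }

    public static void main(String[] args) {
        System.out.println(layoutFor(negotiate(4, 5))); // ROW_ORIENTED (old client)
        System.out.println(layoutFor(negotiate(5, 5))); // COLUMN_ORIENTED
    }
}
```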
[jira] [Commented] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848858#comment-13848858 ] Hive QA commented on HIVE-3746: --- Overall: +1 all checks pass. Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618851/HIVE-3746.3.patch.txt SUCCESS: +1 4785 tests passed. Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/647/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/647/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase This message is automatically generated. ATTACHMENT ID: 12618851
[jira] [Created] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
Navis created HIVE-6037: --- Summary: Synchronize HiveConf with hive-default.xml.template and support show conf Key: HIVE-6037 URL: https://issues.apache.org/jira/browse/HIVE-6037 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor see HIVE-5879
[jira] [Updated] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6037: Status: Patch Available (was: Open) Preliminary test Synchronize HiveConf with hive-default.xml.template and support show conf - Key: HIVE-6037 URL: https://issues.apache.org/jira/browse/HIVE-6037 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6037.1.patch.txt see HIVE-5879
[jira] [Updated] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6037: Attachment: HIVE-6037.1.patch.txt
[jira] [Updated] (HIVE-5975) [WebHCat] templeton mapreduce job failed if provide define parameters
[ https://issues.apache.org/jira/browse/HIVE-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5975: Status: Patch Available (was: Open) Marking it patch available. [WebHCat] templeton mapreduce job failed if provide define parameters --- Key: HIVE-5975 URL: https://issues.apache.org/jira/browse/HIVE-5975 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0, 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: hive-5975.2.patch, hive-5975.patch Submitting a MapReduce job through Templeton fails:
curl -k -u user:pass -d user.name=user -d define=JobName=MRPiJob -d class=pi -d arg=16 -d arg=100 -d jar=hadoop-mapreduce-examples.jar https://xxx/templeton/v1/mapreduce/jar
The error message is:
Usage: org.apache.hadoop.examples.QuasiMonteCarlo <nMaps> <nSamples>
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
templeton: job failed with exit code 2
Note that if we remove the define parameter, it works fine.
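The report does not say why the define breaks the job. As background only, here is a minimal Java sketch of the argument order Hadoop's generic option parsing expects for a `hadoop jar` invocation: generic options such as `-D name=value` come after the program name and before the program's own arguments (here `16 100`), since the parser stops at the first non-option argument and anything after it is handed to the program. The class and method are hypothetical and are not WebHCat code; whether argument order is the actual cause of this failure is not established by the report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of building "hadoop jar" arguments; not WebHCat code. */
final class JarArgsSketch {
    static List<String> buildArgs(String jar, String mainClass,
                                  Map<String, String> defines, List<String> appArgs) {
        List<String> args = new ArrayList<>();
        args.add("jar");
        args.add(jar);
        args.add(mainClass);
        // Generic options go before the program's own arguments; a "-D" placed
        // after them would reach the program as ordinary arguments.
        for (Map.Entry<String, String> e : defines.entrySet()) {
            args.add("-D");
            args.add(e.getKey() + "=" + e.getValue());
        }
        args.addAll(appArgs);
        return args;
    }

    public static void main(String[] args) {
        Map<String, String> defines = new LinkedHashMap<>();
        defines.put("JobName", "MRPiJob");
        List<String> appArgs = new ArrayList<>();
        appArgs.add("16");
        appArgs.add("100");
        System.out.println(String.join(" ",
                buildArgs("hadoop-mapreduce-examples.jar", "pi", defines, appArgs)));
        // jar hadoop-mapreduce-examples.jar pi -D JobName=MRPiJob 16 100
    }
}
```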
[jira] [Commented] (HIVE-6036) A test case for embedded beeline - with URL jdbc:hive2:///default
[ https://issues.apache.org/jira/browse/HIVE-6036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848886#comment-13848886 ] Hive QA commented on HIVE-6036: --- Overall: +1 all checks pass. Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618852/HIVE-6036.patch SUCCESS: +1 4786 tests passed. Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/648/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/648/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase This message is automatically generated. ATTACHMENT ID: 12618852 A test case for embedded beeline - with URL jdbc:hive2:///default --- Key: HIVE-6036 URL: https://issues.apache.org/jira/browse/HIVE-6036 Project: Hive Issue Type: Bug Reporter: Anandha L Ranganathan Assignee: Anandha L Ranganathan Attachments: HIVE-6036.patch A test case for embedded beeline, i.e., with the URL jdbc:hive2:///default, would have been helpful. This URL causes beeline (through the JDBC driver) to invoke embedded Hive.
[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3286: Status: Patch Available (was: Open) Rebased and fixed test failures. Explicit skew join on user provided condition - Key: HIVE-3286 URL: https://issues.apache.org/jira/browse/HIVE-3286 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, HIVE-3286.13.patch.txt, HIVE-3286.14.patch.txt, HIVE-3286.15.patch.txt, HIVE-3286.16.patch.txt, HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch
A join on tables with skewed data spends most of its execution time handling the skewed keys. But we usually already know about the skew, and even know what the skewed keys look like. If we can explicitly assign reducer slots to the skewed keys, total execution time could be greatly shortened. As a start, I've extended the join grammar like this:
{code}
select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
{code}
which means that if the above query is executed by 20 reducers, one reducer handles a.key+1 < 50, one reducer handles 50 <= a.key+1 < 100, one reducer handles 99 <= a.key < 150, and 17 reducers handle everything else (this could be extended later to assign more than one reducer per condition). This can only be used with common inner equi-joins, and the skew condition must be composed of join keys only. The work done so far will be posted shortly after code cleanup.
Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially at runtime, and the first one that evaluates to true decides the skew group for the row. Each skew group has reserved partition slot(s), to which all rows in the group are assigned. The number of partition slots reserved for each group is also decided at runtime, as a simple percentage of the total.
If a skew group is CLUSTER BY 20 PERCENT and the total number of partition slots (= number of reducers) is 20, that group reserves 4 partition slots, and so on. DISTRIBUTE BY decides how the rows in a group are dispersed across the group's reserved slots (if a group has only one slot, it has no effect). Currently, three distribution policies are available: RANDOM, KEYS, and expression.
1. RANDOM: rows of the driver** alias are dispersed randomly, and rows of the non-driver alias are duplicated to all the slots (the default if not specified)
2. KEYS: determined by the hash value of the keys (same as before)
3. expression: determined by the hash of the object evaluated from a user-provided expression
This is only possible with inner, equi, common joins. Join tree merging is not yet supported. It might also be usable by other ReduceSink (RS) users, such as SORT BY or GROUP BY. If column statistics exist for the key, skew conditions could possibly be applied automatically.
For example, if 20 reducers are used for the query below,
{code}
select count(*) from src a join src b on a.key=b.key skew on (
  a.key = '0' CLUSTER BY 10 PERCENT,
  b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
  cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
{code}
group-0 reserves slots 6~7, group-1 8~11, group-2 12~19, and all other rows use slots 0~5.
For a row with key='0' from alias a, the row is randomly assigned within the range 6~7 (driver alias): 6 or 7.
For a row with key='0' from alias b, the row is distributed to all slots in 6~7 (non-driver alias): 6 and 7.
For a row with key='50', the row is assigned within the range 8~11 by the hashcode of upper(b.key): 8 + (hash(upper(key)) % 4).
For a row with key='500', the row is assigned within the range 12~19 by the hashcode of the join key: 12 + (hash(key) % 8).
For a row with key='200', which does not belong to any skew group: hash(key) % 6.
*Expressions in skew conditions:
1. All expressions must be built from the expressions in the join condition. If the join condition is a.key=b.key, a user can build any expression from a.key or b.key; but if the join condition is a.key+1=b.key, a user cannot build an expression from a.key alone (it must use a.key+1).
2. Each expression must reference exactly one side of the join. For example, constant expressions, or expressions referencing both sides of the join condition (such as comparing a.key+b.key with 100), are not allowed.
3. All functions in an expression must be deterministic and stateless.
4. If a DISTRIBUTE BY expression is used, it must reference the same alias as the skew expression.
**Driver alias:
1. The driver alias is the sole alias referenced by the skew expression, which matters for RANDOM distribution. Rows of
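The slot arithmetic in the description can be checked with a small Java sketch that reproduces the 20-reducer example (other rows in 0~5, group-0 in 6~7, group-1 in 8~11, group-2 in 12~19). The class and method names are hypothetical, not the patch's code; hash values are passed in as plain ints rather than computed the way Hive hashes keys, percentages are rounded down, and a group without CLUSTER BY is given one slot, matching the first example (three groups with one reducer each, 17 for the rest).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of the slot layout described above. Slots
 * 0..othersCount-1 serve rows outside every skew group; each group then gets a
 * contiguous range sized as a percentage of the reducer count.
 */
final class SkewSlotSketch {
    final int[] groupStart;
    final int[] groupCount;
    final int othersCount;

    SkewSlotSketch(int numReducers, int[] clusterByPercent) {
        int n = clusterByPercent.length;
        groupStart = new int[n];
        groupCount = new int[n];
        int reserved = 0;
        for (int g = 0; g < n; g++) {
            // Assumption: no CLUSTER BY (percent <= 0) means a single slot.
            int c = clusterByPercent[g] > 0 ? numReducers * clusterByPercent[g] / 100 : 1;
            groupCount[g] = Math.max(1, c);
            reserved += groupCount[g];
        }
        othersCount = numReducers - reserved;
        if (othersCount < 1) {
            throw new IllegalArgumentException("skew groups reserve every slot");
        }
        int next = othersCount; // groups are laid out after the shared range
        for (int g = 0; g < n; g++) {
            groupStart[g] = next;
            next += groupCount[g];
        }
    }

    /** Row outside every skew group: hash(key) % othersCount. */
    int otherSlot(int hash) {
        return Math.floorMod(hash, othersCount);
    }

    /** KEYS or expression distribution: start + (hash % count). */
    int hashedSlot(int group, int hash) {
        return groupStart[group] + Math.floorMod(hash, groupCount[group]);
    }

    /** RANDOM, driver alias: one random slot in the group's range. */
    int randomSlot(int group, Random rnd) {
        return groupStart[group] + rnd.nextInt(groupCount[group]);
    }

    /** RANDOM, non-driver alias: the row is copied to every slot in the range. */
    List<Integer> allSlots(int group) {
        List<Integer> slots = new ArrayList<>();
        for (int i = 0; i < groupCount[group]; i++) {
            slots.add(groupStart[group] + i);
        }
        return slots;
    }

    public static void main(String[] args) {
        SkewSlotSketch s = new SkewSlotSketch(20, new int[] {10, 20, 40});
        System.out.println("others: 0~" + (s.othersCount - 1)); // others: 0~5
        for (int g = 0; g < s.groupStart.length; g++) {
            int end = s.groupStart[g] + s.groupCount[g] - 1;
            System.out.println("group-" + g + ": " + s.groupStart[g] + "~" + end);
        }
        // group-0: 6~7, group-1: 8~11, group-2: 12~19
    }
}
```

Using `Math.floorMod` keeps the slot inside the group's range even when a hash value is negative, which a plain `%` would not.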
[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3286: Attachment: HIVE-3286.16.patch.txt
[jira] [Updated] (HIVE-5924) Save operation logs in per operation directories in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jaideep Dhok updated HIVE-5924: --- Attachment: HIVE-5924.1.patch.txt First version of the patch Save operation logs in per operation directories in HiveServer2 --- Key: HIVE-5924 URL: https://issues.apache.org/jira/browse/HIVE-5924 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Jaideep Dhok Assignee: Jaideep Dhok Attachments: HIVE-5924.1.patch.txt
[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3286: Status: Open (was: Patch Available) Explicit skew join on user provided condition - Key: HIVE-3286 URL: https://issues.apache.org/jira/browse/HIVE-3286 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, HIVE-3286.13.patch.txt, HIVE-3286.14.patch.txt, HIVE-3286.15.patch.txt, HIVE-3286.16.patch.txt, HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch Join operation on table with skewed data takes most of execution time handling the skewed keys. But mostly we already know about that and even know what is look like the skewed keys. If we can explicitly assign reducer slots for the skewed keys, total execution time could be greatly shortened. As for a start, I've extended join grammar something like this. {code} select * from src a join src b on a.key=b.key skew on (a.key+1 50, a.key+1 100, a.key 150); {code} which means if above query is executed by 20 reducers, one reducer for a.key+1 50, one reducer for 50 = a.key+1 100, one reducer for 99 = a.key 150, and 17 reducers for others (could be extended to assign more than one reducer later) This can be only used with common-inner-equi joins. And skew condition should be composed of join keys only. Work till done now will be updated shortly after code cleanup. Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially at runtime, and first 'true' one decides skew group for the row. Each skew group has reserved partition slot(s), to which all rows in a group would be assigned. The number of partition slot reserved for each group is decided also at runtime by simple calculation of percentage. 
If a skew group is CLUSTER BY 20 PERCENT and the total number of partition slots (= number of reducers) is 20, that group will reserve 4 partition slots, and so on.

DISTRIBUTE BY decides how the rows in a group are dispersed across the range of reserved slots (if a group has only one slot, this is meaningless). Currently, three distribution policies are available: RANDOM, KEYS, and expression.
1. RANDOM : rows of the driver** alias are dispersed randomly, and rows of non-driver aliases are duplicated to all the slots (default if not specified)
2. KEYS : determined by the hash value of the keys (same as the previous behavior)
3. expression : determined by the hash of the object evaluated by a user-provided expression

Only possible with inner, equi, common joins. Join tree merging is not supported yet. This might also be used by other RS (ReduceSink) users such as SORT BY or GROUP BY. If column statistics exist for the key, it could be applied automatically.

For example, if 20 reducers are used for the query below,
{code}
select count(*) from src a join src b on a.key=b.key skew on (
   a.key = '0' CLUSTER BY 10 PERCENT,
   b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
   cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
{code}
group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19, and others will reserve slots 0~5.
For a row with key='0' from alias a, the row is randomly assigned within the range 6~7 (driver alias) : 6 or 7
For a row with key='0' from alias b, the row is distributed to all slots in 6~7 (non-driver alias) : 6 and 7
For a row with key='50', the row is assigned within the range 8~11 by the hashcode of upper(b.key) : 8 + (hash(upper(key)) % 4)
For a row with key='500', the row is assigned within the range 12~19 by the hashcode of the join key : 12 + (hash(key) % 8)
For a row with key='200', the row does not belong to any skew group : hash(key) % 6

*Expressions in skew condition :
1. All expressions should be made of expressions in the join condition. This means that if the join condition is a.key=b.key, the user can make any expression with a.key or b.key. But if the join condition is a.key+1=b.key, the user cannot make an expression with a.key alone (it should be made with a.key+1).
2. All expressions should reference one and only one side of the aliases. For example, simple constant expressions or expressions referencing both sides of the join condition (a.key+b.key < 100) are not allowed.
3. All functions in an expression should be deterministic and stateless.
4. If a DISTRIBUTE BY expression is used, the distribution expression should also reference the same alias as the skew expression.

**Driver alias :
1. The driver alias means the sole alias referenced by the skew expression, which is important for RANDOM distribution. Rows of the driver alias are assigned to
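The DISTRIBUTE BY policies and the driver/non-driver distinction in the worked example above can be sketched in Python as follows. This is an illustration under assumptions, not the patch's code: `target_slots` is a hypothetical helper, the "EXPRESSION" policy label is invented for the sketch, Python's `hash()` stands in for Hive's hash function, and only the RANDOM policy distinguishes driver from non-driver rows. A non-driver row under RANDOM is duplicated to every slot of its group, so the helper returns a list of slots.

```python
import random
from typing import Callable, List, Optional, Tuple

SlotRange = Tuple[int, int]  # inclusive (first_slot, last_slot)


def target_slots(
    key: str,
    slot_range: SlotRange,
    policy: str = "RANDOM",
    is_driver: bool = True,
    expression: Optional[Callable[[str], object]] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return the reducer slot(s) a row is sent to (hypothetical helper).

    Python's hash() stands in for Hive's hash function.
    """
    first, last = slot_range
    width = last - first + 1
    if policy == "RANDOM":
        if is_driver:
            # Driver-alias rows go to one reserved slot chosen at random.
            return [first + (rng or random).randrange(width)]
        # Non-driver rows are duplicated to every reserved slot.
        return list(range(first, last + 1))
    if policy == "KEYS":
        # Python's % with a positive divisor is never negative.
        return [first + hash(key) % width]
    if policy == "EXPRESSION":
        if expression is None:
            raise ValueError("EXPRESSION policy needs a distribution expression")
        return [first + hash(expression(key)) % width]
    raise ValueError(f"unknown policy: {policy}")


# Usage mirroring the worked example (group ranges 6~7, 8~11, 12~19, others 0~5).
print(target_slots("0", (6, 7), "RANDOM", is_driver=False))  # [6, 7]
print(target_slots("50", (8, 11), "EXPRESSION", expression=str.upper))
print(target_slots("500", (12, 19), "KEYS"))
print(target_slots("200", (0, 5), "KEYS"))  # rows in no skew group
```

Duplicating non-driver rows is what keeps RANDOM correct for an inner join: whichever slot a driver row lands in, every matching row from the other alias is also present there.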
[jira] [Commented] (HIVE-5924) Save operation logs in per operation directories in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848898#comment-13848898 ]

Jaideep Dhok commented on HIVE-5924:

[~vgumashta] [~prasadm] Please have a look at the patch. How do I submit a review request? It seems that the Phabricator documentation on the wiki is a bit outdated.

Save operation logs in per operation directories in HiveServer2
---
Key: HIVE-5924
URL: https://issues.apache.org/jira/browse/HIVE-5924
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Jaideep Dhok
Assignee: Jaideep Dhok
Attachments: HIVE-5924.1.patch.txt

-- This message was sent by Atlassian JIRA (v6.1.4#6159)