[jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7826: - Attachment: HIVE-7826.2.patch .2 removes an unnecessary addition of the hive conf template. Dynamic partition pruning on Tez Key: HIVE-7826 URL: https://issues.apache.org/jira/browse/HIVE-7826 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: tez Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates. It can also be useful to compute the partitions one would like to scan via a subquery (where p in (select ... from ...)). The resulting joins in Hive require a full table scan of the large table, though, because partition pruning takes place before the corresponding values are known. On Tez it's relatively straightforward to send the values needed to prune to the application master, where splits are generated and tasks are submitted. Using these values we can strip out any unneeded partitions dynamically, while the query is running. The approach is straightforward:
- Insert a synthetic condition for each join representing "x in (keys of the other side of the join)"
- These conditions will be pushed as far down as possible
- If the condition hits a table scan and the column involved is a partition column: set up an operator to send key events to the AM
- Else: remove the synthetic predicate
-- This message was sent by Atlassian JIRA (v6.2#6252)
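The final pruning step described above reduces to a set-membership test: once the distinct join keys from the dimension side reach the AM, any partition of the fact table whose value is not among them needs no splits. A minimal sketch under that assumption (class and method names are hypothetical, not Hive's actual implementation):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of dynamic pruning at the AM: keep only partitions
// whose value appears among the join keys sent from the other side.
public class DynamicPruneSketch {
    public static List<String> prune(List<String> partitionValues, Set<String> joinKeys) {
        List<String> kept = new ArrayList<>();
        for (String partition : partitionValues) {
            if (joinKeys.contains(partition)) {
                kept.add(partition); // only surviving partitions get splits
            }
        }
        return kept;
    }
}
```

For example, with daily partitions and a subquery that resolves to a single day, only that one partition would be scanned instead of the whole table.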
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105090#comment-14105090 ] Gunther Hagleitner commented on HIVE-7254: -- Thanks, I appreciate it! Just verified - the tez shared tests are back. Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Attachments: trunk-mr2.properties Today, the Hive PTest infrastructure has a per-driver "directory" configuration, and it runs all the qfiles under that directory for that driver. For example, CLIDriver is configured with the directory ql/src/test/queries/clientpositive. However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under the directory, so we have to use the "include" configuration to hard-code a list of tests for them to run. This duplicates the list of each miniDriver's tests already in the /itests/qtest pom file, and the two can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105092#comment-14105092 ] Ashutosh Chauhan commented on HIVE-7812: [~owen.omalley] Can you create an RB request for this? Also cc: [~gopalv], who has spent some time in this part of the code while dealing with locality issues. Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is in ACID format. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24920: CBO path doesn't handle null expr in select list correctly.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24920/ --- Review request for hive and John Pullokkaran. Repository: hive Description --- CBO path doesn't handle null expr in select list correctly. Diffs - branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTBuilder.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java 1619294 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/TypeConverter.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1619294 branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619294 branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 1619294 Diff: https://reviews.apache.org/r/24920/diff/ Testing --- added new test. Thanks, Ashutosh Chauhan
Hive on Tez Counters
Hi, I need info on where I can get detailed job counters for Hive on Tez. I am running this on an HDP cluster with Hive 0.13 and see only the following job counters for Hive on Tez in the YARN application logs, which I got through (yarn logs -applicationId ...).
a. I cannot see any ReduceOperator counters, and DESERIALIZE_ERRORS is the only counter present in MapOperator.
b. CPU_MILLISECONDS is negative in some cases. Is CPU_MILLISECONDS accurate?
c. What does COMMITTED_HEAP_BYTES indicate?
d. Is there any other place I should be checking the counters?
[[File System Counters FILE: BYTES_READ=512, FILE: BYTES_WRITTEN=3079881, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS: BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS: LARGE_READ_OPS=0, HDFS: WRITE_OPS=0]
[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543, GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*, PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248, COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543, OUTPUT_RECORDS=222543, OUTPUT_BYTES=23543896, OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0]
[*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter DESERIALIZE_ERRORS=0]]
Thanks Suma
[jira] [Created] (HIVE-7827) [CBO] null expr in select list is not handled correctly
Ashutosh Chauhan created HIVE-7827: -- Summary: [CBO] null expr in select list is not handled correctly Key: HIVE-7827 URL: https://issues.apache.org/jira/browse/HIVE-7827 Project: Hive Issue Type: Bug Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan select null from t1 fails -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24920: CBO path doesn't handle null expr in select list correctly.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24920/ --- (Updated Aug. 21, 2014, 6:28 a.m.) Review request for hive and John Pullokkaran. Bugs: HIVE-7827 https://issues.apache.org/jira/browse/HIVE-7827 Repository: hive Description (updated) --- CBO path doesn't handle null expr in select list correctly Diffs - branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTBuilder.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java 1619294 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/TypeConverter.java 1619293 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1619294 branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619294 branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 1619294 Diff: https://reviews.apache.org/r/24920/diff/ Testing --- added new test. Thanks, Ashutosh Chauhan
[jira] [Updated] (HIVE-7827) [CBO] null expr in select list is not handled correctly
[ https://issues.apache.org/jira/browse/HIVE-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7827: --- Attachment: HIVE-7827.patch [CBO] null expr in select list is not handled correctly --- Key: HIVE-7827 URL: https://issues.apache.org/jira/browse/HIVE-7827 Project: Hive Issue Type: Bug Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7827.patch select null from t1 fails -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7805) Support running multiple scans in hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105099#comment-14105099 ] Hive QA commented on HIVE-7805: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663200/HIVE-7805.patch {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 6099 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_multiscan_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.testPigFilterProjection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/425/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/425/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-425/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663200 Support running multiple scans in hbase-handler --- Key: HIVE-7805 URL: https://issues.apache.org/jira/browse/HIVE-7805 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.14.0 Reporter: Andrew Mains Assignee: Andrew Mains Attachments: HIVE-7805.patch Currently, the HiveHBaseTableInputFormat only supports running a single scan. This can be less efficient than running multiple disjoint scans in certain cases, particularly when using a composite row key. For instance, given a row key schema of: {code} struct<bucket int, time timestamp> {code} if one wants to push down the predicate: {code} bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670 {code} it's much more efficient to run a scan for each bucket over the time range (particularly if there's a large amount of data per day). With a single scan, the MR job has to process the data for all time for all buckets between 1 and 100. Hive should allow HBaseKeyFactory implementations to decompose a predicate into one or more scans in order to take advantage of this fact. -- This message was sent by Atlassian JIRA (v6.2#6252)
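The decomposition the issue asks for can be sketched as turning the IN-list into one disjoint scan range per bucket over the shared time window, instead of one scan spanning all buckets from 1 to 100. A toy illustration (names are hypothetical, not the actual HBaseKeyFactory API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emit one (bucket, startTime, endTime) range per bucket
// in the IN-list, so each scan covers only the data it actually needs.
public class ScanDecomposer {
    public static List<long[]> decompose(int[] buckets, long tsStart, long tsEnd) {
        List<long[]> ranges = new ArrayList<>();
        for (int bucket : buckets) {
            // each disjoint scan covers exactly one bucket over [tsStart, tsEnd)
            ranges.add(new long[]{bucket, tsStart, tsEnd});
        }
        return ranges;
    }
}
```

In the real handler these ranges would become start/stop row keys for separate HBase scans; the point is that three narrow scans replace one scan covering roughly 100 buckets of history.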
[jira] [Commented] (HIVE-4523) round() function with specified decimal places not consistent with mysql
[ https://issues.apache.org/jira/browse/HIVE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105103#comment-14105103 ] Zhan Zhang commented on HIVE-4523: -- I also hit the same problem with the new round UDF in Spark: org.apache.hadoop.hive.ql.exec.UDFArgumentException: ROUND second argument only takes constant. When Spark initializes the UDF, it does not know this UDF needs special handling. round should follow the same contract as other UDFs, taking an ObjectInspector instead of a ConstantObjectInspector. Can we file a JIRA to get this fixed? round() function with specified decimal places not consistent with mysql - Key: HIVE-4523 URL: https://issues.apache.org/jira/browse/HIVE-4523 Project: Hive Issue Type: Improvement Components: UDF Affects Versions: 0.7.1 Reporter: Fred Desing Assignee: Xuefu Zhang Priority: Minor Fix For: 0.13.0 Attachments: HIVE-4523.1.patch, HIVE-4523.2.patch, HIVE-4523.3.patch, HIVE-4523.4.patch, HIVE-4523.5.patch, HIVE-4523.6.patch, HIVE-4523.7.patch, HIVE-4523.8.patch, HIVE-4523.patch
// hive
hive> select round(150.000, 2) from temp limit 1;
150.0
hive> select round(150, 2) from temp limit 1;
150.0
// mysql
mysql> select round(150.000, 2) from DUAL limit 1;
round(150.000, 2)
150.00
mysql> select round(150, 2) from DUAL limit 1;
round(150, 2)
150
http://dev.mysql.com/doc/refman/5.1/en/mathematical-functions.html#function_round -- This message was sent by Atlassian JIRA (v6.2#6252)
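The inconsistency above comes down to the result type: Hive 0.7's round() returned a double, which collapses trailing zeros (150.0), while MySQL's ROUND on a decimal keeps the requested scale (150.00). A small sketch contrasting the two behaviors for decimal input (helper names are mine, not Hive's):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Two hypothetical helpers contrasting the behaviors reported above.
public class RoundDemo {
    // MySQL-style for decimals: scale is preserved, so round(150.000, 2) -> "150.00"
    public static String mysqlRound(String value, int scale) {
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toPlainString();
    }

    // Hive-0.7-style: double in, double out, trailing zeros collapse -> 150.0
    public static double hiveLikeRound(double value, int scale) {
        double factor = Math.pow(10, scale);
        return Math.round(value * factor) / factor;
    }
}
```

Note that MySQL additionally keeps integer inputs integral (round(150, 2) -> 150); the sketch only covers the decimal case the report highlights.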
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: (was: HIVE-7730.002.patch) Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Attachments: HIVE-7730.001.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get an instance of HiveSemanticAnalyzerHookContext from the configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.- Hive should store accessed columns in ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: HIVE-7730.002.patch Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get an instance of HiveSemanticAnalyzerHookContext from the configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.- Hive should store accessed columns in ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105127#comment-14105127 ] Ashutosh Chauhan commented on HIVE-7736: +1 improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch The current implementation of column stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' column stats for a table with 2 columns but 2000 partitions: ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the column stats update speed for all the partitions of a table. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24602: HIVE-7689 : Enable Postgres as METASTORE back-end
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24602/ --- (Updated Aug. 21, 2014, 8:40 a.m.) Review request for hive. Bugs: HIVE-7689 https://issues.apache.org/jira/browse/HIVE-7689 Repository: hive-git Description (updated) --- I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables these features: * LOCKS on the Postgres metastore * COMPACTION on the Postgres metastore * TRANSACTIONS on the Postgres metastore * a fix for the metastore update script for Postgres Diffs - metastore/scripts/upgrade/postgres/hive-txn-schema-0.13.0.postgres.sql 2ebd3b0 metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 524a7a4 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java 30cf814 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java f74f683 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 4625d27 Diff: https://reviews.apache.org/r/24602/diff/ Testing --- Using the patched version in production. Concurrency is enabled with DbTxnManager. Thanks, Damien Carol
[jira] [Commented] (HIVE-7420) Parameterize tests for HCatalog Pig interfaces for testing against all storage formats
[ https://issues.apache.org/jira/browse/HIVE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105245#comment-14105245 ] Hive QA commented on HIVE-7420: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663310/HIVE-7420.5.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 6207 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[1] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[2] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[3] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[4] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[5] org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/436/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/436/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-436/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12663310 Parameterize tests for HCatalog Pig interfaces for testing against all storage formats -- Key: HIVE-7420 URL: https://issues.apache.org/jira/browse/HIVE-7420 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7420-without-HIVE-7457.2.patch, HIVE-7420-without-HIVE-7457.3.patch, HIVE-7420-without-HIVE-7457.4.patch, HIVE-7420-without-HIVE-7457.5.patch, HIVE-7420.1.patch, HIVE-7420.2.patch, HIVE-7420.3.patch, HIVE-7420.4.patch, HIVE-7420.5.patch Currently, HCatalog tests only test against RCFile with a few testing against ORC. The tests should be covering other Hive storage formats as well. HIVE-7286 turns HCatMapReduceTest into a test fixture that can be run with all Hive storage formats and with that patch, all test suites built on HCatMapReduceTest are running and passing against Sequence File, Text, and ORC in addition to RCFile. Similar changes should be made to make the tests for HCatLoader and HCatStorer generic so that they can be run against all Hive storage formats. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7821) StarterProject: enable groupby4.q
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam reassigned HIVE-7821: -- Assignee: Chinna Rao Lalam StarterProject: enable groupby4.q - Key: HIVE-7821 URL: https://issues.apache.org/jira/browse/HIVE-7821 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7820) union_null.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105289#comment-14105289 ] Hive QA commented on HIVE-7820: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663327/HIVE-7820.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/437/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/437/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-437/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663327 union_null.q is not deterministic -- Key: HIVE-7820 URL: https://issues.apache.org/jira/browse/HIVE-7820 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7820.1.patch, HIVE-7820.1.patch union_null.q selects 10 rows from a subquery which returns many rows. Since the subquery does not have an order by, the 10 results returned vary. This problem exists on trunk and spark. We'll fix on trunk and merge to spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105301#comment-14105301 ] Lianhui Wang commented on HIVE-7384: I think current Spark already supports hash by join_col and sort by {join_col, tag}, because in Spark the map side's shuffle writer hashes by Key.hashCode and sorts by Key, and in Hive the HiveKey class already defines the hashcode. So it can support hash by HiveKey.hashCode and sort by HiveKey's bytes. Research into reduce-side join [Spark Branch] - Key: HIVE-7384 URL: https://issues.apache.org/jira/browse/HIVE-7384 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Szehon Ho Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, sales_products.txt, sales_stores.txt Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complication in reduce-side join, which extensively utilizes key tags and shuffle behavior. Our design principle prefers making the Hive implementation work out of the box as well, which might require new functionality from Spark. The task is to research this area, identifying requirements for the Spark community and the work to be done on Hive to make reduce-side join work. A design doc might be needed for this. For more information, please refer to the overall design doc on the wiki. -- This message was sent by Atlassian JIRA (v6.2#6252)
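The shuffle contract described in the comment can be sketched in two parts: rows for the same join key land in the same reducer via the key's hash, and within a reducer they arrive sorted by (join key, table tag) so one side's rows are seen before the other's. A simplified illustration (types reduced to strings; not Hive's actual HiveKey code):

```java
import java.util.List;

// Sketch of the reduce-side join shuffle contract: partition by the join
// key's hash, order within a partition by (join key, table tag).
public class ShuffleSketch {
    public static int partition(String joinKey, int numReducers) {
        // non-negative hash, as a map-side shuffle writer would compute it
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void sortByKeyThenTag(List<String[]> rows) {
        // row[0] = join key, row[1] = table tag
        rows.sort((a, b) -> {
            int c = a[0].compareTo(b[0]);
            return c != 0 ? c : a[1].compareTo(b[1]);
        });
    }
}
```

The tag-secondary sort is what lets the join operator buffer the smaller-tagged side per key before streaming the other side past it.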
[jira] [Commented] (HIVE-7598) Potential null pointer dereference in MergeTask#closeJob()
[ https://issues.apache.org/jira/browse/HIVE-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105321#comment-14105321 ] Hive QA commented on HIVE-7598: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663321/HIVE-7598.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/438/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/438/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-438/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663321 Potential null pointer dereference in MergeTask#closeJob() -- Key: HIVE-7598 URL: https://issues.apache.org/jira/browse/HIVE-7598 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Attachments: HIVE-7598.patch The call to Utilities.mvFileToFinalPath() passes null as the second-to-last parameter, conf. The null gets passed on to createEmptyBuckets(), which dereferences conf directly: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
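A minimal illustration of the fix pattern (names hypothetical, not the actual MergeTask code): either supply a non-null conf at the call site, or fail fast at the boundary with a descriptive message instead of letting createEmptyBuckets() hit the NPE deep inside:

```java
import java.util.Objects;

// Hypothetical guard: surface the null 'conf' at the method boundary instead
// of deep inside the bucket-creation logic.
public class NullGuardDemo {
    public static String closeJob(Object conf) {
        Objects.requireNonNull(conf,
            "conf must not be null: needed for getCompressed()/getTableInfo()");
        return "ok"; // stands in for the real merge/close work
    }
}
```

The resulting exception message points directly at the missing argument, which is far easier to debug than a bare NullPointerException from a field access.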
[jira] [Updated] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7772: - Attachment: HIVE-7772-spark.patch Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7772: - Status: Patch Available (was: Open) Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105327#comment-14105327 ] Rui Li commented on HIVE-7772: -- This patch adds some simple cases. Other cases require join or union to be ready. I also found some errors in the output file for some cases (e.g. enforce_order.q): {noformat} [Error 30017]: Skipping stats aggregation by error org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30015]: Stats aggregator of type counter cannot be connected to {noformat} I think this is related to HIVE-7761, so I left these cases out as well. Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7824) CLIServer.getOperationStatus eats ExceutionException
[ https://issues.apache.org/jira/browse/HIVE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105350#comment-14105350 ] Hive QA commented on HIVE-7824: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663336/HIVE-7824.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/439/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/439/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-439/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663336 CLIServer.getOperationStatus eats ExceutionException Key: HIVE-7824 URL: https://issues.apache.org/jira/browse/HIVE-7824 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Priority: Blocker Attachments: HIVE-7824.patch, HIVE-7824.patch, HIVE-7824.patch ExceutionException has a cause member which could be anything including serious errors and thus it should be logged. 
The other lines are escape exceptions and can be logged at trace level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105352#comment-14105352 ] Hive QA commented on HIVE-7772: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663387/HIVE-7772-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5983 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/74/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/74/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-74/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663387 Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Attachment: HIVE-7702-spark.patch Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105357#comment-14105357 ] Chinna Rao Lalam commented on HIVE-7702: Join-related query files will be handled in HIVE-7816: filter_join_breaktask.q,\ filter_join_breaktask2.q Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Status: Patch Available (was: Open) Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7816) Enable join tests which Tez executes
[ https://issues.apache.org/jira/browse/HIVE-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7816: --- Description: {noformat} auto_join0.q,\ auto_join1.q,\ cross_join.q,\ cross_product_check_1.q,\ cross_product_check_2.q,\ {noformat} {noformat} filter_join_breaktask.q,\ filter_join_breaktask2.q {noformat} was: {noformat} auto_join0.q,\ auto_join1.q,\ cross_join.q,\ cross_product_check_1.q,\ cross_product_check_2.q,\ {noformat} Enable join tests which Tez executes Key: HIVE-7816 URL: https://issues.apache.org/jira/browse/HIVE-7816 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland {noformat} auto_join0.q,\ auto_join1.q,\ cross_join.q,\ cross_product_check_1.q,\ cross_product_check_2.q,\ {noformat} {noformat} filter_join_breaktask.q,\ filter_join_breaktask2.q {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105379#comment-14105379 ] Hive QA commented on HIVE-7702: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663390/HIVE-7702-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5984 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/75/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-75/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663390 Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries, however there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. 
A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7823) HIVE-6185 removed Partition.getPartition
[ https://issues.apache.org/jira/browse/HIVE-7823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105400#comment-14105400 ] Hive QA commented on HIVE-7823: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663332/HIVE-7823.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/440/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/440/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-440/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663332 HIVE-6185 removed Partition.getPartition Key: HIVE-7823 URL: https://issues.apache.org/jira/browse/HIVE-7823 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-7823.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7824) CLIServer.getOperationStatus eats ExceutionException
[ https://issues.apache.org/jira/browse/HIVE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7824: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you for the review! I committed this trivial patch to trunk. CLIServer.getOperationStatus eats ExceutionException Key: HIVE-7824 URL: https://issues.apache.org/jira/browse/HIVE-7824 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-7824.patch, HIVE-7824.patch, HIVE-7824.patch ExceutionException has a cause member which could be anything including serious errors and thus it should be logged. The other lines are escape exceptions and can be logged at trace. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105447#comment-14105447 ] Brock Noland commented on HIVE-7702: Nice work [~chinnalalam]!! Looks like insert_into2 fails. Looking at the diff, I see a bunch of odd characters at the bottom. Thank you!! Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7815) Reduce Side Join with single reducer [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105456#comment-14105456 ] Brock Noland commented on HIVE-7815: Nice [~szehon]! I think union_null failed since it's non-deterministic. I have a patch up on HIVE-7820 to fix this. Reduce Side Join with single reducer [Spark Branch] --- Key: HIVE-7815 URL: https://issues.apache.org/jira/browse/HIVE-7815 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7815-spark.patch This is the first part of the reduce-side join work, see HIVE-7384 for details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7828) Fix CLIDriver test parquet_join.q
Brock Noland created HIVE-7828: -- Summary: Fix CLIDriver test parquet_join.q Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7821) StarterProject: enable groupby4.q
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105458#comment-14105458 ] Brock Noland commented on HIVE-7821: Hi [~chinnalalam], Thank you for picking this up! I should have mentioned, I created this one for Suhas who has recently joined the project team. Would you mind if he takes this one? StarterProject: enable groupby4.q - Key: HIVE-7821 URL: https://issues.apache.org/jira/browse/HIVE-7821 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105464#comment-14105464 ] Brock Noland commented on HIVE-7772: Nice work [~lirui]!! I think union_null failed since it's non-deterministic. I have a patch up on HIVE-7820 to fix this. Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7821) StarterProject: enable groupby4.q
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Satish reassigned HIVE-7821: -- Assignee: Suhas Satish (was: Chinna Rao Lalam) StarterProject: enable groupby4.q - Key: HIVE-7821 URL: https://issues.apache.org/jira/browse/HIVE-7821 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7820) union_null.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105472#comment-14105472 ] Brock Noland commented on HIVE-7820: Note this patch should be committed to trunk and merged to spark. union_null.q is not deterministic -- Key: HIVE-7820 URL: https://issues.apache.org/jira/browse/HIVE-7820 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7820.1.patch, HIVE-7820.1.patch union_null.q selects 10 rows from a subquery which returns many rows. Since the subquery does not have an ORDER BY, the 10 results returned vary. This problem exists on trunk and spark. We'll fix on trunk and merge to spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
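The flakiness is the usual LIMIT-without-ORDER-BY issue. A hedged sketch of the shape of the problem and the fix (illustrative only, not the literal contents of union_null.q):
{noformat}
-- Non-deterministic: any 10 rows of the union may come back
SELECT x FROM (
  SELECT key AS x FROM src
  UNION ALL
  SELECT NULL AS x FROM src
) t LIMIT 10;

-- Deterministic: an explicit ORDER BY pins which 10 rows are returned
SELECT x FROM (
  SELECT key AS x FROM src
  UNION ALL
  SELECT NULL AS x FROM src
) t ORDER BY x LIMIT 10;
{noformat}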
[jira] [Created] (HIVE-7829) Entity.getLocation can throw an NPE
Brock Noland created HIVE-7829: -- Summary: Entity.getLocation can throw an NPE Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Attachments: HIVE-7892.patch It's possible for the getDataLocation methods which Entity.getLocation calls to return null, resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7829: --- Assignee: Brock Noland Status: Patch Available (was: Open) Entity.getLocation can throw an NPE --- Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7892.patch It's possible for the getDataLocation methods which Entity.getLocation calls to return null, resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7829: --- Attachment: HIVE-7892.patch Entity.getLocation can throw an NPE --- Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Attachments: HIVE-7892.patch It's possible for the getDataLocation methods which Entity.getLocation calls to return null, resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24934: HIVE-7829 - Entity.getLocation can throw an NPE
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24934/ --- Review request for hive and Szehon Ho. Repository: hive-git Description --- Very simple change to return null if location cannot be obtained Diffs - ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java aafeaab Diff: https://reviews.apache.org/r/24934/diff/ Testing --- Thanks, Brock Noland
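The change the review describes ("return null if location cannot be obtained") can be sketched as follows. This is a hypothetical, simplified stand-in for Hive's actual Entity class — the name EntitySketch and its fields are illustrative only, not Hive's real API:

```java
import java.net.URI;

// Hypothetical sketch of the null-safe pattern the review describes:
// rather than dereferencing a possibly-null data location (and hitting
// an NPE), getLocation() simply propagates null to the caller.
public class EntitySketch {
    private final URI dataLocation; // may be null, e.g. for entities with no storage

    public EntitySketch(URI dataLocation) {
        this.dataLocation = dataLocation;
    }

    // Returns the entity's location, or null if none can be obtained.
    public URI getLocation() {
        return dataLocation;
    }

    public static void main(String[] args) {
        EntitySketch stored = new EntitySketch(URI.create("hdfs:///warehouse/t"));
        EntitySketch virtual = new EntitySketch(null);
        System.out.println(stored.getLocation());  // hdfs:///warehouse/t
        System.out.println(virtual.getLocation()); // null
    }
}
```

Callers then check for null instead of catching an NPE.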
[jira] [Commented] (HIVE-7735) Implement Char, Varchar in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105481#comment-14105481 ] Mohit Sabharwal commented on HIVE-7735: --- Updated patch after rebase (to address VirtualColumn conflict) Implement Char, Varchar in ParquetSerDe --- Key: HIVE-7735 URL: https://issues.apache.org/jira/browse/HIVE-7735 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Labels: Parquet Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, HIVE-7735.2.patch, HIVE-7735.3.patch, HIVE-7735.patch This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe. Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7735) Implement Char, Varchar in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit Sabharwal updated HIVE-7735: -- Attachment: HIVE-7735.3.patch Implement Char, Varchar in ParquetSerDe --- Key: HIVE-7735 URL: https://issues.apache.org/jira/browse/HIVE-7735 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Labels: Parquet Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, HIVE-7735.2.patch, HIVE-7735.3.patch, HIVE-7735.patch This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe. Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105511#comment-14105511 ] Alan Gates commented on HIVE-7689: -- I will review this but I'll need to test it against other backends (MySQL, Oracle). It will be a week or so until I get to it. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7828) Fix CLIDriver test parquet_join.q
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105512#comment-14105512 ] Brock Noland commented on HIVE-7828: Caused by the fact that HIVE-7513 was committed between the time the HIVE-7629 .out file was generated and the time it was committed. Fix CLIDriver test parquet_join.q - Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105514#comment-14105514 ] Sergio Peña commented on HIVE-7373: --- [~leftylev] Here's the statement. Does it need more explanation or examples? Prior to 0.14, Hive used to trim trailing zeros for decimal numbers. Currently, the trailing zeros are preserved up to what the scale allows (HIVE-7373). Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch Currently Hive blindly removes trailing zeros of a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In decimal context, number 3.140 has a different semantic meaning from number 3.14. Removing trailing zeroes makes the meaning lost. 2. In an extreme case, 0.0 has (p, s) as (1, 1). Hive removes trailing zeros, and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeroes (up to what the scale allows). With this, in the above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
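The (1,1) failure mode in point 2 above can be traced step by step (a hedged illustration of the behavior described, not actual Hive output):
{noformat}
-- Column declared DECIMAL(1,1): one total digit, all of it after the point.
-- Old behavior (trim trailing zeros):
--   input 0.0  --trim-->  0     needs (p,s) = (1,0), column is (1,1)  ->  NULL
-- New behavior (preserve zeros up to scale):
--   input 0.0  --keep-->  0.0   fits (p,s) = (1,1)                    ->  stored as 0.0
{noformat}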
[jira] [Assigned] (HIVE-7828) Fix CLIDriver test parquet_join.q
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reassigned HIVE-7828: -- Assignee: Brock Noland Fix CLIDriver test parquet_join.q - Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7828) Fix CLIDriver test parquet_join.q
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7828: --- Attachment: HIVE-7828.patch Fix CLIDriver test parquet_join.q - Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7828) Fix CLIDriver test parquet_join.q
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105523#comment-14105523 ] Brock Noland commented on HIVE-7828: [~alangates] since you reviewed HIVE-7513 would you mind reviewing this one? Fix CLIDriver test parquet_join.q - Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7828: --- Summary: TestCLIDriver.parquet_join.q is failing on trunk (was: Fix CLIDriver test parquet_join.q) TestCLIDriver.parquet_join.q is failing on trunk Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105529#comment-14105529 ] Owen O'Malley commented on HIVE-7812: - [~ashutoshc] RB posted as https://reviews.apache.org/r/24937/ Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is ACID format. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-7812: Attachment: HIVE-7812.patch Updated patch rebased against current trunk. Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch, HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is ACID format. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7807) Refer to umask property using FsPermission.UMASK_LABEL.
[ https://issues.apache.org/jira/browse/HIVE-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105539#comment-14105539 ] Hive QA commented on HIVE-7807: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663210/HIVE-7807.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6099 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/442/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/442/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-442/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12663210 Refer to umask property using FsPermission.UMASK_LABEL. --- Key: HIVE-7807 URL: https://issues.apache.org/jira/browse/HIVE-7807 Project: Hive Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Attachments: HIVE-7807.1.patch Currently in {{org.apache.hadoop.hive.ql.exec.Utilities.createDirsWithPermission}} the umask property is referred to using {{fs.permissions.umask-mode}}, which is only available in Hadoop 2.x. The property {{dfs.umaskmode}} is used in 1.x for the same purpose. Also, dfs.umaskmode was not deprecated in 1.x according to HADOOP-8727.
This JIRA is to change the umask property references to {{FsPermission.UMASK_LABEL}}, which always points to the proper property in each Hadoop version (0.23.x, 1.x, 2.x). -- This message was sent by Atlassian JIRA (v6.2#6252)
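The version-portability point above can be sketched in plain Java. This is a hedged illustration: the `Map` stands in for Hadoop's `Configuration`, and `UMASK_LABEL` is pinned here to the 2.x property name for the demo; real code references Hadoop's `FsPermission.UMASK_LABEL` constant, which resolves to the right name for the running version.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of why referencing one constant beats hard-coding a property name:
// the name differs between Hadoop lines, but the constant tracks it.
public class UmaskLookup {
    // Version-specific property names, as described in the JIRA.
    static final String HADOOP1_PROP = "dfs.umaskmode";
    static final String HADOOP2_PROP = "fs.permissions.umask-mode";
    // Stand-in for FsPermission.UMASK_LABEL; pinned to the 2.x name here.
    static final String UMASK_LABEL = HADOOP2_PROP;

    static String getUmask(Map<String, String> conf) {
        // Code that reads the constant stays correct on every Hadoop line;
        // "022" mirrors the conventional default umask.
        return conf.getOrDefault(UMASK_LABEL, "022");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(UMASK_LABEL, "077");
        System.out.println(getUmask(conf)); // 077
    }
}
```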
[jira] [Commented] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105548#comment-14105548 ] Hive QA commented on HIVE-7829: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663413/HIVE-7892.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/443/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/443/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-443/ Messages:
{noformat}
This message was trimmed, see log for full details
warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN LPAREN KW_NULL using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN KW_CASE StringLiteral using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:115:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:127:5: Decision can match input such as KW_PARTITION KW_BY LPAREN using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:138:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:149:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:166:7: Decision can match input such as STAR using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_STRUCT using multiple alternatives: 4, 6
As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_ARRAY using multiple alternatives: 2, 6
As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_UNIONTYPE using multiple alternatives: 5, 6
As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_NULL using multiple alternatives: 1, 8
As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_TRUE using multiple alternatives: 3, 8
As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_DATE StringLiteral using multiple alternatives: 2, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_FALSE using multiple alternatives: 3, 8
As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_CLUSTER KW_BY using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP KW_BY using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_SORT KW_BY using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200):
[jira] [Commented] (HIVE-7663) OrcRecordUpdater needs to implement getStats
[ https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105549#comment-14105549 ] Owen O'Malley commented on HIVE-7663: - +1 although you need to look at the unit test failures. OrcRecordUpdater needs to implement getStats Key: HIVE-7663 URL: https://issues.apache.org/jira/browse/HIVE-7663 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7663.patch OrcRecordUpdater.getStats currently returns null. It needs to track the stats and return a valid value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7680) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false)
[ https://issues.apache.org/jira/browse/HIVE-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7680: -- Attachment: HIVE-7680.2.patch prev build #430 failed with message HIVE-7680 is not Patch Available. Exiting. + exit 1 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/430/console Attached patch #2 again Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false) - Key: HIVE-7680 URL: https://issues.apache.org/jira/browse/HIVE-7680 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.13.1 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-7680.2.patch, HIVE-7680.2.patch, HIVE-7680.patch 1. Some JDBC clients call the method setEscapeProcessing(false) (e.g. SQL Workbench). It looks like setEscapeProcessing(false) should do nothing. So, let's do nothing instead of throwing SQLException. 2. getMoreResults is needed in case a Statement returns several ResultSets. Hive does not support multiple ResultSets, so this method can safely always return false. 3. getUpdateCount. Currently this method always returns 0. Hive cannot tell us how many rows were inserted. According to the JDBC spec it should return -1 if the current result is a ResultSet object or there are no more results. If this method returns 0, then after executing an insert statement the JDBC client shows that 0 rows were inserted, which is not true. If this method returns -1, then the JDBC client runs the insert statement and shows that it executed successfully with no results returned. I think the latter behaviour is more correct. 4. Some methods in the Statement class should throw SQLFeatureNotSupportedException if they are not supported. The current implementation throws SQLException instead, which means a database access error. -- This message was sent by Atlassian JIRA (v6.2#6252)
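The behavior the issue proposes can be sketched as follows. This is a hedged stand-in in plain Java so it compiles without a Hive dependency; the real HiveStatement implements java.sql.Statement, whose getMoreResults()/getUpdateCount() contracts are what points 2 and 3 invoke.

```java
// Sketch of the proposed HiveStatement behavior (stand-in class, not Hive's).
public class StatementSketch {
    private boolean escapeProcessing = true;

    // Point 1: accept the call as a no-op instead of throwing SQLException.
    public void setEscapeProcessing(boolean enable) {
        this.escapeProcessing = enable; // remembered but otherwise ignored
    }

    // Point 2: Hive never produces multiple ResultSets, so false is always correct.
    public boolean getMoreResults() {
        return false;
    }

    // Point 3: per the JDBC contract, -1 means "no more results"; Hive cannot
    // report an affected-row count, so -1 is more truthful than 0 after an INSERT.
    public int getUpdateCount() {
        return -1;
    }

    public static void main(String[] args) {
        StatementSketch s = new StatementSketch();
        s.setEscapeProcessing(false);           // no exception thrown
        System.out.println(s.getMoreResults()); // false
        System.out.println(s.getUpdateCount()); // -1
    }
}
```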
[jira] [Commented] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105602#comment-14105602 ] Ashutosh Chauhan commented on HIVE-7812: Left some comments on RB. Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch, HIVE-7812.patch Currently CombineHiveInputFormat complains when called on an ACID directory. Modify CombineHiveInputFormat so that HiveInputFormat is used instead if the directory is in ACID format. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24909: HIVE-7807: Refer to umask property using FsPermission.UMASK_LABEL
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24909/#review51186 --- Ship it! Ship It! - Brock Noland On Aug. 20, 2014, 8:37 p.m., Venki Korukanti wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24909/ --- (Updated Aug. 20, 2014, 8:37 p.m.) Review request for hive and Thejas Nair. Bugs: HIVE-7807 https://issues.apache.org/jira/browse/HIVE-7807 Repository: hive-git Description --- Refer to JIRA HIVE-7807 and HIVE-7001 for details. Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestUtilitiesDfs.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/test/org/apache/hadoop/hive/ql/exec/TestUtilities.java bf3fd88 Diff: https://reviews.apache.org/r/24909/diff/ Testing --- Note: I had to create a separate test file for using MiniDFS in itests module. Existing TestUtilities in ql needs hadoop-test and few other dependencies which are not needed so far. So decided not to add extra test dependencies in ql/pom.xml for just one test. Thanks, Venki Korukanti
[jira] [Updated] (HIVE-7646) Modify parser to support new grammar for Insert,Update,Delete
[ https://issues.apache.org/jira/browse/HIVE-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7646: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch checked in. Thanks Eugene. Modify parser to support new grammar for Insert,Update,Delete - Key: HIVE-7646 URL: https://issues.apache.org/jira/browse/HIVE-7646 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7646.1.patch, HIVE-7646.2.patch, HIVE-7646.3.patch, HIVE-7646.patch Need the parser to recognize constructs such as: {code:sql} INSERT INTO Cust (Customer_Number, Balance, Address) VALUES (101, 50.00, '123 Main Street'), (102, 75.00, '123 Pine Ave'); {code} {code:sql} DELETE FROM Cust WHERE Balance > 5.0 {code} {code:sql} UPDATE Cust SET column1=value1,column2=value2,... WHERE some_column=some_value {code} Also useful: {code:sql} select a,b from values((1,2),(3,4)) as FOO(a,b) {code} This makes writing tests easier. Some references: http://dev.mysql.com/doc/refman/5.6/en/insert.html http://msdn.microsoft.com/en-us/library/dd776382.aspx http://www.postgresql.org/docs/9.1/static/sql-values.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7281) DbTxnManager acquiring wrong level of lock for dynamic partitioning
[ https://issues.apache.org/jira/browse/HIVE-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105624#comment-14105624 ] Ashutosh Chauhan commented on HIVE-7281: Filing a separate ticket and unlinking it from this one is fine, only that it will be left to the mercy of someone who will do it : ) Fixing the root cause is always better (in this case deleting the error-prone DummyPartitions), but the immediate bug can be fixed by the current patch. +1 DbTxnManager acquiring wrong level of lock for dynamic partitioning --- Key: HIVE-7281 URL: https://issues.apache.org/jira/browse/HIVE-7281 Project: Hive Issue Type: Bug Components: Locking, Transactions Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7281.patch Currently DbTxnManager.acquireLocks() locks the DUMMY_PARTITION for dynamic partitioning. But this is not adequate. This will not prevent drop operations on partitions being written to. The lock should be at the table level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7830) CBO: Some UDF(case, lead, lag..) doesn't get translated correctly
Laljo John Pullokkaran created HIVE-7830: Summary: CBO: Some UDF(case, lead, lag..) doesn't get translated correctly Key: HIVE-7830 URL: https://issues.apache.org/jira/browse/HIVE-7830 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7827) [CBO] null expr in select list is not handled correctly
[ https://issues.apache.org/jira/browse/HIVE-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7827: --- Status: Patch Available (was: Open) [CBO] null expr in select list is not handled correctly --- Key: HIVE-7827 URL: https://issues.apache.org/jira/browse/HIVE-7827 Project: Hive Issue Type: Bug Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7827.patch select null from t1 fails -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7829: --- Attachment: HIVE-7829.1.patch Entity.getLocation can throw an NPE --- Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7829.1.patch, HIVE-7892.patch It's possible for the getDataLocation methods that Entity.getLocation calls to return null, and as such Entity.getLocation can throw an NPE -- This message was sent by Atlassian JIRA (v6.2#6252)
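The null-guard shape of the fix can be sketched as below. This is a hedged illustration: the `Table` type here is a minimal stub standing in for Hive's metadata classes, not the real `org.apache.hadoop.hive.ql.metadata.Table`; the point is only that a possibly-null data location is propagated instead of dereferenced.

```java
import java.net.URI;

// Sketch of guarding Entity.getLocation against a null data location.
public class EntitySketch {
    // Hypothetical stub for the object whose getDataLocation may return null.
    static class Table {
        private final URI dataLocation; // may legitimately be null
        Table(URI loc) { this.dataLocation = loc; }
        URI getDataLocation() { return dataLocation; }
    }

    private final Table table;
    EntitySketch(Table t) { this.table = t; }

    // Unguarded code of the form table.getDataLocation().toString() NPEs;
    // returning null instead hands the decision to the caller.
    public URI getLocation() {
        return table == null ? null : table.getDataLocation();
    }

    public static void main(String[] args) {
        EntitySketch e = new EntitySketch(new Table(null));
        System.out.println(e.getLocation()); // null, no NPE
    }
}
```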
[jira] [Updated] (HIVE-7571) RecordUpdater should read virtual columns from row
[ https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7571: - Status: Open (was: Patch Available) Found an issue in that OrcRecordUpdater isn't properly selecting the object inspector to use. RecordUpdater should read virtual columns from row -- Key: HIVE-7571 URL: https://issues.apache.org/jira/browse/HIVE-7571 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7571.WIP.patch, HIVE-7571.patch Currently RecordUpdater.update and delete take rowid and original transaction as parameters. These values are already present in the row as part of the new ROW__ID virtual column in HIVE-7513, and thus can be read by the writer from there. And the writer will already have to handle skipping ROW__ID when writing, so it needs to be aware of that column anyway. We could instead read the values from ROW__ID and then remove it from the object inspector in FileSinkOperator, but this will be hard in the vectorization case where rows are being dealt with 10k at a time. For these reasons it makes more sense to do this work in the writer. -- This message was sent by Atlassian JIRA (v6.2#6252)
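The API change described above can be sketched as follows. This is a hedged stand-in: `RowId`, the `Map`-shaped row, and the method signature are all simplified illustrations of the idea (identifiers travel inside the row's ROW__ID virtual column), not Hive's actual ORC ACID classes.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of HIVE-7571: the updater reads rowid/original transaction from the
// row's ROW__ID virtual column instead of taking them as parameters.
public class UpdaterSketch {
    static final String ROW_ID_COL = "ROW__ID";

    // Hypothetical stand-in for the ROW__ID struct from HIVE-7513.
    static class RowId {
        final long originalTxn;
        final long rowId;
        RowId(long t, long r) { originalTxn = t; rowId = r; }
    }

    // Old shape: update(originalTxn, rowId, row). New shape: just the row.
    static RowId update(Map<String, Object> row) {
        RowId id = (RowId) row.get(ROW_ID_COL);
        // The writer already has to skip ROW__ID when persisting data columns.
        row.remove(ROW_ID_COL);
        return id;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put(ROW_ID_COL, new RowId(42L, 7L));
        row.put("value", "x");
        RowId id = update(row);
        System.out.println(id.rowId);                    // 7
        System.out.println(row.containsKey(ROW_ID_COL)); // false
    }
}
```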
Re: Review Request 24918: HIVE-7791 - Enable tests on Spark branch (1) [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24918/#review51187 --- Hey Brock, thanks, just some pretty minor comments below. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/24918/#comment89218 I'm just curious, was this to fix a test? Also is it necessary to catch and wrap in RuntimeException here? ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java https://reviews.apache.org/r/24918/#comment89216 This is a bit strange, as the resolve() method returns null. Not sure if we should assign to a variable as of now? - Szehon Ho On Aug. 21, 2014, 12:26 a.m., Brock Noland wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24918/ --- (Updated Aug. 21, 2014, 12:26 a.m.) Review request for hive. Repository: hive-git Description --- Enable tests Diffs - itests/src/test/resources/testconfiguration.properties ecb8b74 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java d16f1be ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java dc621cf ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregator.java 026f4e0 ql/src/test/results/clientpositive/spark/alter_merge_orc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/alter_merge_stats_orc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket3.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket4.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/count.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/create_merge_compressed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ctas.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/custom_input_output_format.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/disable_merge_for_bucketing.q.out PRE-CREATION Diff: 
https://reviews.apache.org/r/24918/diff/ Testing --- Verified output vs MR Thanks, Brock Noland
Re: Review Request 24934: HIVE-7829 - Entity.getLocation can throw an NPE
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24934/#review51189 --- Ship it! Ship It! - Szehon Ho On Aug. 21, 2014, 3:47 p.m., Brock Noland wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24934/ --- (Updated Aug. 21, 2014, 3:47 p.m.) Review request for hive and Szehon Ho. Repository: hive-git Description --- Very simple change to return null if location cannot be obtained Diffs - ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java aafeaab Diff: https://reviews.apache.org/r/24934/diff/ Testing --- Thanks, Brock Noland
[jira] [Commented] (HIVE-7829) Entity.getLocation can throw an NPE
[ https://issues.apache.org/jira/browse/HIVE-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105642#comment-14105642 ] Szehon Ho commented on HIVE-7829: - +1 pending tests Entity.getLocation can throw an NPE --- Key: HIVE-7829 URL: https://issues.apache.org/jira/browse/HIVE-7829 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7829.1.patch, HIVE-7892.patch It's possible for the getDataLocation methods that Entity.getLocation calls to return null, and as such Entity.getLocation can throw an NPE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7736: -- Attachment: HIVE-7736.3.patch regenerate the patch (rebase), wait for QA tests improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, HIVE-7736.3.patch The current implementation of columns stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' columns stats for a table with 2 columns but 2000 partitions. ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the columns stats update speed for all the partitions of a table -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7807) Refer to umask property using FsPermission.UMASK_LABEL.
[ https://issues.apache.org/jira/browse/HIVE-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7807: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you very much Venki! I have committed this to trunk! Refer to umask property using FsPermission.UMASK_LABEL. --- Key: HIVE-7807 URL: https://issues.apache.org/jira/browse/HIVE-7807 Project: Hive Issue Type: Bug Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.14.0 Attachments: HIVE-7807.1.patch Currently in {{org.apache.hadoop.hive.ql.exec.Utilities.createDirsWithPermission}} the umask property is referred to using {{fs.permissions.umask-mode}}, which is only available in Hadoop 2.x. The property {{dfs.umaskmode}} is used in 1.x for the same purpose. Also, dfs.umaskmode was not deprecated in 1.x according to HADOOP-8727. This JIRA is to change the umask property references to {{FsPermission.UMASK_LABEL}}, which always points to the proper property in each Hadoop version (0.23.x, 1.x, 2.x). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Attachment: HIVE-7654.7.patch regenerate the patch (rebase), wait for QA tests A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
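To make "extrapolating" a missing column stat concrete, here is a purely illustrative sketch; the actual method HIVE-7654 proposes is specified in the attached design doc, not here. It shows one simple scheme for an additive stat: scale the per-partition average from the partitions that have stats up to the full partition count.

```java
import java.util.Arrays;

// Illustrative only: derive a table-level additive stat (e.g. a null count)
// when column stats exist for just some partitions, by assuming the missing
// partitions behave like the known ones on average.
public class StatsExtrapolation {
    static long extrapolateNullCount(long[] knownNullCounts, int totalPartitions) {
        long sum = Arrays.stream(knownNullCounts).sum();
        // Scale the per-partition average up to all partitions.
        return Math.round((double) sum / knownNullCounts.length * totalPartitions);
    }

    public static void main(String[] args) {
        // Stats known for 2 of 4 partitions: average 15 nulls/partition -> 60 total.
        System.out.println(extrapolateNullCount(new long[]{10, 20}, 4)); // 60
    }
}
```

Non-additive stats (min, max, distinct-value counts) need different aggregation rules, which is why the real proposal handles each stat type separately.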
[jira] [Commented] (HIVE-7820) union_null.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105658#comment-14105658 ] Szehon Ho commented on HIVE-7820: - +1, thanks Brock union_null.q is not deterministic -- Key: HIVE-7820 URL: https://issues.apache.org/jira/browse/HIVE-7820 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7820.1.patch, HIVE-7820.1.patch union_null.q selects 10 rows from a subquery which returns many rows. Since the subquery does not have an ORDER BY, the 10 rows returned vary. This problem exists on trunk and spark. We'll fix on trunk and merge to spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7831) Research commented out unset in Utiltities
Brock Noland created HIVE-7831: -- Summary: Research commented out unset in Utiltities Key: HIVE-7831 URL: https://issues.apache.org/jira/browse/HIVE-7831 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7671) What happened to HCatalog?
[ https://issues.apache.org/jira/browse/HIVE-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned HIVE-7671: Assignee: Alan Gates What happened to HCatalog? -- Key: HIVE-7671 URL: https://issues.apache.org/jira/browse/HIVE-7671 Project: Hive Issue Type: Bug Reporter: Sebb Assignee: Alan Gates According to the Incubator website, HCatalog graduated to become part of Hive in Feb 2013, yet I could find no references to HCatalog on the Hive website, and there are still downloads on the Incubator mirrors: https://dist.apache.org/repos/dist/release/incubator/hcatalog/ The Incubator HCatalog website redirects to Hive, so it would help if there were some mention of what happened to it. Also if the HCatalog downloads are no longer relevant, they should be deleted from the incubator mirror. They will continue to be available from the archives website if there is a need to keep links to them for historic purposes: http://archive.apache.org/dist/incubator/hcatalog/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105681#comment-14105681 ] Alan Gates commented on HIVE-7828: -- +1, looks correct. TestCLIDriver.parquet_join.q is failing on trunk Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7663) OrcRecordUpdater needs to implement getStats
[ https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105690#comment-14105690 ] Alan Gates commented on HIVE-7663: -- TestDDLWithRemoteMetastoreSecondNamenode and TestHCatLoader pass fine for me when I run them locally. The other two fail for me on trunk and with the patch, both on my mac and on linux. I don't think any of those test anything I changed, since nothing but the streaming library is using the OrcRecordUpdater at this point. OrcRecordUpdater needs to implement getStats Key: HIVE-7663 URL: https://issues.apache.org/jira/browse/HIVE-7663 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7663.patch OrcRecordUpdater.getStats currently returns null. It needs to track the stats and return a valid value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Status: Open (was: Patch Available) A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Attachment: HIVE-7654.8.patch A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7736: -- Status: Open (was: Patch Available) improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, HIVE-7736.3.patch The current implementation of columns stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' columns stats for a table with 2 columns but 2000 partitions. ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the columns stats update speed for all the partitions of a table -- This message was sent by Atlassian JIRA (v6.2#6252)
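To see why updating 2000 partitions takes 600+ seconds, compare one metastore round trip per partition against a single batched call. The class and method names below are invented for illustration; they are not the real metastore client API:

```python
class FakeMetastore:
    """Stand-in for a metastore client that counts round trips
    (hypothetical API, only to illustrate why batching helps)."""
    def __init__(self):
        self.calls = 0

    def update_partition_column_statistics(self, part, stats):
        self.calls += 1  # one RPC per partition

    def update_partition_column_statistics_batch(self, all_stats):
        self.calls += 1  # one RPC for the whole table

def update_one_by_one(ms, parts):
    # Current shape of the problem: N partitions -> N round trips.
    for p in parts:
        ms.update_partition_column_statistics(p, {})

def update_batched(ms, parts):
    # Desired shape: a single call carrying all partitions' stats.
    ms.update_partition_column_statistics_batch({p: {} for p in parts})
```

With 2000 partitions the first approach issues 2000 calls, the second one.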
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Status: Patch Available (was: Open) wait for QA A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch In a PARTITIONED table, there are many partitions. For example: create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions: partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state, locid of partition(year='2001'): analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column stats for the whole table loc_orc. However, we may not have the column stats for some partitions, e.g., partition(year='2002'), and we may not have the column stats for some columns, e.g., zip bigint for partition(year='2001'). We propose a method to extrapolate the missing column stats for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Hive on Tez Counters
I'll let Hive folks answer the questions about the Hive counters. In terms of the CPU counter - that was a bug in Tez 0.4.0, which has been fixed in 0.5.0. COMMITTED_HEAP_BYTES just represents the memory available to the JVM (Runtime.getRuntime().totalMemory()). This will only vary if the VM is started with different Xms and Xmx options. In terms of Tez, the application logs are currently the best place. Hive may expose these in a more accessible manner though. On Wed, Aug 20, 2014 at 11:16 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: Hi, I need info on where I can get detailed job counters for Hive on Tez. I am running this on an HDP cluster with Hive 0.13 and see only the following job counters for Hive on Tez in the YARN application logs, which I got through (yarn logs -applicationId ...): a. I cannot see any ReduceOperator counters, and DESERIALIZE_ERRORS is the only counter present in MapOperator. b. CPU_MILLISECONDS is in some cases negative. Is CPU_MILLISECONDS accurate? c. What does COMMITTED_HEAP_BYTES indicate? d. Is there any other place I should be checking the counters? [[File System Counters FILE: BYTES_READ=512, FILE: BYTES_WRITTEN=3079881, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS: BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS: LARGE_READ_OPS=0, HDFS: WRITE_OPS=0] [org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543, GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*, PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248, COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543, OUTPUT_RECORDS=222543, OUTPUT_BYTES=23543896, OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0] [*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter DESERIALIZE_ERRORS=0]] Thanks Suma
[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7736: -- Status: Patch Available (was: Open) wait for QA tests improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, HIVE-7736.3.patch, HIVE-7736.4.patch The current implementation of columns stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' columns stats for a table with 2 columns but 2000 partitions. ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the columns stats update speed for all the partitions of a table -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7736: -- Attachment: HIVE-7736.4.patch improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, HIVE-7736.3.patch, HIVE-7736.4.patch The current implementation of columns stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' columns stats for a table with 2 columns but 2000 partitions. ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the columns stats update speed for all the partitions of a table -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7735) Implement Char, Varchar in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105711#comment-14105711 ] Hive QA commented on HIVE-7735: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663414/HIVE-7735.3.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6098 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/444/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/444/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-444/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663414 Implement Char, Varchar in ParquetSerDe --- Key: HIVE-7735 URL: https://issues.apache.org/jira/browse/HIVE-7735 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Labels: Parquet Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, HIVE-7735.2.patch, HIVE-7735.3.patch, HIVE-7735.patch This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe. 
Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105720#comment-14105720 ] Szehon Ho commented on HIVE-7384: - Thanks for the comment, I had a similar thought initially, but then saw that sortByKey does a re-partitioning (range-partition), as it has to achieve total order. I think we need something that does sorting within a partition. Research into reduce-side join [Spark Branch] - Key: HIVE-7384 URL: https://issues.apache.org/jira/browse/HIVE-7384 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Szehon Ho Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, sales_products.txt, sales_stores.txt Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complication in reduce-side join, which extensively utilizes key tag and shuffle behavior. Our design principle prefers to making Hive implementation work out of box also, which might requires new functionality from Spark. The tasks is to research into this area, identifying requirements for Spark community and the work to be done on Hive to make reduce-side join work. A design doc might be needed for this. For more information, please refer to the overall design doc on wiki. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105752#comment-14105752 ] Brock Noland commented on HIVE-7828: Thanks Alan TestCLIDriver.parquet_join.q is failing on trunk Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Attachment: HIVE-7702.1-spark.patch Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Status: Open (was: Patch Available) Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7702: --- Status: Patch Available (was: Open) Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7831) Research commented out unset in Utiltities
[ https://issues.apache.org/jira/browse/HIVE-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7831: --- Description: We did the following in HIVE-7370 {noformat} // TODO HIVE-7831 // conf.unset(FsPermission.UMASK_LABEL); {noformat} We should understand that. Research commented out unset in Utiltities -- Key: HIVE-7831 URL: https://issues.apache.org/jira/browse/HIVE-7831 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland We did the following in HIVE-7370 {noformat} // TODO HIVE-7831 // conf.unset(FsPermission.UMASK_LABEL); {noformat} We should understand that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105759#comment-14105759 ] Chinna Rao Lalam commented on HIVE-7702: insert_into2.q.out is corrected. Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7222) Support timestamp column statistics in ORC and extend PPD for timestamp
[ https://issues.apache.org/jira/browse/HIVE-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7222: - Attachment: HIVE-7222.1.patch Renamed the patch for Hive QA to pickup. Support timestamp column statistics in ORC and extend PPD for timestamp --- Key: HIVE-7222 URL: https://issues.apache.org/jira/browse/HIVE-7222 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Daniel Dai Labels: orcfile Attachments: HIVE-7222-1.patch, HIVE-7222.1.patch Add column statistics for timestamp columns in ORC. Also extend predicate pushdown to support timestamp column evaluation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7735) Implement Char, Varchar in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7735: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks Mohit for the contribution! Implement Char, Varchar in ParquetSerDe --- Key: HIVE-7735 URL: https://issues.apache.org/jira/browse/HIVE-7735 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Labels: Parquet Fix For: 0.14.0 Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, HIVE-7735.2.patch, HIVE-7735.3.patch, HIVE-7735.patch This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe. Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
Prasanth J created HIVE-7832: Summary: Do ORC dictionary check at a finer level and preserve encoding across stripes Key: HIVE-7832 URL: https://issues.apache.org/jira/browse/HIVE-7832 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Currently the ORC dictionary check happens while writing the stripe: just before writing the stripe, if the ratio of dictionary entries to total non-null rows is greater than a threshold, the dictionary is discarded. Also, the decision of whether or not to use a dictionary is not preserved across stripes. This sometimes leads to a costly O(log n) insertion cost for each stripe when there are too many distinct keys. -- This message was sent by Atlassian JIRA (v6.2#6252)
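The per-stripe check described above boils down to a simple ratio test. A minimal sketch, assuming the cutoff is the 0.8 default of hive.exec.orc.dictionary.key.size.threshold (the threshold value is an assumption here, not stated in the issue):

```python
def keep_dictionary(distinct_entries, non_null_rows, threshold=0.8):
    """Sketch of ORC's per-stripe dictionary decision: keep dictionary
    encoding only while the ratio of distinct dictionary entries to
    non-null rows stays at or below the threshold. The 0.8 default is
    assumed, mirroring hive.exec.orc.dictionary.key.size.threshold."""
    if non_null_rows == 0:
        return True  # nothing written yet; keep trying the dictionary
    return distinct_entries / non_null_rows <= threshold
```

A column with 100 distinct values over 1000 rows keeps its dictionary; one with 900 distinct values (nearly all unique) discards it, avoiding the O(log n) insertions the issue describes.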
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7832: - Attachment: HIVE-7832.1.patch Do ORC dictionary check at a finer level and preserve encoding across stripes - Key: HIVE-7832 URL: https://issues.apache.org/jira/browse/HIVE-7832 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7832.1.patch Currently the ORC dictionary check happens while writing the stripe: just before writing the stripe, if the ratio of dictionary entries to total non-null rows is greater than a threshold, the dictionary is discarded. Also, the decision of whether or not to use a dictionary is not preserved across stripes. This sometimes leads to a costly O(log n) insertion cost for each stripe when there are too many distinct keys. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105813#comment-14105813 ] Mostafa Mokhtar commented on HIVE-7723: --- Ping! [~gopalv] [~hagleitn] Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7723.1.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled it showed that ReadEntity.equals is taking ~40% of the CPU. ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match; a HashMap is a better option for this case, as Set doesn't have a get method. Also, for ReadEntity, equals is case-insensitive while hashCode is case-sensitive, which is an undesired behavior. {code}
public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) {
  // If the input is already present, make sure the new parent is added to the input.
  if (inputs.contains(newInput)) {
    for (ReadEntity input : inputs) {
      if (input.equals(newInput)) {
        if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
          input.getParents().addAll(newInput.getParents());
          input.setDirect(input.isDirect() || newInput.isDirect());
        }
        return input;
      }
    }
    assert false;
  } else {
    inputs.add(newInput);
    return newInput;
  }
  // make compile happy
  return null;
}
{code} This is the query used: {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk = cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status <> cd2.cd_marital_status and i_color in
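The lookup inefficiency and the proposed fix are easy to mimic outside Java. Below is an illustrative Python analogue (not Hive code): the first function reproduces the contains-then-iterate pattern, which costs O(n) per lookup, while the second keys a hash map by the entity so the canonical stored instance comes back in O(1) average time, which is exactly the get operation that Set lacks.

```python
def add_input_linear(inputs, new_input):
    # Mirrors the Java snippet: scan the whole collection to find the
    # stored element equal to new_input -- O(n) on every call.
    for existing in inputs:
        if existing == new_input:
            return existing  # the real code merges parents here
    inputs.append(new_input)
    return new_input

def add_input_hashed(inputs, new_input):
    # Shape of the proposed fix: a dict keyed by the entity returns the
    # canonical stored instance in O(1) average time (or inserts it).
    return inputs.setdefault(new_input, new_input)
```

With thousands of partition ReadEntity objects, the linear scan runs once per added input, which is what showed up as ~40% of CPU in the profile.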
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7832: - Status: Patch Available (was: Open) Do ORC dictionary check at a finer level and preserve encoding across stripes - Key: HIVE-7832 URL: https://issues.apache.org/jira/browse/HIVE-7832 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7832.1.patch Currently the ORC dictionary check happens while writing the stripe: just before writing the stripe, if the ratio of dictionary entries to total non-null rows is greater than a threshold, the dictionary is discarded. Also, the decision of whether or not to use a dictionary is not preserved across stripes. This sometimes leads to a costly O(log n) insertion cost for each stripe when there are too many distinct keys. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7821) StarterProject: enable groupby4.q
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105822#comment-14105822 ] Brock Noland commented on HIVE-7821: Hi [~chinnalalam] just an FYI that HIVE-7793 is available! :) StarterProject: enable groupby4.q - Key: HIVE-7821 URL: https://issues.apache.org/jira/browse/HIVE-7821 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105838#comment-14105838 ] Brock Noland commented on HIVE-7702: Hi Chinna, Thank you! Using git and the following command I was able to compare the results against MR: {noformat}
git status | awk '/new file:/ {print $NF}' | xargs -I {} sh -c 'diff {} $(echo {} | perl -pe s@/spark@@g)'
{noformat} Do you know if the differences are due to sorting order or correctness? Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105842#comment-14105842 ] Lefty Leverenz commented on HIVE-7373: -- Thanks [~spena], that's clear and doesn't need examples except for math functions. Even those should probably just have examples in these comments, then the wiki can refer to them here. Did you implement the same thing BigDecimal does? {quote} This is what BigDecimal returns when doing some basic maths: 3.140 * 1. = 3.140 3.140 / 1. = 3.14 3.140 + 1. = 4.1400 3.140 - 1. = 2.1400 3.140 * 3.140 = 9.859600 {quote} Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch Currently Hive blindly removes trailing zeros of a decimal input number as sort of standardization. This is questionable in theory and problematic in practice. 1. In decimal context, number 3.14 has a different semantic meaning from number 3.14. Removing trailing zeroes makes the meaning lost. 2. In a extreme case, 0.0 has (p, s) as (1, 1). Hive removes trailing zeros, and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with integer part. Therefore, I propose Hive preserve the trailing zeroes (up to what the scale allows). With this, in above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
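The BigDecimal scale behavior discussed above can be reproduced with Python's decimal module, which follows the same rules; this is only an illustration of the semantics the patch preserves, since Hive itself uses Java's BigDecimal/HiveDecimal:

```python
from decimal import Decimal

# Trailing zeros are part of a decimal's scale, so they survive arithmetic:
a = Decimal("3.140")  # scale 3
b = Decimal("1.000")  # scale 3

print(a * b)  # 3.140000 -- multiplication adds the scales (3 + 3 = 6)
print(a + b)  # 4.140    -- addition keeps the larger scale
print(a - b)  # 2.140

# Values compare equal even when their scales differ, which is why dropping
# trailing zeros loses information without changing equality:
print(Decimal("0.0") == Decimal("0.00"))  # True
```

This also shows the problem case from the description: 0.0 and 0.00 are equal in value, but normalizing 0.0 to 0 changes its (precision, scale) from (1, 1) to (1, 0).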