[jira] [Created] (HIVE-2245) Make CombineHiveInputFormat the default hive.input.format
Make CombineHiveInputFormat the default hive.input.format - Key: HIVE-2245 URL: https://issues.apache.org/jira/browse/HIVE-2245 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Carl Steinbach Assignee: Carl Steinbach We should use CombineHiveInputFormat as the default Hive input format. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
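For context, a sketch of what changing the default would amount to in configuration terms. The property name comes from the issue title; the fully qualified class lives in the org.apache.hadoop.hive.ql.io package. Until the default changes, a user can opt in per deployment in hive-site.xml:

```xml
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <description>Combine small input files into larger splits to reduce mapper count.</description>
</property>
```

The same effect can be had per session with `SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;`.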
[jira] [Updated] (HIVE-2245) Make CombineHiveInputFormat the default hive.input.format
[ https://issues.apache.org/jira/browse/HIVE-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2245: - Attachment: HIVE-2245.1.patch.txt Make CombineHiveInputFormat the default hive.input.format - Key: HIVE-2245 URL: https://issues.apache.org/jira/browse/HIVE-2245 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-2245.1.patch.txt We should use CombineHiveInputFormat as the default Hive input format. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2245) Make CombineHiveInputFormat the default hive.input.format
[ https://issues.apache.org/jira/browse/HIVE-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2245: - Status: Patch Available (was: Open) Make CombineHiveInputFormat the default hive.input.format - Key: HIVE-2245 URL: https://issues.apache.org/jira/browse/HIVE-2245 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-2245.1.patch.txt We should use CombineHiveInputFormat as the default Hive input format. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2225) Purge expired events
[ https://issues.apache.org/jira/browse/HIVE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057770#comment-13057770 ] Hudson commented on HIVE-2225: -- Integrated in Hive-trunk-h0.21 #801 (See [https://builds.apache.org/job/Hive-trunk-h0.21/801/]) HIVE-2225. Purge expired metastore events (Ashutosh Chauhan via cws) cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1141430 Files : * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/EventCleanerTask.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java * /hive/trunk/conf/hive-default.xml * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartition.java Purge expired events Key: HIVE-2225 URL: https://issues.apache.org/jira/browse/HIVE-2225 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: hive-2225.patch, hive-2225_1.patch HIVE-2215 adds the ability to add events in metastore. These events needs to be purged as they have limited life. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-trunk-h0.21 #801
See https://builds.apache.org/job/Hive-trunk-h0.21/801/changes Changes: [cws] HIVE-2225. Purge expired metastore events (Ashutosh Chauhan via cws) -- [...truncated 31062 lines...] [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-06-30_04-47-17_514_7493051202110347098/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2011-06-30 04:47:20,589 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-06-30_04-47-17_514_7493051202110347098/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201106300447_2023674836.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: 
DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from https://builds.apache.org/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-06-30_04-47-22_073_3271855355106799760/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-06-30_04-47-22_073_3271855355106799760/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history 
file=https://builds.apache.org/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201106300447_1992199177.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable
FW: Small file problem and GenMRFileSink1
Hi, I'm not sure whether this belongs in hive-dev or hive-user. I have a folder with many small files, and I would like to reduce the number of files the way Hive merges its output. I tried to understand from the source of org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1 how to leverage the API to submit a job that merges output files. I think I was able to identify: private void createMergeJob(FileSinkOperator fsOp, GenMRProcContext ctx, String finalName) throws SemanticException as the entry point to the logic that performs the operation, but I did not find documentation on how to use it. Is there an example that demonstrates the use of this API call?
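One commonly suggested alternative to calling createMergeJob() directly is to let Hive's own output-merge stage do the work by rewriting the data through an INSERT with the merge settings enabled. A hedged sketch (the table names are made up; the hive.merge.* properties are the standard ones from hive-default.xml):

```sql
-- Enable merging of small output files for map-only and map-reduce jobs.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
-- Target size of each merged file, in bytes (illustrative value).
SET hive.merge.size.per.task=256000000;

-- Rewriting the data triggers the merge stage on the job's output.
INSERT OVERWRITE TABLE many_small_files_table
SELECT * FROM many_small_files_table;
```

This avoids depending on an internal optimizer class whose signature may change between releases.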
[jira] [Updated] (HIVE-2040) the retry logic in Hive's concurrency is not working correctly.
[ https://issues.apache.org/jira/browse/HIVE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2040: - Fix Version/s: 0.8.0 the retry logic in Hive's concurrency is not working correctly. Key: HIVE-2040 URL: https://issues.apache.org/jira/browse/HIVE-2040 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Fix For: 0.8.0 Attachments: HIVE-2040.1.patch, HIVE-2040.2.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2030) isEmptyPath() to use ContentSummary cache
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2030: - Component/s: Query Processor Fix Version/s: 0.8.0 isEmptyPath() to use ContentSummary cache - Key: HIVE-2030 URL: https://issues.apache.org/jira/browse/HIVE-2030 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2030.1.patch, HIVE-2030.2.patch, HIVE-2030.3.patch addInputPaths() calls isEmptyPath() for every input path, and every such call is currently a DFS namenode call. By making isEmptyPath() use the cached ContentSummary, we should be able to avoid some namenode calls and reduce latency in the case of multiple partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
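The caching idea above can be sketched as follows. This is a minimal illustration, not Hive's implementation: the class and method names (ContentSummaryCache, is_empty_path) are invented for the sketch, standing in for the real isEmptyPath()/ContentSummary pair.

```python
# Sketch of the HIVE-2030 idea: memoize per-path content summaries so that
# repeated emptiness checks reuse one expensive (namenode-like) call per path.

class ContentSummary:
    """Stand-in for Hadoop's ContentSummary: total bytes and file count."""
    def __init__(self, length, file_count):
        self.length = length
        self.file_count = file_count

class ContentSummaryCache:
    def __init__(self, fetch):
        self.fetch = fetch      # the expensive call, e.g. a namenode RPC
        self.cache = {}
        self.calls = 0          # counts actual fetches, for illustration

    def get(self, path):
        # Only hit the backing store the first time a path is seen.
        if path not in self.cache:
            self.calls += 1
            self.cache[path] = self.fetch(path)
        return self.cache[path]

    def is_empty_path(self, path):
        s = self.get(path)
        return s.length == 0 and s.file_count == 0
```

With N partitions sharing paths already summarized during planning, the emptiness checks then cost no extra round trips.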
[jira] [Updated] (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1918: - Component/s: Metastore Add export/import facilities to the hive system --- Key: HIVE-1918 URL: https://issues.apache.org/jira/browse/HIVE-1918 Project: Hive Issue Type: New Feature Components: Metastore, Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Fix For: 0.8.0 Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf This is an enhancement request to add export/import features to hive. With this language extension, the user can export the data of the table - which may be located in different hdfs locations in case of a partitioned table - as well as the metadata of the table into a specified output location. This output location can then be moved over to another different hadoop/hive instance and imported there. This should work independent of the source and target metastore dbms used; for instance, between derby and mysql. For partitioned tables, the ability to export/import a subset of the partition must be supported. Howl will add more features on top of this: The ability to create/use the exported data even in the absence of hive, using MR or Pig. Please see http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
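A hedged sketch of what the proposed language extension would look like in use (table name, partition spec, and paths are made up for illustration; the exact grammar is defined by the patches attached to the issue):

```sql
-- On the source cluster: export data plus metadata to one output location.
EXPORT TABLE page_views PARTITION (ds='2011-06-30') TO '/tmp/pv_export';

-- After copying /tmp/pv_export to the target hadoop/hive instance:
IMPORT TABLE page_views_copy FROM '/tmp/pv_export';
```

Because the export location carries the metadata with the data, the import side does not need to share a metastore DBMS with the source.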
[jira] [Reopened] (HIVE-1851) wrong number of rows inserted reported by Hive
[ https://issues.apache.org/jira/browse/HIVE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-1851: -- wrong number of rows inserted reported by Hive -- Key: HIVE-1851 URL: https://issues.apache.org/jira/browse/HIVE-1851 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang The counters that hive uses to report the number of rows inserted are not very reliable. Unless they become correct, it is a good idea to disable these reports. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1815) The class HiveResultSet should implement batch fetching.
[ https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058011#comment-13058011 ] Carl Steinbach commented on HIVE-1815: -- Committed as HIVE-1851. The class HiveResultSet should implement batch fetching. Key: HIVE-1815 URL: https://issues.apache.org/jira/browse/HIVE-1815 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.8.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Assignee: Bennie Schut Fix For: 0.8.0 Attachments: HIVE-1815.1.patch.txt, HIVE-1815.2.patch.txt When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query. Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a resultset of anything other than a trivial size. It would be nice for the HiveResultSet to be able to fetch N rows from the server at a time, so that performance is suitable to support applications that provide human interaction. (From memory, I think it took me around 20 minutes to fetch 4000 rows.) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
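The performance argument above is purely about round trips, and can be sketched independently of the JDBC API. The FakeServer and fetch_all names below are invented for the sketch; they stand in for the Hive server and the result-set fetch loop, not for any real client API.

```python
# Sketch of HIVE-1815's batching idea: fetching N rows per server round trip
# instead of one row per trip.

class FakeServer:
    """Stand-in for a Hive server holding a finished query's result rows."""
    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0    # each fetch() models one network round trip

    def fetch(self, offset, n):
        self.round_trips += 1
        return self.rows[offset:offset + n]

def fetch_all(server, batch_size):
    """Drain the result set, batch_size rows per round trip."""
    out, offset = [], 0
    while True:
        batch = server.fetch(offset, batch_size)
        if not batch:           # empty batch signals end of results
            break
        out.extend(batch)
        offset += len(batch)
    return out
```

For the 4000-row case mentioned in the report, a batch size of 1 costs 4001 round trips while a batch size of 100 costs 41, which is where the order-of-magnitude speedup comes from.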
[jira] [Updated] (HIVE-2062) HivePreparedStatement.executeImmediate always throw exception
[ https://issues.apache.org/jira/browse/HIVE-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2062: - Fix Version/s: 0.8.0 HivePreparedStatement.executeImmediate always throw exception - Key: HIVE-2062 URL: https://issues.apache.org/jira/browse/HIVE-2062 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.7.0 Reporter: Alexey Diomin Assignee: Alexey Diomin Priority: Critical Fix For: 0.8.0 Attachments: HIVE-2062.patch executeImmediate: try { clearWarnings(); resultSet = null; client.execute(sql); } but: public void clearWarnings() throws SQLException { // TODO Auto-generated method stub throw new SQLException("Method not supported"); } As a result, every executeQuery() call on a prepared statement throws an exception. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2049) Push down partition pruning to JDO filtering for a subset of partition predicates
[ https://issues.apache.org/jira/browse/HIVE-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2049: - Component/s: Metastore Fix Version/s: 0.8.0 Push down partition pruning to JDO filtering for a subset of partition predicates - Key: HIVE-2049 URL: https://issues.apache.org/jira/browse/HIVE-2049 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2049.2.patch, HIVE-2049.3.patch, HIVE-2049.4.patch, HIVE-2049.patch Several tasks: - expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that PartitionPruner can use it for certain partition predicates. - figure out a safe subset of partition predicates that can be pushed down to JDO filtering. My initial testing of the 2nd part shows that equality predicates combined with AND/OR can be pushed down and return correct results. However, range queries on partition columns produced an NPE from the JDO execute() function. This might be a bug in the JDO query string itself, but we need to figure it out and heavily test all cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
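The "safe subset" test described above is a simple structural check on the predicate tree, which can be sketched like this. The tuple encoding and the pushable() name are invented for illustration; Hive's actual predicate representation is its own AST.

```python
# Sketch of HIVE-2049's safe-subset rule: only equality predicates combined
# with AND/OR are marked pushable to JDO filtering; anything containing a
# range comparison falls back to client-side partition pruning.

def pushable(expr):
    """expr is a nested tuple: ('and'|'or', left, right) or (op, column, value)."""
    op = expr[0]
    if op in ('and', 'or'):
        return pushable(expr[1]) and pushable(expr[2])
    # Range operators like '<', '>', '<=', '>=' triggered NPEs in testing,
    # so only '=' leaves are considered safe to push down.
    return op == '='
```

A pruner built on this would push the whole filter when pushable() is true and otherwise list and evaluate partitions client-side.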
Build failed in Jenkins: Hive-trunk-h0.21 #802
See https://builds.apache.org/job/Hive-trunk-h0.21/802/ -- [...truncated 33253 lines...] [artifact:deploy] Deploying to https://repository.apache.org/content/repositories/snapshots [artifact:deploy] [INFO] Retrieving previous build number from apache.snapshots.https [artifact:deploy] Uploading: org/apache/hive/hive-hbase-handler/0.8.0-SNAPSHOT/hive-hbase-handler-0.8.0-20110630.192630-26.jar to repository apache.snapshots.https at https://repository.apache.org/content/repositories/snapshots [artifact:deploy] Transferring 49K from apache.snapshots.https [artifact:deploy] Uploaded 49K [artifact:deploy] [INFO] Uploading project information for hive-hbase-handler 0.8.0-20110630.192630-26 [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.hive:hive-hbase-handler:0.8.0-SNAPSHOT' [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.hive:hive-hbase-handler' ivy-init-dirs: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build/ivy/lib/ivy-2.1.0.jar [get] Not modified - so not downloaded ivy-probe-antlib: ivy-init-antlib: ivy-init: ivy-resolve-maven-ant-tasks: [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ivy/ivysettings.xml ivy-retrieve-maven-ant-tasks: [ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead [ivy:cachepath] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ivy/ivysettings.xml mvn-taskdef: maven-publish-artifact: [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-http:jar:1.0-beta-2:runtime [artifact:deploy] Deploying to 
https://repository.apache.org/content/repositories/snapshots [artifact:deploy] [INFO] Retrieving previous build number from apache.snapshots.https [artifact:deploy] Uploading: org/apache/hive/hive-hwi/0.8.0-SNAPSHOT/hive-hwi-0.8.0-20110630.192632-26.jar to repository apache.snapshots.https at https://repository.apache.org/content/repositories/snapshots [artifact:deploy] Transferring 23K from apache.snapshots.https [artifact:deploy] Uploaded 23K [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.hive:hive-hwi:0.8.0-SNAPSHOT' [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.hive:hive-hwi' [artifact:deploy] [INFO] Uploading project information for hive-hwi 0.8.0-20110630.192632-26 ivy-init-dirs: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build/ivy/lib/ivy-2.1.0.jar [get] Not modified - so not downloaded ivy-probe-antlib: ivy-init-antlib: ivy-init: ivy-resolve-maven-ant-tasks: [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ivy/ivysettings.xml ivy-retrieve-maven-ant-tasks: [ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead [ivy:cachepath] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ivy/ivysettings.xml mvn-taskdef: maven-publish-artifact: [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-http:jar:1.0-beta-2:runtime [artifact:deploy] Deploying to https://repository.apache.org/content/repositories/snapshots [artifact:deploy] [INFO] Retrieving previous build number from apache.snapshots.https [artifact:deploy] Uploading: 
org/apache/hive/hive-jdbc/0.8.0-SNAPSHOT/hive-jdbc-0.8.0-20110630.192633-26.jar to repository apache.snapshots.https at https://repository.apache.org/content/repositories/snapshots [artifact:deploy] Transferring 56K from apache.snapshots.https [artifact:deploy] Uploaded 56K [artifact:deploy] [INFO] Uploading project information for hive-jdbc 0.8.0-20110630.192633-26 [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.hive:hive-jdbc:0.8.0-SNAPSHOT' [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.hive:hive-jdbc' ivy-init-dirs: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To:
[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2003: - Component/s: Security Fix Version/s: 0.8.0 LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it. -- Key: HIVE-2003 URL: https://issues.apache.org/jira/browse/HIVE-2003 Project: Hive Issue Type: Bug Components: Query Processor, Security Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt The table/partition being loaded is not being added to outputs in the LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the loa
[ https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2031: - Fix Version/s: 0.8.0 Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load statement. Key: HIVE-2031 URL: https://issues.apache.org/jira/browse/HIVE-2031 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.8.0 Attachments: HIVE-2031.2.patch, HIVE-2031.patch Load into the partitioned table having 2 partitions by specifying only one partition in the load statement is failing and logging the following exception message. {noformat} org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not found '21Oct' at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.init(BaseSemanticAnalyzer.java:685) at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736) at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151) at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764) at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) {noformat} 
The exception message should be corrected so that it states the actual root cause of the failure. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1966) mapjoin operator should not load hashtable for each new inputfile if the hashtable to be loaded is already there.
[ https://issues.apache.org/jira/browse/HIVE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1966: - Component/s: Query Processor Fix Version/s: 0.8.0 mapjoin operator should not load hashtable for each new inputfile if the hashtable to be loaded is already there. - Key: HIVE-1966 URL: https://issues.apache.org/jira/browse/HIVE-1966 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: He Yongqiang Assignee: Liyin Tang Fix For: 0.8.0 Attachments: HIVE-1966.1.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
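The improvement named in the title is a reuse check before the load, which can be sketched as follows. The HashTableLoader class and its method names are invented for the sketch; they stand in for the mapjoin operator's hash table handling, not Hive's actual classes.

```python
# Sketch of HIVE-1966's idea: when the mapjoin operator moves to a new input
# file, reload the small-table hash table only if a different one is needed.

class HashTableLoader:
    def __init__(self, load):
        self.load = load         # expensive load, e.g. deserializing from disk
        self.loaded_key = None   # identifies which hash table is resident
        self.table = None
        self.loads = 0           # counts actual loads, for illustration

    def get(self, key):
        if key != self.loaded_key:   # skip the reload when it is already there
            self.table = self.load(key)
            self.loaded_key = key
            self.loads += 1
        return self.table
```

Successive input files that map to the same hash table then cost nothing beyond the first load.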
[jira] [Updated] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2050: - Component/s: Query Processor Metastore Fix Version/s: 0.8.0 batch processing partition pruning process -- Key: HIVE-2050 URL: https://issues.apache.org/jira/browse/HIVE-2050 Project: Hive Issue Type: Sub-task Components: Metastore, Query Processor Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.4.patch, HIVE-2050.patch For partition predicates that cannot be pushed down to JDO filtering (HIVE-2049), we should fall back to the old approach of listing all partition names first and use Hive's expression evaluation engine to select the correct partitions. Then the partition pruner should hand Hive a list of partition names and return a list of Partition Object (this should be added to the Hive API). A possible optimization is that the the partition pruner should give Hive a set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and the JDO query should be formulated as range queries. Range queries are possible because the first step list all partition names in sorted order. It's easy to come up with a range and it is guaranteed that the JDO range query results should be equivalent to the query with a list of partition names. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
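The range optimization described above (turning a selected subset of sorted partition names into [lo, hi] ranges such as [ts=01, ts=11], [ts=20, ts=24]) can be sketched directly. The to_ranges() name is invented; it illustrates the compression step, not Hive's API.

```python
# Sketch of HIVE-2050's range idea: given all partition names in sorted order
# and the subset selected by expression evaluation, emit contiguous ranges so
# the JDO query can use range predicates instead of a long name list.

def to_ranges(all_sorted, selected):
    sel = set(selected)
    ranges, start = [], None
    for i, name in enumerate(all_sorted):
        if name in sel and start is None:
            start = name                       # open a new range
        if name not in sel and start is not None:
            ranges.append((start, all_sorted[i - 1]))  # close at previous name
            start = None
    if start is not None:                      # range runs to the end
        ranges.append((start, all_sorted[-1]))
    return ranges
```

Because the name list is sorted, a JDO range query over each (lo, hi) pair selects exactly the same partitions as the explicit name list.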
[jira] [Updated] (HIVE-2083) Bug: RowContainer was set to 1 in JoinUtils.
[ https://issues.apache.org/jira/browse/HIVE-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2083: - Component/s: Query Processor Fix Version/s: 0.8.0 Bug: RowContainer was set to 1 in JoinUtils. Key: HIVE-2083 URL: https://issues.apache.org/jira/browse/HIVE-2083 Project: Hive Issue Type: Bug Components: Query Processor Reporter: He Yongqiang Assignee: He Yongqiang Fix For: 0.8.0 Attachments: HIVE-2083.1.patch This makes skew joins extremely slow because the row container dumps every record to disk before using it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2244) Add a Plugin Developer Kit to Hive
[ https://issues.apache.org/jira/browse/HIVE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058092#comment-13058092 ] John Sichi commented on HIVE-2244: -- I'm tweaking a few things based on experience at the mini-hackathon, so this is just a first draft. Add a Plugin Developer Kit to Hive -- Key: HIVE-2244 URL: https://issues.apache.org/jira/browse/HIVE-2244 Project: Hive Issue Type: New Feature Components: UDF Reporter: John Sichi Fix For: 0.8.0 Attachments: HIVE-2244.patch See https://cwiki.apache.org/confluence/display/Hive/PluginDeveloperKit -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
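The space claim at the end of the description is simple arithmetic, sketched below. The function names and the example numbers are made up for illustration.

```python
# Back-of-the-envelope for HIVE-2246's space claim on the COLUMNS table.

def rows_without_dedupe(tables, partitions_per_table, cols_per_table):
    # One column list per table plus one per partition.
    return (tables + tables * partitions_per_table) * cols_per_table

def rows_with_dedupe(tables, cols_per_table):
    # Partitions reference their parent table's Column Descriptor,
    # so only the per-table lists remain (plus one descriptor row per table).
    return tables * cols_per_table
```

For 100 tables with 1000 partitions each and 10 columns on average, the row count drops from about a million to a thousand, which is the scalability win the issue is after.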
Review Request: HIVE-2246: Dedupe tables' column schemas from partitions in the metastore db
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/985/ --- Review request for hive. Summary --- We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. This addresses bug HIVE-2246. 
https://issues.apache.org/jira/browse/HIVE-2246 Diffs - trunk/metastore/if/hive_metastore.thrift 1140399 trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MDatabase.java 1140399 trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MFieldSchema.java 1140399 trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MIndex.java 1140399 trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartition.java 1140399 trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1140399 trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MTable.java 1140399 trunk/metastore/src/model/package.jdo 1140399 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1140399 trunk/ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 1140399 trunk/ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 1140399 trunk/ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1140399 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1140399 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/MetaDataFormatUtils.java 1140399 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1140399 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 1140399 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 1140399 Diff: https://reviews.apache.org/r/985/diff Testing --- Haven't run any unit tests yet, just qualitative testing so far. Thanks, Sohan
[jira] [Created] (HIVE-2247) CREATE TABLE RENAME PARTITION
CREATE TABLE RENAME PARTITION - Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Small file problem and GenMRFileSink1
If you are using hive trunk and your table is stored in RCFile format, you can run alter table src_rc_merge_test concatenate; On Jun 30, 2011, at 9:53 AM, David Ginzburg wrote: Hi, I'm not sure whether this belongs in hive-dev or hive-user. I have a folder with many small files. I would like to reduce the number of files the way hive merges output. I tried to understand from the source of org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1 how to leverage the API to submit a job that merges output files. I think I was able to identify: private void createMergeJob(FileSinkOperator fsOp, GenMRProcContext ctx, String finalName) throws SemanticException as the entry point to the logic that performs the operation, but I did not find documentation as to how to use it. Is there an example that simulates the use of this API call?
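The merge that GenMRFileSink1 schedules as a map-reduce job reduces, at its core, to concatenating the files in a folder. Below is a minimal local-filesystem sketch of that core step, as a stand-in for the HDFS case; it is not the actual Hive merge job, and the file names are made up for the illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SmallFileMerge {
    // Pure concatenation of file contents, in the order they were listed.
    static byte[] concat(List<byte[]> parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) out.write(p, 0, p.length);
        return out.toByteArray();
    }

    // Read every small file in dir and write one merged file.
    static Path mergeDirectory(Path dir, Path mergedFile) throws IOException {
        List<byte[]> parts = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) parts.add(Files.readAllBytes(f));
        }
        return Files.write(mergedFile, concat(parts));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("smallfiles");
        Files.write(dir.resolve("part-00000"), "a\n".getBytes());
        Files.write(dir.resolve("part-00001"), "b\n".getBytes());
        Path merged = mergeDirectory(dir, Files.createTempFile("merged", ".txt"));
        assert Files.readAllBytes(merged).length == 4;  // both files concatenated
    }
}
```

For an actual HDFS folder the same loop would go through org.apache.hadoop.fs.FileSystem instead of java.nio, but the concatenation logic is the same.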
Re: Review Request: HIVE-2226: Add API to metastore for table filtering based on table properties
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/#review928 --- trunk/metastore/if/hive_metastore.thrift https://reviews.apache.org/r/910/#comment2014 Using the form hive_filter_field_params__parameter key seems a little odd. Can't think of an easy way to handle this case though, so it should probably be okay. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/910/#comment2003 I don't think it's possible to create 2 tables with the same name. In which case, there shouldn't be a need for this check. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java https://reviews.apache.org/r/910/#comment2005 We should catch the case where the keyName is invalid - Paul On 2011-06-20 21:04:45, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/ --- (Updated 2011-06-20 21:04:45) Review request for hive and Paul Yang. Summary --- Create a function listTableNamesByFilter that returns a list of names for tables in a database that match a certain filter. The syntax of the filter is similar to the one created by HIVE-1609. You can filter the table list based on owner, last access time, or table parameter key/values. The filtering takes place at the JDO level for efficiency/speed. To create a new kind of table filter, add a constant to thrift.if and a branch in the if statement in generateJDOFilterOverTables() in ExpressionTree. 
Example filter statements include:

//translation: owner.matches(".*test.*") and lastAccessTime == 0
filter = Constants.HIVE_FILTER_FIELD_OWNER + " like \".*test.*\" and " + Constants.HIVE_FILTER_FIELD_LAST_ACCESS + " = 0";

//translation: owner = "test_user" and (parameters.get("retention") == 30 || parameters.get("retention") == 90)
filter = Constants.HIVE_FILTER_FIELD_OWNER + " = \"test_user\" and (" + Constants.HIVE_FILTER_FIELD_PARAMS + "retention = \"30\" or " + Constants.HIVE_FILTER_FIELD_PARAMS + "retention = \"90\")";

The filter can currently parse string or integer values, where values interpreted as strings must be in quotes. See the comments in IMetaStoreClient for more usage details/restrictions. This addresses bug HIVE-2226. https://issues.apache.org/jira/browse/HIVE-2226 Diffs - trunk/metastore/if/hive_metastore.thrift 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g 1136751 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751 Diff: https://reviews.apache.org/r/910/diff Testing --- Added test cases to TestHiveMetaStore Thanks, Sohan
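For reference, the two example filters are ordinary Java string concatenation. In the sketch below the constant values are defined locally as stand-ins for the generated Constants class (their exact values here are guesses for illustration), so the pattern compiles on its own:

```java
// Stand-alone version of the filter-building pattern from the summary.
public class FilterExample {
    // Illustrative stand-ins for the thrift-generated Constants fields.
    static final String HIVE_FILTER_FIELD_OWNER = "hive_filter_field_owner__";
    static final String HIVE_FILTER_FIELD_LAST_ACCESS = "hive_filter_field_last_access__";
    static final String HIVE_FILTER_FIELD_PARAMS = "hive_filter_field_params__";

    // owner.matches(".*test.*") and lastAccessTime == 0
    static String ownerAndAccessFilter() {
        return HIVE_FILTER_FIELD_OWNER + " like \".*test.*\" and "
             + HIVE_FILTER_FIELD_LAST_ACCESS + " = 0";
    }

    // owner = "test_user" and (retention == 30 || retention == 90)
    static String retentionFilter() {
        return HIVE_FILTER_FIELD_OWNER + " = \"test_user\" and ("
             + HIVE_FILTER_FIELD_PARAMS + "retention = \"30\" or "
             + HIVE_FILTER_FIELD_PARAMS + "retention = \"90\")";
    }

    public static void main(String[] args) {
        assert ownerAndAccessFilter().contains("like \".*test.*\" and ");
        assert retentionFilter().contains("retention = \"30\" or ");
    }
}
```

String values keep their surrounding double quotes inside the filter text, which is what the parser uses to distinguish them from integers.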
[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058117#comment-13058117 ] jirapos...@reviews.apache.org commented on HIVE-2226: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/#review928 --- trunk/metastore/if/hive_metastore.thrift https://reviews.apache.org/r/910/#comment2014 Using the form hive_filter_field_params__parameter key seems a little odd. Can't think of an easy way to handle this case though, so it should probably be okay. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/910/#comment2003 I don't think it's possible to create 2 tables with the same name. In which case, there shouldn't be a need for this check. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java https://reviews.apache.org/r/910/#comment2005 We should catch the case where the keyName is invalid - Paul On 2011-06-20 21:04:45, Sohan Jain wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/910/ bq. --- bq. bq. (Updated 2011-06-20 21:04:45) bq. bq. bq. Review request for hive and Paul Yang. bq. bq. bq. Summary bq. --- bq. bq. Create a function listTableNamesByFilter that returns a list of names for tables in a database that match a certain filter. The syntax of the filter is similar to the one created by HIVE-1609. You can filter the table list based on owner, last access time, or table parameter key/values. The filtering takes place at the JDO level for efficiency/speed. To create a new kind of table filter, add a constant to thrift.if and a branch in the if statement in generateJDOFilterOverTables() in ExpressionTree. bq. bq. Example filter statements include: bq. //translation: owner.matches(.*test.*) and lastAccessTime == 0 bq. filter = Constants.HIVE_FILTER_FIELD_OWNER + bq. like \.*test.*\ and + bq. 
Constants.HIVE_FILTER_FIELD_LAST_ACCESS + = 0; bq. bq. //translation: owner = test_user and (parameters.get(retention) == 30 || parameters.get(retention) == 90) bq. filter = Constants.HIVE_FILTER_FIELD_OWNER + bq. = \test_user\ and ( + bq. Constants.HIVE_FILTER_FIELD_PARAMS + retention = \30\ or + bq. Constants.HIVE_FILTER_FIELD_PARAMS + retention = \90\) bq. bq. The filter can currently parse string or integer values, where values interpreted as strings must be in quotes. See the comments in IMetaStoreClient for more usage details/restrictions. bq. bq. bq. This addresses bug HIVE-2226. bq. https://issues.apache.org/jira/browse/HIVE-2226 bq. bq. bq. Diffs bq. - bq. bq.trunk/metastore/if/hive_metastore.thrift 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751 bq.trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g 1136751 bq. trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751 bq. bq. Diff: https://reviews.apache.org/r/910/diff bq. bq. bq. Testing bq. --- bq. bq. Added test cases to TestHiveMetaStore bq. bq. bq. Thanks, bq. bq. Sohan bq. bq. Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. 
--- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2226.1.patch Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similar to the one HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for
[jira] [Created] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Description: Now if the two sides of comparison is of different type, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find common type first. The worse case is this: If you did WHERE BIGINT_COLUMN = 0 , we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor costs in the system. But it is easy to fix. was:Now if the two sides of comparison is of different type, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find common type first. Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Now if the two sides of comparison is of different type, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find common type first. The worse case is this: If you did WHERE BIGINT_COLUMN = 0 , we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor costs in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
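The ticket frames the double conversion as wasteful; a small plain-Java sketch (no Hive types involved) also shows that for BIGINT values beyond double's 53-bit mantissa the conversion can change the comparison result, which is a further reason to prefer a common-type comparison:

```java
// Compare a BIGINT-like long against a literal two ways.
public class CompareAsDouble {
    static boolean equalViaDouble(long column, long literal) {
        return (double) column == (double) literal;   // what the regression does
    }
    static boolean equalAsLong(long column, long literal) {
        return column == literal;                     // common-type comparison
    }
    public static void main(String[] args) {
        long big = Long.MAX_VALUE;        // 2^63 - 1, not exactly representable
        assert !equalAsLong(big, big - 1);
        assert equalViaDouble(big, big - 1);          // precision lost at double
        assert equalAsLong(0L, 0L) && equalViaDouble(0L, 0L);
    }
}
```

For the common WHERE BIGINT_COLUMN = 0 case both paths agree on the answer, so the per-row cost of the conversion is the main issue there, as the description says.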
[jira] [Commented] (HIVE-1721) use bloom filters to improve the performance of joins
[ https://issues.apache.org/jira/browse/HIVE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058125#comment-13058125 ] J. Andrew Key commented on HIVE-1721: - Is anyone actively working on this? I've worked with Bloom filters before and was wondering if this issue was perhaps abandoned. If anyone has any notes or code for me to review, I would love to take a crack at this one. use bloom filters to improve the performance of joins - Key: HIVE-1721 URL: https://issues.apache.org/jira/browse/HIVE-1721 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Siying Dong Labels: optimization In case of map-joins, it is likely that the big table will not find many matching rows from the small table. Currently, we perform a hash-map lookup for every row in the big table, which can be pretty expensive. It might be useful to try out a bloom-filter containing all the elements in the small table. Each element from the big table is first searched in the bloom filter, and only in case of a positive match, the small table hash table is explored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
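The idea in the description can be sketched with a toy Bloom filter guarding the small-table hash map: probe the filter first, and touch the map only on a positive. The sizing and the two hash functions below are arbitrary choices for the sketch, not tuned values or Hive code:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class BloomJoinSketch {
    static final int BITS = 1 << 16;
    final BitSet bits = new BitSet(BITS);
    final Map<String, String> smallTable = new HashMap<>();

    static int h1(String k) { return Math.floorMod(k.hashCode(), BITS); }
    static int h2(String k) { return Math.floorMod(k.hashCode() * 31 + 7, BITS); }

    // Build phase: load the small table, setting bits for each key.
    void put(String key, String value) {
        smallTable.put(key, value);
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // Probe phase: most non-matching big-table rows exit at the bit test,
    // skipping the more expensive hash-map lookup entirely.
    String probe(String key) {
        if (!bits.get(h1(key)) || !bits.get(h2(key))) {
            return null;                  // definite miss
        }
        return smallTable.get(key);       // possible hit (or false positive)
    }

    public static void main(String[] args) {
        BloomJoinSketch join = new BloomJoinSketch();
        join.put("k1", "v1");
        assert "v1".equals(join.probe("k1"));
        assert join.probe("definitely-absent-key") == null;
    }
}
```

False positives only cost an extra map lookup that returns null, so the join output is unchanged; the win is proportional to how often big-table keys miss the small table.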
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Status: Patch Available (was: Open) Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2248.1.patch Now if the two sides of comparison is of different type, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find common type first. The worse case is this: If you did WHERE BIGINT_COLUMN = 0 , we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor costs in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Attachment: HIVE-2248.1.patch Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2248.1.patch Now if the two sides of comparison is of different type, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find common type first. The worse case is this: If you did WHERE BIGINT_COLUMN = 0 , we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor costs in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058127#comment-13058127 ] Namit Jain commented on HIVE-2248: -- +1 Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2248.1.patch Now if the two sides of comparison is of different type, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find common type first. The worse case is this: If you did WHERE BIGINT_COLUMN = 0 , we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor costs in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2226: Add API to metastore for table filtering based on table properties
On 2011-06-30 22:48:12, Paul Yang wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, lines 1533-1537 https://reviews.apache.org/r/910/diff/2/?file=21391#file21391line1533 I don't think it's possible to create 2 tables with the same name. In which case, there shouldn't be a need for this check. Ah, the comment there is a little misleading. Some tables were getting returned multiple times if they matched multiple parts of an OR clause. For example, in the unit test with the filter string: owner = testOwner1 (lastAccessTime = 30 || test_param_1 = hi), a table which had owner=testOwner1, lastAccessTime = 30, and test_param_1 = hi was returned twice by the query. On 2011-06-30 22:48:12, Paul Yang wrote: trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java, lines 187-188 https://reviews.apache.org/r/910/diff/2/?file=21393#file21393line187 We should catch the case where the keyName is invalid Will do - Sohan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/#review928 --- On 2011-06-20 21:04:45, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/ --- (Updated 2011-06-20 21:04:45) Review request for hive and Paul Yang. Summary --- Create a function listTableNamesByFilter that returns a list of names for tables in a database that match a certain filter. The syntax of the filter is similar to the one created by HIVE-1609. You can filter the table list based on owner, last access time, or table parameter key/values. The filtering takes place at the JDO level for efficiency/speed. To create a new kind of table filter, add a constant to thrift.if and a branch in the if statement in generateJDOFilterOverTables() in ExpressionTree. 
Example filter statements include: //translation: owner.matches(.*test.*) and lastAccessTime == 0 filter = Constants.HIVE_FILTER_FIELD_OWNER + like \.*test.*\ and + Constants.HIVE_FILTER_FIELD_LAST_ACCESS + = 0; //translation: owner = test_user and (parameters.get(retention) == 30 || parameters.get(retention) == 90) filter = Constants.HIVE_FILTER_FIELD_OWNER + = \test_user\ and ( + Constants.HIVE_FILTER_FIELD_PARAMS + retention = \30\ or + Constants.HIVE_FILTER_FIELD_PARAMS + retention = \90\) The filter can currently parse string or integer values, where values interpreted as strings must be in quotes. See the comments in IMetaStoreClient for more usage details/restrictions. This addresses bug HIVE-2226. https://issues.apache.org/jira/browse/HIVE-2226 Diffs - trunk/metastore/if/hive_metastore.thrift 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g 1136751 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751 Diff: https://reviews.apache.org/r/910/diff Testing --- Added test cases to TestHiveMetaStore Thanks, Sohan
[jira] [Created] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double -- Key: HIVE-2249 URL: https://issues.apache.org/jira/browse/HIVE-2249 Project: Hive Issue Type: Improvement Reporter: Siying Dong The current code to build a constant expression for numbers is: try { v = Double.valueOf(expr.getText()); v = Long.valueOf(expr.getText()); v = Integer.valueOf(expr.getText()); } catch (NumberFormatException e) { // do nothing here, we will throw an exception in the following block } if (v == null) { throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT .getMsg(expr)); } return new ExprNodeConstantDesc(v); Then, for a case like WHERE BIG_INT_COLUMN = 0 or WHERE DOUBLE_COLUMN = 0, we always have to do a type conversion when comparing, which is unnecessary if we are slightly smarter about choosing the type when creating the constant expression. We can simply walk one level up the tree, find the other comparison operand, and use the same type as that one if possible. For a user's wrong query like 'INT_COLUMN=1.1', we can do even more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
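The quoted snippet works because each successive parse overwrites v, so the narrowest successful type wins. A minimal sketch of both the current behavior and the proposed inference follows; the inferFromOperand name and its Class-based signature are invented for this illustration and are not the actual Hive API:

```java
public class ConstantTypeSketch {
    // Current behavior: narrower parses overwrite v, so "0" ends up Integer
    // and "3000000000" (too big for int) ends up Long.
    static Number parseNarrowest(String text) {
        Number v = null;
        try {
            v = Double.valueOf(text);
            v = Long.valueOf(text);
            v = Integer.valueOf(text);
        } catch (NumberFormatException e) {
            // later, narrower parses may fail; v keeps the last success
        }
        return v;
    }

    // Proposed idea: match the other operand's type up front so comparison
    // needs no per-row conversion (assumes the literal fits that type).
    static Number inferFromOperand(String text, Class<?> otherOperandType) {
        if (otherOperandType == Long.class)   return Long.valueOf(text);
        if (otherOperandType == Double.class) return Double.valueOf(text);
        return parseNarrowest(text);
    }

    public static void main(String[] args) {
        assert parseNarrowest("0") instanceof Integer;        // forces conversion vs BIGINT
        assert parseNarrowest("3000000000") instanceof Long;
        assert parseNarrowest("1.1") instanceof Double;
        assert inferFromOperand("0", Long.class) instanceof Long;
    }
}
```

With the inference, WHERE BIG_INT_COLUMN = 0 would get a Long constant directly instead of an Integer that later has to be converted for every row.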
[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058137#comment-13058137 ] jirapos...@reviews.apache.org commented on HIVE-2226: - bq. On 2011-06-30 22:48:12, Paul Yang wrote: bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, lines 1533-1537 bq. https://reviews.apache.org/r/910/diff/2/?file=21391#file21391line1533 bq. bq. I don't think it's possible to create 2 tables with the same name. In which case, there shouldn't be a need for this check. Ah, the comment there is a little misleading. Some tables were getting returned multiple times if they matched multiple parts of an OR clause. For example, in the unit test with the filter string: owner = testOwner1 (lastAccessTime = 30 || test_param_1 = hi), a table which had owner=testOwner1, lastAccessTime = 30, and test_param_1 = hi was returned twice by the query. bq. On 2011-06-30 22:48:12, Paul Yang wrote: bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java, lines 187-188 bq. https://reviews.apache.org/r/910/diff/2/?file=21393#file21393line187 bq. bq. We should catch the case where the keyName is invalid Will do - Sohan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/#review928 --- On 2011-06-20 21:04:45, Sohan Jain wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/910/ bq. --- bq. bq. (Updated 2011-06-20 21:04:45) bq. bq. bq. Review request for hive and Paul Yang. bq. bq. bq. Summary bq. --- bq. bq. Create a function listTableNamesByFilter that returns a list of names for tables in a database that match a certain filter. The syntax of the filter is similar to the one created by HIVE-1609. You can filter the table list based on owner, last access time, or table parameter key/values. The filtering takes place at the JDO level for efficiency/speed. 
To create a new kind of table filter, add a constant to thrift.if and a branch in the if statement in generateJDOFilterOverTables() in ExpressionTree. bq. bq. Example filter statements include: bq. //translation: owner.matches(.*test.*) and lastAccessTime == 0 bq. filter = Constants.HIVE_FILTER_FIELD_OWNER + bq. like \.*test.*\ and + bq. Constants.HIVE_FILTER_FIELD_LAST_ACCESS + = 0; bq. bq. //translation: owner = test_user and (parameters.get(retention) == 30 || parameters.get(retention) == 90) bq. filter = Constants.HIVE_FILTER_FIELD_OWNER + bq. = \test_user\ and ( + bq. Constants.HIVE_FILTER_FIELD_PARAMS + retention = \30\ or + bq. Constants.HIVE_FILTER_FIELD_PARAMS + retention = \90\) bq. bq. The filter can currently parse string or integer values, where values interpreted as strings must be in quotes. See the comments in IMetaStoreClient for more usage details/restrictions. bq. bq. bq. This addresses bug HIVE-2226. bq. https://issues.apache.org/jira/browse/HIVE-2226 bq. bq. bq. Diffs bq. - bq. bq.trunk/metastore/if/hive_metastore.thrift 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751 bq.trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 1136751 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g 1136751 bq. trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751 bq. bq. Diff: https://reviews.apache.org/r/910/diff bq. bq. bq. Testing bq. --- bq. bq. Added test cases to TestHiveMetaStore bq. bq. bq. Thanks, bq. bq. Sohan bq. bq. 
Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2226.1.patch Create a
[jira] [Commented] (HIVE-306) Support INSERT [INTO] destination
[ https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058140#comment-13058140 ] Siying Dong commented on HIVE-306: -- +1. Looks good to me for now. I'm running tests. If it is committed, please open a follow-up JIRA for making moving files more efficient and compacting smaller files smarter for it. Support INSERT [INTO] destination --- Key: HIVE-306 URL: https://issues.apache.org/jira/browse/HIVE-306 Project: Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Franklin Hu Attachments: hive-306.1.patch, hive-306.2.patch Currently hive only supports INSERT OVERWRITE destination. We should support INSERT [INTO] destination. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
RE: Small file problem and GenMRFileSink1
What if it isn't a hive table, just an hdfs folder? Can I create a temporary folder and then merge, or somehow use the API that invokes the merge job? From: ginz...@hotmail.com To: dev@hive.apache.org Subject: Small file problem and GenMRFileSink1 Date: Wed, 29 Jun 2011 15:33:44 + Hi, I'm not sure whether this belongs in hive-dev or hive-user. I have a folder with many small files. I would like to reduce the number of files the way hive merges output. I tried to understand from the source of org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1 how to leverage the API to submit a job that merges output files. I think I was able to identify: private void createMergeJob(FileSinkOperator fsOp, GenMRProcContext ctx, String finalName) throws SemanticException as the entry point to the logic that performs the operation, but I did not find documentation as to how to use it. Is there an example that simulates the use of this API call?
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-2248: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Siying Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2248.1.patch Now if the two sides of comparison is of different type, we always convert both to double and compare. It was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find common type first. The worse case is this: If you did WHERE BIGINT_COLUMN = 0 , we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor costs in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-trunk-h0.21 #803
See https://builds.apache.org/job/Hive-trunk-h0.21/803/changes Changes: [namit] HIVE-2248 Comparison Operators convert number types to common type instead of double if necessary (Siying Dong via namit) -- [...truncated 32993 lines...] [artifact:deploy] [INFO] Retrieving previous build number from apache.snapshots.https [artifact:deploy] Uploading: org/apache/hive/hive-hbase-handler/0.8.0-SNAPSHOT/hive-hbase-handler-0.8.0-20110701.052625-27.jar to repository apache.snapshots.https at https://repository.apache.org/content/repositories/snapshots [artifact:deploy] Transferring 49K from apache.snapshots.https [artifact:deploy] Uploaded 49K [artifact:deploy] [INFO] Uploading project information for hive-hbase-handler 0.8.0-20110701.052625-27 [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.hive:hive-hbase-handler:0.8.0-SNAPSHOT' [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.hive:hive-hbase-handler' ivy-init-dirs: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build/ivy/lib/ivy-2.1.0.jar [get] Not modified - so not downloaded ivy-probe-antlib: ivy-init-antlib: ivy-init: ivy-resolve-maven-ant-tasks: [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ivy/ivysettings.xml ivy-retrieve-maven-ant-tasks: [ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead [ivy:cachepath] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ivy/ivysettings.xml mvn-taskdef: maven-publish-artifact: [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-http:jar:1.0-beta-2:runtime [artifact:deploy] Deploying to 
https://repository.apache.org/content/repositories/snapshots [artifact:deploy] [INFO] Retrieving previous build number from apache.snapshots.https [artifact:deploy] Uploading: org/apache/hive/hive-hwi/0.8.0-SNAPSHOT/hive-hwi-0.8.0-20110701.052628-27.jar to repository apache.snapshots.https at https://repository.apache.org/content/repositories/snapshots [artifact:deploy] Transferring 23K from apache.snapshots.https [artifact:deploy] Uploaded 23K [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.hive:hive-hwi:0.8.0-SNAPSHOT' [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.hive:hive-hwi' [artifact:deploy] [INFO] Uploading project information for hive-hwi 0.8.0-20110701.052628-27 ivy-init-dirs: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build/ivy/lib/ivy-2.1.0.jar [get] Not modified - so not downloaded ivy-probe-antlib: ivy-init-antlib: ivy-init: ivy-resolve-maven-ant-tasks: [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ivy/ivysettings.xml ivy-retrieve-maven-ant-tasks: [ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead [ivy:cachepath] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ivy/ivysettings.xml mvn-taskdef: maven-publish-artifact: [artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-http:jar:1.0-beta-2:runtime [artifact:deploy] Deploying to https://repository.apache.org/content/repositories/snapshots [artifact:deploy] [INFO] Retrieving previous build number from apache.snapshots.https [artifact:deploy] Uploading: 
org/apache/hive/hive-jdbc/0.8.0-SNAPSHOT/hive-jdbc-0.8.0-20110701.052631-27.jar to repository apache.snapshots.https at https://repository.apache.org/content/repositories/snapshots [artifact:deploy] Transferring 56K from apache.snapshots.https [artifact:deploy] Uploaded 56K [artifact:deploy] [INFO] Uploading project information for hive-jdbc 0.8.0-20110701.052631-27 [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot org.apache.hive:hive-jdbc:0.8.0-SNAPSHOT' [artifact:deploy] [INFO] Retrieving previous metadata from apache.snapshots.https [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact org.apache.hive:hive-jdbc' ivy-init-dirs: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: