[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

pengcheng xiong updated HIVE-7654:
    Attachment: HIVE-7654.4.patch

Reduced the number of queries, following Ashutosh's comments.

A method to extrapolate columnStats for partitions of a table
    Key: HIVE-7654
    URL: https://issues.apache.org/jira/browse/HIVE-7654
    Project: Hive
    Issue Type: New Feature
    Reporter: pengcheng xiong
    Assignee: pengcheng xiong
    Priority: Minor
    Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch

A PARTITIONED table can have many partitions. For example:

{code}
create table if not exists loc_orc (
  state string,
  locid int,
  zip bigint
) partitioned by(year string) stored as orc;
{code}

Assume there are 4 partitions: partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can compute statistics for columns state and locid of partition(year='2001') with the following command:

{code}
analyze table loc_orc partition(year='2001') compute statistics for columns state,locid;
{code}

We need to know the "aggregated" column stats for the whole table loc_orc. However, we may not have the column stats for some partitions (e.g., partition(year='2002')), and we may not have the column stats for some columns (e.g., zip bigint for partition(year='2001')). We propose a method to extrapolate the missing column stats for the partitions.

-- This message was sent by Atlassian JIRA (v6.2#6252)
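For HIVE-7654, a minimal, self-contained sketch of the kind of linear extrapolation the proposal describes: fit a line through the known per-partition values of a statistic and read off the missing one. The class, method, and chosen statistic below are illustrative assumptions, not Hive's actual IExtrapolatePartStatus / LinearExtrapolatePartStatus code:

{code}
import java.util.TreeMap;

public class ExtrapolateSketch {
  // Estimates a missing partition's statistic by fitting a straight line
  // through the first and last known (partition index, value) points.
  static double extrapolate(TreeMap<Integer, Double> known, int missingIdx) {
    int loIdx = known.firstKey();
    int hiIdx = known.lastKey();
    double lo = known.get(loIdx);
    double hi = known.get(hiIdx);
    if (hiIdx == loIdx) {
      return lo; // a single data point gives no slope; reuse it as-is
    }
    double slope = (hi - lo) / (hiIdx - loIdx);
    return lo + slope * (missingIdx - loIdx);
  }

  public static void main(String[] args) {
    // Known max(locid) per partition, indexed by partition position.
    TreeMap<Integer, Double> maxLocid = new TreeMap<>();
    maxLocid.put(0, 100.0); // year=2000
    maxLocid.put(1, 200.0); // year=2001
    maxLocid.put(3, 400.0); // year=2003
    // year=2002 (index 2) has no stats; this estimates 300.0 for it.
    System.out.println(extrapolate(maxLocid, 2));
  }
}
{code}

Different statistics would extrapolate differently in practice (a max can only grow across partitions, an NDV may not), which is presumably why the patch separates the interface from the linear implementation.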
Re: Review Request 24498: A method to extrapolate the missing column status for the partitions.
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24498/
---

(Updated Aug. 20, 2014, 6:52 a.m.)

Review request for hive.

Changes
---
Reduced the number of queries, following Ashutosh's comments.

Repository: hive-git

Description
---
We propose a method to extrapolate the missing column stats for the partitions.

Diffs (updated)
---
  data/files/extrapolate_stats_full.txt PRE-CREATION
  data/files/extrapolate_stats_partial.txt PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 84ef5f9
  metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java PRE-CREATION
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 767cffc
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a9f4be2
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 0364385
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 4eba2b0
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 78ab19a
  ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q PRE-CREATION
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q PRE-CREATION
  ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out PRE-CREATION
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/24498/diff/

Testing
---

File Attachments
---
HIVE-7654.0.patch
  https://reviews.apache.org/media/uploaded/files/2014/08/12/77b155b0-a417-4225-b6b7-4c8c6ce2b97d__HIVE-7654.0.patch

Thanks,

pengcheng xiong
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's tests
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103518#comment-14103518 ]

Szehon Ho commented on HIVE-7254:

I think Brock was working on the PTest server and made some config changes recently. [~brocknoland] can you take a look?

Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's tests
    Key: HIVE-7254
    URL: https://issues.apache.org/jira/browse/HIVE-7254
    Project: Hive
    Issue Type: Test
    Components: Testing Infrastructure
    Reporter: Szehon Ho
    Assignee: Szehon Ho
    Attachments: trunk-mr2.properties

Today, the Hive PTest infrastructure has a per-driver configuration called "directory", and it runs all the qfiles under that directory for that driver. For example, CLIDriver is configured with the directory ql/src/test/queries/clientpositive.

However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under that directory, so we have to use the "include" configuration to hard-code a list of tests for each of them to run. This duplicates the list of each miniDriver's tests already in the /itests/qtest pom file, and the two can get out of date. It would be nice if both got their information the same way.

-- This message was sent by Atlassian JIRA (v6.2#6252)
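A minimal sketch of the auto-discovery HIVE-7254 asks for: derive a driver's test list by scanning its qfile directory instead of hard-coding an "include" list. This is illustrative only, not the actual PTest framework code; the directory name is just the one the issue mentions:

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class QFileLister {
  // Collects the *.q files under a driver's configured directory.
  static List<String> listQFiles(String dir) {
    List<String> tests = new ArrayList<>();
    File[] files = new File(dir).listFiles();
    if (files == null) {
      return tests; // missing or unreadable directory: empty test list
    }
    for (File f : files) {
      if (f.isFile() && f.getName().endsWith(".q")) {
        tests.add(f.getName());
      }
    }
    return tests;
  }

  public static void main(String[] args) {
    System.out.println(listQFiles("ql/src/test/queries/clientpositive"));
  }
}
{code}

In principle both the PTest config and the itests/qtest pom could consume one generated list like this instead of maintaining two.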
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add column access information from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Summary: Extend ReadEntity to add column access information from query (was: Get instance of HiveSemanticAnalyzerHookContext from configuration)

Extend ReadEntity to add column access information from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query). So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.

-- This message was sent by Atlassian JIRA (v6.2#6252)
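A rough sketch of the extension that description proposes, assuming Hive's HiveSemanticAnalyzerHookContextImpl and its update(QueryPlan) method as the extension point; the subclass and field names here are hypothetical:

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.ql.QueryPlan;
import org.apache.hadoop.hive.ql.parse.HiveSemanticAnalyzerHookContextImpl;

public class ColumnAwareHookContext extends HiveSemanticAnalyzerHookContextImpl {
  private final List<String> neededColumns = new ArrayList<>();

  @Override
  public void update(QueryPlan queryPlan) {
    super.update(queryPlan); // keep the default inputs/outputs population
    // Hypothetical: also extract the columns the query reads from the plan
    // and stash them here so a HiveSemanticAnalyzerHook can retrieve them.
  }

  public List<String> getNeededColumns() {
    return neededColumns;
  }
}
{code}

The later comments on this issue move away from this hook-centric design toward storing accessed columns on ReadEntity, which also serves the authorization interfaces.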
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103519#comment-14103519 ]

Thejas M Nair commented on HIVE-4629:

[~dongc] The earlier patch also had a method in HiveStatement to get the log. I think that will be convenient for many users, though we need to be careful and specify that it is the only non-JDBC function that is part of a public API there. But this can also be done as follow-up work in a separate jira.

HS2 should support an API to retrieve query logs
    Key: HIVE-4629
    URL: https://issues.apache.org/jira/browse/HIVE-4629
    Project: Hive
    Issue Type: Sub-task
    Components: HiveServer2
    Reporter: Shreepadma Venugopalan
    Assignee: Dong Chen
    Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch

HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.

-- This message was sent by Atlassian JIRA (v6.2#6252)
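A sketch of the client-side convenience being discussed: a JDBC client casting down to HiveStatement and polling logs while a query runs. The getQueryLog()/hasMoreLogs() names and signatures are assumptions based on the patches under review here, not a settled public API, and the table name is hypothetical:

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.apache.hive.jdbc.HiveStatement;

public class QueryLogPoller {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // The one non-JDBC step: drop down to Hive's Statement implementation.
      final HiveStatement hiveStmt = (HiveStatement) stmt;
      Thread poller = new Thread(() -> {
        try {
          while (hiveStmt.hasMoreLogs()) {
            for (String line : hiveStmt.getQueryLog()) {
              System.out.println("LOG: " + line); // progress for the client
            }
            Thread.sleep(500);
          }
        } catch (Exception e) {
          // best-effort: stop polling once the statement closes or cancels
        }
      });
      poller.start();
      hiveStmt.execute("SELECT count(*) FROM sample_table"); // hypothetical table
      poller.join();
    }
  }
}
{code}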
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add column access information from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Description:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

was:
Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query). So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.

Extend ReadEntity to add column access information from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Summary: Extend ReadEntity to add accessed columns from query (was: Extend ReadEntity to add column access information from query)

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damien Carol updated HIVE-7689:
    Description:
I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream.
This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore.

was:
I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream.
This first patch enables LOCKS on the metastore.

Enable Postgres as METASTORE back-end
    Key: HIVE-7689
    URL: https://issues.apache.org/jira/browse/HIVE-7689
    Project: Hive
    Issue Type: Improvement
    Components: Metastore
    Affects Versions: 0.14.0
    Reporter: Damien Carol
    Assignee: Damien Carol
    Priority: Minor
    Labels: metastore, postgres
    Fix For: 0.14.0
    Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch

I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream.
This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damien Carol updated HIVE-7689:
    Attachment: HIVE-7889.2.patch

Rebased and added more features.

Enable Postgres as METASTORE back-end
    Key: HIVE-7689
    URL: https://issues.apache.org/jira/browse/HIVE-7689
    Project: Hive
    Issue Type: Improvement
    Components: Metastore
    Affects Versions: 0.14.0
    Reporter: Damien Carol
    Assignee: Damien Carol
    Priority: Minor
    Labels: metastore, postgres
    Fix For: 0.14.0
    Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch

I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream.
This first patch enables LOCKS on the metastore.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Description:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}

was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Description:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}
{code}

was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
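A hedged sketch of how that TODO might be filled in: copy the per-table column map out of ColumnAccessInfo onto the matching ReadEntity inputs. getTableToColumnAccessMap() is an existing ColumnAccessInfo accessor; the accessedColumns list on ReadEntity is exactly the extension this JIRA proposes, so treat it as hypothetical:

{code}
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.hive.ql.hooks.Entity;
import org.apache.hadoop.hive.ql.hooks.ReadEntity;
import org.apache.hadoop.hive.ql.parse.ColumnAccessInfo;

public class AccessedColumnsSketch {
  // Attaches each table's accessed columns to its ReadEntity input.
  static void attachAccessedColumns(ColumnAccessInfo info, Set<ReadEntity> inputs) {
    Map<String, List<String>> tableToColumns = info.getTableToColumnAccessMap();
    for (ReadEntity input : inputs) {
      if (input.getType() == Entity.Type.TABLE) {
        List<String> cols = tableToColumns.get(input.getTable().getCompleteName());
        if (cols != null) {
          // Hypothetical: assumes ReadEntity grows an accessedColumns list,
          // which is what this issue proposes.
          input.getAccessedColumns().addAll(cols);
        }
      }
    }
  }
}
{code}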
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's tests
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103541#comment-14103541 ]

Szehon Ho commented on HIVE-7254:

I think I see what needs to be done. It's in the build machine's configuration (not checked in), and we need to add back the other set of Tez tests that Brock accidentally removed. As I'm afraid of hosing the builds tonight, Brock or I can do it tomorrow morning :)

Hey Brock, one idea: do you think it's a good idea to add the build machine's properties to source control? That way there is a history in case they get changed, and all devs can easily see/modify them without having to log in to the build machine. Just a late-night thought.

Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's tests
    Key: HIVE-7254
    URL: https://issues.apache.org/jira/browse/HIVE-7254
    Project: Hive
    Issue Type: Test
    Components: Testing Infrastructure
    Reporter: Szehon Ho
    Assignee: Szehon Ho
    Attachments: trunk-mr2.properties

Today, the Hive PTest infrastructure has a per-driver configuration called "directory", and it runs all the qfiles under that directory for that driver. For example, CLIDriver is configured with the directory ql/src/test/queries/clientpositive.

However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under that directory, so we have to use the "include" configuration to hard-code a list of tests for each of them to run. This duplicates the list of each miniDriver's tests already in the /itests/qtest pom file, and the two can get out of date. It would be nice if both got their information the same way.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Description:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}
{code}

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Description:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Description:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS (or we can set another confvar for it) to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Description:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS (or we can set another confvar for it) to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS (or we can set another confvar for it) to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7796) Provide subquery pushdown facility for storage handlers
[ https://issues.apache.org/jira/browse/HIVE-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103560#comment-14103560 ]

Hive QA commented on HIVE-7796:

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662985/HIVE-7796.1.patch.txt

{color:green}SUCCESS:{color} +1 6008 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/416/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/416/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-416/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662985

Provide subquery pushdown facility for storage handlers
    Key: HIVE-7796
    URL: https://issues.apache.org/jira/browse/HIVE-7796
    Project: Hive
    Issue Type: Improvement
    Components: StorageHandler
    Reporter: Navis
    Assignee: Navis
    Priority: Minor
    Attachments: HIVE-7796.1.patch.txt

If the underlying storage can handle basic filtering or aggregation, Hive can delegate execution of a whole subquery to the storage and handle it as a simple scan operation. This was experimentally implemented on the JDBC / Phoenix handlers and seemed to work well. Hopefully we can open the code for those too, but I'm not allowed to yet.

-- This message was sent by Atlassian JIRA (v6.2#6252)
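A purely hypothetical illustration of the facility HIVE-7796 describes; none of these names are Hive API, and the real patch hooks into the planner rather than query strings. The idea is that a storage handler advertises whether it can evaluate a given subquery, and when it can, Hive plans that subtree as a plain scan of the pushed-down result:

{code}
// Hypothetical interface: a storage handler that can run a subquery remotely.
interface SubqueryPushdownCapable {
  // Returns the query to delegate if supported, or null to fall back
  // to Hive's normal plan for the subtree.
  String tryPushdown(String subquery);
}

class JdbcLikeHandler implements SubqueryPushdownCapable {
  @Override
  public String tryPushdown(String subquery) {
    // A real handler would inspect the operator tree, not a string;
    // this stand-in only accepts basic filter/aggregation shapes.
    String q = subquery.toLowerCase();
    boolean basic = !q.contains("join") && !q.contains("union");
    return basic ? subquery : null;
  }
}
{code}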
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Description:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store the accessed-columns map or ColumnAccessInfo into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store the accessed-columns map or ColumnAccessInfo into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103568#comment-14103568 ]

Xiaomeng Huang commented on HIVE-7730:

Hi [~ashutoshc], currently Hive has a new interface for external authorization plugins, and the semantic hook may be replaced in the future. So I will try to put accessed columns into ReadEntity instead of enhancing the semantic hook. This way they will be available to hooks as well as to the authorization interfaces. I have updated the description and will wait for your feedback. Thanks!

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store the accessed-columns map or ColumnAccessInfo into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7797) upgrade sql 014-HIVE-3764.postgres.sql failed
Nemon Lou created HIVE-7797:
    Summary: upgrade sql 014-HIVE-3764.postgres.sql failed
    Key: HIVE-7797
    URL: https://issues.apache.org/jira/browse/HIVE-7797
    Project: Hive
    Issue Type: Bug
    Components: Metastore
    Affects Versions: 0.13.1
    Reporter: Nemon Lou

The SQL is:

{code}
INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '', 'Initial value');
{code}

And the result is:

{noformat}
ERROR: null value in column SCHEMA_VERSION violates not-null constraint
DETAIL: Failing row contains (1, null, Initial value).
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
    Description:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column list from columnAccessInfo to ReadEntity
}
{code}

was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store the accessed-columns map or ColumnAccessInfo into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map from columnAccessInfo to ReadEntity
}
{code}

Extend ReadEntity to add accessed columns from query
    Key: HIVE-7730
    URL: https://issues.apache.org/jira/browse/HIVE-7730
    Project: Hive
    Issue Type: Bug
    Reporter: Xiaomeng Huang
    Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put whatever we want into that class.-

Hive should store accessed columns into ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution.

Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column list from columnAccessInfo to ReadEntity
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103624#comment-14103624 ]

Hive QA commented on HIVE-7654:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663065/HIVE-7654.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6010 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/417/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/417/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-417/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663065

A method to extrapolate columnStats for partitions of a table
    Key: HIVE-7654
    URL: https://issues.apache.org/jira/browse/HIVE-7654
    Project: Hive
    Issue Type: New Feature
    Reporter: pengcheng xiong
    Assignee: pengcheng xiong
    Priority: Minor
    Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch

In a PARTITIONED table, there are many partitions. For example:

{code}
create table if not exists loc_orc (
  state string,
  locid int,
  zip bigint
) partitioned by(year string) stored as orc;
{code}

Assume there are 4 partitions: partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can compute statistics for columns state and locid of partition(year='2001') with:

{code}
analyze table loc_orc partition(year='2001') compute statistics for columns state,locid;
{code}

We need to know the "aggregated" column stats for the whole table loc_orc. However, we may not have the column stats for some partitions (e.g., partition(year='2002')), and we may not have the column stats for some columns (e.g., zip bigint for partition(year='2001')). We propose a method to extrapolate the missing column stats for the partitions.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7754) Potential null pointer dereference in ColumnTruncateMapper#jobClose()
[ https://issues.apache.org/jira/browse/HIVE-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SUYEON LEE reassigned HIVE-7754:
    Assignee: SUYEON LEE

Potential null pointer dereference in ColumnTruncateMapper#jobClose()
    Key: HIVE-7754
    URL: https://issues.apache.org/jira/browse/HIVE-7754
    Project: Hive
    Issue Type: Bug
    Reporter: Ted Yu
    Assignee: SUYEON LEE
    Priority: Minor

{code}
Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter);
{code}

A null is passed to Utilities.mvFileToFinalPath(), which passes it on to createEmptyBuckets(), where it is dereferenced:

{code}
boolean isCompressed = conf.getCompressed();
TableDesc tableInfo = conf.getTableInfo();
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
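For HIVE-7754, a minimal sketch of the defensive fix the report implies: guard the FileSinkDesc before createEmptyBuckets() dereferences it. The method signature is approximate, and the real fix may instead avoid passing null from ColumnTruncateMapper#jobClose():

{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.plan.FileSinkDesc;
import org.apache.hadoop.hive.ql.plan.TableDesc;
import org.apache.hadoop.mapred.Reporter;

public class EmptyBucketsSketch {
  // Approximate shape of Utilities.createEmptyBuckets with a null guard.
  static void createEmptyBuckets(Configuration hconf, List<Path> paths,
      FileSinkDesc conf, Reporter reporter) {
    if (conf == null) {
      return; // no table description: nothing to create, and no NPE
    }
    boolean isCompressed = conf.getCompressed();
    TableDesc tableInfo = conf.getTableInfo();
    // ... create the empty bucket files using isCompressed and tableInfo ...
  }
}
{code}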
[jira] [Assigned] (HIVE-7754) Potential null pointer dereference in ColumnTruncateMapper#jobClose()
[ https://issues.apache.org/jira/browse/HIVE-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SUYEON LEE reassigned HIVE-7754:
    Assignee: SUYEON LEE (was: KangHS)

Potential null pointer dereference in ColumnTruncateMapper#jobClose()
    Key: HIVE-7754
    URL: https://issues.apache.org/jira/browse/HIVE-7754
    Project: Hive
    Issue Type: Bug
    Reporter: Ted Yu
    Assignee: SUYEON LEE
    Priority: Minor

{code}
Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter);
{code}

A null is passed to Utilities.mvFileToFinalPath(), which passes it on to createEmptyBuckets(), where it is dereferenced:

{code}
boolean isCompressed = conf.getCompressed();
TableDesc tableInfo = conf.getTableInfo();
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7754) Potential null pointer dereference in ColumnTruncateMapper#jobClose()
[ https://issues.apache.org/jira/browse/HIVE-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KangHS reassigned HIVE-7754:
    Assignee: KangHS (was: SUYEON LEE)

Potential null pointer dereference in ColumnTruncateMapper#jobClose()
    Key: HIVE-7754
    URL: https://issues.apache.org/jira/browse/HIVE-7754
    Project: Hive
    Issue Type: Bug
    Reporter: Ted Yu
    Assignee: KangHS
    Priority: Minor

{code}
Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter);
{code}

A null is passed to Utilities.mvFileToFinalPath(), which passes it on to createEmptyBuckets(), where it is dereferenced:

{code}
boolean isCompressed = conf.getCompressed();
TableDesc tableInfo = conf.getTableInfo();
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7599) NPE in MergeTask#main() when -format is absent
[ https://issues.apache.org/jira/browse/HIVE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DJ Choi updated HIVE-7599:
    Status: Patch Available (was: Open)

NPE in MergeTask#main() when -format is absent
    Key: HIVE-7599
    URL: https://issues.apache.org/jira/browse/HIVE-7599
    Project: Hive
    Issue Type: Bug
    Reporter: Ted Yu
    Priority: Minor
    Attachments: HIVE-7599.patch

When '-format' is absent from the command line, the following call results in an NPE (format is initialized to null):

{code}
if (format.equals(rcfile)) {
  mergeWork = new MergeWork(inputPaths, new Path(outputDir), RCFileInputFormat.class);
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7599) NPE in MergeTask#main() when -format is absent
[ https://issues.apache.org/jira/browse/HIVE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DJ Choi updated HIVE-7599: -- Attachment: HIVE-7599.patch When the format object is null, the printUsage() method will be called. NPE in MergeTask#main() when -format is absent -- Key: HIVE-7599 URL: https://issues.apache.org/jira/browse/HIVE-7599 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7599.patch When '-format' is absent from the command line, the following call would result in an NPE (format is initialized to null): {code} if (format.equals("rcfile")) { mergeWork = new MergeWork(inputPaths, new Path(outputDir), RCFileInputFormat.class); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
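Per the patch note above (printUsage() is called when format is null), the guard presumably looks roughly like this; a sketch of the described behavior, not the patch itself:
{code}
// Sketch: if '-format' was not supplied, format stays null, so bail out to
// the usage message instead of calling format.equals(...).
if (format == null) {
  printUsage();
} else if (format.equals("rcfile")) {
  mergeWork = new MergeWork(inputPaths, new Path(outputDir),
      RCFileInputFormat.class);
}
{code}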
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Description: -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below: {code} boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2() && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED); if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) { ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx); setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); // TODO: we can put accessed column list to ReadEntity getting from columnAccessInfo } {code} was: -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below: {code} boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2() && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED); if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) { ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx); setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); // TODO: we can put accessed column list to ReadEntity getting from columnAccessInfo } {code} Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Attachments: HIVE-7730.001.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below: {code} boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2() && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED); if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) { ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx); setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); // TODO: we can put accessed column list to ReadEntity getting from columnAccessInfo } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7798) Authentication tokens lost in a UDTF on a secure cluster
Rémy SAISSY created HIVE-7798: - Summary: Authentication tokens lost in a UDTF on a secure cluster Key: HIVE-7798 URL: https://issues.apache.org/jira/browse/HIVE-7798 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.13.0 Reporter: Rémy SAISSY Context: - Secure Cluster running Hive 0.13, Hadoop 2.4 and HBase 0.98 (HDP 2.1) - UDTF written in Java Action: In the UDTF, HBase is contacted through its Java API in order to add a few records. However, any request to HBase fails because authentication tokens are not passed along to the call to HBase. Executing the following code in the UDTF: Configuration conf = HBaseConfiguration.create(); UserGroupInformation.setConfiguration(conf); HTable hbaseErrorTable = new HTable(conf, "foo:foo"); Leads to this error: 2014-07-22 14:44:04,134 DEBUG [main] org.apache.hadoop.ipc.RpcClient: Connecting to node2.cluster.fr/10.197.40.54:60020 2014-07-22 14:44:04,135 DEBUG [main] org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:expecteduser (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:915) 2014-07-22 14:44:04,135 DEBUG [main] org.apache.hadoop.hbase.security.HBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos principal name is hbase/node2.cluster.fr@REALM 2014-07-22 14:44:04,137 DEBUG [main] org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException as:expecteduser (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2014-07-22 14:44:04,138 DEBUG [main] org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:expecteduser (auth:SIMPLE) from:org.apache.hadoop.hbase.ipc.RpcClient$Connection.handleSaslConnectionFailure(RpcClient.java:818) 2014-07-22 14:44:04,138 WARN [main] org.apache.hadoop.ipc.RpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2014-07-22 14:44:04,138 FATAL [main] org.apache.hadoop.ipc.RpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'. The workaround is to add the following in the UDTF before actually contacting HBase: public static void logFromKeytabAndLogoutCurrentUser(String user, String path) throws IOException { //UserGroupInformation.loginUserFromKeytab("expecteduser@REALM", "/etc/security/keytabs/expecteduser.headless.keytab"); UserGroupInformation.loginUserFromKeytab(user, path); AccessControlContext context = AccessController.getContext(); Subject subject = Subject.getSubject(context); subject.getPrincipals().clear(); subject.getPrivateCredentials().clear(); subject.getPublicCredentials().clear(); } However, this implies having the keytab available to perform a new authentication from inside the UDTF. I'm not sure whether this bug is related to Hive UDTFs or to YARN containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103688#comment-14103688 ] Hive QA commented on HIVE-7689: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663067/HIVE-7889.2.patch {color:green}SUCCESS:{color} +1 6008 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/418/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/418/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-418/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12663067 Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes errors in STATS on the metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
Chengxiang Li created HIVE-7799: --- Summary: TRANSFORM failed in transform_ppr1.q[Spark Branch] Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7799: Description: Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused (it is not allowed to write once someone has read a row from it); I'm trying to figure out whether it's a Hive issue or specific to Hive on Spark mode. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused (it is not allowed to write once someone has read a row from it); I'm trying to figure out whether it's a Hive issue or specific to Hive on Spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
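To make the constraint concrete, here is a minimal illustration of the write-after-read misuse described above; the RowContainer constructor and method names are assumptions based on the description, and the snippet only shows the pattern, not the failing code path:
{code}
// Hypothetical misuse pattern: RowContainer is write-then-read.
// (org.apache.hadoop.hive.ql.exec.persistence.RowContainer; jobConf,
// reporter, rowA, rowB are placeholders.)
RowContainer<List<Object>> rc = new RowContainer<List<Object>>(1024, jobConf, reporter);
rc.addRow(rowA);                // OK: writing before any read
List<Object> row = rc.first();  // switches the container into read mode
rc.addRow(rowB);                // misuse: writing after a read has begun --
                                // this is the pattern that surfaces as the NPE above
{code}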
[jira] [Commented] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103706#comment-14103706 ] Venki Korukanti commented on HIVE-7747: --- The test failure here is related to the change, and the failure is complicated. It turns out that the output of {{HiveConf(srcHiveConf, SessionState.class)}} is not the same as srcHiveConf in terms of (property, value) pairs. Executed as part of the constructor, the {{HiveConf.initialize}} method applies system properties on top of the properties copied from srcHiveConf. So if any system properties are set between the moment srcHiveConf is created and the moment the HiveConf is cloned, the cloned HiveConf inherits those properties. In the test case ({{MiniHS2}}) the scratchdir property is modified in the system properties (See [here|https://github.com/apache/hive/blob/trunk/itests/hive-unit/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L184]), but the default scratchdir value is {{$\{test.tmp.dir\}/scratchdir}} from hive-site.xml. The scratchdir set in {{MiniHS2}} was never used before, but with this change HS2 started using it. The scratchdir created in {{MiniHS2}} (See [here|https://github.com/apache/hive/blob/trunk/itests/hive-unit/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L183]) doesn't have 777 permissions, so whenever we have user impersonation there are issues (that's where the test is failing). Before this change, the scratchdir was always {{$\{test.tmp.dir\}/scratchdir}}, which is created in HS2 with 777 permissions (See [here|https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3458]), so there were no issues with impersonation. I think it is better to fix this in SparkClient by fetching the jar directly rather than through HiveConf, to avoid unexpected issues. Submitting a query to Spark from HiveServer2 fails -- Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Attachments: HIVE-7747.1.patch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine from Hive CLI. Spark tasks fail with the following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
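The cloning behavior described in the comment can be sketched as follows (assumed behavior per the analysis above, not code from the patch):
{code}
// Sketch: HiveConf's copy constructor re-runs initialize(), which re-applies
// JVM system properties on top of the copied values -- so the "clone" can
// diverge from the original if a system property changed in between.
// (org.apache.hadoop.hive.conf.HiveConf, org.apache.hadoop.hive.ql.session.SessionState)
HiveConf srcHiveConf = new HiveConf();
String before = srcHiveConf.getVar(HiveConf.ConfVars.SCRATCHDIR);

// e.g. MiniHS2 overrides the scratch dir via a system property later on
System.setProperty(HiveConf.ConfVars.SCRATCHDIR.varname, "/tmp/mini-hs2-scratch");

HiveConf cloned = new HiveConf(srcHiveConf, SessionState.class);
String after = cloned.getVar(HiveConf.ConfVars.SCRATCHDIR);
// 'after' now reflects the system property override, while 'before' does not.
{code}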
[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7747: -- Summary: Submitting a query to Spark from HiveServer2 fails [Spark Branch] (was: Submitting a query to Spark from HiveServer2 fails) Submitting a query to Spark from HiveServer2 fails [Spark Branch] - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch Attachments: HIVE-7747.1.patch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine from Hive CLI. Spark tasks fails with following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7747: -- Fix Version/s: spark-branch Submitting a query to Spark from HiveServer2 fails [Spark Branch] - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch Attachments: HIVE-7747.1.patch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine from Hive CLI. Spark tasks fails with following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7747: -- Attachment: HIVE-7747.2-spark.patch Attaching v2 patch specific to spark-branch. Submitting a query to Spark from HiveServer2 fails [Spark Branch] - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine from Hive CLI. Spark tasks fails with following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7747: -- Affects Version/s: (was: 0.13.1) spark-branch Submitting a query to Spark from HiveServer2 fails [Spark Branch] - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine from Hive CLI. Spark tasks fails with following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7599) NPE in MergeTask#main() when -format is absent
[ https://issues.apache.org/jira/browse/HIVE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103746#comment-14103746 ] Hive QA commented on HIVE-7599: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663086/HIVE-7599.patch {color:green}SUCCESS:{color} +1 6008 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/419/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/419/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-419/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12663086 NPE in MergeTask#main() when -format is absent -- Key: HIVE-7599 URL: https://issues.apache.org/jira/browse/HIVE-7599 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7599.patch When '-format' is absent from the command line, the following call would result in an NPE (format is initialized to null): {code} if (format.equals("rcfile")) { mergeWork = new MergeWork(inputPaths, new Path(outputDir), RCFileInputFormat.class); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24602: HIVE-7689 : Enable Postgres as METASTORE back-end
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24602/ --- (Updated Aug. 20, 2014, 10:53 a.m.) Review request for hive. Changes --- Updated with patch V2 that enables ALL features of a Metastore back end. Bugs: HIVE-7689 https://issues.apache.org/jira/browse/HIVE-7689 Repository: hive-git Description --- I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This first patch enables LOCKS on the metastore. Diffs (updated) - metastore/scripts/upgrade/postgres/hive-txn-schema-0.13.0.postgres.sql 2ebd3b0 metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 524a7a4 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java 30cf814 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java f74f683 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 4625d27 Diff: https://reviews.apache.org/r/24602/diff/ Testing --- Using the patched version in production. Enables concurrency with DbTxnManager. Thanks, Damien Carol
[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7694: -- Assignee: Suma Shivaprasad SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.patch For example, if two tables T1 (sorted by (a, b, c) and clustered by a) and T2 (sorted by (a) and clustered by (a)) are joined, the following exception is seen: {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
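Given the trace, the eligibility check presumably needs a bounds guard of roughly this shape before comparing join keys against sort columns (a sketch with names taken from the stack trace, not the actual patch):
{code}
// Sketch: if a table has fewer sort columns than the join key width, SMB
// join cannot line the keys up, so reject instead of indexing past the end.
// (sortCols is a List<Order>, joinCols a List<String>; both assumed names.)
if (sortCols.size() < joinCols.size()) {
  return false; // not eligible for sort-merge bucket join
}
for (int i = 0; i < joinCols.size(); i++) {
  if (!sortCols.get(i).getCol().equals(joinCols.get(i))) {
    return false; // join keys must be a prefix of the sort order
  }
}
return true;
{code}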
[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7629: -- Assignee: Suma Shivaprasad Problem in SMB Joins between two Parquet tables --- Key: HIVE-7629 URL: https://issues.apache.org/jira/browse/HIVE-7629 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Labels: Parquet Fix For: 0.14.0 Attachments: HIVE-7629.1.patch, HIVE-7629.patch The issue is clearly seen when two bucketed and sorted Parquet tables with different numbers of columns are involved in the join. The following exception is seen: {noformat} Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Description: -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below: {code} boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2() && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED); if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) { ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx); setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); } compiler.compile(pCtx, rootTasks, inputs, outputs); // TODO: // after compile, we can put accessed column list to ReadEntity getting from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set true {code} was: -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below: {code} boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2() && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED); if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) { ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx); setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); // TODO: we can put accessed column list to ReadEntity getting from columnAccessInfo } {code} Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Attachments: HIVE-7730.001.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below: {code} boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2() && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED); if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) { ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx); setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); } compiler.compile(pCtx, rootTasks, inputs, outputs); // TODO: // after compile, we can put accessed column list to ReadEntity getting from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set true {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Description: -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS (or we can add a confVar) to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below: {code} boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2() && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED); if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) { ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx); setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); } compiler.compile(pCtx, rootTasks, inputs, outputs); // TODO: // after compile, we can put accessed column list to ReadEntity getting from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set true {code} was: -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below: {code} boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2() && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED); if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) { ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx); setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); } compiler.compile(pCtx, rootTasks, inputs, outputs); // TODO: // after compile, we can put accessed column list to ReadEntity getting from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set true {code} Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Attachments: HIVE-7730.001.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS (or we can add a confVar) to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below: {code} boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2() && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED); if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) { ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx); setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); } compiler.compile(pCtx, rootTasks, inputs, outputs); // TODO: // after compile, we can put accessed column list to ReadEntity getting from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set true {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103793#comment-14103793 ] Hive QA commented on HIVE-7747: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663103/HIVE-7747.2-spark.patch {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5958 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union7 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union8 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/66/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/66/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-66/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663103 Submitting a query to Spark from HiveServer2 fails [Spark Branch] - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine from Hive CLI. 
Spark tasks fails with following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
Need feedback on HIVE-7689
Hi, could anyone take a look at this ticket: HIVE-7689 https://issues.apache.org/jira/browse/HIVE-7689 Regards, -- Damien CAROL * tel: +33 (0)4 74 96 88 14 * fax: +33 (0)4 74 96 31 88 * email: dca...@blitzbs.com BLITZ BUSINESS SERVICE
[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104059#comment-14104059 ] Larry McCay commented on HIVE-7634: --- Are there plans to commit this to branch-2? Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml -- Key: HIVE-7634 URL: https://issues.apache.org/jira/browse/HIVE-7634 Project: Hive Issue Type: Bug Components: Security Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7634.1.patch HADOOP-10607 provides a Configuration.getPassword() API that allows passwords to be retrieved from a configured credential provider, while also being able to fall back to the HiveConf setting if no provider is set up. Hive should use this API for versions of Hadoop that support this API. This would give users the ability to remove the passwords from their Hive configuration files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7800) Parquet Column Index Access Schema Size Checking
Daniel Weeks created HIVE-7800: -- Summary: Parquet Column Index Access Schema Size Checking Key: HIVE-7800 URL: https://issues.apache.org/jira/browse/HIVE-7800 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Daniel Weeks Assignee: Daniel Weeks In the case that a Parquet-formatted table has partitions whose files have schemas of different sizes, using column index access can result in an index-out-of-bounds exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7800) Parquet Column Index Access Schema Size Checking
[ https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks updated HIVE-7800: --- Attachment: HIVE-7800.1.patch Parquet Column Index Access Schema Size Checking --- Key: HIVE-7800 URL: https://issues.apache.org/jira/browse/HIVE-7800 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Daniel Weeks Assignee: Daniel Weeks Attachments: HIVE-7800.1.patch In the case that a Parquet-formatted table has partitions whose files have schemas of different sizes, using column index access can result in an index-out-of-bounds exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7800) Parquet Column Index Access Schema Size Checking
[ https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks updated HIVE-7800: --- Status: Patch Available (was: Open) The included patch is a trivial fix: it checks both that the column exists in the Parquet file and that the column index position falls within the file's schema. In the event the check fails, the column is not included and null values are produced for the missing column, which is the expected behavior. Parquet Column Index Access Schema Size Checking --- Key: HIVE-7800 URL: https://issues.apache.org/jira/browse/HIVE-7800 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Daniel Weeks Assignee: Daniel Weeks Attachments: HIVE-7800.1.patch In the case that a Parquet-formatted table has partitions whose files have schemas of different sizes, using column index access can result in an index-out-of-bounds exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
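From that description, the selection logic plausibly looks like the following — a simplified sketch using Parquet's schema types, not the patch itself:
{code}
// Sketch: only select a column by index when the file schema actually has
// that position; otherwise leave it out so the reader emits NULLs for it.
// (Type/getFields() are from parquet.schema; requestedIndexes is assumed.)
List<Type> fileFields = fileSchema.getFields();
List<Type> selected = new ArrayList<Type>();
for (Integer idx : requestedIndexes) {
  if (idx < fileFields.size()) {
    selected.add(fileFields.get(idx));
  }
  // else: this partition file predates the new column; it will read as NULL
}
{code}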
[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104078#comment-14104078 ] Larry McCay commented on HIVE-7634: --- Just realized that branch-2 is a hadoop branch. Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml -- Key: HIVE-7634 URL: https://issues.apache.org/jira/browse/HIVE-7634 Project: Hive Issue Type: Bug Components: Security Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7634.1.patch HADOOP-10607 provides a Configuration.getPassword() API that allows passwords to be retrieved from a configured credential provider, while also being able to fall back to the HiveConf setting if no provider is set up. Hive should use this API for versions of Hadoop that support this API. This would give users the ability to remove the passwords from their Hive configuration files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7801) Move PTest2 properties files into svn
Brock Noland created HIVE-7801: -- Summary: Move PTest2 properties files into svn Key: HIVE-7801 URL: https://issues.apache.org/jira/browse/HIVE-7801 Project: Hive Issue Type: Bug Reporter: Brock Noland To stop me from screwing them up :) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104086#comment-14104086 ] Brock Noland commented on HIVE-7254: Ok, I think there were three issues: 1) I accidentally removed minitez.query.files.shared from the properties file (just added that back) 2) HIVE-7757 3) Fixing HIVE-7757 required a restart of ptest2 I agree that we need those properties files in svn. They used to have sensitive information in them but now they don't. I created HIVE-7801 to fix that. Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Attachments: trunk-mr2.properties Today, the Hive PTest infrastructure has a test-driver configuration called "directory", so it will run all the qfiles under that directory for that driver. For example, CLIDriver is configured with the directory ql/src/test/queries/clientpositive. However the configuration for the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) runs only a select number of tests under the directory. So we have to use the "include" configuration to hard-code a list of tests for it to run. This is duplicating the list of each miniDriver's tests already in the /itests/qtest pom file, and can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7802) Update language manual for insert, update, and delete
Alan Gates created HIVE-7802: Summary: Update language manual for insert, update, and delete Key: HIVE-7802 URL: https://issues.apache.org/jira/browse/HIVE-7802 Project: Hive Issue Type: Sub-task Components: Documentation Reporter: Alan Gates Assignee: Alan Gates With the addition of ACID compliant insert, insert...values, update, and delete we need to update the Hive language manual to cover the new features. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24498: A method to extrapolate the missing column status for the partitions.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24498/#review51109 --- The patch does a great job of making the # of queries independent of the # of columns. Good work! But it seems it's now making queries over all partitions of the table, instead of those listed in the request. metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/24498/#comment89121 Now that you have fixed this TODO, you can delete it. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java https://reviews.apache.org/r/24498/#comment89114 This should contain "and PARTITION_NAME in ()"; otherwise we are running the query over all partitions of the table, instead of those requested. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java https://reviews.apache.org/r/24498/#comment89117 This should contain "and PARTITION_NAME in ()"; otherwise we are running the query over all partitions of the table, instead of those requested. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java https://reviews.apache.org/r/24498/#comment89118 This should contain "and PARTITION_NAME in ()"; otherwise we are running the query over all partitions of the table, instead of those requested. - Ashutosh Chauhan On Aug. 20, 2014, 6:52 a.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24498/ --- (Updated Aug. 20, 2014, 6:52 a.m.) Review request for hive. Repository: hive-git Description --- We propose a method to extrapolate the missing column statistics for the partitions. Diffs - data/files/extrapolate_stats_full.txt PRE-CREATION data/files/extrapolate_stats_partial.txt PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 84ef5f9 metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 767cffc metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a9f4be2 metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 0364385 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 4eba2b0 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 78ab19a ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q PRE-CREATION ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q PRE-CREATION ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out PRE-CREATION ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24498/diff/ Testing --- File Attachments HIVE-7654.0.patch https://reviews.apache.org/media/uploaded/files/2014/08/12/77b155b0-a417-4225-b6b7-4c8c6ce2b97d__HIVE-7654.0.patch Thanks, pengcheng xiong
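For clarity, the restriction being requested would be assembled in the direct-SQL layer along these lines; an illustrative sketch with assumed variable names (the review elides the actual IN list), not code from the patch:
{code}
// Sketch: append "and PARTITION_NAME in (...)" so the stats query only
// touches the requested partitions instead of the whole table.
StringBuilder inList = new StringBuilder();
for (int i = 0; i < partNames.size(); i++) {
  inList.append(i == 0 ? "?" : ", ?");
}
queryText = queryText + " and \"PARTITION_NAME\" in (" + inList + ")";
// bind partNames, in order, as the statement parameters for the IN list
{code}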
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7654: --- Status: Open (was: Patch Available) Good work in making # of sql queries independent of # of cols. Left some comments on RB. I was expecting annotate_stats_part results to be updated, because you fixed the # of partitions for which stats were found, but I was expecting them to change from COMPLETE to PARTIAL. Can you take a look at that as well? A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
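For readers following the stats discussion above, a rough illustration of what linear extrapolation of a per-partition statistic can look like. This is a generic least-squares sketch with hypothetical names, not the code in LinearExtrapolatePartStatus or the patch under review: partitions with known column stats supply (ordinal, value) points, and the missing partitions are estimated from the fitted line.
{code:java}
// Hypothetical sketch: extrapolate one column statistic (e.g. a max value)
// across partition ordinals from the partitions that do have stats.
public class LinearExtrapolationSketch {
  public static double extrapolate(int[] knownIdx, double[] knownVals, int targetIdx) {
    int n = knownIdx.length;
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
      sx += knownIdx[i];
      sy += knownVals[i];
      sxx += (double) knownIdx[i] * knownIdx[i];
      sxy += knownIdx[i] * knownVals[i];
    }
    double denom = n * sxx - sx * sx;
    if (denom == 0) {
      return sy / n;  // only one distinct x value: fall back to the mean
    }
    double slope = (n * sxy - sx * sy) / denom;
    double intercept = (sy - slope * sx) / n;
    return slope * targetIdx + intercept;
  }
}
{code}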
[jira] [Updated] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7373: --- Labels: TODOC14 (was: ) Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch Currently Hive blindly removes trailing zeros of a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In decimal context, number 3.140 has a different semantic meaning from number 3.14. Removing trailing zeroes loses that meaning. 2. In an extreme case, 0.0 has (p, s) as (1, 1). Hive removes trailing zeros, and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeroes (up to what the scale allows). With this, in the above example, 0.0, 0.00, and so on will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104125#comment-14104125 ] Brock Noland commented on HIVE-7373: Hi Lefty, Great point. I added TODOC14. [~spena] can you come up with a good user-facing statement for this one? Something like "Prior to 0.14, trailing zeros on decimals were unnecessarily trimmed ..." Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch Currently Hive blindly removes trailing zeros of a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In decimal context, number 3.140 has a different semantic meaning from number 3.14. Removing trailing zeroes loses that meaning. 2. In an extreme case, 0.0 has (p, s) as (1, 1). Hive removes trailing zeros, and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeroes (up to what the scale allows). With this, in the above example, 0.0, 0.00, and so on will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
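To make the precision/scale argument above concrete, a small standalone demonstration using plain java.math.BigDecimal (Java 8 behavior; this is not Hive's decimal code): stripping trailing zeros changes the scale, which is exactly what pushes a value outside a declared decimal(1,1) column.
{code:java}
import java.math.BigDecimal;

public class TrailingZeroDemo {
  public static void main(String[] args) {
    BigDecimal d = new BigDecimal("0.0");
    System.out.println(d.precision() + "," + d.scale());  // 1,1 -> fits decimal(1,1)
    BigDecimal t = d.stripTrailingZeros();                 // on Java 8 this yields 0 with scale 0
    System.out.println(t.precision() + "," + t.scale());  // 1,0 -> now needs an integer digit,
                                                           // which decimal(1,1) cannot hold
  }
}
{code}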
[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7629: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you so much for your contribution! I have committed this to trunk! Problem in SMB Joins between two Parquet tables --- Key: HIVE-7629 URL: https://issues.apache.org/jira/browse/HIVE-7629 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Labels: Parquet Fix For: 0.14.0 Attachments: HIVE-7629.1.patch, HIVE-7629.patch The issue is clearly seen when two bucketed and sorted parquet tables with different numbers of columns are involved in the join. The following exception is seen: {noformat} Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.init(CombineHiveRecordReader.java:65) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104143#comment-14104143 ] Sergey Shelukhin commented on HIVE-7654: +1, conditional on Ashutosh also +1ing :) A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] david serafini updated HIVE-7100: - Attachment: HIVE-7100.3.patch Attached HIVE-7100.3.patch, which should fix the test errors in the previous patch. Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they then have to manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
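As a rough sketch of the requested behavior (the flag name and plumbing here are hypothetical, not necessarily how the attached patch exposes it), the drop path would branch between the Trash and a direct delete:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class DropDataSketch {
  // skipTrash=true frees quota immediately; false keeps the default,
  // recoverable behavior of moving the data into the user's .Trash.
  static void deleteTableData(Configuration conf, Path tableDir, boolean skipTrash)
      throws java.io.IOException {
    FileSystem fs = tableDir.getFileSystem(conf);
    if (skipTrash) {
      fs.delete(tableDir, true);
    } else {
      Trash.moveToAppropriateTrash(fs, tableDir, conf);
    }
  }
}
{code}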
[jira] [Updated] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7593: --- Status: Patch Available (was: Open) Whenever the Spark configurations are updated globally, the existing session will be closed and a new session will be created. Instantiate SparkClient per user session [Spark Branch] --- Key: HIVE-7593 URL: https://issues.apache.org/jira/browse/HIVE-7593 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam Attachments: HIVE-7593-spark.patch, HIVE-7593.1-spark.patch SparkContext is the main class via which Hive talks to the Spark cluster. SparkClient encapsulates a SparkContext instance. Currently all user sessions share a single SparkClient instance in HiveServer2. While this is good enough for a POC, even for our first two milestones, this is not desirable for a multi-tenancy environment and gives the least flexibility to Hive users. Here is what we propose: 1. Have a SparkClient instance per user session. The SparkClient instance is created when the user executes their first query in the session. It will get destroyed when the user session ends. 2. The SparkClient is instantiated based on the Spark configurations that are available to the user, including those defined at the global level and those overwritten by the user (thru set command, for instance). 3. Ideally, when the user changes any Spark configuration during the session, the old SparkClient instance should be destroyed and a new one based on the new configurations is created. This may turn out to be a little hard, and thus it's a nice-to-have. If not implemented, we need to document that subsequent configuration changes will not take effect in the current session. Please note that there is a thread-safety issue on the Spark side where multiple SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need to work with the Spark community to get this addressed. Besides the above functional requirements, avoiding potential issues is also a consideration. For instance, sharing a SC among users is bad, as resources (such as a jar for a UDF) will also be shared, which is problematic. On the other hand, one SC per job seems too expensive, as the resources need to be re-rendered even if there isn't any change. -- This message was sent by Atlassian JIRA (v6.2#6252)
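A bare-bones sketch of the per-session lifecycle proposed above, with hypothetical names (this is not the attached patch): the client is created lazily on first use, rebuilt when the session's Spark configuration changes, and closed with the session.
{code:java}
import java.util.Properties;
import java.util.function.Function;

public class SessionSparkClientHolder {
  interface SparkClient { void close(); }   // stand-in for the real client wrapper

  private SparkClient client;
  private Properties confSnapshot;

  synchronized SparkClient get(Properties sparkConf,
                               Function<Properties, SparkClient> factory) {
    if (client != null && !sparkConf.equals(confSnapshot)) {
      client.close();                        // proposal item 3: rebuild on conf change
      client = null;
    }
    if (client == null) {                    // proposal item 1: lazy, per-session creation
      client = factory.apply(sparkConf);
      confSnapshot = (Properties) sparkConf.clone();
    }
    return client;
  }

  synchronized void closeSession() {         // called when the user session ends
    if (client != null) {
      client.close();
      client = null;
    }
  }
}
{code}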
[jira] [Commented] (HIVE-7281) DbTxnManager acquiring wrong level of lock for dynamic partitioning
[ https://issues.apache.org/jira/browse/HIVE-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104187#comment-14104187 ] Alan Gates commented on HIVE-7281: -- I'm fine with doing that, but do we need to link that change to this? Can we file a separate JIRA for that? DbTxnManager acquiring wrong level of lock for dynamic partitioning --- Key: HIVE-7281 URL: https://issues.apache.org/jira/browse/HIVE-7281 Project: Hive Issue Type: Bug Components: Locking, Transactions Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7281.patch Currently DbTxnManager.acquireLocks() locks the DUMMY_PARTITION for dynamic partitioning. But this is not adequate. This will not prevent drop operations on partitions being written to. The lock should be at the table level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7747: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you so much Venki! I have committed this to spark! Submitting a query to Spark from HiveServer2 fails [Spark Branch] - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. The same configuration works fine from the Hive CLI. Spark tasks fail with the following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104209#comment-14104209 ] Lefty Leverenz commented on HIVE-7634: -- Does this need any user/admin documentation? Also, shouldn't it be marked as Fix Version 0.14.0? Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml -- Key: HIVE-7634 URL: https://issues.apache.org/jira/browse/HIVE-7634 Project: Hive Issue Type: Bug Components: Security Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7634.1.patch HADOOP-10607 provides a Configuration.getPassword() API that allows passwords to be retrieved from a configured credential provider, while also being able to fall back to the HiveConf setting if no provider is set up. Hive should use this API for versions of Hadoop that support this API. This would give users the ability to remove the passwords from their Hive configuration files. -- This message was sent by Atlassian JIRA (v6.2#6252)
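One common shape for "use it if available" is a reflective probe with a fallback. The sketch below illustrates that pattern under the assumption that only Hadoop versions with HADOOP-10607 expose getPassword(); it is not necessarily how the attached patch implements it:
{code:java}
import java.lang.reflect.Method;
import org.apache.hadoop.conf.Configuration;

public class PasswordShimSketch {
  public static String lookupPassword(Configuration conf, String name) throws Exception {
    try {
      // Present on Hadoop versions with HADOOP-10607; may consult a credential provider.
      Method m = Configuration.class.getMethod("getPassword", String.class);
      char[] pw = (char[]) m.invoke(conf, name);
      return pw == null ? null : new String(pw);
    } catch (NoSuchMethodException e) {
      // Older Hadoop: only the plain hive-site.xml value is available.
      return conf.get(name);
    }
  }
}
{code}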
[jira] [Commented] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104205#comment-14104205 ] Brock Noland commented on HIVE-7747: Wow, nice analysis! +1 Submitting a query to Spark from HiveServer2 fails [Spark Branch] - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. The same configuration works fine from the Hive CLI. Spark tasks fail with the following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7785) CBO: Projection Pruning needs to handle cross Joins
[ https://issues.apache.org/jira/browse/HIVE-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7785: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch. Thanks [~jpullokkaran]! CBO: Projection Pruning needs to handle cross Joins --- Key: HIVE-7785 URL: https://issues.apache.org/jira/browse/HIVE-7785 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7785.patch Projection pruning needs to handle cross joins. Ex: select r1.x from r1 join r2. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7634: - Fix Version/s: 0.14.0 Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml -- Key: HIVE-7634 URL: https://issues.apache.org/jira/browse/HIVE-7634 Project: Hive Issue Type: Bug Components: Security Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-7634.1.patch HADOOP-10607 provides a Configuration.getPassword() API that allows passwords to be retrieved from a configured credential provider, while also being able to fall back to the HiveConf setting if no provider is set up. Hive should use this API for versions of Hadoop that support this API. This would give users the ability to remove the passwords from their Hive configuration files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104229#comment-14104229 ] Jason Dere commented on HIVE-7634: -- Thanks for catching that Lefty, need to set the fix version. This could use some doc. Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml -- Key: HIVE-7634 URL: https://issues.apache.org/jira/browse/HIVE-7634 Project: Hive Issue Type: Bug Components: Security Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-7634.1.patch HADOOP-10607 provides a Configuration.getPassword() API that allows passwords to be retrieved from a configured credential provider, while also being able to fall back to the HiveConf setting if no provider is set up. Hive should use this API for versions of Hadoop that support this API. This would give users the ability to remove the passwords from their Hive configuration files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7767) hive.optimize.union.remove does not work properly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104234#comment-14104234 ] Brock Noland commented on HIVE-7767: Kicked off pre-commits on this one again. hive.optimize.union.remove does not work properly [Spark Branch] Key: HIVE-7767 URL: https://issues.apache.org/jira/browse/HIVE-7767 Project: Hive Issue Type: Sub-task Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7767.1-spark.patch, HIVE-7767.2-spark.patch Turning on the hive.optimize.union.remove property generates a wrong union all result. For Example: {noformat} create table inputTbl1(key string, val string) stored as textfile; load data local inpath '../../data/files/T1.txt' into table inputTbl1; SELECT * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, count(1) as values from inputTbl1 group by key ) a; {noformat} when the hive.optimize.union.remove is turned on, the query result is like: {noformat} 1 1 2 1 3 1 7 1 8 2 {noformat} when the hive.optimize.union.remove is turned off, the query result is like: {noformat} 7 1 2 1 8 2 3 1 1 1 7 1 2 1 8 2 3 1 1 1 {noformat} The expected query result is: {noformat} 7 1 2 1 8 2 3 1 1 1 7 1 2 1 8 2 3 1 1 1 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7634: - Labels: TODOC14 (was: ) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml -- Key: HIVE-7634 URL: https://issues.apache.org/jira/browse/HIVE-7634 Project: Hive Issue Type: Bug Components: Security Reporter: Jason Dere Assignee: Jason Dere Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7634.1.patch HADOOP-10607 provides a Configuration.getPassword() API that allows passwords to be retrieved from a configured credential provider, while also being able to fall back to the HiveConf setting if no provider is set up. Hive should use this API for versions of Hadoop that support this API. This would give users the ability to remove the passwords from their Hive configuration files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7770) Undo backward-incompatible behaviour change introduced by HIVE-7341
[ https://issues.apache.org/jira/browse/HIVE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7770: --- Attachment: HIVE-7770.1.patch Changed HCatPartition to pre-cache {{this.sd.getCols()}} into a member variable. Slightly redundant, but it gets around having to change the exception signature of {{HCatPartition.getColumns()}}. And it amortizes the construction-cost for multiple calls. #silverlining Undo backward-incompatible behaviour change introduced by HIVE-7341 --- Key: HIVE-7770 URL: https://issues.apache.org/jira/browse/HIVE-7770 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Mithun Radhakrishnan Labels: regression Attachments: HIVE-7770.1.patch HIVE-7341 introduced a backward-incompatibility regression in Exception signatures for HCatPartition.getColumns() that breaks compilation for external tools like Falcon. This bug tracks a scrub of any other issues we discover, so we can put them back to how it used to be. This bug needs resolution in the same release as HIVE-7341, and thus, must be resolved in 0.14.0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7770) Undo backward-incompatible behaviour change introduced by HIVE-7341
[ https://issues.apache.org/jira/browse/HIVE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7770: --- Status: Patch Available (was: Open) Undo backward-incompatible behaviour change introduced by HIVE-7341 --- Key: HIVE-7770 URL: https://issues.apache.org/jira/browse/HIVE-7770 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Mithun Radhakrishnan Labels: regression Attachments: HIVE-7770.1.patch HIVE-7341 introduced a backward-incompatibility regression in Exception signatures for HCatPartition.getColumns() that breaks compilation for external tools like Falcon. This bug tracks a scrub of any other issues we discover, so we can put them back to how it used to be. This bug needs resolution in the same release as HIVE-7341, and thus, must be resolved in 0.14.0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7576) Add PartitionSpec support in HCatClient API
[ https://issues.apache.org/jira/browse/HIVE-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7576: --- Attachment: HIVE-7576.1.patch Here's the fix. This won't apply till HIVE-7223 is resolved. Add PartitionSpec support in HCatClient API --- Key: HIVE-7576 URL: https://issues.apache.org/jira/browse/HIVE-7576 Project: Hive Issue Type: Bug Components: HCatalog, Metastore Affects Versions: 0.13.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-7576.1.patch HIVE-7223 adds support for PartitionSpecs in Hive Metastore. The HCatClient API must add support to fetch partitions, add partitions, etc. using PartitionSpec semantics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7646) Modify parser to support new grammar for Insert,Update,Delete
[ https://issues.apache.org/jira/browse/HIVE-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104290#comment-14104290 ] Alan Gates commented on HIVE-7646: -- +1 This isn't a big deal and I don't think we should hold the patch for it, but I have a question. If I read the spec correctly, a query like: select * from (values((1, 2, 3),(4, 5, 6))); should be legal. At FromClauseParser.g, line 290 you are requiring a tableNameColList as part of the virtualTableSource. This means the user always has to do a table definition after the values clause, so the above would become: select * from (values((1, 2, 3),(4, 5, 6)) as foo(a int, b int, c int)); This makes sense since the more common case is probably: select a, count(b) from (values((1, 2, 3),(4, 5, 6)) as foo(a int, b int, c int)) group by a where c > 4; or something, in which case the table definition is required. But the main question is am I misreading the spec or are we just adding the requirement for the table definition in all cases? I think it's ok if we're adding it, as this is primarily for our own testing purposes. Modify parser to support new grammar for Insert,Update,Delete - Key: HIVE-7646 URL: https://issues.apache.org/jira/browse/HIVE-7646 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-7646.1.patch, HIVE-7646.2.patch, HIVE-7646.3.patch, HIVE-7646.patch need parser to recognize constructs such as: {code:sql} INSERT INTO Cust (Customer_Number, Balance, Address) VALUES (101, 50.00, '123 Main Street'), (102, 75.00, '123 Pine Ave'); {code} {code:sql} DELETE FROM Cust WHERE Balance > 5.0 {code} {code:sql} UPDATE Cust SET column1=value1,column2=value2,... WHERE some_column=some_value {code} also useful {code:sql} select a,b from values((1,2),(3,4)) as FOO(a,b) {code} This makes writing tests easier. Some references: http://dev.mysql.com/doc/refman/5.6/en/insert.html http://msdn.microsoft.com/en-us/library/dd776382.aspx -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7001) fs.permissions.umask-mode is getting unset when Session is started
[ https://issues.apache.org/jira/browse/HIVE-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7001: -- Attachment: TestUMask.patch Hi [~thejas], One question regarding the fs.permissions.umask-mode. Looks like fs.permissions.umask-mode doesn't exist in Hadoop 1.x, and the property dfs.umaskmode is used instead in 1.x for the same purpose. Also dfs.umaskmode was not deprecated in 1.x according to HADOOP-8727. Should we use FsPermission.UMASK_LABEL instead of fs.permissions.umask-mode, which always points to the proper property in each Hadoop version (0.23.x, 1.x, 2.x)? Attached a testcase to illustrate the problem. The test passes fine with -Phadoop-2, but not with -Phadoop-1. fs.permissions.umask-mode is getting unset when Session is started -- Key: HIVE-7001 URL: https://issues.apache.org/jira/browse/HIVE-7001 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.14.0, 0.13.1 Attachments: HIVE-7001.1.patch, HIVE-7001.2.patch, HIVE-7001.3.patch, TestUMask.patch {code} hive> set fs.permissions.umask-mode; fs.permissions.umask-mode=022 hive> show tables; OK t1 Time taken: 0.301 seconds, Fetched: 1 row(s) hive> set fs.permissions.umask-mode; fs.permissions.umask-mode is undefined {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104308#comment-14104308 ] Hive QA commented on HIVE-7593: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662806/HIVE-7593.1-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5958 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/67/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/67/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-67/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662806 Instantiate SparkClient per user session [Spark Branch] --- Key: HIVE-7593 URL: https://issues.apache.org/jira/browse/HIVE-7593 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam Attachments: HIVE-7593-spark.patch, HIVE-7593.1-spark.patch SparkContext is the main class via which Hive talks to the Spark cluster. SparkClient encapsulates a SparkContext instance. Currently all user sessions share a single SparkClient instance in HiveServer2. While this is good enough for a POC, even for our first two milestones, this is not desirable for a multi-tenancy environment and gives the least flexibility to Hive users. Here is what we propose: 1. Have a SparkClient instance per user session. The SparkClient instance is created when the user executes their first query in the session. It will get destroyed when the user session ends. 2. The SparkClient is instantiated based on the Spark configurations that are available to the user, including those defined at the global level and those overwritten by the user (thru set command, for instance). 3. Ideally, when the user changes any Spark configuration during the session, the old SparkClient instance should be destroyed and a new one based on the new configurations is created. This may turn out to be a little hard, and thus it's a nice-to-have. If not implemented, we need to document that subsequent configuration changes will not take effect in the current session. Please note that there is a thread-safety issue on the Spark side where multiple SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need to work with the Spark community to get this addressed. Besides the above functional requirements, avoiding potential issues is also a consideration. For instance, sharing a SC among users is bad, as resources (such as a jar for a UDF) will also be shared, which is problematic. On the other hand, one SC per job seems too expensive, as the resources need to be re-rendered even if there isn't any change. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required
[ https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reassigned HIVE-7682: -- Assignee: Brock Noland (was: Sergio Peña) HadoopThriftAuthBridge20S should not reset configuration unless required Key: HIVE-7682 URL: https://issues.apache.org/jira/browse/HIVE-7682 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7682.1.patch In the HadoopThriftAuthBridge20S methods createClientWithConf and getCurrentUGIWithConf, we create new Configuration objects so we can set the authentication type. When loading the new Configuration object, it picks up the core-site.xml for the cluster it's connected to. This causes issues for Oozie, since Oozie does not have access to the core-site.xml as it's cluster agnostic. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required
[ https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7682: --- Status: Patch Available (was: Open) HadoopThriftAuthBridge20S should not reset configuration unless required Key: HIVE-7682 URL: https://issues.apache.org/jira/browse/HIVE-7682 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Sergio Peña Attachments: HIVE-7682.1.patch In the HadoopThriftAuthBridge20S methods createClientWithConf and getCurrentUGIWithConf, we create new Configuration objects so we can set the authentication type. When loading the new Configuration object, it picks up the core-site.xml for the cluster it's connected to. This causes issues for Oozie, since Oozie does not have access to the core-site.xml as it's cluster agnostic. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required
[ https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7682: --- Attachment: HIVE-7682.1.patch HadoopThriftAuthBridge20S should not reset configuration unless required Key: HIVE-7682 URL: https://issues.apache.org/jira/browse/HIVE-7682 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Sergio Peña Attachments: HIVE-7682.1.patch In the HadoopThriftAuthBridge20S methods createClientWithConf and getCurrentUGIWithConf, we create new Configuration objects so we can set the authentication type. When loading the new Configuration object, it picks up the core-site.xml for the cluster it's connected to. This causes issues for Oozie, since Oozie does not have access to the core-site.xml as it's cluster agnostic. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required
[ https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104312#comment-14104312 ] Brock Noland commented on HIVE-7682: I talked with Sergio offline and I am going to grab this one. HadoopThriftAuthBridge20S should not reset configuration unless required Key: HIVE-7682 URL: https://issues.apache.org/jira/browse/HIVE-7682 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7682.1.patch In the HadoopThriftAuthBridge20S methods createClientWithConf and getCurrentUGIWithConf, we create new Configuration objects so we can set the authentication type. When loading the new Configuration object, it picks up the core-site.xml for the cluster it's connected to. This causes issues for Oozie, since Oozie does not have access to the core-site.xml as it's cluster agnostic. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24903: HIVE-7682: HadoopThriftAuthBridge20S should not reset configuration unless required
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24903/ --- Review request for hive. Repository: hive-git Description --- Described in JIRA Diffs - shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java 8b9da7a Diff: https://reviews.apache.org/r/24903/diff/ Testing --- Thanks, Brock Noland
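The "unless required" in the summary amounts to checking the current login state before building a fresh Configuration. A hypothetical sketch of such a guard, not the actual patch:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class AuthConfGuardSketch {
  static void ensureAuthMethod(String wanted) throws java.io.IOException {
    String current =
        UserGroupInformation.getLoginUser().getAuthenticationMethod().name();
    if (wanted.equalsIgnoreCase(current)) {
      return;  // already configured: skip the Configuration reload
               // (and its core-site.xml lookup, which breaks cluster-agnostic Oozie)
    }
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", wanted);
    UserGroupInformation.setConfiguration(conf);
  }
}
{code}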
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104338#comment-14104338 ] Alan Gates commented on HIVE-7689: -- It looks like these changes are all to change the SQL to be uppercase and quote all identifiers. What issues are you seeing that drive the need for this? Have you tested it against any other RDBMSs? Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes errors in STATS on the metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7001) fs.permissions.umask-mode is getting unset when Session is started
[ https://issues.apache.org/jira/browse/HIVE-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104341#comment-14104341 ] Thejas M Nair commented on HIVE-7001: - Using FsPermission.UMASK_LABEL sounds good to me. Please open a new jira. fs.permissions.umask-mode is getting unset when Session is started -- Key: HIVE-7001 URL: https://issues.apache.org/jira/browse/HIVE-7001 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.14.0, 0.13.1 Attachments: HIVE-7001.1.patch, HIVE-7001.2.patch, HIVE-7001.3.patch, TestUMask.patch {code} hive> set fs.permissions.umask-mode; fs.permissions.umask-mode=022 hive> show tables; OK t1 Time taken: 0.301 seconds, Fetched: 1 row(s) hive> set fs.permissions.umask-mode; fs.permissions.umask-mode is undefined {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
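Concretely, the suggestion boils down to reading and writing the umask through the FsPermission constant rather than a hard-coded key; per the thread above, the constant resolves to fs.permissions.umask-mode on Hadoop 2 and dfs.umaskmode on Hadoop 1. A minimal sketch:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.permission.FsPermission;

public class UmaskSketch {
  static void preserveUmask(Configuration conf) {
    // Read and re-set through the constant instead of a literal key, so the
    // same code works whichever property name the running Hadoop version uses.
    String umask = conf.get(FsPermission.UMASK_LABEL, "022");
    conf.set(FsPermission.UMASK_LABEL, umask);
  }
}
{code}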
[jira] [Created] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)
Selina Zhang created HIVE-7803: -- Summary: Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition) Key: HIVE-7803 URL: https://issues.apache.org/jira/browse/HIVE-7803 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Environment: Reporter: Selina Zhang Assignee: Selina Zhang Priority: Critical One of our users reports they see intermittent failures due to attempt directories in the input paths. We found that with speculative execution turned on, two mappers tried to commit the task at the same time using the same committed task path, which caused the corrupt output directory. The original Pig script: (STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME' USING org.apache.hcatalog.pig.HCatStorer();) Two mappers: attempt_1405021984947_5394024_m_000523_0: KILLED attempt_1405021984947_5394024_m_000523_1: SUCCEEDED attempt_1405021984947_5394024_m_000523_0 was killed right after the commit. As a result, it created a corrupt directory at /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/ containing part-m-00523 (from attempt_1405021984947_5394024_m_000523_0) and attempt_1405021984947_5394024_m_000523_1/part-m-00523 Namenode Audit log == 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523 dst=null perm=user:group:rw-r- 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523 dst=null perm=user:group:rw-r- 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- After consulting our Hadoop core team, it was pointed out to us that some HCat code does not participate in the two-phase commit protocol, for example in FileRecordWriterContainer.close(): for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter> entry : baseDynamicCommitters.entrySet()) { org.apache.hadoop.mapred.TaskAttemptContext currContext = dynamicContexts.get(entry.getKey()); OutputCommitter baseOutputCommitter = entry.getValue(); if (baseOutputCommitter.needsTaskCommit(currContext)) { baseOutputCommitter.commitTask(currContext); } } -- This message was sent by Atlassian JIRA (v6.2#6252)
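For reference, the two-phase commit contract the report says is being bypassed looks roughly like this: commitTask() should only be reached from the framework-driven commit phase, after the framework has granted permission to exactly one attempt, never from RecordWriter.close(). A schematic sketch, not HCatalog's code:
{code:java}
import org.apache.hadoop.mapred.OutputCommitter;
import org.apache.hadoop.mapred.TaskAttemptContext;

public class TaskCommitSketch {
  // Called from the framework's commit phase. A speculative attempt killed
  // before this point never promotes its output, so two attempts cannot
  // rename into the same committed-task path.
  static void commitIfNeeded(OutputCommitter committer, TaskAttemptContext ctx)
      throws java.io.IOException {
    if (committer.needsTaskCommit(ctx)) {
      committer.commitTask(ctx);
    }
  }
}
{code}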
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104375#comment-14104375 ] Sergey Shelukhin commented on HIVE-7689: "-quoted identifiers should be ansi standard... MySQL would require a flag (see MetaStoreDirectSql.java) Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes errors in STATS on the metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104375#comment-14104375 ] Sergey Shelukhin edited comment on HIVE-7689 at 8/20/14 7:14 PM: - "-quoted identifiers are ansi standard... MySQL would require a flag (see MetaStoreDirectSql.java) was (Author: sershe): "-quoted identifiers should be ansi standard... MySQL would require a flag (see MetaStoreDirectSql.java) Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes errors in STATS on the metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104372#comment-14104372 ] Sergey Shelukhin commented on HIVE-7689: The problem is that Postgres coerces unquoted identifiers everywhere to lower case (iirc) and has no way to disable this, to put it very mildly, questionable behavior; afair the request to add a flag similar to the mysql one for ANSI was also not viewed positively when I tried. So either everything has to be lower case, or everything has to be quoted (and upper case, for simplicity I guess). Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes errors in STATS on the metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
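The case-folding behavior described above is easy to reproduce outside Hive; a small JDBC sketch (connection URL and table names are illustrative only):
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PgCaseFoldingDemo {
  public static void main(String[] args) throws Exception {
    Connection c = DriverManager.getConnection("jdbc:postgresql:metastore");
    Statement s = c.createStatement();
    s.execute("CREATE TABLE \"TBLS\" (\"TBL_ID\" bigint)");
    s.executeQuery("SELECT \"TBL_ID\" FROM \"TBLS\"");  // ok: quoted identifiers keep their case
    s.executeQuery("SELECT TBL_ID FROM TBLS");          // error: folded to tbl_id / tbls
  }
}
{code}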
[jira] [Created] (HIVE-7804) CBO: Support SemiJoins
Harish Butani created HIVE-7804: --- Summary: CBO: Support SemiJoins Key: HIVE-7804 URL: https://issues.apache.org/jira/browse/HIVE-7804 Project: Hive Issue Type: Sub-task Reporter: Harish Butani -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7804) CBO: Support SemiJoins
[ https://issues.apache.org/jira/browse/HIVE-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7804: Attachment: HIVE-7804.1.patch CBO: Support SemiJoins -- Key: HIVE-7804 URL: https://issues.apache.org/jira/browse/HIVE-7804 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Attachments: HIVE-7804.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility
[ https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104398#comment-14104398 ] Nick Dimiduk commented on HIVE-4765: Ping [~navis], [~sushanth]. Any chance we can get some action on this one for the 0.14 release? It's definitely better than what's available. Improve HBase bulk loading facility --- Key: HIVE-4765 URL: https://issues.apache.org/jira/browse/HIVE-4765 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, HIVE-4765.D11463.1.patch With some patches, the bulk loading process for HBase could be simplified a lot. {noformat} CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:key,cf2:value") STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter' LOCATION '/tmp/export'; SET mapred.reduce.tasks=4; set hive.optimize.sampling.orderby=true; INSERT OVERWRITE TABLE hbase_export SELECT * from (SELECT union_kv(key,key,value,":key,cf1:key,cf2:value") as (rowkey,union) FROM src) A ORDER BY rowkey,union; hive> !hadoop fs -lsr /tmp/export; drwxr-xr-x - navis supergroup 0 2013-06-20 11:05 /tmp/export/cf1 -rw-r--r-- 1 navis supergroup 4317 2013-06-20 11:05 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835 -rw-r--r-- 1 navis supergroup 5868 2013-06-20 11:05 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d -rw-r--r-- 1 navis supergroup 5214 2013-06-20 11:05 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8 -rw-r--r-- 1 navis supergroup 4290 2013-06-20 11:05 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10 drwxr-xr-x - navis supergroup 0 2013-06-20 11:05 /tmp/export/cf2 -rw-r--r-- 1 navis supergroup 6744 2013-06-20 11:05 /tmp/export/cf2/409673b517d94e16920e445d07710f52 -rw-r--r-- 1 navis supergroup 4975 2013-06-20 11:05 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e -rw-r--r-- 1 navis supergroup 6096 2013-06-20 11:05 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4 -rw-r--r-- 1 navis supergroup 4890 2013-06-20 11:05 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Timeline for release of Hive 0.14
It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a big improvement for us HBase folks. Would someone mind having a look in that direction? Thanks, Nick On Tue, Aug 19, 2014 at 3:20 PM, Thejas Nair the...@hortonworks.com wrote: +1 Sounds good to me. Its already almost 4 months since the last release. It is time to start preparing for the next one. Thanks for volunteering! On Tue, Aug 19, 2014 at 2:02 PM, Vikram Dixit vik...@hortonworks.com wrote: Hi Folks, I was thinking that it was about time that we had a release of hive 0.14 given our commitment to having a release of hive on a periodic basis. We could cut a branch and start working on a release in say 2 weeks time around September 5th (Friday). After branching, we can focus on stabilizing for the release and hopefully have an RC in about 2 weeks post that. I would like to volunteer myself for the duties of the release manager for this version if the community agrees. Thanks Vikram.
[jira] [Commented] (HIVE-7767) hive.optimize.union.remove does not work properly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104403#comment-14104403 ] Hive QA commented on HIVE-7767: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662979/HIVE-7767.2-spark.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5978 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/68/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/68/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-68/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662979 hive.optimize.union.remove does not work properly [Spark Branch] Key: HIVE-7767 URL: https://issues.apache.org/jira/browse/HIVE-7767 Project: Hive Issue Type: Sub-task Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7767.1-spark.patch, HIVE-7767.2-spark.patch Turning on the hive.optimize.union.remove property generates a wrong union all result. For Example: {noformat} create table inputTbl1(key string, val string) stored as textfile; load data local inpath '../../data/files/T1.txt' into table inputTbl1; SELECT * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, count(1) as values from inputTbl1 group by key ) a; {noformat} when the hive.optimize.union.remove is turned on, the query result is like: {noformat} 1 1 2 1 3 1 7 1 8 2 {noformat} when the hive.optimize.union.remove is turned off, the query result is like: {noformat} 7 1 2 1 8 2 3 1 1 1 7 1 2 1 8 2 3 1 1 1 {noformat} The expected query result is: {noformat} 7 1 2 1 8 2 3 1 1 1 7 1 2 1 8 2 3 1 1 1 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Mail bounces from ebuddy.com
Not quite taken care of. I'm still getting spam about these addresses. On Mon, Aug 18, 2014 at 9:18 AM, Lars Francke lars.fran...@gmail.com wrote: Thanks Alan and Ashutosh for taking care of this! On Mon, Aug 18, 2014 at 5:45 PM, Ashutosh Chauhan hashut...@apache.org wrote: Thanks, Alan for the hint. I just unsubscribed those two email addresses from ebuddy. On Mon, Aug 18, 2014 at 8:23 AM, Alan Gates ga...@hortonworks.com wrote: Anyone who is an admin on the list (I don't know who the admins are) can do this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org where USERNAME is the name of the bouncing user (see http://untroubled.org/ezmlm/ezman/ezman1.html ) Alan. Thejas Nair the...@hortonworks.com August 17, 2014 at 17:02 I don't know how to do this. Carl, Ashutosh, Do you guys know how to remove these two invalid emails from the mailing list ? Lars Francke lars.fran...@gmail.com August 17, 2014 at 15:41 Hmm great, I see others mentioning this as well. I'm happy to contact INFRA but I'm not sure if they are even needed or if someone from the Hive team can do this? Lefty Leverenz leftylever...@gmail.com August 7, 2014 at 18:43 (Excuse the spam.) Actually I'm getting two bounces per message, but gmail concatenates them so I didn't notice the second one. -- Lefty Lefty Leverenz leftylever...@gmail.com August 7, 2014 at 18:36 Curious, I've only been getting one bounce per message. Anyway thanks for bringing this up. -- Lefty Lars Francke lars.fran...@gmail.com August 7, 2014 at 4:38 Hi, every time I send a mail to dev@ I get two bounce mails from two people at ebuddy.com. I don't want to post the E-Mail addresses publicly but I can send them on if needed (and it can be triggered easily by just replying to this mail I guess). Could we maybe remove them from the list? Cheers, Lars
[jira] [Commented] (HIVE-7767) hive.optimize.union.remove does not work properly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104421#comment-14104421 ] Brock Noland commented on HIVE-7767: Hi [~nyang], Thank you very much for your work on this! The patch looks great! I did notice that there are a couple of tests where the results differ from mapreduce (outside the query plan). I used the following command to compare all the files:
{noformat}
git status | awk '/new file:/ {print $NF}' | xargs -I {} sh -c 'echo {}; diff -y -W 150 {} $(echo {} | perl -pe s@/spark@@g)' | less
{noformat}
and found that at least two of the tests, union_remove_10 and union_remove_22, produce different results. Could you take a look? Thanks!

hive.optimize.union.remove does not work properly [Spark Branch]
Key: HIVE-7767 URL: https://issues.apache.org/jira/browse/HIVE-7767 Project: Hive Issue Type: Sub-task Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7767.1-spark.patch, HIVE-7767.2-spark.patch

Turning on the hive.optimize.union.remove property generates a wrong UNION ALL result. For example:
{noformat}
create table inputTbl1(key string, val string) stored as textfile;
load data local inpath '../../data/files/T1.txt' into table inputTbl1;
SELECT * FROM (
  SELECT key, count(1) as values from inputTbl1 group by key
  UNION ALL
  SELECT key, count(1) as values from inputTbl1 group by key
) a;
{noformat}
When hive.optimize.union.remove is turned on, the query result is:
{noformat}
1 1
2 1
3 1
7 1
8 2
{noformat}
When hive.optimize.union.remove is turned off, the query result is:
{noformat}
7 1
2 1
8 2
3 1
1 1
7 1
2 1
8 2
3 1
1 1
{noformat}
The expected query result is:
{noformat}
7 1
2 1
8 2
3 1
1 1
7 1
2 1
8 2
3 1
1 1
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
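For anyone who would rather not chain shell tools, here is a rough Java sketch of the same comparison: walk the spark golden files and flag any whose contents differ from the mapreduce versions. The directory paths follow Hive's usual ql/src/test/results layout but are assumptions here, and note that a byte-for-byte comparison also flags plan-only differences that the manual side-by-side review above would skim past.
{code}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Rough Java equivalent of the shell pipeline above: for each spark
// q.out file, compare its bytes against the mapreduce golden file one
// directory level up (the perl s@/spark@@g in the pipeline does the
// same path rewrite). Paths are assumptions, not taken from the patch.
public class CompareQOut {
    public static void main(String[] args) throws IOException {
        Path sparkDir = Paths.get("ql/src/test/results/clientpositive/spark");
        Path mrDir = Paths.get("ql/src/test/results/clientpositive");
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(sparkDir, "*.q.out")) {
            for (Path sparkFile : stream) {
                Path mrFile = mrDir.resolve(sparkFile.getFileName());
                if (!Files.exists(mrFile)) {
                    System.out.println("no mapreduce counterpart: " + sparkFile.getFileName());
                } else if (!Arrays.equals(Files.readAllBytes(sparkFile), Files.readAllBytes(mrFile))) {
                    System.out.println("results differ: " + sparkFile.getFileName());
                }
            }
        }
    }
}
{code}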
[jira] [Created] (HIVE-7805) Support running multiple scans in hbase-handler
Andrew Mains created HIVE-7805: -- Summary: Support running multiple scans in hbase-handler Key: HIVE-7805 URL: https://issues.apache.org/jira/browse/HIVE-7805 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.14.0 Reporter: Andrew Mains

Currently, HiveHBaseTableInputFormat only supports running a single scan. This can be less efficient than running multiple disjoint scans in certain cases, particularly when using a composite row key. For instance, given a row key schema of:
{code}
struct<bucket int, time timestamp>
{code}
if one wants to push down the predicate:
{code}
bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
{code}
it's much more efficient to run a scan for each bucket over the time range (particularly if there's a large amount of data per day). With a single scan, the MR job has to process the data for all time for all buckets between 1 and 100. Hive should allow HBaseKeyFactory implementations to decompose a predicate into one or more scans in order to take advantage of this fact.
-- This message was sent by Atlassian JIRA (v6.2#6252)
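To make the proposed win concrete, here is a small hypothetical sketch against the plain HBase client API (not the HBaseKeyFactory change this issue asks for) that decomposes the example predicate into one narrow scan per bucket. The fixed-width big-endian key encoding, a 4-byte bucket followed by 8-byte epoch seconds, is an assumption made for illustration.
{code}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;

// Illustrative sketch only: for a composite key of (bucket, time), the
// predicate "bucket IN (1, 10, 100) AND time >= lo AND time < hi" maps
// onto one narrow scan per bucket instead of one scan spanning every
// bucket from 1 to 100.
public class BucketScans {
    // Assumed key layout: 4-byte big-endian bucket, then 8-byte time.
    static byte[] key(int bucket, long time) {
        return ByteBuffer.allocate(12).putInt(bucket).putLong(time).array();
    }

    static List<Scan> scansFor(int[] buckets, long lo, long hi) {
        List<Scan> scans = new ArrayList<Scan>();
        for (int b : buckets) {
            Scan s = new Scan();
            s.setStartRow(key(b, lo)); // inclusive lower bound for this bucket
            s.setStopRow(key(b, hi));  // exclusive upper bound for this bucket
            scans.add(s);
        }
        return scans;
    }

    public static void main(String[] args) {
        List<Scan> scans = scansFor(new int[] {1, 10, 100}, 1408333927L, 1408506670L);
        System.out.println(scans.size() + " disjoint scans instead of one wide scan");
    }
}
{code}
Each scan covers exactly one bucket's time slice, so a job driven by these scans never reads the excluded buckets between 1 and 100.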
[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Mains updated HIVE-7805: --- Attachment: HIVE-7805.patch This patch changes HiveHBaseTableInputFormat to extend MultiTableInputFormatBase and allows HBaseKeyFactory implementations to push a List<HBaseScanRange> instead of just a single HBaseScanRange.

Support running multiple scans in hbase-handler --- Key: HIVE-7805 URL: https://issues.apache.org/jira/browse/HIVE-7805 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.14.0 Reporter: Andrew Mains Attachments: HIVE-7805.patch

Currently, HiveHBaseTableInputFormat only supports running a single scan. This can be less efficient than running multiple disjoint scans in certain cases, particularly when using a composite row key. For instance, given a row key schema of:
{code}
struct<bucket int, time timestamp>
{code}
if one wants to push down the predicate:
{code}
bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
{code}
it's much more efficient to run a scan for each bucket over the time range (particularly if there's a large amount of data per day). With a single scan, the MR job has to process the data for all time for all buckets between 1 and 100. Hive should allow HBaseKeyFactory implementations to decompose a predicate into one or more scans in order to take advantage of this fact.
-- This message was sent by Atlassian JIRA (v6.2#6252)
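As background on the base class the patch builds on, the minimal hypothetical subclass below shows how MultiTableInputFormatBase consumes several scans at once. The row-key bounds and table name are placeholders; the actual patch derives its scans from the key factory's decomposed predicate rather than hard-coding them.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical sketch of the shape of the change: an input format that
// extends MultiTableInputFormatBase and hands it a list of disjoint
// scans rather than a single one. The base class locates the target
// table through each scan's SCAN_ATTRIBUTES_TABLE_NAME attribute and
// produces splits per scan.
public class MultiScanInputFormatSketch extends MultiTableInputFormatBase {
    public MultiScanInputFormatSketch() {
        List<Scan> scans = new ArrayList<Scan>();
        for (String bucket : new String[] {"001", "010", "100"}) {
            // Placeholder string row-key bounds; a real implementation
            // would use the key factory's binary-encoded ranges.
            Scan scan = new Scan(Bytes.toBytes(bucket + "_1408333927"),
                                 Bytes.toBytes(bucket + "_1408506670"));
            // "example_table" is a placeholder table name.
            scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("example_table"));
            scans.add(scan);
        }
        setScans(scans);
    }
}
{code}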
[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Mains updated HIVE-7805: --- Assignee: Andrew Mains Status: Patch Available (was: Open)

Support running multiple scans in hbase-handler --- Key: HIVE-7805 URL: https://issues.apache.org/jira/browse/HIVE-7805 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.14.0 Reporter: Andrew Mains Assignee: Andrew Mains Attachments: HIVE-7805.patch

Currently, HiveHBaseTableInputFormat only supports running a single scan. This can be less efficient than running multiple disjoint scans in certain cases, particularly when using a composite row key. For instance, given a row key schema of:
{code}
struct<bucket int, time timestamp>
{code}
if one wants to push down the predicate:
{code}
bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
{code}
it's much more efficient to run a scan for each bucket over the time range (particularly if there's a large amount of data per day). With a single scan, the MR job has to process the data for all time for all buckets between 1 and 100. Hive should allow HBaseKeyFactory implementations to decompose a predicate into one or more scans in order to take advantage of this fact.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7420) Parameterize tests for HCatalog Pig interfaces for testing against all storage formats
[ https://issues.apache.org/jira/browse/HIVE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Chen updated HIVE-7420: - Attachment: HIVE-7420-without-HIVE-7457.4.patch HIVE-7420.4.patch

Parameterize tests for HCatalog Pig interfaces for testing against all storage formats -- Key: HIVE-7420 URL: https://issues.apache.org/jira/browse/HIVE-7420 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7420-without-HIVE-7457.2.patch, HIVE-7420-without-HIVE-7457.3.patch, HIVE-7420-without-HIVE-7457.4.patch, HIVE-7420.1.patch, HIVE-7420.2.patch, HIVE-7420.3.patch, HIVE-7420.4.patch

Currently, the HCatalog tests run only against RCFile, with a few also testing against ORC. The tests should cover the other Hive storage formats as well. HIVE-7286 turns HCatMapReduceTest into a test fixture that can be run with all Hive storage formats; with that patch, all test suites built on HCatMapReduceTest run and pass against SequenceFile, Text, and ORC in addition to RCFile. Similar changes should be made to make the tests for HCatLoader and HCatStorer generic so that they can be run against all Hive storage formats.
-- This message was sent by Atlassian JIRA (v6.2#6252)
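For readers unfamiliar with the pattern, here is a minimal hypothetical sketch of the JUnit 4 parameterization being applied: one test class, run once per storage format. The format list and the fixture helper are placeholders rather than the actual HCatalog test code.
{code}
import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

// Sketch of the JUnit 4 parameterization pattern: the runner
// instantiates this class once per entry in formats() and runs every
// @Test method against that storage format.
@RunWith(Parameterized.class)
public class TestHCatLoaderSketch {
    @Parameters(name = "format={0}")
    public static Collection<Object[]> formats() {
        // Placeholder format list, not the project's canonical one.
        return Arrays.asList(new Object[][] {
            {"rcfile"}, {"orc"}, {"sequencefile"}, {"textfile"}
        });
    }

    private final String storageFormat;

    public TestHCatLoaderSketch(String storageFormat) {
        this.storageFormat = storageFormat;
    }

    @Test
    public void readsWhatWasStored() {
        // In the real suites, the fixture creates a table STORED AS the
        // given format, writes via HCatStorer, and reads via HCatLoader.
        assertStoreAndLoad(storageFormat);
    }

    private void assertStoreAndLoad(String format) {
        // Placeholder standing in for the fixture logic described above.
    }
}
{code}
In the patch itself, the shared StorageFormats class visible in the review diff presumably plays the role of this hard-coded list, so new formats only need to be registered once.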
[jira] [Created] (HIVE-7806) insert overwrite local directory doesn't complain if it can't actually write the data
Carter Shanklin created HIVE-7806: - Summary: insert overwrite local directory doesn't complain if it can't actually write the data Key: HIVE-7806 URL: https://issues.apache.org/jira/browse/HIVE-7806 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Carter Shanklin Priority: Minor

I tried exporting data to a directory that didn't exist and could not be created by my user. Hive reported success. It would be better if it reported failure here.
{code}
Time taken: 0.397 seconds
hive> insert overwrite local directory '/home/hue/staging' row format delimited fields terminated by ',' select * from store_sales;
Query ID = hue_20140815141414_e4f0d70e-416e-4268-98ee-e5cc8f16ffaa
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1408132753408_0001, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1408132753408_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1408132753408_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-08-15 14:14:47,272 Stage-1 map = 0%, reduce = 0%
2014-08-15 14:14:55,021 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.98 sec
MapReduce Total cumulative CPU time: 980 msec
Ended Job = job_1408132753408_0001
Copying data to local directory /home/hue/staging
Copying data to local directory /home/hue/staging
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 0.98 sec HDFS Read: 327 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 980 msec
OK
Time taken: 25.903 seconds
{code}
... Meanwhile, in another shell ...
{code}
[hue@sandbox home]$ ls -l /home/hue/staging
ls: cannot access /home/hue/staging: No such file or directory
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
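A hypothetical illustration of the missing validation follows; none of this is Hive's actual code. Before reporting success, the copy-to-local step could verify that the target directory can be created and written, and fail loudly if it cannot.
{code}
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the kind of check the report asks for: verify
// the local target directory exists (or can be created) and is
// writable, throwing instead of silently reporting OK.
public class LocalDirCheck {
    public static void ensureWritable(String path) throws IOException {
        File dir = new File(path);
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IOException("Cannot create local directory " + path);
        }
        if (!dir.canWrite()) {
            throw new IOException("Local directory " + path + " is not writable");
        }
    }

    public static void main(String[] args) throws IOException {
        // Would fail for the reporter's user instead of printing OK.
        ensureWritable("/home/hue/staging");
    }
}
{code}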
Re: Review Request 23797: HIVE-7457: Minor HCatalog Pig Adapter test clean up.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23797/ ---

(Updated Aug. 20, 2014, 8:04 p.m.)

Review request for hive.

Changes --- Address code review comments.

Summary (updated) --- HIVE-7457: Minor HCatalog Pig Adapter test clean up.

Bugs: HIVE-7420
https://issues.apache.org/jira/browse/HIVE-7420

Repository: hive-git

Description (updated) --- HIVE-7420: Parameterize tests for HCatalog Pig interfaces for testing against all storage formats.

Diffs (updated) -
hcatalog/hcatalog-pig-adapter/pom.xml 4d2ca519d413b7de0a6a8b50f9a099c3539fc432
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/MockLoader.java c87b95a00af03d2531eb8bbdda4e307c3aac1fe2
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestE2EScenarios.java a4b55c8463b3563f1e602ae2d0809dd318bcfa7f
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java 82fc8a9391667138780be8796931793661f61ebb
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoaderComplexSchema.java eadbf20afc525dd9f33e9e7fb2a5d5cb89907d7e
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorer.java fcfc6428e7db80b8bfe0ce10e37d7b0ee6e58e20
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorerMulti.java 76080f7635548ed9af114c823180d8da9ea8f6c2
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorerWrapper.java 7f0bca763eb07db3822c6d6028357e81278803c9
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatLoader.java 82eb0d72b4f885184c094113f775415c06bdce98
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatLoaderComplexSchema.java 05387711289279cab743f51aee791069609b904a
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatPigStorer.java a9b452101c15fb7a3f0d8d0339f7d0ad97383441
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatStorer.java 1084092828a9ac5e37f5b50b9c6bbd03f70b48fd
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestPigHCatUtil.java a8ce61aaad42b03e4de346530d0724f3d69776b9
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestUtil.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/StorageFormats.java 19fdeb5ed3dba7a3bcba71fb285d92d3f6aabea9

Diff: https://reviews.apache.org/r/23797/diff/

Testing ---

Thanks, David Chen