[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-2340: - Attachment: HIVE-2340.12.patch [~navis]: Thanks, that pointer helped. The column pruning did indeed not carry over some of the columns in the colExprMap. HIVE-2339 is in, but it was missing handling of the KEY.* columns. I've also looked at infer_bucket_sort and see what you're saying. It seems OK to have additional sort columns/buckets as long as the ones that are explicitly asked for are there. I've updated the golden file for that test. Running the full test suite on .12 now. optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: performance Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY optimizer (cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2339) Preserve RS key columns in columnExprMap after CP optimization, which might be useful to other optimizers
[ https://issues.apache.org/jira/browse/HIVE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576683#comment-13576683 ] Yin Huai commented on HIVE-2339: Should we reopen this one? From the discussion of HIVE-2340, it seems HIVE-1989 has not completely resolved the issue. Also, HIVE-2206 needs to use the mapping of column names to track common keys among ReduceSinkOperators. Since the mapping of column names of keys is missing, if hive.map.aggr=false, the current patch of HIVE-2206 cannot detect common keys. hive.map.aggr=true will not be a problem, since I still generate a reduce-side aggregation which is not in the plan tree and thus will not go through CP optimization. Preserve RS key columns in columnExprMap after CP optimization, which might be useful to other optimizers - Key: HIVE-2339 URL: https://issues.apache.org/jira/browse/HIVE-2339 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Fix For: 0.8.0 Attachments: HIVE-2339.1.patch In ColumnPrunerProcFactory#pruneReduceSinkOperator, only VALUE parts are retained from columnExprMap. Doesn't anyone want the KEY parts to be retained, too? In my case, it was very useful for backtracking column names and removing an RS in the *-RS-*-RS-GBY case.
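The columnExprMap discussion above can be sketched with a toy example (plain Python; the dict, the prune helper, and the column names mimic Hive's KEY./VALUE. naming convention but are not Hive's actual data structures): pruning that keeps only the VALUE.* entries loses exactly the mapping a later optimizer needs to backtrack a key column to its source expression.

```python
# Toy columnExprMap for a ReduceSink operator: output column name -> source
# expression. Names follow Hive's "KEY.*" / "VALUE.*" convention, but the
# values are just illustrative strings.
col_expr_map = {
    "KEY.reducesinkkey0": "src.value",  # group-by / sort key
    "VALUE._col0": "src.key",           # value column still needed downstream
    "VALUE._col1": "src.ds",            # unused value column, safe to prune
}

def prune(col_expr_map, retained_values, keep_keys):
    """Drop unused VALUE.* entries; optionally retain all KEY.* entries."""
    pruned = {}
    for name, expr in col_expr_map.items():
        if name.startswith("KEY.") and keep_keys:
            pruned[name] = expr
        elif name.startswith("VALUE.") and name in retained_values:
            pruned[name] = expr
    return pruned

# Old behavior (VALUE-only): the key mapping is gone, so nothing can
# backtrack "reducesinkkey0" to "src.value" afterwards.
old = prune(col_expr_map, {"VALUE._col0"}, keep_keys=False)
# Behavior this jira asks for: KEY.* entries survive pruning.
new = prune(col_expr_map, {"VALUE._col0"}, keep_keys=True)
```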
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576691#comment-13576691 ] Owen O'Malley commented on HIVE-3874: - Kevin, I had some distractions at work, but I should get the patch uploaded today. Create a new Optimized Row Columnar file format for Hive Key: HIVE-3874 URL: https://issues.apache.org/jira/browse/HIVE-3874 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz There are several limitations of the current RC File format that I'd like to address by creating a new format: * each column value is stored as a binary blob, which means: ** the entire column value must be read, decompressed, and deserialized ** the file format can't use smarter type-specific compression ** push-down filters can't be evaluated * the start of each row group needs to be found by scanning * user metadata can only be added to the file when the file is created * the file doesn't store the number of rows per file or row group * there is no mechanism for seeking to a particular row number, which is required for external indexes * there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups * the type of the rows isn't stored in the file
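The lightweight-index idea in the last bullets can be sketched as follows (a minimal illustration of per-row-group min/max statistics driving predicate pushdown; the semantics are assumed for illustration, not ORC's actual file layout):

```python
# Each row group carries (min, max) statistics for a column. A reader with a
# range predicate consults only the statistics and skips any group whose
# range cannot overlap the predicate, avoiding reads and decompression.
def groups_to_read(row_groups, lo, hi):
    """row_groups: list of (min, max) per group; return indices of groups
    that may contain a value in [lo, hi]."""
    return [i for i, (gmin, gmax) in enumerate(row_groups)
            if gmax >= lo and gmin <= hi]

stats = [(0, 99), (100, 199), (200, 299)]
# A predicate like "value BETWEEN 150 AND 160" touches only the middle group,
# so two of three row groups are never read at all.
```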
Hive-trunk-h0.21 - Build # 1967 - Still Failing
Changes for Build #1964 [namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility (Navis via namit) Changes for Build #1965 Changes for Build #1966 Changes for Build #1967 No tests ran. The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1967) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1967/ to view the results.
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #290
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/290/ -- [...truncated 5492 lines...] [copy] Warning: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/pdk/src/test/resources does not exist. init: [echo] Project: pdk create-dirs: [echo] Project: builtins [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/classes [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/test [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/test/src [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/test/classes [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/test/resources [copy] Warning: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/builtins/src/test/resources does not exist. init: [echo] Project: builtins jar: [echo] Project: hive create-dirs: [echo] Project: shims [copy] Warning: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/shims/src/test/resources does not exist. init: [echo] Project: shims ivy-init-settings: [echo] Project: shims ivy-resolve: [echo] Project: shims [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/ivy/ivysettings.xml [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/thrift/libthrift/0.7.0/libthrift-0.7.0.jar ... [ivy:resolve] (294kB) [ivy:resolve] .. 
(0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.thrift#libthrift;0.7.0!libthrift.jar (155ms) [ivy:report] Processing /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml to /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/ivy/report/org.apache.hive-hive-shims-default.html ivy-retrieve: [echo] Project: shims compile: [echo] Project: shims [echo] Building shims 0.20 build_shims: [echo] Project: shims [echo] Compiling /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/shims/src/0.20/java against hadoop 0.20.2 (/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/hadoopcore/hadoop-0.20.2) ivy-init-settings: [echo] Project: shims ivy-resolve-hadoop-shim: [echo] Project: shims [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/ivy/ivysettings.xml [ivy:resolve] downloading http://repo1.maven.org/maven2/com/google/guava/guava/r09/guava-r09.jar ... [ivy:resolve] ... (1117kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] com.google.guava#guava;r09!guava.jar (165ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.20.2/hadoop-core-0.20.2.jar ... [ivy:resolve] (2624kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-core;0.20.2!hadoop-core.jar (160ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-tools/0.20.2/hadoop-tools-0.20.2.jar ... [ivy:resolve] ... (68kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-tools;0.20.2!hadoop-tools.jar (211ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-test/0.20.2/hadoop-test-0.20.2.jar ... [ivy:resolve] (1527kB) [ivy:resolve] .. 
(0kB) [ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-test;0.20.2!hadoop-test.jar (245ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2.jar ... [ivy:resolve] ... (40kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar (32ms) [ivy:resolve] downloading http://repo1.maven.org/maven2/xmlenc/xmlenc/0.52/xmlenc-0.52.jar ... [ivy:resolve] .. (14kB) [ivy:resolve] .. (0kB) [ivy:resolve] [SUCCESSFUL ]
[jira] [Commented] (HIVE-2655) Ability to define functions in HQL
[ https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576734#comment-13576734 ] Brock Noland commented on HIVE-2655: Jonathan, I haven't seen any updates to this JIRA in a while. Are you still working on it? If not, would you mind if I took it forward? Brock Ability to define functions in HQL -- Key: HIVE-2655 URL: https://issues.apache.org/jira/browse/HIVE-2655 Project: Hive Issue Type: New Feature Components: SQL Reporter: Jonathan Perlow Assignee: Jonathan Chang Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch Ability to create functions in HQL as a substitute for creating them in Java. Jonathan Chang requested I create this issue.
[jira] [Updated] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4009: --- Status: Patch Available (was: Open) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Brock Noland Hadoop has a race condition (MAPREDUCE-5001) which causes tests to fail randomly when using LocalJobRunner.
[jira] [Updated] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4009: --- Attachment: HIVE-4009-0.patch CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Brock Noland Attachments: HIVE-4009-0.patch Hadoop has a race condition (MAPREDUCE-5001) which causes tests to fail randomly when using LocalJobRunner.
[jira] [Commented] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.
[ https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576811#comment-13576811 ] Thiruvel Thirumoolan commented on HIVE-3911: This also happens with all usages of NumericHistogram (udaf histogram_numeric() too). This algorithm deals with doubles, and the order in which inputs go to the algorithm matters. If the order is different (as in this case), the results will be different. In Hadoop 20.x, the inputs go to the UDAF in table order. But in Hadoop 23, the input order is reversed, and the final output is also different. I have uploaded a patch which works fine for histogram_numeric() but fails with a small difference for udaf_percentile_approx. If there is a way to tune this in Hadoop 23, that should help. udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled. - Key: HIVE-3911 URL: https://issues.apache.org/jira/browse/HIVE-3911 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Thiruvel Thirumoolan Fix For: 0.11.0 Attachments: HIVE-3911.patch I am running Hive10 unit tests against Hadoop 0.23.5, and udaf_percentile_approx.q fails with a different value when map-side aggr is disabled, and only when the 3rd argument to this UDAF is 100. It matches the expected output when map-side aggr is enabled for the same arguments. This test passes when hadoop.version is 1.1.1 and fails when it is 0.23.x, 2.0.0-alpha, or 2.0.2-alpha. [junit] 20c20 [junit] 254.083331 [junit] --- [junit] 252.77 [junit] 47c47 [junit] 254.083331 [junit] --- [junit] 252.77 [junit] 74c74 [junit] [23.358,254.083331,477.0625,489.54667] [junit] --- [junit] [24.07,252.77,476.9,487.82] [junit] 101c101 [junit] [23.358,254.083331,477.0625,489.54667] [junit] --- [junit] [24.07,252.77,476.9,487.82]
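The order sensitivity described above is inherent to floating-point arithmetic, independent of Hive: a short Python illustration (not the NumericHistogram code itself) shows a sum of doubles changing with input order, which is why reversing the row order in Hadoop 23 shifts the result.

```python
# Float addition is not associative: (a + b) + c != a + (b + c) in general.
# Any algorithm that folds doubles in input order can therefore produce a
# different answer when the same rows arrive reversed.
vals = [1e16, 1.0, -1e16, 1.0]

fwd = 0.0
for v in vals:
    fwd += v          # 1e16 + 1.0 rounds back to 1e16 (the 1.0 is absorbed)

rev = 0.0
for v in reversed(vals):
    rev += v          # the same absorption happens at a different step

# fwd == 1.0 while rev == 0.0 on IEEE-754 doubles: same inputs, different
# order, different result.
```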
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576831#comment-13576831 ] Ashutosh Chauhan commented on HIVE-3972: [~navis] I agree HIVE-3562 is an orthogonal issue which will make what I am suggesting less of an issue, but there are still some cases. As is being discussed on HIVE-3562, consider the following query: {code} select value, sum(key) as sum from src group by value order by value limit 10; {code} In this case, the limit can't be pushed into the map phase, so the HIVE-3562 optimization won't kick in. With the patch as it currently is on this jira, we will generate one MR job with multiple reducers and then do the order-by on the client in the Fetch task. Here, if you don't take advantage of the fact that there is a limit in the query, you might read millions of rows from hdfs, bring all of them into client memory, and then just show 10 to the user. If you instead take the limit into account and stop merging and reading as soon as you have seen 10 rows, you have saved both on hdfs IO and on client memory. Makes sense? Support using multiple reducers for fetching order by results Key: HIVE-3972 URL: https://issues.apache.org/jira/browse/HIVE-3972 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch Queries whose results end with an order by clause make the final MR job run with a single reducer, which can be too much for one reducer. For example, {code} select value, sum(key) as sum from src group by value order by sum; {code} If the number of reducers is reasonable, multiple result files could be merged into a single sorted stream at the fetcher level.
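The limit-aware merge described above can be sketched as follows (a hypothetical fetcher, not Hive's actual implementation): each reducer emits a sorted run, and a lazy k-way merge lets the client stop after LIMIT rows instead of materializing every row in client memory.

```python
import heapq
from itertools import islice

def fetch_ordered(reducer_outputs, limit=None):
    """reducer_outputs: iterables, each already sorted (one per reducer).
    Lazily merge them into one sorted stream; with a limit, stop early."""
    merged = heapq.merge(*reducer_outputs)   # lazy k-way merge, no full sort
    return list(islice(merged, limit))       # islice(it, None) consumes all

# Three sorted runs standing in for three reducers' output files.
runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

With `limit=4` only four rows ever cross the merge, mirroring the point that pushing the limit into the fetcher saves both HDFS IO and client memory.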
[jira] [Updated] (HIVE-4011) Sort Merge Join does not kick-in and runs locally
[ https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Youssefi updated HIVE-4011: Summary: Sort Merge Join does not kick-in and runs locally (was: Sort Merge Join does not kick-in) Sort Merge Join does not kick-in and runs locally - Key: HIVE-4011 URL: https://issues.apache.org/jira/browse/HIVE-4011 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0, 0.10.0 Environment: Linux Reporter: Amir Youssefi Labels: joins, mapjoin After the required settings for Sort Merge Join, it does not kick in and falls back to MapJoin with a local first step (on two bucketed and partitioned tables). I ran into the issue on Hive 0.9 at large scale; to confirm the issue persists, I ran it on Hive 0.10 with sample public data and regular storage formats. More details: set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; select /*+ MAPJOIN(l) */ l.stock_price_open lo, r.stock_price_open ro from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and l.stock_symbol = r.stock_symbol and l.dte=r.dte) where ... DDL: (both tables) PARTITIONED BY (year string) CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' We also made sure we had: set hive.enforce.bucketing=true; set hive.enforce.sorting=true; Run logs and more info are in the attached file.
[jira] [Updated] (HIVE-4011) Sort Merge Join runs locally
[ https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Youssefi updated HIVE-4011: Summary: Sort Merge Join runs locally (was: Sort Merge Join does not kick-in and runs locally) Sort Merge Join runs locally Key: HIVE-4011 URL: https://issues.apache.org/jira/browse/HIVE-4011 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0, 0.10.0 Environment: Linux Reporter: Amir Youssefi Labels: joins, mapjoin After the required settings for Sort Merge Join, it does not kick in and falls back to MapJoin with a local first step (on two bucketed and partitioned tables). I ran into the issue on Hive 0.9 at large scale; to confirm the issue persists, I ran it on Hive 0.10 with sample public data and regular storage formats. More details: set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; select /*+ MAPJOIN(l) */ l.stock_price_open lo, r.stock_price_open ro from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and l.stock_symbol = r.stock_symbol and l.dte=r.dte) where ... DDL: (both tables) PARTITIONED BY (year string) CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' We also made sure we had: set hive.enforce.bucketing=true; set hive.enforce.sorting=true; Run logs and more info are in the attached file.
[jira] [Updated] (HIVE-4011) Sort Merge Join runs locally
[ https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Youssefi updated HIVE-4011: Attachment: SMJ-JIRA-4011.txt Sort Merge Join runs locally Key: HIVE-4011 URL: https://issues.apache.org/jira/browse/HIVE-4011 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0, 0.10.0 Environment: Linux Reporter: Amir Youssefi Labels: joins, mapjoin Attachments: SMJ-JIRA-4011.txt After the required settings for Sort Merge Join, it does not kick in and falls back to MapJoin with a local first step (on two bucketed and partitioned tables). I ran into the issue on Hive 0.9 at large scale; to confirm the issue persists, I ran it on Hive 0.10 with sample public data and regular storage formats. More details: set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; select /*+ MAPJOIN(l) */ l.stock_price_open lo, r.stock_price_open ro from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and l.stock_symbol = r.stock_symbol and l.dte=r.dte) where ... DDL: (both tables) PARTITIONED BY (year string) CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' We also made sure we had: set hive.enforce.bucketing=true; set hive.enforce.sorting=true; Run logs and more info are in the attached file.
[jira] [Created] (HIVE-4011) Sort Merge Join does not kick-in
Amir Youssefi created HIVE-4011: --- Summary: Sort Merge Join does not kick-in Key: HIVE-4011 URL: https://issues.apache.org/jira/browse/HIVE-4011 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.9.0 Environment: Linux Reporter: Amir Youssefi After the required settings for Sort Merge Join, it does not kick in and falls back to MapJoin with a local first step (on two bucketed and partitioned tables). I ran into the issue on Hive 0.9 at large scale; to confirm the issue persists, I ran it on Hive 0.10 with sample public data and regular storage formats. More details: set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; select /*+ MAPJOIN(l) */ l.stock_price_open lo, r.stock_price_open ro from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and l.stock_symbol = r.stock_symbol and l.dte=r.dte) where ... DDL: (both tables) PARTITIONED BY (year string) CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' We also made sure we had: set hive.enforce.bucketing=true; set hive.enforce.sorting=true; Run logs and more info are in the attached file.
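For context on what the fallback costs: a sort-merge join streams both bucketed, key-sorted inputs with two cursors and never builds the local hash table that the MapJoin first step requires. A minimal two-cursor inner join (an illustrative sketch, not Hive's SMBJoin operator) looks like this:

```python
def sort_merge_join(left, right):
    """left/right: lists of (key, value) pairs sorted by key; inner join.
    Streams both sides with cursors instead of hashing one side in memory."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1                      # left side is behind, advance it
        elif lk > rk:
            j += 1                      # right side is behind, advance it
        else:
            # Emit this left row against the whole matching right key group,
            # without moving j, so the next left duplicate re-scans the group.
            j0 = j
            while j0 < len(right) and right[j0][0] == lk:
                out.append((lk, left[i][1], right[j0][1]))
                j0 += 1
            i += 1
    return out

left = [(1, "a"), (2, "b"), (2, "c")]
right = [(2, "x"), (2, "y"), (3, "z")]
joined = sort_merge_join(left, right)
```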
[jira] [Commented] (HIVE-2655) Ability to define functions in HQL
[ https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576870#comment-13576870 ] Jonathan Chang commented on HIVE-2655: -- Yeah, I haven't had a chance to work on this. Looks like all that needs to be done at this point is unit tests. I would be more than happy to have you take this the rest of the way! Ability to define functions in HQL -- Key: HIVE-2655 URL: https://issues.apache.org/jira/browse/HIVE-2655 Project: Hive Issue Type: New Feature Components: SQL Reporter: Jonathan Perlow Assignee: Jonathan Chang Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch Ability to create functions in HQL as a substitute for creating them in Java. Jonathan Chang requested I create this issue.
[jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576873#comment-13576873 ] Ashutosh Chauhan commented on HIVE-3403: Makes sense. I am not suggesting we include all that in this jira, but I wanted to make sure we are on the same page as to where we are heading. Though, w.r.t. configs, I can see your point about adding more configs, but I still think optimization configs should be on by default. The whole point of a release is to ship a stable codebase. By definition trunk is not considered stable (as stable as making a release out of it), so the time we get between committing to trunk and releasing is for stabilizing the new codebase; but if configs are off by default, bugs lurking in the new codebase will never be exposed. user should not specify mapjoin to perform sort-merge bucketed join --- Key: HIVE-3403 URL: https://issues.apache.org/jira/browse/HIVE-3403 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch, hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch Currently, in order to perform a sort merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true and also specify the mapjoin hint. The user should not specify any hints.
[jira] [Commented] (HIVE-2655) Ability to define functions in HQL
[ https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576882#comment-13576882 ] Brock Noland commented on HIVE-2655: Hi, OK great! Yes the patch required only minor rebasing. I'll work on the unit tests and then post the resulting patch here. Thanks! Brock Ability to define functions in HQL -- Key: HIVE-2655 URL: https://issues.apache.org/jira/browse/HIVE-2655 Project: Hive Issue Type: New Feature Components: SQL Reporter: Jonathan Perlow Assignee: Jonathan Chang Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch Ability to create functions in HQL as a substitute for creating them in Java. Jonathan Chang requested I create this issue.
[jira] [Updated] (HIVE-4008) MiniMR tests fail with latest version of Hadoop 23
[ https://issues.apache.org/jira/browse/HIVE-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HIVE-4008: --- Attachment: HIVE-4008_branch10.patch HIVE-4008_branch10.patch tested with hadoop 20.x and 23.x. MiniMR tests fail with latest version of Hadoop 23 -- Key: HIVE-4008 URL: https://issues.apache.org/jira/browse/HIVE-4008 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 0.10.1 Attachments: HIVE-4008_branch10.patch TestMinimrCliDriver and TestNegativeMinimrCliDriver run fine with 0.23.4 on branch 10, but when I moved to 23.5 or a build of 23.6, they start to fail. YARN-144 seems to be the reason; I will upload a patch soon for branch10.
[jira] [Updated] (HIVE-3252) Add environment context to metastore Thrift calls
[ https://issues.apache.org/jira/browse/HIVE-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3252: Resolution: Fixed Fix Version/s: 0.11.0 Assignee: Samuel Yuan (was: John Reese) Status: Resolved (was: Patch Available) Committed, thanks Sam. Add environment context to metastore Thrift calls - Key: HIVE-3252 URL: https://issues.apache.org/jira/browse/HIVE-3252 Project: Hive Issue Type: Improvement Components: Metastore Reporter: John Reese Assignee: Samuel Yuan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3252.1.patch.txt, HIVE-3252.2.patch.txt Currently in the Hive Thrift metastore API, create_table, add_partition, alter_table, and alter_partition have with_environment_context analogs. It would be really useful to add similar methods for drop_partition, drop_table, and append_partition.
[jira] [Created] (HIVE-4012) Unit test failures with Hadoop 23 due to HADOOP-8551
Thiruvel Thirumoolan created HIVE-4012: -- Summary: Unit test failures with Hadoop 23 due to HADOOP-8551 Key: HIVE-4012 URL: https://issues.apache.org/jira/browse/HIVE-4012 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Thiruvel Thirumoolan Fix For: 0.11.0, 0.10.1 With HADOOP-8551 (>=23.3 or >=2.0.2-alpha), it's not possible to do a dfs -mkdir of foo/bar when foo does not exist. One has to use the '-p' option (not available in Hadoop 20.x). A bunch of our test cases rely on this feature, and this was to make it interoperable with Windows too (HIVE-3204). However, all these unit tests fail when using Hadoop >=23.3 or >=2.0.2-alpha. It's also not possible to use the '-p' option in the tests, as that's not supported in Hadoop 20.x.
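The incompatibility can be reproduced in miniature with local filesystem calls (a Python stand-in for the dfs shell behavior, not Hadoop itself): a plain mkdir fails when the parent is missing, while the recursive variant plays the role of '-mkdir -p'.

```python
import os
import tempfile

base = tempfile.mkdtemp()
nested = os.path.join(base, "foo", "bar")  # "foo" does not exist yet

# Like post-HADOOP-8551 `dfs -mkdir`: the parent directory must exist.
try:
    os.mkdir(nested)
    created_without_parent = True
except FileNotFoundError:
    created_without_parent = False

# Like `dfs -mkdir -p`: missing parents are created along the way,
# which is exactly the option Hadoop 20.x shells don't have.
os.makedirs(nested)
```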
[jira] [Commented] (HIVE-4010) Failure finding iterate method with matching signature
[ https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576944#comment-13576944 ] Miles Shang commented on HIVE-4010: --- Unit tests pass. Failure finding iterate method with matching signature -- Key: HIVE-4010 URL: https://issues.apache.org/jira/browse/HIVE-4010 Project: Hive Issue Type: Bug Components: UDF Reporter: Miles Shang Priority: Minor Attachments: HIVE-4010.D8517.1.patch Original Estimate: 24h Remaining Estimate: 24h http://fburl.com/10467687
[jira] [Commented] (HIVE-4010) Failure finding iterate method with matching signature
[ https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576946#comment-13576946 ] Phabricator commented on HIVE-4010: --- mshang has commented on the revision HIVE-4010 [jira] Failure finding iterate method with matching signature. Ran full unit test suite. Pass. REVISION DETAIL https://reviews.facebook.net/D8517 To: JIRA, mshang Failure finding iterate method with matching signature -- Key: HIVE-4010 URL: https://issues.apache.org/jira/browse/HIVE-4010 Project: Hive Issue Type: Bug Components: UDF Reporter: Miles Shang Priority: Minor Attachments: HIVE-4010.D8517.1.patch Original Estimate: 24h Remaining Estimate: 24h http://fburl.com/10467687 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4010) Failure finding iterate method with matching signature
[ https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576960#comment-13576960 ] Phabricator commented on HIVE-4010: --- jonchang has added reviewers to the revision HIVE-4010 [jira] Failure finding iterate method with matching signature. Added Reviewers: kevinwilfong I'm not a Hive committer. REVISION DETAIL https://reviews.facebook.net/D8517 To: JIRA, jonchang, kevinwilfong, mshang Failure finding iterate method with matching signature -- Key: HIVE-4010 URL: https://issues.apache.org/jira/browse/HIVE-4010 Project: Hive Issue Type: Bug Components: UDF Reporter: Miles Shang Priority: Minor Attachments: HIVE-4010.D8517.1.patch Original Estimate: 24h Remaining Estimate: 24h http://fburl.com/10467687 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3252) Add environment context to metastore Thrift calls
[ https://issues.apache.org/jira/browse/HIVE-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577003#comment-13577003 ] Hudson commented on HIVE-3252: -- Integrated in hive-trunk-hadoop1 #81 (See [https://builds.apache.org/job/hive-trunk-hadoop1/81/]) HIVE-3252. Add environment context to metastore Thrift calls. (Samuel Yuan via kevinwilfong) (Revision 1445309) Result = ABORTED kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1445309 Files : * /hive/trunk/metastore/if/hive_metastore.thrift * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalPrivilegeSet.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedInfo.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java * 
/hive/trunk/metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py * /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java Add environment context to metastore Thrift calls - Key: HIVE-3252 URL: https://issues.apache.org/jira/browse/HIVE-3252 Project: Hive Issue Type: Improvement Components: Metastore Reporter: John Reese Assignee: Samuel Yuan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3252.1.patch.txt, HIVE-3252.2.patch.txt Currently in the Hive Thrift metastore API create_table, add_partition, alter_table, alter_partition have with_environment_context analogs. It would be really useful to add similar methods from drop_partition, drop_table, and append_partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4012) Unit test failures with Hadoop 23 due to HADOOP-8551
[ https://issues.apache.org/jira/browse/HIVE-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HIVE-4012: --- Attachment: HIVE-4012_branch10.patch Unit test failures with Hadoop 23 due to HADOOP-8551 Key: HIVE-4012 URL: https://issues.apache.org/jira/browse/HIVE-4012 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Thiruvel Thirumoolan Fix For: 0.11.0, 0.10.1 Attachments: HIVE-4012_branch10.patch With HADOOP-8551 (>= 0.23.3 or >= 2.0.2-alpha), it's not possible to do a dfs -mkdir of foo/bar when foo does not exist. One has to use the '-p' option (not available in Hadoop 20.x). A bunch of our test cases rely on this behavior; it was used to make the tests interoperable with Windows too (HIVE-3204). However, all these unit tests fail when using Hadoop >= 0.23.3 or >= 2.0.2-alpha. It's also not possible to use the '-p' option in the tests, as that's not supported in Hadoop 20.x. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4013) Misc test failures on Hive10 with Hadoop 0.23.x
Thiruvel Thirumoolan created HIVE-4013: -- Summary: Misc test failures on Hive10 with Hadoop 0.23.x Key: HIVE-4013 URL: https://issues.apache.org/jira/browse/HIVE-4013 Project: Hive Issue Type: Bug Reporter: Thiruvel Thirumoolan Attachments: HIVE-4013_branch10.patch The following tests fail with the latest builds of Hadoop 23 (tested with 0.23.5 and a build of 0.23.6). The fix is mostly a matter of making the tests deterministic by adding ORDER BY to all the queries. list_bucket_query_oneskew_3.q list_bucket_query_multiskew_2.q list_bucket_query_multiskew_3.q list_bucket_query_multiskew_1.q parenthesis_star_by.q -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4013) Misc test failures on Hive10 with Hadoop 0.23.x
[ https://issues.apache.org/jira/browse/HIVE-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HIVE-4013: --- Attachment: HIVE-4013_branch10.patch Misc test failures on Hive10 with Hadoop 0.23.x --- Key: HIVE-4013 URL: https://issues.apache.org/jira/browse/HIVE-4013 Project: Hive Issue Type: Bug Reporter: Thiruvel Thirumoolan Attachments: HIVE-4013_branch10.patch The following tests fail with the latest builds of Hadoop 23 (tested with 0.23.5 and a build of 0.23.6). The fix is mostly a matter of making the tests deterministic by adding ORDER BY to all the queries. list_bucket_query_oneskew_3.q list_bucket_query_multiskew_2.q list_bucket_query_multiskew_3.q list_bucket_query_multiskew_1.q parenthesis_star_by.q -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
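The determinism fix described above can be sketched as follows; the table and column names are illustrative, not taken from the failing .q files:

```sql
-- Without ORDER BY, row order depends on how many mappers/reducers run,
-- which differs between Hadoop 20.x and 0.23.x, so golden-file comparisons break:
SELECT key, value FROM src_table WHERE key < 10;

-- Deterministic version: output order is fixed, so the golden file is stable.
SELECT key, value FROM src_table WHERE key < 10 ORDER BY key, value;
```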
Hive-trunk-hadoop2 - Build # 119 - Still Failing
Changes for Build #81 Changes for Build #82 [namit] HIVE-3927 Potential overflow with new RCFileCat column sizes options (Kevin Wilfong via namit) Changes for Build #83 Changes for Build #84 [cws] HIVE-3931. Add Oracle metastore upgrade script for 0.9 to 10.0 (Prasad Mujumdar via cws) Changes for Build #85 Changes for Build #86 [hashutosh] HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin via Ashutosh Chauhan) [hashutosh] HIVE-3833 : object inspectors should be initialized based on partition metadata (Namit Jain via Ashutosh Chauhan) Changes for Build #87 Changes for Build #88 [namit] HIVE-3825 Add Operator level Hooks (Pamela Vagata via namit) [hashutosh] HIVE-3528 : Avro SerDe doesn't handle serializing Nullable types that require access to a Schema (Sean Busbey via Ashutosh Chauhan) [namit] HIVE-3943 Skewed query fails if hdfs path has special characters (Gang Tim Liu via namit) Changes for Build #89 [namit] HIVE-3527 Allow CREATE TABLE LIKE command to take TBLPROPERTIES (Kevin Wilfong via namit) [namit] HIVE-3944 Make accept qfile argument for miniMR tests (Navis via namit) Changes for Build #90 [namit] HIVE-3912 table_access_keys_stats.q fails with hadoop 0.23 (Sushanth Sownyan via namit) [namit] HIVE-3921 recursive_dir.q fails on 0.23 (Sushanth Sowmyan via namit) [namit] HIVE-3923 join_filters_overlap.q fails on 0.23 (Sushanth Sowmyan via namit) [namit] HIVE-3924 join_nullsafe.q fails on 0.23 (Sushanth Sownyan via namit) [hashutosh] Adding csv.txt file, left out from commit of 3528 Changes for Build #91 Changes for Build #92 [hashutosh] HIVE-3799 : Better error message if metalisteners or hookContext cannot be loaded/instantiated (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3947 : MiniMR test remains pending after test completion (Navis via Ashutosh Chauhan) Changes for Build #93 Changes for Build #94 [kevinwilfong] HIVE-3903. Allow updating bucketing/sorting metadata of a partition through the CLI. 
(Samuel Yuan via kevinwilfong) Changes for Build #95 [namit] HIVE-3873 lot of tests failing for hadoop 23 (Gang Tim Liu via namit) Changes for Build #96 [hashutosh] Missed deleting empty file GenMRRedSink4.java while commiting 3784 [hashutosh] HIVE-de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan) Changes for Build #97 [namit] HIVE-933 Infer bucketing/sorting properties (Kevin Wilfong via namit) [hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh Chauhan, Reviewed by Namit Jain) Changes for Build #98 Changes for Build #99 [kevinwilfong] HIVE-3940. Track columns accessed in each table in a query. (Samuel Yuan via kevinwilfong) Changes for Build #100 [namit] HIVE-3778 Add MapJoinDesc.isBucketMapJoin() as part of explain plan (Gang Tim Liu via namit) Changes for Build #101 Changes for Build #102 Changes for Build #103 Changes for Build #104 [hashutosh] HIVE-3977 : Hive 0.10 postgres schema script is broken (Johnny Zhang via Ashutosh Chauhan) [hashutosh] HIVE-3932 : Hive release tarballs don't contain PostgreSQL metastore scripts (Mark Grover via Ashutosh Chauhan) Changes for Build #105 [hashutosh] HIVE-3918 : Normalize more CRLF line endings (Mark Grover via Ashutosh Chauhan) [namit] HIVE-3917 Support noscan operation for analyze command (Gang Tim Liu via namit) Changes for Build #106 [namit] HIVE-3937 Hive Profiler (Pamela Vagata via namit) [hashutosh] HIVE-3571 : add a way to run a small unit quickly (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3956 : TestMetaStoreAuthorization always uses the same port (Navis via Ashutosh Chauhan) Changes for Build #107 Changes for Build #108 Changes for Build #109 Changes for Build #110 [namit] HIVE-2839 Filters on outer join with mapjoin hint is not applied correctly (Navis via namit) Changes for Build #111 Changes for Build #112 [namit] HIVE-3998 Oracle metastore update script will fail when upgrading from 0.9.0 to 0.10.0 (Jarek and Mark via namit) [namit] HIVE-3999 Mysql metastore upgrade 
script will end up with different schema than the full schema load (Jarek and Mark via namit) Changes for Build #113 Changes for Build #114 [namit] HIVE-3995 PostgreSQL upgrade scripts are not valid (Jarek and Mark via namit) Changes for Build #115 Changes for Build #116 [namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility (Navis via namit) Changes for Build #117 Changes for Build #118 Changes for Build #119 33 tests failed. FAILED: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_1 Error Message: Unexpected exception See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. Stack Trace: junit.framework.AssertionFailedError: Unexpected exception See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. at junit.framework.Assert.fail(Assert.java:50) at
[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3874: -- Attachment: HIVE-3874.D8529.1.patch omalley requested code review of HIVE-3874 [jira] Create a new Optimized Row Columnar file format for Hive. Reviewers: JIRA HIVE-3874. Create ORC File format. There are several limitations of the current RC File format that I'd like to address by creating a new format:
- each column value is stored as a binary blob, which means: the entire column value must be read, decompressed, and deserialized; the file format can't use smarter type-specific compression; push-down filters can't be evaluated
- the start of each row group needs to be found by scanning
- user metadata can only be added to the file when the file is created
- the file doesn't store the number of rows per file or row group
- there is no mechanism for seeking to a particular row number, which is required for external indexes
- there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups
the type of the rows aren't stored in the file TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D8529 AFFECTED FILES build.properties build.xml ivy/libraries.properties ql/build.xml ql/ivy.xml ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldReader.java ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldWriter.java ql/src/java/org/apache/hadoop/hive/ql/orc/BooleanColumnStatistics.java ql/src/java/org/apache/hadoop/hive/ql/orc/ColumnStatistics.java ql/src/java/org/apache/hadoop/hive/ql/orc/ColumnStatisticsImpl.java ql/src/java/org/apache/hadoop/hive/ql/orc/CompressionCodec.java ql/src/java/org/apache/hadoop/hive/ql/orc/CompressionKind.java ql/src/java/org/apache/hadoop/hive/ql/orc/DoubleColumnStatistics.java ql/src/java/org/apache/hadoop/hive/ql/orc/DynamicByteArray.java ql/src/java/org/apache/hadoop/hive/ql/orc/DynamicIntArray.java ql/src/java/org/apache/hadoop/hive/ql/orc/FileDump.java ql/src/java/org/apache/hadoop/hive/ql/orc/InStream.java ql/src/java/org/apache/hadoop/hive/ql/orc/IntegerColumnStatistics.java ql/src/java/org/apache/hadoop/hive/ql/orc/OrcFile.java ql/src/java/org/apache/hadoop/hive/ql/orc/OrcInputFormat.java ql/src/java/org/apache/hadoop/hive/ql/orc/OrcOutputFormat.java ql/src/java/org/apache/hadoop/hive/ql/orc/OrcSerde.java ql/src/java/org/apache/hadoop/hive/ql/orc/OrcStruct.java ql/src/java/org/apache/hadoop/hive/ql/orc/OrcUnion.java ql/src/java/org/apache/hadoop/hive/ql/orc/OutStream.java ql/src/java/org/apache/hadoop/hive/ql/orc/PositionProvider.java ql/src/java/org/apache/hadoop/hive/ql/orc/PositionRecorder.java ql/src/java/org/apache/hadoop/hive/ql/orc/PositionedOutputStream.java ql/src/java/org/apache/hadoop/hive/ql/orc/Reader.java ql/src/java/org/apache/hadoop/hive/ql/orc/ReaderImpl.java ql/src/java/org/apache/hadoop/hive/ql/orc/RecordReader.java ql/src/java/org/apache/hadoop/hive/ql/orc/RecordReaderImpl.java 
ql/src/java/org/apache/hadoop/hive/ql/orc/RedBlackTree.java ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthByteReader.java ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthByteWriter.java ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthIntegerReader.java ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthIntegerWriter.java ql/src/java/org/apache/hadoop/hive/ql/orc/SerializationUtils.java ql/src/java/org/apache/hadoop/hive/ql/orc/SnappyCodec.java ql/src/java/org/apache/hadoop/hive/ql/orc/StreamName.java ql/src/java/org/apache/hadoop/hive/ql/orc/StringColumnStatistics.java ql/src/java/org/apache/hadoop/hive/ql/orc/StringRedBlackTree.java ql/src/java/org/apache/hadoop/hive/ql/orc/StripeInformation.java ql/src/java/org/apache/hadoop/hive/ql/orc/Writer.java ql/src/java/org/apache/hadoop/hive/ql/orc/WriterImpl.java ql/src/java/org/apache/hadoop/hive/ql/orc/ZlibCodec.java ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitFieldReader.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestDynamicArray.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcStruct.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthByteReader.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthIntegerReader.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java
[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-2340: - Attachment: HIVE-2340.13.patch Clean bill of health on 12, except for incorrect golden files in TestParse_join2 and TestMinimrCliDriver_reduce_deduplicate. I've updated the golden files in patch .13. optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: perfomance Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, HIVE-2340.13.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2655) Ability to define functions in HQL
[ https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577188#comment-13577188 ] Phabricator commented on HIVE-2655: --- brock has commented on the revision HIVE-2655 [jira] Ability to define functions in HQL. Regarding * Defining the same macro twice. * Dropping a macro that doesn't exist. I assume the behavior should be the same as for functions, which allow both of these behaviors. As such I wonder why they were listed under negative tests? REVISION DETAIL https://reviews.facebook.net/D915 BRANCH macro ARCANIST PROJECT hive To: JIRA, jsichi, cwsteinbach, jonchang Cc: jonchang, ikabiljo, brock Ability to define functions in HQL -- Key: HIVE-2655 URL: https://issues.apache.org/jira/browse/HIVE-2655 Project: Hive Issue Type: New Feature Components: SQL Reporter: Jonathan Perlow Assignee: Jonathan Chang Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch Ability to create functions in HQL as a substitute for creating them in Java. Jonathan Chang requested I create this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577198#comment-13577198 ] Vikram Dixit K commented on HIVE-3403: -- In the patch, the auto_sortmerge_join_6.q is missing. user should not specify mapjoin to perform sort-merge bucketed join --- Key: HIVE-3403 URL: https://issues.apache.org/jira/browse/HIVE-3403 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch, hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch Currently, in order to perform a sort merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K resolved HIVE-3652. -- Resolution: Duplicate Fix Version/s: 0.11.0 The work required for this jira was fixed as part of the map-join de-emphasis work done in HIVE-3784. The query {format}select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on b.k2=c.k2{format} runs in 1 MR job (based on the noConditionalTask.size). Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Vikram Dixit K Fix For: 0.11.0 Currently, if we join one fact table with multiple dimension tables, it results in a separate mapreduce job for each join with a dimension table, because the join is on different keys for each dimension. Usually all the dimension tables are small and can fit into memory, so a map-side join can be used to join with the fact table. In this issue I want to look at optimizing such a query to generate a single mapreduce job so that the mapper loads the dimension tables into memory and joins with the fact table on the different keys as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary
Vinod Kumar Vavilapalli created HIVE-4014: - Summary: Hive+RCFile is not doing column pruning and reading much more data than necessary Key: HIVE-4014 URL: https://issues.apache.org/jira/browse/HIVE-4014 Project: Hive Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli With even simple projection queries, I see that HDFS bytes read counter doesn't show any reduction in the amount of data read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary
[ https://issues.apache.org/jira/browse/HIVE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577216#comment-13577216 ] Vinod Kumar Vavilapalli commented on HIVE-4014: --- I already tracked it down, will upload a patch soon. Hive+RCFile is not doing column pruning and reading much more data than necessary - Key: HIVE-4014 URL: https://issues.apache.org/jira/browse/HIVE-4014 Project: Hive Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli With even simple projection queries, I see that the HDFS bytes read counter doesn't show any reduction in the amount of data read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4015) Add ORC file to the grammar as a file format
Owen O'Malley created HIVE-4015: --- Summary: Add ORC file to the grammar as a file format Key: HIVE-4015 URL: https://issues.apache.org/jira/browse/HIVE-4015 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley It would be much more convenient for users if we enable them to use ORC as a file format in the HQL grammar. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
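Assuming the grammar change follows the pattern of the other STORED AS file formats, usage might look like the following (hypothetical at the time of this issue; until then ORC has to be spelled out via explicit InputFormat/OutputFormat/SerDe classes):

```sql
-- Hypothetical syntax this improvement would enable; table name is illustrative.
CREATE TABLE orders (id INT, amount DOUBLE)
STORED AS ORC;
```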
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-3403: - Attachment: auto_sortmerge_join_1_modified.q I ran with a modified version of the auto_sortmerge_join_1.q file (attached to the JIRA) and created a query where 2 of the tables in a join are sorted and bucketed and the 3rd table is not sorted. I have enabled the auto map join convert config. I am seeing this exception: FAILED: ClassCastException org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator cannot be cast to org.apache.hadoop.hive.ql.exec.MapJoinOperator I do not see the exception if I set the noConditionalTask.size to a size greater than the size of the 2 small tables (src1 and small_table), e.g. 500. user should not specify mapjoin to perform sort-merge bucketed join --- Key: HIVE-3403 URL: https://issues.apache.org/jira/browse/HIVE-3403 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: auto_sortmerge_join_1_modified.q, hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch, hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch Currently, in order to perform a sort merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4015) Add ORC file to the grammar as a file format
[ https://issues.apache.org/jira/browse/HIVE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reassigned HIVE-4015: --- Assignee: Owen O'Malley Add ORC file to the grammar as a file format Key: HIVE-4015 URL: https://issues.apache.org/jira/browse/HIVE-4015 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley It would be much more convenient for users if we enable them to use ORC as a file format in the HQL grammar. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join
[ https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-3996: - Status: Patch Available (was: Open) Correctly enforce the memory limit on the multi-table map-join -- Key: HIVE-3996 URL: https://issues.apache.org/jira/browse/HIVE-3996 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-3996_2.patch, HIVE-3996.patch Currently with HIVE-3784, the joins are converted to map-joins based on checks of the table size against the config variable: hive.auto.convert.join.noconditionaltask.size. However, the current implementation will also merge multiple mapjoin operators into a single task regardless of whether the sum of the table sizes will exceed the configured value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join
[ https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-3996: - Attachment: HIVE-3996_2.patch Updated patch, which also improves the existing tests. Correctly enforce the memory limit on the multi-table map-join -- Key: HIVE-3996 URL: https://issues.apache.org/jira/browse/HIVE-3996 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-3996_2.patch, HIVE-3996.patch Currently with HIVE-3784, the joins are converted to map-joins based on checks of the table size against the config variable: hive.auto.convert.join.noconditionaltask.size. However, the current implementation will also merge multiple mapjoin operators into a single task regardless of whether the sum of the table sizes will exceed the configured value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
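The configuration interplay being fixed can be sketched as a Hive session fragment; the threshold value below is illustrative, not a recommendation:

```sql
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
-- Joins whose small tables fall below this many bytes may be converted to
-- map-joins and merged into one task; per the issue description, the patch
-- makes the *sum* of the merged tables' sizes respect this limit rather
-- than checking each table individually.
SET hive.auto.convert.join.noconditionaltask.size=10000000;
```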
Problem with Phabricator
Hi all, I recently found an issue with the installation of Phabricator used for code review (http://reviews.facebook.net). I reported it and was told that it can actually be fixed with an upgrade of Pygments to the latest release (see https://secure.phabricator.com/T2535). Is anyone familiar with how to go about doing that? Thanks, Sam
[jira] [Commented] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577298#comment-13577298 ] Anandha L Ranganathan commented on HIVE-3850: - Hello Arun, you should read all the comments; the ticket was re-opened. Also, I added the .q and .q.out files in the patch. hour() function returns 12 hour clock value when using timestamp datatype - Key: HIVE-3850 URL: https://issues.apache.org/jira/browse/HIVE-3850 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: Pieterjan Vriends Fix For: 0.11.0 Attachments: hive-3850.patch, HIVE-3850.patch.txt Apparently UDFHour.java has two evaluate() functions: one that accepts a Text object as a parameter and one that takes a TimeStampWritable object. The first function returns the value of Calendar.HOUR_OF_DAY and the second that of Calendar.HOUR. In the documentation I couldn't find any information on the overloads of the evaluate function. I spent quite some time finding out why my statement didn't return a 24 hour clock value. Shouldn't both functions return the same? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
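The two Calendar fields behind the discrepancy are easy to demonstrate with plain java.util.Calendar, independent of Hive's UDFHour (HourFieldDemo is an illustrative name, not Hive code):

```java
import java.util.Calendar;

public class HourFieldDemo {
    /** Returns {HOUR, HOUR_OF_DAY} for the given 24-hour clock value. */
    static int[] hourFields(int hourOfDay) {
        Calendar cal = Calendar.getInstance();
        cal.set(Calendar.HOUR_OF_DAY, hourOfDay);
        // Calendar.HOUR is the 12-hour clock field; HOUR_OF_DAY is 24-hour.
        return new int[] { cal.get(Calendar.HOUR), cal.get(Calendar.HOUR_OF_DAY) };
    }

    public static void main(String[] args) {
        int[] fields = hourFields(17); // 5 PM
        System.out.println("HOUR=" + fields[0] + " HOUR_OF_DAY=" + fields[1]);
        // HOUR=5 HOUR_OF_DAY=17
    }
}
```

So an overload reading Calendar.HOUR yields 5 for 17:00, which matches the 12-hour behavior the reporter observed.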
[jira] [Assigned] (HIVE-4010) Failure finding iterate method with matching signature
[ https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miles Shang reassigned HIVE-4010: - Assignee: Miles Shang Failure finding iterate method with matching signature -- Key: HIVE-4010 URL: https://issues.apache.org/jira/browse/HIVE-4010 Project: Hive Issue Type: Bug Components: UDF Reporter: Miles Shang Assignee: Miles Shang Priority: Minor Attachments: HIVE-4010.D8517.1.patch Original Estimate: 24h Remaining Estimate: 24h http://fburl.com/10467687 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4010) Failure finding iterate method with matching signature
[ https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miles Shang updated HIVE-4010: -- Status: Patch Available (was: Open) Failure finding iterate method with matching signature -- Key: HIVE-4010 URL: https://issues.apache.org/jira/browse/HIVE-4010 Project: Hive Issue Type: Bug Components: UDF Reporter: Miles Shang Assignee: Miles Shang Priority: Minor Attachments: HIVE-4010.D8517.1.patch Original Estimate: 24h Remaining Estimate: 24h http://fburl.com/10467687 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3403: - Status: Open (was: Patch Available) Thanks Vikram, I will take a look. user should not specify mapjoin to perform sort-merge bucketed join --- Key: HIVE-3403 URL: https://issues.apache.org/jira/browse/HIVE-3403 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: auto_sortmerge_join_1_modified.q, hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch, hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch Currently, in order to perform a sort merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4013) Misc test failures on Hive10 with Hadoop 0.23.x
[ https://issues.apache.org/jira/browse/HIVE-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan reassigned HIVE-4013: -- Assignee: Thiruvel Thirumoolan Misc test failures on Hive10 with Hadoop 0.23.x --- Key: HIVE-4013 URL: https://issues.apache.org/jira/browse/HIVE-4013 Project: Hive Issue Type: Bug Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Attachments: HIVE-4013_branch10.patch The following tests fail with the latest builds of Hadoop 23 (tested with 0.23.5 and a build of 0.23.6). It's mostly a matter of making the tests deterministic by adding ORDER BY to all the queries. list_bucket_query_oneskew_3.q list_bucket_query_multiskew_2.q list_bucket_query_multiskew_3.q list_bucket_query_multiskew_1.q parenthesis_star_by.q -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4012) Unit test failures with Hadoop 23 due to HADOOP-8551
[ https://issues.apache.org/jira/browse/HIVE-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HIVE-4012: --- Assignee: Thiruvel Thirumoolan Unit test failures with Hadoop 23 due to HADOOP-8551 Key: HIVE-4012 URL: https://issues.apache.org/jira/browse/HIVE-4012 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 0.11.0, 0.10.1 Attachments: HIVE-4012_branch10.patch With HADOOP-8551 (>= 23.3 or >= 2.0.2-alpha), it's not possible to do a dfs -mkdir of foo/bar when foo does not exist. One has to use the '-p' option (not available in Hadoop 20.x). A bunch of our test cases rely on this feature, which was also added to make them interoperable with Windows (HIVE-3204). However, all these unit tests fail when using Hadoop >= 23.3 or >= 2.0.2-alpha. It's also not possible to use the '-p' option in the tests, as that's not supported in Hadoop 20.x. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
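The mkdir semantics at issue have a close analogue in java.nio.file, which makes the difference easy to demonstrate locally (on the local filesystem, not HDFS): Files.createDirectory behaves like plain dfs -mkdir and fails when the parent is missing, while Files.createDirectories behaves like -mkdir -p. A sketch with illustrative names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MkdirDemo {
    // Mimics "dfs -mkdir foo/bar" on newer Hadoop: fails if the parent is missing.
    static boolean mkdirStrict(Path p) {
        try {
            Files.createDirectory(p);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Mimics "dfs -mkdir -p foo/bar": creates missing parents on demand.
    static boolean mkdirP(Path p) {
        try {
            Files.createDirectories(p);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Runs both variants against a fresh temp dir; returns {strictOk, pOk}. */
    static boolean[] demo() {
        try {
            Path base = Files.createTempDirectory("mkdir-demo");
            Path nested = base.resolve("foo").resolve("bar");
            return new boolean[] { mkdirStrict(nested), mkdirP(nested) };
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0]); // false: "foo" does not exist yet
        System.out.println(r[1]); // true: parents created on demand
    }
}
```

The bind described in the report is that the tests must behave like mkdirStrict on Hadoop 20.x (no '-p' flag exists) but like mkdirP on 23.x (plain mkdir now fails).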
[jira] [Commented] (HIVE-3951) Allow Decimal type columns in Regex Serde
[ https://issues.apache.org/jira/browse/HIVE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577324#comment-13577324 ] Mark Grover commented on HIVE-3951: --- This patch is ready for review. Would anyone be willing to please review? Allow Decimal type columns in Regex Serde - Key: HIVE-3951 URL: https://issues.apache.org/jira/browse/HIVE-3951 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.10.0 Reporter: Mark Grover Assignee: Mark Grover Fix For: 0.11.0 Attachments: HIVE-3951.1.patch Decimal type in Hive was recently added by HIVE-2693. We should allow users to create tables with decimal type columns when using Regex Serde. HIVE-3004 did something similar for other primitive types. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
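Independent of Hive's actual RegexSerDe, the core idea can be sketched as a regex-group-to-typed-column mapping where one capture group is parsed as a decimal. The class name, regex, and row layout below are illustrative only:

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDecimalSketch {
    /** Extracts one capture group from a row and parses it as a decimal.
     *  Unmatched rows yield null, mirroring RegexSerDe's NULL-on-no-match. */
    static BigDecimal parseDecimalColumn(String row, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(row);
        if (!m.matches()) {
            return null;
        }
        // BigDecimal keeps exact scale, unlike double -- the point of a decimal type
        return new BigDecimal(m.group(group));
    }

    public static void main(String[] args) {
        BigDecimal price = parseDecimalColumn("item42 19.99", "(\\w+) ([0-9.]+)", 2);
        System.out.println(price); // 19.99
    }
}
```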
[jira] [Commented] (HIVE-4007) Create abstract classes for serializer and deserializer
[ https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577329#comment-13577329 ] Namit Jain commented on HIVE-4007: -- I agree, it is incompatible. I can change the existing SerDes in the Hive codebase, but there may be external SerDes out there, which I have no control over. We have to take this hit sometime. Create abstract classes for serializer and deserializer --- Key: HIVE-4007 URL: https://issues.apache.org/jira/browse/HIVE-4007 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Namit Jain Assignee: Namit Jain Currently, it is very difficult to change the Serializer/Deserializer interface, since all the SerDes directly implement the interface. Instead, we should have abstract classes implementing these interfaces. In case of an interface change, only the abstract class and the relevant SerDe need to change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
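The proposed pattern itself is standard Java. A toy sketch (Serializer, AbstractSerializer, and CsvSerializer are illustrative names, not Hive's actual SerDe interfaces):

```java
public class SerDeAbstractClassDemo {
    // The interface; previously every SerDe implemented it directly.
    interface Serializer {
        String serialize(Object row);
        // Suppose this method is added later; all direct implementors break.
        String getFormatName();
    }

    // The abstract class absorbs the interface change: concrete SerDes that
    // extend it keep compiling, and only override what they care about.
    static abstract class AbstractSerializer implements Serializer {
        @Override
        public String getFormatName() {
            return "unspecified"; // sensible default for the newly added method
        }
    }

    // A concrete SerDe written before getFormatName() existed still works.
    static class CsvSerializer extends AbstractSerializer {
        @Override
        public String serialize(Object row) {
            return String.valueOf(row).replace(' ', ',');
        }
    }

    public static void main(String[] args) {
        Serializer s = new CsvSerializer();
        System.out.println(s.serialize("a b c")); // a,b,c
        System.out.println(s.getFormatName());    // unspecified
    }
}
```

This also shows why the change is incompatible for external SerDes: anything implementing the interface directly, rather than extending the abstract class, still breaks when the interface grows.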
[jira] [Updated] (HIVE-4007) Create abstract classes for serializer and deserializer
[ https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4007: - Attachment: hive.4007.1.patch Create abstract classes for serializer and deserializer --- Key: HIVE-4007 URL: https://issues.apache.org/jira/browse/HIVE-4007 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.4007.1.patch Currently, it is very difficult to change the Serializer/Deserializer interface, since all the SerDes directly implement the interface. Instead, we should have abstract classes for implementing these interfaces. In case of a interface change, only the abstract class and the relevant serde needs to change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu reopened HIVE-3652: --- When I ran the same query on the latest trunk with HIVE-3784 fixed, I see the following : {noformat} explain select /*+ MAPJOIN(b,c) */ * from fact a join dim1 b on a.k1=b.k1 JOIN dim2 c on a.k2=c.k2; FAILED: SemanticException [Error 10227]: Not all clauses are supported with mapjoin hint. Please remove mapjoin hint. {noformat} When I set hive.auto.convert.join=true; and run the following : {noformat} explain select * from fact a join dim1 b on a.k1=b.k1 JOIN dim2 c on a.k2=c.k2; STAGE DEPENDENCIES: Stage-10 is a root stage , consists of Stage-13, Stage-14, Stage-1 Stage-13 has a backup stage: Stage-1 Stage-8 depends on stages: Stage-13 Stage-7 depends on stages: Stage-1, Stage-8, Stage-9 , consists of Stage-11, Stage-12, Stage-2 Stage-11 has a backup stage: Stage-2 Stage-5 depends on stages: Stage-11 Stage-12 has a backup stage: Stage-2 Stage-6 depends on stages: Stage-12 Stage-2 Stage-14 has a backup stage: Stage-1 Stage-9 depends on stages: Stage-14 Stage-1 Stage-0 is a root stage {noformat} And the above query launches two MR jobs. Correct me if i am doing anything wrong. Namit, Can you confirm if this is fixed in HIVE-3784 and is there any other way to run this? Vikram, If you are seeing this fixed, can you please add tests if no code changes are required? Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Vikram Dixit K Fix For: 0.11.0 Currently, if we join one fact table with multiple dimension tables, it results in multiple mapreduce jobs for each join with dimension table, because join would be on different keys for each dimension. 
Usually all the dimension tables are small and fit into memory, so a map-side join can be used to join each with the fact table. In this issue I want to look at optimizing such a query to generate a single mapreduce job, so that the mapper loads the dimension tables into memory and joins them with the fact table on their different keys. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
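The single-job plan being asked for amounts to holding every dimension table in memory and probing each on its own key while streaming the fact table once. A toy sketch with made-up names and data (StarJoinSketch is not Hive code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class StarJoinSketch {
    /** One-pass map-side star join: each fact row {k1, k2} probes two
     *  in-memory dimension maps on different keys; inner-join semantics. */
    static List<String> starJoin(List<int[]> fact,
                                 Map<Integer, String> dim1,
                                 Map<Integer, String> dim2) {
        List<String> out = new ArrayList<>();
        for (int[] row : fact) {
            String d1 = dim1.get(row[0]); // probe dim1 on k1
            String d2 = dim2.get(row[1]); // probe dim2 on k2
            if (d1 != null && d2 != null) {
                out.add(row[0] + "," + row[1] + "," + d1 + "," + d2);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> fact = Arrays.asList(new int[]{1, 10}, new int[]{2, 20});
        Map<Integer, String> dim1 = Map.of(1, "a", 2, "b");
        Map<Integer, String> dim2 = Map.of(10, "x"); // no match for k2=20
        System.out.println(starJoin(fact, dim1, dim2)); // [1,10,a,x]
    }
}
```

In contrast, the multi-job plan complained about here performs one shuffle-join per dimension because the join keys differ.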
[jira] [Commented] (HIVE-4007) Create abstract classes for serializer and deserializer
[ https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577345#comment-13577345 ] Namit Jain commented on HIVE-4007: -- https://reviews.facebook.net/D8541 Create abstract classes for serializer and deserializer --- Key: HIVE-4007 URL: https://issues.apache.org/jira/browse/HIVE-4007 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.4007.1.patch Currently, it is very difficult to change the Serializer/Deserializer interface, since all the SerDes directly implement the interface. Instead, we should have abstract classes for implementing these interfaces. In case of a interface change, only the abstract class and the relevant serde needs to change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4016) Remove init(fname) from TestParse.vm for each test
Navis created HIVE-4016: --- Summary: Remove init(fname) from TestParse.vm for each test Key: HIVE-4016 URL: https://issues.apache.org/jira/browse/HIVE-4016 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Navis Assignee: Navis Priority: Trivial TestParse does not change any configuration or data, which means calling the init() method before each test is not necessary. After removing it, test time dropped from 260sec to 16sec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
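The effect is easy to model: with a read-only fixture, per-test initialization performs N+1 setups where one suffices, so runtime scales with the test count. A toy sketch (InitOnceDemo and its names are illustrative, not the TestParse.vm code):

```java
public class InitOnceDemo {
    static int initCalls = 0;

    static void init() {
        initCalls++; // stands in for the expensive, read-only setup
    }

    /** Runs a suite of tests, optionally re-initializing before each one. */
    static void runSuite(int tests, boolean initPerTest) {
        initCalls = 0;
        init(); // one-time setup
        for (int i = 0; i < tests; i++) {
            if (initPerTest) {
                init(); // redundant when no test mutates configuration or data
            }
            // ... run test i ...
        }
    }

    public static void main(String[] args) {
        runSuite(100, true);
        System.out.println(initCalls); // 101
        runSuite(100, false);
        System.out.println(initCalls); // 1
    }
}
```

This matches the reported 260sec-to-16sec drop: the per-test init dominated, and removing it leaves only the shared setup plus the (cheap) tests.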
[jira] [Updated] (HIVE-4016) Remove init(fname) from TestParse.vm for each test
[ https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4016: Status: Patch Available (was: Open) Remove init(fname) from TestParse.vm for each test -- Key: HIVE-4016 URL: https://issues.apache.org/jira/browse/HIVE-4016 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-4016.D8547.1.patch TestParse does not change any of configuration or data, which means calling init() method before each test is not necessary. After removing it, test time reduced to 260sec to 16sec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4016) Remove init(fname) from TestParse.vm for each test
[ https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4016: -- Attachment: HIVE-4016.D8547.1.patch navis requested code review of HIVE-4016 [jira] Remove init(fname) from TestParse.vm for each test. Reviewers: JIRA HIVE-4016 Remove init(fname) from TestParse.vm for each test TestParse does not change any of configuration or data, which means calling init() method before each test is not necessary. After removing it, test time reduced to 260sec to 16sec. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D8547 AFFECTED FILES ql/src/test/templates/TestParse.vm MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/20805/ To: JIRA, navis Remove init(fname) from TestParse.vm for each test -- Key: HIVE-4016 URL: https://issues.apache.org/jira/browse/HIVE-4016 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-4016.D8547.1.patch TestParse does not change any of configuration or data, which means calling init() method before each test is not necessary. After removing it, test time reduced to 260sec to 16sec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4017) Can't close long running hive Query Statements
Kugathasan Abimaran created HIVE-4017: - Summary: Can't close long running hive Query Statements Key: HIVE-4017 URL: https://issues.apache.org/jira/browse/HIVE-4017 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Environment: Ubuntu 11.04 Reporter: Kugathasan Abimaran Currently, we can't set the Hive query timeout period; Hive returns "Method not supported". Is there any way to stop long-running Hive query statements? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
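A caller-side defensive pattern for drivers in this situation: attempt to set the timeout, and fall back gracefully if the driver refuses. The FakeHiveStatement stub below only mimics the reported behavior ("Method not supported"); it is not Hive's actual JDBC Statement class:

```java
import java.sql.SQLException;

public class QueryTimeoutDemo {
    // Illustrative stand-in for a driver that does not support timeouts,
    // as described in this report against Hive 0.9 JDBC.
    static class FakeHiveStatement {
        void setQueryTimeout(int seconds) throws SQLException {
            throw new SQLException("Method not supported");
        }
    }

    /** Tries to set a query timeout; returns false if the driver refuses. */
    static boolean trySetTimeout(FakeHiveStatement stmt, int seconds) {
        try {
            stmt.setQueryTimeout(seconds);
            return true;
        } catch (SQLException e) {
            return false; // timeout unsupported; caller must cancel by other means
        }
    }

    public static void main(String[] args) {
        System.out.println(trySetTimeout(new FakeHiveStatement(), 60)); // false
    }
}
```

When trySetTimeout returns false, the application has no driver-level way to bound the query, which is exactly the gap this issue reports.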
[jira] [Commented] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577353#comment-13577353 ] Amareshwari Sriramadasu commented on HIVE-3652: --- Even with hive.auto.convert.join.noconditionaltask set to true, I'm seeing two MR jobs getting launched. Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Vikram Dixit K Fix For: 0.11.0 Currently, if we join one fact table with multiple dimension tables, it results in multiple mapreduce jobs for each join with dimension table, because join would be on different keys for each dimension. Usually all the dimension tables will be small and can fit into memory and so map-side join can used to join with fact table. In this issue I want to look at optimizing such query to generate single mapreduce job sothat mapper loads dimension tables into memory and joins with fact table on different keys as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4007) Create abstract classes for serializer and deserializer
[ https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4007: - Attachment: hive.4007.2.patch Create abstract classes for serializer and deserializer --- Key: HIVE-4007 URL: https://issues.apache.org/jira/browse/HIVE-4007 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.4007.1.patch, hive.4007.2.patch Currently, it is very difficult to change the Serializer/Deserializer interface, since all the SerDes directly implement the interface. Instead, we should have abstract classes for implementing these interfaces. In case of a interface change, only the abstract class and the relevant serde needs to change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577365#comment-13577365 ] Namit Jain commented on HIVE-3652: -- Is your size threshold correct -- hive.auto.convert.join.noconditionaltask.size ? Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Vikram Dixit K Fix For: 0.11.0 Currently, if we join one fact table with multiple dimension tables, it results in multiple mapreduce jobs for each join with dimension table, because join would be on different keys for each dimension. Usually all the dimension tables will be small and can fit into memory and so map-side join can used to join with fact table. In this issue I want to look at optimizing such query to generate single mapreduce job sothat mapper loads dimension tables into memory and joins with fact table on different keys as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577368#comment-13577368 ] Amareshwari Sriramadasu commented on HIVE-3652: --- bq. Is your size threshold correct – hive.auto.convert.join.noconditionaltask.size ? Yes. The tables are very small. I tested with empty tables as well. I'm seeing the same behavior. Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Vikram Dixit K Fix For: 0.11.0 Currently, if we join one fact table with multiple dimension tables, it results in multiple mapreduce jobs for each join with dimension table, because join would be on different keys for each dimension. Usually all the dimension tables will be small and can fit into memory and so map-side join can used to join with fact table. In this issue I want to look at optimizing such query to generate single mapreduce job sothat mapper loads dimension tables into memory and joins with fact table on different keys as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3252) Add environment context to metastore Thrift calls
[ https://issues.apache.org/jira/browse/HIVE-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577372#comment-13577372 ] Hudson commented on HIVE-3252: -- Integrated in Hive-trunk-h0.21 #1968 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1968/]) HIVE-3252. Add environment context to metastore Thrift calls. (Samuel Yuan via kevinwilfong) (Revision 1445309) Result = FAILURE kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1445309 Files : * /hive/trunk/metastore/if/hive_metastore.thrift * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalPrivilegeSet.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedInfo.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java * 
/hive/trunk/metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py * /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java Add environment context to metastore Thrift calls - Key: HIVE-3252 URL: https://issues.apache.org/jira/browse/HIVE-3252 Project: Hive Issue Type: Improvement Components: Metastore Reporter: John Reese Assignee: Samuel Yuan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3252.1.patch.txt, HIVE-3252.2.patch.txt Currently in the Hive Thrift metastore API create_table, add_partition, alter_table, alter_partition have with_environment_context analogs. It would be really useful to add similar methods from drop_partition, drop_table, and append_partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Hive-trunk-h0.21 - Build # 1968 - Still Failing
Changes for Build #1964 [namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility (Navis via namit) Changes for Build #1965 Changes for Build #1966 Changes for Build #1967 Changes for Build #1968 [kevinwilfong] HIVE-3252. Add environment context to metastore Thrift calls. (Samuel Yuan via kevinwilfong) 1 tests failed. FAILED: org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1 Error Message: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit. Stack Trace: junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit. at net.sf.antcontrib.logic.ForTask.doSequentialIteration(ForTask.java:259) at net.sf.antcontrib.logic.ForTask.doToken(ForTask.java:268) at net.sf.antcontrib.logic.ForTask.doTheTasks(ForTask.java:299) at net.sf.antcontrib.logic.ForTask.execute(ForTask.java:244) The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1968) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1968/ to view the results.
[jira] [Commented] (HIVE-3252) Add environment context to metastore Thrift calls
[ https://issues.apache.org/jira/browse/HIVE-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577376#comment-13577376 ] Hudson commented on HIVE-3252: -- Integrated in Hive-trunk-hadoop2 #120 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/120/]) HIVE-3252. Add environment context to metastore Thrift calls. (Samuel Yuan via kevinwilfong) (Revision 1445309) Result = FAILURE kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1445309 Files : * /hive/trunk/metastore/if/hive_metastore.thrift * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalPrivilegeSet.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedInfo.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java * 
/hive/trunk/metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py * /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java Add environment context to metastore Thrift calls - Key: HIVE-3252 URL: https://issues.apache.org/jira/browse/HIVE-3252 Project: Hive Issue Type: Improvement Components: Metastore Reporter: John Reese Assignee: Samuel Yuan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3252.1.patch.txt, HIVE-3252.2.patch.txt Currently in the Hive Thrift metastore API create_table, add_partition, alter_table, alter_partition have with_environment_context analogs. It would be really useful to add similar methods from drop_partition, drop_table, and append_partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577391#comment-13577391 ] Vikram Dixit K commented on HIVE-3652: -- Hi Amareshwari, If you look at test case join32.q, it is almost the same as the one you had posted. It launches only one MR task (http://svn.apache.org/viewvc/hive/trunk/ql/src/test/results/clientpositive/join32.q.out?view=markup) I tried this with a fully installed cluster as well and I can see only one task. Another issue to consider would be HIVE-3996 and see if that makes a difference. Kindly correct me if I am wrong. Thanks Vikram. Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Vikram Dixit K Fix For: 0.11.0 Currently, if we join one fact table with multiple dimension tables, it results in multiple mapreduce jobs for each join with dimension table, because join would be on different keys for each dimension. Usually all the dimension tables will be small and can fit into memory and so map-side join can used to join with fact table. In this issue I want to look at optimizing such query to generate single mapreduce job sothat mapper loads dimension tables into memory and joins with fact table on different keys as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2340: -- Attachment: HIVE-2340.D1209.12.patch navis updated the revision HIVE-2340 [jira] optimize orderby followed by a groupby. 1. Changed the policy for creating new metadata (colExprMap, etc.) in ColumnPrunerProcFactory.pruneReduceSinkOperator() - Remove values that are not retained from the RowResolver, colExprMap, and schema (instead of creating new entities by adding the retained values) 2. Changed the order of applying CP and PPD. Now PPD applies first and CP next (previously it was CP-PPD) - CP removes some expr mappings which were not yet propagated by PPD - Also removed pruning the schema of FilterOperator, which did not seem right (it is not certain that TS will actually prune columns) 3. Refactored to share the same code base in ExprNodeDescUtils, which was introduced by HIVE-2839 Will run the full test suite tonight Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D1209 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D1209?vs=27315id=27669#toc AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java conf/hive-default.xml.template ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinProcFactory.java ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java ql/src/test/queries/clientpositive/auto_join26.q ql/src/test/queries/clientpositive/groupby_distinct_samekey.q ql/src/test/queries/clientpositive/reduce_deduplicate.q ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q
ql/src/test/results/clientpositive/cluster.q.out ql/src/test/results/clientpositive/groupby2.q.out ql/src/test/results/clientpositive/groupby2_map_skew.q.out ql/src/test/results/clientpositive/groupby_cube1.q.out ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out ql/src/test/results/clientpositive/groupby_rollup1.q.out ql/src/test/results/clientpositive/index_bitmap3.q.out ql/src/test/results/clientpositive/index_bitmap_auto.q.out ql/src/test/results/clientpositive/infer_bucket_sort.q.out ql/src/test/results/clientpositive/ppd2.q.out ql/src/test/results/clientpositive/ppd_gby_join.q.out ql/src/test/results/clientpositive/reduce_deduplicate.q.out ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out ql/src/test/results/clientpositive/semijoin.q.out ql/src/test/results/clientpositive/union24.q.out ql/src/test/results/compiler/plan/input2.q.xml ql/src/test/results/compiler/plan/input3.q.xml ql/src/test/results/compiler/plan/join1.q.xml ql/src/test/results/compiler/plan/join2.q.xml ql/src/test/results/compiler/plan/join3.q.xml ql/src/test/results/compiler/plan/sample1.q.xml ql/src/test/results/compiler/plan/sample2.q.xml ql/src/test/results/compiler/plan/sample3.q.xml ql/src/test/results/compiler/plan/sample4.q.xml ql/src/test/results/compiler/plan/sample5.q.xml ql/src/test/results/compiler/plan/sample6.q.xml ql/src/test/results/compiler/plan/sample7.q.xml To: JIRA, navis Cc: hagleitn, njain optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: perfomance Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, HIVE-2340.13.patch, 
HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY optimizer (cluster-by following group-by).
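Point 1 of the update above (and the related HIVE-2339 discussion) amounts to pruning the existing colExprMap in place so that KEY.* entries survive for later optimizers, rather than rebuilding the map from only the retained VALUE columns. A hypothetical Python sketch of that policy change — the column and expression names are invented, and a plain dict stands in for Hive's column-to-ExprNodeDesc map:

```python
# Hypothetical sketch: prune a column->expression map in place.
# Building a fresh map from retained VALUE columns would silently drop the
# KEY.* entries; deleting only the pruned columns keeps them.

col_expr_map = {
    "KEY._col0": "src.key",      # needed later to backtrack RS key columns
    "VALUE._col0": "src.value",
    "VALUE._col1": "src.extra",  # pruned by column pruning
}
retained = {"KEY._col0", "VALUE._col0"}

def prune_in_place(expr_map, retained_cols):
    """Remove only the columns that were pruned; everything else survives."""
    for col in list(expr_map):
        if col not in retained_cols:
            del expr_map[col]
    return expr_map

pruned = prune_in_place(col_expr_map, retained)
```

With the rebuild-from-values policy, "KEY._col0" would have been lost here even though the key column is still produced, which is exactly the colExprMap gap the HIVE-2340 comments trace back to.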