[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574089#comment-17574089 ] Wei Zhang commented on HIVE-15121: -- From my explained plan, there are two intermediate paths in the final job, as stated in the MoveTask: hive_hive_xxx-1/-ext-10001 -> hive_hive_xxx-1/-ext-1 -> final table path. The question is: if hive_hive_xxx-1/-ext-10001 is under S3, don't we need two expensive server-side copies in S3? If hive_hive_xxx-1/-ext-10001 and hive_hive_xxx-1/-ext-1 are in HDFS, as [~smurthy] said, it will be just a super-fast HDFS rename. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: 3.2.0 > > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, > HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. > The advantage is that any copying of data that needs to be done from the > scratch directory to the final table directory can be done server-side, > within the blobstore. The MoveTask simply renames data from the scratch > directory to the final table location, which should translate to a > server-side COPY request. This way HiveServer2 doesn't have to actually copy > any data, it just tells the blobstore to do all the work. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806759#comment-16806759 ] Subhash Reddy commented on HIVE-15121: -- Hi [~stakiar], [~vgarg], has this change gone live in any Hive version? I am looking for a patch with this feature that excludes the changes proposed in -HIVE-17620.- Because of -HIVE-17620,- it writes to an S3 temp folder first and then moves the data from the temp folder to the target folder. Moving from the temp folder to the target folder takes a very long time. We want even the final MR job to write to HDFS and then move the data from HDFS to the final target table. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: 3.2.0 > > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, > HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. > The advantage is that any copying of data that needs to be done from the > scratch directory to the final table directory can be done server-side, > within the blobstore. The MoveTask simply renames data from the scratch > directory to the final table location, which should translate to a > server-side COPY request. This way HiveServer2 doesn't have to actually copy > any data, it just tells the blobstore to do all the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615279#comment-16615279 ] Damon Cortesi commented on HIVE-15121: -- Will this enable similar optimizations in the FileMergeOperators? Doesn't look like they use the new functionality to get a temp dir. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: 3.2.0 > > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, > HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. > The advantage is that any copying of data that needs to be done from the > scratch directory to the final table directory can be done server-side, > within the blobstore. The MoveTask simply renames data from the scratch > directory to the final table location, which should translate to a > server-side COPY request. This way HiveServer2 doesn't have to actually copy > any data, it just tells the blobstore to do all the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182896#comment-16182896 ] Gergely Hajós commented on HIVE-15121: -- [~stakiar] [~zhihao] created a new ticket: HIVE-17620 > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.3.0 > > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.patch, HIVE-15121.WIP.1.patch, > HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. > The advantage is that any copying of data that needs to be done from the > scratch directory to the final table directory can be done server-side, > within the blobstore. The MoveTask simply renames data from the scratch > directory to the final table location, which should translate to a > server-side COPY request. This way HiveServer2 doesn't have to actually copy > any data, it just tells the blobstore to do all the work. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180458#comment-16180458 ] wangzhihao commented on HIVE-15121: --- Hi [~stakiar], shouldn't [this statement | https://github.com/apache/hive/blob/aa7edfefe20cede2c37d8c8b6c864c3b6039923f/ql/src/java/org/apache/hadoop/hive/ql/Context.java#L497 ] be {code:java} if (!isFinalJob && BlobStorageUtils.areOptimizationsEnabled(conf)) { {code} We should use the HDFS staging dir when the optimization is enabled and the MR job is not the final job. But the current logic calls getMRTmpPath() when areOptimizationsEnabled() returns false. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.3.0 > > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.patch, HIVE-15121.WIP.1.patch, > HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. > The advantage is that any copying of data that needs to be done from the > scratch directory to the final table directory can be done server-side, > within the blobstore. The MoveTask simply renames data from the scratch > directory to the final table location, which should translate to a > server-side COPY request. This way HiveServer2 doesn't have to actually copy > any data, it just tells the blobstore to do all the work. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
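To make the difference raised in the comment above concrete, the following is a minimal sketch that tabulates the two boolean conditions. It is not Hive code: the class and method names are hypothetical, and the "current" condition is only inferred from the comment's description (getMRTmpPath() is reached whenever areOptimizationsEnabled() returns false), not quoted from Context.java.

```java
// Hypothetical sketch; names are illustrative and not part of Hive.
final class ScratchDirCondition {

    // ASSUMPTION: condition inferred from the comment's description of the
    // current behaviour, i.e. HDFS staging unless (final job AND optimizations on).
    static boolean currentUsesHdfsStaging(boolean isFinalJob, boolean optimizationsEnabled) {
        return !(isFinalJob && optimizationsEnabled);
    }

    // Condition proposed in the comment: HDFS staging only when the job is
    // not final AND optimizations are on.
    static boolean proposedUsesHdfsStaging(boolean isFinalJob, boolean optimizationsEnabled) {
        return !isFinalJob && optimizationsEnabled;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        System.out.println("isFinalJob optimizationsEnabled current proposed");
        for (boolean isFinalJob : values) {
            for (boolean enabled : values) {
                System.out.printf("%-10b %-20b %-7b %b%n",
                        isFinalJob, enabled,
                        currentUsesHdfsStaging(isFinalJob, enabled),
                        proposedUsesHdfsStaging(isFinalJob, enabled));
            }
        }
    }
}
```

Under these assumptions the two conditions agree whenever optimizations are enabled and differ only when they are disabled, which is the case the comment calls out.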
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899249#comment-15899249 ] Lefty Leverenz commented on HIVE-15121: --- Sergio Peña documented *hive.blobstore.optimizations.enabled* in a new Blobstore section of Hive Configuration Properties: * [Configuration Properties -- Blobstore (i.e. Amazon S3) | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Blobstore(i.e.AmazonS3)] * [Configuration Properties -- hive.blobstore.optimizations.enabled | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.blobstore.optimizations.enabled] Removed the TODOC2.2 label. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.2.0 > > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.patch, HIVE-15121.WIP.1.patch, > HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. > The advantage is that any copying of data that needs to be done from the > scratch directory to the final table directory can be done server-side, > within the blobstore. The MoveTask simply renames data from the scratch > directory to the final table location, which should translate to a > server-side COPY request. 
This way HiveServer2 doesn't have to actually copy > any data, it just tells the blobstore to do all the work. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688365#comment-15688365 ] Lefty Leverenz commented on HIVE-15121: --- Doc note: This adds *hive.blobstore.optimizations.enabled* to HiveConf.java, so it needs to be documented in the wiki for release 2.2.0. I recommend using the description in patch 2 (revision 3 on the Review Board) instead of referring back here for details: {quote} This parameter enables a number of optimizations when running on blobstores: (1) If hive.blobstore.use.blobstore.as.scratchdir is false, force the last Hive job to write to the blobstore. This is a performance optimization that forces the final FileSinkOperator to write to the blobstore. The advantage is that any copying of data that needs to be done from the scratch directory to the final table directory can be done server-side, within the blobstore. The MoveTask simply renames data from the scratch directory to the final table location, which should translate to a server-side COPY request. This way HiveServer2 doesn't have to actually copy any data, it just tells the blobstore to do all the work. {quote} I'm not sure if *hive.blobstore.optimizations.enabled* belongs in the general query execution section or a new blobstore section, along with the two parameters created by HIVE-14270. * [Hive Configuration Properties | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties] Added a TODOC2.2 label. 
> Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, > HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. > The advantage is that any copying of data that needs to be done from the > scratch directory to the final table directory can be done server-side, > within the blobstore. The MoveTask simply renames data from the scratch > directory to the final table location, which should translate to a > server-side COPY request. This way HiveServer2 doesn't have to actually copy > any data, it just tells the blobstore to do all the work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688185#comment-15688185 ] Sahil Takiar commented on HIVE-15121: - [~spena] test failures look unrelated, and the tests are failing on other patches too. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, > HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. > The advantage is that any copying of data that needs to be done from the > scratch directory to the final table directory can be done server-side, > within the blobstore. The MoveTask simply renames data from the scratch > directory to the final table location, which should translate to a > server-side COPY request. This way HiveServer2 doesn't have to actually copy > any data, it just tells the blobstore to do all the work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688037#comment-15688037 ] Hive QA commented on HIVE-15121: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12840100/HIVE-15121.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10717 tests executed *Failed tests:* {noformat} TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=114) [join39.q,bucketsortoptimize_insert_7.q,vector_distinct_2.q,join11.q,union13.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,windowing.q,union_remove_3.q,skewjoinopt7.q,stats7.q,annotate_stats_join.q,multi_insert_lateral_view.q,ptf_streaming.q,join_1to1.q] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=43) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2244/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2244/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2244/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically 
generated. ATTACHMENT ID: 12840100 - PreCommit-HIVE-Build > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, > HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. > The advantage is that any copying of data that needs to be done from the > scratch directory to the final table directory can be done server-side, > within the blobstore. The MoveTask simply renames data from the scratch > directory to the final table location, which should translate to a > server-side COPY request. This way HiveServer2 doesn't have to actually copy > any data, it just tells the blobstore to do all the work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687818#comment-15687818 ] Sergio Peña commented on HIVE-15121: LGTM +1 > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, > HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684823#comment-15684823 ] Sahil Takiar commented on HIVE-15121: - [~spena] test failures look unrelated, could you review? > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, > HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684727#comment-15684727 ] Hive QA commented on HIVE-15121: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12839864/HIVE-15121.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10701 tests executed *Failed tests:* {noformat} TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=104) [skewjoin_union_remove_2.q,avro_decimal_native.q,skewjoinopt8.q,bucketmapjoin_negative3.q,union32.q,stats6.q,groupby2_map.q,stats_only_null.q,insert_into3.q,join18_multi_distinct.q,vectorization_6.q,cross_join.q,stats9.q,timestamp_1.q,join24.q] TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=106) [union_remove_1.q,ppd_outer_join2.q,groupby1_noskew.q,join20.q,smb_mapjoin_13.q,multi_insert.q,groupby_rollup1.q,temp_table_gb1.q,vector_string_concat.q,smb_mapjoin_6.q,metadata_only_queries.q,auto_sortmerge_join_12.q,groupby_bigdata.q,groupby3_map_multi_distinct.q,innerjoin.q] org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] (batchId=90) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=90) org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testTaskStatus (batchId=207) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2230/testReport Console output: 
https://builds.apache.org/job/PreCommit-HIVE-Build/2230/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2230/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12839864 - PreCommit-HIVE-Build > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, > HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, > HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677025#comment-15677025 ] Sergio Peña commented on HIVE-15121: OK, I see what you're saying. The partial mask is beneficial in some cases though, like doing a {{dfs -ls s3a://table}} in the .q test to verify that the final files were written to S3. But for the EXPLAIN I understand the problem. I'll do the quick fix. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.WIP.1.patch, > HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675975#comment-15675975 ] Sahil Takiar commented on HIVE-15121: - [~spena] unfortunately HIVE-15226 doesn't help much for this patch. HIVE-15226 basically replaces the blobstore URI with {{### test.blobstore.path ###}}, but this URI mainly occurs when listing out file names; for example, when listing out the staging directory of an MR job. The problem is that the staging directory values get replaced by {{QTestUtil}}. The {{QTestUtil}} class matches on {{.*.hive-staging.*}} and replaces it with {{ A masked pattern was here }}. It does this for a good reason: the staging directory typically has some non-deterministic id in the file path. For this specific patch, the {{EXPLAIN EXTENDED}} outputs for a multi-MR-job query end up being exactly the same when this optimization is enabled vs. when it is disabled, mainly because of the behavior above. One easy way to fix this would be to match on {{.*s3a:.*}} and replace it with {{### test.blobstore.path ###}}; {{QTestUtil}} already does this for {{.*hdfs:.*}} and {{.*file:.*}}. Let me know what you think of this approach; I can add the changes to this patch. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.WIP.1.patch, > HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. 
Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
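The masking approach proposed in the comment above can be sketched roughly as follows. This is an illustration only, not the actual {{QTestUtil}} code: the class, method, and constant names are hypothetical, the two regular expressions are the ones quoted in the comment, and checking the s3a pattern before the staging pattern is an assumption made here so that a blobstore staging path stays distinguishable from an HDFS one.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not QTestUtil's real fields or methods.
final class MaskSketch {

    // Patterns as quoted in the comment.
    private static final Pattern STAGING = Pattern.compile(".*.hive-staging.*");
    private static final Pattern S3A = Pattern.compile(".*s3a:.*");

    // Replacement strings as they appear in the comment.
    static final String MASKED = "A masked pattern was here";
    static final String BLOBSTORE = "### test.blobstore.path ###";

    // Masks a single line of test output.
    // ASSUMPTION: the s3a check runs first; otherwise every staging
    // directory would collapse to the same generic mask.
    static String maskLine(String line) {
        if (S3A.matcher(line).matches()) {
            return BLOBSTORE;
        }
        if (STAGING.matcher(line).matches()) {
            return MASKED;
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(maskLine("s3a://bucket/table/.hive-staging_1/-ext-10000"));
        System.out.println(maskLine("hdfs://nn/tmp/.hive-staging_1/-ext-10002"));
        System.out.println(maskLine("Stage-1 is a root stage"));
    }
}
```

With this ordering, an EXPLAIN line that points at a blobstore staging directory is masked differently from one that points at HDFS, so the two plans no longer look identical. The bucket and path names in main are made up for the example.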
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646924#comment-15646924 ] Lefty Leverenz commented on HIVE-15121: --- I left a nit-picking comment on RB. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.WIP.1.patch, > HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646375#comment-15646375 ] Hive QA commented on HIVE-15121: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12837877/HIVE-15121.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10628 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2015/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2015/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2015/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12837877 - PreCommit-HIVE-Build > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.WIP.1.patch, > HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. 
The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645989#comment-15645989 ] Sahil Takiar commented on HIVE-15121: - [~spena] added qtests, and tested locally. Ready for review. RB link is attached to this JIRA. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.1.patch, HIVE-15121.WIP.1.patch, > HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638397#comment-15638397 ] Hive QA commented on HIVE-15121: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12837320/HIVE-15121.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10628 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk] (batchId=89) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1981/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1981/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1981/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12837320 - PreCommit-HIVE-Build > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, > HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. 
The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638062#comment-15638062 ] Sahil Takiar commented on HIVE-15121: - Cleaned up the patch, and tested it more thoroughly. Test failures should be resolved by the next Hive QA run. Ready for review.

Notes:
* Approach is to find all the places where the scratch dir is specified for the final MR / Tez / Spark job and modify {{Context.getTempDirForPath}} to take an optional boolean {{isFinalJob}} that specifies that the scratch directory is being created for the final MR job
* Adding a new config called {{hive.blobstore.write.final.output.to.blobstore}} that toggles this behavior in case a user wants all intermediate data to be stored on HDFS
* Modified a few invocations of {{getTempDirForPath}} in {{SemanticAnalyzer.genFileSinkPlan}}; this method creates the final {{FileSinkDesc}} for the job
* Tested locally against an S3 bucket; the explain output of a Hive query with two MR jobs shows that the first one writes to a local file, and the second writes to S3

Will do some more local validation, and write some unit tests + qtests.

> Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar > Attachments: HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, > HIVE-15121.WIP.patch, HIVE-15121.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
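The notes in the comment above can be read as the following decision rule. This is only a rough sketch and not the code from the attached patches: the class {{ScratchDirSketch}}, the {{/_scratch}} suffix, the plain-string paths, and the exact way the flag interacts with {{isFinalJob}} are all assumptions made for illustration. Only the names {{getTempDirForPath}}, {{isFinalJob}} and {{hive.blobstore.write.final.output.to.blobstore}} come from the comment.

```java
/**
 * Hypothetical sketch of the scratch-directory choice described above.
 * It does not use Hive's real Context class or HiveConf.
 */
final class ScratchDirSketch {
    // Hypothetical suffix for a scratch dir placed next to the destination.
    static final String SCRATCH_SUFFIX = "/_scratch";

    // Stands in for hive.blobstore.write.final.output.to.blobstore.
    private final boolean writeFinalOutputToBlobstore;
    // Hypothetical HDFS scratch location used for intermediate data.
    private final String hdfsScratchDir;

    ScratchDirSketch(boolean writeFinalOutputToBlobstore, String hdfsScratchDir) {
        this.writeFinalOutputToBlobstore = writeFinalOutputToBlobstore;
        this.hdfsScratchDir = hdfsScratchDir;
    }

    // Assumption: a path counts as "blobstore" if it uses an S3 scheme.
    static boolean isBlobStorePath(String path) {
        return path.startsWith("s3://") || path.startsWith("s3a://") || path.startsWith("s3n://");
    }

    // Callers that do not know about the final job get the intermediate behavior.
    String getTempDirForPath(String destPath) {
        return getTempDirForPath(destPath, false);
    }

    String getTempDirForPath(String destPath, boolean isFinalJob) {
        boolean stageOnBlobstore = isFinalJob && writeFinalOutputToBlobstore;
        if (isBlobStorePath(destPath) && !stageOnBlobstore) {
            // Intermediate job, or the flag is off: keep the data on HDFS.
            return hdfsScratchDir;
        }
        // Final job writing to the blobstore, or a non-blobstore destination:
        // stage next to the destination so the later move stays on one filesystem.
        return destPath + SCRATCH_SUFFIX;
    }

    public static void main(String[] args) {
        ScratchDirSketch ctx = new ScratchDirSketch(true, "hdfs:///tmp/hive");
        System.out.println(ctx.getTempDirForPath("s3a://bucket/tbl", false)); // hdfs:///tmp/hive
        System.out.println(ctx.getTempDirForPath("s3a://bucket/tbl", true));  // s3a://bucket/tbl/_scratch
    }
}
```

With the flag turned off, the final job in this sketch also falls back to the HDFS scratch directory, which is one possible reading of "toggles this behavior".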
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637848#comment-15637848 ] Hive QA commented on HIVE-15121: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12837281/HIVE-15121.WIP.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 153 failed/errored test(s), 10628 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert] (batchId=215) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_table] (batchId=19) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_5] (batchId=37) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_6] (batchId=59) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[binary_output_format] (batchId=78) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5] (batchId=75) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative2] (batchId=62) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative] (batchId=21) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] (batchId=60) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast1] (batchId=67) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_dp] (batchId=17) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_type] (batchId=1) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_colname] (batchId=53) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_uses_database_location] (batchId=31) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_3] (batchId=31) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_mat_3] (batchId=21) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_mat_4] (batchId=5) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_mat_5] (batchId=2) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_rdd_cache] (batchId=48) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explain_ddl] (batchId=42) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] (batchId=70) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_3] (batchId=72) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_5] (batchId=41) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_7] (batchId=69) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_skew_1_23] (batchId=8) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_test_1] (batchId=7) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto] (batchId=41) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_file_format] (batchId=49) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables_compact] (batchId=32) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_multiple] (batchId=31) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_partitioned] (batchId=10) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_update] (batchId=66) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_compression] (batchId=9) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_serde] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_skewtable] (batchId=73) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input11] (batchId=49) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input12] (batchId=68) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input13] (batchId=68) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input34] (batchId=16) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input35] (batchId=53) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input36] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input38] (batchId=12) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input6] (batchId=80) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input7] (batchId=66) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input8] (batchId=8) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input9] (batchId=54) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_dynamicserde] (batchId=78) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_part1] (batchId=7) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_part2] (batchId=42) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_part5] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testsequencefile] (batchId=77) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath2] (batchId=33)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634947#comment-15634947 ] Sahil Takiar commented on HIVE-15121: - Something seems to be wrong with ptest; will try re-attaching the patch tomorrow. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar > Attachments: HIVE-15121.WIP.1.patch, HIVE-15121.WIP.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634938#comment-15634938 ] Sahil Takiar commented on HIVE-15121: - I think the issues with merge-job should be fixed by HIVE-15114. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar > Attachments: HIVE-15121.WIP.1.patch, HIVE-15121.WIP.patch > > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634922#comment-15634922 ] Hive QA commented on HIVE-15121: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12837012/HIVE-15121.WIP.1.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1959/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1959/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1959/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2016-11-04 01:50:30.500 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + MAVEN_OPTS='-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-1959/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2016-11-04 01:50:30.502 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 97f7d7b HIVE-15109: Set MaxPermSize to 256M for maven tests with JDKs prior to 8 (Chaoyu Tang, reviewed by Siddharth Seth, Sergio Pena) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 97f7d7b HIVE-15109: Set MaxPermSize to 256M for maven tests with JDKs prior to 8 (Chaoyu Tang, reviewed by Siddharth Seth, Sergio Pena) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2016-11-04 01:50:31.419 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p1 patching file common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java patching file ql/src/java/org/apache/hadoop/hive/ql/Context.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java patching file ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java patching file ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven ANTLR Parser Generator Version 3.4 org/apache/hadoop/hive/metastore/parser/Filter.g DataNucleus Enhancer (version 4.1.6) for API "JDO" DataNucleus Enhancer : Classpath >> 
/usr/share/maven/boot/plexus-classworlds-2.x.jar ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MColumnDescriptor ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStringList ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStorageDescriptor ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartition ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MIndex ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRole ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRoleMap ENHANCED (Persistable) :
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634908#comment-15634908 ] Hive QA commented on HIVE-15121: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12837008/HIVE-15121.WIP.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1958/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1958/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1958/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2016-11-04 01:42:46.284 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + MAVEN_OPTS='-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-1958/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2016-11-04 01:42:46.287 + cd apache-github-source-source + git fetch origin From https://github.com/apache/hive 345353c..97f7d7b master -> origin/master + git reset --hard HEAD HEAD is now at 345353c HIVE-15039: A better job monitor console output for HoS (Rui reviewed by Xuefu and Ferdinand) + git clean -f -d Removing metastore/scripts/upgrade/postgres/037-HIVE-11072.postgres.sql Removing testutils/metastore/dataload.properties Removing testutils/metastore/dbs/derby/validate.sh Removing testutils/metastore/dbs/mysql/validate.sh Removing testutils/metastore/dbs/postgres/validate.sh Removing testutils/metastore/metastore-validation-test.sh + git checkout master Already on 'master' Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/master HEAD is now at 97f7d7b HIVE-15109: Set MaxPermSize to 256M for maven tests with JDKs prior to 8 (Chaoyu Tang, reviewed by Siddharth Seth, Sergio Pena) + git merge --ff-only origin/master Already up-to-date. 
+ date '+%Y-%m-%d %T.%3N' 2016-11-04 01:42:47.771 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p1 patching file common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java patching file ql/src/java/org/apache/hadoop/hive/ql/Context.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java patching file ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java patching file ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven ANTLR Parser Generator Version 3.4 org/apache/hadoop/hive/metastore/parser/Filter.g DataNucleus Enhancer (version 4.1.6) for API "JDO" DataNucleus Enhancer : Classpath >> /usr/share/maven/boot/plexus-classworlds-2.x.jar ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MColumnDescriptor ENHANCED
[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634332#comment-15634332 ] Sahil Takiar commented on HIVE-15121: - I'm hoping it should be similar to HIVE-14270. I think there is a way to find the last MR job in a plan. The {{MapredWork.isFinalMapRed}} method should work here. > Last MR job in Hive should be able to write to a different scratch directory > > > Key: HIVE-15121 > URL: https://issues.apache.org/jira/browse/HIVE-15121 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Sahil Takiar > > Hive should be able to configure all intermediate MR jobs to write to HDFS, > but the final MR job to write to S3. > This will be useful for implementing parallel renames on S3. The idea is that > for a multi-job query, all intermediate MR jobs write to HDFS, and then the > final job writes to S3. Writing to HDFS should be faster than writing to S3, > so it makes more sense to write intermediate data to HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
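For readers unfamiliar with the idea of "the last MR job in a plan", one way to picture it is as a job with no downstream job in the plan graph. The sketch below illustrates only that idea. It does not use Hive's task or work classes, and it makes no claim about how {{MapredWork.isFinalMapRed}} is actually set; {{JobNode}} and {{FinalJobFinder}} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for one job in a query plan. */
final class JobNode {
    final String name;
    final List<JobNode> children = new ArrayList<>();

    JobNode(String name) {
        this.name = name;
    }

    void addChild(JobNode child) {
        children.add(child);
    }

    // Assumption for this sketch: a job is "final" when nothing runs after it.
    boolean isFinalJob() {
        return children.isEmpty();
    }
}

/** Walks the plan graph from a root job and collects the final jobs. */
final class FinalJobFinder {
    static List<JobNode> findFinalJobs(JobNode root) {
        List<JobNode> result = new ArrayList<>();
        collect(root, result, new HashSet<>());
        return result;
    }

    private static void collect(JobNode node, List<JobNode> out, Set<JobNode> seen) {
        if (!seen.add(node)) {
            return; // already visited via another parent
        }
        if (node.isFinalJob()) {
            out.add(node);
        }
        for (JobNode child : node.children) {
            collect(child, out, seen);
        }
    }
}
```

For a linear plan job1 -> job2 -> job3, {{findFinalJobs}} on job1 returns only job3, which is the job whose scratch directory would be chosen differently.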