[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2022-08-02 Thread Wei Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574089#comment-17574089
 ] 

Wei Zhang commented on HIVE-15121:
--

From my explained plan, there are two intermediate paths in the final job, as 
stated in the MoveTask:

hive_hive_xxx-1/-ext-10001 -> hive_hive_xxx-1/-ext-1 -> final table path

The question is: if hive_hive_xxx-1/-ext-10001 is on S3, don't we need two 
expensive server-side copies in S3?

If hive_hive_xxx-1/-ext-10001 and hive_hive_xxx-1/-ext-1 are in HDFS, then, as 
[~smurthy] said, each step is just a very fast HDFS rename.
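
As a rough illustration of the question above, the sketch below counts the bytes a store would have to copy for a chain of renames. It assumes that a rename inside S3 is carried out as a server-side copy followed by a delete, while a rename inside HDFS is a metadata-only operation. The class and method names are hypothetical and are not part of Hive.

```java
// Hypothetical cost model for illustration only; this is not Hive code.
final class RenameCostSketch {

    /**
     * Returns the number of bytes copied server-side for a chain of renames.
     * hopOnBlobstore[i] is true when hop i renames data inside a blobstore
     * such as S3 (assumed: copy + delete) and false when it renames data
     * inside HDFS (assumed: metadata only, no data copied).
     */
    static long bytesCopied(long dataBytes, boolean[] hopOnBlobstore) {
        long total = 0;
        for (boolean onBlobstore : hopOnBlobstore) {
            if (onBlobstore) {
                total += dataBytes; // each blobstore hop copies the full data set
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        // Both hops inside S3: the model counts two full copies.
        System.out.println(bytesCopied(oneGiB, new boolean[] {true, true}));
        // Both hops inside HDFS: the model counts no copied bytes.
        System.out.println(bytesCopied(oneGiB, new boolean[] {false, false}));
    }
}
```

Under these assumptions the cost grows linearly with the number of blobstore hops, which is why a second intermediate path on S3 would matter.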

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data, it just tells the blobstore to do all the work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2019-04-01 Thread Subhash Reddy (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806759#comment-16806759
 ] 

Subhash Reddy commented on HIVE-15121:
--

Hi [~stakiar], [~vgarg]

Has this change gone live in any Hive version? I am looking for a patch with 
this feature, excluding the changes proposed in HIVE-17620.

Because of HIVE-17620, Hive writes to an S3 temp folder first and then moves 
the data from the temp folder to the target folder. That move takes a very 
long time.

We want even the final MR job to write to HDFS, and then move the data from 
HDFS to the final target table.

 



[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2018-09-14 Thread Damon Cortesi (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615279#comment-16615279
 ] 

Damon Cortesi commented on HIVE-15121:
--

Will this enable similar optimizations in the FileMergeOperators? Doesn't look 
like they use the new functionality to get a temp dir.



[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2017-09-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182896#comment-16182896
 ] 

Gergely Hajós commented on HIVE-15121:
--

[~stakiar] [~zhihao] created a new ticket: HIVE-17620

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.3.0
>
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.patch, HIVE-15121.WIP.1.patch, 
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch
>
>


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2017-09-26 Thread wangzhihao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180458#comment-16180458
 ] 

wangzhihao commented on HIVE-15121:
---

Hi [~stakiar]

Should not [this statement | 
https://github.com/apache/hive/blob/aa7edfefe20cede2c37d8c8b6c864c3b6039923f/ql/src/java/org/apache/hadoop/hive/ql/Context.java#L497
 ] be 

{code:java}
if (!isFinalJob && BlobStorageUtils.areOptimizationsEnabled(conf)) {
{code}

We should use the HDFS staging dir when the optimization is enabled and the MR 
job is not the final job. But the current logic calls getMRTmpPath() when 
areOptimizationsEnabled() returns false.
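
To make the difference concrete, the sketch below prints two predicates for every input. {{proposed}} is the condition quoted above. {{assumedCurrent}} is only an assumption about the existing check, chosen because it matches the behavior described here (the HDFS path is taken whenever the optimization is disabled). The class and method names are hypothetical and are not taken from Context.java.

```java
// Hypothetical sketch; names are illustrative and not taken from Hive's Context.java.
final class StagingDirPredicateSketch {

    // Condition proposed in the comment: use the HDFS staging dir only for
    // non-final jobs, and only when the optimization is enabled.
    static boolean proposed(boolean isFinalJob, boolean optimizationsEnabled) {
        return !isFinalJob && optimizationsEnabled;
    }

    // ASSUMPTION: one possible form of the existing check. It is true whenever
    // the optimization is disabled, which is the behavior the comment describes.
    static boolean assumedCurrent(boolean isFinalJob, boolean optimizationsEnabled) {
        return !(isFinalJob && optimizationsEnabled);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean isFinalJob : values) {
            for (boolean enabled : values) {
                System.out.printf(
                        "isFinalJob=%b enabled=%b proposed=%b assumedCurrent=%b%n",
                        isFinalJob, enabled,
                        proposed(isFinalJob, enabled),
                        assumedCurrent(isFinalJob, enabled));
            }
        }
    }
}
```

Under this assumption the two forms agree whenever the optimization is enabled and differ only when it is disabled.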



[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2017-03-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899249#comment-15899249
 ] 

Lefty Leverenz commented on HIVE-15121:
---

Sergio Peña documented *hive.blobstore.optimizations.enabled* in a new 
Blobstore section of Hive Configuration Properties:

* [Configuration Properties -- Blobstore (i.e. Amazon S3) | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Blobstore(i.e.AmazonS3)]
* [Configuration Properties -- hive.blobstore.optimizations.enabled | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.blobstore.optimizations.enabled]

Removed the TODOC2.2 label.

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.2.0
>
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.patch, HIVE-15121.WIP.1.patch, 
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch
>
>


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688365#comment-15688365
 ] 

Lefty Leverenz commented on HIVE-15121:
---

Doc note:  This adds *hive.blobstore.optimizations.enabled* to HiveConf.java, 
so it needs to be documented in the wiki for release 2.2.0.  I recommend using 
the description in patch 2 (revision 3 on the Review Board) instead of 
referring back here for details:

{quote}
This parameter enables a number of optimizations when running on blobstores:
(1) If hive.blobstore.use.blobstore.as.scratchdir is false, force the last Hive 
job to write to the blobstore. This is a performance optimization that forces 
the final FileSinkOperator to write to the blobstore. The advantage is that any 
copying of data that needs to be done from the scratch directory to the final 
table directory can be done server-side, within the blobstore. The MoveTask 
simply renames data from the scratch directory to the final table location, 
which should translate to a server-side COPY request. This way HiveServer2 
doesn't have to actually copy any data, it just tells the blobstore to do all 
the work.
{quote}

I'm not sure if *hive.blobstore.optimizations.enabled* belongs in the general 
query execution section or a new blobstore section, along with the two 
parameters created by HIVE-14270.

* [Hive Configuration Properties | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties]

Added a TODOC2.2 label.

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688185#comment-15688185
 ] 

Sahil Takiar commented on HIVE-15121:
-

[~spena] test failures look unrelated, and the tests are failing on other 
patches too.

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688037#comment-15688037
 ] 

Hive QA commented on HIVE-15121:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840100/HIVE-15121.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10717 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=114)
[join39.q,bucketsortoptimize_insert_7.q,vector_distinct_2.q,join11.q,union13.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,windowing.q,union_remove_3.q,skewjoinopt7.q,stats7.q,annotate_stats_join.q,multi_insert_lateral_view.q,ptf_streaming.q,join_1to1.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=43)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2244/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2244/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2244/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840100 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687818#comment-15687818
 ] 

Sergio Peña commented on HIVE-15121:


LGTM
+1

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.





[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684823#comment-15684823
 ] 

Sahil Takiar commented on HIVE-15121:
-

[~spena] test failures look unrelated, could you review?

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, 
> HIVE-15121.patch
>
>


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684727#comment-15684727
 ] 

Hive QA commented on HIVE-15121:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839864/HIVE-15121.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10701 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=104)
[skewjoin_union_remove_2.q,avro_decimal_native.q,skewjoinopt8.q,bucketmapjoin_negative3.q,union32.q,stats6.q,groupby2_map.q,stats_only_null.q,insert_into3.q,join18_multi_distinct.q,vectorization_6.q,cross_join.q,stats9.q,timestamp_1.q,join24.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=106)
[union_remove_1.q,ppd_outer_join2.q,groupby1_noskew.q,join20.q,smb_mapjoin_13.q,multi_insert.q,groupby_rollup1.q,temp_table_gb1.q,vector_string_concat.q,smb_mapjoin_6.q,metadata_only_queries.q,auto_sortmerge_join_12.q,groupby_bigdata.q,groupby3_map_multi_distinct.q,innerjoin.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] (batchId=90)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=90)
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testTaskStatus (batchId=207)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2230/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2230/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2230/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839864 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677025#comment-15677025
 ] 

Sergio Peña commented on HIVE-15121:


OK, I see what you're saying. The partial mask is beneficial in some cases 
though, like doing a {{dfs -ls s3a://table}} in the .q test to verify that the 
final files were written to S3. But for the EXPLAIN output I understand the problem.

I'll do the quick fix. 

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.WIP.1.patch, 
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch
>
>


[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-17 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675975#comment-15675975
 ] 

Sahil Takiar commented on HIVE-15121:
-

[~spena] unfortunately HIVE-15226 doesn't help much for this patch. HIVE-15226 
basically replaces the blobstore URI with {{### test.blobstore.path ###}}, but 
this URI mainly occurs when listing out file names; for example, when listing 
out the staging directory of an MR job. The problem is that the staging 
directory values get replaced by {{QTestUtil}}. The {{QTestUtil}} class matches 
{{.*.hive-staging.*}} and replaces it with {{ A masked pattern was here 
}}. It does this for a good reason: the staging directory typically has some 
non-deterministic ID in the file path.

For this specific patch, the {{EXPLAIN EXTENDED}} outputs for a multi-MR-job 
query end up being exactly the same whether this optimization is enabled or 
disabled, mainly because of the behavior above.

One easy way to fix this would be to match on {{.*s3a:.*}} and replace it with 
{{### test.blobstore.path ###}}; {{QTestUtil}} already does this for 
{{.*hdfs:.*}} and {{.*file:.*}}.

Let me know what you think of this approach; I can add the changes to this 
patch.
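
The masking idea can be sketched in a few lines. This is a minimal illustration and not the actual {{QTestUtil}} code: the class name, the method name, and the choice to replace the whole matching line are assumptions; only the pattern {{.*s3a:.*}} and the placeholder text come from the comment above.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of line masking; not the real QTestUtil implementation.
final class BlobstoreMaskSketch {

    // Pattern and placeholder as quoted in the comment above.
    static final Pattern S3A_LINE = Pattern.compile(".*s3a:.*");
    static final String MASK = "### test.blobstore.path ###";

    // ASSUMPTION: a line that matches is replaced as a whole by the placeholder;
    // any other line is returned unchanged.
    static String maskLine(String line) {
        return S3A_LINE.matcher(line).matches() ? MASK : line;
    }

    public static void main(String[] args) {
        String[] lines = {
            "location s3a://some-bucket/some-table",
            "Stage-1 is a root stage"
        };
        for (String line : lines) {
            System.out.println(maskLine(line));
        }
    }
}
```

Because the blobstore lines end up as a fixed placeholder that differs from the staging-directory mask, expected test output could then show whether a path pointed at the blobstore without recording the bucket name.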



[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-08 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646924#comment-15646924
 ] 

Lefty Leverenz commented on HIVE-15121:
---

I left a nit-picking comment on RB.



[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646375#comment-15646375
 ] 

Hive QA commented on HIVE-15121:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837877/HIVE-15121.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2015/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2015/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2015/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837877 - PreCommit-HIVE-Build

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.WIP.1.patch, 
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.





[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-07 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645989#comment-15645989
 ] 

Sahil Takiar commented on HIVE-15121:
-

[~spena] I added qtests and tested locally. Ready for review. The RB link is 
attached to this JIRA.

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.WIP.1.patch, 
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.





[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638397#comment-15638397
 ] 

Hive QA commented on HIVE-15121:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837320/HIVE-15121.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk] 
(batchId=89)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1981/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1981/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1981/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837320 - PreCommit-HIVE-Build

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Attachments: HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.





[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-04 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638062#comment-15638062
 ] 

Sahil Takiar commented on HIVE-15121:
-

Cleaned up the patch and tested it more thoroughly. Test failures should be 
resolved by the next Hive QA run. Ready for review.

Notes:

* The approach is to find all the places where the scratch dir is specified for 
the final MR / Tez / Spark job, and to modify {{Context.getTempDirForPath}} to 
take an optional boolean {{isFinalJob}} that specifies that the scratch 
directory is being created for the final MR job
* Added a new config called {{hive.blobstore.write.final.output.to.blobstore}} 
that toggles this behavior, in case a user wants all intermediate data to be 
stored on HDFS
* Modified a few invocations of {{getTempDirForPath}} in 
{{SemanticAnalyzer.genFileSinkPlan}}; this method creates the final 
{{FileSinkDesc}} for the job
* Tested locally against an S3 bucket; the explain output of a Hive query with 
two MR jobs shows that the first one writes to a local file, and the second 
writes to S3

Will do some more local validation, and write some unit tests + qtests.
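
To make the notes above concrete, here is a rough, self-contained sketch of how a scratch directory might be chosen from an {{isFinalJob}} flag and the proposed config. It is not the code from the patch: the class {{ScratchDirSketch}}, the scheme list, the {{.staging}} suffix, and the parameter names are all assumptions made for illustration, and Hive's real {{Context}} works on Hadoop {{Path}} objects rather than {{java.net.URI}}.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are Hive's real API.
final class ScratchDirSketch {

    // Assumed set of URI schemes that count as blobstores.
    private static final List<String> BLOBSTORE_SCHEMES =
            Arrays.asList("s3", "s3a", "s3n");

    static boolean isBlobstore(URI path) {
        String scheme = path.getScheme();
        return scheme != null && BLOBSTORE_SCHEMES.contains(scheme.toLowerCase());
    }

    /**
     * Picks a scratch directory for a job whose query ultimately writes to dest.
     *
     * @param isFinalJob                  true when the directory is for the last job of the query
     * @param writeFinalOutputToBlobstore stand-in for the proposed config flag
     * @param hdfsScratchDir              assumed HDFS scratch location for intermediate data
     */
    static URI getTempDirForPath(URI dest, boolean isFinalJob,
                                 boolean writeFinalOutputToBlobstore,
                                 URI hdfsScratchDir) {
        if (isBlobstore(dest) && !(isFinalJob && writeFinalOutputToBlobstore)) {
            // Intermediate job, or the flag is off: keep the data on HDFS.
            return hdfsScratchDir;
        }
        // Final job on a blobstore, or a non-blobstore destination:
        // stage next to the destination so the final move stays on one filesystem.
        String base = dest.toString();
        if (!base.endsWith("/")) {
            base += "/";
        }
        return URI.create(base + ".staging");
    }
}
```

With these assumptions, an intermediate job targeting an S3 table gets the HDFS scratch directory, while the final job gets a staging directory under the S3 destination, so the last move can happen within the blobstore.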

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
> Reporter: Sahil Takiar
> Attachments: HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.





[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637848#comment-15637848
 ] 

Hive QA commented on HIVE-15121:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837281/HIVE-15121.WIP.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 153 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=215)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_table] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_5] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_6] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[binary_output_format] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5] 
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative2] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] 
(batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast1] (batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_dp] 
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_type] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_colname] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_uses_database_location]
 (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_3] (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_mat_3] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_mat_4] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_mat_5] (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_rdd_cache] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explain_ddl] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_3] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_5] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_7] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_skew_1_23] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_test_1] 
(batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto] (batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_file_format] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables_compact]
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_multiple] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_partitioned] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_update] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_compression] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_serde] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_skewtable] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input11] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input12] (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input13] (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input34] (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input35] (batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input36] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input38] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input6] (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input7] (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input8] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input9] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_dynamicserde] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_part1] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_part2] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_part5] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testsequencefile] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath2] 
(batchId=33)

[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-03 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634947#comment-15634947
 ] 

Sahil Takiar commented on HIVE-15121:
-

Something seems to be wrong with ptest; I will try re-attaching the patch 
tomorrow.

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
> Reporter: Sahil Takiar
> Attachments: HIVE-15121.WIP.1.patch, HIVE-15121.WIP.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.





[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-03 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634938#comment-15634938
 ] 

Sahil Takiar commented on HIVE-15121:
-

I think the issues with the merge job should be fixed by HIVE-15114.

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
> Reporter: Sahil Takiar
> Attachments: HIVE-15121.WIP.1.patch, HIVE-15121.WIP.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.





[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634922#comment-15634922
 ] 

Hive QA commented on HIVE-15121:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837012/HIVE-15121.WIP.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1959/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1959/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1959/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-11-04 01:50:30.500
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ MAVEN_OPTS='-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-1959/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-11-04 01:50:30.502
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 97f7d7b HIVE-15109: Set MaxPermSize to 256M for maven tests with 
JDKs prior to 8 (Chaoyu Tang, reviewed by Siddharth Seth, Sergio Pena)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 97f7d7b HIVE-15109: Set MaxPermSize to 256M for maven tests with 
JDKs prior to 8 (Chaoyu Tang, reviewed by Siddharth Seth, Sergio Pena)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-11-04 01:50:31.419
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file 
common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java
patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
patching file ql/src/java/org/apache/hadoop/hive/ql/Context.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java
patching file ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.4
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.6) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStringList
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartition
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MIndex
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRole
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRoleMap
ENHANCED (Persistable) : 

[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634908#comment-15634908
 ] 

Hive QA commented on HIVE-15121:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837008/HIVE-15121.WIP.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1958/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1958/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1958/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-11-04 01:42:46.284
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ MAVEN_OPTS='-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-1958/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-11-04 01:42:46.287
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   345353c..97f7d7b  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 345353c HIVE-15039: A better job monitor console output for HoS 
(Rui reviewed by Xuefu and Ferdinand)
+ git clean -f -d
Removing metastore/scripts/upgrade/postgres/037-HIVE-11072.postgres.sql
Removing testutils/metastore/dataload.properties
Removing testutils/metastore/dbs/derby/validate.sh
Removing testutils/metastore/dbs/mysql/validate.sh
Removing testutils/metastore/dbs/postgres/validate.sh
Removing testutils/metastore/metastore-validation-test.sh
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 97f7d7b HIVE-15109: Set MaxPermSize to 256M for maven tests with 
JDKs prior to 8 (Chaoyu Tang, reviewed by Siddharth Seth, Sergio Pena)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-11-04 01:42:47.771
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file 
common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java
patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
patching file ql/src/java/org/apache/hadoop/hive/ql/Context.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java
patching file ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.4
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.6) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCED 

[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-03 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634332#comment-15634332
 ] 

Sahil Takiar commented on HIVE-15121:
-

I'm hoping it should be similar to HIVE-14270. I think there is a way to find 
the last MR job in a plan. The {{MapredWork.isFinalMapRed}} method should work 
here.
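
As an illustration of what "finding the last MR job in a plan" could mean, the sketch below treats a plan as a graph of jobs and flags every job with no downstream job as final. This is only a guess at the idea, not how {{MapredWork.isFinalMapRed}} is set in Hive; the {{Job}} class, its fields, and {{markFinalJobs}} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Job is a stand-in, not a Hive class.
final class FinalJobSketch {

    static final class Job {
        final String name;
        final List<Job> children = new ArrayList<>(); // jobs that consume this job's output
        boolean finalMapRed;                          // assumed analogue of an "is final job" flag

        Job(String name) {
            this.name = name;
        }
    }

    // Walks the plan from its root jobs and flags each job that has no children.
    static void markFinalJobs(List<Job> roots) {
        for (Job root : roots) {
            mark(root);
        }
    }

    private static void mark(Job job) {
        job.finalMapRed = job.children.isEmpty();
        for (Job child : job.children) {
            mark(child); // revisiting a shared child is harmless, the result is the same
        }
    }
}
```

For a two-job query where the first job feeds the second, only the second job ends up flagged, which is the job whose scratch directory would be placed on S3.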

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
> Reporter: Sahil Takiar
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.


