[jira] [Updated] (MAPREDUCE-6362) History Plugin should be updated

2018-03-30 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6362:
---
Target Version/s:   (was: 2.7.6)
  Status: Open  (was: Patch Available)

It has been a while, and the patch no longer applies. Canceling 
patch-available.

> History Plugin should be updated
> 
>
> Key: MAPREDUCE-6362
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6362
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Mit Desai
>Assignee: Mit Desai
>Priority: Major
> Attachments: MAPREDUCE-6362.patch
>
>
> As applications complete, the RM tracks their IDs in a completed list. This 
> list is routinely truncated to limit the total number of applications 
> remembered by the RM.
> When a user clicks the History link for a job, the browser is redirected to 
> the application's tracking link obtained from the stored application 
> instance. But when the application has been purged from the RM, an error is 
> displayed instead.
> In very busy clusters the rate at which applications complete can cause 
> applications to be purged from the RM's internal list within hours, which 
> breaks the proxy URLs users have saved for their jobs.
> We would like the RM to provide tracking links that remain valid, so that 
> users are not frustrated by broken links.
> With the current plugin in place, redirection works for MapReduce jobs, but 
> we need to add the same functionality for Tez jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6988) Let JHS support different file systems for intermediate_done and done

2017-12-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6988:
---
Fix Version/s: (was: 2.7.5)

> Let JHS support different file systems for intermediate_done and done
> -
>
> Key: MAPREDUCE-6988
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6988
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.7.4
>Reporter: Johan Gustavsson
>Priority: Minor
> Attachments: MAPREDUCE-6988.000.patch, MAPREDUCE-6988.001.patch, 
> MAPREDUCE-6988.002.patch, MAPREDUCE-6988.003.patch
>
>
> Currently the JHS uses FileContext to move files from the intermediate_done 
> folder to the done folder. Since FileContext is limited to a single 
> filesystem, this makes it harder to use S3 as storage for jhist files. By 
> moving to the FileSystem interface we can use HDFS for intermediate storage 
> and S3 for long-term storage, thereby reducing the number of PUTs to S3 and 
> removing the need for all M/R containers to carry an S3 SDK.
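A minimal pure-Java sketch of the idea behind the proposal. The paths are hypothetical, and the real change targets Hadoop's FileSystem API (which resolves a concrete implementation per URI scheme); this sketch only shows the per-path scheme resolution that FileContext's single-filesystem binding rules out.

```java
import java.net.URI;

public class JhsDirSchemes {
    // Hypothetical config values: intermediate storage on HDFS, long-term on S3.
    static final String INTERMEDIATE_DONE =
        "hdfs://namenode:8020/mr-history/intermediate-done";
    static final String DONE = "s3a://history-bucket/mr-history/done";

    // FileContext is bound to one default filesystem, while Hadoop's
    // FileSystem.get(uri, conf) picks an implementation from the URI scheme,
    // so each directory can live on a different store.
    static String schemeOf(String path) {
        return URI.create(path).getScheme();
    }

    public static void main(String[] args) {
        System.out.println("intermediate_done -> " + schemeOf(INTERMEDIATE_DONE));
        System.out.println("done              -> " + schemeOf(DONE));
    }
}
```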






[jira] [Updated] (MAPREDUCE-6950) Error Launching job : java.io.IOException: Unknown Job job_xxx_xxx

2017-12-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6950:
---
Fix Version/s: (was: 2.7.5)

> Error Launching job : java.io.IOException: Unknown Job job_xxx_xxx
> --
>
> Key: MAPREDUCE-6950
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6950
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.7.1
>Reporter: zhengchenyu
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Some jobs report an error like this:
> {code}
> hadoop.mapreduce.Job.monitorAndPrintJob(Job.java 1367) [main] :  map 100% 
> reduce 100%
> [2017-08-31T20:27:12.591+08:00] [INFO] 
> hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) 
> [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. 
> Redirecting to job history server
> [2017-08-31T20:27:12.821+08:00] [INFO] 
> hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) 
> [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. 
> Redirecting to job history server
> [2017-08-31T20:27:13.039+08:00] [INFO] 
> hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) 
> [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. 
> Redirecting to job history server
> [2017-08-31T20:27:13.256+08:00] [ERROR] 
> hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java 1034) [main] : 
> Error Launching job : java.io.IOException: Unknown Job job_xxx_xxx
> {code}
> In the AM container log (below) we can see that the error happened in the 
> write pipeline, possibly due to a DataNode failure. Other conditions can 
> also close the JobHistoryEventHandler. As a result the MR AM cannot write 
> its information to the JobHistory, so the client cannot tell whether the 
> application has finished.
> {code}
> 2017-08-31 20:27:10,813 INFO [Thread-1968] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, 
> writing event MAP_ATTEMPT_STARTED
> 2017-08-31 20:27:10,814 ERROR [Thread-1968] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing 
> History Event: 
> org.apache.hadoop.mapreduce.jobhistory.TaskAttemptStartedEvent@2055ea0a
> java.io.EOFException: Premature EOF: no length prefix available
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2292)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1317)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
> 2017-08-31 20:27:10,814 INFO [Thread-1968] 
> org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler 
> failed in state STOPPED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: 
> Premature EOF: no length prefix available
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: 
> Premature EOF: no length prefix available
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:580)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:374)
>  
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> {code}
> This problem is serious, especially for Hive, because the job must be rerun 
> needlessly. I think we need to retry the operation of writing the history 
> event.
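A hedged sketch of the retry the reporter is asking for. The interface, attempt count, and error string are illustrative, not taken from Hadoop's JobHistoryEventHandler or any patch on this jira.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class RetryingHistoryWriter {
    interface EventWrite { void run() throws IOException; }

    // Retry a transient history-event write a few times before giving up.
    // The attempt limit and the absence of backoff are illustrative.
    static int writeWithRetry(EventWrite write, int maxAttempts) {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.run();
                return attempt;        // number of attempts it took to succeed
            } catch (IOException e) {
                last = e;              // e.g. "Premature EOF" from a failed pipeline
            }
        }
        throw new UncheckedIOException(last);
    }

    public static void main(String[] args) {
        int[] failuresLeft = {2};      // simulate: fail twice, then succeed
        int attempts = writeWithRetry(() -> {
            if (failuresLeft[0]-- > 0) throw new IOException("Premature EOF");
        }, 5);
        System.out.println("event written after " + attempts + " attempts");
    }
}
```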






[jira] [Updated] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8

2017-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6165:
---
Fix Version/s: (was: 2.7.6)
   2.7.5

> [JDK8] TestCombineFileInputFormat failed on JDK8
> 
>
> Key: MAPREDUCE-6165
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Akira Ajisaka
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1, 2.7.5
>
> Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, 
> MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-004.patch, 
> MAPREDUCE-6165-reproduce.patch
>
>
> The error message:
> {noformat}
> testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
>   Time elapsed: 2.487 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: expected:<2> but was:<1>
>   at junit.framework.Assert.fail(Assert.java:57)
>   at junit.framework.Assert.failNotEquals(Assert.java:329)
>   at junit.framework.Assert.assertEquals(Assert.java:78)
>   at junit.framework.Assert.assertEquals(Assert.java:234)
>   at junit.framework.Assert.assertEquals(Assert.java:241)
>   at junit.framework.TestCase.assertEquals(TestCase.java:409)
>   at 
> org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)
> testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
>   Time elapsed: 0.985 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: expected:<2> but was:<1>
>   at junit.framework.Assert.fail(Assert.java:57)
>   at junit.framework.Assert.failNotEquals(Assert.java:329)
>   at junit.framework.Assert.assertEquals(Assert.java:78)
>   at junit.framework.Assert.assertEquals(Assert.java:234)
>   at junit.framework.Assert.assertEquals(Assert.java:241)
>   at junit.framework.TestCase.assertEquals(TestCase.java:409)
>   at 
> org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
> {noformat}






[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-12-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275389#comment-16275389
 ] 

Konstantin Shvachko commented on MAPREDUCE-5124:


Cool, thanks. I have now moved branch-2.7 to version 2.7.6. Please feel free 
to commit there.
Don't forget about the CHANGES :-)

> AM lacks flow control for task events
> -
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Peter Bacsko
> Fix For: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4
>
> Attachments: MAPREDUCE-5124-001.patch, MAPREDUCE-5124-002.patch, 
> MAPREDUCE-5124-003.patch, MAPREDUCE-5124-CoalescingPOC-1.patch, 
> MAPREDUCE-5124-CoalescingPOC2.patch, MAPREDUCE-5124-CoalescingPOC3.patch, 
> MAPREDUCE-5124-branch-2.001.patch, MAPREDUCE-5124-branch-2.002.patch, 
> MAPREDUCE-5124-branch-2.7.001.patch, MAPREDUCE-5124-branch-2.7.002.patch, 
> MAPREDUCE-5124-branch-2.7.002.patch, MAPREDUCE-5124-branch-2.8.001.patch, 
> MAPREDUCE-5124-branch-2.8.001.patch, MAPREDUCE-5124-branch-2.8.002.patch, 
> MAPREDUCE-5124-branch-2.8.002.patch, MAPREDUCE-5124-branch-2.9.001.patch, 
> MAPREDUCE-5124-branch-2.9.002.patch, MAPREDUCE-5124-proto.2.txt, 
> MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.
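As a sketch of what flow control could look like, a bounded queue gives event producers backpressure instead of unbounded heap growth. The attached POC patches actually explore coalescing events, so treat this as an illustration of the general idea, not the patch's design.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedEventQueue {
    // Tiny capacity for demonstration; a real AM queue would be much larger.
    private final BlockingQueue<String> events = new ArrayBlockingQueue<>(4);

    // offer() returns false when the queue is full, so the caller must
    // wait, drop, or coalesce instead of exhausting the heap.
    boolean offerEvent(String event) {
        return events.offer(event);
    }

    public static void main(String[] args) {
        BoundedEventQueue q = new BoundedEventQueue();
        for (int i = 0; i < 6; i++) {
            System.out.println("event " + i + " accepted: "
                + q.offerEvent("TA_UPDATE_" + i));
        }
    }
}
```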






[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-12-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275326#comment-16275326
 ] 

Konstantin Shvachko commented on MAPREDUCE-5124:


Looking at the patch, this seems like a pretty substantial change to event 
handling in the AM.
I'll revert it from 2.7.5 for now, since I did not get clarity on the 
stability of this feature. We can commit it to 2.7.6 later.

Also, you forgot to update CHANGES.txt in branch-2.7. I know, I forget it all 
the time myself.

> AM lacks flow control for task events
> -
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Peter Bacsko
> Fix For: 2.7.5, 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4
>
> Attachments: MAPREDUCE-5124-001.patch, MAPREDUCE-5124-002.patch, 
> MAPREDUCE-5124-003.patch, MAPREDUCE-5124-CoalescingPOC-1.patch, 
> MAPREDUCE-5124-CoalescingPOC2.patch, MAPREDUCE-5124-CoalescingPOC3.patch, 
> MAPREDUCE-5124-branch-2.001.patch, MAPREDUCE-5124-branch-2.002.patch, 
> MAPREDUCE-5124-branch-2.7.001.patch, MAPREDUCE-5124-branch-2.7.002.patch, 
> MAPREDUCE-5124-branch-2.7.002.patch, MAPREDUCE-5124-branch-2.8.001.patch, 
> MAPREDUCE-5124-branch-2.8.001.patch, MAPREDUCE-5124-branch-2.8.002.patch, 
> MAPREDUCE-5124-branch-2.8.002.patch, MAPREDUCE-5124-branch-2.9.001.patch, 
> MAPREDUCE-5124-branch-2.9.002.patch, MAPREDUCE-5124-proto.2.txt, 
> MAPREDUCE-5124-prototype.txt
>






[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-12-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275233#comment-16275233
 ] 

Konstantin Shvachko commented on MAPREDUCE-5124:


Hey [~pbacsko], [~jlowe], I am wondering how stable this feature is. You 
included it in branch-2.7 while I was working on the RC. Sorry, I was not 
following this jira.

> AM lacks flow control for task events
> -
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Peter Bacsko
> Fix For: 2.7.5, 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4
>
> Attachments: MAPREDUCE-5124-001.patch, MAPREDUCE-5124-002.patch, 
> MAPREDUCE-5124-003.patch, MAPREDUCE-5124-CoalescingPOC-1.patch, 
> MAPREDUCE-5124-CoalescingPOC2.patch, MAPREDUCE-5124-CoalescingPOC3.patch, 
> MAPREDUCE-5124-branch-2.001.patch, MAPREDUCE-5124-branch-2.002.patch, 
> MAPREDUCE-5124-branch-2.7.001.patch, MAPREDUCE-5124-branch-2.7.002.patch, 
> MAPREDUCE-5124-branch-2.7.002.patch, MAPREDUCE-5124-branch-2.8.001.patch, 
> MAPREDUCE-5124-branch-2.8.001.patch, MAPREDUCE-5124-branch-2.8.002.patch, 
> MAPREDUCE-5124-branch-2.8.002.patch, MAPREDUCE-5124-branch-2.9.001.patch, 
> MAPREDUCE-5124-branch-2.9.002.patch, MAPREDUCE-5124-proto.2.txt, 
> MAPREDUCE-5124-prototype.txt
>






[jira] [Updated] (MAPREDUCE-5420) Remove mapreduce.task.tmp.dir from mapred-default.xml

2017-09-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-5420:
---
Hadoop Flags: Incompatible change,Reviewed  (was: Reviewed)

We can never remove anything from {{*-default.xml}}. Isn't that obvious?

> Remove mapreduce.task.tmp.dir from mapred-default.xml
> -
>
> Key: MAPREDUCE-5420
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5420
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: James Carman
>  Labels: newbie
> Fix For: 2.7.0
>
> Attachments: MAPREDUCE-5420.patch, MAPREDUCE-5420.patch
>
>
> mapreduce.task.tmp.dir no longer has any effect, so it should no longer be 
> documented in mapred-default. (There is no YARN equivalent for the property; 
> it is now just always ./tmp.)






[jira] [Updated] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing

2017-09-08 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6750:
---
Fix Version/s: 2.7.5

> TestHSAdminServer.testRefreshSuperUserGroups is failing
> ---
>
> Key: MAPREDUCE-6750
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6750
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1, 2.7.5
>
> Attachments: MAPREDUCE-6750.patch
>
>
> HADOOP-13442 changed {{AccessControlList}} to call {{getGroups()}} instead of 
> {{getGroupNames()}}. It should work if the mocks are updated to stub the 
> right method and return the right type.
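A hand-rolled illustration of stubbing the right method with the right return type. The method names mirror UserGroupInformation's getGroupNames()/getGroups(), but this simple stub is only for illustration; the actual test uses Mockito mocks.

```java
import java.util.Arrays;
import java.util.List;

public class StubUgi {
    // Stand-in for the mocked UGI. After HADOOP-13442, AccessControlList
    // calls getGroups() (a List<String>) rather than getGroupNames()
    // (a String[]), so the stub must cover the new method and return
    // the new type.
    private final String[] groups;

    StubUgi(String... groups) { this.groups = groups; }

    String[] getGroupNames() { return groups; }                 // old call site
    List<String> getGroups() { return Arrays.asList(groups); }  // new call site

    public static void main(String[] args) {
        StubUgi ugi = new StubUgi("supergroup", "users");
        System.out.println(ugi.getGroups());  // what the ACL check now sees
    }
}
```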






[jira] [Commented] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-09-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151228#comment-16151228
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


The change is very simple, even trivial, but it is important from an API 
viewpoint, as I explained above.

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Assignee: Dennis Huo
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 2.7.5, 2.8.3
>
> Attachments: MAPREDUCE-6931-001.patch
>
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.
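The unit mismatch can be seen in a small standalone sketch (assuming MEGA is 2^20 as in TestDFSIO; the method names here are otherwise illustrative, not the benchmark's own):

```java
public class ThroughputUnits {
    static final long MEGA = 0x100000;   // 2^20, assumed as in TestDFSIO

    static float toMB(long bytes) { return ((float) bytes) / MEGA; }

    // Buggy form: MB divided by *milliseconds*, but printed as "mb/sec",
    // so the reported number is 1/1000 of the real throughput.
    static float buggyTotalThroughput(long bytes, long execTimeMs) {
        return toMB(bytes) / ((float) execTimeMs);
    }

    // Unit-consistent form: convert milliseconds to seconds first.
    static float fixedTotalThroughput(long bytes, long execTimeMs) {
        return toMB(bytes) / (execTimeMs / 1000.0f);
    }

    public static void main(String[] args) {
        long bytes = 200L * MEGA;        // 200 MB processed
        long ms = 4000;                  // in 4 seconds
        System.out.println("buggy: " + buggyTotalThroughput(bytes, ms) + " mb/sec");
        System.out.println("fixed: " + fixedTotalThroughput(bytes, ms) + " mb/sec");
    }
}
```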






[jira] [Updated] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-09-01 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6931:
---
Priority: Critical  (was: Trivial)

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Assignee: Dennis Huo
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 2.7.5, 2.8.3
>
> Attachments: MAPREDUCE-6931-001.patch
>






[jira] [Commented] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-09-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150945#comment-16150945
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


Hey [~djp], I got confused by the jira versions, as 2.8.3 was not available. 
Now it is, thanks.
But I had hoped the confusing field that this jira removes would not sneak 
into any release at all, to avoid later questions about what it means and why 
it was removed.
I would strongly recommend merging this into 2.8.2. The final decision, of 
course, is up to the release manager.

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Assignee: Dennis Huo
>Priority: Trivial
> Fix For: 2.9.0, 3.0.0-beta1, 2.7.5, 2.8.3
>
> Attachments: MAPREDUCE-6931-001.patch
>






[jira] [Updated] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-08-30 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6931:
---
   Resolution: Fixed
 Assignee: Dennis Huo
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.5
   2.8.2
   3.0.0-beta1
   2.9.0
   Status: Resolved  (was: Patch Available)

I just committed this to trunk, branch-2, branch-2.8, and branch-2.7.
Thank you [~dennishuo].

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Assignee: Dennis Huo
>Priority: Trivial
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2, 2.7.5
>
> Attachments: MAPREDUCE-6931-001.patch
>






[jira] [Commented] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-08-25 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142506#comment-16142506
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


Looks like Jenkins is applying the full pull request. [~dennishuo], could you 
please revert the last commit, which refactors the variable names?

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
> Attachments: MAPREDUCE-6931-001.patch
>






[jira] [Updated] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-08-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6931:
---
Attachment: MAPREDUCE-6931-001.patch

Attaching the patch to trigger Jenkins.

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
> Attachments: MAPREDUCE-6931-001.patch
>






[jira] [Updated] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-08-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6931:
---
Status: Patch Available  (was: Open)

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
> Attachments: MAPREDUCE-6931-001.patch
>
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-08-22 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137124#comment-16137124
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


BTW, how do we make Jenkins run on a pull request?

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-08-22 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137111#comment-16137111
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


We are trying to keep refactoring separate from functional changes.
I'll go ahead with the first commit. You can create another jira for the 
refactoring if needed.

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-08-15 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126935#comment-16126935
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


Ah yes, missed the refactoring.
+1 on the patch

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Remove TestDFSIO "Total Throughput" calculation

2017-08-14 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125600#comment-16125600
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


I see this pull request:
https://github.com/apache/hadoop/pull/259/commits/b6c6dc613b071105d8e0536ee4496bf750d44e9d
which includes
{code}
-  String resultLine = "Seq Test exec time sec: " + (float)execTime / 1000;
+  String resultLine = "Seq Test exec time sec: " + msToSecs(execTime);
{code}
Maybe you can attach a patch here.

> Remove TestDFSIO "Total Throughput" calculation
> ---
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123786#comment-16123786
 ] 

Konstantin Shvachko edited comment on MAPREDUCE-6931 at 8/11/17 7:06 PM:
-

I see only one commit in your pull request, which still fixes the ms problem. 
What did I miss?
And it's fine to edit the jira title if the original does not reflect the 
actual contents.


was (Author: shv):
I see only one commit in your pull request, which still fixes the ms problem. 
What did I miss?

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123786#comment-16123786
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


I see only one commit in your pull request, which still fixes the ms problem. 
What did I miss?

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122346#comment-16122346
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


[~dennishuo] I agree that the "Total Throughput" metric depends heavily on how 
you run the job. That is exactly the point: it makes it a MapReduce metric, not 
an HDFS one. One can go to the Yarn UI and divide HDFS bytes written by the job 
time for any job, but that does not measure the HDFS write operation.
I think we should just remove it.

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122067#comment-16122067
 ] 

Konstantin Shvachko edited comment on MAPREDUCE-6931 at 8/10/17 6:40 PM:
-

Hey [~dennishuo], thanks for reporting this.
As I mentioned in HDFS-9153 ([in this 
comment|https://issues.apache.org/jira/browse/HDFS-9153?focusedCommentId=16122049&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16122049]),
 the "Total Throughput" line should be removed as a misleading metric.
Could you please fix this by removing it?

Also, DFSIO issues should be filed on HDFS jira.


was (Author: shv):
Hey [~dennishuo], thanks for reporting this.
As I mentioned in HDFS-9153 ([in this 
comment|https://issues.apache.org/jira/browse/HDFS-9153?focusedCommentId=16122049&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16122049]
 the "Total Throughput" should be removed as a deceiving metrics.
Could you please fix this by removing the line.

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

2017-08-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122067#comment-16122067
 ] 

Konstantin Shvachko commented on MAPREDUCE-6931:


Hey [~dennishuo], thanks for reporting this.
As I mentioned in HDFS-9153 ([in this 
comment|https://issues.apache.org/jira/browse/HDFS-9153?focusedCommentId=16122049&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16122049]),
 the "Total Throughput" line should be removed as a misleading metric.
Could you please fix this by removing it?

> Fix TestDFSIO "Total Throughput" calculation
> 
>
> Key: MAPREDUCE-6931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: benchmarks, test
>Affects Versions: 2.8.0
>Reporter: Dennis Huo
>Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000x the 
> actual value:
> {code:java}
> String resultLines[] = {
> "- TestDFSIO - : " + testType,
> "Date & time: " + new Date(System.currentTimeMillis()),
> "Number of files: " + tasks,
> " Total MBytes processed: " + df.format(toMB(size)),
> "  Throughput mb/sec: " + df.format(size * 1000.0 / (time * 
> MEGA)),
> "Total Throughput mb/sec: " + df.format(toMB(size) / 
> ((float)execTime)),
> " Average IO rate mb/sec: " + df.format(med),
> "  IO rate std deviation: " + df.format(stdDev),
> " Test exec time sec: " + df.format((float)execTime / 1000),
> "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6697) Concurrent task limits should only be applied when necessary

2017-07-13 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6697:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.2
   2.7.4
   Status: Resolved  (was: Patch Available)

I just committed this to branch 2.8 and branch-2.7. Thank you [~nroberts].

> Concurrent task limits should only be applied when necessary
> 
>
> Key: MAPREDUCE-6697
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6697
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Nathan Roberts
> Fix For: 2.9.0, 2.7.4, 2.8.2, 3.0.0-alpha4
>
> Attachments: MAPREDUCE-6697-branch-2.7.001.patch, 
> MAPREDUCE-6697-branch-2.8.001.patch, MAPREDUCE-6697-v1.patch
>
>
> The concurrent task limit feature should only adjust the ANY portion of the 
> AM heartbeat ask when a limit is truly necessary, otherwise extraneous 
> containers could be allocated by the RM to the AM adding some overhead to 
> both.  Specifying a concurrent task limit that is beyond the total number of 
> tasks in the job should be the same as asking for no limit.
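The decision described above can be sketched as a small function. This is a hedged illustration only: the names `effectiveAsk`, `pendingTasks`, `runningTasks`, and `concurrentTaskLimit` are assumptions for the example, not the actual RMContainerAllocator code.

```java
public class TaskLimitSketch {
    /**
     * Sketch of the MAPREDUCE-6697 idea: only cap the ANY portion of the
     * AM's resource ask when the concurrent-task limit actually constrains
     * the job; otherwise ask for everything outstanding.
     */
    static int effectiveAsk(int pendingTasks, int runningTasks, int concurrentTaskLimit) {
        int totalTasks = pendingTasks + runningTasks;
        // A non-positive limit, or one at or beyond the total number of
        // tasks in the job, is the same as asking for no limit at all.
        if (concurrentTaskLimit <= 0 || concurrentTaskLimit >= totalTasks) {
            return pendingTasks;
        }
        // Otherwise cap the ask by the remaining headroom under the limit.
        return Math.max(0, concurrentTaskLimit - runningTasks);
    }

    public static void main(String[] args) {
        System.out.println(effectiveAsk(100, 0, 0));   // no limit set
        System.out.println(effectiveAsk(90, 10, 500)); // limit beyond total tasks
        System.out.println(effectiveAsk(90, 10, 40));  // genuinely constraining limit
    }
}
```

In the first two cases the ask is left at the full pending count, avoiding the extraneous containers the description mentions; only the third case actually reduces the ask.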



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6697) Concurrent task limits should only be applied when necessary

2017-07-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084913#comment-16084913
 ] 

Konstantin Shvachko commented on MAPREDUCE-6697:


+1. The patch looks good for both branches.

> Concurrent task limits should only be applied when necessary
> 
>
> Key: MAPREDUCE-6697
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6697
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Nathan Roberts
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: MAPREDUCE-6697-branch-2.7.001.patch, 
> MAPREDUCE-6697-branch-2.8.001.patch, MAPREDUCE-6697-v1.patch
>
>
> The concurrent task limit feature should only adjust the ANY portion of the 
> AM heartbeat ask when a limit is truly necessary, otherwise extraneous 
> containers could be allocated by the RM to the AM adding some overhead to 
> both.  Specifying a concurrent task limit that is beyond the total number of 
> tasks in the job should be the same as asking for no limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6697) Concurrent task limits should only be applied when necessary

2017-07-12 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6697:
---
Attachment: MAPREDUCE-6697-branch-2.7.001.patch

The patch applies cleanly to 2.7, and TestRMContainerAllocator passes.
Let's just reattach the same patch for a full Jenkins run on branch-2.7.

> Concurrent task limits should only be applied when necessary
> 
>
> Key: MAPREDUCE-6697
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6697
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Nathan Roberts
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: MAPREDUCE-6697-branch-2.7.001.patch, 
> MAPREDUCE-6697-branch-2.8.001.patch, MAPREDUCE-6697-v1.patch
>
>
> The concurrent task limit feature should only adjust the ANY portion of the 
> AM heartbeat ask when a limit is truly necessary, otherwise extraneous 
> containers could be allocated by the RM to the AM adding some overhead to 
> both.  Specifying a concurrent task limit that is beyond the total number of 
> tasks in the job should be the same as asking for no limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6890) Backport MAPREDUCE-6304 to branch 2.7: Specifying node labels when submitting MR jobs

2017-05-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015357#comment-16015357
 ] 

Konstantin Shvachko commented on MAPREDUCE-6890:


I added you to MAPREDUCE jira contributors. You should be able to assign now.

> Backport MAPREDUCE-6304 to branch 2.7: Specifying node labels when submitting 
> MR jobs
> -
>
> Key: MAPREDUCE-6890
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6890
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Vinitha Reddy Gankidi
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser],
>  backport MAPREDUCE-6304 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6890) Backport MAPREDUCE-6304 to branch 2.7: Specifying node labels when submitting MR jobs

2017-05-18 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved MAPREDUCE-6890.

Resolution: Duplicate

[~redvine] Looks like [~elgoiri] beat you to it.
Closing as Duplicate since the patch went into the original MAPREDUCE-6304.

> Backport MAPREDUCE-6304 to branch 2.7: Specifying node labels when submitting 
> MR jobs
> -
>
> Key: MAPREDUCE-6890
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6890
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Vinitha Reddy Gankidi
>
> As per the discussion on the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201705.mbox/browser],
>  backport MAPREDUCE-6304 to branch-2.7. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2017-05-18 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6304:
---
Labels: mapreduce  (was: mapreduce release-blocker)

> Specifying node labels when submitting MR jobs
> --
>
> Key: MAPREDUCE-6304
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jian Fang
>Assignee: Naganarasimha G R
>  Labels: mapreduce
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6304.20150410-1.patch, 
> MAPREDUCE-6304.20150411-1.patch, MAPREDUCE-6304.20150501-1.patch, 
> MAPREDUCE-6304.20150510-1.patch, MAPREDUCE-6304.20150511-1.patch, 
> MAPREDUCE-6304.20150512-1.patch, MAPREDUCE-6304-branch-2.7.patch
>
>
> Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify 
> node labels when submitting MR jobs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2017-05-18 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6304:
---
   Resolution: Fixed
Fix Version/s: 2.7.4
   Status: Resolved  (was: Patch Available)

I just committed this to branch-2.7. Thanks [~elgoiri] for the backport.

> Specifying node labels when submitting MR jobs
> --
>
> Key: MAPREDUCE-6304
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jian Fang
>Assignee: Naganarasimha G R
>  Labels: mapreduce, release-blocker
> Fix For: 2.7.4, 3.0.0-alpha1, 2.8.0
>
> Attachments: MAPREDUCE-6304.20150410-1.patch, 
> MAPREDUCE-6304.20150411-1.patch, MAPREDUCE-6304.20150501-1.patch, 
> MAPREDUCE-6304.20150510-1.patch, MAPREDUCE-6304.20150511-1.patch, 
> MAPREDUCE-6304.20150512-1.patch, MAPREDUCE-6304-branch-2.7.patch
>
>
> Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify 
> node labels when submitting MR jobs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2017-05-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015078#comment-16015078
 ] 

Konstantin Shvachko commented on MAPREDUCE-6304:


+1 for the backport. Thanks [~elgoiri] for taking it on. Will commit shortly.

> Specifying node labels when submitting MR jobs
> --
>
> Key: MAPREDUCE-6304
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jian Fang
>Assignee: Naganarasimha G R
>  Labels: mapreduce, release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6304.20150410-1.patch, 
> MAPREDUCE-6304.20150411-1.patch, MAPREDUCE-6304.20150501-1.patch, 
> MAPREDUCE-6304.20150510-1.patch, MAPREDUCE-6304.20150511-1.patch, 
> MAPREDUCE-6304.20150512-1.patch, MAPREDUCE-6304-branch-2.7.patch
>
>
> Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify 
> node labels when submitting MR jobs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6304:
---
  Labels: mapreduce release-blocker  (was: mapreduce)
Target Version/s: 2.8.0, 2.7.4  (was: 2.8.0)

> Specifying node labels when submitting MR jobs
> --
>
> Key: MAPREDUCE-6304
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jian Fang
>Assignee: Naganarasimha G R
>  Labels: mapreduce, release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6304.20150410-1.patch, 
> MAPREDUCE-6304.20150411-1.patch, MAPREDUCE-6304.20150501-1.patch, 
> MAPREDUCE-6304.20150510-1.patch, MAPREDUCE-6304.20150511-1.patch, 
> MAPREDUCE-6304.20150512-1.patch
>
>
> Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify 
> node labels when submitting MR jobs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6228) Add truncate operation to SLive

2015-02-19 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6228:
---
Status: Open  (was: Patch Available)

 Add truncate operation to SLive
 ---

 Key: MAPREDUCE-6228
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6228
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: MAPREDUCE-6228-trunk.patch, MAPREDUCE-6228.patch, 
 MAPREDUCE-6228.patch, MAPREDUCE-6228.patch


 Add truncate into the mix of operations for SLive test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6228) Add truncate operation to SLive

2015-02-19 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6228:
---
Attachment: MAPREDUCE-6228-trunk.patch

Trunk has diverged due to HADOOP-11602. Minor difference, but still attaching a 
new patch. Plamen's latest patch applies to branch-2 as is.

[~zero45] it would be good to comply with the patch naming convention in the 
future.

[~milandesai] thanks for testing on the cluster.

 Add truncate operation to SLive
 ---

 Key: MAPREDUCE-6228
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6228
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: MAPREDUCE-6228-trunk.patch, MAPREDUCE-6228.patch, 
 MAPREDUCE-6228.patch, MAPREDUCE-6228.patch


 Add truncate into the mix of operations for SLive test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAPREDUCE-6228) Add truncate operation to SLive

2015-02-19 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved MAPREDUCE-6228.

   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed

I just committed this. Thank you Plamen.

 Add truncate operation to SLive
 ---

 Key: MAPREDUCE-6228
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6228
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6228-trunk.patch, MAPREDUCE-6228.patch, 
 MAPREDUCE-6228.patch, MAPREDUCE-6228.patch


 Add truncate into the mix of operations for SLive test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6228) Add truncate operation to SLive

2015-02-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317838#comment-14317838
 ] 

Konstantin Shvachko commented on MAPREDUCE-6228:


Few minor things:
# Long line defining {{ConfigOption<Boolean> WAIT_ON_TRUNCATE}}
# Let's increment {{PROG_VERSION = 0.0.2;}} to {{0.1.0}}. I missed it last 
time.
# {{0M}} and {{Constants.MEGABYTES * 0}} is just {{0}}
# The unit test is passing for me.
# Could you please confirm that it has been tried on a cluster with truncate on.

 Add truncate operation to SLive
 ---

 Key: MAPREDUCE-6228
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6228
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: MAPREDUCE-6228.patch, MAPREDUCE-6228.patch


 Add truncate into the mix of operations for SLive test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6228) Add truncate operation to SLive

2015-02-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319599#comment-14319599
 ] 

Konstantin Shvachko commented on MAPREDUCE-6228:


TestJobConf is failing due to MAPREDUCE-6223.
+1 - pending cluster test confirmation.

 Add truncate operation to SLive
 ---

 Key: MAPREDUCE-6228
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6228
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: MAPREDUCE-6228.patch, MAPREDUCE-6228.patch, 
 MAPREDUCE-6228.patch


 Add truncate into the mix of operations for SLive test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6228) Add truncate operation to SLive

2015-02-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317328#comment-14317328
 ] 

Konstantin Shvachko commented on MAPREDUCE-6228:


* Error extracting & merging append size range should be ... truncate ...
* Running org.apache.hadoop.fs.slive.TestSlive
Tests run: 18, Failures: 3,
* {{TruncateOp}} has a bunch of unused imports.
* The default truncate size range should probably be something like {{\[0, 
1M\]}}. If the truncate range is the same as the append's, then actual 
truncation may occur too rarely.
* Should we make {{waitForRecovery}} configurable? It would make sense to run a 
bunch of truncates with recoveries in progress and see the behaviour when they 
are mixed with appends.
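
The size-range point above can be sketched in code. This is only an illustration of picking a truncate length from a configurable byte range, not the actual SLive implementation; the class and method names (`TruncateRange`, `pickTruncateSize`) and the `[0, 1M)` default are hypothetical:

```java
import java.util.Random;

// Illustrative sketch (not SLive code): sample a truncate length from a
// configurable [lower, upper) byte range, so truncation sizes need not
// track the append size range.
class TruncateRange {
    static final long MEGABYTE = 1024L * 1024L;

    // Returns a pseudo-random length in [lower, upper); degenerate ranges
    // collapse to 'lower'.
    static long pickTruncateSize(Random rnd, long lower, long upper) {
        if (upper <= lower) {
            return lower;
        }
        // Math.floorMod keeps the offset non-negative for any nextLong() value.
        return lower + Math.floorMod(rnd.nextLong(), upper - lower);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);  // fixed seed keeps the run reproducible
        System.out.println("truncate to "
            + pickTruncateSize(rnd, 0, MEGABYTE) + " bytes");
    }
}
```

With a seeded `Random`, repeated runs sample the same sequence of sizes, which matters for reproducing a benchmark run.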

 Add truncate operation to SLive
 ---

 Key: MAPREDUCE-6228
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6228
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Attachments: MAPREDUCE-6228.patch


 Add truncate into the mix of operations for SLive test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6227) DFSIO for truncate

2015-02-06 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6227:
---
Attachment: DFSIO-truncate-02.patch

Fixed all 4.
BTW, TestJobConf is failing due to MAPREDUCE-6223.

 DFSIO for truncate
 --

 Key: MAPREDUCE-6227
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6227
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: DFSIO-truncate-00.patch, DFSIO-truncate-01.patch, 
 DFSIO-truncate-02.patch


 Create a benchmark and a test for truncate within the framework of TestDFSIO.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6227) DFSIO for truncate

2015-02-05 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6227:
---
Attachment: DFSIO-truncate-01.patch

Moved TestDFSIO_results.log under {{target/test-dir}} for tests.

 DFSIO for truncate
 --

 Key: MAPREDUCE-6227
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6227
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: DFSIO-truncate-00.patch, DFSIO-truncate-01.patch


 Create a benchmark and a test for truncate within the framework of TestDFSIO.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6227) DFSIO for truncate

2015-02-05 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6227:
---
Attachment: DFSIO-truncate-00.patch

Adding truncate to DFSIO.

 DFSIO for truncate
 --

 Key: MAPREDUCE-6227
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6227
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Reporter: Konstantin Shvachko
 Attachments: DFSIO-truncate-00.patch


 Create a benchmark and a test for truncate within the framework of TestDFSIO.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6227) DFSIO for truncate

2015-02-05 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-6227:
---
 Assignee: Konstantin Shvachko
Affects Version/s: 2.7.0
   Status: Patch Available  (was: Open)

 DFSIO for truncate
 --

 Key: MAPREDUCE-6227
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6227
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: DFSIO-truncate-00.patch


 Create a benchmark and a test for truncate within the framework of TestDFSIO.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6227) DFSIO for truncate

2015-01-26 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created MAPREDUCE-6227:
--

 Summary: DFSIO for truncate
 Key: MAPREDUCE-6227
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6227
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Reporter: Konstantin Shvachko


Create a benchmark and a test for truncate within the framework of TestDFSIO.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6228) Add truncate operation to SLive

2015-01-26 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created MAPREDUCE-6228:
--

 Summary: Add truncate operation to SLive
 Key: MAPREDUCE-6228
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6228
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Reporter: Konstantin Shvachko


Add truncate into the mix of operations for SLive test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAPREDUCE-3469) Port to 0.22 - Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2014-02-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved MAPREDUCE-3469.


Resolution: Duplicate

 Port to 0.22 - Implement limits on per-job JobConf, Counters, StatusReport, 
 Split-Sizes
 ---

 Key: MAPREDUCE-3469
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3469
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Konstantin Shvachko
  Labels: critical-0.22.0

 We have come across issues in production clusters wherein users abuse 
 counters, status report messages and split sizes. One such case was when one 
 of the users had 100 million counters. This leads to the jobtracker going out 
 of memory and being unresponsive. In this jira I am proposing to put sane 
 limits on the status report length, the number of counters and the size of 
 block locations returned by the input split. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-4981) WordMean, WordMedian, WordStandardDeviation missing from ExamplesDriver

2013-04-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629326#comment-13629326
 ] 

Konstantin Shvachko commented on MAPREDUCE-4981:


+1. Should have committed this earlier.

 WordMean, WordMedian, WordStandardDeviation missing from ExamplesDriver
 ---

 Key: MAPREDUCE-4981
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4981
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Minor
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: MAPREDUCE-4981.patch


 https://issues.apache.org/jira/browse/MAPREDUCE-2669 introduced 3 new 
 MapReduce examples, but they were never added to the ExamplesDriver.
 This JIRA proposes to add them to the ExamplesDriver. I have run them myself 
 and can confirm the examples still work as intended.
 As a workaround for now, people can still run them by: 
 bin/hadoop org.apache.hadoop.examples.WordMean <input file/dir path> <output 
 dir path>
 bin/hadoop org.apache.hadoop.examples.WordMedian <input file/dir path> 
 <output dir path>
 bin/hadoop org.apache.hadoop.examples.WordStandardDeviation <input file/dir 
 path> <output dir path>
 Post-patch, people will be able to run them by:
 bin/hadoop jar /HADOOP_PATH/share/lib/mapreduce-examples.jar 
 wordmean|wordmedian|wordstandarddeviation <input file/dir path> <output dir 
 path>
 Just like they do for running the wordcount example.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4981) WordMean, WordMedian, WordStandardDeviation missing from ExamplesDriver

2013-04-11 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4981:
---

  Resolution: Fixed
Target Version/s: 2.0.3-alpha, 3.0.0  (was: 3.0.0, 2.0.3-alpha)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch 2. Thank you Plamen.

 WordMean, WordMedian, WordStandardDeviation missing from ExamplesDriver
 ---

 Key: MAPREDUCE-4981
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4981
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Minor
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: MAPREDUCE-4981.patch


 https://issues.apache.org/jira/browse/MAPREDUCE-2669 introduced 3 new 
 MapReduce examples, but they were never added to the ExamplesDriver.
 This JIRA proposes to add them to the ExamplesDriver. I have run them myself 
 and can confirm the examples still work as intended.
 As a workaround for now, people can still run them by: 
 bin/hadoop org.apache.hadoop.examples.WordMean <input file/dir path> <output 
 dir path>
 bin/hadoop org.apache.hadoop.examples.WordMedian <input file/dir path> 
 <output dir path>
 bin/hadoop org.apache.hadoop.examples.WordStandardDeviation <input file/dir 
 path> <output dir path>
 Post-patch, people will be able to run them by:
 bin/hadoop jar /HADOOP_PATH/share/lib/mapreduce-examples.jar 
 wordmean|wordmedian|wordstandarddeviation <input file/dir path> <output dir 
 path>
 Just like they do for running the wordcount example.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4985) TestDFSIO supports compression but usages doesn't reflect

2013-04-11 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4985:
---

   Resolution: Fixed
Fix Version/s: 2.0.5-beta
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch 2. Thank you Plamen.

 TestDFSIO supports compression but usages doesn't reflect
 -

 Key: MAPREDUCE-4985
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4985
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Trivial
 Fix For: 2.0.5-beta

 Attachments: MAPREDUCE-4985.patch


 https://issues.apache.org/jira/browse/MAPREDUCE-2786 introduced the ability 
 to use a compression codec during TestDFSIO. However, the -compression 
 parameter was never introduced to the usage printout.
 This is a trivial patch to reveal the parameter to end users.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4736) Remove obsolete option [-rootDir] from TestDFSIO

2012-10-19 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479663#comment-13479663
 ] 

Konstantin Shvachko commented on MAPREDUCE-4736:


+1
The root directory can be specified with -Dtest.build.data=/my/root/dir

 Remove obsolete option [-rootDir] from TestDFSIO
 

 Key: MAPREDUCE-4736
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4736
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: test
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Brandon Li
Priority: Trivial
 Attachments: MAPREDUCE-4736.patch


 Looks like this option is obsolete. Remove it to avoid confusion. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465467#comment-13465467
 ] 

Konstantin Shvachko commented on MAPREDUCE-4651:


That is because I committed it. Jakob reviewed.

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Fix For: 0.23.4

 Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch


 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4651:
---

Attachment: randomDFSIO.patch

Fixed wonky space in AppendMapper.getIOStream()

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch


 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4651:
---

Status: Patch Available  (was: Open)

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch


 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-25 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463154#comment-13463154
 ] 

Konstantin Shvachko commented on MAPREDUCE-4651:


Ravi, 
- even with a single node you can specify more -nrFiles if you have multiple 
drives on the node. I usually set the number of map slots equal to the number 
of drives on a node.
- I don't know how big the file was that you created with -write prior to 
reads. If it was 10 MB, then the actual size of reads was no more than that. 
Check the DFSIO summary; it prints how much data was read.
- You probably ran reads right after creating the file, so the data was in the 
buffer cache. I usually clean the cache before each test run. (On Linux: 
'echo 1 > /proc/sys/vm/drop_caches')
- Also -fileSize is replaced by -size in my patch. It says how much data you 
want to read/write/append, rather than specifying the size of a file. Initially 
(read/write) it was the same.

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch


 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2786) TestDFSIO should also test compression reading/writing from command-line.

2012-09-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-2786:
---

Fix Version/s: 0.23.4

Committed this to branch-0.23 to avoid discrepancies with MAPREDUCE-4651.

 TestDFSIO should also test compression reading/writing from command-line.
 -

 Key: MAPREDUCE-2786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2786
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: benchmarks
Affects Versions: 2.0.0-alpha
Reporter: Plamen Jeliazkov
Assignee: Plamen Jeliazkov
Priority: Minor
  Labels: newbie
 Fix For: 2.0.2-alpha, 0.23.4

 Attachments: MAPREDUCE_2786.patch, MAPREDUCE_2786.patch, 
 MAPREDUCE_2786.patch, MAPREDUCE-2786.patch

   Original Estimate: 36h
  Remaining Estimate: 36h

 I thought it might be beneficial to simply alter the code of TestDFSIO to 
 accept any compression codec class and allow testing for compression by a 
 command-line argument instead of having to change the config file every time. 
 Something like -compression would do.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4651:
---

   Resolution: Fixed
Fix Version/s: 0.23.4
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I just committed this to trunk, branch-2, and branch 0.23.4

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Fix For: 0.23.4

 Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch


 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4645) Providing a random seed to Slive should make the sequence of filenames completely deterministic

2012-09-24 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13462103#comment-13462103
 ] 

Konstantin Shvachko commented on MAPREDUCE-4645:


+1 Looks good.

 Providing a random seed to Slive should make the sequence of filenames 
 completely deterministic
 ---

 Key: MAPREDUCE-4645
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4645
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, test
Affects Versions: 0.23.1, 2.0.0-alpha
Reporter: Ravi Prakash
Assignee: Ravi Prakash
  Labels: performance, test
 Attachments: MAPREDUCE-4645.branch-0.23.patch, 
 MAPREDUCE-4645.branch-0.23.patch, MAPREDUCE-4645.branch-0.23.patch


 Using the -random seed option still doesn't produce a deterministic sequence 
 of filenames. Hence there's no way to replicate the performance test. If I'm 
 providing a seed, it's obvious that I want the test to be reproducible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4645) Providing a random seed to Slive should make the sequence of filenames completely deterministic

2012-09-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4645:
---

   Resolution: Fixed
Fix Version/s: 0.23.4
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I just committed it to trunk, branch 2, and branch 0.23.
Thank you Ravi.

 Providing a random seed to Slive should make the sequence of filenames 
 completely deterministic
 ---

 Key: MAPREDUCE-4645
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4645
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, test
Affects Versions: 0.23.1, 2.0.0-alpha
Reporter: Ravi Prakash
Assignee: Ravi Prakash
  Labels: performance, test
 Fix For: 0.23.4

 Attachments: MAPREDUCE-4645.branch-0.23.patch, 
 MAPREDUCE-4645.branch-0.23.patch, MAPREDUCE-4645.branch-0.23.patch


 Using the -random seed option still doesn't produce a deterministic sequence 
 of filenames. Hence there's no way to replicate the performance test. If I'm 
 providing a seed, it's obvious that I want the test to be reproducible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455133#comment-13455133
 ] 

Konstantin Shvachko commented on MAPREDUCE-4651:


(1) I did want to make IOMapperBase.getIOStream() abstract, but there are other 
tests that are based on IOMapperBase. A dummy implementation of getIOStream() 
will avoid changing them.
(2) Do you want me to add spaces, or do you see unnecessary spaces in my patch?
(3) So you suggest adding comments to @Override specifying the base class, 
right? Agreed, should have done that for the new method; will also do it for 
doIO().
(4) Sure, you can check the instance before casting, but what would you do in 
the else clause? Throw an exception? So one way or another an exception will be 
thrown saying the type of stream is not right. And this is sort of an assert, 
because it means there is a bug in DFSIO, not the user's fault.
(5) Yep, you are right, nanoTime() is already in the default constructor. I 
believe it wasn't there before.

Thanks for the review. I'll check SLive jira.
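
Point (1) above is a common trade-off, and can be sketched as follows. This is an illustration only, with simplified hypothetical names (`MapperBase`, `RandomReadMapper`; the real classes live in org.apache.hadoop.fs); the stream type is stubbed with a plain Object:

```java
// Sketch of the design choice in point (1): the base class supplies a
// dummy default for getIOStream() instead of declaring it abstract, so
// existing subclasses compile unchanged while new mappers override it.
abstract class MapperBase {
    // Dummy default: mappers that never open a stream inherit this as-is.
    Object getIOStream(String name) {
        return null;
    }

    abstract long doIO(String name, long size);
}

class RandomReadMapper extends MapperBase {
    @Override // MapperBase.getIOStream
    Object getIOStream(String name) {
        return "stream:" + name;  // placeholder for opening a real input stream
    }

    @Override // MapperBase.doIO
    long doIO(String name, long size) {
        return size;              // pretend we read 'size' bytes
    }
}

class DummyDefaultDemo {
    public static void main(String[] args) {
        MapperBase m = new RandomReadMapper();
        System.out.println(m.getIOStream("part-0") + ", " + m.doIO("part-0", 64));
    }
}
```

The cost of the dummy default is that the compiler no longer forces new subclasses to implement the method, which is exactly the trade against breaking existing tests.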

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: randomDFSIO.patch


 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-13 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4651:
---

Attachment: randomDFSIO.patch

Removed nanoTime(), added comments to @Override.

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: randomDFSIO.patch, randomDFSIO.patch


 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4645) Providing a random seed to Slive should make the sequence of filenames completely deterministic

2012-09-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455580#comment-13455580
 ] 

Konstantin Shvachko commented on MAPREDUCE-4645:


Ravi, can you just use the taskID as the seed instead of passing the sequence 
number through DummyInputFormat? That way you will have different seeds per map 
but still complete reproducibility, because the number of maps is the same.
Otherwise DummyInputFormat becomes not dummy and EmptySplit not empty 
anymore.
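A minimal sketch of the taskID-as-seed idea (plain illustrative Java; SeedPerMap and rngForTask are made-up names, not the actual Slive code):

```java
import java.util.Random;

public class SeedPerMap {
    // Derive each map's seed from the user-supplied base seed and the map's
    // task index: seeds differ per map, yet the whole run is reproducible
    // because the number of maps is fixed.
    static Random rngForTask(long baseSeed, int taskIndex) {
        return new Random(baseSeed + taskIndex);
    }
}
```

With this, no extra state travels through the InputFormat, so DummyInputFormat stays dummy and EmptySplit stays empty.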

 Providing a random seed to Slive should make the sequence of filenames 
 completely deterministic
 ---

 Key: MAPREDUCE-4645
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4645
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, test
Affects Versions: 0.23.1, 2.0.0-alpha
Reporter: Ravi Prakash
Assignee: Ravi Prakash
  Labels: performance, test
 Attachments: MAPREDUCE-4645.branch-0.23.patch


 Using the -random seed option still doesn't produce a deterministic sequence 
 of filenames. Hence there's no way to replicate the performance test. If I'm 
 providing a seed, it's obvious that I want the test to be reproducible.



[jira] [Created] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-12 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created MAPREDUCE-4651:
--

 Summary: Benchmarking random reads with DFSIO
 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko


TestDFSIO measures throughput of HDFS write, read, and append operations. It 
will be useful to have an option to use it for benchmarking random reads.



[jira] [Assigned] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-12 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reassigned MAPREDUCE-4651:
--

Assignee: Konstantin Shvachko

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko

 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.



[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453821#comment-13453821
 ] 

Konstantin Shvachko commented on MAPREDUCE-4651:


The idea is to utilize HDFS positional read, which is defined by 
{{PositionedReadable}} and allows reading a segment of data from a given 
position.
I propose three variants of such benchmarks:
# *Random read*. Randomly choose an offset in the range [0, fileSize] and read 
one buffer of data from that random position. Repeat until the specified 
number of bytes is read. Random read can occasionally read the same bytes 
twice.
# *Backward read* reads the file in reverse order. This is intended to read 
all bytes of the given file while avoiding reading any of them twice.
# *Skip read*. Starting from the beginning, read one buffer of data, then jump 
ahead, and read again. Repeat until either the specified number of bytes is 
read or the end of the file is reached. Skip read avoids read-ahead: with 
sequential reads data mostly comes from the system block cache, while jumping 
far enough ahead ensures that bytes are actually read from the storage device.
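The three variants boil down to three offset-selection rules, sketched below in plain Java (illustrative names, not the actual TestDFSIO patch code):

```java
import java.util.Random;

public class ReadOffsets {
    // Random read: pick an offset anywhere in [0, fileSize - bufSize);
    // the same seed reproduces the same sequence of offsets.
    static long randomOffset(Random rnd, long fileSize, int bufSize) {
        return Math.floorMod(rnd.nextLong(), fileSize - bufSize);
    }

    // Backward read: step back one buffer at a time, covering every byte once.
    static long backwardOffset(long current, int bufSize) {
        return current - bufSize;
    }

    // Skip read: after each buffer, jump skipSize bytes ahead to defeat
    // OS read-ahead and the block cache.
    static long skipOffset(long current, int bufSize, long skipSize) {
        return current + bufSize + skipSize;
    }
}
```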

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko

 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.



[jira] [Updated] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-12 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4651:
---

Attachment: randomDFSIO.patch

The patch:
# Introduces the three types of random reads.
# Adds a getIOStream() method, which excludes stream construction from the 
timed part of the execution. This is important for small writes and reads. It 
also let me move all compression functionality from IOMapperBase to TestDFSIO, 
where it truly belongs.
# Converts the tests to JUnit 4 format, finally, and adds test cases for the 
new benchmarks.
# Fixes a couple of warnings and removes unnecessary generic parameters.
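The timing point can be sketched as follows (illustrative plain Java; timedWrite is a made-up name, not the actual getIOStream() code). The stream is constructed before the clock starts, so only the I/O itself is measured:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class TimedIO {
    // The caller opens the stream first, then times only the writes, so
    // setup cost does not dominate measurements of small I/O operations.
    static long timedWrite(OutputStream out, byte[] buffer, int reps) throws IOException {
        long start = System.currentTimeMillis();
        for (int i = 0; i < reps; i++) {
            out.write(buffer);
        }
        return System.currentTimeMillis() - start;
    }
}
```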

 Benchmarking random reads with DFSIO
 

 Key: MAPREDUCE-4651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks, test
Affects Versions: 1.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: randomDFSIO.patch


 TestDFSIO measures throughput of HDFS write, read, and append operations. It 
 will be useful to have an option to use it for benchmarking random reads.



[jira] [Commented] (MAPREDUCE-2786) TestDFSIO should also test compression reading/writing from command-line.

2012-09-03 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447374#comment-13447374
 ] 

Konstantin Shvachko commented on MAPREDUCE-2786:


+1 from me too

 TestDFSIO should also test compression reading/writing from command-line.
 -

 Key: MAPREDUCE-2786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2786
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: benchmarks
Affects Versions: 2.0.0-alpha
Reporter: Plamen Jeliazkov
Priority: Minor
  Labels: newbie
 Fix For: 2.1.0-alpha

 Attachments: MAPREDUCE_2786.patch, MAPREDUCE_2786.patch, 
 MAPREDUCE_2786.patch, MAPREDUCE-2786.patch

   Original Estimate: 36h
  Remaining Estimate: 36h

 I thought it might be beneficial to simply alter the code of TestDFSIO to 
 accept any compression codec class and allow testing for compression by a 
 command line argument instead of having to change the config file every time. 
 Something like -compression would do.



[jira] [Updated] (MAPREDUCE-2786) TestDFSIO should also test compression reading/writing from command-line.

2012-09-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-2786:
---

   Resolution: Fixed
Fix Version/s: (was: 2.1.0-alpha)
   2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I just committed this to branch-2 and trunk.
Thank you, Plamen.

 TestDFSIO should also test compression reading/writing from command-line.
 -

 Key: MAPREDUCE-2786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2786
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: benchmarks
Affects Versions: 2.0.0-alpha
Reporter: Plamen Jeliazkov
Priority: Minor
  Labels: newbie
 Fix For: 2.2.0-alpha

 Attachments: MAPREDUCE_2786.patch, MAPREDUCE_2786.patch, 
 MAPREDUCE_2786.patch, MAPREDUCE-2786.patch

   Original Estimate: 36h
  Remaining Estimate: 36h

 I thought it might be beneficial to simply alter the code of TestDFSIO to 
 accept any compression codec class and allow testing for compression by a 
 command line argument instead of having to change the config file every time. 
 Something like -compression would do.



[jira] [Commented] (MAPREDUCE-2786) TestDFSIO should also test compression reading/writing from command-line.

2012-08-30 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445427#comment-13445427
 ] 

Konstantin Shvachko commented on MAPREDUCE-2786:


A few review comments. (Unlike Jenkins, I could apply your patch.)
- createControlFile() uses the new compressionClass parameter only to log it. 
You should log compressionClass in run() along with the other input parameters, 
like bufferSize or baseDir. That should be enough.
- Don't print it in analyzeResult() either. It is an input parameter, not an 
output one. Even if it were necessary, it should be taken from conf.
- There are whitespace changes inside the loop in doIO() for WriteMapper and 
ReadMapper. Please avoid those.

Otherwise it's good to go.

 TestDFSIO should also test compression reading/writing from command-line.
 -

 Key: MAPREDUCE-2786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2786
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: benchmarks
Affects Versions: 2.0.0-alpha
Reporter: Plamen Jeliazkov
Priority: Minor
  Labels: newbie
 Fix For: 2.1.0-alpha

 Attachments: MAPREDUCE_2786.patch, MAPREDUCE-2786.patch

   Original Estimate: 36h
  Remaining Estimate: 36h

 I thought it might be beneficial to simply alter the code of TestDFSIO to 
 accept any compression codec class and allow testing for compression by a 
 command line argument instead of having to change the config file every time. 
 Something like -compression would do.



[jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection

2012-08-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442982#comment-13442982
 ] 

Konstantin Shvachko commented on MAPREDUCE-4491:


Benoy, I went over your design document. Pretty comprehensive description. 
I want to clarify a couple of things.
# Do I understand correctly that your approach can be used to securely store 
(encrypt) data even on non-secure (security=simple) clusters?
# So JobClient uses the current user's credentials to obtain keys from the 
KeyStore, encrypts them with the cluster-public-key, and sends them to the 
cluster along with the user credentials. JobTracker has nothing to do with the 
keys and passes the encrypted blob over to the TaskTrackers scheduled to 
execute the tasks. The TT decrypts the user keys using the cluster-private-key 
and hands them to the local tasks, which is secure as keys don't travel over 
the wires. Is that right so far?
# Should the TT be using user credentials to decrypt the blob of keys somehow? 
Or does it authenticate the user and then decrypt if authentication passes? I 
did not find this in your document.
# How is the cluster-private-key delivered to TTs?
# I think the configuration parameter naming needs some changes. The parameters 
should not start with {{mapreduce.job}}. Based on your examples you can encrypt 
an HDFS file without spawning any actual jobs, in which case seeing 
{{mapreduce.job.*}} seems confusing.
My suggestion is to prefix all parameters with simply {{crypto.*}}. Then you 
can use, e.g., the full word keystore instead of ks.

I plan to get into reviewing the implementation soon.

 Encryption and Key Protection
 -

 Key: MAPREDUCE-4491
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: documentation, security, task-controller, tasktracker
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf


 When dealing with sensitive data, it is required to keep the data encrypted 
 wherever it is stored. Common use case is to pull encrypted data out of a 
 datasource and store in HDFS for analysis. The keys are stored in an external 
 keystore. 
 The feature adds a customizable framework to integrate different types of 
 keystores, support for Java KeyStore, read keys from keystores, and transport 
 keys from JobClient to Tasks.
 The feature adds PGP encryption as a codec and additional utilities to 
 perform encryption related steps.
 The design document is attached. It explains the requirement, design and use 
 cases.
 Kindly review and comment. Collaboration is very much welcome.
 I have a tested patch for this for 1.1 and will upload it soon as an initial 
 work for further refinement.
 Update: The patches are uploaded to subtasks. 



[jira] [Comment Edited] (MAPREDUCE-4491) Encryption and Key Protection

2012-08-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442982#comment-13442982
 ] 

Konstantin Shvachko edited comment on MAPREDUCE-4491 at 8/28/12 5:25 PM:
-

Benoy, I went over your design document. Pretty comprehensive description. 
I want to clarify a couple of things.
# Do I understand correctly that your approach can be used to securely store 
(encrypt) data even on non-secure (security=simple) clusters?
# So JobClient uses the current user's credentials to obtain keys from the 
KeyStore, encrypts them with the cluster-public-key, and sends them to the 
cluster along with the user credentials. JobTracker has nothing to do with the 
keys and passes the encrypted blob over to the TaskTrackers scheduled to 
execute the tasks. The TT decrypts the user keys using the cluster-private-key 
and hands them to the local tasks, which is secure as keys don't travel over 
the wires. Is that right so far?
# Should the TT be using user credentials to decrypt the blob of keys somehow? 
Or does it authenticate the user and then decrypt if authentication passes? I 
did not find this in your document.
# How is the cluster-private-key delivered to TTs?
# I think the configuration parameter naming needs some changes. The parameters 
should not start with {{mapreduce.job}}. Based on your examples you can encrypt 
an HDFS file without spawning any actual jobs, in which case seeing 
{{mapreduce.job.*}} seems confusing.
My suggestion is to prefix all parameters with simply {{hadoop.crypto.*}}. Then 
you can use, e.g., the full word keystore instead of ks.

I plan to get into reviewing the implementation soon.

  was (Author: shv):
Benoy, I went over your design document. Pretty comprehensive description. 
I want to clarify a couple of things.
# Do I understand correctly that your approach can be used to securely store 
(encrypt) data even on non-secure (security=simple) clusters?
# So JobClient uses the current user's credentials to obtain keys from the 
KeyStore, encrypts them with the cluster-public-key, and sends them to the 
cluster along with the user credentials. JobTracker has nothing to do with the 
keys and passes the encrypted blob over to the TaskTrackers scheduled to 
execute the tasks. The TT decrypts the user keys using the cluster-private-key 
and hands them to the local tasks, which is secure as keys don't travel over 
the wires. Is that right so far?
# Should the TT be using user credentials to decrypt the blob of keys somehow? 
Or does it authenticate the user and then decrypt if authentication passes? I 
did not find this in your document.
# How is the cluster-private-key delivered to TTs?
# I think the configuration parameter naming needs some changes. The parameters 
should not start with {{mapreduce.job}}. Based on your examples you can encrypt 
an HDFS file without spawning any actual jobs, in which case seeing 
{{mapreduce.job.*}} seems confusing.
My suggestion is to prefix all parameters with simply {{crypto.*}}. Then you 
can use, e.g., the full word keystore instead of ks.

I plan to get into reviewing the implementation soon.
  
 Encryption and Key Protection
 -

 Key: MAPREDUCE-4491
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: documentation, security, task-controller, tasktracker
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf


 When dealing with sensitive data, it is required to keep the data encrypted 
 wherever it is stored. Common use case is to pull encrypted data out of a 
 datasource and store in HDFS for analysis. The keys are stored in an external 
 keystore. 
 The feature adds a customizable framework to integrate different types of 
 keystores, support for Java KeyStore, read keys from keystores, and transport 
 keys from JobClient to Tasks.
 The feature adds PGP encryption as a codec and additional utilities to 
 perform encryption related steps.
 The design document is attached. It explains the requirement, design and use 
 cases.
 Kindly review and comment. Collaboration is very much welcome.
 I have a tested patch for this for 1.1 and will upload it soon as an initial 
 work for further refinement.
 Update: The patches are uploaded to subtasks. 



[jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection

2012-08-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442987#comment-13442987
 ] 

Konstantin Shvachko commented on MAPREDUCE-4491:


Edited the previous comment. Was: {{crypto.*}}. Changed to: {{hadoop.crypto.*}}, 
similar to {{hadoop.security}}.

 Encryption and Key Protection
 -

 Key: MAPREDUCE-4491
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: documentation, security, task-controller, tasktracker
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf


 When dealing with sensitive data, it is required to keep the data encrypted 
 wherever it is stored. Common use case is to pull encrypted data out of a 
 datasource and store in HDFS for analysis. The keys are stored in an external 
 keystore. 
 The feature adds a customizable framework to integrate different types of 
 keystores, support for Java KeyStore, read keys from keystores, and transport 
 keys from JobClient to Tasks.
 The feature adds PGP encryption as a codec and additional utilities to 
 perform encryption related steps.
 The design document is attached. It explains the requirement, design and use 
 cases.
 Kindly review and comment. Collaboration is very much welcome.
 I have a tested patch for this for 1.1 and will upload it soon as an initial 
 work for further refinement.
 Update: The patches are uploaded to subtasks. 



[jira] [Commented] (MAPREDUCE-4550) Key Protection : Define Encryption and Key Protection interfaces and default implementations

2012-08-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442988#comment-13442988
 ] 

Konstantin Shvachko commented on MAPREDUCE-4550:


Does it make sense to combine {{Encryptor}} and {{Decryptor}} into a single 
interface with two methods, {{encrypt()}} and {{decrypt()}}? These algorithms 
usually come in pairs.
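A sketch of such a combined interface (the names below are suggestions, not the actual MAPREDUCE-4550 API; the XOR implementation is for illustration only and is not secure):

```java
// A single codec pairs the two operations that always travel together.
interface CryptoCodec {
    byte[] encrypt(byte[] plaintext);
    byte[] decrypt(byte[] ciphertext);
}

// Trivial XOR implementation, for illustration only -- not real cryptography.
class XorCodec implements CryptoCodec {
    private final byte key;

    XorCodec(byte key) { this.key = key; }

    public byte[] encrypt(byte[] plaintext) {
        byte[] out = new byte[plaintext.length];
        for (int i = 0; i < plaintext.length; i++) {
            out[i] = (byte) (plaintext[i] ^ key);
        }
        return out;
    }

    public byte[] decrypt(byte[] ciphertext) {
        return encrypt(ciphertext); // XOR is its own inverse
    }
}
```

One interface makes it harder to register an encrypter without its matching decrypter.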

 Key Protection : Define Encryption and Key Protection interfaces and default 
 implementations
 

 Key: MAPREDUCE-4550
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4550
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: MR_4550_1_1.patch, MR_4550_trunk.patch


 A secret key is read from a Key Store and then encrypted during transport 
 between JobClient and Task. The tasktrackers/nodemanagers decrypt the secrets 
 and provide the secrets to the child tasks which are part of the job.
 This jira defines the interfaces to accomplish the above :
 1) KeyProvider - to read keys from a KeyStore
 2) Encrypter and Decrypter - to encrypt and decrypt secrets/data.
 The default/dummy implementations will also be added. This includes a 
 KeyProvider implementation to read keys from a Java KeyStore.



[jira] [Commented] (MAPREDUCE-2786) TestDFSIO should also test compression reading/writing from command-line.

2012-08-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13436967#comment-13436967
 ] 

Konstantin Shvachko commented on MAPREDUCE-2786:


That is a good thing, having the opportunity to benchmark with compression.
A couple of suggestions.
# Move all compression configuration logic, including the reflection calls and 
the cc variable, all the way into {{IOMapperBase.configure()}}. Otherwise all 
these small actions will be counted as execution time.
# You should not work separately with compressed and non-compressed streams 
inside doIO(). The same {{out}} or {{in}} variables should just point to 
compressed or uncompressed streams. Nesting streams is a regular practice.
# {{getCompression()}} is not used anywhere and should be removed.
# You use {{test.compression}} to get the codec class and 
{{test.io.compression.class}} to set it. How is that going to work? You should 
make two constants, for the property name and the default value, and use them.
# AppendMapper is not covered. It should be the same as the others. Moving the 
config logic into {{IOMapperBase}} should make it easy.
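The stream-nesting practice from point 2 can be sketched in plain Java (using the JDK's GZIP streams instead of a Hadoop codec, purely for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class NestedStreams {
    // The caller always works with a single OutputStream variable;
    // compression is just an optional wrapper around the raw stream.
    static OutputStream maybeCompress(OutputStream raw, boolean compress) throws IOException {
        return compress ? new GZIPOutputStream(raw) : raw;
    }

    static InputStream maybeDecompress(InputStream raw, boolean compressed) throws IOException {
        return compressed ? new GZIPInputStream(raw) : raw;
    }
}
```

doIO() then reads and writes through the same variable regardless of whether compression is enabled.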

 TestDFSIO should also test compression reading/writing from command-line.
 -

 Key: MAPREDUCE-2786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2786
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: benchmarks
Affects Versions: 2.0.0-alpha
Reporter: Plamen Jeliazkov
Priority: Minor
  Labels: newbie
 Fix For: 2.1.0-alpha

 Attachments: MAPREDUCE-2786.patch

   Original Estimate: 36h
  Remaining Estimate: 36h

 I thought it might be beneficial to simply alter the code of TestDFSIO to 
 accept any compression codec class and allow testing for compression by a 
 command line argument instead of having to change the config file every time. 
 Something like -compression would do.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4349) Distributed Cache gives inconsistent result if cache Archive files get deleted from task tracker

2012-07-25 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422069#comment-13422069
 ] 

Konstantin Shvachko commented on MAPREDUCE-4349:


Should we close it or is it applicable to other versions?

 Distributed Cache gives inconsistent result if cache Archive files get 
 deleted from task tracker 
 -

 Key: MAPREDUCE-4349
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4349
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Fix For: 0.22.1

 Attachments: PATCH-MAPREDUCE-4349-22-v2.patch, 
 PATCH-MAPREDUCE-4349-22-v3.patch, PATCH-MAPREDUCE-4349-22.patch


 Add test to verify Distributed Cache consistency when cached archives are 
 deleted.





[jira] [Commented] (MAPREDUCE-4349) Distributed Cache gives inconsistent result if cache Archive files get deleted from task tracker

2012-07-24 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13421727#comment-13421727
 ] 

Konstantin Shvachko commented on MAPREDUCE-4349:


+1 the patch looks good.
I removed two unused imports. Will commit now.

 Distributed Cache gives inconsistent result if cache Archive files get 
 deleted from task tracker 
 -

 Key: MAPREDUCE-4349
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4349
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Attachments: PATCH-MAPREDUCE-4349-22-v2.patch, 
 PATCH-MAPREDUCE-4349-22-v3.patch, PATCH-MAPREDUCE-4349-22.patch








[jira] [Updated] (MAPREDUCE-4349) Distributed Cache gives inconsistent result if cache Archive files get deleted from task tracker

2012-07-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4349:
---

  Description: 
I just committed this to branch 0.22.1.
Thank you, Mayank.
Fix Version/s: 0.22.1
 Hadoop Flags: Reviewed

 Distributed Cache gives inconsistent result if cache Archive files get 
 deleted from task tracker 
 -

 Key: MAPREDUCE-4349
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4349
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Fix For: 0.22.1

 Attachments: PATCH-MAPREDUCE-4349-22-v2.patch, 
 PATCH-MAPREDUCE-4349-22-v3.patch, PATCH-MAPREDUCE-4349-22.patch


 I just committed this to branch 0.22.1.
 Thank you, Mayank.





[jira] [Updated] (MAPREDUCE-4349) Distributed Cache gives inconsistent result if cache Archive files get deleted from task tracker

2012-07-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4349:
---

Description: Add test to verify Distributed Cache consistency when cached 
archives are deleted.  (was: I just committed this to branch 0.22.1.
Thank you, Mayank.)

I just committed this to branch 0.22.1.
Thank you, Mayank.

 Distributed Cache gives inconsistent result if cache Archive files get 
 deleted from task tracker 
 -

 Key: MAPREDUCE-4349
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4349
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Fix For: 0.22.1

 Attachments: PATCH-MAPREDUCE-4349-22-v2.patch, 
 PATCH-MAPREDUCE-4349-22-v3.patch, PATCH-MAPREDUCE-4349-22.patch


 Add test to verify Distributed Cache consistency when cached archives are 
 deleted.





[jira] [Commented] (MAPREDUCE-4349) Distributed Cache gives inconsistent result if cache Archive files get deleted from task tracker

2012-07-20 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419488#comment-13419488
 ] 

Konstantin Shvachko commented on MAPREDUCE-4349:


One simple suggestion: you should use {{FileUtil.fullyDelete}} instead of 
implementing {{delete()}} internally.
Once you do that, {{if (f2.exists())}} should become {{assertFalse(f2.exists())}}.
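What {{FileUtil.fullyDelete}} does can be sketched in plain JDK terms (illustrative; the real test should call the Hadoop utility and JUnit's {{assertFalse}} rather than this sketch):

```java
import java.io.File;

public class FullyDelete {
    // Recursively delete a directory tree, mirroring the behavior of
    // Hadoop's FileUtil.fullyDelete; returns true if everything was removed.
    static boolean fullyDelete(File dir) {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                fullyDelete(child);
            }
        }
        return dir.delete();
    }
}
```

Asserting on the result (instead of an {{if}} check) makes the test fail loudly when cleanup does not happen.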

 Distributed Cache gives inconsistent result if cache Archive files get 
 deleted from task tracker 
 -

 Key: MAPREDUCE-4349
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4349
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Attachments: PATCH-MAPREDUCE-4349-22-v2.patch, 
 PATCH-MAPREDUCE-4349-22.patch








[jira] [Commented] (MAPREDUCE-4403) Adding test case for resubmission of jobs in TestRecoveryManager

2012-07-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413539#comment-13413539
 ] 

Konstantin Shvachko commented on MAPREDUCE-4403:


+1 looks good

 Adding test case for resubmission of jobs in TestRecoveryManager
 

 Key: MAPREDUCE-4403
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4403
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Attachments: MAPREDUCE-4403-22-1.patch, MAPREDUCE-4403-22.patch


 In Hadoop 0.22, TestRecoveryManager does not have a resubmission test case 
 which checks that jobs succeed after resubmission.
 Some refactoring is also needed. 
 Thanks,
 Mayank





[jira] [Updated] (MAPREDUCE-4403) Adding test case for resubmission of jobs in TestRecoveryManager

2012-07-13 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4403:
---

Fix Version/s: 0.22.1
 Hadoop Flags: Reviewed

I just committed this to branch 0.22.1. Thank you Mayank.
Do we need this for trunk or other versions?

 Adding test case for resubmission of jobs in TestRecoveryManager
 

 Key: MAPREDUCE-4403
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4403
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Fix For: 0.22.1

 Attachments: MAPREDUCE-4403-22-1.patch, MAPREDUCE-4403-22.patch


 In Hadoop 0.22, TestRecoveryManager does not have a resubmission test case 
 which checks that jobs succeed after resubmission.
 Some refactoring is also needed. 
 Thanks,
 Mayank





[jira] [Updated] (MAPREDUCE-4404) Adding Test case for TestMRJobClient to verify the user name

2012-07-13 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4404:
---

Hadoop Flags: Reviewed

I just committed this to branch 0.22.1. Thank you Mayank.
Is it also targeted for trunk?

 Adding Test case for TestMRJobClient to verify the user name
 

 Key: MAPREDUCE-4404
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4404
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Fix For: 0.22.1

 Attachments: MAPREDUCE-4404-22.patch


 Adding Test case for TestMRJobClient to verify the user name





[jira] [Commented] (MAPREDUCE-4405) Adding test case for HierarchicalQueue in TestJobQueueClient

2012-07-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413572#comment-13413572
 ] 

Konstantin Shvachko commented on MAPREDUCE-4405:


assertNotNull for the resulting queues is good, but you can also verify that 
the total number of queues is as expected.
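The stronger check might look like the hypothetical sketch below; {{Queue}} and {{countQueues}} stand in for the real JobQueueInfo traversal, and the queue names are made up:

```java
import java.util.ArrayList;
import java.util.List;

public class QueueCountSketch {
    // Hypothetical minimal queue node; the real test walks JobQueueInfo objects.
    static class Queue {
        final String name;
        final List<Queue> children = new ArrayList<>();
        Queue(String name) { this.name = name; }
        Queue add(Queue child) { children.add(child); return this; }
    }

    /** Counts a queue and all of its descendants. */
    static int countQueues(Queue q) {
        int n = 1;
        for (Queue c : q.children) n += countQueues(c);
        return n;
    }

    public static void main(String[] args) {
        Queue root = new Queue("root")
            .add(new Queue("q1").add(new Queue("q1a")).add(new Queue("q1b")))
            .add(new Queue("q2"));
        // assertNotNull-style check plus the stronger total-count assertion:
        // root, q1, q1a, q1b, q2 = 5 queues in total.
        if (root == null) throw new AssertionError();
        if (countQueues(root) != 5) throw new AssertionError("expected 5 queues");
    }
}
```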

 Adding test case for HierarchicalQueue in TestJobQueueClient
 

 Key: MAPREDUCE-4405
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4405
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Attachments: MAPREDUCE-4405-22.patch


 Adding test case for HierarchicalQueue in TestJobQueueClient





[jira] [Commented] (MAPREDUCE-4405) Adding test case for HierarchicalQueue in TestJobQueueClient

2012-07-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413948#comment-13413948
 ] 

Konstantin Shvachko commented on MAPREDUCE-4405:


+1 looks good.

 Adding test case for HierarchicalQueue in TestJobQueueClient
 

 Key: MAPREDUCE-4405
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4405
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Attachments: MAPREDUCE-4405-22-v2.patch, MAPREDUCE-4405-22.patch


 Adding test case for HierarchicalQueue in TestJobQueueClient





[jira] [Updated] (MAPREDUCE-4405) Adding test case for HierarchicalQueue in TestJobQueueClient

2012-07-13 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4405:
---

Fix Version/s: 0.22.1
 Hadoop Flags: Reviewed

I just committed this to branch 0.22.1. Thank you Mayank.

 Adding test case for HierarchicalQueue in TestJobQueueClient
 

 Key: MAPREDUCE-4405
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4405
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Fix For: 0.22.1

 Attachments: MAPREDUCE-4405-22-v2.patch, MAPREDUCE-4405-22.patch


 Adding test case for HierarchicalQueue in TestJobQueueClient





[jira] [Commented] (MAPREDUCE-4349) Distributed Cache gives inconsistent result if cache Archive files get deleted from task tracker

2012-07-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13414285#comment-13414285
 ] 

Konstantin Shvachko commented on MAPREDUCE-4349:


I would rather integrate verification of archive files into 
{{testCacheConsistency()}} instead of creating a new test case.
You can do
{code}
DistributedCache.addCacheFile(firstCacheFile ...
DistributedCache.addCacheArchive(firstCacheArchive ...
{code}
and then add verification for the archive along with the file.
I think it will be a smaller change, and definitely less code duplication.
Otherwise it will need refactoring to extract common parts of the code into methods.

 Distributed Cache gives inconsistent result if cache Archive files get 
 deleted from task tracker 
 -

 Key: MAPREDUCE-4349
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4349
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Attachments: PATCH-MAPREDUCE-4349-22.patch








[jira] [Comment Edited] (MAPREDUCE-4403) Adding test case for resubmission of jobs in TestRecoveryManager

2012-07-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1349#comment-1349
 ] 

Konstantin Shvachko edited comment on MAPREDUCE-4403 at 7/11/12 6:33 AM:
-

The code looks good. Two things:
# {{TEST_DIR = /tmp;}}
It is better if we don't use /tmp in the tests, as /tmp is not cleaned up 
by the build clean target. I suggest replacing it with {{TEST_DIR = 
build/tmp;}}.
# You need to wrap most of the code into {{try finally}} and call 
{{mr.shutdown()}} in the {{finally}} section. You are starting and stopping 
the JobTracker in the test. When the test fails, the mini-cluster will not 
shut down, which may affect subsequent tests.

This is not related to your particular patch, because it was introduced in 
other test cases, and I do not propose to change the old code. I am just 
advocating setting a better example with the new code.

  was (Author: shv):
The code looks good. Two things:
# {{TEST_DIR = /tmp;}}
It is better if we don't use /tmp in the tests, as /tmp is not cleaned up 
by the build clean target. I suggest replacing it with {{TEST_DIR = 
build/tmp;}}.
# You need to wrap most of the code into {{try finally}} and call 
{{mr.shutdown()}} in the {{finally}} section. You are starting and stopping 
the JobTracker in the test. When the test fails, the mini-cluster will not 
shut down, which may affect subsequent tests.
  
 Adding test case for resubmission of jobs in TestRecoveryManager
 

 Key: MAPREDUCE-4403
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4403
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Attachments: MAPREDUCE-4403-22.patch


 In Hadoop 0.22, TestRecoveryManager does not have a resubmission test case 
 which checks that jobs succeed after resubmission.
 Some refactoring is also needed. 
 Thanks,
 Mayank





[jira] [Commented] (MAPREDUCE-4404) Adding Test case for TestMRJobClient to verify the user name

2012-07-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412373#comment-13412373
 ] 

Konstantin Shvachko commented on MAPREDUCE-4404:


+1. Looks good.

 Adding Test case for TestMRJobClient to verify the user name
 

 Key: MAPREDUCE-4404
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4404
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Fix For: 0.22.1

 Attachments: MAPREDUCE-4404-22.patch


 Adding Test case for TestMRJobClient to verify the user name





[jira] [Commented] (MAPREDUCE-4403) Adding test case for resubmission of jobs in TestRecoveryManager

2012-07-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1349#comment-1349
 ] 

Konstantin Shvachko commented on MAPREDUCE-4403:


The code looks good. Two things:
# {{TEST_DIR = /tmp;}}
It is better if we don't use /tmp in the tests, as /tmp is not cleaned up 
by the build clean target. I suggest replacing it with {{TEST_DIR = 
build/tmp;}}.
# You need to wrap most of the code into {{try finally}} and call 
{{mr.shutdown()}} in the {{finally}} section. You are starting and stopping 
the JobTracker in the test. When the test fails, the mini-cluster will not 
shut down, which may affect subsequent tests.
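The pattern being asked for might be sketched like this; {{MiniCluster}} is a hypothetical stand-in for MiniMRCluster, since the real class needs a full Hadoop build:

```java
public class ShutdownPatternSketch {
    // Hypothetical stand-in for MiniMRCluster; only shutdown() matters here.
    static class MiniCluster {
        static boolean running;
        MiniCluster() { running = true; }
        void shutdown() { running = false; }
    }

    static void runTestBody(boolean fail) {
        MiniCluster mr = new MiniCluster();
        try {
            // ... start/stop JobTracker, submit jobs, verify recovery ...
            if (fail) throw new AssertionError("simulated test failure");
        } finally {
            // Runs even when the test body throws, so a failing test
            // cannot leak a live mini-cluster into subsequent tests.
            mr.shutdown();
        }
    }

    public static void main(String[] args) {
        try {
            runTestBody(true);
        } catch (AssertionError expected) {
            // the simulated failure propagates, as a real JUnit failure would
        }
        if (MiniCluster.running) throw new AssertionError("cluster leaked");
    }
}
```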

 Adding test case for resubmission of jobs in TestRecoveryManager
 

 Key: MAPREDUCE-4403
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4403
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Mayank Bansal
Assignee: Mayank Bansal
Priority: Minor
 Attachments: MAPREDUCE-4403-22.patch


 In Hadoop 0.22, TestRecoveryManager does not have a resubmission test case 
 which checks that jobs succeed after resubmission.
 Some refactoring is also needed. 
 Thanks,
 Mayank





[jira] [Commented] (MAPREDUCE-4360) Capacity Scheduler Hierarchical leaf queue does not honur the max capacity of container queue

2012-06-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403731#comment-13403731
 ] 

Konstantin Shvachko commented on MAPREDUCE-4360:


Patch looks good. I wanted to commit it, but the mumak tests are failing.
Run the test-contrib target and you will see.

 Capacity Scheduler Hierarchical leaf queue does not honur the max capacity of 
 container queue
 -

 Key: MAPREDUCE-4360
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4360
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-4360-22-1.patch, MAPREDUCE-4360-22.patch








[jira] [Commented] (MAPREDUCE-4360) Capacity Scheduler Hierarchical leaf queue does not honur the max capacity of container queue

2012-06-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404132#comment-13404132
 ] 

Konstantin Shvachko commented on MAPREDUCE-4360:


Sorry, my mistake. I had something else running on my box on port 50030, 
preventing SimulatorJobTracker from starting for mumak. It's fine now. I'll 
commit the patch.
I will remove the unused variable counter in the new test.

 Capacity Scheduler Hierarchical leaf queue does not honur the max capacity of 
 container queue
 -

 Key: MAPREDUCE-4360
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4360
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-4360-22-1.patch, MAPREDUCE-4360-22.patch








[jira] [Resolved] (MAPREDUCE-4360) Capacity Scheduler Hierarchical leaf queue does not honur the max capacity of container queue

2012-06-29 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved MAPREDUCE-4360.


  Resolution: Fixed
   Fix Version/s: 0.22.1
Target Version/s: 0.22.1
Hadoop Flags: Reviewed

I just committed this. Thank you Mayank.

 Capacity Scheduler Hierarchical leaf queue does not honur the max capacity of 
 container queue
 -

 Key: MAPREDUCE-4360
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4360
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 0.22.1

 Attachments: MAPREDUCE-4360-22-1.patch, MAPREDUCE-4360-22.patch








[jira] [Updated] (MAPREDUCE-4360) Capacity Scheduler Hierarchical leaf queue does not honor the max capacity of container queue

2012-06-29 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4360:
---

Summary: Capacity Scheduler Hierarchical leaf queue does not honor the max 
capacity of container queue  (was: Capacity Scheduler Hierarchical leaf queue 
does not honur the max capacity of container queue)

 Capacity Scheduler Hierarchical leaf queue does not honor the max capacity of 
 container queue
 -

 Key: MAPREDUCE-4360
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4360
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 0.22.1

 Attachments: MAPREDUCE-4360-22-1.patch, MAPREDUCE-4360-22.patch








[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-06-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403585#comment-13403585
 ] 

Konstantin Shvachko commented on MAPREDUCE-4342:


+1 for branch 0.22 patch

 Distributed Cache gives inconsistent result if cache files get deleted from 
 task tracker 
 -

 Key: MAPREDUCE-4342
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, 
 MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch








[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-06-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403589#comment-13403589
 ] 

Konstantin Shvachko commented on MAPREDUCE-4342:


I just committed this to branch 0.22. Thank you Mayank.
Is it applicable to trunk? If so, could you please attach a patch.

 Distributed Cache gives inconsistent result if cache files get deleted from 
 task tracker 
 -

 Key: MAPREDUCE-4342
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, 
 MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch








[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-06-26 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401833#comment-13401833
 ] 

Konstantin Shvachko commented on MAPREDUCE-4342:


Mayank, a few more touches.
- Typo {{chechs}} in the JavaDoc. Should be {{checks}}.
- Please do Ctrl-I on your new JavaDoc comments (in both files); this should 
indent them correctly.
- Add {{@SuppressWarnings("deprecation")}} before {{testCacheConsistency()}}. This 
should reduce the javac warnings reported in your test-patch log.
- Then you will see that the {{workDir}} variable is not used anywhere and can 
be removed.


 Distributed Cache gives inconsistent result if cache files get deleted from 
 task tracker 
 -

 Key: MAPREDUCE-4342
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, 
 MAPREDUCE-4342-22.patch








[jira] [Commented] (MAPREDUCE-4360) Capacity Scheduler Hierarchical leaf queue does not honur the max capacity of container queue

2012-06-26 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401849#comment-13401849
 ] 

Konstantin Shvachko commented on MAPREDUCE-4360:


You should work on the usual formatting issues.

So the fix, as I understand it, is to build a hierarchy of queues by setting the 
parent member, then find the parent that has maxCapacity set and verify it is 
not exceeded. Sounds like the right approach.
Do you need to update the current capacity for the parent when you allocate 
new slots for a child queue?

 Capacity Scheduler Hierarchical leaf queue does not honur the max capacity of 
 container queue
 -

 Key: MAPREDUCE-4360
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4360
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.1, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-4360-22.patch








[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-06-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396386#comment-13396386
 ] 

Konstantin Shvachko commented on MAPREDUCE-4342:


Mayank, the patch is not applying as is, namely the empty-line change in 
TrackerDistributedCacheManager. You can just leave the line there. I did that, 
but then it does not compile. You need to sync it with the repo.

- Could you also change "is been" to "has been", as Robert suggested.
- And add spaces between method parameters.
- Reporting the results of the test-patch and test builds would be very useful, 
since we don't have Jenkins to verify that for 0.22.

The fix looks good modulo the jiras you opened.


 Distributed Cache gives inconsistent result if cache files get deleted from 
 task tracker 
 -

 Key: MAPREDUCE-4342
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0, 1.0.3, trunk
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22.patch








[jira] [Commented] (MAPREDUCE-4305) Implement delay scheduling in capacity scheduler for improving data locality

2012-06-07 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291444#comment-13291444
 ] 

Konstantin Shvachko commented on MAPREDUCE-4305:


Task locality is important. It is interesting that it is only necessary to hook 
the Capacity Scheduler up to the logic that already exists in JobInProgress etc. 
I went over the general logic of the patch and it looks good, but I have several 
formatting and code organization comments.
# Append _PROPERTY to the new config key constants, e.g. 
NODE_LOCALITY_DELAY_PROPERTY. It looks like the other constants in 
CapacitySchedulerConf follow that pattern.
# Break long lines.
# In CapacitySchedulerConf, convert the comments describing variables to JavaDoc.
# In initializeDefaults() you should use {{capacity-scheduler}}, not 
{{fairscheduler}}, config variables. Also, since you introduced constants for the 
keys, use them rather than the raw keys.
# JobInfo is confusing because there is already a class with that name. Call it 
something like JobLocality. I'd rather move it into JobQueuesManager, because 
the latter maintains the map of those objects.
# Correct the indentation in CapacityTaskScheduler; in particular, eliminate all 
tabs, it should be spaces only.
# Add spaces between arguments and operators, and in some LOG messages.
# Add empty lines between new methods.
# updateLocalityWaitTimes() and updateLastMapLocalityLevel() should belong in 
JobQueuesManager, imo.
# JobQueuesManager.infos is a map keyed by JobInProgress. Wouldn't it be better 
to use JobID as the key?
# In TaskSchedulingMgr you need only one version of obtainNewTask() to be 
abstract, the one with the cachelevel parameter. The other one should not be 
abstract and should just call the abstract obtainNewTask() with cachelevel set 
to any.


 Implement delay scheduling in capacity scheduler for improving data locality
 

 Key: MAPREDUCE-4305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4305
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-4305, MAPREDUCE-4305-1.patch


 Capacity Scheduler data-local tasks are about 40%-50%, which is not good.
 In my tests on a 70-node cluster I consistently get data locality around 
 40-50% even on a free cluster.
 I think we need to implement something like delay scheduling in the Capacity 
 Scheduler to improve data locality.
 http://radlab.cs.berkeley.edu/publication/308
 After implementing delay scheduling on Hadoop 0.22 I am getting 100% data 
 locality on a free cluster and around 90% data locality on a busy cluster.
 Thanks,
 Mayank
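The core rule of delay scheduling from the linked paper can be sketched as follows; the class, method, and constant names (e.g. {{NODE_LOCALITY_DELAY_MS}}) are illustrative, not the ones in the patch:

```java
public class DelaySchedulingSketch {
    // Core delay-scheduling rule: when the head-of-queue job has no local
    // task for the offered node, skip it for a while instead of launching a
    // remote task immediately, in the hope that a local slot frees up soon.
    static final long NODE_LOCALITY_DELAY_MS = 3000;

    static class Job {
        long firstSkipTime = -1;  // when this job was first skipped
    }

    /** Returns true if the scheduler should launch a non-local task now. */
    static boolean allowNonLocal(Job job, long now) {
        if (job.firstSkipTime < 0) {
            job.firstSkipTime = now;  // start the locality wait clock
            return false;             // skip this offer
        }
        // Give up waiting once the delay budget is exhausted.
        return now - job.firstSkipTime >= NODE_LOCALITY_DELAY_MS;
    }

    public static void main(String[] args) {
        Job job = new Job();
        // Early non-local offers are skipped; a later offer past the delay is taken.
        System.out.println(allowNonLocal(job, 0));     // false
        System.out.println(allowNonLocal(job, 1000));  // false
        System.out.println(allowNonLocal(job, 3000));  // true
    }
}
```

The real patch tracks the wait per job inside the scheduler and resets it whenever a local task is actually launched.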





[jira] [Updated] (MAPREDUCE-4318) TestRecoveryManager should not use raw and deprecated configuration parameters.

2012-06-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated MAPREDUCE-4318:
---

Summary: TestRecoveryManager should not use raw and deprecated 
configuration parameters.  (was: TestRecoveryManagershould not use raw and 
deprecated configuration parameters.)

+1 Looks good to me.

 TestRecoveryManager should not use raw and deprecated configuration 
 parameters.
 ---

 Key: MAPREDUCE-4318
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4318
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.1
Reporter: Konstantin Shvachko
Assignee: Benoy Antony
 Attachments: MAPREDUCE-4318.patch


 TestRecoveryManager should not use deprecated config keys, and should use 
 constants for the keys where possible.




