[jira] [Created] (MAPREDUCE-6914) Tests use assertTrue(....equals(...)) instead of assertEquals()
Daniel Templeton created MAPREDUCE-6914:
---------------------------------------

             Summary: Tests use assertTrue(....equals(...)) instead of assertEquals()
                 Key: MAPREDUCE-6914
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6914
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: test
    Affects Versions: 3.0.0-alpha4, 2.8.1
            Reporter: Daniel Templeton
            Assignee: Daniel Templeton
            Priority: Minor

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
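The difference the summary alludes to is in the failure diagnostics. A minimal plain-Java sketch (the helper methods below are simplified stand-ins for the JUnit assertions, not Hadoop code):

```java
public class AssertStyleDemo {
  // Stand-in for JUnit's assertTrue: the failure message carries no values.
  static void assertTrue(boolean condition) {
    if (!condition) {
      throw new AssertionError("expected true but was false");
    }
  }

  // Stand-in for JUnit's assertEquals: the failure message shows both values.
  static void assertEquals(Object expected, Object actual) {
    if (!expected.equals(actual)) {
      throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
    }
  }

  public static void main(String[] args) {
    String expected = "hadoop";
    String actual = "hadop"; // deliberate typo

    try {
      assertTrue(expected.equals(actual)); // anti-pattern: the values are lost
    } catch (AssertionError e) {
      System.out.println(e.getMessage()); // prints: expected true but was false
    }

    try {
      assertEquals(expected, actual); // preferred: both values are reported
    } catch (AssertionError e) {
      System.out.println(e.getMessage()); // prints: expected <hadoop> but was <hadop>
    }
  }
}
```

With `assertTrue(a.equals(b))` a failing test says only that a boolean was false; with `assertEquals(a, b)` the report includes the expected and actual values, which is why the cleanup is worth doing.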
[jira] [Created] (MAPREDUCE-6883) AuditLogger and TestAuditLogger are dead code
Daniel Templeton created MAPREDUCE-6883:
---------------------------------------

             Summary: AuditLogger and TestAuditLogger are dead code
                 Key: MAPREDUCE-6883
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6883
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: client
    Affects Versions: 2.8.0
            Reporter: Daniel Templeton
            Priority: Minor

The {{AuditLogger}} and {{TestAuditLogger}} classes appear to be dead code. I can't find anything that uses or references {{AuditLogger}}. No one has touched the code since 2011. I think it's safe to remove.
[jira] [Created] (MAPREDUCE-6864) Hadoop streaming creates 2 mappers when the input has only one block
Daniel Templeton created MAPREDUCE-6864:
---------------------------------------

             Summary: Hadoop streaming creates 2 mappers when the input has only one block
                 Key: MAPREDUCE-6864
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6864
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.7.3
            Reporter: Daniel Templeton

If a streaming job is run against input smaller than two blocks, two mappers are created, both operating on the same split and both producing (duplicate) output. In some cases the second mapper consistently fails. I've not seen the failure with input smaller than 10 bytes or larger than a couple MB; I have seen it with a 4kB input.
[jira] [Created] (MAPREDUCE-6848) MRApps.setMRFrameworkClasspath() unnecessarily declares that it throws IOException
Daniel Templeton created MAPREDUCE-6848:
---------------------------------------

             Summary: MRApps.setMRFrameworkClasspath() unnecessarily declares that it throws IOException
                 Key: MAPREDUCE-6848
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6848
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
    Affects Versions: 2.8.0
            Reporter: Daniel Templeton
            Priority: Trivial
[jira] [Created] (MAPREDUCE-6837) Add an equivalent to Crunch's Pair class
Daniel Templeton created MAPREDUCE-6837:
---------------------------------------

             Summary: Add an equivalent to Crunch's Pair class
                 Key: MAPREDUCE-6837
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6837
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
            Reporter: Daniel Templeton

Crunch has this great {{Pair}} class (https://crunch.apache.org/apidocs/0.14.0/org/apache/crunch/Pair.html) that saves you from constantly implementing composite writables. It seems silly that we still don't have an equivalent in MR. I would like to see a new class with the following API:

{code}
package org.apache.hadoop.io;

public class CompositeWritable<P, S>
    implements WritableComparable<CompositeWritable<P, S>> {
  public CompositeWritable(P primary, S secondary);

  public P getPrimary();
  public void setPrimary(P primary);
  public S getSecondary();
  public void setSecondary(S secondary);

  // Return true if both primaries and both secondaries are equal
  public boolean equals(CompositeWritable<P, S> o);

  // Return the primary's hash code
  public int hashCode();

  // Sort first by primary and then by secondary
  public int compareTo(CompositeWritable<P, S> o);

  public void readFields(DataInput in);
  public void write(DataOutput out);
}
{code}

With such a class, implementing a secondary sort would mean just implementing a custom grouping comparator. That comparator could be implemented as part of this JIRA:

{code}
package org.apache.hadoop.io;

public class CompositeGroupingComparator extends WritableComparator {
  ...
}
{code}

Or some such. Crunch also provides {{Tuple3}}, {{Tuple4}}, and {{TupleN}} classes, but I don't think we need to add equivalents. If someone really wants that capability, they can nest composite keys. Don't forget to add unit tests!
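The semantics the proposal spells out (equality over both halves, hashing only the primary so pairs with equal primaries group together, ordering by primary then secondary) can be sketched in plain Java, without the Hadoop {{Writable}} plumbing. The class name {{Pair}} here is hypothetical:

```java
// Minimal plain-Java sketch of the proposed semantics (no serialization).
public class Pair<P extends Comparable<P>, S extends Comparable<S>>
    implements Comparable<Pair<P, S>> {
  private final P primary;
  private final S secondary;

  public Pair(P primary, S secondary) {
    this.primary = primary;
    this.secondary = secondary;
  }

  public P getPrimary() { return primary; }
  public S getSecondary() { return secondary; }

  // Equal only if both primaries and both secondaries are equal.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Pair)) {
      return false;
    }
    Pair<?, ?> p = (Pair<?, ?>) o;
    return primary.equals(p.primary) && secondary.equals(p.secondary);
  }

  // Hash only the primary, so pairs with equal primaries partition together.
  @Override
  public int hashCode() {
    return primary.hashCode();
  }

  // Sort first by primary, then by secondary.
  @Override
  public int compareTo(Pair<P, S> o) {
    int c = primary.compareTo(o.primary);
    return c != 0 ? c : secondary.compareTo(o.secondary);
  }
}
```

In the real MapReduce version, pairing that hash/compare behavior with a grouping comparator that looks only at the primary is what yields a secondary sort: records partition and group by the primary while arriving at the reducer ordered by the secondary.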
[jira] [Resolved] (MAPREDUCE-6827) Failed to traverse Iterable values the second time in reduce() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton resolved MAPREDUCE-6827.
-----------------------------------------
    Resolution: Not A Problem

That is known, documented, and intended behavior. The {{ValueIterator}}'s {{hasNext()}} and {{next()}} methods defer to the {{ReduceContextImpl}}'s {{BackupStore}} instance, so creating a new iterator won't help. The reason we only go through the values once is to allow the data to be efficiently streamed.

> Failed to traverse Iterable values the second time in reduce() method
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6827
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6827
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 3.0.0-alpha1
>        Environment: hadoop2.7.3
>            Reporter: javaloveme
>
> Failed to traverse Iterable values the second time in reduce() method.
> The following code is a reduce() method (of WordCount):
> {code:title=WordCount.java|borderStyle=solid}
> public static class WcReducer
>     extends Reducer<Text, IntWritable, Text, IntWritable> {
>   @Override
>   protected void reduce(Text key, Iterable<IntWritable> values,
>       Context context) throws IOException, InterruptedException {
>     // print some logs
>     List<String> vals = new LinkedList<>();
>     for (IntWritable i : values) {
>       vals.add(i.toString());
>     }
>     System.out.println(String.format(" reduce(%s, [%s])",
>         key, String.join(", ", vals)));
>
>     // sum of values
>     int sum = 0;
>     for (IntWritable i : values) {
>       sum += i.get();
>     }
>     System.out.println(String.format(" reduced(%s, %s)", key, sum));
>
>     context.write(key, new IntWritable(sum));
>   }
> }
> {code}
> After running it, we got the result that all sums were zero!
> After debugging, it was found that the second foreach-loop was not executed. The root cause was the return value of Iterable.iterator(): it returned the same instance in both calls made by the foreach-loops. In general, Iterable.iterator() should return a new instance on each call, as ArrayList.iterator() does.
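Given the single-pass behavior described in the resolution, the usual workaround is to cache the values during the first traversal. A minimal sketch, with a plain single-use iterable standing in for the MapReduce values:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SinglePassDemo {
  // Stand-in for the reducer's values: an Iterable that, like the MapReduce
  // ValueIterator, hands back the same (eventually exhausted) iterator.
  static class OnePassIterable implements Iterable<Integer> {
    private final Iterator<Integer> only = Arrays.asList(1, 2, 3).iterator();
    @Override
    public Iterator<Integer> iterator() { return only; }
  }

  public static void main(String[] args) {
    Iterable<Integer> values = new OnePassIterable();

    // First pass: cache the values as they stream by.
    List<Integer> cached = new ArrayList<>();
    for (int v : values) {
      cached.add(v);
    }

    // A second pass over the original iterable sees nothing...
    int secondPassSum = 0;
    for (int v : values) {
      secondPassSum += v;
    }
    System.out.println(secondPassSum); // prints 0

    // ...but the cached copy can be traversed as often as needed.
    int cachedSum = 0;
    for (int v : cached) {
      cachedSum += v;
    }
    System.out.println(cachedSum); // prints 6
  }
}
```

One caveat in a real reducer: the framework reuses the same {{Writable}} instance across iterations, so values must be copied (e.g. {{new IntWritable(i.get())}}) before being cached, not stored by reference.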
[jira] [Created] (MAPREDUCE-6776) yarn.app.mapreduce.client.job.max-retries should have a more useful default
Daniel Templeton created MAPREDUCE-6776:
---------------------------------------

             Summary: yarn.app.mapreduce.client.job.max-retries should have a more useful default
                 Key: MAPREDUCE-6776
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6776
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: client
    Affects Versions: 2.8.0
            Reporter: Daniel Templeton
            Assignee: Daniel Templeton

The default is 0, so any communication failure results in a client failure. Oozie doesn't like that. If the RM is failing over and Oozie gets a communication failure, it assumes the target job has failed. I propose raising the default to something modest like 3 or 5. The default retry interval is 2s.
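Until the default changes, the retry count can be raised per cluster or per job. A sketch of the relevant mapred-site.xml fragment (the values are illustrative, not recommendations from this issue):

```xml
<!-- mapred-site.xml: retry the client-side job status lookup a few times
     instead of failing on the first communication error. -->
<property>
  <name>yarn.app.mapreduce.client.job.max-retries</name>
  <value>3</value>
</property>
<!-- Interval between retries, in milliseconds (2s is the stated default). -->
<property>
  <name>yarn.app.mapreduce.client.job.retry-interval</name>
  <value>2000</value>
</property>
```

With three retries at the 2s default interval, a client such as Oozie rides out roughly six seconds of RM failover instead of immediately declaring the job failed.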
[jira] [Resolved] (MAPREDUCE-6560) ClientServiceDelegate doesn't handle retries during AM restart as intended
[ https://issues.apache.org/jira/browse/MAPREDUCE-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton resolved MAPREDUCE-6560.
-----------------------------------------
    Resolution: Invalid

Looks like I was just wrong.

> ClientServiceDelegate doesn't handle retries during AM restart as intended
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6560
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6560
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>
> In the {{invoke()}} method, I found the following code:
> {code}
> private AtomicBoolean usingAMProxy = new AtomicBoolean(false);
> ...
> // if it's AM shut down, do not decrement maxClientRetry as we wait for
> // AM to be restarted.
> if (!usingAMProxy.get()) {
>   maxClientRetry--;
> }
> usingAMProxy.set(false);
> {code}
> When we create the AM proxy, we set the flag to true. If we fail to connect, the impact of the flag being true is that the code will try one extra time, giving it 400ms instead of just 300ms. I can't imagine that's the intended behavior. After any failure, the flag will forever more be false, but fortunately (?!?) the flag is otherwise unused.
> Looks like I need to do some archeology to figure out how we ended up here.
[jira] [Created] (MAPREDUCE-6719) -libjars should use wildcards to reduce the application footprint in the state store
Daniel Templeton created MAPREDUCE-6719:
---------------------------------------

             Summary: -libjars should use wildcards to reduce the application footprint in the state store
                 Key: MAPREDUCE-6719
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6719
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: distributed-cache
    Affects Versions: 2.8.0
            Reporter: Daniel Templeton
            Assignee: Daniel Templeton
            Priority: Critical
[jira] [Created] (MAPREDUCE-6714) Refactor UncompressedSplitLineReader.fillBuffer()
Daniel Templeton created MAPREDUCE-6714:
---------------------------------------

             Summary: Refactor UncompressedSplitLineReader.fillBuffer()
                 Key: MAPREDUCE-6714
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6714
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 2.8.0
            Reporter: Daniel Templeton

MAPREDUCE-6635 made this change:

{code}
-      maxBytesToRead = Math.min(maxBytesToRead,
-          (int)(splitLength - totalBytesRead));
+      long leftBytesForSplit = splitLength - totalBytesRead;
+      // check if leftBytesForSplit exceed Integer.MAX_VALUE
+      if (leftBytesForSplit <= Integer.MAX_VALUE) {
+        maxBytesToRead = Math.min(maxBytesToRead, (int)leftBytesForSplit);
+      }
{code}

The result is one more comparison than necessary and code that's a little convoluted. The code can be simplified as:

{code}
long leftBytesForSplit = splitLength - totalBytesRead;
if (leftBytesForSplit < maxBytesToRead) {
  maxBytesToRead = (int) leftBytesForSplit;
}
{code}

The comparison will auto-promote {{maxBytesToRead}} to {{long}}, making it safe.
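The safety claim rests on Java's binary numeric promotion: when an {{int}} is compared against a {{long}}, the {{int}} is widened to {{long}} first, so the comparison can never overflow and the cast only runs when the value actually fits in an {{int}}. A minimal standalone sketch of that reasoning (variable names mirror the snippet above; the values are illustrative):

```java
public class PromotionDemo {
  public static void main(String[] args) {
    // A split remainder too large for an int.
    long splitLength = (long) Integer.MAX_VALUE + 10L;
    long totalBytesRead = 0L;
    int maxBytesToRead = 64 * 1024;

    long leftBytesForSplit = splitLength - totalBytesRead;

    // maxBytesToRead is promoted to long for this comparison, so the
    // branch is taken only when the remainder genuinely fits in an int,
    // making the subsequent cast lossless.
    if (leftBytesForSplit < maxBytesToRead) {
      maxBytesToRead = (int) leftBytesForSplit;
    }
    System.out.println(maxBytesToRead); // prints 65536: remainder was larger
  }
}
```

Had the comparison instead cast {{leftBytesForSplit}} down to {{int}} before comparing, a remainder above {{Integer.MAX_VALUE}} would wrap to a negative number and corrupt the read length, which is exactly the class of bug MAPREDUCE-6635 guarded against.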
[jira] [Created] (MAPREDUCE-6702) TestMiniMRChildTask.testTaskEnv and TestMiniMRChildTask.testTaskOldEnv are failing
Daniel Templeton created MAPREDUCE-6702:
---------------------------------------

             Summary: TestMiniMRChildTask.testTaskEnv and TestMiniMRChildTask.testTaskOldEnv are failing
                 Key: MAPREDUCE-6702
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6702
             Project: Hadoop Map/Reduce
          Issue Type: Test
          Components: client
    Affects Versions: 3.0.0-alpha1
            Reporter: Daniel Templeton
[jira] [Created] (MAPREDUCE-6632) Master.getMasterAddress() should be updated to use YARN-4629
Daniel Templeton created MAPREDUCE-6632:
---------------------------------------

             Summary: Master.getMasterAddress() should be updated to use YARN-4629
                 Key: MAPREDUCE-6632
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6632
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: applicationmaster
            Reporter: Daniel Templeton
            Assignee: Daniel Templeton
            Priority: Minor

The new {{YarnClientUtil.getRmPrincipal()}} method can replace most of the {{Master.getMasterAddress()}} method, and should, to reduce redundancy and improve serviceability.
[jira] [Created] (MAPREDUCE-6620) Jobs that did not start are shown as starting in 1969 in the JHS web UI
Daniel Templeton created MAPREDUCE-6620:
---------------------------------------

             Summary: Jobs that did not start are shown as starting in 1969 in the JHS web UI
                 Key: MAPREDUCE-6620
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6620
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: jobhistoryserver
    Affects Versions: 2.7.2
            Reporter: Daniel Templeton
            Assignee: Daniel Templeton

If a job fails, its start time is stored as -1. The RM UI correctly handles negative start times. The JHS UI does not, blindly converting it into a date in 1969.
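The 1969 date falls out of treating the -1 sentinel as epoch milliseconds: one millisecond before 1970-01-01T00:00:00Z is still 1969 in UTC and in any timezone west of it. A small sketch of the failure and a guard like the one the RM UI presumably applies (the exact RM check is not shown in this issue):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class EpochDateDemo {
  public static void main(String[] args) {
    long startTime = -1L; // sentinel stored for jobs that never started

    // Blindly converting the sentinel lands one millisecond before the epoch.
    ZonedDateTime shown = Instant.ofEpochMilli(startTime).atZone(ZoneOffset.UTC);
    System.out.println(shown.getYear()); // prints 1969

    // Guarding on the sign first avoids rendering the bogus date.
    String display = startTime < 0 ? "N/A" : shown.toString();
    System.out.println(display); // prints N/A
  }
}
```

The same sentinel renders sensibly in the RM UI, so the fix is confined to teaching the JHS renderer to check for negative start times before formatting.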
[jira] [Created] (MAPREDUCE-6575) TestMRJobs.setup() should use YarnConfiguration properties instead of bare strings
Daniel Templeton created MAPREDUCE-6575:
---------------------------------------

             Summary: TestMRJobs.setup() should use YarnConfiguration properties instead of bare strings
                 Key: MAPREDUCE-6575
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6575
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Daniel Templeton
            Assignee: Daniel Templeton

YARN-5870 introduced the following line:

{code}
conf.setInt("yarn.cluster.max-application-priority", 10);
{code}

It should instead be:

{code}
conf.setInt(YarnConfiguration.MAX_CLUSTER_LEVEL_APPLICATION_PRIORITY, 10);
{code}