[jira] [Created] (MAPREDUCE-6914) Tests use assertTrue(....equals(...)) instead of assertEquals()

2017-07-17 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6914:
---

 Summary: Tests use assertTrue(....equals(...)) instead of 
assertEquals()
 Key: MAPREDUCE-6914
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6914
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: test
Affects Versions: 3.0.0-alpha4, 2.8.1
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6883) AuditLogger and TestAuditLogger are dead code

2017-05-03 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6883:
---

 Summary: AuditLogger and TestAuditLogger are dead code
 Key: MAPREDUCE-6883
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6883
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 2.8.0
Reporter: Daniel Templeton
Priority: Minor


The {{AuditLogger}} and {{TestAuditLogger}} classes appear to be dead code.  I 
can't find anything that uses or references {{AuditLogger}}.  No one has 
touched the code since 2011.  I think it's safe to remove.






[jira] [Created] (MAPREDUCE-6864) Hadoop streaming creates 2 mappers when the input has only one block

2017-03-17 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6864:
---

 Summary: Hadoop streaming creates 2 mappers when the input has 
only one block
 Key: MAPREDUCE-6864
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6864
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.7.3
Reporter: Daniel Templeton


If a streaming job is run against input that is smaller than 2 blocks, 2 mappers 
will be created, both operating on the same split and both producing (duplicate) 
output.  In some cases the second mapper will consistently fail.  I've not seen 
the failure with input smaller than 10 bytes or larger than a couple of MB.  I 
have seen it with a 4kB input.






[jira] [Created] (MAPREDUCE-6848) MRApps.setMRFrameworkClasspath() unnecessarily declares that it throws IOException

2017-02-15 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6848:
---

 Summary: MRApps.setMRFrameworkClasspath() unnecessarily declares 
that it throws IOException
 Key: MAPREDUCE-6848
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6848
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 2.8.0
Reporter: Daniel Templeton
Priority: Trivial









[jira] [Created] (MAPREDUCE-6837) Add an equivalent to Crunch's Pair class

2017-01-26 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6837:
---

 Summary: Add an equivalent to Crunch's Pair class
 Key: MAPREDUCE-6837
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6837
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Daniel Templeton


Crunch has this great {{Pair}} class 
(https://crunch.apache.org/apidocs/0.14.0/org/apache/crunch/Pair.html) that 
saves you from constantly implementing composite writables.  It seems silly that 
we still don't have an equivalent in MR.

I would like to see a new class with the following API:

{code}
package org.apache.hadoop.io;

// P and S are the primary and secondary key types; the generic bounds are assumed
public class CompositeWritable<P extends WritableComparable<P>,
                               S extends WritableComparable<S>>
    implements WritableComparable<CompositeWritable<P, S>> {
  public CompositeWritable(P primary, S secondary);
  public P getPrimary();
  public void setPrimary(P primary);
  public S getSecondary();
  public void setSecondary(S secondary);

  // Return true if both primaries and both secondaries are equal
  public boolean equals(Object o);

  // Return the primary's hash code
  public int hashCode();

  // Sort first by primary and then by secondary
  public int compareTo(CompositeWritable<P, S> o);

  public void readFields(DataInput in) throws IOException;
  public void write(DataOutput out) throws IOException;
}
{code}

With such a class, implementing a secondary sort would mean just implementing a 
custom grouping comparator.  That comparator could be implemented as part of 
this JIRA:

{code}
package org.apache.hadoop.io;

public class CompositeGroupingComparator extends WritableComparator {
  ...
}
{code}

Or some such.
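
To make the intended ordering and grouping concrete, here is a rough plain-Java sketch with no Hadoop dependencies.  The {{Pair}} class and {{sortAndGroup()}} helper are hypothetical stand-ins for the proposed {{CompositeWritable}} and for what a grouping comparator would do in the reduce phase; they are not a proposed implementation, and serialization is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {
  // Hypothetical stand-in for the proposed CompositeWritable (no serialization).
  static final class Pair<P extends Comparable<P>, S extends Comparable<S>>
      implements Comparable<Pair<P, S>> {
    final P primary;
    final S secondary;

    Pair(P primary, S secondary) {
      this.primary = primary;
      this.secondary = secondary;
    }

    // Sort first by primary and then by secondary.
    @Override
    public int compareTo(Pair<P, S> o) {
      int c = primary.compareTo(o.primary);
      return c != 0 ? c : secondary.compareTo(o.secondary);
    }

    // Equal only if both primaries and both secondaries are equal.
    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Pair)) {
        return false;
      }
      Pair<?, ?> p = (Pair<?, ?>) o;
      return primary.equals(p.primary) && secondary.equals(p.secondary);
    }

    // Hash on the primary only, so keys with equal primaries hash alike.
    @Override
    public int hashCode() {
      return primary.hashCode();
    }
  }

  // Sorts by the full key, then groups by primary only, which is roughly
  // what a grouping comparator would achieve for a secondary sort.
  static Map<String, List<Integer>> sortAndGroup(List<Pair<String, Integer>> keys) {
    List<Pair<String, Integer>> sorted = new ArrayList<>(keys);
    Collections.sort(sorted);
    Map<String, List<Integer>> groups = new LinkedHashMap<>();
    for (Pair<String, Integer> k : sorted) {
      groups.computeIfAbsent(k.primary, x -> new ArrayList<>()).add(k.secondary);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Pair<String, Integer>> keys = Arrays.asList(
        new Pair<String, Integer>("b", 2), new Pair<String, Integer>("a", 3),
        new Pair<String, Integer>("b", 1), new Pair<String, Integer>("a", 1));
    System.out.println(sortAndGroup(keys)); // {a=[1, 3], b=[1, 2]}
  }
}
```

Each group sees its secondaries already in order, which is the whole point of a secondary sort.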

Crunch also provides {{Tuple3}}, {{Tuple4}}, and {{TupleN}} classes, but I 
don't think we need to add equivalents.  If someone really wants that 
capability, they can nest composite keys.

Don't forget to add unit tests!






[jira] [Resolved] (MAPREDUCE-6827) Failed to traverse Iterable values the second time in reduce() method

2017-01-03 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved MAPREDUCE-6827.
-
Resolution: Not A Problem

That is known, documented, and intended behavior.  The {{ValueIterator}}'s 
{{hasNext()}} and {{next()}} methods defer to the {{ReduceContextImpl}}'s 
{{BackupStore}} instance, so creating a new iterator won't help.  The reason we 
only go through the values once is to allow the data to be efficiently streamed.
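
For anyone who needs both the log line and the sum, the usual approach is to do everything in one pass.  Below is a minimal plain-Java sketch of that idea.  {{OnePass}} is a hypothetical stand-in for the reducer's values, not the Hadoop class, and {{logAndSum()}} is an illustrative name.  In a real reducer the framework also reuses the value object, so anything kept beyond the current iteration has to be copied first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SinglePassSketch {
  // Hypothetical stand-in for the reducer's values: every call to iterator()
  // returns the same underlying iterator, so the data can be read only once.
  static final class OnePass<T> implements Iterable<T> {
    private final Iterator<T> source;

    OnePass(Iterator<T> source) {
      this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
      return source;
    }
  }

  // Collects the values for logging and sums them in the same pass.
  static int logAndSum(String key, Iterable<Integer> values) {
    List<String> seen = new ArrayList<>();
    int sum = 0;
    for (Integer v : values) {
      seen.add(v.toString());
      sum += v;
    }
    System.out.println(String.format("reduce(%s, [%s])", key, String.join(", ", seen)));
    return sum;
  }

  public static void main(String[] args) {
    Iterable<Integer> values =
        new OnePass<Integer>(Arrays.asList(1, 2, 3).iterator());
    System.out.println(logAndSum("word", values)); // 6

    // A second loop over the same Iterable finds nothing left to read.
    int leftover = 0;
    for (Integer v : values) {
      leftover += v;
    }
    System.out.println(leftover); // 0
  }
}
```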

> Failed to traverse Iterable values the second time in reduce() method
> -
>
> Key: MAPREDUCE-6827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6827
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 3.0.0-alpha1
> Environment: hadoop2.7.3
>Reporter: javaloveme
>
> Failed to traverse Iterable values the second time in reduce() method
> The following code is a reduce() method (of WordCount):
> {code:title=WordCount.java|borderStyle=solid}
>   public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
>     @Override
>     protected void reduce(Text key, Iterable<IntWritable> values, Context context)
>         throws IOException, InterruptedException {
>       // print some logs
>       List<String> vals = new LinkedList<>();
>       for (IntWritable i : values) {
>         vals.add(i.toString());
>       }
>       System.out.println(String.format(" reduce(%s, [%s])",
>           key, String.join(", ", vals)));
>       // sum of values
>       int sum = 0;
>       for (IntWritable i : values) {
>         sum += i.get();
>       }
>       System.out.println(String.format(" reduced(%s, %s)",
>           key, sum));
>
>       context.write(key, new IntWritable(sum));
>     }
>   }
> {code}
> After running it, we got the result that all sums were zero!
> After debugging, it was found that the second foreach-loop was not executed, 
> and the root cause was the returned value of Iterable.iterator(), it returned 
> the same instance in the two calls called by foreach-loop. In general, 
> Iterable.iterator() should return a new instance in each call, such as 
> ArrayList.iterator().






[jira] [Created] (MAPREDUCE-6776) yarn.app.mapreduce.client.job.max-retries should have a more useful default

2016-09-12 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6776:
---

 Summary: yarn.app.mapreduce.client.job.max-retries should have a 
more useful default
 Key: MAPREDUCE-6776
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6776
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 2.8.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton


The default is 0, so any communication failure results in a client failure.  
Oozie doesn't like that.  If the RM is failing over and Oozie gets a 
communication failure, it assumes the target job has failed.  I propose raising 
the default to something modest like 3 or 5.  The default retry interval is 2s.
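
To illustrate why 0 is so unforgiving, here is a small plain-Java sketch of a bounded retry loop.  It is not the actual client code; {{callWithRetries()}} and {{flaky()}} are hypothetical helpers, and the assumption that the setting counts additional attempts after the first one is mine.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class ClientRetrySketch {
  // Hypothetical helper, not Hadoop's implementation: runs the call once and
  // then retries up to maxRetries more times, sleeping between attempts.
  static <T> T callWithRetries(Callable<T> call, int maxRetries, long retryIntervalMs)
      throws Exception {
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must not be negative");
    }
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxRetries) {
          Thread.sleep(retryIntervalMs);
        }
      }
    }
    throw last;
  }

  // Simulates a service that fails a fixed number of times before answering,
  // for example while the RM is failing over.
  static Callable<String> flaky(int failures) {
    int[] remaining = {failures};
    return () -> {
      if (remaining[0]-- > 0) {
        throw new IOException("connection refused");
      }
      return "RUNNING";
    };
  }

  public static void main(String[] args) throws Exception {
    // Interval shortened from the 2s default so the sketch runs quickly.
    System.out.println(callWithRetries(flaky(2), 3, 10L)); // RUNNING
  }
}
```

With {{maxRetries}} set to 0 the same call fails on the first refused connection, which is the behavior Oozie trips over.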






[jira] [Resolved] (MAPREDUCE-6560) ClientServiceDelegate doesn't handle retries during AM restart as intended

2016-08-31 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved MAPREDUCE-6560.
-
Resolution: Invalid

Looks like I was just wrong.

> ClientServiceDelegate doesn't handle retries during AM restart as intended
> --
>
> Key: MAPREDUCE-6560
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6560
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> In the {{invoke()}} method, I found the following code:
> {code}
>   private AtomicBoolean usingAMProxy = new AtomicBoolean(false);
> ...
>   // if it's AM shut down, do not decrement maxClientRetry as we wait for
>   // AM to be restarted.
>   if (!usingAMProxy.get()) {
>     maxClientRetry--;
>   }
>   usingAMProxy.set(false);
> {code}
> When we create the AM proxy, we set the flag to true.  If we fail to connect, 
> the impact of the flag being true is that the code will try one extra time, 
> giving it 400ms instead of just 300ms.  I can't imagine that's the intended 
> behavior.  After any failure, the flag will forever more be false, but 
> fortunately (?!?) the flag is otherwise unused.
> Looks like I need to do some archeology to figure out how we ended up here.






[jira] [Created] (MAPREDUCE-6719) -libjars should use wildcards to reduce the application footprint in the state store

2016-06-20 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6719:
---

 Summary: -libjars should use wildcards to reduce the application 
footprint in the state store
 Key: MAPREDUCE-6719
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6719
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 2.8.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical









[jira] [Created] (MAPREDUCE-6714) Refactor UncompressedSplitLineReader.fillBuffer()

2016-06-09 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6714:
---

 Summary: Refactor UncompressedSplitLineReader.fillBuffer()
 Key: MAPREDUCE-6714
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6714
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: Daniel Templeton


MAPREDUCE-6635 made this change:

{code}
-  maxBytesToRead = Math.min(maxBytesToRead,
-      (int)(splitLength - totalBytesRead));
+  long leftBytesForSplit = splitLength - totalBytesRead;
+  // check if leftBytesForSplit exceed Integer.MAX_VALUE
+  if (leftBytesForSplit <= Integer.MAX_VALUE) {
+    maxBytesToRead = Math.min(maxBytesToRead, (int)leftBytesForSplit);
+  }
{code}

The result is one more comparison than necessary and code that's a little 
convoluted.  The code can be simplified as:

{code}
  long leftBytesForSplit = splitLength - totalBytesRead;

  if (leftBytesForSplit < maxBytesToRead) {
    maxBytesToRead = (int)leftBytesForSplit;
  }
{code}

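The comparison auto-promotes {{maxBytesToRead}} to {{long}}, so it cannot 
overflow, and the cast inside the branch is safe because {{leftBytesForSplit}} 
is already known to be smaller than an {{int}} value.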
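
A quick way to convince yourself is to pull the simplified check into a stand-alone method and feed it a remainder larger than {{Integer.MAX_VALUE}}.  The {{clampToSplit()}} method below is a hypothetical wrapper written for this sketch, not code from {{UncompressedSplitLineReader}}, and it assumes {{totalBytesRead}} never exceeds {{splitLength}}.

```java
public class SplitClampSketch {
  // Hypothetical stand-alone version of the simplified check: limits the next
  // read to the bytes left in the split without narrowing a large long first.
  // Assumes totalBytesRead <= splitLength, so the remainder is not negative.
  static int clampToSplit(int maxBytesToRead, long splitLength, long totalBytesRead) {
    long leftBytesForSplit = splitLength - totalBytesRead;

    // maxBytesToRead is promoted to long for this comparison.
    if (leftBytesForSplit < maxBytesToRead) {
      // The remainder is smaller than an int value, so the cast cannot truncate.
      maxBytesToRead = (int) leftBytesForSplit;
    }
    return maxBytesToRead;
  }

  public static void main(String[] args) {
    // Only 100 bytes remain in the split, so the read is capped at 100.
    System.out.println(clampToSplit(4096, 1000L, 900L));
    // Far more than Integer.MAX_VALUE remains, so the buffer size is kept.
    System.out.println(clampToSplit(4096, 5_000_000_000L, 0L));
  }
}
```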






[jira] [Created] (MAPREDUCE-6702) TestMiniMRChildTask.testTaskEnv and TestMiniMRChildTask.testTaskOldEnv are failing

2016-05-17 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6702:
---

 Summary: TestMiniMRChildTask.testTaskEnv and 
TestMiniMRChildTask.testTaskOldEnv are failing
 Key: MAPREDUCE-6702
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6702
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: client
Affects Versions: 3.0.0-alpha1
Reporter: Daniel Templeton









[jira] [Created] (MAPREDUCE-6632) Master.getMasterAddress() should be updated to use YARN-4629

2016-02-10 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6632:
---

 Summary: Master.getMasterAddress() should be updated to use 
YARN-4629
 Key: MAPREDUCE-6632
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6632
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor


The new {{YarnClientUtil.getRmPrincipal()}} method can replace most of the 
{{Master.getMasterAddress()}} method, and should be used to reduce redundancy 
and improve serviceability.





[jira] [Created] (MAPREDUCE-6620) Jobs that did not start are shown as starting in 1969 in the JHS web UI

2016-01-28 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6620:
---

 Summary: Jobs that did not start are shown as starting in 1969 in 
the JHS web UI
 Key: MAPREDUCE-6620
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6620
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.7.2
Reporter: Daniel Templeton
Assignee: Daniel Templeton


If a job fails before it starts, its start time is stored as -1.  The RM UI 
correctly handles negative start times.  The JHS UI does not; it blindly 
converts the value into a date in 1969.
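
The 1969 date falls straight out of treating -1 as milliseconds since the epoch.  The plain-Java sketch below shows the conversion and one possible guard.  {{formatStartTime()}} and the "N/A" placeholder are hypothetical choices for illustration, not the JHS code or the RM UI's actual output.

```java
import java.time.Instant;

public class StartTimeSketch {
  // Hypothetical formatter, not the JHS code: treats a negative start time as
  // "never started" instead of converting it to a date.
  static String formatStartTime(long startTimeMs) {
    if (startTimeMs < 0) {
      return "N/A";
    }
    return Instant.ofEpochMilli(startTimeMs).toString();
  }

  public static void main(String[] args) {
    // Blind conversion of -1 lands one millisecond before the epoch.
    System.out.println(Instant.ofEpochMilli(-1L)); // 1969-12-31T23:59:59.999Z
    // The guarded version reports a placeholder instead.
    System.out.println(formatStartTime(-1L));      // N/A
  }
}
```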





[jira] [Created] (MAPREDUCE-6575) TestMRJobs.setup() should use YarnConfiguration properties instead of bare strings

2015-12-16 Thread Daniel Templeton (JIRA)
Daniel Templeton created MAPREDUCE-6575:
---

 Summary: TestMRJobs.setup() should use YarnConfiguration 
properties instead of bare strings
 Key: MAPREDUCE-6575
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6575
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Daniel Templeton
Assignee: Daniel Templeton


YARN-5870 introduced the following line:

{code}
  conf.setInt("yarn.cluster.max-application-priority", 10);
{code}

It should instead be:

{code}
  conf.setInt(YarnConfiguration.MAX_CLUSTER_LEVEL_APPLICATION_PRIORITY, 10);
{code}


