[jira] [Created] (TEZ-3238) Shuffle service name should be configurable and should not be hardcoded to ‘mapreduce_shuffle’

2016-04-29 Rob Leidle (JIRA)
Rob Leidle created TEZ-3238:
---

 Summary: Shuffle service name should be configurable and should not be hardcoded to ‘mapreduce_shuffle’
 Key: TEZ-3238
 URL: https://issues.apache.org/jira/browse/TEZ-3238
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.8.2
Reporter: Rob Leidle


In YARN there is no guarantee that a shuffle service with a specific name
exists. The setting ‘yarn.nodemanager.aux-services’ holds a list of service
names chosen by the Hadoop cluster administrator. It is merely by convention
that many clusters name their MapReduce shuffle service ‘mapreduce_shuffle’.
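
For illustration, a minimal Java sketch of the check this implies: split the
comma-separated value of ‘yarn.nodemanager.aux-services’ and test whether a
given service name is actually present. The class and method names are
hypothetical, and reading the value from a real yarn-site.xml is left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: checks a comma-separated aux-services value for a service name. */
final class AuxServiceCheck {
    private AuxServiceCheck() {}

    /** Parses a value such as "mapreduce_shuffle, spark_shuffle" into trimmed, non-empty names. */
    static List<String> parseAuxServices(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Returns true only if the administrator configured a service with exactly this name. */
    static boolean isConfigured(String auxServicesValue, String serviceName) {
        return parseAuxServices(auxServicesValue).contains(serviceName);
    }
}
```

On a cluster whose administrator named the service "my_shuffle",
isConfigured("my_shuffle", "mapreduce_shuffle") returns false, which is exactly
the case a hardcoded name cannot handle.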

Tez is hardcoded to use a shuffle service named ‘mapreduce_shuffle’:
https://github.com/apache/tez/blob/TEZ-8/tez-api/src/main/java/org/apache/tez/dag/api/TezConstants.java#L75.
This name should be configurable. The shuffle service is also a hidden
dependency of Tez; the Tez documentation should state that it is required.
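
A minimal sketch of what "configurable" could look like, assuming the name is
read from a configuration map (a java.util.Map stands in for Hadoop's
Configuration here). The key "tez.shuffle.service.id" is hypothetical, not an
existing Tez property; the current hardcoded value is kept as the default.

```java
import java.util.Map;

/** Hypothetical sketch: resolve the shuffle service name from configuration instead of hardcoding it. */
final class ShuffleServiceName {
    /** Hypothetical configuration key; not an existing Tez property. */
    static final String SHUFFLE_SERVICE_KEY = "tez.shuffle.service.id";
    /** Today's hardcoded value, kept as the default for backward compatibility. */
    static final String DEFAULT_SHUFFLE_SERVICE = "mapreduce_shuffle";

    private ShuffleServiceName() {}

    /** Returns the configured name, or the default if the key is absent or blank. */
    static String resolve(Map<String, String> conf) {
        String value = conf.get(SHUFFLE_SERVICE_KEY);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_SHUFFLE_SERVICE;
        }
        return value.trim();
    }
}
```

Falling back to ‘mapreduce_shuffle’ means clusters that follow the convention
need no changes, while clusters with a differently named service can override it.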



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


One to one edges and local fetch

2016-04-29 Rohini Palaniswamy
I was under the assumption that we optimized 1-1 edge scheduling to reuse
containers, or to run on the same node as much as possible and do a local
fetch. But that does not seem to be the case. Is there a jira for this
already? I could not find any.

Regards,
Rohini
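
The optimization described above can be sketched as follows: on a one-to-one
edge, downstream task i reads only from upstream task i, so it can ask to run
on the host where upstream task i ran and, if it lands there, read that output
from local disk instead of fetching it over the network. All names below are
hypothetical and do not reflect Tez's actual scheduler classes.

```java
import java.util.List;

/** Hypothetical model of locality preference across a one-to-one edge. */
final class OneToOneLocality {
    private OneToOneLocality() {}

    /** On a 1-1 edge, downstream task i depends only on upstream task i, so prefer that task's host. */
    static String preferredHost(List<String> upstreamHosts, int taskIndex) {
        return upstreamHosts.get(taskIndex);
    }

    /** A fetch can be served locally only if the consumer actually ran on the producer's host. */
    static boolean canFetchLocally(List<String> upstreamHosts, int taskIndex, String actualHost) {
        return preferredHost(upstreamHosts, taskIndex).equals(actualHost);
    }
}
```

A host preference is only a hint: if the scheduler places the task elsewhere,
canFetchLocally returns false and a normal remote fetch is still needed.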


[jira] [Created] (TEZ-3237) Corrupted shuffle transfers to disk are not detected during transfer

2016-04-29 Jason Lowe (JIRA)
Jason Lowe created TEZ-3237:
---

 Summary: Corrupted shuffle transfers to disk are not detected during transfer
 Key: TEZ-3237
 URL: https://issues.apache.org/jira/browse/TEZ-3237
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jason Lowe


When a shuffle transfer is larger than the single transfer limit, it is
written straight to disk during the transfer. Unfortunately, no checksum
validation is performed during that transfer, so if the data is corrupted at
the source or in transit, it goes undetected. The error is detected only
later, when the task tries to consume the transferred data, and by that point
it is too late to blame the source task for the error.
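
A minimal sketch of validating during the transfer, assuming the source sends
an expected CRC32 for the whole payload. It uses java.util.zip.CheckedInputStream
to compute the checksum as bytes stream to the sink; it does not reproduce
Tez's actual on-wire format, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

/** Hypothetical sketch: verify a CRC32 while copying a shuffle payload straight to disk. */
final class VerifyingCopy {
    private VerifyingCopy() {}

    /**
     * Copies all bytes from source to sink, updating a CRC32 as the data streams by.
     * Throws if the result does not match expectedCrc, so corruption is detected during
     * the fetch itself and can be attributed to the source. Returns the byte count.
     */
    static long copyAndVerify(InputStream source, OutputStream sink, long expectedCrc) throws IOException {
        CheckedInputStream in = new CheckedInputStream(source, new CRC32());
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            sink.write(buf, 0, n);
            total += n;
        }
        long actual = in.getChecksum().getValue();
        if (actual != expectedCrc) {
            throw new IOException("Checksum mismatch: expected " + expectedCrc + " but got " + actual);
        }
        return total;
    }
}
```

The bytes have already reached the sink when a mismatch is found, so a caller
writing to a file would delete the partial output and report the fetch failure
against the source task.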


