[jira] [Commented] (TEZ-14) Support for speculation of slow tasks

2014-11-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195367#comment-14195367 ] Jason Lowe commented on TEZ-14: --- Apologies for the late reply. I haven't had time to look at

[jira] [Commented] (TEZ-2018) Job Tracking and History URL should point to the Tez UI

2015-02-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305247#comment-14305247 ] Jason Lowe commented on TEZ-2018: - bq. Maybe this plugin could be enhanced to do the

[jira] [Commented] (TEZ-2073) SimpleHistoryLoggingService cannot be read by log aggregation (umask)

2015-02-11 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316386#comment-14316386 ] Jason Lowe commented on TEZ-2073: - bq. Is the fs.permissions.umask-mode applicable to all

[jira] [Commented] (TEZ-2073) SimpleHistoryLoggingService cannot be read by log aggregation (umask)

2015-02-11 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317013#comment-14317013 ] Jason Lowe commented on TEZ-2073: - +1 lgtm. RawLocalFileSystem explicitly overrides the

[jira] [Commented] (TEZ-2393) Tez pickup PATH env from gateway machine

2015-05-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527296#comment-14527296 ] Jason Lowe commented on TEZ-2393: - I think the main problems will be from anyone who

[jira] [Commented] (TEZ-2311) AM can hang if kill received while recovering from previous attempt

2015-05-15 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545652#comment-14545652 ] Jason Lowe commented on TEZ-2311: - It's kind of a pain to scrub the logs for posting, but I

[jira] [Commented] (TEZ-2319) DAG history in HDFS

2015-04-15 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496800#comment-14496800 ] Jason Lowe commented on TEZ-2319: - MR does not dump the final state all at once, rather it

[jira] [Updated] (TEZ-2304) InvalidStateTransitonException TA_SCHEDULE at START_WAIT during recovery

2015-04-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2304: Attachment: 168563_recovery.gz Posting the logs of the second AM attempt up to the point of the first invalid

[jira] [Created] (TEZ-2311) AM can hang if kill received while recovering from previous attempt

2015-04-13 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2311: --- Summary: AM can hang if kill received while recovering from previous attempt Key: TEZ-2311 URL: https://issues.apache.org/jira/browse/TEZ-2311 Project: Apache Tez

[jira] [Commented] (TEZ-2311) AM can hang if kill received while recovering from previous attempt

2015-04-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492617#comment-14492617 ] Jason Lowe commented on TEZ-2311: - The AM appeared to hang because it was still waiting for

[jira] [Commented] (TEZ-2304) InvalidStateTransitonException TA_SCHEDULE at START_WAIT during recovery

2015-04-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488258#comment-14488258 ] Jason Lowe commented on TEZ-2304: - Log snippets showing state transitions and eventual

[jira] [Created] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-09 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2303: --- Summary: ConcurrentModificationException while processing recovery Key: TEZ-2303 URL: https://issues.apache.org/jira/browse/TEZ-2303 Project: Apache Tez Issue Type:

[jira] [Created] (TEZ-2304) InvalidStateTransitonException TA_SCHEDULE at START_WAIT during recovery

2015-04-09 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2304: --- Summary: InvalidStateTransitonException TA_SCHEDULE at START_WAIT during recovery Key: TEZ-2304 URL: https://issues.apache.org/jira/browse/TEZ-2304 Project: Apache Tez

[jira] [Commented] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14488219#comment-14488219 ] Jason Lowe commented on TEZ-2303: - {noformat} 2015-04-09 19:36:11,231 INFO [main]

[jira] [Updated] (TEZ-2485) Reduce the Resource Load on the Timeline Server

2015-06-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2485: Attachment: ats-omit-dup-display-names-and-zero-counters_v2.patch Minor fix to patch, was trying to apply

[jira] [Updated] (TEZ-2485) Reduce the Resource Load on the Timeline Server

2015-06-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2485: Attachment: ats-omit-dup-display-names-and-zero-counters.patch Posting a prototype patch that does two main

[jira] [Commented] (TEZ-2549) Reduce Counter Load on the Timeline Server

2015-06-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590735#comment-14590735 ] Jason Lowe commented on TEZ-2549: - bq. can you shed some more light on the counter proto

[jira] [Commented] (TEZ-2018) App Tracking and History URL should point to the Tez UI

2015-06-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589809#comment-14589809 ] Jason Lowe commented on TEZ-2018: - bq. Will Application History Server continue to hold the

[jira] [Commented] (TEZ-2549) Reduce Counter Load on the Timeline Server

2015-06-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589854#comment-14589854 ] Jason Lowe commented on TEZ-2549: - Thanks for moving the patch forward, Jon. Couple of

[jira] [Created] (TEZ-2711) Tez fails to submit a job where user.name is not UGI user

2015-08-11 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2711: --- Summary: Tez fails to submit a job where user.name is not UGI user Key: TEZ-2711 URL: https://issues.apache.org/jira/browse/TEZ-2711 Project: Apache Tez Issue Type:

[jira] [Commented] (TEZ-2711) Tez fails to submit a job where user.name is not UGI user

2015-08-11 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692287#comment-14692287 ] Jason Lowe commented on TEZ-2711: - This same scenario works for MapReduce jobs because the

[jira] [Updated] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-08-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2628: Attachment: TEZ-2628.002.patch Yes, there's a problem with retention on a secure cluster. The timeline

[jira] [Commented] (TEZ-2726) Handle invalid number of partitions for SCATTER-GATHER edge

2015-08-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699963#comment-14699963 ] Jason Lowe commented on TEZ-2726: - +1 for throwing an exception. I think it could be

[jira] [Created] (TEZ-2677) NPE while submitting MRRSleepJob if tez.runtime.sort.threads is set

2015-07-31 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2677: --- Summary: NPE while submitting MRRSleepJob if tez.runtime.sort.threads is set Key: TEZ-2677 URL: https://issues.apache.org/jira/browse/TEZ-2677 Project: Apache Tez

[jira] [Created] (TEZ-2679) Admin forms of launch env and java opts settings

2015-07-31 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2679: --- Summary: Admin forms of launch env and java opts settings Key: TEZ-2679 URL: https://issues.apache.org/jira/browse/TEZ-2679 Project: Apache Tez Issue Type:

[jira] [Updated] (TEZ-2679) Admin forms of launch env settings

2015-07-31 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2679: Summary: Admin forms of launch env settings (was: Admin forms of launch env and java opts settings) Thanks,

[jira] [Commented] (TEZ-2654) Support for getting counters of running tasks

2015-07-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646232#comment-14646232 ] Jason Lowe commented on TEZ-2654: - Since we already provide a REST service for getting dag

[jira] [Commented] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-08-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696028#comment-14696028 ] Jason Lowe commented on TEZ-2628: - One advantage of getting Hive to post this via HDFS is

[jira] [Commented] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-08-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695967#comment-14695967 ] Jason Lowe commented on TEZ-2628: - Alternatively, if the Hive server knows which app ID the

[jira] [Commented] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-08-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695937#comment-14695937 ] Jason Lowe commented on TEZ-2628: - bq. It seems to me that all the code in EntityLogger

[jira] [Commented] (TEZ-2311) AM can hang if kill received while recovering from previous attempt

2015-07-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638970#comment-14638970 ] Jason Lowe commented on TEZ-2311: - Ideally we would like this fixed in a 0.7 patch release

[jira] [Updated] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-07-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2628: Attachment: TEZ-2628.001.patch Posting a prototype patch a bit early since there was some interest expressed

[jira] [Created] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-07-20 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2628: --- Summary: History logging plugin to write ATS events to HDFS Key: TEZ-2628 URL: https://issues.apache.org/jira/browse/TEZ-2628 Project: Apache Tez Issue Type:

[jira] [Commented] (TEZ-2581) Umbrella for Tez Recovery Redesign

2015-10-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967361#comment-14967361 ] Jason Lowe commented on TEZ-2581: - I noticed the document doesn't discuss much about how user-provided code

[jira] [Commented] (TEZ-808) Handle task attempts that are not making progress

2015-10-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981205#comment-14981205 ] Jason Lowe commented on TEZ-808: Thanks for updating the patch, Bikas! Patch still doesn't treat 0 as a

[jira] [Commented] (TEZ-808) Handle task attempts that are not making progress

2015-10-27 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977056#comment-14977056 ] Jason Lowe commented on TEZ-808: bq. If we use a boolean, then I think it will be fine to not use volatile

[jira] [Commented] (TEZ-2914) Ability to limit vertex concurrency

2015-10-28 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979050#comment-14979050 ] Jason Lowe commented on TEZ-2914: - If we're trying to port the capability of MAPREDUCE-5583 then it would be

[jira] [Commented] (TEZ-808) Handle task attempts that are not making progress

2015-10-26 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974432#comment-14974432 ] Jason Lowe commented on TEZ-808: Thanks for the patch, Bikas! I haven't had a chance to look at it in great

[jira] [Commented] (TEZ-2581) Umbrella for Tez Recovery Redesign

2015-10-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969319#comment-14969319 ] Jason Lowe commented on TEZ-2581: - bq. Ideally we should provide API in VertexManagerPlugin to allow user to

[jira] [Updated] (TEZ-808) Handle task attempts that are not making progress

2015-11-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-808: --- Attachment: TEZ-808.branch-0.7.patch Would it be possible to backport this to branch-0.7? We're going to be on

[jira] [Commented] (TEZ-2918) Make progress notifications in IOs

2015-11-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988152#comment-14988152 ] Jason Lowe commented on TEZ-2918: - Latest patch lgtm, assuming the test is unrelated. Only nit is that it's

[jira] [Commented] (TEZ-2918) Make progress notifications in IOs

2015-11-02 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986084#comment-14986084 ] Jason Lowe commented on TEZ-2918: - Thanks for the patch, Bikas! I agree that using AtomicBoolean.lazySet is

[jira] [Assigned] (TEZ-2886) Ability to merge AM credentials with DAG credentials

2015-10-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned TEZ-2886: --- Assignee: Jason Lowe bq. Did you mean "AM NodeManager" and "Other NodeManagers" ? Technically yes.

[jira] [Commented] (TEZ-2679) Admin forms of launch env settings

2015-10-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961288#comment-14961288 ] Jason Lowe commented on TEZ-2679: - +1 looks OK to me. This makes the behavior more in line with how

[jira] [Commented] (TEZ-808) Handle task attempts that are not making progress

2015-10-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957536#comment-14957536 ] Jason Lowe commented on TEZ-808: bq. Fixing IOs vs Fixing processor callback - which one of these would

[jira] [Commented] (TEZ-2679) Admin forms of launch env settings

2015-10-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957624#comment-14957624 ] Jason Lowe commented on TEZ-2679: - bq. Before they would only get './' and now they will get './' +

[jira] [Commented] (TEZ-808) Handle task attempts that are not making progress

2015-10-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957683#comment-14957683 ] Jason Lowe commented on TEZ-808: Sounds pretty good to me. Only suggestion is to have a pure progress() API

[jira] [Commented] (TEZ-808) Handle task attempts that are not making progress

2015-10-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957448#comment-14957448 ] Jason Lowe commented on TEZ-808: bq. Like I said in item 1) above - add finer grained updates of processed

[jira] [Commented] (TEZ-2872) Tez AM can be overwhelmed by TezTaskUmbilicalProtocol.getTask responses

2015-10-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949368#comment-14949368 ] Jason Lowe commented on TEZ-2872: - This is similar to a scenario MapReduce encountered before, see

[jira] [Created] (TEZ-2872) Tez AM can be overwhelmed by TezTaskUmbilicalProtocol.getTask responses

2015-10-08 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2872: --- Summary: Tez AM can be overwhelmed by TezTaskUmbilicalProtocol.getTask responses Key: TEZ-2872 URL: https://issues.apache.org/jira/browse/TEZ-2872 Project: Apache Tez

[jira] [Commented] (TEZ-2872) Tez AM can be overwhelmed by TezTaskUmbilicalProtocol.getTask responses

2015-10-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949535#comment-14949535 ] Jason Lowe commented on TEZ-2872: - The problem with TEZ-754 is that it only helps when containers are

[jira] [Updated] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-10-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2628: Attachment: TEZ-2628.004.patch Minor update to the patch to fix a bug that [~jeagles] pointed out offline.

[jira] [Commented] (TEZ-808) Handle task attempts that are not making progress

2015-10-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955483#comment-14955483 ] Jason Lowe commented on TEZ-808: Correct, in this latest case the tasks were part of Pig streaming jobs and

[jira] [Created] (TEZ-2886) Ability to merge AM credentials with DAG credentials

2015-10-13 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2886: --- Summary: Ability to merge AM credentials with DAG credentials Key: TEZ-2886 URL: https://issues.apache.org/jira/browse/TEZ-2886 Project: Apache Tez Issue Type:

[jira] [Commented] (TEZ-2886) Ability to merge AM credentials with DAG credentials

2015-10-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955549#comment-14955549 ] Jason Lowe commented on TEZ-2886: - For example, the RM will automatically add tokens for the log aggregation

[jira] [Commented] (TEZ-808) Handle task attempts that are not making progress

2015-10-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955389#comment-14955389 ] Jason Lowe commented on TEZ-808: Just ran across the lack of this for some Tez jobs that hung forever. Tasks

[jira] [Commented] (TEZ-808) Handle task attempts that are not making progress

2015-10-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955678#comment-14955678 ] Jason Lowe commented on TEZ-808: No, we can't key off the progress field. In practice progress can go

[jira] [Commented] (TEZ-2864) Vertex group output commit overwrites without failing on conflict

2015-10-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949296#comment-14949296 ] Jason Lowe commented on TEZ-2864: - Is this a Tez issue? I'm not sure how Tez is supposed to know about, and

[jira] [Commented] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-08-26 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715831#comment-14715831 ] Jason Lowe commented on TEZ-2628: - I believe this is a bug in the MemoryTimelineStore. The

[jira] [Created] (TEZ-2787) Tez AM should have java.io.tmpdir=./tmp to be consistent with tasks

2015-09-08 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2787: --- Summary: Tez AM should have java.io.tmpdir=./tmp to be consistent with tasks Key: TEZ-2787 URL: https://issues.apache.org/jira/browse/TEZ-2787 Project: Apache Tez

[jira] [Commented] (TEZ-2628) History logging plugin to write ATS events to HDFS

2015-09-28 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933876#comment-14933876 ] Jason Lowe commented on TEZ-2628: - Yes, sorry this wasn't clear. The group of the timeline server

[jira] [Commented] (TEZ-2914) Ability to limit vertex concurrency

2015-12-01 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034642#comment-15034642 ] Jason Lowe commented on TEZ-2914: - One advantage to doing it at the YARN level is that we can tell the RM

[jira] [Comment Edited] (TEZ-2914) Ability to limit vertex concurrency

2015-12-01 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034642#comment-15034642 ] Jason Lowe edited comment on TEZ-2914 at 12/1/15 9:44 PM: -- One advantage to doing

[jira] [Updated] (TEZ-2972) Ability for Tez AM to ignore node updates from YARN

2015-12-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2972: Attachment: TEZ-2972.001.patch Patch that adds a tez.am.node-updates.enabled property to control whether the

[jira] [Created] (TEZ-2972) Ability for Tez AM to ignore node updates from YARN

2015-12-04 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2972: --- Summary: Ability for Tez AM to ignore node updates from YARN Key: TEZ-2972 URL: https://issues.apache.org/jira/browse/TEZ-2972 Project: Apache Tez Issue Type:

[jira] [Assigned] (TEZ-2972) Ability for Tez AM to ignore node updates from YARN

2015-12-04 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned TEZ-2972: --- Assignee: Jason Lowe This can also be important on clusters where the UNHEALTHY state is used as part

[jira] [Updated] (TEZ-2972) Ability for Tez AM to ignore node updates from YARN

2015-12-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2972: Attachment: TEZ-2972.002.patch Thanks for the review, Bikas! I updated the patch to avoid sending the node

[jira] [Commented] (TEZ-3009) Errors that occur during container task acquisition are not logged

2015-12-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062896#comment-15062896 ] Jason Lowe commented on TEZ-3009: - Sample container log showing the problem: {noformat} 2015-12-11

[jira] [Created] (TEZ-3009) Errors that occur during container task acquisition are not logged

2015-12-17 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3009: --- Summary: Errors that occur during container task acquisition are not logged Key: TEZ-3009 URL: https://issues.apache.org/jira/browse/TEZ-3009 Project: Apache Tez

[jira] [Created] (TEZ-3010) Container task acquisition has no retries for errors

2015-12-17 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3010: --- Summary: Container task acquisition has no retries for errors Key: TEZ-3010 URL: https://issues.apache.org/jira/browse/TEZ-3010 Project: Apache Tez Issue Type: Bug

[jira] [Updated] (TEZ-2972) Ability for Tez AM to ignore node updates from YARN

2015-12-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2972: Hadoop Flags: Incompatible change > Ability for Tez AM to ignore node updates from YARN >

[jira] [Updated] (TEZ-2972) Ability for Tez AM to ignore node updates from YARN

2015-12-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2972: Attachment: TEZ-2972.003.patch Updated the patch to use tez.am.node-unhealthy-reschedule-tasks instead of

[jira] [Updated] (TEZ-2972) Avoid task rescheduling when a node turns unhealthy

2016-01-05 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2972: Fix Version/s: 0.7.1 Thanks, Bikas! I committed the branch-0.7 patch. > Avoid task rescheduling when a node

[jira] [Commented] (TEZ-3009) Errors that occur during container task acquisition are not logged

2015-12-18 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064650#comment-15064650 ] Jason Lowe commented on TEZ-3009: - I don't see any indication it's fixed in master. From TezChild.run:

[jira] [Updated] (TEZ-3009) Errors that occur during container task acquisition are not logged

2015-12-18 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-3009: Attachment: TEZ-3009.001.patch Patch that adds logging of errors that occur during task fetch. Manually

[jira] [Updated] (TEZ-2972) Avoid task rescheduling when a node turns unhealthy

2015-12-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-2972: Attachment: TEZ-2972.003.addendum.patch TEZ-2972-branch-0.7.001.patch Attached is a patch for

[jira] [Updated] (TEZ-3009) Errors that occur during container task acquisition are not logged

2015-12-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-3009: Attachment: TEZ-3009.002.patch Thanks for the review, Sid! Updated the patch to log at the ERROR level

[jira] [Commented] (TEZ-3293) Fetch failures can cause a shuffle hang waiting for memory merge that never starts

2016-06-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321370#comment-15321370 ] Jason Lowe commented on TEZ-3293: - The same type of error was fixed in MapReduce's version of the

[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements

2016-06-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325186#comment-15325186 ] Jason Lowe commented on TEZ-3296: - bq. Could you please help me understand the logic to make these unique.

[jira] [Created] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements

2016-06-09 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3296: --- Summary: Tez job can hang if two vertices at the same root distance have different task requirements Key: TEZ-3296 URL: https://issues.apache.org/jira/browse/TEZ-3296 Project:

[jira] [Updated] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements

2016-06-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-3296: Attachment: TEZ-3296.001.patch Patch that changes the container priority calculations to generate a unique

[jira] [Updated] (TEZ-3293) Fetch failures can cause a shuffle hang waiting for memory merge that never starts

2016-06-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-3293: Attachment: TEZ-3293.001.patch Patch to have unreserve only adjust usedMemory so it's symmetrical with the

[jira] [Created] (TEZ-3293) Fetch failures can cause a shuffle hang waiting for memory merge that never starts

2016-06-08 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3293: --- Summary: Fetch failures can cause a shuffle hang waiting for memory merge that never starts Key: TEZ-3293 URL: https://issues.apache.org/jira/browse/TEZ-3293 Project: Apache

[jira] [Updated] (TEZ-3293) Fetch failures can cause a shuffle hang waiting for memory merge that never starts

2016-06-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-3293: Priority: Critical (was: Major) > Fetch failures can cause a shuffle hang waiting for memory merge that

[jira] [Updated] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements

2016-06-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-3296: Attachment: taskschedulerlog We no longer have the logs from the original job that hung. However it's easy

[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements

2016-06-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334521#comment-15334521 ] Jason Lowe commented on TEZ-3296: - bq. Could you please attach the task scheduler logs for the hung job and

[jira] [Created] (TEZ-3306) Improve container priority assignments for vertices

2016-06-16 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3306: --- Summary: Improve container priority assignments for vertices Key: TEZ-3306 URL: https://issues.apache.org/jira/browse/TEZ-3306 Project: Apache Tez Issue Type:

[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements

2016-06-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334129#comment-15334129 ] Jason Lowe commented on TEZ-3296: - bq. Wondering why the app was hung. As I mentioned in the description

[jira] [Updated] (TEZ-3036) Tez AM can hang on startup with no indication of error

2016-01-15 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-3036: Attachment: TEZ-3036.001.patch Attaching a prototype patch that seems to fix the issue. This has the

[jira] [Commented] (TEZ-3036) Tez AM can hang on startup with no indication of error

2016-01-15 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102478#comment-15102478 ] Jason Lowe commented on TEZ-3036: - My apologies, I misread the heap dump info. NoSuchMethodError was being

[jira] [Created] (TEZ-3036) Tez AM can hang on startup with no indication of error

2016-01-13 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3036: --- Summary: Tez AM can hang on startup with no indication of error Key: TEZ-3036 URL: https://issues.apache.org/jira/browse/TEZ-3036 Project: Apache Tez Issue Type: Bug

[jira] [Commented] (TEZ-3036) Tez AM can hang on startup with no indication of error

2016-01-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097107#comment-15097107 ] Jason Lowe commented on TEZ-3036: - In this particular instance the hang occurred because

[jira] [Commented] (TEZ-3036) Tez AM can hang on startup with no indication of error

2016-01-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097108#comment-15097108 ] Jason Lowe commented on TEZ-3036: - Haven't verified this, but I suspect this can be replicated by simply

[jira] [Created] (TEZ-3103) Shuffle can hang when memory to memory merging enabled

2016-02-08 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3103: --- Summary: Shuffle can hang when memory to memory merging enabled Key: TEZ-3103 URL: https://issues.apache.org/jira/browse/TEZ-3103 Project: Apache Tez Issue Type: Bug

[jira] [Updated] (TEZ-3102) Fetch failure of a speculated task causes job hang

2016-02-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-3102: Attachment: TEZ-3102.001.patch Attaching a patch that does sufficient processing of the kill event for the

[jira] [Commented] (TEZ-3102) Fetch failure of a speculated task causes job hang

2016-02-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137056#comment-15137056 ] Jason Lowe commented on TEZ-3102: - Hang occurs because TaskImpl.shouldScheduleNewAttempt returns false as it

[jira] [Created] (TEZ-3102) Fetch failure of a speculated task causes job hang

2016-02-08 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3102: --- Summary: Fetch failure of a speculated task causes job hang Key: TEZ-3102 URL: https://issues.apache.org/jira/browse/TEZ-3102 Project: Apache Tez Issue Type: Bug

[jira] [Commented] (TEZ-1944) OOM when using tez.runtime.shuffle.memory-to-memory.enable=true

2016-02-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137059#comment-15137059 ] Jason Lowe commented on TEZ-1944: - This seems likely caused by the same problem reported in TEZ-1911. > OOM

[jira] [Created] (TEZ-3114) Shuffle OOM due to EventMetaData flood

2016-02-11 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-3114: --- Summary: Shuffle OOM due to EventMetaData flood Key: TEZ-3114 URL: https://issues.apache.org/jira/browse/TEZ-3114 Project: Apache Tez Issue Type: Bug Affects

[jira] [Commented] (TEZ-3114) Shuffle OOM due to EventMetaData flood

2016-02-11 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/TEZ-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142873#comment-15142873 ] Jason Lowe commented on TEZ-3114: - There's no flow control to prevent shuffle transfer events from arriving
