[jira] [Commented] (TEZ-4066) Upgrade servlet-api from 2.5 to 3.1.0

2019-05-06 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834205#comment-16834205
 ] 

Kuhu Shukla commented on TEZ-4066:
--

Looks good, with one minor nit:

Can we change the LICENSE files that document servlet-api.jar to the new name 
of the jar? Let me know if you have any thoughts on this, [~jeagles]. Thanks 
a lot for the report and the patch.

> Upgrade servlet-api from 2.5 to 3.1.0
> -
>
> Key: TEZ-4066
> URL: https://issues.apache.org/jira/browse/TEZ-4066
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4066.001.patch
>
>
> Oozie launcher jobs that try to launch Tez jobs now fail to render the Oozie 
> Launcher Job AM because both servlet-api 2.5 (from Tez) and 3.1.0 (from Hadoop) 
> are on the classpath. Tez should sync with the servlet-api version from the Tez 
> master branch, which only supports Hadoop 3+.
> {code}
> 2019-04-30 14:53:02,747 WARN [qtp1213419524-119] 
> org.eclipse.jetty.server.HttpChannel:
> java.lang.NoSuchMethodError: 
> javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:688)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>   at 
> org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
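For anyone hitting the same trace: `isAsyncStarted()` was added to the request interface in Servlet 3.0, so a `NoSuchMethodError` on it suggests the `HttpServletRequest` class that actually got loaded came from the 2.5 jar. A minimal diagnostic sketch (a hypothetical helper, not part of Tez, Oozie or Jetty), assuming it can be run with the same classpath as the failing AM, that reports which jar a class was loaded from and whether the method is there:

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

final class ServletApiCheck {
    // Reports where `className` was loaded from and whether it exposes a
    // public no-arg method named `methodName` (inherited ones included).
    // The class is loaded without being initialized, so the check has no side effects.
    static String describe(String className, String methodName) {
        Class<?> cls;
        try {
            cls = Class.forName(className, false, ServletApiCheck.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return className + " not on classpath";
        }
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // JDK bootstrap classes have no code source.
        String location = (src == null || src.getLocation() == null)
                ? "unknown" : src.getLocation().toString();
        boolean present = false;
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == 0) {
                present = true;
                break;
            }
        }
        return className + " loaded from " + location + "; "
                + methodName + "() " + (present ? "present" : "MISSING");
    }

    public static void main(String[] args) {
        // MISSING here points at a pre-3.0 servlet-api jar winning on the classpath.
        System.out.println(describe("javax.servlet.http.HttpServletRequest", "isAsyncStarted"));
    }
}
```

The jar location in the output is usually the quickest way to tell which dependency dragged the old servlet-api in.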



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (TEZ-4058) Changes for 0.9.2 release

2019-04-04 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla resolved TEZ-4058.
--
   Resolution: Fixed
Fix Version/s: 0.10.1

> Changes for 0.9.2 release
> -
>
> Key: TEZ-4058
> URL: https://issues.apache.org/jira/browse/TEZ-4058
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.10.1
>
> Attachments: TEZ-4058.001.patch
>
>
> Update Tez_DOAP.rtf, index.md etc. for 0.9.2 release.





[jira] [Commented] (TEZ-4058) Changes for 0.9.2 release

2019-03-29 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805569#comment-16805569
 ] 

Kuhu Shukla commented on TEZ-4058:
--

Committing to master.

> Changes for 0.9.2 release
> -
>
> Key: TEZ-4058
> URL: https://issues.apache.org/jira/browse/TEZ-4058
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4058.001.patch
>
>
> Update Tez_DOAP.rtf, index.md etc. for 0.9.2 release.





[jira] [Updated] (TEZ-4058) Changes for 0.9.2 release

2019-03-29 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4058:
-
Attachment: TEZ-4058.001.patch

> Changes for 0.9.2 release
> -
>
> Key: TEZ-4058
> URL: https://issues.apache.org/jira/browse/TEZ-4058
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4058.001.patch
>
>
> Update Tez_DOAP.rtf, index.md etc. for 0.9.2 release.





[jira] [Created] (TEZ-4058) Changes for 0.9.2 release

2019-03-29 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-4058:


 Summary: Changes for 0.9.2 release
 Key: TEZ-4058
 URL: https://issues.apache.org/jira/browse/TEZ-4058
 Project: Apache Tez
  Issue Type: Bug
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


Update Tez_DOAP.rtf, index.md etc. for 0.9.2 release.





[jira] [Commented] (TEZ-4031) Support tez gitbox migration

2019-03-19 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796471#comment-16796471
 ] 

Kuhu Shukla commented on TEZ-4031:
--

I made a mistake while committing the patch for this: an older version of the 
patch went in. I will revert it and commit v4 instead. My +1 is/was for v4. Sorry 
about this. 

> Support tez gitbox migration
> 
>
> Key: TEZ-4031
> URL: https://issues.apache.org/jira/browse/TEZ-4031
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-4031.001.patch, TEZ-4031.002.patch, 
> TEZ-4031.003.patch, TEZ-4031.004.patch
>
>
> {code}
> $ git grep git-wip
> Tez_DOAP.rdf: rdf:resource="https://git-wip-us.apache.org/repos/asf/tez.git"/>
> Tez_DOAP.rdf: rdf:resource="https://git-wip-us.apache.org/repos/asf?p=tez.git"/>
> docs/src/site/site.xml:   href="https://git-wip-us.apache.org/repos/asf/tez.git" alt="Use git clone 
> https://git-wip-us.apache.org/repos/asf/tez.git" />
> pom.xml:
> scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-api/src/test/java/org/apache/tez/common/TestVersionInfo.java:  final 
> String scmUrl = "scm:git:https://git-wip-us.apache.org/repos/asf/tez.git";
> tez-api/src/test/resources/test1-version-info.properties:scmurl=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-api/src/test/resources/test3-version-info.properties:scmurl=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-ui/src/main/webapp/package.json:"url": 
> "https://git-wip-us.apache.org/repos/asf/tez.git"
> {code}
> In addition, the cwiki needs to be updated:
> https://cwiki.apache.org/confluence/display/TEZ/Committer+Guide
> https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez
> https://cwiki.apache.org/confluence/display/TEZ/Making+a+TEZ+Release
> https://cwiki.apache.org/confluence/display/TEZ/Updating+the+Tez+Website
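Most of the in-repo work listed above is a host rename. A rough Java sketch of that substitution, assuming the target is `gitbox.apache.org` with the same `/repos/asf/...` paths (the ticket does not spell out the new URLs) and that plain string replacement is enough; the cwiki pages would still need a manual edit:

```java
import java.util.ArrayList;
import java.util.List;

final class GitboxRewrite {
    // Old ASF git host, as found by `git grep git-wip` in the ticket.
    static final String OLD_HOST = "git-wip-us.apache.org";
    // Assumption: the replacement host after the gitbox migration.
    static final String NEW_HOST = "gitbox.apache.org";

    // Rewrites every occurrence of the old host in one line of text.
    static String rewrite(String line) {
        return line.replace(OLD_HOST, NEW_HOST);
    }

    // Rewrites all lines in place and returns how many actually changed,
    // which can be compared against the number of `git grep` hits.
    static int rewriteAll(List<String> lines) {
        int changed = 0;
        for (int i = 0; i < lines.size(); i++) {
            String updated = rewrite(lines.get(i));
            if (!updated.equals(lines.get(i))) {
                lines.set(i, updated);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("scmurl=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git");
        lines.add("unrelated line");
        int changed = rewriteAll(lines);
        System.out.println(changed + " line(s) changed: " + lines);
    }
}
```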





[jira] [Resolved] (TEZ-4056) Update version for 0.9.2 release

2019-03-19 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla resolved TEZ-4056.
--
Resolution: Fixed

Committed to 0.9.2
Thanks [~jeagles]

> Update version for 0.9.2 release
> 
>
> Key: TEZ-4056
> URL: https://issues.apache.org/jira/browse/TEZ-4056
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2
>
> Attachments: TEZ-4056.001.patch
>
>
> Tracks the release process subsection:
> {code}
> mvn versions:set -DnewVersion="x.y.z"
> {code}
> CC: [~jeagles]





[jira] [Created] (TEZ-4056) Update version for 0.9.2 release

2019-03-19 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-4056:


 Summary: Update version for 0.9.2 release
 Key: TEZ-4056
 URL: https://issues.apache.org/jira/browse/TEZ-4056
 Project: Apache Tez
  Issue Type: Bug
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla
 Fix For: 0.9.2


Tracks the release process subsection:
{code}
mvn versions:set -DnewVersion="x.y.z"
{code}
CC: [~jeagles]





[jira] [Updated] (TEZ-4056) Update version for 0.9.2 release

2019-03-19 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4056:
-
Attachment: TEZ-4056.001.patch

> Update version for 0.9.2 release
> 
>
> Key: TEZ-4056
> URL: https://issues.apache.org/jira/browse/TEZ-4056
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2
>
> Attachments: TEZ-4056.001.patch
>
>
> Tracks the release process subsection:
> {code}
> mvn versions:set -DnewVersion="x.y.z"
> {code}
> CC: [~jeagles]





[jira] [Updated] (TEZ-3992) Update commons-codec from 1.4 to 1.11

2019-03-19 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3992:
-
Fix Version/s: (was: 0.9.next)

> Update commons-codec from 1.4 to 1.11
> -
>
> Key: TEZ-3992
> URL: https://issues.apache.org/jira/browse/TEZ-3992
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Priority: Major
> Attachments: TEZ-3992.01.patch
>
>
> Commons Codec 1.4 is from 2009; maybe we should try an update.





[jira] [Updated] (TEZ-3994) Upgrade maven-surefire-plugin to 0.21.0 to support yetus

2019-03-19 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3994:
-
Fix Version/s: 0.9.2

> Upgrade maven-surefire-plugin to 0.21.0 to support yetus
> 
>
> Key: TEZ-3994
> URL: https://issues.apache.org/jira/browse/TEZ-3994
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.9.next, 0.9.2, 0.10.1
>
> Attachments: TEZ-3994.001.patch
>
>






[jira] [Updated] (TEZ-3992) Update commons-codec from 1.4 to 1.11

2019-03-19 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3992:
-
Fix Version/s: 0.10.1

> Update commons-codec from 1.4 to 1.11
> -
>
> Key: TEZ-3992
> URL: https://issues.apache.org/jira/browse/TEZ-3992
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Priority: Major
> Fix For: 0.10.1
>
> Attachments: TEZ-3992.01.patch
>
>
> Commons Codec 1.4 is from 2009; maybe we should try an update.





[jira] [Updated] (TEZ-3960) Better error handling in proto history logger and add doAs support.

2019-03-19 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3960:
-
Fix Version/s: 0.9.2

> Better error handling in proto history logger and add doAs support.
> ---
>
> Key: TEZ-3960
> URL: https://issues.apache.org/jira/browse/TEZ-3960
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
>Priority: Major
> Fix For: 0.9.next, 0.9.2, 0.10.0
>
> Attachments: TEZ-3960.01.patch, TEZ-3960.02.patch
>
>
> DagManifestScanner gets stuck on a day's logs if there are errors in them. 
> Fix it by using a fixed number of retries.
> The scanner should be able to use doAs to ensure it can read files if run 
> using a proxy admin user.
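The retry part of the fix can be sketched as a generic bounded-retry helper. This is a hypothetical illustration, not the actual DagManifestScanner code, and it leaves out the `doAs` part of the ticket:

```java
import java.util.concurrent.Callable;

final class BoundedRetry {
    // Hypothetical helper: calls `task` up to `maxAttempts` times and returns
    // the first successful result. If every attempt fails, the last exception
    // is rethrown instead of retrying forever.
    static <T> T withRetries(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt.
        String result = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("bad log file");
            }
            return "scanned";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Rethrowing after `maxAttempts` tries is what keeps a scanner from spinning on the same bad day's logs; a real implementation would probably also log each failure and back off between attempts.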





[jira] [Updated] (TEZ-3915) Create protobuf based history event logger.

2019-03-19 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3915:
-
Fix Version/s: 0.9.2

> Create protobuf based history event logger.
> ---
>
> Key: TEZ-3915
> URL: https://issues.apache.org/jira/browse/TEZ-3915
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
>Priority: Major
> Fix For: 0.9.next, 0.9.2
>
> Attachments: TEZ-3915.01.patch, TEZ-3915.02.patch, TEZ-3915.03.patch, 
> TEZ-3915.04.patch, TEZ-3915.05.patch, TEZ-3915.06.patch, TEZ-3915.07.patch
>
>
> A protobuf-based history event logger that logs directly into HDFS. Also implement 
> a reader API to get the events from the files.





[jira] [Updated] (TEZ-4002) CHANGES.txt for 0.9.2 Release

2019-03-19 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4002:
-
Fix Version/s: 0.9.2

> CHANGES.txt for 0.9.2 Release
> -
>
> Key: TEZ-4002
> URL: https://issues.apache.org/jira/browse/TEZ-4002
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2
>
> Attachments: TEZ-4002.001.patch, TEZ-4002.002.patch, 
> TEZ-4002.003.patch
>
>
> Add CHANGES.txt for 0.9.2 line.





[jira] [Commented] (TEZ-4002) CHANGES.txt for 0.9.2 Release

2019-03-18 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795143#comment-16795143
 ] 

Kuhu Shukla commented on TEZ-4002:
--

Updated CHANGES.txt.
Steps:
1.
{code}
git log rel/release-0.9.1..origin/branch-0.9 --oneline | cut -d ' ' -f2- > temp.CHANGES.txt
{code}
2. Take CHANGES.txt from the old release (in this case 0.9.1).
3. Add contents of temp.CHANGES.txt to CHANGES.txt
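Steps 2 and 3 are manual; a small Java sketch of the same merge, assuming the new entries go at the top of the old CHANGES.txt under a release header (the header text and the placement are assumptions about the file layout, not taken from the patch). `stripHash` reproduces what `cut -d ' ' -f2-` does to each `--oneline` entry:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChangesMerge {
    // Mirrors `cut -d ' ' -f2-` on `git log --oneline` output: drop the short
    // hash and the first space. As with cut (without -s), a line that has no
    // space is passed through unchanged.
    static String stripHash(String onelineEntry) {
        int space = onelineEntry.indexOf(' ');
        return space < 0 ? onelineEntry : onelineEntry.substring(space + 1);
    }

    // Steps 2 and 3: put the new entries, under a release header, in front of
    // the previous release's CHANGES.txt contents.
    static List<String> merge(String releaseHeader, List<String> gitLogOneline,
                              List<String> oldChanges) {
        List<String> out = new ArrayList<>();
        out.add(releaseHeader);
        for (String entry : gitLogOneline) {
            out.add(stripHash(entry));
        }
        out.add("");
        out.addAll(oldChanges);
        return out;
    }

    public static void main(String[] args) {
        // Placeholder hashes and content, for illustration only.
        List<String> log = Arrays.asList(
                "aaaaaaa TEZ-4056. Update version for 0.9.2 release",
                "bbbbbbb TEZ-4052. Fit dot files ASF License issues - part 2");
        List<String> old = Arrays.asList("Release 0.9.1");
        merge("Release 0.9.2", log, old).forEach(System.out::println);
    }
}
```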

Will update documentation if this sounds right.
[~jeagles], I would appreciate your feedback. Thank you!

> CHANGES.txt for 0.9.2 Release
> -
>
> Key: TEZ-4002
> URL: https://issues.apache.org/jira/browse/TEZ-4002
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4002.001.patch, TEZ-4002.002.patch, 
> TEZ-4002.003.patch
>
>
> Add CHANGES.txt for 0.9.2 line.





[jira] [Updated] (TEZ-4002) CHANGES.txt for 0.9.2 Release

2019-03-18 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4002:
-
Attachment: TEZ-4002.003.patch

> CHANGES.txt for 0.9.2 Release
> -
>
> Key: TEZ-4002
> URL: https://issues.apache.org/jira/browse/TEZ-4002
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4002.001.patch, TEZ-4002.002.patch, 
> TEZ-4002.003.patch
>
>
> Add CHANGES.txt for 0.9.2 line.





[jira] [Commented] (TEZ-4052) Fit dot files ASF License issues - part 2

2019-03-14 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793007#comment-16793007
 ] 

Kuhu Shukla commented on TEZ-4052:
--

+1, lgtm. Committing to master and branch-0.9.

> Fit dot files ASF License issues - part 2
> -
>
> Key: TEZ-4052
> URL: https://issues.apache.org/jira/browse/TEZ-4052
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4052.001.patch
>
>
> Continuing the effort in TEZ-3995.
> https://issues.apache.org/jira/browse/TEZ-3995?focusedCommentId=16784595=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16784595
> {code}
> 1) Please extend this to tez-ext-service-tests 2) Also, please consider 
> directory tez.log.dir with path ${project.build.directory}/logs.
> {code}
> This JIRA is to make sure all dot files are correctly placed under the target 
> directory, so that 1) files aren't created outside the build directory 
> and 2) they are named as part of a broader test directory design.
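The idea of keeping every test artifact under the build directory can be sketched as a small path helper. This is hypothetical: it assumes the build directory is handed to the tests as a `project.build.directory` system property (for example through surefire's `systemPropertyVariables`), falling back to `target`, and it rejects any relative path that would resolve outside it:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class TestDirs {
    // Assumption: the build passes its output directory in as the
    // `project.build.directory` system property; otherwise use "target".
    static Path buildDir() {
        return Paths.get(System.getProperty("project.build.directory", "target"))
                .toAbsolutePath().normalize();
    }

    // Resolves `relative` under `buildDir` and refuses anything that would
    // end up outside it (for example via "..").
    static Path underBuildDir(Path buildDir, String relative) {
        Path base = buildDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(relative).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException(relative + " resolves outside " + base);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // e.g. the tez.log.dir suggestion from the review: <build dir>/logs
        System.out.println(underBuildDir(buildDir(), "logs"));
    }
}
```

With a helper like this, `tez.log.dir` would resolve to `<build dir>/logs`, while something like `../build/test/data`, which would land outside `target`, fails fast.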





[jira] [Commented] (TEZ-4052) Fit dot files ASF License issues - part 2

2019-03-14 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792996#comment-16792996
 ] 

Kuhu Shukla commented on TEZ-4052:
--

Sorry, I needed a rebase. Reviewing the patch now.

> Fit dot files ASF License issues - part 2
> -
>
> Key: TEZ-4052
> URL: https://issues.apache.org/jira/browse/TEZ-4052
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4052.001.patch
>
>
> Continuing the effort in TEZ-3995.
> https://issues.apache.org/jira/browse/TEZ-3995?focusedCommentId=16784595=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16784595
> {code}
> 1) Please extend this to tez-ext-service-tests 2) Also, please consider 
> directory tez.log.dir with path ${project.build.directory}/logs.
> {code}
> This JIRA is to make sure all dot files are correctly placed under the target 
> directory, so that 1) files aren't created outside the build directory 
> and 2) they are named as part of a broader test directory design.





[jira] [Commented] (TEZ-4052) Fit dot files ASF License issues - part 2

2019-03-13 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792106#comment-16792106
 ] 

Kuhu Shukla commented on TEZ-4052:
--

I tried applying the patch to master, but it did not apply cleanly and failed. 
[~jeagles], could you take a look and let me know what I am missing?

> Fit dot files ASF License issues - part 2
> -
>
> Key: TEZ-4052
> URL: https://issues.apache.org/jira/browse/TEZ-4052
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4052.001.patch
>
>
> Continuing the effort in TEZ-3995.
> https://issues.apache.org/jira/browse/TEZ-3995?focusedCommentId=16784595=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16784595
> {code}
> 1) Please extend this to tez-ext-service-tests 2) Also, please consider 
> directory tez.log.dir with path ${project.build.directory}/logs.
> {code}
> This JIRA is to make sure all dot files are correctly placed under the target 
> directory, so that 1) files aren't created outside the build directory 
> and 2) they are named as part of a broader test directory design.





[jira] [Created] (TEZ-4053) TestExceptionPropagation.testExceptionPropagationSession fails intermittently

2019-03-13 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-4053:


 Summary: TestExceptionPropagation.testExceptionPropagationSession 
fails intermittently
 Key: TEZ-4053
 URL: https://issues.apache.org/jira/browse/TEZ-4053
 Project: Apache Tez
  Issue Type: Bug
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


Regardless of whether the test case fails, I see the stack trace below, which 
needs to be looked at:
{code}
2019-03-13 15:12:19,457 [INFO] [Dispatcher thread {Central}] 
|HistoryEventHandler.criticalEvents|: 
[HISTORY][DAG:dag_1552507929653_0001_1][Event:DAG_FINISHED]: 
dagId=dag_1552507929653_0001_1, startTime=1552507939233, 
finishTime=1552507939419, timeTaken=186, status=FAILED, diagnostics=Vertex 
failed, vertexName=v1, vertexId=vertex_1552507929653_0001_1_00, 
diagnostics=[Exception in EdgeManager, vertex:vertex_1552507929653_0001_1_00 
[v1],java.lang.RuntimeException: EM_GetNumSourceTaskPhysicalOutputs
at 
org.apache.tez.test.TestExceptionPropagation$CustomEdgeManager.getNumSourceTaskPhysicalOutputs(TestExceptionPropagation.java:828)
at org.apache.tez.dag.app.dag.impl.Edge.getSourceSpec(Edge.java:340)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.getSourceSpecFor(VertexImpl.java:4489)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.getOutputSpecList(VertexImpl.java:4477)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.createRemoteTaskSpec(VertexImpl.java:1647)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1683)
] completed., numCompletedVertices=2, numSuccessfulVertices=0, 
numFailedVertices=1, numKilledVertices=1, numVertices=2
2019-03-13 15:12:19,418 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: 
Checking vertices for DAG completion, numCompletedVertices=2, numSuccessfulVertices=0, 
numFailedVertices=1, numKilledVertices=1, numVertices=2, commitInProgress=0, 
terminationCause=VERTEX_FAILURE
2019-03-13 15:12:19,418 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: 
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
2019-03-13 15:12:19,443 [INFO] [Dispatcher thread {Central}] 
|recovery.RecoveryService|: DAG completed, dagId=dag_1552507929653_0001_1, queueSize=0
2019-03-13 15:12:19,457 [INFO] [Dispatcher thread {Central}] 
|HistoryEventHandler.criticalEvents|: [HISTORY][DAG:dag_1552507929653_0001_1][Event:DAG_FINISHED]: 
dagId=dag_1552507929653_0001_1, startTime=1552507939233, finishTime=1552507939419, 
timeTaken=186, status=FAILED, diagnostics=Vertex failed, vertexName=v1, 
vertexId=vertex_1552507929653_0001_1_00, diagnostics=[Exception in EdgeManager, 
vertex:vertex_1552507929653_0001_1_00 [v1],java.lang.RuntimeException: EM_GetNumSourceTaskPhysicalOutputs
at 
org.apache.tez.test.TestExceptionPropagation$CustomEdgeManager.getNumSourceTaskPhysicalOutputs(TestExceptionPropagation.java:828)
at org.apache.tez.dag.app.dag.impl.Edge.getSourceSpec(Edge.java:340)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.getSourceSpecFor(VertexImpl.java:4489)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.getOutputSpecList(VertexImpl.java:4477)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.createRemoteTaskSpec(VertexImpl.java:1647)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1683)
at 
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleTasks(VertexManager.java:221)
at 
org.apache.tez.dag.app.dag.impl.RootInputVertexManager.schedulePendingTasks(RootInputVertexManager.java:386)
at 
org.apache.tez.dag.app.dag.impl.RootInputVertexManager.processPendingTasks(RootInputVertexManager.java:379)
at 
org.apache.tez.dag.app.dag.impl.RootInputVertexManager.onVertexStarted(RootInputVertexManager.java:182)
at 
org.apache.tez.test.TestExceptionPropagation$RootInputVertexManagerWithException.onVertexStarted(TestExceptionPropagation.java:714)
at 
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:618)
at 
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:687)
at 
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:682)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
at 
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:682)
at 
org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:671)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 

[jira] [Commented] (TEZ-4031) Support tez gitbox migration

2019-03-06 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785856#comment-16785856
 ] 

Kuhu Shukla commented on TEZ-4031:
--

+1. lgtm. Committing in a bit.

> Support tez gitbox migration
> 
>
> Key: TEZ-4031
> URL: https://issues.apache.org/jira/browse/TEZ-4031
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4031.001.patch, TEZ-4031.002.patch, 
> TEZ-4031.003.patch, TEZ-4031.004.patch
>
>
> {code}
> $ git grep git-wip
> Tez_DOAP.rdf: rdf:resource="https://git-wip-us.apache.org/repos/asf/tez.git"/>
> Tez_DOAP.rdf: rdf:resource="https://git-wip-us.apache.org/repos/asf?p=tez.git"/>
> docs/src/site/site.xml:   href="https://git-wip-us.apache.org/repos/asf/tez.git" alt="Use git clone 
> https://git-wip-us.apache.org/repos/asf/tez.git" />
> pom.xml:
> scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-api/src/test/java/org/apache/tez/common/TestVersionInfo.java:  final 
> String scmUrl = "scm:git:https://git-wip-us.apache.org/repos/asf/tez.git";
> tez-api/src/test/resources/test1-version-info.properties:scmurl=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-api/src/test/resources/test3-version-info.properties:scmurl=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-ui/src/main/webapp/package.json:"url": 
> "https://git-wip-us.apache.org/repos/asf/tez.git"
> {code}
> In addition, the cwiki needs to be updated:
> https://cwiki.apache.org/confluence/display/TEZ/Committer+Guide
> https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez
> https://cwiki.apache.org/confluence/display/TEZ/Making+a+TEZ+Release
> https://cwiki.apache.org/confluence/display/TEZ/Updating+the+Tez+Website





[jira] [Commented] (TEZ-4031) Support tez gitbox migration

2019-03-01 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781877#comment-16781877
 ] 

Kuhu Shukla commented on TEZ-4031:
--

[~jeagles], thanks for the patch. Could you help me get level-set on the part of 
the patch that changes the versions of maven-site-plugin and 
maven-project-info-reports-plugin? Is there a HADOOP-specific JIRA that has 
more context? Thanks!

> Support tez gitbox migration
> 
>
> Key: TEZ-4031
> URL: https://issues.apache.org/jira/browse/TEZ-4031
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4031.001.patch, TEZ-4031.002.patch
>
>
> {code}
> $ git grep git-wip
> Tez_DOAP.rdf: rdf:resource="https://git-wip-us.apache.org/repos/asf/tez.git"/>
> Tez_DOAP.rdf: rdf:resource="https://git-wip-us.apache.org/repos/asf?p=tez.git"/>
> docs/src/site/site.xml:   href="https://git-wip-us.apache.org/repos/asf/tez.git" alt="Use git clone 
> https://git-wip-us.apache.org/repos/asf/tez.git" />
> pom.xml:
> scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-api/src/test/java/org/apache/tez/common/TestVersionInfo.java:  final 
> String scmUrl = "scm:git:https://git-wip-us.apache.org/repos/asf/tez.git";
> tez-api/src/test/resources/test1-version-info.properties:scmurl=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-api/src/test/resources/test3-version-info.properties:scmurl=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-ui/src/main/webapp/package.json:"url": 
> "https://git-wip-us.apache.org/repos/asf/tez.git"
> {code}
> In addition, the cwiki needs to be updated:
> https://cwiki.apache.org/confluence/display/TEZ/Committer+Guide
> https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez
> https://cwiki.apache.org/confluence/display/TEZ/Making+a+TEZ+Release
> https://cwiki.apache.org/confluence/display/TEZ/Updating+the+Tez+Website





[jira] [Commented] (TEZ-4031) Support tez gitbox migration

2019-03-01 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781847#comment-16781847
 ] 

Kuhu Shukla commented on TEZ-4031:
--

Taking a look.

> Support tez gitbox migration
> 
>
> Key: TEZ-4031
> URL: https://issues.apache.org/jira/browse/TEZ-4031
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4031.001.patch, TEZ-4031.002.patch
>
>
> {code}
> $ git grep git-wip
> Tez_DOAP.rdf: rdf:resource="https://git-wip-us.apache.org/repos/asf/tez.git"/>
> Tez_DOAP.rdf: rdf:resource="https://git-wip-us.apache.org/repos/asf?p=tez.git"/>
> docs/src/site/site.xml:   href="https://git-wip-us.apache.org/repos/asf/tez.git" alt="Use git clone 
> https://git-wip-us.apache.org/repos/asf/tez.git" />
> pom.xml:
> scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-api/src/test/java/org/apache/tez/common/TestVersionInfo.java:  final 
> String scmUrl = "scm:git:https://git-wip-us.apache.org/repos/asf/tez.git";
> tez-api/src/test/resources/test1-version-info.properties:scmurl=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-api/src/test/resources/test3-version-info.properties:scmurl=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git
> tez-ui/src/main/webapp/package.json:"url": 
> "https://git-wip-us.apache.org/repos/asf/tez.git"
> {code}
> In addition, the cwiki needs to be updated:
> https://cwiki.apache.org/confluence/display/TEZ/Committer+Guide
> https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez
> https://cwiki.apache.org/confluence/display/TEZ/Making+a+TEZ+Release
> https://cwiki.apache.org/confluence/display/TEZ/Updating+the+Tez+Website





[jira] [Commented] (TEZ-3995) Fix dot files produced by tests to prevent ASF license warnings in yetus

2019-03-01 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781845#comment-16781845
 ] 

Kuhu Shukla commented on TEZ-3995:
--

[~jeagles], can you take a look at the latest patch [~jmarhuen] posted here? Let me 
know if any input is required. Thanks a lot!

> Fix dot files produced by tests to prevent ASF license warnings in yetus
> 
>
> Key: TEZ-3995
> URL: https://issues.apache.org/jira/browse/TEZ-3995
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jaume M
>Priority: Major
> Fix For: 0.9.2, 0.10.0
>
> Attachments: TEZ-3995.1-branch-0.9.patch, TEZ-3995.1.patch, 
> TEZ-3995.1.patch, TEZ-3995.2.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-TEZ-Build-Yetus/10/artifact/out/patch-asflicense-problems.txt
> {code}
> Lines that start with ? in the ASF License report indicate files that do 
> not have an Apache license header: !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/name1/current/seen_txid
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/name1/current/fsimage_000.md5
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/name1/current/fsimage_000
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/name1/current/VERSION
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/name2/current/seen_txid
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/name2/current/fsimage_000.md5
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/name2/current/fsimage_000
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/name2/current/VERSION
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data4/current/BP-1822420483-67.195.81.145-1538067940282/scanner.cursor
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data4/current/BP-1822420483-67.195.81.145-1538067940282/current/dfsUsed
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data4/current/BP-1822420483-67.195.81.145-1538067940282/current/finalized/subdir0/subdir0/blk_1073741827
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data4/current/BP-1822420483-67.195.81.145-1538067940282/current/VERSION
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data4/current/VERSION
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data9/current/BP-1822420483-67.195.81.145-1538067940282/scanner.cursor
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data9/current/BP-1822420483-67.195.81.145-1538067940282/current/dfsUsed
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data9/current/BP-1822420483-67.195.81.145-1538067940282/current/finalized/subdir0/subdir0/blk_1073741826
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data9/current/BP-1822420483-67.195.81.145-1538067940282/current/VERSION
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data9/current/VERSION
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build-Yetus/sourcedir/tez-plugins/tez-aux-services/build/test/data/dfs/data/data10/current/BP-1822420483-67.195.81.145-1538067940282/scanner.cursor
>  !? 
> 

[jira] [Commented] (TEZ-4002) CHANGES.txt for 0.9.2 Release

2019-02-28 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780926#comment-16780926
 ] 

Kuhu Shukla commented on TEZ-4002:
--

[~jeagles] could you help take a look? Appreciate it.

> CHANGES.txt for 0.9.2 Release
> -
>
> Key: TEZ-4002
> URL: https://issues.apache.org/jira/browse/TEZ-4002
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4002.001.patch, TEZ-4002.002.patch
>
>
> Add CHANGES.txt for 0.9.2 line.





[jira] [Updated] (TEZ-4002) CHANGES.txt for 0.9.2 Release

2019-02-28 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4002:
-
Attachment: TEZ-4002.002.patch

> CHANGES.txt for 0.9.2 Release
> -
>
> Key: TEZ-4002
> URL: https://issues.apache.org/jira/browse/TEZ-4002
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4002.001.patch, TEZ-4002.002.patch
>
>
> Add CHANGES.txt for 0.9.2 line.





[jira] [Commented] (TEZ-4047) Tez trademark in xml is causing xml parsing issue

2019-02-28 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780757#comment-16780757
 ] 

Kuhu Shukla commented on TEZ-4047:
--

+1. lgtm. Committing this.

> Tez trademark in xml is causing xml parsing issue
> -
>
> Key: TEZ-4047
> URL: https://issues.apache.org/jira/browse/TEZ-4047
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4047.001.patch
>
>
> {code}
> docs/src/site/site.xml:
> [Fatal Error] site.xml:97:34: The entity "reg" was referenced, but not 
> declared.
> java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
> file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: 
> 34; The entity "reg" was referenced, but not declared.
>   at 
> jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:397)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:449)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:406)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:402)
>   at 
> jdk.nashorn.api.scripting.NashornScriptEngine.eval(NashornScriptEngine.java:155)
>   at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
>   at com.sun.tools.script.shell.Main.evaluateString(Main.java:298)
>   at com.sun.tools.script.shell.Main.evaluateString(Main.java:319)
>   at com.sun.tools.script.shell.Main.access$300(Main.java:37)
>   at com.sun.tools.script.shell.Main$3.run(Main.java:217)
>   at com.sun.tools.script.shell.Main.main(Main.java:48)
> Caused by: org.xml.sax.SAXParseException; systemId: 
> file:/testptch/tez/./docs/src/site/site.xml; lineNumber: 97; columnNumber: 
> 34; The entity "reg" was referenced, but not declared.
>   at 
> com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
>   at 
> com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
>   at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
>   at 
> jdk.nashorn.internal.scripts.Script$Recompilation$2$19313A$\^system_init\_.XMLDocument(:747)
>   at jdk.nashorn.internal.scripts.Script$1$\^string\_.:program(:1)
>   at 
> jdk.nashorn.internal.runtime.ScriptFunctionData.invoke(ScriptFunctionData.java:637)
>   at 
> jdk.nashorn.internal.runtime.ScriptFunction.invoke(ScriptFunction.java:494)
>   at 
> jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:393)
>   ... 10 more
> {code}
> The output from xmllint also confirms the XML issue.
> {code}
> xmllint ./docs/src/site/site.xml
> .//src/site/site.xml:97: parser error : Entity 'reg' not defined
>  http://tez.apache.org/"/>
>  ^
> .//src/site/site.xml:123: parser error : Entity 'reg' not defined
>  
> {code}
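XML predefines only five named entities (amp, lt, gt, apos, quot); `&reg;` is an HTML entity, so a plain XML parser rejects it unless a DTD declares it. The minimal sketch below uses the JDK's built-in DOM parser to show that a numeric character reference such as `&#174;` parses cleanly where `&reg;` does not. The class name and the `<project name=...>` snippet are hypothetical and are not the contents of Tez's site.xml; whether the TEZ-4047 patch takes this approach is not stated here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical demo: "&reg;" is undeclared in plain XML, "&#174;" is always valid. */
public class RegEntityDemo {

  // Parses the XML and returns the root element's "name" attribute,
  // or null if the document is not well-formed.
  static String rootName(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      return doc.getDocumentElement().getAttribute("name");
    } catch (SAXException e) {
      // Not well-formed, e.g. an entity that was referenced but not declared.
      return null;
    } catch (ParserConfigurationException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // HTML named entity without a DTD: parsing fails.
    System.out.println(rootName("<project name=\"Apache Tez&reg;\"/>")); // prints null
    // Numeric character reference for the registered sign: parses fine.
    System.out.println(rootName("<project name=\"Apache Tez&#174;\"/>"));
  }
}
```

For the failing case the JDK parser typically also writes a `[Fatal Error]` line to stderr, matching the first line of the log above.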





[jira] [Commented] (TEZ-4050) maven site is failing due to missing configuration.

2019-02-28 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780750#comment-16780750
 ] 

Kuhu Shukla commented on TEZ-4050:
--

+1. lgtm.

> maven site is failing due to missing configuration.
> ---
>
> Key: TEZ-4050
> URL: https://issues.apache.org/jira/browse/TEZ-4050
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4050.001.patch
>
>
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-site-plugin:3.4:stage (default-cli) on project 
> tez-docs: Missing site information in the distribution management of the 
> project Tez (org.apache.tez:tez-docs:0.10.1-SNAPSHOT) -> [Help 1]
> {code}
> The maven-site-plugin usage page shows that we are missing configuration:
> https://maven.apache.org/plugins/maven-site-plugin/usage.html
> {code}
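> <project>
>   ...
>   <distributionManagement>
>     <site>
>       <id>www.yourcompany.com</id>
>       <url>scp://www.yourcompany.com/www/docs/project/</url>
>     </site>
>   </distributionManagement>
>   ...
> </project>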
> {code}
> Neither Tez nor Hadoop uses this URL to deploy, but the configuration is 
> needed to stage site documentation. The URL is only used during site:deploy, 
> which is never called during the Tez QA step.
> This JIRA aims to provide a placeholder (the same as Hadoop does).





[jira] [Commented] (TEZ-4049) Fix findbugs issues in NotRunningJob

2019-02-28 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780707#comment-16780707
 ] 

Kuhu Shukla commented on TEZ-4049:
--

Thank you for the patch [~jeagles]! Committed to branch-0.9, master.

> Fix findbugs issues in NotRunningJob
> 
>
> Key: TEZ-4049
> URL: https://issues.apache.org/jira/browse/TEZ-4049
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.1
>
> Attachments: TEZ-4049.001.patch
>
>
> Introduced by TEZ-4035. Remove fixes while keeping 3.2.0 api compatibility. 





[jira] [Commented] (TEZ-4049) Fix findbugs issues in NotRunningJob

2019-02-28 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780692#comment-16780692
 ] 

Kuhu Shukla commented on TEZ-4049:
--

+1. lgtm. Committing to master, branch-0.9.


> Fix findbugs issues in NotRunningJob
> 
>
> Key: TEZ-4049
> URL: https://issues.apache.org/jira/browse/TEZ-4049
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4049.001.patch
>
>
> Introduced by TEZ-4035. Remove fixes while keeping 3.2.0 api compatibility. 





[jira] [Commented] (TEZ-4043) Create a yetus compatible checkstyle configuration

2019-02-15 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769997#comment-16769997
 ] 

Kuhu Shukla commented on TEZ-4043:
--

Thank you [~jeagles] for the patch! Committed to master, branch-0.9.



> Create a yetus compatible checkstyle configuration
> --
>
> Key: TEZ-4043
> URL: https://issues.apache.org/jira/browse/TEZ-4043
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.1
>
> Attachments: TEZ-4043.001.patch, TEZ-4043.002.patch
>
>
> Tez follows Hadoop source code guidelines with the exception of a 120-character 
> line length.
> http://maven.apache.org/plugins/maven-checkstyle-plugin/examples/multi-module-config.html





[jira] [Commented] (TEZ-4043) Create a yetus compatible checkstyle configuration

2019-02-15 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769774#comment-16769774
 ] 

Kuhu Shukla commented on TEZ-4043:
--

+1. lgtm. Committing to master and branch-0.9.

> Create a yetus compatible checkstyle configuration
> --
>
> Key: TEZ-4043
> URL: https://issues.apache.org/jira/browse/TEZ-4043
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4043.001.patch, TEZ-4043.002.patch
>
>
> Tez follows Hadoop source code guidelines with the exception of a 120-character 
> line length.
> http://maven.apache.org/plugins/maven-checkstyle-plugin/examples/multi-module-config.html





[jira] [Commented] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2019-02-14 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768533#comment-16768533
 ] 

Kuhu Shukla commented on TEZ-4004:
--

[~jeagles], precommit looks good. +1. Committing this shortly.

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4004.001-branch-0.9.patch, TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline=java=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Commented] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2019-02-13 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767598#comment-16767598
 ] 

Kuhu Shukla commented on TEZ-4004:
--

Triggered pre-commit for this to see if the TEZ-4041 fix changes anything.

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4004.001-branch-0.9.patch, TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline=java=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Commented] (TEZ-4041) TestExtServicesWithLocalMode fails in docker

2019-02-13 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767584#comment-16767584
 ] 

Kuhu Shukla commented on TEZ-4041:
--

+1. lgtm. Committing this to master and branch-0.9. [~jeagles] thank you for 
the patch.

> TestExtServicesWithLocalMode fails in docker
> 
>
> Key: TEZ-4041
> URL: https://issues.apache.org/jira/browse/TEZ-4041
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4041.001.patch
>
>
> {code}
> 2019-02-13 00:24:33,703 INFO  [DAGAppMaster Thread] service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.tez.dag.app.DAGAppMaster failed in state INITED
> org.apache.tez.dag.api.TezUncheckedException: 
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:215)
>   at 
> org.apache.tez.dag.app.TaskCommunicatorManager.createTaskCommunicator(TaskCommunicatorManager.java:184)
>   at 
> org.apache.tez.dag.app.TaskCommunicatorManager.<init>(TaskCommunicatorManager.java:152)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.createTaskCommunicatorManager(DAGAppMaster.java:1088)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:532)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2606)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2603)
>   at org.apache.tez.client.LocalClient$1.run(LocalClient.java:327)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:213)
>   ... 12 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.tez.test.service.rpc.TezTestServiceProtocolProtos$SubmitWorkRequestProto$Builder.setUser(TezTestServiceProtocolProtos.java:5549)
>   at 
> org.apache.tez.dag.app.taskcomm.TezTestServiceTaskCommunicatorImpl.<init>(TezTestServiceTaskCommunicatorImpl.java:65)
>   ... 17 more
> {code}





[jira] [Commented] (TEZ-4040) Upgrade RoaringBitmap version to avoid NoSuchMethodError

2019-02-13 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767462#comment-16767462
 ] 

Kuhu Shukla commented on TEZ-4040:
--

[~jeagles] will fix the test failure in a separate JIRA. I am committing v1 of 
this patch to master and branch-0.9.

> Upgrade RoaringBitmap version to avoid NoSuchMethodError
> 
>
> Key: TEZ-4040
> URL: https://issues.apache.org/jira/browse/TEZ-4040
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: 0.4.9.api.txt, 0.5.11.api.txt, 0.5.21.api.txt, 
> TEZ-4040.001.patch, TEZ-4040.002.patch
>
>
> A common request is to use the runOptimize function, which is present in 
> later versions of RoaringBitmap.





[jira] [Updated] (TEZ-4037) Add back DAG search status KILLED

2019-02-12 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4037:
-
Fix Version/s: 0.10.1
   0.9.2

> Add back DAG search status KILLED 
> --
>
> Key: TEZ-4037
> URL: https://issues.apache.org/jira/browse/TEZ-4037
> Project: Apache Tez
>  Issue Type: Task
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.9.2, 0.10.1
>
> Attachments: TEZ-4037.001.patch
>
>
> https://issues.apache.org/jira/browse/TEZ-2447 removed KILLED because searching 
> on this status can sometimes fail to find all KILLED DAGs. This JIRA re-adds 
> the KILLED DAG status search, since it still has value; the preferred approach 
> is to fix the DAGs that fail to write the killed status to the history log file. 





[jira] [Commented] (TEZ-4037) Add back DAG search status KILLED

2019-02-12 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766345#comment-16766345
 ] 

Kuhu Shukla commented on TEZ-4037:
--

Thank you for the report and patch [~jeagles]. Took a look at how it was done 
in TEZ-2447 and this change makes sense. +1. Will commit this shortly to master 
and branch-0.9.


> Add back DAG search status KILLED 
> --
>
> Key: TEZ-4037
> URL: https://issues.apache.org/jira/browse/TEZ-4037
> Project: Apache Tez
>  Issue Type: Task
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4037.001.patch
>
>
> https://issues.apache.org/jira/browse/TEZ-2447 removed KILLED because searching 
> on this status can sometimes fail to find all KILLED DAGs. This JIRA re-adds 
> the KILLED DAG status search, since it still has value; the preferred approach 
> is to fix the DAGs that fail to write the killed status to the history log file. 





[jira] [Commented] (TEZ-4036) TestMockDAGAppMaster#testInternalPreemption should assert for failed state

2019-02-05 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761014#comment-16761014
 ] 

Kuhu Shukla commented on TEZ-4036:
--

[~jeagles], requesting a review. Thanks a lot!

> TestMockDAGAppMaster#testInternalPreemption should assert for failed state
> --
>
> Key: TEZ-4036
> URL: https://issues.apache.org/jira/browse/TEZ-4036
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4036.001.patch
>
>
> Due to the root cause described in TEZ-3950, the test fails regularly. Until 
> the fix for that JIRA is in (it requires a fair amount of redesign), this adds 
> a failed-state assertion to the test, since FAILED is now an expected state 
> for the task.





[jira] [Updated] (TEZ-4036) TestMockDAGAppMaster#testInternalPreemption should assert for failed state

2019-02-05 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4036:
-
Attachment: TEZ-4036.001.patch

> TestMockDAGAppMaster#testInternalPreemption should assert for failed state
> --
>
> Key: TEZ-4036
> URL: https://issues.apache.org/jira/browse/TEZ-4036
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4036.001.patch
>
>
> Due to the root cause described in TEZ-3950, the test fails regularly. Until 
> the fix for that JIRA is in (it requires a fair amount of redesign), this adds 
> a failed-state assertion to the test, since FAILED is now an expected state 
> for the task.





[jira] [Created] (TEZ-4036) TestMockDAGAppMaster#testInternalPreemption should assert for failed state

2019-02-05 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-4036:


 Summary: TestMockDAGAppMaster#testInternalPreemption should assert 
for failed state
 Key: TEZ-4036
 URL: https://issues.apache.org/jira/browse/TEZ-4036
 Project: Apache Tez
  Issue Type: Bug
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


Due to the root cause described in TEZ-3950, the test fails regularly. Until 
the fix for that JIRA is in (it requires a fair amount of redesign), this adds 
a failed-state assertion to the test, since FAILED is now an expected state 
for the task.





[jira] [Commented] (TEZ-4032) TEZ will throw "Client cannot authenticate via:[TOKEN, KERBEROS]" when used with HDFS federation(non viewfs, only hdfs schema used).

2019-01-23 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750381#comment-16750381
 ] 

Kuhu Shukla commented on TEZ-4032:
--

Will take a look asap.

> TEZ will throw "Client cannot authenticate via:[TOKEN, KERBEROS]"  when used 
> with HDFS federation(non viewfs, only hdfs schema used). 
> --
>
> Key: TEZ-4032
> URL: https://issues.apache.org/jira/browse/TEZ-4032
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: zhangbutao
>Priority: Major
> Attachments: TEZ-4032.001.patch
>
>
> I run Hive jobs on Tez with HDFS federation and Kerberos. The Hadoop cluster 
> has multiple namespaces (hdfs://ns1, hdfs://ns2, hdfs://ns3 ...) and we don't 
> use the viewfs scheme. The Hive Tez job throws the following error when the 
> table is created in hdfs://ns2 (default configuration fs.defaultFS=hdfs://ns1):
> {code:java}
> 2019-01-21 15:43:46,507 [WARN] [TezChild] |ipc.Client|: Exception encountered 
> while connecting to the server : 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]
> 2019-01-21 15:43:46,507 [INFO] [TezChild] |retry.RetryInvocationHandler|: 
> java.io.IOException: DestHost:destPort docker5.cmss.com:8020 , 
> LocalHost:localPort docker1.cmss.com/10.254.10.116:0. Failed on local 
> exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS], while invoking 
> ClientNamenodeProtocolTranslatorPB.getFileInfo over 
> docker5.cmss.com/10.254.2.106:8020 after 14 failover attempts. Trying to 
> failover after sleeping for 10827ms.
> 2019-01-21 15:43:57,338 [WARN] [TezChild] |ipc.Client|: Exception encountered 
> while connecting to the server : 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]
> 2019-01-21 15:43:57,363 [ERROR] [TezChild] |tez.MapRecordSource|: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing writable (null)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:568)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> DestHost:destPort docker4.cmss.com:8020 , LocalHost:localPort 
> docker1.cmss.com/10.254.10.116:0. Failed on local exception: 
> java.io.IOException: org.apache.hadoop.security.AccessControlException: 
> Client cannot authenticate via:[TOKEN, KERBEROS]
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:742)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:897)
>   at 
> 

[jira] [Commented] (TEZ-4030) Unable to find hive database name in tez applications

2019-01-04 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734195#comment-16734195
 ] 

Kuhu Shukla commented on TEZ-4030:
--

[~Sreenath] Could you help with this? Thanks!

> Unable to find hive database name in tez applications
> -
>
> Key: TEZ-4030
> URL: https://issues.apache.org/jira/browse/TEZ-4030
> Project: Apache Tez
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 0.8.4
>Reporter: Ashish Doneriya
>Priority: Minor
>
> Currently there is no way for me to find the name of the Hive database from 
> the application ID. In MapReduce applications I can find the database name in 
> the job configuration, but in Tez there is no such property.





[jira] [Commented] (TEZ-4027) DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang

2018-12-17 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723615#comment-16723615
 ] 

Kuhu Shukla commented on TEZ-4027:
--

[~jlowe], [~jeagles] request for comments/review. Thanks a lot!

> DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang
> --
>
> Key: TEZ-4027
> URL: https://issues.apache.org/jira/browse/TEZ-4027
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4027.001.patch, TEZ-4027.002.patch
>
>
> In a scenario where there are retroactive failures and the YARN queue is too 
> full to allow new container assignments, the scheduler can miscompute the 
> blocked vertex set, because it flips bits only up to blocked.length() (the 
> index of the highest set bit plus one), which may be smaller than the total 
> number of vertices. As a result, no preemption happens and the DAG hangs.
> {code}
> @GuardedBy("DagAwareYarnTaskScheduler.this")
> BitSet createVertexBlockedSet() {
>   BitSet blocked = new BitSet();
>   Entry entry = priorityStats.lastEntry();
>   if (entry != null) {
>     RequestPriorityStats stats = entry.getValue();
>     blocked.or(stats.allowedVertices);
>     blocked.flip(0, blocked.length());
>     blocked.or(stats.descendants);
>   }
>   return blocked;
> }
> {code}
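The effect of `flip(0, blocked.length())` can be reproduced with plain `java.util.BitSet`. The sketch below is a hypothetical, stand-alone reduction: the class, the method names, and the `numVertices` parameter are assumptions, not Tez code. Because `length()` is only the index of the highest set bit plus one, vertices numbered above the highest allowed vertex are never marked blocked. Bounding the flip by the total vertex count is one possible fix, not necessarily what the attached patches do.

```java
import java.util.BitSet;

/** Hypothetical reduction of the createVertexBlockedSet() pitfall; not Tez code. */
public class BlockedSetDemo {

  // Mirrors the quoted logic: complement the allowed vertices, but only up to
  // blocked.length(), i.e. the index of the highest set bit plus one.
  static BitSet blockedByLength(BitSet allowed, BitSet descendants) {
    BitSet blocked = new BitSet();
    blocked.or(allowed);
    blocked.flip(0, blocked.length());
    blocked.or(descendants);
    return blocked;
  }

  // Assumed fix: complement over the full vertex range. numVertices is a
  // hypothetical parameter standing in for the DAG's total vertex count.
  static BitSet blockedByVertexCount(BitSet allowed, BitSet descendants, int numVertices) {
    BitSet blocked = new BitSet(numVertices);
    blocked.or(allowed);
    blocked.flip(0, numVertices);
    blocked.or(descendants);
    return blocked;
  }

  public static void main(String[] args) {
    // Five vertices (0..4); only vertices 0 and 1 are currently allowed.
    BitSet allowed = new BitSet();
    allowed.set(0, 2);
    BitSet descendants = new BitSet(); // empty, to isolate the flip bound

    System.out.println(blockedByLength(allowed, descendants));         // prints {}
    System.out.println(blockedByVertexCount(allowed, descendants, 5)); // prints {2, 3, 4}
  }
}
```

With five vertices of which only 0 and 1 are allowed, the length-bounded flip yields an empty blocked set, so nothing looks blocked and no preemption is triggered, while the vertex-count-bounded flip correctly reports vertices 2, 3 and 4 as blocked.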





[jira] [Updated] (TEZ-4027) DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang

2018-12-17 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4027:
-
Attachment: TEZ-4027.002.patch

> DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang
> --
>
> Key: TEZ-4027
> URL: https://issues.apache.org/jira/browse/TEZ-4027
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4027.001.patch, TEZ-4027.002.patch
>
>
> In a scenario where there are retroactive failures and the YARN queue is too 
> full to allow new container assignments, the scheduler can miscompute the 
> blocked vertex set, because it flips bits only up to blocked.length() (the 
> index of the highest set bit plus one), which may be smaller than the total 
> number of vertices. As a result, no preemption happens and the DAG hangs.
> {code}
> @GuardedBy("DagAwareYarnTaskScheduler.this")
> BitSet createVertexBlockedSet() {
>   BitSet blocked = new BitSet();
>   Entry entry = priorityStats.lastEntry();
>   if (entry != null) {
>     RequestPriorityStats stats = entry.getValue();
>     blocked.or(stats.allowedVertices);
>     blocked.flip(0, blocked.length());
>     blocked.or(stats.descendants);
>   }
>   return blocked;
> }
> {code}





[jira] [Updated] (TEZ-4027) DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang

2018-12-17 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4027:
-
Attachment: TEZ-4027.001.patch

> DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang
> --
>
> Key: TEZ-4027
> URL: https://issues.apache.org/jira/browse/TEZ-4027
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4027.001.patch
>
>
> In a scenario where there are retroactive failures and the YARN queue is 
> too full to allow new container assignments, the scheduler can miscompute 
> the blocked vertex set, because it flips bits only up to the length of the 
> bitset, which may not reflect the total number of vertices. As a result, no 
> preemption occurs and the DAG hangs.
> {code}
> @GuardedBy("DagAwareYarnTaskScheduler.this")
> BitSet createVertexBlockedSet() {
>   BitSet blocked = new BitSet();
>   Entry entry = priorityStats.lastEntry();
>   if (entry != null) {
>     RequestPriorityStats stats = entry.getValue();
>     blocked.or(stats.allowedVertices);
>     blocked.flip(0, blocked.length());
>     blocked.or(stats.descendants);
>   }
>   return blocked;
> }
> {code}





[jira] [Created] (TEZ-4027) DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang

2018-12-14 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-4027:


 Summary: DagAwareYarnTaskScheduler can miscompute blocked vertices 
and cause a hang
 Key: TEZ-4027
 URL: https://issues.apache.org/jira/browse/TEZ-4027
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.9.1, 0.10.0
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


In a scenario where there are retroactive failures and the YARN queue is too 
full to allow new container assignments, the scheduler can miscompute the 
blocked vertex set, because it flips bits only up to the length of the bitset, 
which may not reflect the total number of vertices. As a result, no preemption 
occurs and the DAG hangs.

{code}
@GuardedBy("DagAwareYarnTaskScheduler.this")
BitSet createVertexBlockedSet() {
  BitSet blocked = new BitSet();
  Entry entry = priorityStats.lastEntry();
  if (entry != null) {
    RequestPriorityStats stats = entry.getValue();
    blocked.or(stats.allowedVertices);
    blocked.flip(0, blocked.length());
    blocked.or(stats.descendants);
  }
  return blocked;
}
{code}
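To illustrate the length() pitfall, here is a minimal, self-contained Java sketch. It is not the TEZ-4027 patch: the class and method names are hypothetical, and passing the DAG's total vertex count in as numVertices is an assumption, since the snippet above does not show where the scheduler would obtain it. BitSet.length() returns the index of the highest set bit plus one, so any vertex whose index is at or above allowedVertices.length() is never flipped on and can never end up in the blocked set.

```java
import java.util.BitSet;

// Hypothetical demo: why complementing a BitSet only up to length() can leave
// higher-numbered vertices unblocked. Names are illustrative, not Tez code.
class BlockedSetDemo {

  // Mirrors the reported logic: length() is (index of highest set bit + 1),
  // so vertices at or above allowed.length() are never flipped on.
  static BitSet blockedUsingLength(BitSet allowed, BitSet descendants) {
    BitSet blocked = new BitSet();
    blocked.or(allowed);
    blocked.flip(0, blocked.length());
    blocked.or(descendants);
    return blocked;
  }

  // Complements over every vertex index in [0, numVertices).
  // Assumption: the caller knows the DAG's total vertex count.
  static BitSet blockedUsingVertexCount(BitSet allowed, BitSet descendants, int numVertices) {
    BitSet blocked = new BitSet(numVertices);
    blocked.or(allowed);
    blocked.flip(0, numVertices);
    blocked.or(descendants);
    return blocked;
  }

  public static void main(String[] args) {
    BitSet allowed = new BitSet();
    allowed.set(0, 2);                 // vertices 0 and 1 may be scheduled
    BitSet descendants = new BitSet(); // no descendants, to keep it simple

    System.out.println(blockedUsingLength(allowed, descendants));         // {}
    System.out.println(blockedUsingVertexCount(allowed, descendants, 5)); // {2, 3, 4}
  }
}
```

With vertices 0 and 1 allowed out of five, the length()-based variant blocks nothing, which is consistent with the reported "no preemption" symptom, while the vertex-count variant blocks vertices 2 to 4.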





[jira] [Commented] (TEZ-3476) Need a way to account for container localization.

2018-11-21 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695230#comment-16695230
 ] 

Kuhu Shukla commented on TEZ-3476:
--

I did some more manual testing: after adding a start-up delay by hacking 
TezChild to sleep longer, I can see the expected tasks getting speculated. This 
patch is ready for an initial review pass. [~jeagles], [~jlowe], I would 
appreciate initial comments and suggestions. Thanks a lot!

> Need a way to account for container localization.
> -
>
> Key: TEZ-3476
> URL: https://issues.apache.org/jira/browse/TEZ-3476
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3476.001.patch
>
>
> Tez task attempt start times don't reflect time spent in localization.
> In the MapReduce framework, the time spent in localization was included in 
> the total runtime of each task attempt. But since Tez reuses containers, the 
> time spent localizing for a container is not captured. The start time of the 
> first attempt in that container will only be set after the localization has 
> completed.
> The result is that attempts can appear as if they are not being run even 
> though there are resources available in the queue. An attempt can be assigned 
> to a container, but if the container is on a slow node and it takes a long 
> time to localize, the attempt state will remain pending until localization 
> completes.
> The risk is that tasks will not be speculated during localization, since 
> they haven't started.





[jira] [Commented] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2018-11-21 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694845#comment-16694845
 ] 

Kuhu Shukla commented on TEZ-4004:
--

[~jeagles], is the test failure related to the 0.9 version of the patch? 

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4004.001-branch-0.9.patch, TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline=java=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Commented] (TEZ-3476) Need a way to account for container localization.

2018-11-18 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691242#comment-16691242
 ] 

Kuhu Shukla commented on TEZ-3476:
--

I am sorry for the late reply, [~jeagles], and thank you for the comments! I 
have been doing some more testing on this patch.
bq. Do attempt times now include container launch time?
Yes.
bq. Will speculation continue to be fair where attempts that required launching 
a container can be compared with attempts that reused a container?
If the initial container startup is slow, the task attempt can now be 
speculated. A task attempt that reused an already running container will not be 
speculated for startup slowness, since it incurs no startup cost. This patch 
introduces that behavior.
bq. Does this patch support the design principle in Tez to separate container 
and task?
Apart from sending the task submission event earlier, which changes the time 
accounting and allows speculation, I do not think it breaks that separation. 
Please correct me if I misunderstood the question.

I will post some more test results soon.
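A hypothetical Java sketch of the accounting question discussed in this thread: if elapsed time is measured from task start, an attempt stuck in a localizing container looks idle to a speculation check, whereas measuring from submission includes launch and localization time. The class, fields, and the simple threshold rule are illustrative assumptions, not Tez's speculator.

```java
// Hypothetical sketch: how the clock origin changes what a simple speculation
// check sees. Names and the threshold rule are illustrative, not Tez's speculator.
class AttemptTiming {
  final long submittedMs; // attempt handed to a container (which may still be localizing)
  final long startedMs;   // task began running; -1 while not yet started

  AttemptTiming(long submittedMs, long startedMs) {
    this.submittedMs = submittedMs;
    this.startedMs = startedMs;
  }

  // Counts only time since task start; an attempt still localizing reports 0.
  long elapsedFromStart(long nowMs) {
    return startedMs < 0 ? 0 : nowMs - startedMs;
  }

  // Counts time since submission, so container launch/localization is included.
  long elapsedFromSubmit(long nowMs) {
    return nowMs - submittedMs;
  }

  // Toy rule: speculate when elapsed time exceeds factor * typical runtime.
  static boolean shouldSpeculate(long elapsedMs, long typicalRuntimeMs, double factor) {
    return elapsedMs > factor * typicalRuntimeMs;
  }

  public static void main(String[] args) {
    // Submitted at t=0 to a slow-localizing container; still not started at t=60s.
    AttemptTiming slow = new AttemptTiming(0, -1);
    long now = 60_000;
    long typical = 20_000;
    System.out.println(shouldSpeculate(slow.elapsedFromStart(now), typical, 1.5));  // false
    System.out.println(shouldSpeculate(slow.elapsedFromSubmit(now), typical, 1.5)); // true
  }
}
```

For an attempt in a reused container, submission and start times are nearly equal, so both measures agree; that matches the point above that such attempts are not speculated for startup slowness.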


> Need a way to account for container localization.
> -
>
> Key: TEZ-3476
> URL: https://issues.apache.org/jira/browse/TEZ-3476
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3476.001.patch
>
>
> Tez task attempt start times don't reflect time spent in localization.
> In the MapReduce framework, the time spent in localization was included in 
> the total runtime of each task attempt. But since Tez reuses containers, the 
> time spent localizing for a container is not captured. The start time of the 
> first attempt in that container will only be set after the localization has 
> completed.
> The result is that attempts can appear as if they are not being run even 
> though there are resources available in the queue. An attempt can be assigned 
> to a container, but if the container is on a slow node and it takes a long 
> time to localize, the attempt state will remain pending until localization 
> completes.
> The risk is that tasks will not be speculated during localization, since 
> they haven't started.





[jira] [Commented] (TEZ-3476) Need a way to account for container localization.

2018-11-15 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688295#comment-16688295
 ] 

Kuhu Shukla commented on TEZ-3476:
--

The v1 patch is less than ideal, but it is a start. This change undoes parts of 
TEZ-3715, which may be a concern. I could move this into the scheduler service 
implementations, but sending the event would then require downcasting, which is 
also less than ideal. I have a test change and will try to come up with another 
test that shows speculation kicking in for a slow container launch. I verified 
through TestTezJobs that the patch moves the task attempt into the submitted 
state before the container is launched. Any reviews are appreciated.

> Need a way to account for container localization.
> -
>
> Key: TEZ-3476
> URL: https://issues.apache.org/jira/browse/TEZ-3476
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3476.001.patch
>
>
> Tez task attempt start times don't reflect time spent in localization.
> In the MapReduce framework, the time spent in localization was included in 
> the total runtime of each task attempt. But since Tez reuses containers, the 
> time spent localizing for a container is not captured. The start time of the 
> first attempt in that container will only be set after the localization has 
> completed.
> The result is that attempts can appear as if they are not being run even 
> though there are resources available in the queue. An attempt can be assigned 
> to a container, but if the container is on a slow node and it takes a long 
> time to localize, the attempt state will remain pending until localization 
> completes.
> The risk is that tasks will not be speculated during localization, since 
> they haven't started.





[jira] [Updated] (TEZ-3476) Need a way to account for container localization.

2018-11-15 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3476:
-
Attachment: TEZ-3476.001.patch

> Need a way to account for container localization.
> -
>
> Key: TEZ-3476
> URL: https://issues.apache.org/jira/browse/TEZ-3476
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3476.001.patch
>
>
> Tez task attempt start times don't reflect time spent in localization.
> In the MapReduce framework, the time spent in localization was included in 
> the total runtime of each task attempt. But since Tez reuses containers, the 
> time spent localizing for a container is not captured. The start time of the 
> first attempt in that container will only be set after the localization has 
> completed.
> The result is that attempts can appear as if they are not being run even 
> though there are resources available in the queue. An attempt can be assigned 
> to a container, but if the container is on a slow node and it takes a long 
> time to localize, the attempt state will remain pending until localization 
> completes.
> The risk is that tasks will not be speculated during localization, since 
> they haven't started.





[jira] [Assigned] (TEZ-3476) Need a way to account for container localization.

2018-11-15 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reassigned TEZ-3476:


Assignee: Kuhu Shukla

> Need a way to account for container localization.
> -
>
> Key: TEZ-3476
> URL: https://issues.apache.org/jira/browse/TEZ-3476
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Kuhu Shukla
>Priority: Major
>
> Tez task attempt start times don't reflect time spent in localization.
> In the MapReduce framework, the time spent in localization was included in 
> the total runtime of each task attempt. But since Tez reuses containers, the 
> time spent localizing for a container is not captured. The start time of the 
> first attempt in that container will only be set after the localization has 
> completed.
> The result is that attempts can appear as if they are not being run even 
> though there are resources available in the queue. An attempt can be assigned 
> to a container, but if the container is on a slow node and it takes a long 
> time to localize, the attempt state will remain pending until localization 
> completes.
> The risk is that tasks will not be speculated during localization, since 
> they haven't started.





[jira] [Created] (TEZ-4019) Modify Tez shuffle handler to use AuxiliaryLocalPathHandler instead of LocalDirAllocator

2018-11-13 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-4019:


 Summary: Modify Tez shuffle handler to use 
AuxiliaryLocalPathHandler instead of LocalDirAllocator
 Key: TEZ-4019
 URL: https://issues.apache.org/jira/browse/TEZ-4019
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


As with the MR shuffle handler, this new API (YARN-7244), available in Hadoop 
2.8.2 and later, keeps the NM's view of which disks are good to use in sync with 
the auxiliary services' view. Tez currently compiles against Hadoop 2.7; when we 
move to a newer version, we should adopt this behavior.





[jira] [Commented] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2018-11-01 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672219#comment-16672219
 ] 

Kuhu Shukla commented on TEZ-4004:
--

Committed to branch-0.10.0. CC: [~ewohlstadter] for changes required to the 
branch's CHANGES.txt. Thanks!

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4004.001-branch-0.9.patch, TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline=java=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Updated] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2018-11-01 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4004:
-
Fix Version/s: 0.10.0

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.10.0, 0.10.1
>
> Attachments: TEZ-4004.001-branch-0.9.patch, TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline=java=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Updated] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2018-10-30 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4004:
-
Fix Version/s: 0.10.1

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 0.10.1
>
> Attachments: TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline=java=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Commented] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2018-10-30 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668759#comment-16668759
 ] 

Kuhu Shukla commented on TEZ-4004:
--

Committed to master. [~jeagles], can you provide a patch for 0.9? There is a 
conflict. Thank you!

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline=java=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Commented] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2018-10-30 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668728#comment-16668728
 ] 

Kuhu Shukla commented on TEZ-4004:
--

+1 for the patch. Committing this shortly.

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline=java=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-26 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3990:
-
Fix Version/s: (was: 0.10.0)
   0.10.1

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2, 0.10.1
>
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch, TEZ-3990.006.patch
>
>
> When fetches for the same mapId keep failing, the penalty code allows the 
> same Host/InputAttemptIdentifier to be re-added over and over with a penalty 
> time that grows exponentially. At some point it should stop retrying and 
> report the failure to the AM as soon as possible, so that the job can rectify 
> the upstream output.
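One way to cap the penalties, sketched in Java under assumptions: the key format, base delay, cap, and the GIVE_UP return value are all hypothetical and not taken from the TEZ-3990 patch. Each repeated failure for the same host/input doubles the back-off until the cap is exceeded, after which the caller is told to stop retrying and report the failure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: exponential back-off per host/input, capped so that a
// persistently failing input is escalated instead of retried forever.
class CappedPenalty {
  static final long GIVE_UP = -1;

  private final long basePenaltyMs;
  private final int maxPenalties;
  private final Map<String, Integer> penaltyCounts = new HashMap<>();

  CappedPenalty(long basePenaltyMs, int maxPenalties) {
    // Bound the cap so the shift amount in onFetchFailure stays small.
    if (basePenaltyMs <= 0 || maxPenalties < 1 || maxPenalties > 30) {
      throw new IllegalArgumentException("invalid penalty settings");
    }
    this.basePenaltyMs = basePenaltyMs;
    this.maxPenalties = maxPenalties;
  }

  /**
   * Records one fetch failure for hostAndInput. Returns the delay before the
   * next retry, or GIVE_UP once the cap is exceeded and the failure should be
   * reported to the AM rather than retried.
   */
  long onFetchFailure(String hostAndInput) {
    int count = penaltyCounts.merge(hostAndInput, 1, Integer::sum);
    if (count > maxPenalties) {
      return GIVE_UP;
    }
    return basePenaltyMs << (count - 1); // base, 2x base, 4x base, ...
  }
}
```

With a 1000 ms base and a cap of 3, repeated failures for one key yield 1000, 2000, and 4000 ms delays, and the fourth failure returns GIVE_UP; other keys are tracked independently.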





[jira] [Commented] (TEZ-4012) Add docker support for Tez.

2018-10-26 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665608#comment-16665608
 ] 

Kuhu Shukla commented on TEZ-4012:
--

Thank you [~jeagles]. +1. Committing this shortly.

> Add docker support for Tez.
> ---
>
> Key: TEZ-4012
> URL: https://issues.apache.org/jira/browse/TEZ-4012
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4012.001.patch, TEZ-4012.002.patch, 
> TEZ-4012.003.patch
>
>
> Hadoop label builds contain a mix of development tools and versions. In 
> particular, H11-H20 are unusable by Tez since their protoc version is 2.6.x 
> and Hadoop only supports 2.5.0. This JIRA will allow builds across all 
> H1-H20 Jenkins machines.





[jira] [Commented] (TEZ-4012) Add docker support for Tez.

2018-10-26 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665204#comment-16665204
 ] 

Kuhu Shukla commented on TEZ-4012:
--

Thank you for the patch, [~jeagles]. Just one more minor comment:
{code}
# Add a welcome message and environment checks.
COPY hadoop_env_checks.sh /root/hadoop_env_checks.sh
RUN chmod 755 /root/hadoop_env_checks.sh
{code}
This should now reference the renamed env shell script.

> Add docker support for Tez.
> ---
>
> Key: TEZ-4012
> URL: https://issues.apache.org/jira/browse/TEZ-4012
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4012.001.patch, TEZ-4012.002.patch
>
>
> Hadoop label builds contain a mix of development tools and versions. In 
> particular, H11-H20 are unusable by Tez since their protoc version is 2.6.x 
> and Hadoop only supports 2.5.0. This JIRA will allow builds across all 
> H1-H20 Jenkins machines.





[jira] [Comment Edited] (TEZ-4012) Add docker support for Tez.

2018-10-25 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664322#comment-16664322
 ] 

Kuhu Shukla edited comment on TEZ-4012 at 10/25/18 9:50 PM:


I am not very familiar with Yetus and precommit builds, but could the shell 
script name be changed here from {{hadoop_env_checks}} to something like 
{{tez_env_checks}}?
It seems to be referenced only in the Dockerfile.


was (Author: kshukla):
I am not very familiar with Yetus and precommit builds, but could the shell 
script name be changed here from {{hadoop_env_checks}} to something like 
{{tez_env_checks}}?

> Add docker support for Tez.
> ---
>
> Key: TEZ-4012
> URL: https://issues.apache.org/jira/browse/TEZ-4012
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4012.001.patch
>
>
> Hadoop label builds contain a mix of development tools and versions. In 
> particular, H11-H20 are unusable by Tez since their protoc version is 2.6.x 
> and Hadoop only supports 2.5.0. This JIRA will allow builds across all 
> H1-H20 Jenkins machines.





[jira] [Commented] (TEZ-4012) Add docker support for Tez.

2018-10-25 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664322#comment-16664322
 ] 

Kuhu Shukla commented on TEZ-4012:
--

I am not very familiar with Yetus and precommit builds, but could the shell 
script name be changed here from {{hadoop_env_checks}} to something like 
{{tez_env_checks}}?

> Add docker support for Tez.
> ---
>
> Key: TEZ-4012
> URL: https://issues.apache.org/jira/browse/TEZ-4012
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4012.001.patch
>
>
> Hadoop label builds contain a mix of development tools and versions. In 
> particular, H11-H20 are unusable by Tez since their protoc version is 2.6.x 
> and Hadoop only supports 2.5.0. This JIRA will allow builds across all 
> H1-H20 Jenkins machines.





[jira] [Created] (TEZ-4013) Allow intermediate output/spill data encryption

2018-10-22 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-4013:


 Summary: Allow intermediate output/spill data encryption
 Key: TEZ-4013
 URL: https://issues.apache.org/jira/browse/TEZ-4013
 Project: Apache Tez
  Issue Type: Task
Affects Versions: 0.9.1
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


Analogous to MAPREDUCE-5890 (Support for encrypting Intermediate data and 
spills in local filesystem), Tez should support encrypting spill data.
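A minimal sketch of encrypting spill bytes on write and decrypting them on read, using only the JDK's javax.crypto classes. This is an illustration under assumptions, not how MAPREDUCE-5890 or any Tez patch does it: the AES/CTR transformation, the 128-bit key, and the in-memory key/IV handling are placeholders, and a real implementation would also need a way to share the key with the task that fetches the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical JDK-only sketch: wrap spill streams so bytes are encrypted at
// rest. Key and IV management are deliberately simplified.
class SpillEncryptionSketch {
  static final String TRANSFORM = "AES/CTR/NoPadding";

  // Encrypts everything written to the returned stream before it reaches 'raw'.
  static OutputStream wrapForWrite(OutputStream raw, SecretKey key, byte[] iv)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance(TRANSFORM);
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
    return new CipherOutputStream(raw, cipher);
  }

  // Decrypts bytes read from 'raw'; must use the same key and IV as the writer.
  static InputStream wrapForRead(InputStream raw, SecretKey key, byte[] iv)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance(TRANSFORM);
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
    return new CipherInputStream(raw, cipher);
  }

  // Writes 'data' through an encrypting stream into memory, then reads it back.
  static byte[] roundTrip(byte[] data) throws GeneralSecurityException, IOException {
    KeyGenerator keyGen = KeyGenerator.getInstance("AES");
    keyGen.init(128);
    SecretKey key = keyGen.generateKey();
    byte[] iv = new byte[16]; // one AES block
    new SecureRandom().nextBytes(iv);

    ByteArrayOutputStream spill = new ByteArrayOutputStream(); // stands in for a spill file
    try (OutputStream out = wrapForWrite(spill, key, iv)) {
      out.write(data);
    }
    try (InputStream in = wrapForRead(new ByteArrayInputStream(spill.toByteArray()), key, iv)) {
      return in.readAllBytes(); // Java 9+
    }
  }
}
```

CTR mode is chosen here only because it needs no padding, so ciphertext and plaintext have the same length; that choice is an assumption of the sketch.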





[jira] [Commented] (TEZ-3976) ShuffleManager reporting too many errors

2018-10-18 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655924#comment-16655924
 ] 

Kuhu Shukla commented on TEZ-3976:
--

Thank you [~jmarhuen] for the patch.

I took a cursory look: the new config waits a configurable amount of time 
before reporting all the failures seen so far. This approach looks good, but it 
differs from the ordered case, and I wonder whether we should make them 
consistent.

I will look at the patch more closely in the meantime.
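The batching behavior described in this comment can be sketched as follows. This is a hypothetical Java illustration, not the TEZ-3976 patch: the class name, the millisecond clock passed in by the caller, and the de-duplication of repeated failures for the same input are assumptions. Failures are buffered until the configured delay has elapsed since the first unreported one and are then returned as a single batch, so many repeated errors collapse into one report instead of one AM event each.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: buffer fetch failures and report them together once a
// configurable delay has passed since the first unreported failure.
class BatchedFailureReporter {
  private final long reportDelayMs;
  // Insertion-ordered set: repeated failures for the same input are kept once.
  private final Set<String> pending = new LinkedHashSet<>();
  private long firstPendingAtMs = -1;

  BatchedFailureReporter(long reportDelayMs) {
    this.reportDelayMs = reportDelayMs;
  }

  /** Records a failure; returns the batch to report if the delay expired, else an empty list. */
  List<String> onFailure(String inputId, long nowMs) {
    if (pending.isEmpty()) {
      firstPendingAtMs = nowMs;
    }
    pending.add(inputId);
    return flushIfDue(nowMs);
  }

  /** Returns and clears pending failures once reportDelayMs has elapsed. */
  List<String> flushIfDue(long nowMs) {
    if (pending.isEmpty() || nowMs - firstPendingAtMs < reportDelayMs) {
      return new ArrayList<>();
    }
    List<String> batch = new ArrayList<>(pending);
    pending.clear();
    firstPendingAtMs = -1;
    return batch;
  }
}
```

A caller would also invoke flushIfDue periodically so that a final batch is not held back when no further failures arrive.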

> ShuffleManager reporting too many errors
> 
>
> Key: TEZ-3976
> URL: https://issues.apache.org/jira/browse/TEZ-3976
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
> Attachments: TEZ-3976.1.patch, TEZ-3976.2.patch, TEZ-3976.3.patch, 
> TEZ-3976.4.patch, TEZ-3976.5.patch, TEZ-3976.6.patch, TEZ-3976.7.patch
>
>
> The symptom is that many logs like these are shown:
> {code:java}
> 2018-06-15T18:09:35,811 INFO  [Fetcher_B {Reducer_5} #0 ()] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: 
> Fetch failed for src: InputAttemptIdentifier [inputIdentifier=701, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, 
> spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=701, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000701_0_12541_0, spillType=2, 
> spillId=0], connectFailed: true
> 2018-06-15T18:09:35,811 WARN  [Fetcher_B {Reducer_5} #1 ()] 
> org.apache.tez.runtime.library.common.shuffle.Fetcher: copyInputs failed for 
> tasks [InputAttemptIdentifier [inputIdentifier=589, attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0]]
> 2018-06-15T18:09:35,811 INFO  [Fetcher_B {Reducer_5} #1 ()] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager: Reducer_5: 
> Fetch failed for src: InputAttemptIdentifier [inputIdentifier=589, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0]InputIdentifier: InputAttemptIdentifier [inputIdentifier=589, 
> attemptNumber=0, 
> pathComponent=attempt_152901963_0021_34_01_000589_0_12445_0, spillType=2, 
> spillId=0], connectFailed: true
> {code}
> Each of these translates into an event in the AM, which finally crashes with 
> an OOM after around 30 minutes and around 10 million shuffle input errors (and 
> 10 million log lines like the ones above). When the ShuffleManager is closed 
> and the counters are reported, there are many shuffle input errors; some of 
> those logs are:
> {code:java}
> 2018-06-15T17:46:30,988  INFO [TezTR-441963_21_34_4_0_4 
> (152901963_0021_34_04_00_4)] runtime.LogicalIOProcessorRuntimeTask: 
> Final Counters for attempt_152901963_0021_34_04_00_4: Counters: 43 
> [[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=0, 
> NUM_SHUFFLED_INPUTS=26, NUM_FAILED_SHUFFLE_INPUTS=858965, 
> INPUT_RECORDS_PROCESSED=26, OUTPUT_RECORDS=1, OUTPUT_LARGE_RECORDS=0, 
> OUTPUT_BYTES=779472, OUTPUT_BYTES_WITH_OVERHEAD=779483, 
> OUTPUT_BYTES_PHYSICAL=780146, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, 
> SHUFFLE_BYTES=4207563, SHUFFLE_BYTES_DECOMPRESSED=20266603, 
> SHUFFLE_BYTES_TO_MEM=3380616, SHUFFLE_BYTES_TO_DISK=0, 
> SHUFFLE_BYTES_DISK_DIRECT=826947, SHUFFLE_PHASE_TIME=52516, 
> FIRST_EVENT_RECEIVED=1, LAST_EVENT_RECEIVED=1185][HIVE 
> RECORDS_OUT_INTERMEDIATE_Reducer_12=1, 
> RECORDS_OUT_OPERATOR_GBY_159=1, 
> RECORDS_OUT_OPERATOR_RS_160=1][TaskCounter_Reducer_12_INPUT_Map_11
>  FIRST_EVENT_RECEIVED=1, INPUT_RECORDS_PROCESSED=26, 
> LAST_EVENT_RECEIVED=1185, NUM_FAILED_SHUFFLE_INPUTS=858965, 
> NUM_SHUFFLED_INPUTS=26, SHUFFLE_BYTES=4207563, 
> SHUFFLE_BYTES_DECOMPRESSED=20266603, SHUFFLE_BYTES_DISK_DIRECT=826947, 
> SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=3380616, 
> SHUFFLE_PHASE_TIME=52516][TaskCounter_Reducer_12_OUTPUT_Map_1
>  ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
> ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=779472, OUTPUT_BYTES_PHYSICAL=780146, 
> OUTPUT_BYTES_WITH_OVERHEAD=779483, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=1, 
> SPILLED_RECORDS=0]]
> 2018-06-15T17:46:32,271 INFO  [TezTR-441963_21_34_3_15_1 ()] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Final Counters for 
> attempt_152901963_0021_34_03_15_1: Counters: 87 [[File System 
> Counters FILE_BYTES_READ=0, FILE_BYTES_WRITTEN=0, FILE_READ_OPS=0, 
> FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=2344929, 
> HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=5, HDFS_LARGE_READ_OPS=0, 
> 

[jira] [Commented] (TEZ-3998) Allow CONCURRENT edge property in DAG construction and introduce ConcurrentSchedulingType

2018-10-16 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651706#comment-16651706
 ] 

Kuhu Shukla commented on TEZ-3998:
--

I plan to review this in the next few days. It would be nice to get comments 
from [~gopalv], [~jlowe], and [~jeagles], among others.

> Allow CONCURRENT edge property in DAG construction and introduce 
> ConcurrentSchedulingType
> -
>
> Key: TEZ-3998
> URL: https://issues.apache.org/jira/browse/TEZ-3998
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Yingda Chen
>Assignee: Yingda Chen
>Priority: Major
>
> This is the first task related to TEZ-3997
>  
> |Note: This proposal involves no API change. Most of the change consists of 
> lifting existing constraints against the CONCURRENT edge type and adding a 
> VertexManagerPlugin implementation.|
>  
> This includes enabling the CONCURRENT SchedulingType as a valid edge 
> property by removing all the sanity checks against CONCURRENT during DAG 
> construction/execution. A new VertexManagerPlugin (namely 
> VertexManagerWithConcurrentInput) will be implemented for vertices with 
> incoming concurrent edge(s). 
> In addition, we will assume in this change that 
>  * A vertex *cannot* have both SEQUENTIAL and CONCURRENT incoming edges 
>  * No shuffle or data movement is handled by the Tez framework when two 
> vertices are connected through a CONCURRENT edge. Instead, the runtime should 
> be responsible for handling all data-plane communication (as proposed in 
> [1]).
> Note that the above assumptions are common for scenarios such as whole-DAG or 
> sub-graph gang scheduling, but they may be relaxed in a later implementation, 
> which may allow a mixture of SEQUENTIAL and CONCURRENT edges on the same vertex.
>  
> Most of the (meaningful) scheduling decisions in Tez today are made based on 
> the notion of (or an extended version of) source task completion. This will 
> no longer hold in the presence of a CONCURRENT edge. Instead, events such as 
> source vertex configured or source task running become more relevant when 
> making scheduling decisions for two vertices connected via a CONCURRENT 
> edge. We therefore introduce a new enum *ConcurrentSchedulingType* to 
> describe the “scheduling timing” for the downstream vertex in such scenarios. 
> {code}
> public enum ConcurrentSchedulingType {
>   /** Trigger downstream vertex task scheduling on the "configured" event of upstream vertices. */
>   SOURCE_VERTEX_CONFIGURED,
>   /** Trigger downstream vertex task scheduling on the "running" event of upstream tasks. */
>   SOURCE_TASK_STARTED
> }
> {code}
>  
> Note that in this change, we will only use SOURCE_VERTEX_CONFIGURED as the 
> scheduling type, which suffices for whole-DAG or sub-graph gang-scheduling 
> scenarios, where we want (all the tasks in) the downstream vertex to be 
> scheduled together with (all the tasks in) the upstream vertex. In this case, 
> we can leverage the existing onVertexStateUpdated() interface of 
> VertexManagerPlugin to collect the information relevant to the scheduling 
> decision, and *there is no additional API change necessary*. However, in more 
> subtle cases such as the parameter-server example described in Fig. 1, other 
> scheduling types would be more relevant; therefore the placeholder 
> *ConcurrentSchedulingType* will be introduced in this change as part of the 
> infrastructure work.
>  
> Finally, since we assume that all communication between two vertices 
> connected via a CONCURRENT edge is handled by the application runtime, a 
> CONCURRENT edge will be assigned a DummyEdgeManager that essentially mutes all 
> DME/VME handling.
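
The sketch below shows what the SOURCE_VERTEX_CONFIGURED policy could look like in Java. It is a minimal sketch under stated assumptions. `ConcurrentGangScheduler` is a hypothetical class and is not part of Tez. It receives the same kind of source-vertex state notifications that a VertexManagerPlugin gets through onVertexStateUpdated(). Vertex states are modeled as plain strings, not a real Tez state type. The downstream vertex becomes schedulable as one gang once every upstream vertex on a CONCURRENT edge has reported CONFIGURED.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical copy of the proposed enum, so the sketch compiles on its own.
enum ConcurrentSchedulingType { SOURCE_VERTEX_CONFIGURED, SOURCE_TASK_STARTED }

// Hypothetical helper (not a Tez class): decides when a downstream vertex whose
// inputs all arrive over CONCURRENT edges may schedule its tasks as one gang.
final class ConcurrentGangScheduler {
    // Upstream vertices that have not yet reported CONFIGURED.
    private final Set<String> pendingSources;
    private boolean scheduled = false;

    ConcurrentGangScheduler(Set<String> concurrentSources, ConcurrentSchedulingType type) {
        if (type != ConcurrentSchedulingType.SOURCE_VERTEX_CONFIGURED) {
            // Mirrors the proposal: only SOURCE_VERTEX_CONFIGURED is used in this change.
            throw new UnsupportedOperationException("Only SOURCE_VERTEX_CONFIGURED is sketched");
        }
        this.pendingSources = new HashSet<>(concurrentSources);
    }

    // Analogous to onVertexStateUpdated(); states are plain strings in this sketch.
    // Returns true exactly once, when the last pending source reports CONFIGURED.
    boolean onSourceVertexStateUpdated(String vertexName, String newState) {
        if (scheduled || !"CONFIGURED".equals(newState)) {
            return false;
        }
        pendingSources.remove(vertexName);
        if (pendingSources.isEmpty()) {
            scheduled = true; // schedule all downstream tasks together
            return true;
        }
        return false;
    }
}
```

Because the trigger fires only on the last CONFIGURED event, no downstream task starts before every upstream vertex is configured. Under the stated assumptions, this is the gang-scheduling behavior the proposal describes.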



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3961) Tez UI web.xml tries to reach out to java.sun.com for validation after moving to jetty-9

2018-10-10 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645502#comment-16645502
 ] 

Kuhu Shukla commented on TEZ-3961:
--

[~jeagles], request for review. Thanks a lot!

 

> Tez UI web.xml tries to reach out to java.sun.com for validation after moving 
> to jetty-9
> 
>
> Key: TEZ-3961
> URL: https://issues.apache.org/jira/browse/TEZ-3961
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3961.001.patch, TEZ-3961.002.patch
>
>
> Tez UI can throw a 503 error when hosted on a server that cannot reach public 
> hosts such as java.sun.com, which are listed as the DTD locations in web.xml. 
> This behavior change comes from moving to Jetty 9 (Tez and Hadoop 3.0), which 
> no longer ships the schemas that earlier versions provided. Even where public 
> hosts are reachable, fetching the DTD for a very simple web.xml file is 
> suboptimal. We can either remove the DTD validation or add an explicit 
> dependency on org.eclipse.jetty.toolchain:jetty-osgi-servlet-api so that 
> this Jetty change does not affect the 
> behavior of tez-ui.
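
The sketch below shows why a DOCTYPE can trigger a network fetch. It parses a web.xml that carries a legacy DOCTYPE without touching the network. It uses the JDK's standard `DocumentBuilderFactory` and turns off the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`, which the JDK's built-in parser supports. This is only a sketch of the mechanism. It is not how Jetty itself parses web.xml. The DOCTYPE is the standard Servlet 2.3 header, and the `tez-ui` display name is assumed for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class OfflineWebXmlParser {
    // Standard Servlet 2.3 DOCTYPE; a parser that loads external DTDs would try
    // to download http://java.sun.com/dtd/web-app_2_3.dtd here.
    static final String WEB_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE web-app PUBLIC \"-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN\" "
        + "\"http://java.sun.com/dtd/web-app_2_3.dtd\">\n"
        + "<web-app><display-name>tez-ui</display-name></web-app>";

    static Document parseWithoutExternalDtd(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            // Xerces feature supported by the JDK's built-in parser: do not fetch
            // the external DTD referenced by the DOCTYPE when not validating.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse web.xml", e);
        }
    }
}
```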





[jira] [Commented] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-10 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645452#comment-16645452
 ] 

Kuhu Shukla commented on TEZ-3990:
--

Opened https://issues.apache.org/jira/browse/TEZ-4005 to add penalties to the 
unordered case.

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch, TEZ-3990.006.patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.
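
Here is a minimal sketch of a capped penalty. `PenaltyPolicy` is a hypothetical class keyed by a host/inputAttemptIdentifier string. Each repeated failure doubles the penalty delay. Once the next delay would exceed a configured maximum in milliseconds, the sketch stops retrying and signals that the failure should be reported to the AM. The class name, the doubling factor and the -1 sentinel are illustrative choices, not Tez's actual code or configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical penalty bookkeeping for shuffle fetch failures (not Tez's code).
final class PenaltyPolicy {
    private final long baseDelayMs;
    private final long maxPenaltyMs;
    // Number of failures seen so far per "host/inputAttemptIdentifier" key.
    private final Map<String, Integer> failures = new HashMap<>();

    PenaltyPolicy(long baseDelayMs, long maxPenaltyMs) {
        this.baseDelayMs = baseDelayMs;
        this.maxPenaltyMs = maxPenaltyMs;
    }

    // Records one failed fetch. Returns the delay in ms before the next retry,
    // or -1 once the exponential delay would exceed the cap, meaning: stop
    // retrying and report the failure to the AM.
    long onFetchFailure(String hostAndInput) {
        int count = failures.merge(hostAndInput, 1, Integer::sum);
        // base * 2^(count - 1); the shift is clamped so the long does not
        // overflow for reasonable base delays.
        long delay = baseDelayMs << Math.min(count - 1, 30);
        return delay > maxPenaltyMs ? -1 : delay;
    }
}
```

With a 1000 ms base and an 8000 ms cap, the retries wait 1000, 2000, 4000 and 8000 ms. The fifth failure returns -1 instead of scheduling a 16000 ms penalty.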





[jira] [Created] (TEZ-4005) Add Host-Source input penalties to Unordered Shuffle

2018-10-10 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-4005:


 Summary: Add Host-Source input penalties to Unordered Shuffle
 Key: TEZ-4005
 URL: https://issues.apache.org/jira/browse/TEZ-4005
 Project: Apache Tez
  Issue Type: Task
Affects Versions: 0.9.1
Reporter: Kuhu Shukla


Ordered shuffle has a mechanism to penalize hosts and retry with exponential 
waits. The unordered case is missing this feature. It would be really useful to 
add it and make the shuffle policies consistent.





[jira] [Updated] (TEZ-3961) Tez UI web.xml tries to reach out to java.sun.com for validation after moving to jetty-9

2018-10-10 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3961:
-
Attachment: TEZ-3961.002.patch

> Tez UI web.xml tries to reach out to java.sun.com for validation after moving 
> to jetty-9
> 
>
> Key: TEZ-3961
> URL: https://issues.apache.org/jira/browse/TEZ-3961
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3961.001.patch, TEZ-3961.002.patch
>
>
> Tez UI can throw a 503 error when hosted on a server that cannot reach public 
> hosts such as java.sun.com, which are listed as the DTD locations in web.xml. 
> This behavior change comes from moving to Jetty 9 (Tez and Hadoop 3.0), which 
> no longer ships the schemas that earlier versions provided. Even where public 
> hosts are reachable, fetching the DTD for a very simple web.xml file is 
> suboptimal. We can either remove the DTD validation or add an explicit 
> dependency on org.eclipse.jetty.toolchain:jetty-osgi-servlet-api so that 
> this Jetty change does not affect the 
> behavior of tez-ui.





[jira] [Commented] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-10 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645421#comment-16645421
 ] 

Kuhu Shukla commented on TEZ-3990:
--

The build seems to be having a protoc version issue.

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch, TEZ-3990.006.patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.





[jira] [Commented] (TEZ-4004) Update jetty9 to align with Hadoop and Hive

2018-10-10 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645415#comment-16645415
 ] 

Kuhu Shukla commented on TEZ-4004:
--

HADOOP-15815 appears to move to jetty 9.3.25 (9.3.25.v20180904), although 
based on [~kihwal]'s input anything from 9.3.24 (and up?) should be fine. I see 
some new methods in 9.3.25 compared to 9.3.24.

> Update jetty9 to align with Hadoop and Hive
> ---
>
> Key: TEZ-4004
> URL: https://issues.apache.org/jira/browse/TEZ-4004
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-4004.001.patch
>
>
> https://abi-laboratory.pro/index.php?view=timeline=java=jetty
> https://issues.apache.org/jira/browse/HADOOP-15815





[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-10 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3990:
-
Attachment: TEZ-3990.006.patch

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch, TEZ-3990.006.patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.





[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-10 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3990:
-
Attachment: (was: TEZ-3990.006..patch)

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.





[jira] [Commented] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-10 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645254#comment-16645254
 ] 

Kuhu Shukla commented on TEZ-3990:
--

Missed a variable name change. d'oh.

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch, 
> TEZ-3990.006..patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.





[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-10 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3990:
-
Attachment: TEZ-3990.006..patch

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch, 
> TEZ-3990.006..patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.





[jira] [Comment Edited] (TEZ-3961) Tez UI web.xml tries to reach out to java.sun.com for validation after moving to jetty-9

2018-10-10 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645213#comment-16645213
 ] 

Kuhu Shukla edited comment on TEZ-3961 at 10/10/18 4:25 PM:


Thank you for the ping [~jeagles]. This patch is simple but needs a review of 
whether it is OK to remove the DOCTYPE declaration.


was (Author: kshukla):
Thank you for the ping [~jeagles]. This patch is simple but needs review on 
whether it is ok to emovethe DOCTYPE declaration.

> Tez UI web.xml tries to reach out to java.sun.com for validation after moving 
> to jetty-9
> 
>
> Key: TEZ-3961
> URL: https://issues.apache.org/jira/browse/TEZ-3961
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3961.001.patch
>
>
> Tez UI can throw a 503 error when hosted on a server that cannot reach public 
> hosts such as java.sun.com, which are listed as the DTD locations in web.xml. 
> This behavior change comes from moving to Jetty 9 (Tez and Hadoop 3.0), which 
> no longer ships the schemas that earlier versions provided. Even where public 
> hosts are reachable, fetching the DTD for a very simple web.xml file is 
> suboptimal. We can either remove the DTD validation or add an explicit 
> dependency on org.eclipse.jetty.toolchain:jetty-osgi-servlet-api so that 
> this Jetty change does not affect the 
> behavior of tez-ui.





[jira] [Updated] (TEZ-3961) Tez UI web.xml tries to reach out to java.sun.com for validation after moving to jetty-9

2018-10-10 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3961:
-
Attachment: TEZ-3961.001.patch

> Tez UI web.xml tries to reach out to java.sun.com for validation after moving 
> to jetty-9
> 
>
> Key: TEZ-3961
> URL: https://issues.apache.org/jira/browse/TEZ-3961
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3961.001.patch
>
>
> Tez UI can throw a 503 error when hosted on a server that cannot reach public 
> hosts such as java.sun.com, which are listed as the DTD locations in web.xml. 
> This behavior change comes from moving to Jetty 9 (Tez and Hadoop 3.0), which 
> no longer ships the schemas that earlier versions provided. Even where public 
> hosts are reachable, fetching the DTD for a very simple web.xml file is 
> suboptimal. We can either remove the DTD validation or add an explicit 
> dependency on org.eclipse.jetty.toolchain:jetty-osgi-servlet-api so that 
> this Jetty change does not affect the 
> behavior of tez-ui.





[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-10 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3990:
-
Attachment: TEZ-3990.005.patch

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch, TEZ-3990.005.patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.





[jira] [Comment Edited] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-08 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642016#comment-16642016
 ] 

Kuhu Shukla edited comment on TEZ-3990 at 10/8/18 3:32 PM:
---

Addressed comments by [~jeagles]. Agreed on the issues mentioned with delay 
calculation and testability. [~jeagles], should I go ahead and create JIRAs for 
these issues?

 

P.S. FYI, the unordered case doesn't seem to have the concept of penalties, 
which is odd.


was (Author: kshukla):
Addressed comments by [~jeagles]. Agreed on the issues mentioned with delay 
calculation and testability. [~jeagles], should I go ahead and create JIRAs for 
these issues?

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.





[jira] [Commented] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-08 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642016#comment-16642016
 ] 

Kuhu Shukla commented on TEZ-3990:
--

Addressed comments by [~jeagles]. Agreed on the issues mentioned with delay 
calculation and testability. [~jeagles], should I go ahead and create JIRAs for 
these issues?

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.





[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-10-08 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3990:
-
Attachment: TEZ-3990.004.patch

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch, TEZ-3990.004.patch
>
>
> When fetches for the same mapId keep failing, the penalty code keeps re-adding 
> the same Host/InputAttemptIdentifier with a revised penalty time that grows 
> exponentially. At some point it should stop retrying and report the failure 
> to the AM as soon as possible, so that the job can rectify the 
> upstream output.





[jira] [Updated] (TEZ-4002) CHANGES.txt for 0.9.2 Release

2018-10-06 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-4002:
-
Attachment: TEZ-4002.001.patch

> CHANGES.txt for 0.9.2 Release
> -
>
> Key: TEZ-4002
> URL: https://issues.apache.org/jira/browse/TEZ-4002
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-4002.001.patch
>
>
> Add CHANGES.txt for 0.9.2 line.





[jira] [Created] (TEZ-4002) CHANGES.txt for 0.9.2 Release

2018-10-06 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-4002:


 Summary: CHANGES.txt for 0.9.2 Release
 Key: TEZ-4002
 URL: https://issues.apache.org/jira/browse/TEZ-4002
 Project: Apache Tez
  Issue Type: Task
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


Add CHANGES.txt for 0.9.2 line.





[jira] [Commented] (TEZ-3982) DAGAppMaster and tasks should not report negative or invalid progress

2018-09-21 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623653#comment-16623653
 ] 

Kuhu Shukla commented on TEZ-3982:
--

Sorry about that. Changed the clock to SystemClock, which is not deprecated in 
Hadoop 2.8. [~jlowe], thank you for catching this. Requesting a review.

> DAGAppMaster and tasks should not report negative or invalid progress
> -
>
> Key: TEZ-3982
> URL: https://issues.apache.org/jira/browse/TEZ-3982
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.10.1
>
> Attachments: TEZ-3982.001.patch, TEZ-3982.002.patch, 
> TEZ-3982.003.patch, TEZ-3982.004.patch, TEZ-3982.005.branch-0.9.patch
>
>
> The AM fails (AMRMClient expects non-negative progress) if any component 
> reports invalid or negative progress. DAGAppMaster and tasks should check 
> progress and report 
> accordingly to allow the AM to execute.
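
Here is a minimal sketch of the kind of guard described here, using hypothetical names (`ProgressGuard`, `sanitize`, `dagProgress`). Any reported value is clamped into [0, 1] before it would reach AMRMClient, and NaN is mapped to 0. The DAG-level average is checked again after dividing by the number of vertices. Returning 0 for a DAG with no vertices is an assumption. Whether NaN or 0 is preferable is discussed in the comments on this issue.

```java
// Hypothetical progress guard (not Tez's code).
final class ProgressGuard {
    // Maps any reported value into [0, 1]; NaN becomes 0 (an assumption).
    static float sanitize(float progress) {
        if (Float.isNaN(progress)) {
            return 0f;
        }
        // Also clamps +/-Infinity to 1 and 0 respectively.
        return Math.max(0f, Math.min(1f, progress));
    }

    // Hypothetical DAG-level progress: the average of per-vertex progress values.
    static float dagProgress(float[] vertexProgress) {
        int totalVertices = vertexProgress.length;
        if (totalVertices == 0) {
            return 0f; // assumption: report 0 rather than NaN (0f / 0)
        }
        float sum = 0f;
        for (float p : vertexProgress) {
            sum += sanitize(p);
        }
        // Check again after the division by totalVertices, in case of rounding above 1.
        return sanitize(sum / totalVertices);
    }
}
```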





[jira] [Updated] (TEZ-3982) DAGAppMaster and tasks should not report negative or invalid progress

2018-09-21 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3982:
-
Attachment: TEZ-3982.005.branch-0.9.patch

> DAGAppMaster and tasks should not report negative or invalid progress
> -
>
> Key: TEZ-3982
> URL: https://issues.apache.org/jira/browse/TEZ-3982
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.10.1
>
> Attachments: TEZ-3982.001.patch, TEZ-3982.002.patch, 
> TEZ-3982.003.patch, TEZ-3982.004.patch, TEZ-3982.005.branch-0.9.patch
>
>
> The AM fails (AMRMClient expects non-negative progress) if any component 
> reports invalid or negative progress. DAGAppMaster and tasks should check 
> progress and report 
> accordingly to allow the AM to execute.





[jira] [Commented] (TEZ-3982) DAGAppMaster and tasks should not report negative or invalid progress

2018-09-20 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622161#comment-16622161
 ] 

Kuhu Shukla commented on TEZ-3982:
--

[~jeagles], if the total number of vertices is 0, do we want to report NaN or 0 
progress? That would help decide whether the code mentioned above is required. 
I'd appreciate your input.

> DAGAppMaster and tasks should not report negative or invalid progress
> -
>
> Key: TEZ-3982
> URL: https://issues.apache.org/jira/browse/TEZ-3982
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3982.001.patch, TEZ-3982.002.patch, 
> TEZ-3982.003.patch, TEZ-3982.004.patch
>
>
> The AM fails (AMRMClient expects non-negative progress) if any component 
> reports invalid or negative progress. DAGAppMaster and tasks should check 
> progress and report 
> accordingly to allow the AM to execute.





[jira] [Resolved] (TEZ-3198) Shuffle failures for the trailing task in a vertex are often fatal to the entire DAG

2018-09-20 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla resolved TEZ-3198.
--
Resolution: Duplicate

Yes. It will certainly allow the AM to retry the attempt sooner.

> Shuffle failures for the trailing task in a vertex are often fatal to the 
> entire DAG
> 
>
> Key: TEZ-3198
> URL: https://issues.apache.org/jira/browse/TEZ-3198
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0, 0.8.2
>Reporter: Jason Lowe
>Priority: Critical
>
> I've seen an increasing number of cases where a single-node failure caused 
> the whole Tez DAG to fail. These scenarios are common in that they involve 
> the last task of a vertex attempting to complete a shuffle where all the peer 
> tasks have already finished shuffling.  The last task's attempt encounters 
> errors shuffling one of its inputs and keeps reporting it to the AM.  
> Eventually the attempt decides it must be the cause of the shuffle error and 
> fails.  The subsequent attempts all do the same thing, and eventually we hit 
> the task max attempts limit and fail the vertex and DAG.





[jira] [Commented] (TEZ-3982) DAGAppMaster and tasks should not report negative or invalid progress

2018-09-19 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621223#comment-16621223
 ] 

Kuhu Shukla commented on TEZ-3982:
--

Thanks [~jeagles]. Addressed the comment and kept a similar check after the 
division by totalVertices.

> DAGAppMaster and tasks should not report negative or invalid progress
> -
>
> Key: TEZ-3982
> URL: https://issues.apache.org/jira/browse/TEZ-3982
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3982.001.patch, TEZ-3982.002.patch, 
> TEZ-3982.003.patch, TEZ-3982.004.patch
>
>
> The AM fails (AMRMClient expects non-negative progress) if any component 
> reports invalid or negative progress. DAGAppMaster and tasks should check 
> progress and report 
> accordingly to allow the AM to execute.





[jira] [Updated] (TEZ-3982) DAGAppMaster and tasks should not report negative or invalid progress

2018-09-19 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3982:
-
Attachment: TEZ-3982.004.patch

> DAGAppMaster and tasks should not report negative or invalid progress
> -
>
> Key: TEZ-3982
> URL: https://issues.apache.org/jira/browse/TEZ-3982
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3982.001.patch, TEZ-3982.002.patch, 
> TEZ-3982.003.patch, TEZ-3982.004.patch
>
>
> The AM fails (AMRMClient expects non-negative progress) if any component 
> reports invalid or negative progress. DAGAppMaster and tasks should check 
> progress and report 
> accordingly to allow the AM to execute.





[jira] [Commented] (TEZ-3972) Tez DAG can hang when a single task fails to fetch

2018-09-18 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620018#comment-16620018
 ] 

Kuhu Shukla commented on TEZ-3972:
--

Thanks [~jeagles]!

> Tez DAG can hang when a single task fails to fetch
> --
>
> Key: TEZ-3972
> URL: https://issues.apache.org/jira/browse/TEZ-3972
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 0.9.2, 0.10.1
>
> Attachments: TEZ-3972.001.patch, TEZ-3972.002.patch, 
> TEZ-3972.003.patch
>
>
> Description of the hung DAG:
> A DAG with 2 vertices. {{Map}} Vertex has 22k maps, downstream vertex 
> {{Reduce}} has 1009 tasks. All tasks succeed but one, which hangs. This one 
> task (attempt) is doing a local fetch from a node that (now) has a bad disk. 
> It fails to fetch and reports the offending input attempt identifiers to the 
> AM. However, the AM does not schedule a re-run, because the size of 
> {{uniquefailedOutputReports}} is 1 (only this task attempt failed to fetch) 
> and the failure fraction is not met: the denominator for this fraction is 
> the total number of tasks. As a result, the re-run never occurs. This 
> JIRA tracks the AM side of the change to alleviate this problem.
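
The sketch below shows why the re-run never triggers. It is a fraction-based check using the numbers from the description: 1 unique report against 1009 tasks. The method name and the 0.1 threshold are assumptions for illustration. This is not the AM's actual code, and it is not the fix.

```java
// Hypothetical fraction-based re-run check (not the AM's actual code).
final class OutputFailureCheck {
    // Returns true when enough distinct tasks have reported the same failed output.
    static boolean shouldRerunProducer(int uniqueFailedOutputReports, int totalTasks,
                                       double maxAllowedFraction) {
        if (totalTasks <= 0) {
            return false;
        }
        double fraction = (double) uniqueFailedOutputReports / totalTasks;
        return fraction >= maxAllowedFraction;
    }
}
```

With 1009 tasks and a 0.1 threshold, 1/1009 ≈ 0.001 falls far short. Even 100 reports give about 0.099, so at least 101 distinct reporters would be needed, and a single failing task can never reach the threshold.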





[jira] [Updated] (TEZ-3972) Tez DAG can hang when a single task fails to fetch

2018-09-17 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3972:
-
Attachment: TEZ-3972.003.patch

> Tez DAG can hang when a single task fails to fetch
> --
>
> Key: TEZ-3972
> URL: https://issues.apache.org/jira/browse/TEZ-3972
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3972.001.patch, TEZ-3972.002.patch, 
> TEZ-3972.003.patch
>
>
> Description of the hung DAG:
> A DAG with 2 vertices. {{Map}} Vertex has 22k maps, downstream vertex 
> {{Reduce}} has 1009 tasks. All tasks succeed but one, which hangs. This one 
> task (attempt) is doing a local fetch from a node that (now) has a bad disk. 
> It fails to fetch and reports the offending input attempt identifiers to the 
> AM. However, the AM does not schedule a re-run, because the size of 
> {{uniquefailedOutputReports}} is 1 (only this task attempt failed to fetch) 
> and the failure fraction is not met: the denominator for this fraction is 
> the total number of tasks. As a result, the re-run never occurs. This 
> JIRA tracks the AM side of the change to alleviate this problem.





[jira] [Commented] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-09-17 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617601#comment-16617601
 ] 

Kuhu Shukla commented on TEZ-3990:
--

Minor change to add the new config to the expected runtime keys. This fixes the 
unit test failure.

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch
>
>
> In a scenario where the same mapId fetches fail, the penalty code allows 
> adding the same Host/InputAttemptIdentifier over and over with revised 
> penalty time that grows exponentially. It should at some point drop the 
> retrying and report failure to the AM asap to allow the job to rectify the 
> upstream output.
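A minimal sketch of the uncapped growth the description refers to. It assumes the penalty doubles on every repeated failure from a hypothetical 1-second base. The actual base and multiplier in the Tez shuffle code may differ, and the class name is hypothetical.

```java
class UncappedPenalty {
    // Hypothetical base penalty; not taken from Tez.
    static final long BASE_PENALTY_MS = 1_000L;

    /** Penalty after the given number of prior failures: BASE * 2^priorFailures. */
    static long penaltyMs(int priorFailures) {
        // No cap: this overflows once priorFailures gets large enough (around 53 here).
        return BASE_PENALTY_MS << priorFailures;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n += 2) {
            System.out.println("after " + n + " failures: " + penaltyMs(n) + " ms");
        }
    }
}
```

Under these assumptions, ten repeated failures already push the delay to 1,024,000 ms (about 17 minutes) before the next attempt.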





[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-09-17 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3990:
-
Attachment: TEZ-3990.003.patch

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch, 
> TEZ-3990.003.patch
>
>
> In a scenario where the same mapId fetches fail, the penalty code allows 
> adding the same Host/InputAttemptIdentifier over and over with revised 
> penalty time that grows exponentially. It should at some point drop the 
> retrying and report failure to the AM asap to allow the job to rectify the 
> upstream output.





[jira] [Commented] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-09-14 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615349#comment-16615349
 ] 

Kuhu Shukla commented on TEZ-3990:
--

Updated the patch to cap the penalty at a configured maximum time in 
milliseconds. It adds an entry for every failure occurrence so that the AM can 
eventually trigger a retry once the threshold is reached.
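A minimal sketch of the approach described in this comment. It assumes a hypothetical cap (`maxPenaltyMs`), a hypothetical reporting threshold, and a simple per-input failure counter. These names and data structures are illustrative only and are not the patch's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class CappedPenaltyTracker {
    private final long basePenaltyMs;
    private final long maxPenaltyMs;    // hypothetical configured cap, in milliseconds
    private final int reportThreshold;  // hypothetical failure count before telling the AM
    // Failure count per input identifier; every occurrence is recorded.
    private final Map<String, Integer> failures = new HashMap<>();

    CappedPenaltyTracker(long basePenaltyMs, long maxPenaltyMs, int reportThreshold) {
        this.basePenaltyMs = basePenaltyMs;
        this.maxPenaltyMs = maxPenaltyMs;
        this.reportThreshold = reportThreshold;
    }

    /** Records one failure for the input and returns the delay before the next retry. */
    long recordFailure(String inputId) {
        int count = failures.merge(inputId, 1, Integer::sum);
        long penalty = basePenaltyMs;
        // Double once per prior failure, stopping once the cap is reached (avoids overflow).
        for (int i = 1; i < count && penalty < maxPenaltyMs; i++) {
            penalty *= 2;
        }
        return Math.min(penalty, maxPenaltyMs);
    }

    /** True once this input has failed often enough that the AM should be told. */
    boolean shouldReportToAm(String inputId) {
        return failures.getOrDefault(inputId, 0) >= reportThreshold;
    }

    public static void main(String[] args) {
        CappedPenaltyTracker tracker = new CappedPenaltyTracker(1_000L, 10_000L, 3);
        for (int i = 0; i < 6; i++) {
            long delay = tracker.recordFailure("map_0");
            System.out.println(delay + " ms, report=" + tracker.shouldReportToAm("map_0"));
        }
    }
}
```

With a 1 s base and a 10 s cap, the delays run 1000, 2000, 4000, 8000, 10000, 10000 ms. The AM is signaled from the third failure on instead of waiting behind an ever-growing delay.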

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch
>
>
> In a scenario where the same mapId fetches fail, the penalty code allows 
> adding the same Host/InputAttemptIdentifier over and over with revised 
> penalty time that grows exponentially. It should at some point drop the 
> retrying and report failure to the AM asap to allow the job to rectify the 
> upstream output.





[jira] [Updated] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-09-14 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3990:
-
Attachment: TEZ-3990.002.patch

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch, TEZ-3990.002.patch
>
>
> In a scenario where the same mapId fetches fail, the penalty code allows 
> adding the same Host/InputAttemptIdentifier over and over with revised 
> penalty time that grows exponentially. It should at some point drop the 
> retrying and report failure to the AM asap to allow the job to rectify the 
> upstream output.





[jira] [Commented] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-09-13 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614003#comment-16614003
 ] 

Kuhu Shukla commented on TEZ-3990:
--

Hmm, after some more offline discussion with [~jeagles], it looks like this patch 
won't fully address the penalty-specific issue; instead, it would cap the 
signaling to the AM, which is not what we want.

To clarify, it is important not to let the penalty delay grow exponentially 
without bound. Unbounded growth spaces the signals to the AM farther and farther 
apart, which makes it harder for the upstream to re-run and also increases the 
overall runtime of the downstream.

Instead, we can cap the delay at the calculated value and then either start 
over with a factor of one, to allow aggressive signaling, or keep the delay at 
the cap for all future occurrences, which aids debugging and gives a constant 
delay after one window of exponential growth per MapHost. More comments are 
appreciated, and I will post a revised patch (hopefully with a functional 
capping mechanism) soon.
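The two options above can be compared side by side. This is a sketch under assumptions: the doubling multiplier, the base and cap values, and the `Strategy` enum are hypothetical and illustrative, not anything in Tez. One instance stands in for the penalty state of a single MapHost.

```java
class PenaltyCapStrategies {
    enum Strategy { RESET_AFTER_CAP, HOLD_AT_CAP }

    private final long baseMs;
    private final long capMs;
    private final Strategy strategy;
    private long factor = 1; // multiplier applied to baseMs for the next penalty

    PenaltyCapStrategies(long baseMs, long capMs, Strategy strategy) {
        this.baseMs = baseMs;
        this.capMs = capMs;
        this.strategy = strategy;
    }

    /** Returns the penalty for the next failure on this host and advances the state. */
    long nextPenaltyMs() {
        long penalty = Math.min(baseMs * factor, capMs);
        if (penalty < capMs) {
            factor *= 2; // still inside the exponential window
        } else if (strategy == Strategy.RESET_AFTER_CAP) {
            factor = 1; // cap reached: start a new exponential window
        }
        // HOLD_AT_CAP: factor is left unchanged, so every later penalty equals capMs.
        return penalty;
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            PenaltyCapStrategies host = new PenaltyCapStrategies(1_000L, 5_000L, s);
            StringBuilder sb = new StringBuilder(s + ":");
            for (int i = 0; i < 6; i++) {
                sb.append(' ').append(host.nextPenaltyMs());
            }
            System.out.println(sb);
        }
    }
}
```

With a 1 s base and a 5 s cap, RESET_AFTER_CAP yields 1000 2000 4000 5000 1000 2000 ms, favoring aggressive signaling. HOLD_AT_CAP yields 1000 2000 4000 5000 5000 5000 ms, a constant, predictable delay after the first window.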

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch
>
>
> In a scenario where the same mapId fetches fail, the penalty code allows 
> adding the same Host/InputAttemptIdentifier over and over with revised 
> penalty time that grows exponentially. It should at some point drop the 
> retrying and report failure to the AM asap to allow the job to rectify the 
> upstream output.





[jira] [Commented] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

2018-09-13 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613568#comment-16613568
 ] 

Kuhu Shukla commented on TEZ-3990:
--

[~jeagles] thoughts? Thanks a lot!

> The number of shuffle penalties for a host/inputAttemptIdentifier should be 
> capped
> --
>
> Key: TEZ-3990
> URL: https://issues.apache.org/jira/browse/TEZ-3990
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3990.001.patch
>
>
> In a scenario where the same mapId fetches fail, the penalty code allows 
> adding the same Host/InputAttemptIdentifier over and over with revised 
> penalty time that grows exponentially. It should at some point drop the 
> retrying and report failure to the AM asap to allow the job to rectify the 
> upstream output.




