[jira] [Assigned] (TEZ-1205) Remove profiling keyword from APIs/configs

2014-07-25 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned TEZ-1205:


Assignee: Gopal V  (was: Rajesh Balamohan)

 Remove profiling keyword from APIs/configs 
 -

 Key: TEZ-1205
 URL: https://issues.apache.org/jira/browse/TEZ-1205
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Gopal V
Priority: Blocker

 Given that the current profiling support actually just amounts to augmenting 
 the command-line options for a specified set of tasks, we can word the APIs 
 and configs to be more general purpose. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1205) Remove profiling keyword from APIs/configs

2014-07-29 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078913#comment-14078913
 ] 

Gopal V commented on TEZ-1205:
--

This is passed in as two properties: one declaring the actual cmd-opts and the 
other selecting the vertex/task it is targeted at.

"additional" does not explain the relationship between the two.
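
For illustration only, the two-property shape could look like the following in 
tez-site.xml. Both property names and the selector syntax are hypothetical 
placeholders, not the names this ticket settles on:

{code}
<property>
  <!-- hypothetical: extra cmd-opts appended for the selected tasks -->
  <name>tez.task-specific.launch.cmd-opts</name>
  <value>-agentpath:/path/to/profiler-agent.so</value>
</property>
<property>
  <!-- hypothetical: selects the vertex/task attempts the opts target -->
  <name>tez.task-specific.launch.cmd-opts.list</name>
  <value>v1[0,1,2];v2[5]</value>
</property>
{code}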

 Remove profiling keyword from APIs/configs 
 -

 Key: TEZ-1205
 URL: https://issues.apache.org/jira/browse/TEZ-1205
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Gopal V
Priority: Blocker

 Given that the current profiling support actually just amounts to augmenting 
 the command-line options for a specified set of tasks, we can word the APIs 
 and configs to be more general purpose. 





[jira] [Assigned] (TEZ-1360) Provide vertex parallelism to each vertex task

2014-08-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned TEZ-1360:


Assignee: Gopal V

 Provide vertex parallelism to each vertex task
 --

 Key: TEZ-1360
 URL: https://issues.apache.org/jira/browse/TEZ-1360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Johannes Zillmann
Assignee: Gopal V

 It would be good for a task to get information about the total task count of 
 its vertex.
 With this there would be an equivalent of map-reduce's {{mapred.map.tasks}} 
 and {{mapred.reduce.tasks}}, and MR applications using these could be ported 
 to Tez more easily.





[jira] [Updated] (TEZ-1360) Provide vertex parallelism to each vertex task

2014-08-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1360:
-

Attachment: TEZ-1332.1.patch

 Provide vertex parallelism to each vertex task
 --

 Key: TEZ-1360
 URL: https://issues.apache.org/jira/browse/TEZ-1360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Johannes Zillmann
Assignee: Gopal V
 Attachments: TEZ-1332.1.patch


 It would be good for a task to get information about the total task count of 
 its vertex.
 With this there would be an equivalent of map-reduce's {{mapred.map.tasks}} 
 and {{mapred.reduce.tasks}}, and MR applications using these could be ported 
 to Tez more easily.





[jira] [Commented] (TEZ-1372) Fix preWarm to work after recent API changes

2014-08-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089848#comment-14089848
 ] 

Gopal V commented on TEZ-1372:
--

[~bikassaha]: getting to this after HIVE-7601 and HIVE-7639 get a test run.

 Fix preWarm to work after recent API changes
 

 Key: TEZ-1372
 URL: https://issues.apache.org/jira/browse/TEZ-1372
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.5.0
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-1372.1.patch, TEZ-1372.2.patch








[jira] [Commented] (TEZ-1372) Fix preWarm to work after recent API changes

2014-08-08 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091011#comment-14091011
 ] 

Gopal V commented on TEZ-1372:
--

Missed PreWarmVertex.java in the diff?

 Fix preWarm to work after recent API changes
 

 Key: TEZ-1372
 URL: https://issues.apache.org/jira/browse/TEZ-1372
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.5.0
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-1372.1.patch, TEZ-1372.2.patch, TEZ-1372.3.patch, 
 TEZ-1372.svg








[jira] [Updated] (TEZ-1205) Remove profiling keyword from APIs/configs

2014-08-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1205:
-

Assignee: Rajesh Balamohan  (was: Gopal V)

 Remove profiling keyword from APIs/configs 
 -

 Key: TEZ-1205
 URL: https://issues.apache.org/jira/browse/TEZ-1205
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Rajesh Balamohan
Priority: Blocker
 Attachments: TEZ-1205.1.patch, TEZ-1205.2.patch


 Given that the current profiling support actually just amounts to augmenting 
 the command-line options for a specified set of tasks, we can word the APIs 
 and configs to be more general purpose. 





[jira] [Commented] (TEZ-1360) Provide vertex parallelism to each vertex task

2014-08-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093368#comment-14093368
 ] 

Gopal V commented on TEZ-1360:
--

[~hitesh]: looks like I'll have to start over after the API changes. 

Moving parallelism to TaskContext.

 Provide vertex parallelism to each vertex task
 --

 Key: TEZ-1360
 URL: https://issues.apache.org/jira/browse/TEZ-1360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Johannes Zillmann
Assignee: Gopal V
 Attachments: TEZ-1360.1.patch


 It would be good for a task to get information about the total task count of 
 its vertex.
 With this there would be an equivalent of map-reduce's {{mapred.map.tasks}} 
 and {{mapred.reduce.tasks}}, and MR applications using these could be ported 
 to Tez more easily.





[jira] [Updated] (TEZ-1360) Provide vertex parallelism to each vertex task

2014-08-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1360:
-

Attachment: TEZ-1360.2.patch

 Provide vertex parallelism to each vertex task
 --

 Key: TEZ-1360
 URL: https://issues.apache.org/jira/browse/TEZ-1360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Johannes Zillmann
Assignee: Gopal V
 Attachments: TEZ-1360.1.patch, TEZ-1360.2.patch


 It would be good for a task to get information about the total task count of 
 its vertex.
 With this there would be an equivalent of map-reduce's {{mapred.map.tasks}} 
 and {{mapred.reduce.tasks}}, and MR applications using these could be ported 
 to Tez more easily.





[jira] [Commented] (TEZ-1411) Address initial feedback on swimlanes

2014-08-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098965#comment-14098965
 ] 

Gopal V commented on TEZ-1411:
--

[~jeagles]: sure, that shouldn't be a big issue. The reason to use SVG was to 
have the clickable links.

 Address initial feedback on swimlanes
 -

 Key: TEZ-1411
 URL: https://issues.apache.org/jira/browse/TEZ-1411
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Gopal V
Priority: Blocker
 Fix For: 0.5.0


 A few other good-to-have things:
 1) A wrapper script that takes care of the command chaining with a single 
 appId as input from the user.
 2) A legend in the README or in the SVG itself about what is what.





[jira] [Commented] (TEZ-1411) Address initial feedback on swimlanes

2014-08-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099297#comment-14099297
 ] 

Gopal V commented on TEZ-1411:
--

[~jeagles]: You can produce a zoomed-out view by modifying the -t variable.

I intend to rewrite this tool, without needing regex based log-parsing and pull 
all the information from ATS/SimpleLoggingHistoryService directly.

The latter is trivial to use: just add this to tez-site.xml to log ATS-like 
info into HDFS.

{code}
  <property>
    <name>tez.simple.history.logging.dir</name>
    <value>${fs.default.name}/tez-history/</value>
  </property>
{code}

I would encourage you to use either of those, because I'll try to push out more 
of the tooling I have built for post-hoc analysis of that data.


 Address initial feedback on swimlanes
 -

 Key: TEZ-1411
 URL: https://issues.apache.org/jira/browse/TEZ-1411
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Gopal V
Priority: Blocker
 Fix For: 0.5.0

 Attachments: TEZ-1411.1.patch, large.am.history.txt


 A few other good-to-have things:
 1) A wrapper script that takes care of the command chaining with a single 
 appId as input from the user.
 2) A legend in the README or in the SVG itself about what is what.





[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API

2014-08-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101350#comment-14101350
 ] 

Gopal V commented on TEZ-1390:
--

Need to put a warning around

{code}
...
+byte[] bytes = new byte[bb.limit() - bb.position()];
+bb.get(bytes);
{code}

This only works consistently because every single call to 
{{DagTypeConverters.convertFromTezUserPayload(userPayload)}} produces a 
throw-away reference.

For consistency, we could use the {{size}} variable in the array allocation as 
well.
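
To make that warning concrete, here is a self-contained sketch (illustrative 
only; {{toBytes}} is a made-up helper, not the Tez code) of why {{bb.get(bytes)}} 
is only safe on a throw-away reference, and how {{ByteBuffer.duplicate()}} 
sidesteps the problem:

```java
import java.nio.ByteBuffer;

class PayloadCopy {
    // bb.get(byte[]) advances the buffer's position, so running the quoted
    // pattern twice on the SAME ByteBuffer yields an empty array the second
    // time. Reading through duplicate() leaves the caller's position
    // untouched: the duplicate shares the bytes but has independent state.
    static byte[] toBytes(ByteBuffer bb) {
        ByteBuffer dup = bb.duplicate();
        byte[] bytes = new byte[dup.remaining()];
        dup.get(bytes);
        return bytes;
    }
}
```

With the quoted pattern instead, the second read from the same buffer would 
return zero bytes, which is exactly the aliasing surprise the warning is about.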

 Replace byte[] with ByteBuffer as the type of user payload in the API
 -

 Key: TEZ-1390
 URL: https://issues.apache.org/jira/browse/TEZ-1390
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Tsuyoshi OZAWA
Priority: Blocker
 Attachments: TEZ-1390.1.patch, TEZ-1390.2.patch, TEZ-1390.3.patch, 
 TEZ-1390.4.txt, pig.payload.txt


 This is just an API change. Internally we can continue to use byte[] since 
 that's a much bigger change.
 The translation from ByteBuffer to byte[] in the API layer should not have a 
 perf impact.





[jira] [Created] (TEZ-1466) Fix JDK8 builds of tez

2014-08-19 Thread Gopal V (JIRA)
Gopal V created TEZ-1466:


 Summary: Fix JDK8 builds of tez
 Key: TEZ-1466
 URL: https://issues.apache.org/jira/browse/TEZ-1466
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Trivial
 Attachments: TEZ-1466.1.patch

Tez fails to build on JDK8 due to stricter generics checks in a unit test:

{code}
sortedDataMap = TreeMultimap.create(this.correctComparator, Ordering.natural());
{code}







[jira] [Updated] (TEZ-1360) Provide vertex parallelism to each vertex task

2014-08-19 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1360:
-

Attachment: TEZ-1360.4.patch

 Provide vertex parallelism to each vertex task
 --

 Key: TEZ-1360
 URL: https://issues.apache.org/jira/browse/TEZ-1360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Johannes Zillmann
Assignee: Gopal V
 Fix For: 0.5.1

 Attachments: TEZ-1360.1.patch, TEZ-1360.2.patch, TEZ-1360.3.patch, 
 TEZ-1360.4.patch


 It would be good for a task to get information about the total task count of 
 its vertex.
 With this there would be an equivalent of map-reduce's {{mapred.map.tasks}} 
 and {{mapred.reduce.tasks}}, and MR applications using these could be ported 
 to Tez more easily.





[jira] [Comment Edited] (TEZ-1469) AM/Session LRs are not shipped to vertices in new API use-case

2014-08-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104286#comment-14104286
 ] 

Gopal V edited comment on TEZ-1469 at 8/20/14 6:26 PM:
---

I think this is a breakage between Session and non-Session modes.

When I use sessions, this works as expected.


was (Author: gopalv):
I think this is a breakage between Session and on-Session modes.

When I use sessions, this works as expected.

 AM/Session LRs are not shipped to vertices in new API use-case
 --

 Key: TEZ-1469
 URL: https://issues.apache.org/jira/browse/TEZ-1469
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
Priority: Blocker
 Attachments: TEZ-1469.1.patch, tez-broadcast-example.tgz


 Previously in the tez codebase, the session LRs were part of each vertex's 
 LRs, automatically.
 During 0.5.0 API changes, the following no longer provides local LRs to the 
 vertices, even if it is part of the session LR.





[jira] [Commented] (TEZ-1469) AM/Session LRs are not shipped to vertices in new API use-case

2014-08-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104286#comment-14104286
 ] 

Gopal V commented on TEZ-1469:
--

I think this is a breakage between Session and non-Session modes.

When I use sessions, this works as expected.

 AM/Session LRs are not shipped to vertices in new API use-case
 --

 Key: TEZ-1469
 URL: https://issues.apache.org/jira/browse/TEZ-1469
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
Priority: Blocker
 Attachments: TEZ-1469.1.patch, tez-broadcast-example.tgz


 Previously in the tez codebase, the session LRs were part of each vertex's 
 LRs, automatically.
 During 0.5.0 API changes, the following no longer provides local LRs to the 
 vertices, even if it is part of the session LR.





[jira] [Commented] (TEZ-1469) AM/Session LRs are not shipped to vertices in new API use-case

2014-08-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104378#comment-14104378
 ] 

Gopal V commented on TEZ-1469:
--

Fixed that issue in the DAGAppMaster, but still only getting tez LRs and the pb 
binary as session resources.

Tracked that down to

{code}
  Map<String, LocalResource> sessionJars =
    new HashMap<String, LocalResource>(tezJarResources.size() + 1);
  sessionJars.putAll(tezJarResources);
  sessionJars.put(TezConstants.TEZ_PB_BINARY_CONF_NAME,
binaryConfLRsrc);
  DAGProtos.PlanLocalResourcesProto proto =
DagTypeConverters.convertFromLocalResources(sessionJars);
  sessionJarsPBOutStream = TezCommonUtils.createFileForAM(fs, 
sessionJarsPath);
  proto.writeDelimitedTo(sessionJarsPBOutStream);
{code}

Tez does not use any user-provided jar in session resources, which is how 
hive-exec.jar is shipped IIRC.

 AM/Session LRs are not shipped to vertices in new API use-case
 --

 Key: TEZ-1469
 URL: https://issues.apache.org/jira/browse/TEZ-1469
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
Priority: Blocker
 Attachments: TEZ-1469.1.patch, tez-broadcast-example.tgz


 Previously in the tez codebase, the session LRs were part of each vertex's 
 LRs, automatically.
 During 0.5.0 API changes, the following no longer provides local LRs to the 
 vertices, even if it is part of the session LR.





[jira] [Commented] (TEZ-1469) AM/Session LRs are not shipped to vertices in new API use-case

2014-08-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104462#comment-14104462
 ] 

Gopal V commented on TEZ-1469:
--

I'm fine with the DAG as well, if that is guaranteed to not re-localize during 
a run.

The downside is that both the AM and the DAG need exactly the same JARs for 
Hive, so please provide a way to debug these issues in production as well.

 AM/Session LRs are not shipped to vertices in new API use-case
 --

 Key: TEZ-1469
 URL: https://issues.apache.org/jira/browse/TEZ-1469
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
Priority: Blocker
 Attachments: TEZ-1469.1.patch, tez-broadcast-example.tgz


 Previously in the tez codebase, the session LRs were part of each vertex's 
 LRs, automatically.
 During 0.5.0 API changes, the following no longer provides local LRs to the 
 vertices, even if it is part of the session LR.





[jira] [Updated] (TEZ-1469) AM/Session LRs are not shipped to vertices in new API use-case

2014-08-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1469:
-

Attachment: TEZ-1469.2.patch

Patch to DAGAppMaster to add AM Local Resources even if not in session mode.

 AM/Session LRs are not shipped to vertices in new API use-case
 --

 Key: TEZ-1469
 URL: https://issues.apache.org/jira/browse/TEZ-1469
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
Priority: Blocker
 Attachments: TEZ-1469.1.patch, TEZ-1469.2.patch, 
 tez-broadcast-example.tgz


 Previously in the tez codebase, the session LRs were part of each vertex's 
 LRs, automatically.
 During 0.5.0 API changes, the following no longer provides local LRs to the 
 vertices, even if it is part of the session LR.





[jira] [Created] (TEZ-1479) Disambiguate between ShuffleInputEventHandler and ShuffleInputEventHandlerImpl (which are not related)

2014-08-21 Thread Gopal V (JIRA)
Gopal V created TEZ-1479:


 Summary: Disambiguate between ShuffleInputEventHandler and 
ShuffleInputEventHandlerImpl (which are not related)
 Key: TEZ-1479
 URL: https://issues.apache.org/jira/browse/TEZ-1479
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V


common.shuffle.impl.ShuffleInputEventHandler is not related to 
shuffle.common.impl.ShuffleInputEventHandlerImpl 

This is extremely confusing and needs refactoring internally to be readable.





[jira] [Commented] (TEZ-1332) tez swimlanes UI tool

2014-08-22 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106580#comment-14106580
 ] 

Gopal V commented on TEZ-1332:
--

This patch was all Python; I can't quite figure out how it broke anything.

 tez swimlanes UI tool
 -

 Key: TEZ-1332
 URL: https://issues.apache.org/jira/browse/TEZ-1332
 Project: Apache Tez
  Issue Type: New Feature
Reporter: Gopal V
Assignee: Gopal V
Priority: Blocker
 Fix For: 0.5.0

 Attachments: TEZ-1332.1.patch


 Import https://github.com/t3rmin4t0r/tez-swimlanes into trunk.
 Also move from using the AM INFO logs to using SimpleHistoryLogging/ATS data 
 to draw the diagrams.
 The goal is to be able to draw diagrams like 
 http://people.apache.org/~gopalv/query27.svg, so that a developer can debug 
 performance issues.





[jira] [Updated] (TEZ-1360) Provide vertex parallelism to each vertex task

2014-08-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1360:
-

Attachment: TEZ-1360.6.patch

Patch with doc changes from me and a unit test from [~rajesh.balamohan].

 Provide vertex parallelism to each vertex task
 --

 Key: TEZ-1360
 URL: https://issues.apache.org/jira/browse/TEZ-1360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Johannes Zillmann
Assignee: Gopal V
 Fix For: 0.5.1

 Attachments: TEZ-1360.1.patch, TEZ-1360.2.patch, TEZ-1360.4.patch, 
 TEZ-1360.5.patch, TEZ-1360.6.patch


 It would be good for a task to get information about the total task count of 
 its vertex.
 With this there would be an equivalent of map-reduce's {{mapred.map.tasks}} 
 and {{mapred.reduce.tasks}}, and MR applications using these could be ported 
 to Tez more easily.





[jira] [Created] (TEZ-1489) Broadcast Shuffle should call freeResources() on FetchedInput

2014-08-24 Thread Gopal V (JIRA)
Gopal V created TEZ-1489:


 Summary: Broadcast Shuffle should call freeResources() on 
FetchedInput
 Key: TEZ-1489
 URL: https://issues.apache.org/jira/browse/TEZ-1489
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V


BroadcastShuffle does not seem to free up the buffer space allocated by the 
FetchedInputs during the task runtime.

SimpleFetchedInputAllocator::freeResources is never called as per my logging.





[jira] [Created] (TEZ-1491) Tez reducer-side merge's counter update is slow

2014-08-24 Thread Gopal V (JIRA)
Gopal V created TEZ-1491:


 Summary: Tez reducer-side merge's counter update is slow
 Key: TEZ-1491
 URL: https://issues.apache.org/jira/browse/TEZ-1491
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V


TezMerger$MergeQueue::next() shows up in profiles due to a synchronized block in 
a tight loop.

Part of the slowdown was due to the DataInputBuffer issues identified earlier in 
HADOOP-10694, but beyond that, approximately 11% of my lock-prefix calls 
originated from the following line,

{code}
  mergeProgress.set(totalBytesProcessed * progPerByte);
{code}

in two places within the core loop.

!perf-top-counters.png!
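
A generic way to cheapen this kind of hot-loop update (a sketch of the batching 
idea, not the actual TezMerger fix) is to write to the shared progress object 
only once every N records:

```java
import java.util.concurrent.atomic.AtomicLong;

class BatchedProgress {
    static final int BATCH = 10_000;
    // Stand-in for the synchronized/shared progress object from the profile.
    final AtomicLong sharedWrites = new AtomicLong();
    private long sinceFlush = 0;

    // Called once per record in the tight loop; the expensive shared write
    // happens only every BATCH records instead of on every iteration.
    void onRecord() {
        if (++sinceFlush >= BATCH) {
            sharedWrites.incrementAndGet();
            sinceFlush = 0;
        }
    }
}
```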





[jira] [Updated] (TEZ-1491) Tez reducer-side merge's counter update is slow

2014-08-24 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1491:
-

Attachment: perf-top-counters.png

 Tez reducer-side merge's counter update is slow
 ---

 Key: TEZ-1491
 URL: https://issues.apache.org/jira/browse/TEZ-1491
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: perf-top-counters.png


 TezMerger$MergeQueue::next() shows up in profiles due to a synchronized block 
 in a tight loop.
 Part of the slowdown was due to the DataInputBuffer issues identified earlier 
 in HADOOP-10694, but beyond that, approximately 11% of my lock-prefix calls 
 originated from the following line,
 {code}
   mergeProgress.set(totalBytesProcessed * progPerByte);
 {code}
 in two places within the core loop.
 !perf-top-counters.png!





[jira] [Updated] (TEZ-1488) Implement HashComparatorBytesWritable in TezBytesComparator

2014-08-24 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1488:
-

Attachment: TEZ-1488.1.patch

 Implement HashComparatorBytesWritable in TezBytesComparator
 -

 Key: TEZ-1488
 URL: https://issues.apache.org/jira/browse/TEZ-1488
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-1488.1.patch


 Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
 This moves part of the key comparator into the partition comparator, which is 
 a single register operation.
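
The design can be sketched generically: compare the integer partition first and 
fall back to the key only on a tie. This is illustrative code, not 
TezBytesComparator itself:

```java
import java.util.Comparator;

// Each element is {partition, key}. Comparing the partition is a single
// integer (register-width) compare; the key compare runs only on ties.
class PartitionFirstComparator implements Comparator<int[]> {
    @Override
    public int compare(int[] a, int[] b) {
        int byPartition = Integer.compare(a[0], b[0]);
        if (byPartition != 0) {
            return byPartition; // cheap path, taken for most pairs
        }
        return Integer.compare(a[1], b[1]); // tie: compare the key itself
    }
}
```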





[jira] [Commented] (TEZ-1083) Enable IFile RLE for DefaultSorter

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109453#comment-14109453
 ] 

Gopal V commented on TEZ-1083:
--

This looks alright; it just needs a roll-over check for the {{sameKey}} long 
variable.

The worst-case value for that is near O(n^2), so it might overflow before 
totalKeys does.

For performance, it can be assumed that if {{sameKeys}} is > 0 then 
{{isRLENeeded == true}}, instead of checking within the loop.
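
The suggested hoisting amounts to deciding RLE once from the final counter 
value instead of testing inside the loop; a hypothetical sketch with made-up 
names:

```java
class RleDecision {
    // Hypothetical counter mirroring the comment: sameKeys counts repeated
    // consecutive keys seen while sorting. A long guards against the
    // near-O(n^2) worst case overflowing before totalKeys does.
    static boolean isRleNeeded(long sameKeys) {
        // Decided once, after the loop: any repeated key means RLE pays off.
        return sameKeys > 0;
    }
}
```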

 Enable IFile RLE for DefaultSorter
 --

 Key: TEZ-1083
 URL: https://issues.apache.org/jira/browse/TEZ-1083
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Gopal V
 Attachments: TEZ-1083.1.patch


 Generate RLE IFiles for DefaultSorter and use it to fast-forward map-side 
 merge.





[jira] [Commented] (TEZ-1492) IFile RLE not kicking in due to bug in BufferUtils.compare()

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109461#comment-14109461
 ] 

Gopal V commented on TEZ-1492:
--

The BufferUtils class needs re-namespacing as well, as part of this patch.

 IFile RLE not kicking in due to bug in BufferUtils.compare()
 

 Key: TEZ-1492
 URL: https://issues.apache.org/jira/browse/TEZ-1492
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1492.1.patch, TEZ-1492.2.patch








[jira] [Commented] (TEZ-1489) Broadcast Shuffle should call freeResources() on FetchedInput

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109472#comment-14109472
 ] 

Gopal V commented on TEZ-1489:
--

The buffer is being cleared up correctly - but the unreserve() is not getting 
called, so the internal check switches to Disk even though buffers are unused.

 Broadcast Shuffle should call freeResources() on FetchedInput
 -

 Key: TEZ-1489
 URL: https://issues.apache.org/jira/browse/TEZ-1489
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V

 BroadcastShuffle does not seem to free up the buffer space allocated by the 
 FetchedInputs during the task runtime.
 SimpleFetchedInputAllocator::freeResources is never called as per my logging.





[jira] [Commented] (TEZ-1489) Broadcast Shuffle should call freeResources() on FetchedInput

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109476#comment-14109476
 ] 

Gopal V commented on TEZ-1489:
--

Maybe UnorderedKVReader::moveToNextInput() is a good place for this?

 Broadcast Shuffle should call freeResources() on FetchedInput
 -

 Key: TEZ-1489
 URL: https://issues.apache.org/jira/browse/TEZ-1489
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V

 BroadcastShuffle does not seem to free up the buffer space allocated by the 
 FetchedInputs during the task runtime.
 SimpleFetchedInputAllocator::freeResources is never called as per my logging.





[jira] [Created] (TEZ-1497) Add tez-broadcast-example into tez-examples/

2014-08-25 Thread Gopal V (JIRA)
Gopal V created TEZ-1497:


 Summary: Add tez-broadcast-example into tez-examples/
 Key: TEZ-1497
 URL: https://issues.apache.org/jira/browse/TEZ-1497
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V


Modify https://github.com/t3rmin4t0r/tez-broadcast-example into a usable 
example inside tez-examples.






[jira] [Commented] (TEZ-1492) IFile RLE not kicking in due to bug in BufferUtils.compare()

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110264#comment-14110264
 ] 

Gopal V commented on TEZ-1492:
--

Thanks [~rajesh.balamohan], can I have the same diff with git mv instead of 
the big change-sets?

 IFile RLE not kicking in due to bug in BufferUtils.compare()
 

 Key: TEZ-1492
 URL: https://issues.apache.org/jira/browse/TEZ-1492
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1492.1.patch, TEZ-1492.2.patch, TEZ-1492.3.patch, 
 TEZ-1492.4.patch








[jira] [Resolved] (TEZ-1497) Add tez-broadcast-example into tez-examples/

2014-08-25 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V resolved TEZ-1497.
--

Resolution: Not a Problem

 Add tez-broadcast-example into tez-examples/
 

 Key: TEZ-1497
 URL: https://issues.apache.org/jira/browse/TEZ-1497
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V

 Modify https://github.com/t3rmin4t0r/tez-broadcast-example into a usable 
 example inside tez-examples.





[jira] [Commented] (TEZ-1503) UnorderedKVInput.getReader() should return KeyValuesReader

2014-08-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14111095#comment-14111095
 ] 

Gopal V commented on TEZ-1503:
--

The Unordered input will not satisfy the contract of KeyValuesReader at all: 
it will not return (K, List<V>) pairs.

 UnorderedKVInput.getReader() should return KeyValuesReader
 --

 Key: TEZ-1503
 URL: https://issues.apache.org/jira/browse/TEZ-1503
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan

 Currently OrderedGroupedKVInput.getReader() returns KeyValuesReader and 
 UnorderedKVInput.getReader() returns KeyValueReader.
 It would be useful to return KeyValuesReader for UnorderedKVInput to be 
 consistent with OrderedGroupedKVInput.





[jira] [Commented] (TEZ-1492) IFile RLE not kicking in due to bug in BufferUtils.compare()

2014-08-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14111819#comment-14111819
 ] 

Gopal V commented on TEZ-1492:
--

Alright, this looks good - +1

 IFile RLE not kicking in due to bug in BufferUtils.compare()
 

 Key: TEZ-1492
 URL: https://issues.apache.org/jira/browse/TEZ-1492
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1492.1.patch, TEZ-1492.2.patch, TEZ-1492.3.patch, 
 TEZ-1492.4.patch, TEZ-1492.5.patch








[jira] [Commented] (TEZ-1503) UnorderedKVInput.getReader() should return KeyValuesReader

2014-08-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14111859#comment-14111859
 ] 

Gopal V commented on TEZ-1503:
--

Yes, I think this change would make it easier to write wrong code.

If someone changes the edge type for a vertex, I'd rather get a class cast 
exception than have my vertices keep working but generate odd results, due to the 
assumptions around <K, List<V>> for things like aggregations.

 UnorderedKVInput.getReader() should return KeyValuesReader
 --

 Key: TEZ-1503
 URL: https://issues.apache.org/jira/browse/TEZ-1503
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan

 Currently OrderedGroupedKVInput.getReader() returns KeyValuesReader and 
 UnorderedKVInput.getReader() returns KeyValueReader.
 It would be useful to return KeyValuesReader for UnorderedKVInput to be 
 consistent with OrderedGroupedKVInput.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1509) Set a useful default value for java opts

2014-08-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112988#comment-14112988
 ] 

Gopal V commented on TEZ-1509:
--

-1 on the -XX:+UseCompressedStrings.

The rest looks good.

 Set a useful default value for java opts  
 --

 Key: TEZ-1509
 URL: https://issues.apache.org/jira/browse/TEZ-1509
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah

 A subset of the following should be considered for the defaults:
 -server -XX:+UseCompressedStrings -Djava.net.preferIPv4Stack=true 
 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA 
 -XX:+UseParallelGC
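If a subset of these flags is adopted, it would presumably land in a configuration default along these lines (hypothetical tez-site.xml fragment; the property name tez.task.launch.cmd-opts and the chosen flag subset are assumptions, reflecting the -1 on -XX:+UseCompressedStrings above):

```xml
<property>
  <name>tez.task.launch.cmd-opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails
         -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC</value>
  <description>Default JVM options for task containers (sketch).</description>
</property>
```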



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1501) Add a test dag to generate load on the getTask RPC

2014-08-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114052#comment-14114052
 ] 

Gopal V commented on TEZ-1501:
--

Looks good - +1

Just needs an fs.deleteOnExit() for the PAYLOAD file for cleanups.
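The requested cleanup maps to Hadoop's FileSystem#deleteOnExit(Path); the JVM-level analogue can be sketched with plain java.io.File (the PAYLOAD naming is a stand-in for the generated file, not the actual test code):

```java
import java.io.File;
import java.io.IOException;

public class PayloadCleanup {
    public static File createPayload() throws IOException {
        // Stand-in for the generated PAYLOAD file used by the load generator.
        File payload = File.createTempFile("PAYLOAD", ".bin");
        // Register cleanup so the file is removed on normal JVM exit,
        // mirroring fs.deleteOnExit(path) on a Hadoop FileSystem.
        payload.deleteOnExit();
        return payload;
    }

    public static void main(String[] args) throws IOException {
        File p = createPayload();
        System.out.println(p.exists()); // true while the JVM is running
    }
}
```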

 Add a test dag to generate load on the getTask RPC
 --

 Key: TEZ-1501
 URL: https://issues.apache.org/jira/browse/TEZ-1501
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: TEZ-1501.1.txt, TEZ-1501.2.txt






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1488) Implement HashComparatorBytesWritable in TezBytesComparator

2014-08-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114448#comment-14114448
 ] 

Gopal V commented on TEZ-1488:
--

I think we should call the interface what it really is - a ProxyComparator?

I will do the renames & write docs.
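The idea behind the ProxyComparator name can be sketched roughly as follows (hypothetical interface and prefix scheme, not the actual Tez runtime-library code): the comparator can summarize a key into a cheap int "proxy" whose ordering agrees with the full comparison, so when two proxies differ the byte-wise compare is skipped entirely - a single register operation.

```java
import java.util.Arrays;
import java.util.Comparator;

public class ProxyComparatorDemo {
    // A comparator that can also reduce a key to an int proxy: if two
    // proxies differ, their unsigned ordering matches the full compare.
    interface ProxyComparator<T> extends Comparator<T> {
        int getProxy(T key);
    }

    static final ProxyComparator<byte[]> BYTES = new ProxyComparator<byte[]>() {
        @Override
        public int getProxy(byte[] key) {
            // First (up to) 4 bytes as an unsigned big-endian prefix,
            // zero-padded for short keys.
            int p = 0;
            for (int i = 0; i < 4; i++) {
                p = (p << 8) | (i < key.length ? (key[i] & 0xff) : 0);
            }
            return p;
        }

        @Override
        public int compare(byte[] a, byte[] b) {
            int pa = getProxy(a), pb = getProxy(b);
            if (pa != pb) {
                return Integer.compareUnsigned(pa, pb);   // fast path
            }
            return Arrays.compareUnsigned(a, b);          // full compare
        }
    };
}
```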

 Implement HashComparatorBytesWritable in TezBytesComparator
 -

 Key: TEZ-1488
 URL: https://issues.apache.org/jira/browse/TEZ-1488
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-1488.1.patch


 Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
 This moves part of the key comparator into the partition comparator, which is 
 a single register operation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TEZ-1488) Implement HashComparatorBytesWritable in TezBytesComparator

2014-08-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1488:
-

Attachment: TEZ-1488.3.patch

Patch includes renames - will only apply with {{git apply -v -p0 
TEZ-1488.3.patch}}

 Implement HashComparatorBytesWritable in TezBytesComparator
 -

 Key: TEZ-1488
 URL: https://issues.apache.org/jira/browse/TEZ-1488
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-1488.1.patch, TEZ-1488.2.patch, TEZ-1488.3.patch


 Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
 This moves part of the key comparator into the partition comparator, which is 
 a single register operation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TEZ-1488) Rename HashComparator to ProxyComparator and implement in TezBytesComparator

2014-08-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1488:
-

Summary: Rename HashComparator to ProxyComparator and implement in 
TezBytesComparator  (was: Implement HashComparatorBytesWritable in 
TezBytesComparator)

 Rename HashComparator to ProxyComparator and implement in TezBytesComparator
 

 Key: TEZ-1488
 URL: https://issues.apache.org/jira/browse/TEZ-1488
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Fix For: 0.6.0

 Attachments: TEZ-1488.1.patch, TEZ-1488.2.patch, TEZ-1488.3.patch


 Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
 This moves part of the key comparator into the partition comparator, which is 
 a single register operation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TEZ-1488) Rename HashComparator to ProxyComparator and implement in TezBytesComparator

2014-08-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1488:
-

Fix Version/s: 0.6.0

 Rename HashComparator to ProxyComparator and implement in TezBytesComparator
 

 Key: TEZ-1488
 URL: https://issues.apache.org/jira/browse/TEZ-1488
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Fix For: 0.6.0

 Attachments: TEZ-1488.1.patch, TEZ-1488.2.patch, TEZ-1488.3.patch


 Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
 This moves part of the key comparator into the partition comparator, which is 
 a single register operation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TEZ-1488) Rename HashComparator to ProxyComparator and implement in TezBytesComparator

2014-08-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1488:
-

Attachment: TEZ-1488.4.patch

 Rename HashComparator to ProxyComparator and implement in TezBytesComparator
 

 Key: TEZ-1488
 URL: https://issues.apache.org/jira/browse/TEZ-1488
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Fix For: 0.6.0

 Attachments: TEZ-1488.1.patch, TEZ-1488.2.patch, TEZ-1488.3.patch, 
 TEZ-1488.4.patch


 Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
 This moves part of the key comparator into the partition comparator, which is 
 a single register operation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1524) getDAGStatus seems to fork out the entire JVM

2014-08-29 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115806#comment-14115806
 ] 

Gopal V commented on TEZ-1524:
--

The cache does not cache misses.
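The point being that a miss forks a shell every single time; a cache that also remembers misses avoids the repeated fork. An illustrative sketch (not Hadoop's actual Groups implementation) of negative caching:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class NegativeCache {
    private final Map<String, Optional<List<String>>> cache = new HashMap<>();
    int expensiveLookups = 0;

    // Stand-in for the fork()-based group resolution shown in the stack trace.
    private Optional<List<String>> resolve(String user) {
        expensiveLookups++;
        return user.equals("gopal") ? Optional.of(List.of("users", "hadoop"))
                                    : Optional.empty();
    }

    // Caches hits AND misses: Optional.empty() is a non-null value, so
    // computeIfAbsent stores it and unknown users are never re-resolved.
    public Optional<List<String>> getGroups(String user) {
        return cache.computeIfAbsent(user, this::resolve);
    }
}
```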

 getDAGStatus seems to fork out the entire JVM
 -

 Key: TEZ-1524
 URL: https://issues.apache.org/jira/browse/TEZ-1524
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V

 Tracked down a consistent fork() call to
 {code}
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
   at org.apache.hadoop.util.Shell.run(Shell.java:418)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
   at 
 org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
   at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
   at 
 org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getRPCUserGroups(DAGClientAMProtocolBlockingPBServerImpl.java:75)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:102)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
 {code}
 [~hitesh] - would it make sense to cache this at all?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TEZ-1524) getDAGStatus seems to fork out the entire JVM

2014-08-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1524:
-

Attachment: TEZ-1524.1.patch

 getDAGStatus seems to fork out the entire JVM
 -

 Key: TEZ-1524
 URL: https://issues.apache.org/jira/browse/TEZ-1524
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
 Attachments: TEZ-1524.1.patch


 Tracked down a consistent fork() call to
 {code}
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
   at org.apache.hadoop.util.Shell.run(Shell.java:418)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
   at 
 org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
   at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
   at 
 org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getRPCUserGroups(DAGClientAMProtocolBlockingPBServerImpl.java:75)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:102)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
 {code}
 [~hitesh] - would it make sense to cache this at all?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (TEZ-1525) BroadcastLoadGen testcase

2014-08-29 Thread Gopal V (JIRA)
Gopal V created TEZ-1525:


 Summary: BroadcastLoadGen testcase
 Key: TEZ-1525
 URL: https://issues.apache.org/jira/browse/TEZ-1525
 Project: Apache Tez
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V


Broadcast load generator test example



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TEZ-1525) BroadcastLoadGen testcase

2014-08-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1525:
-

Attachment: TEZ-1525.1.patch

 BroadcastLoadGen testcase
 -

 Key: TEZ-1525
 URL: https://issues.apache.org/jira/browse/TEZ-1525
 Project: Apache Tez
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-1525.1.patch


 Broadcast load generator test example



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TEZ-1524) getDAGStatus seems to fork out the entire JVM

2014-09-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1524:
-
Attachment: TEZ-1524.2.patch

 getDAGStatus seems to fork out the entire JVM
 -

 Key: TEZ-1524
 URL: https://issues.apache.org/jira/browse/TEZ-1524
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-1524.1.patch, TEZ-1524.2.patch


 Tracked down a consistent fork() call to
 {code}
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
   at org.apache.hadoop.util.Shell.run(Shell.java:418)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
   at 
 org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
   at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
   at 
 org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getRPCUserGroups(DAGClientAMProtocolBlockingPBServerImpl.java:75)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:102)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
 {code}
 [~hitesh] - would it make sense to cache this at all?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1524) getDAGStatus seems to fork out the entire JVM on non-secure clusters

2014-09-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1524:
-
Summary: getDAGStatus seems to fork out the entire JVM on non-secure 
clusters  (was: getDAGStatus seems to fork out the entire JVM)

 getDAGStatus seems to fork out the entire JVM on non-secure clusters
 

 Key: TEZ-1524
 URL: https://issues.apache.org/jira/browse/TEZ-1524
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-1524.1.patch, TEZ-1524.2.patch


 Tracked down a consistent fork() call to
 {code}
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
   at org.apache.hadoop.util.Shell.run(Shell.java:418)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
   at 
 org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
   at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
   at 
 org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getRPCUserGroups(DAGClientAMProtocolBlockingPBServerImpl.java:75)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:102)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
 {code}
 [~hitesh] - would it make sense to cache this at all?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1524) getDAGStatus seems to fork out the entire JVM on non-secure clusters

2014-09-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1524:
-
Attachment: TEZ-1524.3.patch

Removed the stray println.

 getDAGStatus seems to fork out the entire JVM on non-secure clusters
 

 Key: TEZ-1524
 URL: https://issues.apache.org/jira/browse/TEZ-1524
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-1524.1.patch, TEZ-1524.2.patch, TEZ-1524.3.patch


 Tracked down a consistent fork() call to
 {code}
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
   at org.apache.hadoop.util.Shell.run(Shell.java:418)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
   at 
 org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
   at 
 org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
   at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
   at 
 org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getRPCUserGroups(DAGClientAMProtocolBlockingPBServerImpl.java:75)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:102)
   at 
 org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
 {code}
 [~hitesh] - would it make sense to cache this at all?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: TEZ-1157.7.patch

 Optimize broadcast :- Tasks pertaining to same job in same machine should not 
 download multiple copies of broadcast data
 

 Key: TEZ-1157
 URL: https://issues.apache.org/jira/browse/TEZ-1157
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Gopal V
  Labels: performance
 Attachments: TEZ-1152.WIP.patch, TEZ-1157.3.WIP.patch, 
 TEZ-1157.4.WIP.patch, TEZ-1157.5.WIP.patch, TEZ-1157.6.patch, 
 TEZ-1157.7.patch, TEZ-broadcast-shuffle+vertex-parallelism.patch


 Currently, tasks (belonging to the same job) running on the same machine each 
 download their own copy of the broadcast data. An optimization would be to 
 download one copy per machine and let the rest of the tasks refer to that copy.
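The proposed optimization can be sketched as a per-node memo of in-flight fetches (illustrative only, not the actual Tez shuffle code): the first task to ask for a broadcast input kicks off the download, and every later task on the same node joins the same future instead of fetching again.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SharedBroadcastFetch {
    // One download per (node, input) key, shared by all local tasks.
    private final ConcurrentMap<String, CompletableFuture<byte[]>> local =
            new ConcurrentHashMap<>();
    int downloads = 0;

    private byte[] download(String inputId) {
        downloads++;                          // stand-in for the HTTP fetch
        return ("data:" + inputId).getBytes();
    }

    public byte[] fetch(String inputId) {
        // computeIfAbsent guarantees a single future per input; later
        // callers block on join() until the one download completes.
        return local.computeIfAbsent(inputId,
                id -> CompletableFuture.supplyAsync(() -> download(id))).join();
    }
}
```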



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: TEZ-1157.8.patch

 Optimize broadcast :- Tasks pertaining to same job in same machine should not 
 download multiple copies of broadcast data
 

 Key: TEZ-1157
 URL: https://issues.apache.org/jira/browse/TEZ-1157
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Gopal V
  Labels: performance
 Attachments: TEZ-1152.WIP.patch, TEZ-1157.3.WIP.patch, 
 TEZ-1157.4.WIP.patch, TEZ-1157.5.WIP.patch, TEZ-1157.6.patch, 
 TEZ-1157.7.patch, TEZ-1157.8.patch, 
 TEZ-broadcast-shuffle+vertex-parallelism.patch


 Currently, tasks (belonging to the same job) running on the same machine each 
 download their own copy of the broadcast data. An optimization would be to 
 download one copy per machine and let the rest of the tasks refer to that copy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: TEZ-1157.9.patch

 Optimize broadcast :- Tasks pertaining to same job in same machine should not 
 download multiple copies of broadcast data
 

 Key: TEZ-1157
 URL: https://issues.apache.org/jira/browse/TEZ-1157
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Gopal V
  Labels: performance
 Attachments: TEZ-1152.WIP.patch, TEZ-1157.3.WIP.patch, 
 TEZ-1157.4.WIP.patch, TEZ-1157.5.WIP.patch, TEZ-1157.6.patch, 
 TEZ-1157.7.patch, TEZ-1157.8.patch, TEZ-1157.9.patch, 
 TEZ-broadcast-shuffle+vertex-parallelism.patch


 Currently, tasks (belonging to the same job) running on the same machine each 
 download their own copy of the broadcast data. An optimization would be to 
 download one copy per machine and let the rest of the tasks refer to that copy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1535) Minor bug in computing min-time the reducer should run without getting killed

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1535:
-
Labels: performance timeunits  (was: performance)

 Minor bug in computing min-time the reducer should run without getting killed
 -

 Key: TEZ-1535
 URL: https://issues.apache.org/jira/browse/TEZ-1535
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Rajesh Balamohan
  Labels: performance, timeunits
 Fix For: 0.6.0


 ShuffleScheduler's shuffleProgressDuration is computed in milliseconds, while 
 ShufflePayload's runDuration is computed in microseconds (i.e. in 
 OrderedPartitionedKVOutput.generateEventsOnClose()).
 This mismatch produces a wrong value when computing the minimum time the 
 reducer should run without getting killed.
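This is a classic unit-mismatch bug; converting explicitly with java.util.concurrent.TimeUnit before comparing makes the intent visible (illustrative values, not the actual Tez fix):

```java
import java.util.concurrent.TimeUnit;

public class DurationUnits {
    public static void main(String[] args) {
        long runDurationMicros = 5_000_000;  // as produced on the output side

        // Comparing a microsecond value against a millisecond-based duration
        // silently inflates it 1000x; convert to a common unit first.
        long runDurationMillis = TimeUnit.MICROSECONDS.toMillis(runDurationMicros);

        System.out.println(runDurationMillis); // 5000
    }
}
```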



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1535) Minor bug in computing min-time the reducer should run without getting killed

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1535:
-
Fix Version/s: 0.6.0

 Minor bug in computing min-time the reducer should run without getting killed
 -

 Key: TEZ-1535
 URL: https://issues.apache.org/jira/browse/TEZ-1535
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Rajesh Balamohan
  Labels: performance, timeunits
 Fix For: 0.6.0


 ShuffleScheduler's shuffleProgressDuration is computed in milliseconds, while 
 ShufflePayload's runDuration is computed in microseconds (i.e. in 
 OrderedPartitionedKVOutput.generateEventsOnClose()).
 This mismatch produces a wrong value when computing the minimum time the 
 reducer should run without getting killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: TEZ-1157.10.patch

Fix TestTezRuntimeConfiguration failures for missing keys.

This is the patch for commit.

 Optimize broadcast :- Tasks pertaining to same job in same machine should not 
 download multiple copies of broadcast data
 

 Key: TEZ-1157
 URL: https://issues.apache.org/jira/browse/TEZ-1157
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Gopal V
  Labels: performance
 Attachments: TEZ-1152.WIP.patch, TEZ-1157.10.patch, 
 TEZ-1157.3.WIP.patch, TEZ-1157.4.WIP.patch, TEZ-1157.5.WIP.patch, 
 TEZ-1157.6.patch, TEZ-1157.7.patch, TEZ-1157.8.patch, TEZ-1157.9.patch, 
 TEZ-broadcast-shuffle+vertex-parallelism.patch, connections.png, latency.png


 Currently, tasks (belonging to the same job) running on the same machine each 
 download their own copy of the broadcast data. An optimization would be to 
 download one copy per machine and let the rest of the tasks refer to that copy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Description: 
Currently, tasks (belonging to the same job) running on the same machine each 
download their own copy of the broadcast data. An optimization would be to 
download one copy per machine and let the rest of the tasks refer to that copy.

(results after this feature)

!connections.png! 

!latency.png!

  was:Currently tasks (belonging to same job) running in the same machine 
download its own copy of broadcast data.  Optimization could be to  download 
one copy in the machine, and the rest of the tasks can refer to this downloaded 
copy.


 Optimize broadcast :- Tasks pertaining to same job in same machine should not 
 download multiple copies of broadcast data
 

 Key: TEZ-1157
 URL: https://issues.apache.org/jira/browse/TEZ-1157
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Gopal V
  Labels: performance
 Fix For: 0.6.0

 Attachments: TEZ-1152.WIP.patch, TEZ-1157.10.patch, 
 TEZ-1157.3.WIP.patch, TEZ-1157.4.WIP.patch, TEZ-1157.5.WIP.patch, 
 TEZ-1157.6.patch, TEZ-1157.7.patch, TEZ-1157.8.patch, TEZ-1157.9.patch, 
 TEZ-broadcast-shuffle+vertex-parallelism.patch, connections.png, latency.png


 Currently, tasks (belonging to the same job) running on the same machine each 
 download their own copy of the broadcast data. An optimization would be to 
 download one copy per machine and let the rest of the tasks refer to that copy.
 (results after this feature)
 !connections.png! 
 !latency.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1593) PipelinedSorter::compare() makes a key-copy to satisfy RawComparator interface

2014-09-18 Thread Gopal V (JIRA)
Gopal V created TEZ-1593:


 Summary: PipelinedSorter::compare() makes a key-copy to satisfy 
RawComparator interface
 Key: TEZ-1593
 URL: https://issues.apache.org/jira/browse/TEZ-1593
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V


The current implementation of PipelinedSorter has a slow section which revolves 
around key comparisons.

{code}
  kvbuffer.position(istart);
  kvbuffer.get(ki, 0, ilen);
  kvbuffer.position(jstart);
  kvbuffer.get(kj, 0, jlen);
  // sort by key
  final int cmp = comparator.compare(ki, 0, ilen, kj, 0, jlen);
{code}

The kvbuffer.get into the arrays ki and kj are the slowest part of the 
comparator operation.
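A copy-free alternative is to compare the two key ranges in place against the backing array, skipping the scratch arrays ki/kj entirely (a sketch of the technique, not the actual patch; for a heap buffer, kvbuffer.array() exposes the backing array, which a RawComparator can be handed twice with different offsets):

```java
public class InPlaceCompare {
    // Unsigned lexicographic compare of two ranges of the SAME backing
    // array, avoiding the kvbuffer.get() copies into ki and kj.
    static int compareRanges(byte[] buf, int s1, int l1, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = buf[s1 + i] & 0xff;
            int b = buf[s2 + i] & 0xff;
            if (a != b) return a - b;
        }
        return l1 - l2; // shorter key sorts first on a common prefix
    }

    public static void main(String[] args) {
        byte[] kvbuffer = "applebanana".getBytes();
        // "apple" (offset 0, len 5) vs "banana" (offset 5, len 6)
        System.out.println(compareRanges(kvbuffer, 0, 5, 5, 6) < 0); // true
    }
}
```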



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1596) Secure Shuffle utils is extremely expensive for fast queries

2014-09-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1596:
-
Attachment: shuffle-secure.png

 Secure Shuffle utils is extremely expensive for fast queries
 

 Key: TEZ-1596
 URL: https://issues.apache.org/jira/browse/TEZ-1596
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
 Attachments: shuffle-secure.png


 Generating the hash for YARN's secure shuffle is more expensive than the 
 actual HTTP call once keep-alive is turned on.
 !shuffle-secure.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1141) DAGStatus.Progress should include number of failed attempts

2014-09-22 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1141:
-
Affects Version/s: 0.5.0

 DAGStatus.Progress should include number of failed attempts
 ---

 Key: TEZ-1141
 URL: https://issues.apache.org/jira/browse/TEZ-1141
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Bikas Saha
Assignee: Gopal V

 Currently its impossible to know whether a job is seeing a lot of issues and 
 failures because we only report running tasks. Eventually the job fails but 
 before that we have no indication that a bunch of task failures have been 
 happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-1609) Add hostname to logIdentifiers of fetchers for easy debugging

2014-09-22 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned TEZ-1609:


Assignee: Gopal V

 Add hostname to logIdentifiers of fetchers for easy debugging
 -

 Key: TEZ-1609
 URL: https://issues.apache.org/jira/browse/TEZ-1609
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rajesh Balamohan
Assignee: Gopal V





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-1141) DAGStatus.Progress should include number of failed attempts

2014-09-22 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned TEZ-1141:


Assignee: Gopal V

Hit this issue in a bad way today, need a way to debug this.

 DAGStatus.Progress should include number of failed attempts
 ---

 Key: TEZ-1141
 URL: https://issues.apache.org/jira/browse/TEZ-1141
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Bikas Saha
Assignee: Gopal V

 Currently its impossible to know whether a job is seeing a lot of issues and 
 failures because we only report running tasks. Eventually the job fails but 
 before that we have no indication that a bunch of task failures have been 
 happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1622) Implement a tez jar equivalent script to avoid the complexities of hadoop jar

2014-09-24 Thread Gopal V (JIRA)
Gopal V created TEZ-1622:


 Summary: Implement a tez jar equivalent script to avoid the 
complexities of hadoop jar
 Key: TEZ-1622
 URL: https://issues.apache.org/jira/browse/TEZ-1622
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V


Currently, the only way to run a tez job by hand is to set up multiple 
parameters like HADOOP_CLASSPATH and then do hadoop jar {{main-class}}.

This is inconvenient and complex - find an easier way.
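A minimal wrapper along the requested lines might look like this (hypothetical script; TEZ_HOME, the install paths, and the dry-run behaviour are all assumptions - a real script would exec the command rather than print it):

```shell
#!/bin/sh
# Sketch of a "tez jar" wrapper: export the classpath that "hadoop jar"
# needs, then build the delegated command.
tez_cmd() {
    TEZ_HOME="${TEZ_HOME:-/opt/tez}"
    HADOOP_CLASSPATH="$TEZ_HOME/conf:$TEZ_HOME/*:$TEZ_HOME/lib/*"
    export HADOOP_CLASSPATH
    # Print the command so the wrapper is testable without a cluster;
    # a real script would `exec hadoop "$@"` here instead.
    echo "hadoop $*"
}
```

Usage would then be `tez jar my-app.jar org.example.Main args...` with no manual classpath juggling.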



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1609) Add hostname to logIdentifiers of fetchers for easy debugging

2014-09-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149779#comment-14149779
 ] 

Gopal V commented on TEZ-1609:
--

Yes, they do print the URL and speed.

 Add hostname to logIdentifiers of fetchers for easy debugging
 -

 Key: TEZ-1609
 URL: https://issues.apache.org/jira/browse/TEZ-1609
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-1609.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1634) BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors

2014-09-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153290#comment-14153290
 ] 

Gopal V commented on TEZ-1634:
--

Change looks good, but is hard to read.

Can you move the compressor close() call to the same location as the 
checksumOut finish()?

 BlockCompressorStream.finish() is called twice in IFile.close leading to 
 Shuffle errors
 ---

 Key: TEZ-1634
 URL: https://issues.apache.org/jira/browse/TEZ-1634
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: BlockCompressorStream.with.logging.java, 
 TEZ-1634.1.patch, stacktrace-with-comments.txt


 When IFile.Writer is closed, it explicitly calls compressedOut.finish(), and 
 then, as part of FSDataOutputStream.close(), finish() is internally called 
 again.  Please refer to o.a.h.i.compress.BlockCompressorStream for details on 
 finish(). This leads to an additional 4 bytes being written to the IFile, 
 which causes random failures during shuffle and prevents IFileInputStream 
 from doing proper checksumming.  
 This error happens only when we try to fetch multiple attempt outputs using 
 the same URL, and it is easily reproducible with SnappyCompressionCodec.  The 
 first attempt output is downloaded by the fetcher, and because of the last 4 
 bytes in the stream, IFileInputStream cannot do proper checksumming.  This 
 causes the subsequent attempt download to fail with the following exception.
 An example error from the shuffle phase is shown below.
 
 2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id 
 java.lang.IllegalArgumentException: Invalid header received:  partition: 0
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
 
 I will attach a debug version of BlockCompressorStream with a thread dump 
 (which validates that finish() is called twice in IFile.close()).  This bug 
 was present in earlier versions of Tez as well; I was able to consistently 
 reproduce it on a local VM.
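
The fix amounts to making finish() effectively run once. A minimal sketch of the idempotent-finish idea (hypothetical class, not the actual Tez patch; the 4-byte marker stands in for BlockCompressorStream's end-of-block bytes):

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of the double-finish bug and its guard: finish() appends
// a 4-byte end-of-block marker, so calling it twice corrupts the stream with
// an extra 4 bytes -- exactly the corruption described in this issue. Making
// finish() idempotent keeps the second (implicit) call from close() harmless.
class FinishOnceStream {
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();
  private boolean finished = false;

  void write(byte[] data) {
    out.write(data, 0, data.length); // this overload throws no checked exception
  }

  // Idempotent: the 4-byte marker is emitted at most once.
  void finish() {
    if (finished) {
      return; // second call (e.g. from close()) becomes a no-op
    }
    finished = true;
    out.write(new byte[] {0, 0, 0, 0}, 0, 4); // stand-in for the block marker
  }

  void close() {
    finish(); // close() also finishes, like FSDataOutputStream.close()
  }

  int size() {
    return out.size();
  }

  public static void main(String[] args) {
    FinishOnceStream s = new FinishOnceStream();
    s.write(new byte[] {1, 2, 3});
    s.finish(); // explicit finish, as IFile.Writer.close() does
    s.close();  // implicit finish -- must not add another 4 bytes
    System.out.println(s.size()); // 3 payload bytes + one 4-byte marker = 7
  }
}
```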



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1634) BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors

2014-09-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1634:
-
Attachment: TEZ-1634.2.patch

Small cosmetic change, for easier debugging.

Please review - [~rajesh.balamohan].

 BlockCompressorStream.finish() is called twice in IFile.close leading to 
 Shuffle errors
 ---

 Key: TEZ-1634
 URL: https://issues.apache.org/jira/browse/TEZ-1634
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: BlockCompressorStream.with.logging.java, 
 TEZ-1634.1.patch, TEZ-1634.2.patch, stacktrace-with-comments.txt


 When IFile.Writer is closed, it explicitly calls compressedOut.finish(), and 
 then, as part of FSDataOutputStream.close(), finish() is internally called 
 again.  Please refer to o.a.h.i.compress.BlockCompressorStream for details on 
 finish(). This leads to an additional 4 bytes being written to the IFile, 
 which causes random failures during shuffle and prevents IFileInputStream 
 from doing proper checksumming.  
 This error happens only when we try to fetch multiple attempt outputs using 
 the same URL, and it is easily reproducible with SnappyCompressionCodec.  The 
 first attempt output is downloaded by the fetcher, and because of the last 4 
 bytes in the stream, IFileInputStream cannot do proper checksumming.  This 
 causes the subsequent attempt download to fail with the following exception.
 An example error from the shuffle phase is shown below.
 
 2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id 
 java.lang.IllegalArgumentException: Invalid header received:  partition: 0
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
 
 I will attach a debug version of BlockCompressorStream with a thread dump 
 (which validates that finish() is called twice in IFile.close()).  This bug 
 was present in earlier versions of Tez as well; I was able to consistently 
 reproduce it on a local VM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1634) BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors

2014-09-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1634:
-
Fix Version/s: 0.6.0

 BlockCompressorStream.finish() is called twice in IFile.close leading to 
 Shuffle errors
 ---

 Key: TEZ-1634
 URL: https://issues.apache.org/jira/browse/TEZ-1634
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0, 0.6.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Fix For: 0.6.0

 Attachments: BlockCompressorStream.with.logging.java, 
 TEZ-1634.1.patch, TEZ-1634.2.patch, stacktrace-with-comments.txt


 When IFile.Writer is closed, it explicitly calls compressedOut.finish(), and 
 then, as part of FSDataOutputStream.close(), finish() is internally called 
 again.  Please refer to o.a.h.i.compress.BlockCompressorStream for details on 
 finish(). This leads to an additional 4 bytes being written to the IFile, 
 which causes random failures during shuffle and prevents IFileInputStream 
 from doing proper checksumming.  
 This error happens only when we try to fetch multiple attempt outputs using 
 the same URL, and it is easily reproducible with SnappyCompressionCodec.  The 
 first attempt output is downloaded by the fetcher, and because of the last 4 
 bytes in the stream, IFileInputStream cannot do proper checksumming.  This 
 causes the subsequent attempt download to fail with the following exception.
 An example error from the shuffle phase is shown below.
 
 2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id 
 java.lang.IllegalArgumentException: Invalid header received:  partition: 0
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
 
 I will attach a debug version of BlockCompressorStream with a thread dump 
 (which validates that finish() is called twice in IFile.close()).  This bug 
 was present in earlier versions of Tez as well; I was able to consistently 
 reproduce it on a local VM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1277) Tez Spill handler should truncate files to reserve space on disk

2014-10-06 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1277:
-
Description: 
Occasionally tasks fail due to full disks because the disks had space when the 
task was allocating via LocalDirAllocator, but the disk space was actually 
promised to many tasks instead of just one.

This race condition shows up when a 1Gb spill can be done in ~10s or so.

There is no way to do this via the hadoop-fs abstraction, but an SSD-based 
spill wastes most of its IOPS on journal updates about the file length changing.

  was:
Occasionally tasks fail due to full disks because the disks had space when the 
task was allocating via LocalDirAllocator, but the disk space was actually 
promised to many tasks instead of just one.

This race condition shows up when a 1Gb spill can be done in ~10s or so.


 Tez Spill handler should truncate files to reserve space on disk
 

 Key: TEZ-1277
 URL: https://issues.apache.org/jira/browse/TEZ-1277
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Gopal V
Assignee: Gopal V

 Occasionally tasks fail due to full disks because the disks had space when 
 the task was allocating via LocalDirAllocator, but the disk space was 
 actually promised to many tasks instead of just one.
 This race condition shows up when a 1Gb spill can be done in ~10s or so.
 There is no way to do this via the hadoop-fs abstraction, but an SSD-based 
 spill wastes most of its IOPS on journal updates about the file length 
 changing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1083) Enable IFile RLE for DefaultSorter

2014-10-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171963#comment-14171963
 ] 

Gopal V commented on TEZ-1083:
--

+1 - this enables RLE on the map-side spill.

Reduce-side TezMerger needs the equivalent impl, as a different JIRA.
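
The map-side RLE idea can be sketched as follows: when a sorted run repeats a key, emit a marker instead of the key bytes, so the merge can fast-forward without re-comparing. (Hypothetical encoding for illustration, not IFile's actual on-disk format.)

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of key run-length encoding over a sorted key stream:
// identical consecutive keys are replaced by a SAME_KEY marker, so a reader
// can skip the comparison entirely for repeated keys. IFile's real format
// uses a length marker in the binary record header rather than a string.
class KeyRleSketch {
  static final String SAME_KEY = "<RLE>";

  static List<String> encode(List<String> sortedKeys) {
    List<String> out = new ArrayList<>();
    String prev = null;
    for (String k : sortedKeys) {
      // Emit the marker when the key repeats; otherwise emit the key itself.
      out.add(k.equals(prev) ? SAME_KEY : k);
      prev = k;
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(encode(List.of("a", "a", "a", "b", "c", "c")));
    // -> [a, <RLE>, <RLE>, b, c, <RLE>]
  }
}
```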

 Enable IFile RLE for DefaultSorter
 --

 Key: TEZ-1083
 URL: https://issues.apache.org/jira/browse/TEZ-1083
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Gopal V
 Attachments: TEZ-1083.1.patch, TEZ-1083.2.patch


 Generate RLE IFiles for DefaultSorter and use it to fast-forward map-side 
 merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1277) Tez Spill handler should truncate files to reserve space on disk

2014-10-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171964#comment-14171964
 ] 

Gopal V commented on TEZ-1277:
--

Need to add a NativeIO impl to do {{fallocate}}.
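
For context on why a NativeIO binding is needed: plain Java can only extend a file's logical length, which typically produces a sparse file and reserves no blocks, unlike fallocate. A sketch of that gap (assumes a POSIX filesystem):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of why standard Java is insufficient here: setLength() grows the
// file's logical size but on most filesystems allocates no data blocks (a
// sparse file), so it does not actually reserve disk space for the spill.
// Real reservation needs a native fallocate()/posix_fallocate() call, e.g.
// via a NativeIO-style JNI binding.
class PreallocateSketch {
  // Creates a temp file, extends it to 1 MiB via setLength, returns the
  // logical length (which is 1 MiB even though no space may be reserved).
  static long logicalSize() throws IOException {
    Path p = Files.createTempFile("spill", ".out");
    try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "rw")) {
      raf.setLength(1 << 20); // logical size only; likely zero blocks used
    }
    long len = Files.size(p);
    Files.delete(p);
    return len;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(logicalSize()); // 1048576
  }
}
```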

 Tez Spill handler should truncate files to reserve space on disk
 

 Key: TEZ-1277
 URL: https://issues.apache.org/jira/browse/TEZ-1277
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Gopal V
Assignee: Gopal V

 Occasionally tasks fail due to full disks because the disks had space when 
 the task was allocating via LocalDirAllocator, but the disk space was 
 actually promised to many tasks instead of just one.
 This race condition shows up when a 1Gb spill can be done in ~10s or so.
 There is no way to do this via the hadoop-fs abstraction, but an SSD-based 
 spill wastes most of its IOPS on journal updates about the file length 
 changing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1525) BroadcastLoadGen testcase

2014-10-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1525:
-
Attachment: TEZ-1525.2.patch

Rebase after TEZ-1479

 BroadcastLoadGen testcase
 -

 Key: TEZ-1525
 URL: https://issues.apache.org/jira/browse/TEZ-1525
 Project: Apache Tez
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-1525.1.patch, TEZ-1525.2.patch


 Broadcast load generator test example



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1141) DAGStatus.Progress should include number of failed attempts

2014-10-17 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175874#comment-14175874
 ] 

Gopal V commented on TEZ-1141:
--

LGTM. I found that this doesn't track NM blacklisting, but that is a completely 
different problem.

I've updated the patch on HIVE-7838 to use this, and it is useful for narrowing 
down query failures (particularly reducer OOMs).

 DAGStatus.Progress should include number of failed attempts
 ---

 Key: TEZ-1141
 URL: https://issues.apache.org/jira/browse/TEZ-1141
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Bikas Saha
Assignee: Hitesh Shah
 Attachments: TEZ-1141.1.patch


 Currently it's impossible to know whether a job is seeing a lot of issues and 
 failures because we only report running tasks. Eventually the job fails but 
 before that we have no indication that a bunch of task failures have been 
 happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1525) BroadcastLoadGen testcase

2014-10-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1525:
-
Attachment: TEZ-1525.3.patch

 BroadcastLoadGen testcase
 -

 Key: TEZ-1525
 URL: https://issues.apache.org/jira/browse/TEZ-1525
 Project: Apache Tez
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: TEZ-1525.1.patch, TEZ-1525.2.patch, TEZ-1525.3.patch


 Broadcast load generator test example



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1690) TestMultiMRInput tests fail because of user collisions

2014-10-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1690:
-
Affects Version/s: 0.5.2

 TestMultiMRInput tests fail because of user collisions
 --

 Key: TEZ-1690
 URL: https://issues.apache.org/jira/browse/TEZ-1690
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
  Labels: newbie

 If two users run mvn test on the same machine, the paths in TestMultiMRInput 
 collide and the tests fail.
 {code}
 testSingleSplit(org.apache.tez.mapreduce.input.TestMultiMRInput)  Time 
 elapsed: 0.037 sec  <<< ERROR!
 java.io.FileNotFoundException: /tmp/TestMultiMRInput/testSingleSplit/file1 
 (Permission denied)
 at java.io.FileOutputStream.open(Native Method)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:212)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:206)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:202)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:265)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:252)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:384)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
 at 
 org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1071)
 at 
 org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1371)
 at 
 org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:272)
 at 
 org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:294)
 at 
 org.apache.tez.mapreduce.input.TestMultiMRInput.createInputData(TestMultiMRInput.java:277)
 at 
 org.apache.tez.mapreduce.input.TestMultiMRInput.testSingleSplit(TestMultiMRInput.java:106)
 {code}
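
One possible fix, sketched here (not the committed patch), is to derive the test directory from the current user, or from a freshly created unique directory, instead of a fixed /tmp path:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of avoiding fixed /tmp paths in tests: include the user name (or use
// a freshly created unique directory) so two users running "mvn test" on the
// same machine never collide on /tmp/TestMultiMRInput.
class TestDirSketch {
  // Per-user variant: /tmp/TestMultiMRInput-<user> instead of a shared path.
  static Path perUserDir(String testName) {
    String user = System.getProperty("user.name");
    return Path.of(System.getProperty("java.io.tmpdir"), testName + "-" + user);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(perUserDir("TestMultiMRInput"));
    // e.g. /tmp/TestMultiMRInput-gopal (depends on user and tmpdir)

    // Alternative: a unique directory per run, collision-free by construction.
    Path unique = Files.createTempDirectory("TestMultiMRInput");
    Files.delete(unique); // cleanup for this demo
  }
}
```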



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1693) ARCHIVE local resources are not supported in Tez DAGs

2014-10-21 Thread Gopal V (JIRA)
Gopal V created TEZ-1693:


 Summary: ARCHIVE local resources are not supported in Tez DAGs
 Key: TEZ-1693
 URL: https://issues.apache.org/jira/browse/TEZ-1693
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V


{code}
2014-10-21 16:42:17,919 ERROR [main]: exec.Task (TezTask.java:execute(180)) - 
Failed to execute tez graph.
java.lang.IllegalArgumentException: LocalResourceType: ARCHIVE is not 
supported, only FILE is supported
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:365)
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:344)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:368)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:159)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1607)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1593) Refactor PipelinedSorter to remove all MMAP based ByteBuffer references

2014-10-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1593:
-
Summary: Refactor PipelinedSorter to remove all MMAP based ByteBuffer 
references  (was: PipelinedSorter::compare() makes a key-copy to satisfy 
RawComparator interface)

 Refactor PipelinedSorter to remove all MMAP based ByteBuffer references
 ---

 Key: TEZ-1593
 URL: https://issues.apache.org/jira/browse/TEZ-1593
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
  Labels: Performance

 The current implementation of PipelinedSorter has a slow section which 
 revolves around key comparisons.
 {code}
   kvbuffer.position(istart);
   kvbuffer.get(ki, 0, ilen);
   kvbuffer.position(jstart);
   kvbuffer.get(kj, 0, jlen);
   // sort by key
   final int cmp = comparator.compare(ki, 0, ilen, kj, 0, jlen);
 {code}
 The kvbuffer.get into the arrays ki and kj are the slowest part of the 
 comparator operation.
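
Since the refactor moves to a plain byte[] backing the kvbuffer, the two get() copies become unnecessary: the comparator can read both keys in place at their offsets. A sketch of the array-backed comparison (illustrative, not the actual patch):

```java
// Sketch of the zero-copy idea: compare two key ranges directly inside the
// backing byte[] at (istart, ilen) and (jstart, jlen), instead of copying each
// key into a scratch array first (as the ByteBuffer-based code above does).
class RawCompareSketch {
  static int compare(byte[] buf, int istart, int ilen, int jstart, int jlen) {
    int n = Math.min(ilen, jlen);
    for (int k = 0; k < n; k++) {
      // Mask to 0..255 so the comparison is unsigned (lexicographic order).
      int a = buf[istart + k] & 0xff;
      int b = buf[jstart + k] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return ilen - jlen; // on a common prefix, the shorter key sorts first
  }

  public static void main(String[] args) {
    byte[] buf = {'a', 'b', 'c', 'a', 'b', 'd'};
    System.out.println(compare(buf, 0, 3, 3, 3)); // "abc" vs "abd" -> negative
  }
}
```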



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1593) Refactor PipelinedSorter to remove all MMAP based ByteBuffer references

2014-10-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1593:
-
Description: 
The current implementation of PipelinedSorter has a slow section which revolves 
around key comparisons - this was relevant when the implementation used direct 
byte buffers to back the kvbuffer.

{code}
  kvbuffer.position(istart);
  kvbuffer.get(ki, 0, ilen);
  kvbuffer.position(jstart);
  kvbuffer.get(kj, 0, jlen);
  // sort by key
  final int cmp = comparator.compare(ki, 0, ilen, kj, 0, jlen);
{code}

The kvbuffer.get into the arrays ki and kj are the slowest part of the 
comparator operation.

  was:
The current implementation of PipelinedSorter has a slow section which revolves 
around key comparisons.

{code}
  kvbuffer.position(istart);
  kvbuffer.get(ki, 0, ilen);
  kvbuffer.position(jstart);
  kvbuffer.get(kj, 0, jlen);
  // sort by key
  final int cmp = comparator.compare(ki, 0, ilen, kj, 0, jlen);
{code}

The kvbuffer.get into the arrays ki and kj are the slowest part of the 
comparator operation.


 Refactor PipelinedSorter to remove all MMAP based ByteBuffer references
 ---

 Key: TEZ-1593
 URL: https://issues.apache.org/jira/browse/TEZ-1593
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
  Labels: Performance

 The current implementation of PipelinedSorter has a slow section which 
 revolves around key comparisons - this was relevant when the implementation 
 used direct byte buffers to back the kvbuffer.
 {code}
   kvbuffer.position(istart);
   kvbuffer.get(ki, 0, ilen);
   kvbuffer.position(jstart);
   kvbuffer.get(kj, 0, jlen);
   // sort by key
   final int cmp = comparator.compare(ki, 0, ilen, kj, 0, jlen);
 {code}
 The kvbuffer.get into the arrays ki and kj are the slowest part of the 
 comparator operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1141) DAGStatus.Progress should include number of failed attempts

2014-10-22 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180900#comment-14180900
 ] 

Gopal V commented on TEZ-1141:
--

The hive progress UI is already crowded with 4 numbers per vertex.

In general, we want failed-attempt tracking, so that a user can see OOMs or 
task errors without waiting for a query to finish and grepping the logs. 

Adding killed attempts to the mix (as one number) doesn't help and may 
confuse users (see the earlier comment on NM blacklisting).
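
For illustration, the shape of the API under discussion, a progress snapshot carrying a failed-attempt counter next to the task counts, might look like this (hypothetical fields, not the committed DAGStatus.Progress):

```java
// Hypothetical sketch of a progress snapshot carrying failed-attempt counts
// (illustrative only; the real fields live in DAGStatus.Progress).
class ProgressSketch {
  final int totalTasks;
  final int succeeded;
  final int running;
  final int failedTaskAttempts;

  ProgressSketch(int totalTasks, int succeeded, int running,
                 int failedTaskAttempts) {
    this.totalTasks = totalTasks;
    this.succeeded = succeeded;
    this.running = running;
    this.failedTaskAttempts = failedTaskAttempts;
  }

  // A client like Hive can surface trouble before the job actually fails,
  // instead of discovering the failures only in post-mortem logs.
  boolean looksUnhealthy() {
    return failedTaskAttempts > 0;
  }

  public static void main(String[] args) {
    ProgressSketch p = new ProgressSketch(100, 40, 10, 3);
    System.out.println(p.looksUnhealthy()); // true
  }
}
```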


 DAGStatus.Progress should include number of failed attempts
 ---

 Key: TEZ-1141
 URL: https://issues.apache.org/jira/browse/TEZ-1141
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Bikas Saha
Assignee: Hitesh Shah
 Attachments: TEZ-1141.1.patch


 Currently it's impossible to know whether a job is seeing a lot of issues and 
 failures because we only report running tasks. Eventually the job fails but 
 before that we have no indication that a bunch of task failures have been 
 happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1141) DAGStatus.Progress should include number of failed attempts

2014-10-22 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180910#comment-14180910
 ] 

Gopal V commented on TEZ-1141:
--

The UI should be able to extract this information from 
TEZ_TASK_ATTEMPT_ID::status

The progress RPC we're talking about today is from the AM directly for clients 
like Hive.

 DAGStatus.Progress should include number of failed attempts
 ---

 Key: TEZ-1141
 URL: https://issues.apache.org/jira/browse/TEZ-1141
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Bikas Saha
Assignee: Hitesh Shah
 Attachments: TEZ-1141.1.patch


 Currently it's impossible to know whether a job is seeing a lot of issues and 
 failures because we only report running tasks. Eventually the job fails but 
 before that we have no indication that a bunch of task failures have been 
 happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1688) Add applicationId as a primary filter for all Timeline data for easier export

2014-10-22 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180934#comment-14180934
 ] 

Gopal V commented on TEZ-1688:
--

LGTM - +1.

I noticed that this uses the same naming convention as YARN applications 
(instead of being tied to a TEZ_* name). 

That makes a lot of sense, when we integrate this with the RM data - but right 
now, it looks rather odd in the dumps.

{code}
primaryfilters: {
TEZ_DAG_ID: [
dag_1413959022005_0046_1
],
TEZ_VERTEX_ID: [
vertex_1413959022005_0046_1_01
],
applicationId: [
application_1413959022005_0046
]
},
{code}

That is not something that needs fixing here, but I'm commenting for the sake 
of documentation.

 Add applicationId as a primary filter for all Timeline data for easier export 
 --

 Key: TEZ-1688
 URL: https://issues.apache.org/jira/browse/TEZ-1688
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1688.1.patch, TEZ-1688.2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1688) Add applicationId as a primary filter for all Timeline data for easier export

2014-10-22 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180944#comment-14180944
 ] 

Gopal V commented on TEZ-1688:
--

Running this through my extraction pipelines, {{TEZ_APPLICATION_ATTEMPT}} is 
missing the filter.

The relatedEntities field does not allow for a reverse lookup.

 Add applicationId as a primary filter for all Timeline data for easier export 
 --

 Key: TEZ-1688
 URL: https://issues.apache.org/jira/browse/TEZ-1688
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1688.1.patch, TEZ-1688.2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1596) Secure Shuffle utils is extremely expensive for fast queries

2014-10-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1596:
-
Description: 
Generating the hash for YARN's secure shuffle is more expensive than the actual 
HTTP call once keep-alive is turned on.

!Shuffle_generateHash.png!
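
One common way to cut this per-request cost, sketched below (an assumption about the approach, not the Tez patch), is to cache an initialized Mac per thread rather than rebuilding the HMAC machinery for every fetch:

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of amortizing the secure-shuffle hash cost: Mac.getInstance() and
// init() are comparatively expensive, so keep one initialized Mac per thread
// and reuse it for every URL hash. (Illustrative; the key and algorithm here
// are placeholders, not the actual SecureShuffleUtils code.)
class HashCacheSketch {
  private static final byte[] KEY =
      "shuffle-secret".getBytes(StandardCharsets.UTF_8); // placeholder key

  private static final ThreadLocal<Mac> MAC = ThreadLocal.withInitial(() -> {
    try {
      Mac mac = Mac.getInstance("HmacSHA1");
      mac.init(new SecretKeySpec(KEY, "HmacSHA1"));
      return mac;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  });

  static byte[] hash(String url) {
    Mac mac = MAC.get();
    mac.reset(); // safe to reuse: reset clears any partial update
    return mac.doFinal(url.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // HMAC-SHA1 always yields 20 bytes, regardless of input length.
    System.out.println(hash("http://host:13562/mapOutput?job=1").length); // 20
  }
}
```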

  was:
Generating the hash for YARN's secure shuffle is more expensive than the actual 
HTTP call once keep-alive is turned on.

!shuffle-secure.png!


 Secure Shuffle utils is extremely expensive for fast queries
 

 Key: TEZ-1596
 URL: https://issues.apache.org/jira/browse/TEZ-1596
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
 Attachments: Shuffle_generateHash.png, TEZ-1596.hack.patch, 
 shuffle-secure-drilldown.png, shuffle-secure.png


 Generating the hash for YARN's secure shuffle is more expensive than the 
 actual HTTP call once keep-alive is turned on.
 !Shuffle_generateHash.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1698) Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez

2014-10-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1698:
-
Attachment: ProcfsBasedProcessTree.png

 Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez
 

 Key: TEZ-1698
 URL: https://issues.apache.org/jira/browse/TEZ-1698
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
 Attachments: ProcfsBasedProcessTree.png


 ResourceCalculatorProcessTree scrapes all of /proc for PIDs which are part of 
 the current task's process group.
 This is mostly wasted work in Tez: YARN has to do this because it only has 
 the PID of the container-executor process (bash) and must trace the 
 bash -> java spawn inheritance, but Tez does not.
 !ProcfsBasedProcessTree.png!
 The effect of this is less clearly visible with the profiler turned on as 
 this is primarily related to Syscall overhead in the kernel (via the 
 following codepath in YARN).
 {code}
  private List<String> getProcessList() {
 String[] processDirs = (new File(procfsDir)).list();
 ...
 for (String dir : processDirs) {
   try {
 if ((new File(procfsDir, dir)).isDirectory()) {
   processList.add(dir);
 }
 ...
   public void updateProcessTree() {
 if (!pid.equals(deadPid)) {
   // Get the list of processes
    List<String> processList = getProcessList();
 ...
   for (String proc : processList) {
 // Get information for each process
 ProcessInfo pInfo = new ProcessInfo(proc);
 if (constructProcessInfo(pInfo, procfsDir) != null) {
   allProcessInfo.put(proc, pInfo);
   if (proc.equals(this.pid)) {
 me = pInfo; // cache 'me'
 processTree.put(proc, pInfo);
   }
 }
   }
 {code}
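
Since a Tez task already knows it is the process of interest, it can read just /proc/self/* instead of listing every PID. A minimal sketch of that cheaper path (Linux-only assumption; the real fix is to use ResourceCalculatorPlugin):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the cheaper alternative: instead of listing all of /proc and
// reconstructing a process tree, a task can read its own memory usage
// straight from /proc/self/status (Linux-only; returns -1 elsewhere).
class SelfRssSketch {
  // Returns the resident set size in kB, or -1 if /proc is unavailable.
  static long rssKb() throws IOException {
    Path status = Path.of("/proc/self/status");
    if (!Files.exists(status)) {
      return -1;
    }
    for (String line : Files.readAllLines(status)) {
      if (line.startsWith("VmRSS:")) {
        // The line looks like: "VmRSS:     123456 kB"
        return Long.parseLong(line.replaceAll("[^0-9]", ""));
      }
    }
    return -1;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(rssKb()); // e.g. 24680 on Linux, -1 elsewhere
  }
}
```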



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1698) Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez

2014-10-23 Thread Gopal V (JIRA)
Gopal V created TEZ-1698:


 Summary: Use ResourceCalculatorPlugin instead of 
ResourceCalculatorProcessTree in Tez
 Key: TEZ-1698
 URL: https://issues.apache.org/jira/browse/TEZ-1698
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
 Attachments: ProcfsBasedProcessTree.png

ResourceCalculatorProcessTree scrapes all of /proc for PIDs which are part of 
the current task's process group.

This is mostly wasted work in Tez: YARN has to do this because it only has the 
PID of the container-executor process (bash) and must trace the bash -> java 
spawn inheritance, but Tez does not.

!ProcfsBasedProcessTree.png!

The effect of this is less clearly visible with the profiler turned on as this 
is primarily related to Syscall overhead in the kernel (via the following 
codepath in YARN).

{code}
 private List<String> getProcessList() {
String[] processDirs = (new File(procfsDir)).list();
...
for (String dir : processDirs) {
  try {
if ((new File(procfsDir, dir)).isDirectory()) {
  processList.add(dir);
}
...

  public void updateProcessTree() {
if (!pid.equals(deadPid)) {
  // Get the list of processes
  List<String> processList = getProcessList();
...
  for (String proc : processList) {
// Get information for each process
ProcessInfo pInfo = new ProcessInfo(proc);
if (constructProcessInfo(pInfo, procfsDir) != null) {
  allProcessInfo.put(proc, pInfo);
  if (proc.equals(this.pid)) {
me = pInfo; // cache 'me'
processTree.put(proc, pInfo);
  }
}
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1698) Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez

2014-10-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1698:
-
Description: 
ResourceCalculatorProcessTree scrapes all of /proc for PIDs which are part of 
the current task's process group.

This is mostly wasted work in Tez: YARN has to do this because it only has the 
PID of the container-executor process (bash) and must trace the bash -> java 
spawn inheritance, but Tez does not.

!ProcfsBasedProcessTree.png!

The latency effect of this is less clearly visible with the profiler turned on 
as this is primarily related to rate of syscalls + overhead in the kernel (via 
the following codepath in YARN).

!ProcfsFiles.png!

{code}
 private List<String> getProcessList() {
String[] processDirs = (new File(procfsDir)).list();
...
for (String dir : processDirs) {
  try {
if ((new File(procfsDir, dir)).isDirectory()) {
  processList.add(dir);
}
...

  public void updateProcessTree() {
if (!pid.equals(deadPid)) {
  // Get the list of processes
  List<String> processList = getProcessList();
...
  for (String proc : processList) {
// Get information for each process
ProcessInfo pInfo = new ProcessInfo(proc);
if (constructProcessInfo(pInfo, procfsDir) != null) {
  allProcessInfo.put(proc, pInfo);
  if (proc.equals(this.pid)) {
me = pInfo; // cache 'me'
processTree.put(proc, pInfo);
  }
}
  }
{code}

  was:
ResourceCalculatorProcessTree scrapes all of /proc for PIDs which are part of 
the current task's process group.

This is mostly wasted work in Tez: YARN has to do this because it only has the 
PID of the container-executor process (bash) and must trace the bash -> java 
spawn inheritance, but Tez does not.

!ProcfsBasedProcessTree.png!

The effect of this is less clearly visible with the profiler turned on as this 
is primarily related to syscall overhead in the kernel (via the following 
codepath in YARN).

{code}
 private List<String> getProcessList() {
String[] processDirs = (new File(procfsDir)).list();
...
for (String dir : processDirs) {
  try {
if ((new File(procfsDir, dir)).isDirectory()) {
  processList.add(dir);
}
...

  public void updateProcessTree() {
if (!pid.equals(deadPid)) {
  // Get the list of processes
  List<String> processList = getProcessList();
...
  for (String proc : processList) {
// Get information for each process
ProcessInfo pInfo = new ProcessInfo(proc);
if (constructProcessInfo(pInfo, procfsDir) != null) {
  allProcessInfo.put(proc, pInfo);
  if (proc.equals(this.pid)) {
me = pInfo; // cache 'me'
processTree.put(proc, pInfo);
  }
}
  }
{code}


 Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez
 

 Key: TEZ-1698
 URL: https://issues.apache.org/jira/browse/TEZ-1698
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
 Attachments: ProcfsBasedProcessTree.png, ProcfsFiles.png


 ResourceCalculatorProcessTree scrapes all of /proc/ for PIDs which are part of 
 the current task's process group.
 This is mostly wasted work in Tez: YARN has to do it because it only has the 
 PID of the container-executor process (bash) and must trace the bash -> java 
 spawn inheritance, whereas Tez already knows its own JVM's PID.
 !ProcfsBasedProcessTree.png!
 The latency effect of this is less clearly visible with the profiler turned 
 on as this is primarily related to rate of syscalls + overhead in the kernel 
 (via the following codepath in YARN).
 !ProcfsFiles.png!
 {code}
  private List<String> getProcessList() {
 String[] processDirs = (new File(procfsDir)).list();
 ...
 for (String dir : processDirs) {
   try {
 if ((new File(procfsDir, dir)).isDirectory()) {
   processList.add(dir);
 }
 ...
   public void updateProcessTree() {
 if (!pid.equals(deadPid)) {
   // Get the list of processes
   List<String> processList = getProcessList();
 ...
   for (String proc : processList) {
 // Get information for each process
 ProcessInfo pInfo = new ProcessInfo(proc);
 if (constructProcessInfo(pInfo, procfsDir) != null) {
   allProcessInfo.put(proc, pInfo);
   if (proc.equals(this.pid)) {
 me = pInfo; // cache 'me'
 processTree.put(proc, pInfo);
   }
 }
   }
 {code}





[jira] [Updated] (TEZ-1698) Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez

2014-10-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1698:
-
Attachment: ProcfsFiles.png

 Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez
 

 Key: TEZ-1698
 URL: https://issues.apache.org/jira/browse/TEZ-1698
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
 Attachments: ProcfsBasedProcessTree.png, ProcfsFiles.png


 ResourceCalculatorProcessTree scrapes all of /proc/ for PIDs which are part of 
 the current task's process group.
 This is mostly wasted work in Tez: YARN has to do it because it only has the 
 PID of the container-executor process (bash) and must trace the bash -> java 
 spawn inheritance, whereas Tez already knows its own JVM's PID.
 !ProcfsBasedProcessTree.png!
 The latency effect of this is less clearly visible with the profiler turned 
 on as this is primarily related to rate of syscalls + overhead in the kernel 
 (via the following codepath in YARN).
 !ProcfsFiles.png!
 {code}
  private List<String> getProcessList() {
 String[] processDirs = (new File(procfsDir)).list();
 ...
 for (String dir : processDirs) {
   try {
 if ((new File(procfsDir, dir)).isDirectory()) {
   processList.add(dir);
 }
 ...
   public void updateProcessTree() {
 if (!pid.equals(deadPid)) {
   // Get the list of processes
   List<String> processList = getProcessList();
 ...
   for (String proc : processList) {
 // Get information for each process
 ProcessInfo pInfo = new ProcessInfo(proc);
 if (constructProcessInfo(pInfo, procfsDir) != null) {
   allProcessInfo.put(proc, pInfo);
   if (proc.equals(this.pid)) {
 me = pInfo; // cache 'me'
 processTree.put(proc, pInfo);
   }
 }
   }
 {code}





[jira] [Updated] (TEZ-1634) BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors

2014-10-27 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1634:
-
Fix Version/s: 0.5.2

 BlockCompressorStream.finish() is called twice in IFile.close leading to 
 Shuffle errors
 ---

 Key: TEZ-1634
 URL: https://issues.apache.org/jira/browse/TEZ-1634
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0, 0.6.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Fix For: 0.6.0, 0.5.2

 Attachments: BlockCompressorStream.with.logging.java, 
 TEZ-1634.1.patch, TEZ-1634.2.patch, stacktrace-with-comments.txt


 When IFile.Writer is closed, it explicitly calls compressedOut.finish(); and 
 as part of FSDataOutputStream.close(), finish() is internally called again.  
 Please refer to o.a.h.i.compress.BlockCompressorStream for more details on 
 finish(). This leads to an additional 4 bytes being written to the IFile, 
 which causes issues randomly during shuffle and prevents IFileInputStream 
 from doing the proper checksumming.
 This error happens only when we try to fetch multiple attempt outputs using 
 the same URL, and is easily reproducible with SnappyCompressionCodec.  The 
 first attempt output would be downloaded by the fetcher, and due to the last 
 4 bytes in the stream, IFileInputStream wouldn't do the proper checksumming.  
 This causes the subsequent attempt download to fail with the following 
 exception.
 Example error in shuffle phase is attached below.
 
 2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id 
 java.lang.IllegalArgumentException: Invalid header received:  partition: 0
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
   at 
 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
 
 I will attach the debug version of BlockCompressorStream with a thread dump 
 (which validates that finish() is called twice in IFile.close()).  This bug 
 was present in earlier versions of Tez as well, and I was able to 
 consistently reproduce it on a local VM.
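A minimal sketch of the fix idea, with illustrative names only (this is not the actual o.a.h.i.compress.BlockCompressorStream code): make finish() idempotent, so that an explicit finish() followed by close() (which calls finish() again) does not append the end-of-block marker a second time.

```java
import java.io.ByteArrayOutputStream;

public class GuardedFinishStream {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private boolean finished = false;

    public void write(byte[] data) { out.write(data, 0, data.length); }

    public void finish() {
        if (finished) {
            return;                       // second call is a no-op
        }
        finished = true;
        byte[] marker = {0, 0, 0, 0};     // stand-in for the 4-byte end marker
        out.write(marker, 0, marker.length);
    }

    public void close() { finish(); }     // close() delegates to finish() once

    public int length() { return out.size(); }

    public static void main(String[] args) {
        GuardedFinishStream s = new GuardedFinishStream();
        s.write(new byte[]{1, 2, 3});
        s.finish();   // explicit finish, as IFile.Writer.close() does
        s.close();    // the wrapped stream's close() calls finish() again
        // Without the guard this would be 11 (two markers) and the trailing
        // 4 bytes would break the reader's checksum, as described above.
        System.out.println(s.length());   // 3 data bytes + one 4-byte marker = 7
    }
}
```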





[jira] [Commented] (TEZ-1596) Secure Shuffle utils is extremely expensive for fast queries

2014-10-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185973#comment-14185973
 ] 

Gopal V commented on TEZ-1596:
--

Works as expected - +1

 Secure Shuffle utils is extremely expensive for fast queries
 

 Key: TEZ-1596
 URL: https://issues.apache.org/jira/browse/TEZ-1596
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: Shuffle_generateHash.png, 
 Shuffle_generateHash_afterFix.png, TEZ-1596.2.patch, TEZ-1596.hack.patch, 
 shuffle-secure-drilldown.png, shuffle-secure.png


 Generating the hash for YARN's secure shuffle is more expensive than the 
 actual HTTP call once keep-alive is turned on.
 !Shuffle_generateHash.png!
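For a sense of the per-fetch work involved, a hedged sketch using plain JCE HMAC-SHA1; the secret and URL below are made up, and this is not the actual SecureShuffleUtils code:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class ShuffleHashSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical shuffle secret and fetch URL, for illustration only.
        byte[] secret = "example-shuffle-secret".getBytes(StandardCharsets.UTF_8);
        String msg = "/mapOutput?job=job_1&map=attempt_1&reduce=0";

        // Secure shuffle signs each request URL with HMAC-SHA1 keyed by the
        // job's shuffle secret; every fetch pays this crypto cost.
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        byte[] hash = mac.doFinal(msg.getBytes(StandardCharsets.UTF_8));

        // With keep-alive the HTTP round trip can be cheaper than
        // recomputing this hash per request.
        System.out.println(hash.length);  // HMAC-SHA1 digest is 20 bytes
    }
}
```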





[jira] [Commented] (TEZ-1701) ATS fixes to flush all history events and also using batching

2014-10-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186000#comment-14186000
 ] 

Gopal V commented on TEZ-1701:
--

Event counts add up with this patch.

Within the ATS LevelDB store, significant pauses were noticed during the GC 
delete passes.

 ATS fixes to flush all history events and also using batching
 -

 Key: TEZ-1701
 URL: https://issues.apache.org/jira/browse/TEZ-1701
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1701.1.patch


 There are cases when the timeline server can get backlogged. To address this, 
 the AM should wait for a longer period to send events to it. Also, sending 
 events in batches will reduce the load. 





[jira] [Created] (TEZ-1719) Allow IFile reducer merge-sort to disable crc32 checksums

2014-10-29 Thread Gopal V (JIRA)
Gopal V created TEZ-1719:


 Summary: Allow IFile reducer merge-sort to disable crc32 checksums
 Key: TEZ-1719
 URL: https://issues.apache.org/jira/browse/TEZ-1719
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V


Next-gen filesystems like BTRFS and ZFS provide their own checksumming for disk 
data.

Using PureJavaCrc32 on temporary spill data written to such filesystems is a 
complete waste of CPU resources.
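A minimal sketch of the opt-out being proposed, with a made-up flag; java.util.zip.CRC32 stands in here for Hadoop's PureJavaCrc32:

```java
import java.util.zip.CRC32;

public class OptionalSpillChecksum {
    // Hypothetical knob: compute the spill checksum only when enabled,
    // letting filesystems with native checksumming (BTRFS, ZFS) skip the
    // CPU cost entirely.
    static long checksumIfEnabled(byte[] spill, boolean checksumEnabled) {
        if (!checksumEnabled) {
            return 0L;            // the filesystem verifies integrity for us
        }
        CRC32 crc = new CRC32();
        crc.update(spill, 0, spill.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] spill = "intermediate spill data".getBytes();
        System.out.println(checksumIfEnabled(spill, false));       // 0
        System.out.println(checksumIfEnabled(spill, true) != 0L);  // true
    }
}
```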





[jira] [Updated] (TEZ-1719) Allow IFile reducer merge-sort to disable crc32 checksums

2014-10-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1719:
-
Labels: Performance  (was: )

 Allow IFile reducer merge-sort to disable crc32 checksums
 -

 Key: TEZ-1719
 URL: https://issues.apache.org/jira/browse/TEZ-1719
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
  Labels: Performance

 Next-gen filesystems like BTRFS and ZFS provide their own checksumming for 
 disk data.
 Using PureJavaCrc32 on temporary spill data written to such filesystems is a 
 complete waste of CPU resources.





[jira] [Updated] (TEZ-1698) Cut down on ResourceCalculatorProcessTree overheads in Tez

2014-10-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1698:
-
Summary: Cut down on ResourceCalculatorProcessTree overheads in Tez  (was: 
Use ResourceCalculatorPlugin instead of ResourceCalculatorProcessTree in Tez)

 Cut down on ResourceCalculatorProcessTree overheads in Tez
 --

 Key: TEZ-1698
 URL: https://issues.apache.org/jira/browse/TEZ-1698
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: ProcfsBasedProcessTree.png, ProcfsFiles.png, 
 TEZ-1698.1.patch, TEZ-1698.2.patch


 ResourceCalculatorProcessTree scrapes all of /proc/ for PIDs which are part of 
 the current task's process group.
 This is mostly wasted work in Tez: YARN has to do it because it only has the 
 PID of the container-executor process (bash) and must trace the bash -> java 
 spawn inheritance, whereas Tez already knows its own JVM's PID.
 !ProcfsBasedProcessTree.png!
 The latency effect of this is less clearly visible with the profiler turned 
 on as this is primarily related to rate of syscalls + overhead in the kernel 
 (via the following codepath in YARN).
 !ProcfsFiles.png!
 {code}
  private List<String> getProcessList() {
 String[] processDirs = (new File(procfsDir)).list();
 ...
 for (String dir : processDirs) {
   try {
 if ((new File(procfsDir, dir)).isDirectory()) {
   processList.add(dir);
 }
 ...
   public void updateProcessTree() {
 if (!pid.equals(deadPid)) {
   // Get the list of processes
   List<String> processList = getProcessList();
 ...
   for (String proc : processList) {
 // Get information for each process
 ProcessInfo pInfo = new ProcessInfo(proc);
 if (constructProcessInfo(pInfo, procfsDir) != null) {
   allProcessInfo.put(proc, pInfo);
   if (proc.equals(this.pid)) {
 me = pInfo; // cache 'me'
 processTree.put(proc, pInfo);
   }
 }
   }
 {code}





[jira] [Updated] (TEZ-1698) Cut down on ResourceCalculatorProcessTree overheads in Tez

2014-10-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1698:
-
Attachment: TEZ-1698.3.patch

[~rajesh.balamohan]: Can you test this version? 

Minor changes: the plugin is built only against Sun/Oracle JDKs.

And CumulativeRSS now returns the total memory held by the JVM via 
Runtime.totalMemory(), since that includes the free heap memory still held by 
the JVM (as finalized/collected garbage).

 Cut down on ResourceCalculatorProcessTree overheads in Tez
 --

 Key: TEZ-1698
 URL: https://issues.apache.org/jira/browse/TEZ-1698
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: ProcfsBasedProcessTree.png, ProcfsFiles.png, 
 TEZ-1698.1.patch, TEZ-1698.2.patch, TEZ-1698.3.patch


 ResourceCalculatorProcessTree scrapes all of /proc/ for PIDs which are part of 
 the current task's process group.
 This is mostly wasted work in Tez: YARN has to do it because it only has the 
 PID of the container-executor process (bash) and must trace the bash -> java 
 spawn inheritance, whereas Tez already knows its own JVM's PID.
 !ProcfsBasedProcessTree.png!
 The latency effect of this is less clearly visible with the profiler turned 
 on as this is primarily related to rate of syscalls + overhead in the kernel 
 (via the following codepath in YARN).
 !ProcfsFiles.png!
 {code}
  private List<String> getProcessList() {
 String[] processDirs = (new File(procfsDir)).list();
 ...
 for (String dir : processDirs) {
   try {
 if ((new File(procfsDir, dir)).isDirectory()) {
   processList.add(dir);
 }
 ...
   public void updateProcessTree() {
 if (!pid.equals(deadPid)) {
   // Get the list of processes
   List<String> processList = getProcessList();
 ...
   for (String proc : processList) {
 // Get information for each process
 ProcessInfo pInfo = new ProcessInfo(proc);
 if (constructProcessInfo(pInfo, procfsDir) != null) {
   allProcessInfo.put(proc, pInfo);
   if (proc.equals(this.pid)) {
 me = pInfo; // cache 'me'
 processTree.put(proc, pInfo);
   }
 }
   }
 {code}





[jira] [Commented] (TEZ-1698) Cut down on ResourceCalculatorProcessTree overheads in Tez

2014-10-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191204#comment-14191204
 ] 

Gopal V commented on TEZ-1698:
--

+1 - Thanks Rajesh, this looks good.

This isn't on by default, so it should be good for 0.5.2 as well.

 Cut down on ResourceCalculatorProcessTree overheads in Tez
 --

 Key: TEZ-1698
 URL: https://issues.apache.org/jira/browse/TEZ-1698
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: ProcfsBasedProcessTree.png, ProcfsFiles.png, 
 TEZ-1698.1.patch, TEZ-1698.2.patch, TEZ-1698.3.patch, TEZ-1698.4.patch


 ResourceCalculatorProcessTree scrapes all of /proc/ for PIDs which are part of 
 the current task's process group.
 This is mostly wasted work in Tez: YARN has to do it because it only has the 
 PID of the container-executor process (bash) and must trace the bash -> java 
 spawn inheritance, whereas Tez already knows its own JVM's PID.
 !ProcfsBasedProcessTree.png!
 The latency effect of this is less clearly visible with the profiler turned 
 on as this is primarily related to rate of syscalls + overhead in the kernel 
 (via the following codepath in YARN).
 !ProcfsFiles.png!
 {code}
  private List<String> getProcessList() {
 String[] processDirs = (new File(procfsDir)).list();
 ...
 for (String dir : processDirs) {
   try {
 if ((new File(procfsDir, dir)).isDirectory()) {
   processList.add(dir);
 }
 ...
   public void updateProcessTree() {
 if (!pid.equals(deadPid)) {
   // Get the list of processes
   List<String> processList = getProcessList();
 ...
   for (String proc : processList) {
 // Get information for each process
 ProcessInfo pInfo = new ProcessInfo(proc);
 if (constructProcessInfo(pInfo, procfsDir) != null) {
   allProcessInfo.put(proc, pInfo);
   if (proc.equals(this.pid)) {
 me = pInfo; // cache 'me'
 processTree.put(proc, pInfo);
   }
 }
   }
 {code}





[jira] [Commented] (TEZ-1725) Fix nanosecond to millis conversion in TezMxBeanResourceCalculator

2014-10-31 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191449#comment-14191449
 ] 

Gopal V commented on TEZ-1725:
--

Comparing LinuxResourceCalculatorPlugin vs MXBeanResourceCalculator

|| Counter || LinuxResourceCalculatorPlugin || TezMxBeanResourceCalculator ||
| CPU_MILLISECONDS | 48458160 | 48059040 |
| PHYSICAL_MEMORY_BYTES | 6679073550336 | 7029569093632 |
| VIRTUAL_MEMORY_BYTES | 11706779303936 | 11492467920896 |
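The conversion itself is one line; a sketch, with a hypothetical input value chosen to match the magnitude of the CPU_MILLISECONDS counter above:

```java
import java.util.concurrent.TimeUnit;

public class NanosToMillis {
    public static void main(String[] args) {
        // Hypothetical cumulative process CPU time, in nanoseconds.
        long cpuTimeNanos = 48_059_040_000_000L;

        // Dividing by 1_000_000 (TimeUnit does this) is the fix; dividing by
        // 1_000 (nanos -> micros) would inflate CPU_MILLISECONDS by 1000x.
        long cpuTimeMillis = TimeUnit.NANOSECONDS.toMillis(cpuTimeNanos);

        System.out.println(cpuTimeMillis);  // 48059040
    }
}
```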



 Fix nanosecond to millis conversion in TezMxBeanResourceCalculator
 --

 Key: TEZ-1725
 URL: https://issues.apache.org/jira/browse/TEZ-1725
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-1725.1.patch, TEZ-1725.2.patch, TEZ-1725.3.patch








[jira] [Commented] (TEZ-1725) Fix nanosecond to millis conversion in TezMxBeanResourceCalculator

2014-10-31 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191455#comment-14191455
 ] 

Gopal V commented on TEZ-1725:
--

The error bar seems to be within a percent, even on a 10 TB query.

+1 - LGTM.

 Fix nanosecond to millis conversion in TezMxBeanResourceCalculator
 --

 Key: TEZ-1725
 URL: https://issues.apache.org/jira/browse/TEZ-1725
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-1725.1.patch, TEZ-1725.2.patch, TEZ-1725.3.patch








[jira] [Created] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

2014-11-03 Thread Gopal V (JIRA)
Gopal V created TEZ-1733:


 Summary: TezMerger should sort FileChunks on decompressed size
 Key: TEZ-1733
 URL: https://issues.apache.org/jira/browse/TEZ-1733
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V


 MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
decompressed size, to cut down on CPU and IO costs.

TezMerger needs an equivalent sorted TreeSet which sorts by the size of the 
data within (the decompressed size) rather than the actual file sizes.
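A minimal sketch of the intended ordering (FileChunk here is a stand-in record, not the actual Tez class): sort the merge inputs on decompressed size, with a tie-break so distinct chunks of equal size are not dropped from the TreeSet.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class MergeOrderSketch {
    static final class FileChunk {
        final String name;
        final long compressedLen;  // on-disk file size
        final long rawLen;         // decompressed size of the data within
        FileChunk(String name, long compressedLen, long rawLen) {
            this.name = name;
            this.compressedLen = compressedLen;
            this.rawLen = rawLen;
        }
    }

    public static void main(String[] args) {
        // Order by rawLen; tie-break on name so the TreeSet never treats two
        // distinct chunks of equal size as duplicates.
        TreeSet<FileChunk> chunks = new TreeSet<>(
            Comparator.comparingLong((FileChunk c) -> c.rawLen)
                      .thenComparing(c -> c.name));

        chunks.add(new FileChunk("a", 100, 900));
        chunks.add(new FileChunk("b", 300, 400)); // bigger file, less data inside

        // The merge should pick up the smallest decompressed chunk first.
        System.out.println(chunks.first().name);  // "b"
    }
}
```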






[jira] [Updated] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

2014-11-03 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1733:
-
Attachment: TEZ-1733.1.patch

 TezMerger should sort FileChunks on decompressed size
 -

 Key: TEZ-1733
 URL: https://issues.apache.org/jira/browse/TEZ-1733
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
 Attachments: TEZ-1733.1.patch


  MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
 decompressed size, to cut down on CPU and IO costs.
 TezMerger needs an equivalent sorted TreeSet which sorts by the size of the 
 data within (the decompressed size) rather than the actual file sizes.





[jira] [Updated] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

2014-11-03 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1733:
-
Priority: Critical  (was: Major)

 TezMerger should sort FileChunks on decompressed size
 -

 Key: TEZ-1733
 URL: https://issues.apache.org/jira/browse/TEZ-1733
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
Priority: Critical
 Attachments: TEZ-1733.1.patch


  MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
 decompressed size, to cut down on CPU and IO costs.
 TezMerger needs an equivalent sorted TreeSet which sorts by the size of the 
 data within (the decompressed size) rather than the actual file sizes.





[jira] [Updated] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

2014-11-03 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1733:
-
Target Version/s: 0.5.2  (was: 0.6.0)

 TezMerger should sort FileChunks on decompressed size
 -

 Key: TEZ-1733
 URL: https://issues.apache.org/jira/browse/TEZ-1733
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Gopal V
 Attachments: TEZ-1733.1.patch


  MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
 decompressed size, to cut down on CPU and IO costs.
 TezMerger needs an equivalent sorted TreeSet which sorts by the size of the 
 data within (the decompressed size) rather than the actual file sizes.




