If you know the vertex/task, you can enable profiling on just those. Please check the "Profiling in Tez" section in https://cwiki.apache.org/confluence/display/TEZ/How+to+Diagnose+Tez+App. Note that it describes profiling with YourKit, which dumps the snapshot at the end of the run. I haven't tried it with hprof options.
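A rough sketch of what per-task profiling options could look like with hprof, untested. The property names tez.task-specific.launch.cmd-opts.list and tez.task-specific.launch.cmd-opts are as I recall them from TezConfiguration, so please verify them against your Tez version. The vertex name MyVertex and the task indices are hypothetical, and whether <LOG_DIR> and the __VERTEX_NAME__/__TASK_INDEX__ placeholders get expanded in this property is an assumption. The snippet only prints the two -D options so they can be pasted into the job command line.

```shell
# Hypothetical vertex name and task indices; substitute the ones from your DAG.
TASKS='MyVertex[0,1]'

# hprof agent options taken from the original message. The __VERTEX_NAME__ and
# __TASK_INDEX__ placeholders are assumed to be replaced by Tez at task launch.
PROFILE_OPTS='-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=<LOG_DIR>/profile-__VERTEX_NAME__-__TASK_INDEX__.out'

# Assumed property names; check TezConfiguration for your Tez version.
printf '%s\n' \
  "-Dtez.task-specific.launch.cmd-opts.list=${TASKS}" \
  "-Dtez.task-specific.launch.cmd-opts=${PROFILE_OPTS}"
```

Limiting the agent to one or two tasks should also keep the hprof overhead off the rest of the job.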
~Rajesh.B

On Thu, Jan 19, 2017 at 5:39 AM, Piyush Narang <pnar...@twitter.com> wrote:

> hi folks,
>
> I had a couple of Cascading3 on Tez jobs that seemed to be running slower
> on Tez as compared to Hadoop. Wanted to try and get some hprof profiles to
> see what the jobs are spending time on, so thought I'd try and get a hprof
> profile. I tried running the job with:
>
> -Dmapreduce.task.profile=true \
> -Dtez.task.launch.cluster-default.cmd-opts="-XX:+UseSerialGC
> -Djava.net.preferIPv4Stack=true -XX:ReservedCodeCacheSize=128M
> -XX:MaxMetaspaceSize=256M -XX:CompressedClassSpaceSize=256M
> -XX:CICompilerCount=2 -XX:HeapDumpPath=<LOG_DIR>/heapdump-@taskid@.hprof
> -XX:ErrorFile=<LOG_DIR>/hs_err_pid-@taskid@.log
> -agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,
> verbose=n,file=<LOG_DIR>/profile-@taskid@.out"
>
> Now when I kick off the Tez job, it seems to spend very long (upwards of
> an hour) stuck at 0%. The tasks don't seem to proceed beyond this:
>
> 2017-01-18 23:53:26,261 [INFO] [TezChild] |tez.FlowProcessor|: flow node id:
> E08C07BFB10141D8B6D7211E5AF172E4, all 1 inputs ready in: 00:00:00.002
>
> Tried capturing some jstacks and the top of the stack seems to hold some hprof
> related frames.
>
> Has anyone been able to profile their Tez jobs with hprof? Are there any other
> settings I'm missing?
>
> Same set of options(replace tez.task.launch.cluster-default.cmd-opts with
> mapreduce.task.profile.params) seem to work fine in case of Hadoop. I end up
> getting the hprof profiles there.
>
> Thanks,
>
> --
> - Piyush