Thanks, Rajesh. I've also sent a mail to the Pig user mailing list and am waiting for their reply. I'll follow up here so that others know about the problem.

On Tue, Jul 7, 2015 at 2:12 PM, Rajesh Balamohan <[email protected]> wrote:

> Hi Sachin,
>
> That was just a temporary workaround to ensure that was the issue.
> Ideally user does not need to set this parameter. Real issue is why
> auto-reducer is set to false in certain vertices in pig-tez. Will wait for
> Pig folks to chime in.
>
> For doc/tutorial related, you can start off with the following
> - http://tez.apache.org/talks.html
> - couple of youtube videos are available from hadoop summits and meetups.
> - http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing/
> (this is pretty old)
> - Pig on tez
> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-data-processing-with-big-data
>
> ~Rajesh.B
>
> On Tue, Jul 7, 2015 at 1:54 PM, Sachin Sabbarwal <[email protected]> wrote:
>
>> Hi Rajesh
>> Thanks for your response. *This seems to be working for me.*
>> By setting pig.exec.reducers.max to 10 i am able to complete my run in
>> under 4 mins.(Initially it was running in 14-15 mins).
>> I'm new to pig/tez/hadoop world. Do you write any blogs about
>> pig/tez/hadoop etc? Can you suggest any tutorials/links to read about tez?
>> I need to understand concepts like scope, DAG, parallelism etc. I just
>> have a very basic understanding of tez. If i understand all these concepts
>> i'll be able to tune my job.
>>
>> Thanks
>>
>>
>> On Tue, Jul 7, 2015 at 1:23 PM, Rajesh Balamohan <[email protected]> wrote:
>>
>>> Forgot to add the following. Ideally auto-reduce implementation should
>>> have kicked-in and on need basis, it should have decreased the number of
>>> reducers needed. However, for the vertices of concern (scope-2037 &
>>> scope-2162), auto-reducer has been turned off in configuration by Pig and
>>> for the rest of the vertices it is turned on.
>>>
>>> Pig folks would be able to help in terms of providing details on why
>>> auto-reduce parallelism is turned off in certain vertices.
>>>
>>> 2015-07-06 14:11:35,109 INFO [AsyncDispatcher event handler]
>>> impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for
>>> vertex_1436152736518_0210_1_28 *[scope-2037]*
>>> 2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler]
>>> vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings
>>> minFrac:0.25 maxFrac:0.75 *auto:false* desiredTaskIput:104857600
>>> minTasks:1
>>> 2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler]
>>> impl.VertexImpl: Creating 999 for vertex: vertex_1436152736518_0210_1_28
>>> [scope-2037]
>>> ....
>>>
>>> 2015-07-06 14:11:35,245 INFO [AsyncDispatcher event handler]
>>> impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for
>>> vertex_1436152736518_0210_1_39 *[scope-2162]*
>>> 2015-07-06 14:11:35,257 INFO [AsyncDispatcher event handler]
>>> vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings
>>> minFrac:0.25 maxFrac:0.75 *auto:false* desiredTaskIput:104857600
>>> minTasks:1
>>> 2015-07-06 14:11:35,257 INFO [AsyncDispatcher event handler]
>>> impl.VertexImpl: Creating 999 for vertex: vertex_1436152736518_0210_1_39
>>> [scope-2162]
>>> ....
>>>
>>> 2015-07-06 14:11:35,417 INFO [AsyncDispatcher event handler]
>>> impl.VertexImpl: Setting user vertex manager plugin:
>>> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager on vertex:
>>> *scope-2185*
>>> 2015-07-06 14:11:35,419 INFO [AsyncDispatcher event handler]
>>> vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings
>>> minFrac:0.25 maxFrac:0.75 *auto:true* desiredTaskIput:104857600
>>> minTasks:1
>>> ...
>>>
>>>
>>> On Tue, Jul 7, 2015 at 12:47 PM, Rajesh Balamohan <[email protected]> wrote:
>>>
>>>>
>>>> Attaching the DAG and the swimlane for the job.
>>>>
>>>> scope-2052 which had to give the data to other vertices slowed down (~
>>>> 150-180 seconds) due to multiple spills and NumberFormatExceptions in
>>>> data.
>>>> You might want to try setting
>>>> "tez.task.scale.memory.additional-reservation.fraction.max='PARTITIONED_UNSORTED_OUTPUT:12,UNSORTED_INPUT:1,UNSORTED_OUTPUT:12,SORTED_OUTPUT:12,SORTED_MERGED_INPUT:1,PROCESSOR:1,OTHER:1'"
>>>> to allocate more memory for unordered outputs.
>>>> Following are the details for this scope.
>>>> - attempt_1436152736518_0210_1_31_000000_0,
>>>> PigLatin:dmwith1tapin.pig-0_scope-0, VertexName: scope-2052,
>>>> VertexParallelism: 1,
>>>> TaskAttemptID:attempt_1436152736518_0210_1_31_000000_0,
>>>> - numInputs=1, numOutputs=4, JVM.maxFree=734527488
>>>> - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor:
>>>> Informing: INPUT, scope-546, org.apache.tez.mapreduce.input.MRInput:
>>>> requested=0, allocated=0
>>>> - Small allocation of ~7 MB allocation to unordered output lead to
>>>> multiple spills.
>>>> - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor:
>>>> Informing: OUTPUTPUT_RECORDS, scope-2117,
>>>> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:
>>>> requested=268435456, allocated=222303401
>>>> - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor:
>>>> Informing: OUTPUT, scope-2251,
>>>> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:
>>>> requested=268435456, allocated=222303401
>>>> - 2015-07-06 14:11:40,048 INFO [TezChild] resources.MemoryDistributor:
>>>> Informing: OUTPUT, scope-2063,
>>>> org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput:
>>>> requested=104857600, allocated=7236438
>>>> - 2015-07-06 14:11:40,048 INFO [TezChild] resources.MemoryDistributor:
>>>> Informing: OUTPUT, scope-2068,
>>>> org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput:
>>>> requested=104857600, allocated=7236438
>>>> - Too many number of records had issues in NumberFormatException
>>>> leading to large amount of logs.
>>>> This dragged the runtime of this task to around
>>>> e.g "mapReduceLayer.PigHadoopLogger:
>>>> java.lang.Class(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to
>>>> interpret value in field being converted to long, caught
>>>> NumberFormatException <empty String> field discarded"
>>>>
>>>>
>>>> - scope-2037 and scope-2162 had set the vertex parallelism to "999"
>>>> affecting subsequent execution.
>>>> - VertexName: scope-2037, VertexParallelism: 999 vertex:
>>>> vertex_1436152736518_0210_1_28 finished in *410* seconds. Tasks
>>>> themselves were small, but due to large number of tasks that had to be
>>>> executed in small containers (pretty much used the same container to
>>>> execute this) it took time.
>>>> - VertexName: scope-2162, VertexParallelism: 999 vertex:
>>>> vertex_1436152736518_0210_1_39 finished in *697* seconds. Similar
>>>> observation as previous vertex.
>>>> *Above 2 vertices have caused the entire job to slow down.*
>>>>
>>>> "999" is set as the reducer parallelism at compile time by Pig. This is
>>>> not for the input. I am not sure how pig sets the parallelism at compile
>>>> time. You can possibly try setting "pig.exec.reducers.max=50" in your case
>>>> and give it a try. Pig folks would be in a better position to explain that.
>>>>
>>>>
>>>> On Tue, Jul 7, 2015 at 11:22 AM, Sachin Sabbarwal <[email protected]> wrote:
>>>>
>>>>>
>>>>> logs.gz
>>>>> <https://drive.google.com/file/d/0B-RFcYxUIHzzUVJpRzVDZXB5TUk/view?usp=drive_web>
>>>>> Hi Rajesh
>>>>>
>>>>> PFA the gziped logs.
>>>>> FYI It's a single file, when you'll gunzip it, it'll be around 1.5gb
>>>>> in size.
>>>>> One more thing which you might find useful:
>>>>>
>>>>> In the dmOutputTez file i could see following line, which suggests
>>>>> that TEZ created a total of 7660 tasks. This is surprising as my data is
>>>>> only few mbs(10-15 mb max). How is this number of tasks decided? is there
>>>>> any property to tune it?
>>>>>
>>>>> 2015-07-07 05:37:02,647 [Timer-0] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG
>>>>> Status: status=RUNNING, progress=TotalTasks: 7660 Succeeded: 0
>>>>> Running: 0 Failed: 0 Killed: 0, diagnostics=
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Jul 6, 2015 at 8:34 PM, Rajesh Balamohan <[email protected]> wrote:
>>>>>
>>>>>> yarn logs -applicationId application_1436152736518_0210
>>>>>>
>>>>>> You can possibly send the output to a log file, gzip it and post it.
>>>>>>
>>>>>> ~Rajesh.B
>>>>>>
>>>>>> On Mon, Jul 6, 2015 at 8:12 PM, Sachin Sabbarwal <[email protected]> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>> Thanks for reply. My tez-site.xml contains only following:
>>>>>>>
>>>>>>> <configuration>
>>>>>>> <property>
>>>>>>> <name>tez.lib.uris</name>
>>>>>>> <value>${fs.defaultFS}/apps/tez-0.5/tez-0.5.3.tar.gz,
>>>>>>> ${fs.defaultFS}/apps/tez-0.5/*,${fs.defaultFS}/apps/tez-0.5/lib/*</value>
>>>>>>> </property>
>>>>>>> </configuration>
>>>>>>>
>>>>>>> PFA the application logs. Here is the version information:
>>>>>>> 1. Hadoop version: Hadoop 2.5.0-cdh5.3.1
>>>>>>> 2. Pig: Apache Pig version 0.14.0 (r1640057)
>>>>>>> 3. Tez: 0.5.3
>>>>>>>
>>>>>>> Lemme know if anything else is needed.
>>>>>>>
>>>>>>> Thanks in advance
>>>>>>>
>>>>>>> On Mon, Jul 6, 2015 at 7:07 PM, Rajesh Balamohan <[email protected]> wrote:
>>>>>>>
>>>>>>>> Can you post the application logs, tez-site.xml and also the
>>>>>>>> version details?
>>>>>>>>
>>>>>>>> ~Rajesh.B
>>>>>>>>
>>>>>>>> On Mon, Jul 6, 2015 at 6:38 PM, Sachin Sabbarwal <[email protected]> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>> From: Sachin Sabbarwal <[email protected]>
>>>>>>>>> Date: Mon, Jul 6, 2015 at 5:34 PM
>>>>>>>>> Subject: Same pig script running slower with Tez as compared with
>>>>>>>>> run in Mapred mode
>>>>>>>>> To: [email protected]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello Guys
>>>>>>>>> Trying Apache Tez.
>>>>>>>>> I've setup to use pig in TEZ mode.
>>>>>>>>> I'm running a pig script against i) no data and ii) with some data.
>>>>>>>>> In case i) when i run with pig using TEZ mode my pig scripts
>>>>>>>>> completes run in ~40secs. Whereas when i run case i) with mapred it
>>>>>>>>> takes around 7-8 mins.
>>>>>>>>> in case ii) when run with pig using TEZ, same pig script takes
>>>>>>>>> around 14-15 mins but with mapred it takes around 10 mins.
>>>>>>>>> When i'm running same pig script with production data(which is
>>>>>>>>> much more than the data i used here to run case i) and (ii) ) the job
>>>>>>>>> takes hours to complete.
>>>>>>>>> Hence I'm trying tez to run my pig job in a faster mode. I'm not
>>>>>>>>> really sure what i might be missing here. Please help, ask for any
>>>>>>>>> further info if required.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> --
>>>>>>>>> Sachin Sabbarwal
>>>>>>>>> Linkedin:
>>>>>>>>> https://www.linkedin.com/profile?viewProfile=&key=95777265
>>>>>>>>> Facebook: facebook.com/sachinsabbarwal
>>>>>>>>> Quora: http://www.quora.com/Sachin-Sabbarwal
>>>>>>>>> Blog: http://sachinsabbarwal.tumblr.com/

--
Sachin Sabbarwal
Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265
Facebook: facebook.com/sachinsabbarwal
Quora: http://www.quora.com/Sachin-Sabbarwal
Blog: http://sachinsabbarwal.tumblr.com/
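
Note for readers of the archive: the diagnosis in this thread rests on spotting which vertices logged auto:false in their ShuffleVertexManager settings. In a 1.5 GB application log that is tedious to do by eye, so here is a minimal Python sketch of how the check might be scripted. It is only an illustration: the function name and regular expressions are hypothetical, the sample lines are the excerpts quoted in this thread with the mail line wraps and emphasis asterisks removed, and it assumes the settings line for a vertex is the next ShuffleVertexManager line after that vertex's "Setting ..." line, which may not hold in every log.

```python
import re

# Matches the VertexImpl line that names the vertex a manager is being set on.
VERTEX_RE = re.compile(
    r"Setting (?:vertexManager|user vertex manager plugin).*?(scope-\d+)"
)
# Matches the ShuffleVertexManager settings line and captures the auto flag.
AUTO_RE = re.compile(r"Shuffle Vertex Manager: settings.*?\bauto:(true|false)")


def auto_parallelism_by_vertex(lines):
    """Return {vertex name: bool} for the 'auto' flag logged per vertex.

    Assumption (not guaranteed by Tez): the settings line that belongs to a
    vertex is the first settings line after its 'Setting ...' line.
    """
    flags = {}
    pending = None  # vertex whose settings line we are still waiting for
    for line in lines:
        vertex = VERTEX_RE.search(line)
        if vertex:
            pending = vertex.group(1)
            continue
        auto = AUTO_RE.search(line)
        if auto and pending is not None:
            flags[pending] = auto.group(1) == "true"
            pending = None
    return flags


# Log excerpts from this thread, unwrapped onto single lines.
SAMPLE_LOG = [
    "2015-07-06 14:11:35,109 INFO [AsyncDispatcher event handler] impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for vertex_1436152736518_0210_1_28 [scope-2037]",
    "2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler] vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings minFrac:0.25 maxFrac:0.75 auto:false desiredTaskIput:104857600 minTasks:1",
    "2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler] impl.VertexImpl: Creating 999 for vertex: vertex_1436152736518_0210_1_28 [scope-2037]",
    "2015-07-06 14:11:35,245 INFO [AsyncDispatcher event handler] impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for vertex_1436152736518_0210_1_39 [scope-2162]",
    "2015-07-06 14:11:35,257 INFO [AsyncDispatcher event handler] vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings minFrac:0.25 maxFrac:0.75 auto:false desiredTaskIput:104857600 minTasks:1",
    "2015-07-06 14:11:35,417 INFO [AsyncDispatcher event handler] impl.VertexImpl: Setting user vertex manager plugin: org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager on vertex: scope-2185",
    "2015-07-06 14:11:35,419 INFO [AsyncDispatcher event handler] vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings minFrac:0.25 maxFrac:0.75 auto:true desiredTaskIput:104857600 minTasks:1",
]

if __name__ == "__main__":
    for name, enabled in auto_parallelism_by_vertex(SAMPLE_LOG).items():
        print(name, "auto parallelism on" if enabled else "auto parallelism OFF")
```

To run it against a real log, the output of the "yarn logs -applicationId ..." command mentioned in the thread could be saved to a file and the open file object passed to the function in place of SAMPLE_LOG, since it only needs an iterable of lines.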
