Hi Juho,

Could you download the data from the Tez UI? The DAG details page should provide a download button that downloads the data for the DAG where you see the issue.
Also, could you share the YARN application logs? What runtime numbers do you see with MR and with Tez?

~Rajesh.B

On Mon, Oct 26, 2015 at 3:51 AM, Juho Autio <[email protected]> wrote:

> Hitesh, thanks for your response!
>
> I got rid of the Full GC problem thanks to Jeff Zhang's hint. However,
> performance with Tez is still not perfect: the MR engine is faster for this
> particular job. What are the perf analyzers you're referring to? I'd like
> to try those. I do have the Tez UI available. Which Hive options are
> important to consider in combination with Tez? Is the container size the
> only option in Tez itself to tune?
>
> Cheers,
> Juho
>
> On Fri, Oct 23, 2015 at 6:41 PM, Hitesh Shah <[email protected]> wrote:
>
>> Hello Juho
>>
>> As you are probably aware, each Hive query will largely have different
>> memory requirements depending on what kind of plan it ends up executing.
>> For the most part, a common container size and general settings work well
>> for most queries.
>> In this case, the query might need additional tuning: fixing the Hive
>> query plan, correctly sizing the Tez container just for this query, or
>> tuning any other Hive knobs that may be making wrong assumptions about
>> data stats or available memory and so causing this query to run very
>> slowly.
>>
>> As a first step, it would be good if you could provide the explain
>> plan for the query, the hive-site/tez-site configs being used, and the YARN
>> application logs for the completed query. If you have the Tez UI available,
>> you can also click “Download data” on the DAG details page; the downloaded
>> data can be run through the various perf analyzers available in Tez to see
>> what the issue is.
>>
>> thanks
>> -- Hitesh
>>
>>
>> On Oct 23, 2015, at 1:08 AM, Juho Autio <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > I'm running a Hive script with tez-0.7.0.
>> > The progress is really slow, and in the container logs I'm seeing
>> > constant Full GC lines, so there doesn't seem to be any time for the
>> > JVM to actually execute anything between the GC pauses.
>> >
>> > When running the same Hive script with the mr execution engine, the job
>> > goes through normally.
>> >
>> > So there's something specific to Tez's memory usage that causes the
>> > Full GC issue.
>> >
>> > Also, with similar clusters & configuration, other Hive jobs have gone
>> > through with Tez just fine. This issue happens when I just add a little
>> > more data to be processed by the script. With a smaller workload it
>> > also goes through with the Tez engine in the expected execution time.
>> >
>> > For example, an extract from one of the container logs:
>> >
>> > application_1445328511212_0001/container_1445328511212_0001_01_000292/stdout.gz
>> >
>> > 791.208: [Full GC
>> > [PSYoungGen: 58368K->56830K(116736K)]
>> > [ParOldGen: 348914K->348909K(349184K)]
>> > 407282K->405740K(465920K)
>> > [PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs] [Times: user=5.22 sys=0.04, real=1.40 secs]
>> > Heap
>> >  PSYoungGen total 116736K, used 58000K [0x00000000f5500000, 0x0000000100000000, 0x0000000100000000)
>> >   eden space 58368K, 99% used [0x00000000f5500000,0x00000000f8da41a0,0x00000000f8e00000)
>> >   from space 58368K, 0% used [0x00000000f8e00000,0x00000000f8e00000,0x00000000fc700000)
>> >   to space 58368K, 0% used [0x00000000fc700000,0x00000000fc700000,0x0000000100000000)
>> >  ParOldGen total 349184K, used 348909K [0x00000000e0000000, 0x00000000f5500000, 0x00000000f5500000)
>> >   object space 349184K, 99% used [0x00000000e0000000,0x00000000f54bb4b0,0x00000000f5500000)
>> >  PSPermGen total 43520K, used 43413K [0x00000000d5a00000, 0x00000000d8480000, 0x00000000e0000000)
>> >   object space 43520K, 99% used [0x00000000d5a00000,0x00000000d84657a8,0x00000000d8480000)
>> >
>> > If I understand the GC log correctly, it seems like ParOldGen is full
>> > and Full GC doesn't manage to free space from there. So maybe Tez has
>> > created too many objects that can't be released. It could be a memory
>> > leak. Or maybe this is just not a big enough minimum heap for Tez in
>> > general? I could probably fix the problem by changing the configuration
>> > somehow to simply have fewer containers and thus a bigger heap size per
>> > container? Still, changing to bigger nodes doesn't seem like a solution
>> > that would eventually scale, so I would prefer to resolve this properly.
>> >
>> > Please, could you help me with how to troubleshoot & fix this issue?
>> >
>> > Cheers,
>> > Juho
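
The reading of the Full GC entry in the original message (old generation full, almost nothing reclaimed) can be checked mechanically. Below is a minimal sketch in Python, not part of Tez or Hive: the function name `summarize_full_gc` and the regular expression are illustrative assumptions, and the pattern only targets the `[Name: beforeK->afterK(capacityK)]` shape seen in the log extract quoted above.

```python
import re

# Assumed pattern for one generation entry, e.g.
# "[ParOldGen: 348914K->348909K(349184K)]". Other collectors or JVM
# versions may log in a different shape.
GEN_PATTERN = re.compile(r"\[(\w+): (\d+)K->(\d+)K\((\d+)K\)\]")


def summarize_full_gc(line):
    """Return {generation: (freed_kb, percent_used_after)} for one Full GC entry."""
    summary = {}
    for name, before, after, capacity in GEN_PATTERN.findall(line):
        before, after, capacity = int(before), int(after), int(capacity)
        freed_kb = before - after
        percent_used_after = round(100.0 * after / capacity, 1)
        summary[name] = (freed_kb, percent_used_after)
    return summary


# The Full GC entry from the container log quoted in the thread,
# joined onto one line.
sample = (
    "791.208: [Full GC "
    "[PSYoungGen: 58368K->56830K(116736K)] "
    "[ParOldGen: 348914K->348909K(349184K)] "
    "407282K->405740K(465920K) "
    "[PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs]"
)

for gen, (freed, pct) in summarize_full_gc(sample).items():
    print(f"{gen}: freed {freed}K, {pct}% used after GC")
```

For the quoted entry this reports that ParOldGen freed only 5K and stayed at roughly 99.9% of its 349184K capacity, which matches the interpretation in the message: a 1.4 second Full GC reclaimed essentially nothing from the old generation.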
