I'm trying to figure out why the following Pig script takes forever to run.

logData = FOREACH flattenedLogData GENERATE opname, host, nanoTime, depth;

opNameGroupAll = GROUP logData by opname;
opNameGroupPerHost = GROUP logData by (opname,host);

overviewOpsAll = FOREACH opNameGroupAll GENERATE
    '$reportId', 'ALL' as scope,
    group as opname,
    COUNT(logData.opname) as cnt,
    AVG(logData.depth) as avgDepth,
    SUM(logData.nanoTime)/1000000 as sum,
    AVG(logData.nanoTime)/1000000 as avg,
    MAX(logData.nanoTime)/1000000 as max;

overviewOpsPerHost = FOREACH opNameGroupPerHost GENERATE
    '$reportId', group.host as scope,
    group.opname as opname,
    COUNT(logData.opname) as cnt,
    AVG(logData.depth) as avgDepth,
    SUM(logData.nanoTime)/1000000 as sum,
    AVG(logData.nanoTime)/1000000 as avg,
    MAX(logData.nanoTime)/1000000 as max;

STORE overviewOpsAll INTO '$outputPathRootDir/overviewOpsAll' using PigStorage();
STORE overviewOpsPerHost INTO '$outputPathRootDir/overviewOpsPerHost' using PigStorage();

It usually gets to around 90% and then takes forever to finish the reduce
phase. I notice the following lines in the output logs.

2011-12-17 20:00:08,714 INFO
org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of
336737356 bytes from 1 objects. init = 175243264(171136K) used =
401178152(391775K) committed = 477233152(466048K) max = 536870912(524288K)

2011-12-17 20:00:13,015 INFO
org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate
of 354470820 bytes from 1 objects. init = 175243264(171136K) used =
397146280(387838K) committed = 536870912(524288K) max =
536870912(524288K)

2011-12-17 20:00:17,814 INFO
org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate
of 365633020 bytes from 1 objects. init = 175243264(171136K) used =
407703960(398148K) committed = 536870912(524288K) max =
536870912(524288K)

2011-12-17 20:00:22,572 INFO
org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate
of 367290876 bytes from 1 objects. init = 175243264(171136K) used =
407457224(397907K) committed = 536870912(524288K) max =
536870912(524288K)
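
To make the numbers in these entries easier to read: each spill comes from a single object, and the heap in use sits at roughly three quarters of the 512 MiB maximum. Below is a rough Python sketch for pulling those figures out of an entry. The helper name parse_spill and the regular expression are illustrative assumptions based only on the lines quoted above, not part of Pig.

```python
import re

# Hypothetical pattern, written against the log entries quoted above only.
SPILL_RE = re.compile(
    r"Spilled an estimate of (?P<spilled>\d+) bytes from (?P<objects>\d+) objects\. "
    r"init = (?P<init>\d+)\(\d+K\) used = (?P<used>\d+)\(\d+K\) "
    r"committed = (?P<committed>\d+)\(\d+K\) max = (?P<max>\d+)\(\d+K\)"
)


def parse_spill(entry):
    """Return the byte counts from one SpillableMemoryManager entry, or None."""
    # The mail client wrapped the entries, so collapse all whitespace first.
    flat = " ".join(entry.split())
    match = SPILL_RE.search(flat)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# First entry from the log excerpt, copied with its original line wrapping.
FIRST_ENTRY = """2011-12-17 20:00:08,714 INFO
org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of
336737356 bytes from 1 objects. init = 175243264(171136K) used =
401178152(391775K) committed = 477233152(466048K) max = 536870912(524288K)"""

stats = parse_spill(FIRST_ENTRY)
heap_used_fraction = stats["used"] / stats["max"]
print(f"{heap_used_fraction:.1%} of max heap in use")
print(f"{stats['spilled'] / 2**20:.0f} MiB spilled from {stats['objects']} object(s)")
```

Running this over every entry in a reducer's log would show whether the spilled size keeps growing while the object count stays at 1.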
