Yes, but even with E = GROUP B *ALL PARALLEL 100;* I got only one reducer (and, obviously, no space to process everything).
I tried GROUP BY on something else and it worked. Could this be some optimization issue?

On Fri, Feb 11, 2011 at 3:10 PM, Alan Gates <[email protected]> wrote:

> Possible, but it will be ignored. Anything done inside a nested foreach
> block will be executed at the parallel level of the preceding group by.
>
> Alan.
>
> On Feb 11, 2011, at 8:57 AM, Charles Gonçalves wrote:
>
>> Is it possible to use a PARALLEL statement inside a nested foreach block,
>> as in:
>>
>> 28 E = GROUP B ALL PARALLEL 100;
>> 29
>> 30 edge_breakdown = FOREACH E {
>> 31     dist_cIps = DISTINCT B.cIp *PARALLEL X*;
>> 32     dist_sIps = DISTINCT B.sIp;
>> 33     urls_ok = FILTER B BY valid(url);
>> 34     GENERATE COUNT(dist_cIps), COUNT(dist_sIps), COUNT(urls_ok.url),
>>            COUNT(B.url), SUM(B.scBytes);
>> 35 }
>>
>> I got an error:
>> ERROR 1000: Error during parsing. Encountered " "parallel" "PARALLEL "" at
>> line 36, column 36.
>> Was expecting:
>> ";" ...
>>
>> My problem is that I'm using PARALLEL in line 28 and also setting
>> 14 SET DEFAULT_PARALLEL 30;
>>
>> But even so I'm getting just one reducer!
>>
>> Is there some optimization that I can disable?
>> I already tried to play with pig.exec.reducers.bytes.per.reducer, with
>> no effect.
>> I'm processing 2 TB of data and one reducer is yielding a "no space left
>> on device" error!
>>
>> Any
>>
>> --
>> *Charles Ferreira Gonçalves*
>> http://homepages.dcc.ufmg.br/~charles/
>> UFMG - ICEx - Dcc
>> Cel.: 55 31 87741485
>> Tel.: 55 31 34741485
>> Lab.: 55 31 34095840

--
*Charles Ferreira Gonçalves*
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.: 55 31 34741485
Lab.: 55 31 34095840
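
A note on the single reducer seen in this thread: a likely explanation is that GROUP ... ALL puts every record under one key, so a single reducer receives all of the data whatever PARALLEL value is requested, while grouping by a field spreads the keys across reducers. One possible workaround, sketched below and not tested against this data, is to run the DISTINCT at the top level, where PARALLEL is honoured, and use GROUP ALL only for the final count. B, cIp, and the value 100 come from the quoted script; every other alias name is made up for illustration.

```pig
-- Sketch only: B and cIp come from the script quoted above;
-- cIps, cIps_all and cIp_count are hypothetical alias names.

-- Deduplicate at the top level, where the PARALLEL clause applies.
cIps      = FOREACH B GENERATE cIp;
dist_cIps = DISTINCT cIps PARALLEL 100;

-- Only the final count goes through GROUP ALL. COUNT is algebraic,
-- so Pig should be able to pre-aggregate it with the combiner.
cIps_all  = GROUP dist_cIps ALL;
cIp_count = FOREACH cIps_all GENERATE COUNT(dist_cIps);
```

The same pattern would apply to sIp. This yields one result relation per counter rather than a single row, so it trades the compact nested FOREACH for a heavy step that can actually run in parallel.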
