Anyone please?
2011/4/29 Renato Marroquín Mogrovejo <[email protected]>:
> In spite of the fact that my execution plan says that only one
> MapReduce job will be used, in my webUI there are two MR jobs for the
> Pig task. I am probably missing something here in the middle, because
> replicated joins should only use one MR job, right?
> And another thing I find weird: I tried executing the FR join again
> and got a JavaHeapSpace problem in its second job, whereas before I
> got an error saying something like Pig was expecting X bytes but was
> getting X+Y bytes. I haven't been able to reproduce that error; it
> probably has something to do with my env at some point in time.
> I thought that error of Pig expecting X bytes and getting more than
> expected had something to do with Pig seeing about a 4x expansion when
> loading data from disk into memory; that is why I was asking how this
> check is done (available Java heap space > 4x file size?) or
> something like that.
> Thanks again.
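The check Renato asks about can be sketched as a back-of-the-envelope calculation: data commonly occupies several times its on-disk size once deserialized into Java objects, with 4x being the rule of thumb mentioned in the thread. The helper below is an illustrative sketch of that heuristic only, not Pig's actual accounting; the function name, the 4x factor, and the 200 MB figure (the `-Xmx200m` default child-task heap of Hadoop 0.20-era clusters) are assumptions for the example.

```python
# Rough feasibility check for the replicated side of a fragment-replicated
# join: the relation held in each mapper's memory can expand ~4x over its
# on-disk size once deserialized into Java objects. The 4x factor is a
# rule of thumb from the thread, not Pig's real bookkeeping.

EXPANSION_FACTOR = 4  # assumed disk-to-heap expansion

def fits_in_heap(file_size_bytes, heap_bytes, expansion=EXPANSION_FACTOR):
    """Return True if the replicated relation should fit in the task heap."""
    return file_size_bytes * expansion < heap_bytes

MB = 1024 * 1024
# Renato's inputs against an assumed 200 MB child-task heap:
print(fits_in_heap(32 * MB, 200 * MB))  # 32 MB * 4 = 128 MB needed -> True
print(fits_in_heap(77 * MB, 200 * MB))  # 77 MB * 4 = 308 MB needed -> False
```

By this estimate the 32 MB relation fits comfortably, while the 77 MB one would not, which lines up with the heap errors discussed below.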
>
> #-----------------------------------------------
> # Logical Plan:
> #-----------------------------------------------
> Store 1-86 Schema: {proy_sR::sr_cde_sk: int,proy_cD::cd_dem_sk: int} Type: Unknown
> |
> |---LOJoin 1-25 Schema: {proy_sR::sr_cde_sk: int,proy_cD::cd_dem_sk: int} Type: bag
> |   |
> |   Project 1-23 Projections: [0] Overloaded: false FieldSchema: sr_cde_sk: int Type: int
> |   Input: ForEach 1-18
> |   |
> |   Project 1-24 Projections: [0] Overloaded: false FieldSchema: cd_dem_sk: int Type: int
> |   Input: ForEach 1-22
> |
> |---ForEach 1-18 Schema: {sr_cde_sk: int} Type: bag
> |   |   |
> |   |   Project 1-17 Projections: [0] Overloaded: false FieldSchema: sr_cde_sk: int Type: int
> |   |   Input: ForEach 1-66
> |   |
> |   |---ForEach 1-66 Schema: {sr_cde_sk: int} Type: bag
> |   |   |
> |   |   Cast 1-35 FieldSchema: sr_cde_sk: int Type: int
> |   |   |
> |   |   |---Project 1-34 Projections: [0] Overloaded: false FieldSchema: sr_cde_sk: bytearray Type: bytearray
> |   |       Input: Load 1-13
> |   |
> |   |---Load 1-13 Schema: {sr_cde_sk: bytearray} Type: bag
> |
> |---ForEach 1-22 Schema: {cd_dem_sk: int} Type: bag
> |   |
> |   Project 1-21 Projections: [0] Overloaded: false FieldSchema: cd_dem_sk: int Type: int
> |   Input: ForEach 1-85
> |
> |---ForEach 1-85 Schema: {cd_dem_sk: int} Type: bag
> |   |
> |   Cast 1-68 FieldSchema: cd_demo_sk: int Type: int
> |   |
> |   |---Project 1-67 Projections: [0] Overloaded: false FieldSchema: cd_dem_sk: bytearray Type: bytearray
> |       Input: Load 1-14
> |
> |---Load 1-14 Schema: {cd_dem_sk: bytearray} Type: bag
>
> #-----------------------------------------------
> # Physical Plan:
> #-----------------------------------------------
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-107
> |
> |---FRJoin[tuple] - 1-101
> |   |
> |   Project[int][0] - 1-99
> |   |
> |   Project[int][0] - 1-100
> |
> |---New For Each(false)[bag] - 1-92
> |   |   |
> |   |   Project[int][0] - 1-91
> |   |
> |   |---New For Each(false)[bag] - 1-90
> |   |   |
> |   |   Cast[int] - 1-89
> |   |   |
> |   |   |---Project[bytearray][0] - 1-88
> |   |
> |   |---Load(hdfs://berlin.labbio:54310/user/hadoop/pigData/sr.dat:PigStorage('|')) - 1-87
> |
> |---New For Each(false)[bag] - 1-98
> |   |
> |   Project[int][0] - 1-97
> |
> |---New For Each(false)[bag] - 1-96
> |   |
> |   Cast[int] - 1-95
> |   |
> |   |---Project[bytearray][0] - 1-94
> |
> |---Load(hdfs://berlin.labbio:54310/user/hadoop/pigData/cd.dat:PigStorage('|')) - 1-93
>
> 2011-04-29 23:04:54,727 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 2
> 2011-04-29 23:04:54,727 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node 1-109
> Map Plan
> Store(hdfs://berlin.labbio:54310/tmp/temp1815576246/tmp379673501:org.apache.pig.builtin.BinStorage) - 1-110
> |
> |---New For Each(false)[bag] - 1-98
> |   |
> |   Project[int][0] - 1-97
> |
> |---New For Each(false)[bag] - 1-96
> |   |
> |   Cast[int] - 1-95
> |   |
> |   |---Project[bytearray][0] - 1-94
> |
> |---Load(hdfs://berlin.labbio:54310/user/hadoop/pigData/cd.dat:PigStorage('|')) - 1-93--------
> Global sort: false
> ----------------
>
> MapReduce node 1-108
> Map Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-107
> |
> |---FRJoin[tuple] - 1-101
> |   |
> |   Project[int][0] - 1-99
> |   |
> |   Project[int][0] - 1-100
> |
> |---New For Each(false)[bag] - 1-92
> |   |
> |   Project[int][0] - 1-91
> |
> |---New For Each(false)[bag] - 1-90
> |   |
> |   Cast[int] - 1-89
> |   |
> |   |---Project[bytearray][0] - 1-88
> |
> |---Load(hdfs://berlin.labbio:54310/user/hadoop/pigData/sr.dat:PigStorage('|')) - 1-87--------
> Global sort: false
> ----------------
>
>
> 2011/4/28 Daniel Dai <[email protected]>:
>> There should be only one job. Thanks, Thejas, for pointing that out.
>>
>> Daniel
>>
>>
>> -----Original Message-----
>> From: Daniel Dai
>> Sent: Wednesday, April 27, 2011 7:18 PM
>> To: [email protected]
>> Cc: Renato Marroquín Mogrovejo ; [email protected]
>> Subject: Re: Error Executing a Fragment Replicated Join
>>
>> Do you see the failure in the first job (sampling) or the second job? Do
>> you see the exception right after the job kicks off?
>>
>> If the replicated side is too large, you will probably see a "Java heap
>> exception" rather than a job setup exception. It looks more like an
>> environment issue. Check whether you can run a regular join, or whether
>> you have another hadoop config file in your classpath.
>>
>> Daniel
>>
>>
>> On 04/27/2011 05:26 PM, Renato Marroquín Mogrovejo wrote:
>>>
>>> Now that the Apache server is ok with me again, I can write back to
>>> the list. I wrote to the Apache Infra team and they told me to send
>>> messages just in plain text, disabling any HTML within the message
>>> (not that I ever sent HTML, but oh well); I guess that worked :)
>>> Well, first, thanks for answering.
>>> I am using Pig 0.7, and my Pig script is as follows:
>>>
>>> {code}
>>> sr = LOAD 'pigData/sr.dat' USING PigStorage('|') AS
>>>     (sr_ret_date_sk:int, sr_ret_tim_sk:int, sr_ite_sk:int, sr_cus_sk:int,
>>>      sr_cde_sk:int, sr_hde_sk:int, sr_add_sk:int, sr_sto_sk:int,
>>>      sr_rea_sk:int, sr_tic_num:int, sr_ret_qua:int, sr_ret_amt:double,
>>>      sr_ret_tax:double, sr_ret_amt_inc_tax:double, sr_fee:double,
>>>      sr_ret_sh_cst:double, sr_ref_csh:double, sr_rev_cha:double,
>>>      sr_sto_cred:double, sr_net_lss:double);
>>>
>>> cd = LOAD 'pigData/cd.dat' USING PigStorage('|') AS
>>>     (cd_dem_sk:int, cd_gnd:chararray, cd_mrt_sts:chararray,
>>>      cd_edt_sts:chararray, cd_pur_est:int, cd_cred_rtg:chararray,
>>>      cd_dep_cnt:int, cd_dep_emp_cnt:int, cd_dep_col_count:int);
>>>
>>> proy_sR = FOREACH sr GENERATE sr_cde_sk;
>>> proy_cD = FOREACH cd GENERATE cd_dem_sk;
>>>
>>> join_sR_cD = JOIN proy_sR BY sr_cde_sk, proy_cD BY cd_dem_sk USING 'replicated';
>>>
>>> STORE join_sR_cD INTO 'queryResults/query.11.sr.cd.5.1' USING PigStorage('|');
>>> {/code}
>>>
>>> "cd" is the 77MB relation and "sr" is the 32MB one. I had some other
>>> similar queries in which the 32MB relation was being joined with
>>> smaller relations (<10MB), giving the same problem; I modified those
>>> so the <10MB relations would be the ones being replicated.
>>> Thanks again.
>>>
>>> Renato M.
>>>
>>> 2011/4/27 Thejas M Nair <[email protected]>:
>>>>
>>>> The exception indicates that the hadoop job creation failed. Are you
>>>> able to run simple MR queries using each of the inputs?
>>>> It could also be caused by some problem Pig is having with copying
>>>> the file being replicated to the distributed cache.
>>>> -Thejas
>>>>
>>>>
>>>> On 4/27/11 3:42 PM, "Renato Marroquín Mogrovejo"
>>>> <[email protected]> wrote:
>>>>
>>>> Does anybody have any suggestions? Please???
>>>> Thanks again.
>>>>
>>>> Renato M.
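One detail worth checking in the script above: the Pig documentation for replicated joins says the large relation must come first and the smaller relation(s) last, because the last-listed relation is the one loaded into every mapper's memory. As written, `proy_cD` (the 77MB `cd`) is listed last and is therefore the replicated side, even though the intent was to replicate the 32MB `sr`. A small sketch of ordering inputs so the smallest lands in the replicated position (the helper and its names are illustrative; the sizes come from the thread):

```python
# In a Pig fragment-replicated join, the *last* relation listed in the
# JOIN statement is shipped to every mapper and held in memory, so the
# inputs should be ordered largest-first. This helper does exactly that.

def replicated_join_order(input_sizes):
    """Given {relation_name: size_in_bytes}, return the names ordered
    largest-first, so the smallest ends up last (the replicated side)."""
    return sorted(input_sizes, key=input_sizes.get, reverse=True)

MB = 1024 * 1024
order = replicated_join_order({"proy_sR": 32 * MB, "proy_cD": 77 * MB})
print(order)  # ['proy_cD', 'proy_sR'] -> proy_sR (32 MB) is replicated
```

Under that ordering, the join in the script would read `join_sR_cD = JOIN proy_cD BY cd_dem_sk, proy_sR BY sr_cde_sk USING 'replicated';`, so the 32MB `sr` side is the one held in memory.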
>>>>
>>>> 2011/4/26 Alan Gates <[email protected]>
>>>>>
>>>>> Sent for Renato, since Apache's mail system has decided it doesn't
>>>>> like him.
>>>>>
>>>>> Alan.
>>>>>
>>>>> I am getting an error while trying to execute a simple fragment
>>>>> replicated join on two files (one of 77MB and the other one of 32MB).
>>>>> I am using the 32MB file as the small one to be replicated, but I
>>>>> keep getting this error. Does anybody know how this check is done? I
>>>>> mean, how does Pig determine that the small file is not small enough,
>>>>> and how could I modify this?
>>>>> I am executing these on four PCs with 3GB of RAM running Debian Lenny.
>>>>> Thanks in advance.
>>>>>
>>>>>
>>>>> Renato M.
>>>>>
>>>>> Pig Stack Trace
>>>>> ---------------
>>>>> ERROR 2017: Internal error creating job configuration.
>>>>>
>>>>> org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
>>>>>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:332)
>>>>>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
>>>>>     at org.apache.pig.PigServer.execute(PigServer.java:828)
>>>>>     at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>     at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>     at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>     at org.apache.pig.Main.main(Main.java:391)
>>>>> Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:624)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:246)
>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.
>>>>>
>>>>
>>>> --
>>>>
>>
>>
>
