Re: HDFS "file" missing a part-file

Björn-Elmar Macek Mon, 01 Oct 2012 13:37:07 -0700


The script I now want to execute looks like this:

x = load 'tag_count_ts_pro_userpair' as (group:tuple(), cnt:int, times:bag{t:tuple(c:chararray)});
y = foreach x generate *, moins.daysFromStart('2011-06-01 00:00:00', times);

store y into 'test_daysFromStart';

The problem is that I no longer have the logs, due to space constraints within the cluster. But I think I can explain the important parts:

The script that created this data was a GROUP statement followed by a FOREACH calculating a COUNT on the bag mentioned above as "times"; the count is the second column, named "cnt". The results were stored via a simple "store".

The resulting Pig calculation started as expected, but stopped showing me progress at a certain percentage. A "tail -f" on the hadoop/logs dir revealed that the Hadoop calculation progressed nonetheless, although some of the tasktrackers permanently vanished during the shuffle phase with the committed/eof/mortbay exception and stopped producing any more log output. As I watched the log continuously, I could see that those work packages were handled by the remaining servers after some of them had already calculated packages of progress 1.0. Even the cleanup phase at the end was done, ALTHOUGH(!) the Pig log didn't reflect the calculations of the cluster. And since I found the file as output in HDFS, I supposed the missing Pig progress log entries were simply Pig problems. Maybe I'm wrong about that.

But I did the calculations several times and this happened during every execution.


Is there something wrong with the data or the calculations?

On Mon, 1 Oct 2012 13:01:41 -0700, Robert Molina<[email protected]> wrote:

It seems that maybe the previous Pig script didn't generate the output
data or write it correctly to HDFS. Can you provide the Pig script you
are trying to run? Also, for the original script that ran and
generated the file, can you verify whether that job had any failed tasks?

On Mon, Oct 1, 2012 at 10:31 AM, Björn-Elmar Macek  wrote:

 Hi Robert,

 the exception I see in the output of the grunt shell and in the Pig
 log respectively is:

 Backend error message
 ---------------------
 java.util.EmptyStackException
         at java.util.Stack.peek(Stack.java:102)
         at org.apache.pig.builtin.Utf8StorageConverter.consumeTuple(Utf8StorageConverter.java:182)
         at org.apache.pig.builtin.Utf8StorageConverter.bytesToTuple(Utf8StorageConverter.java:501)
         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:905)
         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:415)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
         at org.apache.hadoop.mapred.Child.main(Child.java:249)
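
 The trace above fails while converting stored text back into a tuple (bytesToTuple, consumeTuple), so one plausible cause, though not a confirmed one, is a record whose parentheses or braces do not nest properly, for example because a field value itself contains such a character. A minimal Python sketch for scanning a local copy of the data for such lines follows. The helper name find_unbalanced is hypothetical, and the check only approximates what Pig's parser does; it assumes the usual text form of tuples "(...)", bags "{...}" and maps "[...]".

```python
# Hypothetical helper, not part of Pig or Hadoop: flags lines whose
# tuple/bag/map delimiters do not nest properly.
PAIRS = {")": "(", "}": "{", "]": "["}
OPENERS = set(PAIRS.values())


def find_unbalanced(lines):
    """Return a list of (line_number, reason) for lines with bad nesting."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stack = []
        reason = None
        for ch in line:
            if ch in OPENERS:
                stack.append(ch)
            elif ch in PAIRS:
                if not stack:
                    # A closer with nothing open: the case most similar to
                    # peeking at an empty stack.
                    reason = "closing %r without an opener" % ch
                    break
                if stack[-1] != PAIRS[ch]:
                    reason = "%r closes %r" % (ch, stack[-1])
                    break
                stack.pop()
        if reason is None and stack:
            reason = "unclosed %r" % stack[-1]
        if reason:
            problems.append((number, reason))
    return problems


if __name__ == "__main__":
    import sys

    # Usage (assumed file layout): python check_records.py part-r-00000
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, reason in find_unbalanced(handle):
            print("line %d: %s" % (number, reason))
```

 Any line it reports is worth inspecting by hand. A clean run does not rule out other causes, such as a truncated or missing part file.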

 On Mon, 1 Oct 2012 10:12:22 -0700, Robert Molina  wrote:

 Hi Bjorn, 
 Can you post the exception you are getting during the map phase?

 On Mon, Oct 1, 2012 at 9:11 AM, Björn-Elmar Macek  wrote:

  Hi,

  I am unsure where to post this problem, but I think it is more
  related to Hadoop than to Pig.

  By successfully executing a Pig script I created a new file in my
  HDFS. Sadly though, I cannot use it for further processing except for
  "dump"ing and viewing the data: every data-manipulation script command,
  such as "foreach", gives exceptions during the map phase.
  Since there was no problem executing the same script on the first 100
  lines of my data (LIMIT statement), I copied it to my local fs folder.

  What I realized is that one of the files, namely part-r-000001, was
  empty and contained within the _temporary folder.

  Is there any reason for this? How can I fix this issue? Did the job
  (which created the file we are talking about) NOT run properly until
  its end, although the tasktracker worked until the very end and the
  file was created?
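
  The state described above (an empty part file sitting under _temporary) can be checked for mechanically on the local copy. The following is a minimal Python sketch, assuming the output directory has been copied to the local filesystem. The helper name inspect_output_dir is hypothetical, and the _SUCCESS check assumes a Hadoop version and configuration that writes that marker when a job commits successfully.

```python
import os


def inspect_output_dir(path):
    """Summarise the part files in a local copy of a job output directory."""
    report = {"empty_parts": [], "in_temporary": [], "has_success_marker": False}
    for root, _dirs, files in os.walk(path):
        # True when this directory is _temporary or lies somewhere below it.
        inside_temp = "_temporary" in os.path.relpath(root, path).split(os.sep)
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, path)
            if name == "_SUCCESS" and root == path:
                report["has_success_marker"] = True
            if name.startswith("part-"):
                if inside_temp:
                    # Output that was never moved to its final location.
                    report["in_temporary"].append(rel)
                if os.path.getsize(full) == 0:
                    report["empty_parts"].append(rel)
    report["empty_parts"].sort()
    report["in_temporary"].sort()
    return report


if __name__ == "__main__":
    import sys

    # Usage (assumed local path): python inspect_output.py ./tag_count_ts_pro_userpair
    for key, value in inspect_output_dir(sys.argv[1]).items():
        print(key, value)
```

  Part files left under _temporary, or a missing success marker, suggest that the job output was not fully committed, which would fit the vanished tasktrackers described elsewhere in this thread. An empty part file alone is not conclusive, since a reducer that received no records also writes one.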

  Best regards,
  Björn
