This looks more like a Hadoop issue than a Pig one. Check the DFS web UI to see whether all of your data nodes are up.
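If you'd rather check from the command line, something along these lines should work (assuming a Hadoop 0.20/1.x-style install with the hadoop command on the PATH; the output path is taken from your stack trace):

    # list live/dead datanodes and remaining capacity
    hadoop dfsadmin -report

    # look for missing or under-replicated blocks under the job's output dir
    hadoop fsck /user/root/out -files -blocks

If a datanode is down or the remaining DFS capacity is very low, the namenode can fail to place new block replicas, which typically shows up as the error you're seeing.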
On Mon, Jan 9, 2012 at 4:18 PM, Michael Lok <[email protected]> wrote:
> Hi folks,
>
> Not sure if this is related to Pig or Hadoop in general; but I'm
> posting this here since I'm running Pig scripts :)
>
> Anyway, I've been trying to perform a CROSS join between 2 files which
> results in ~1 billion records. My Hadoop cluster has 4 data nodes.
> The namenode also serves as one of the data nodes as well (not
> recommended, but haven't had time to reconfigure this yet :P). After
> executing the Pig script, it threw the following exception at around
> 80+%:
>
> java.io.IOException: org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet: /user/root/out/_temporary/_attempt_201201091651_0001_r_000001_3/part-r-00001
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1517)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
>         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:399)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> Pig script shown below:
>
> ============================================================
> set job.name 'vac cross 2';
> set default_parallel 10;
>
> register lib/*.jar;
>
> define DIST com.pig.udf.Distance();
>
> js = load 'js.csv' using PigStorage(',') as (ic:chararray,
> jsstate:chararray);
>
> vac = load 'vac.csv' using PigStorage(',') as (id:chararray,
> vacstate:chararray);
>
> cx = cross js, vac;
>
> d = foreach cx generate ic, jsstate, id, vacstate, DIST(jsstate, vacstate);
>
> store d into 'out' using PigStorage(',');
> ============================================================
>
> Any help is greatly appreciated.
>
>
> Thanks!
>
