I solved the problem by setting materialize to false when getting the readable: myClassData.asReadable(false). I am still not sure why this happens, though.
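
For reference, the pattern Gabriel describes in the quoted message below (hand the side data to the DoFn's constructor, load it once in initialize, then look it up for each record) can be sketched in plain Java. This is only an illustration, not the code from the job: SideDataLookup and SideDataLookupDemo are hypothetical stand-ins rather than Crunch classes, the Iterable plays the role of whatever the ReadableData from asReadable would yield, and the sample keys and values are made up.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DoFn that needs side data; NOT a Crunch class.
class SideDataLookup {
    // Assumption: stands in for the ReadableData passed to the DoFn constructor.
    private final Iterable<Map.Entry<String, String>> sideData;
    private Map<String, String> lookup;

    SideDataLookup(Iterable<Map.Entry<String, String>> sideData) {
        this.sideData = sideData;
    }

    // Plays the role of DoFn.initialize(): called once, before any record.
    void initialize() {
        lookup = new HashMap<String, String>();
        for (Map.Entry<String, String> e : sideData) {
            lookup.put(e.getKey(), e.getValue());
        }
    }

    // Plays the role of DoFn.process(): one in-memory lookup per record.
    String process(String key) {
        if (lookup == null) {
            throw new IllegalStateException("initialize() was not called");
        }
        String value = lookup.containsKey(key) ? lookup.get(key) : "<missing>";
        return key + "=" + value;
    }
}

class SideDataLookupDemo {
    public static void main(String[] args) {
        // Made-up sample rows standing in for the contents of the PTable.
        List<Map.Entry<String, String>> rows = new ArrayList<Map.Entry<String, String>>();
        rows.add(new SimpleEntry<String, String>("p1", "book"));
        rows.add(new SimpleEntry<String, String>("p2", "toy"));

        SideDataLookup fn = new SideDataLookup(rows);
        fn.initialize();
        System.out.println(fn.process("p1")); // prints p1=book
        System.out.println(fn.process("p9")); // prints p9=<missing>
    }
}
```

The map is built once in initialize rather than in process, so the side data is read a single time per task instead of once per input record.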

Tahir

On Wed, Sep 23, 2015 at 1:36 PM, Tahir Hameed <[email protected]> wrote:

> Hi Gabriel,
>
> Thanks for the answer. After implementing what you suggested, I am getting
> the following error:
>
> 2015-09-23 13:23:10,859 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: Can't find local cache file for '/tmp/crunch-253557813/p1'
>     at org.apache.crunch.io.impl.ReadableDataImpl.getCacheFilePath(ReadableDataImpl.java:81)
>     at org.apache.crunch.io.impl.ReadableDataImpl.access$000(ReadableDataImpl.java:42)
>     at org.apache.crunch.io.impl.ReadableDataImpl$1.apply(ReadableDataImpl.java:93)
>     at org.apache.crunch.io.impl.ReadableDataImpl$1.apply(ReadableDataImpl.java:90)
>     at com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:451)
>     at java.util.AbstractList$Itr.next(AbstractList.java:358)
>     at com.google.common.collect.Iterables$3.next(Iterables.java:508)
>     at com.google.common.collect.Iterables$3.next(Iterables.java:501)
>     at com.google.common.collect.Iterators$5.hasNext(Iterators.java:544)
>     at com.bol.step.enrichmentdashboard.ProductsDoFN.initialize(ProductsDoFN.java:35)
>     at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:71)
>     at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:73)
>     at org.apache.crunch.impl.mr.run.CrunchMapper.setup(CrunchMapper.java:48)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
> Can you suggest where I might be going wrong?
>
> Tahir
>
> On Wed, Sep 23, 2015 at 11:57 AM, Gabriel Reid <[email protected]> wrote:
>
>> Hi Tahir,
>>
>> If I understand correctly, then you're trying to load the contents of
>> a PTable into memory within a DoFn.
>>
>> This can be done via the PCollection.asReadable method. A couple of
>> examples of this can be seen in the BloomFilterJoinStrategy.join and
>> MapsideJoinStrategy.joinInternal methods. The general idea is that you
>> pass a ReadableData instance into the constructor of your DoFn, and
>> then you can access the contents of the underlying PCollection by
>> iterating over the ReadableData within the initialize method of your
>> DoFn.
>>
>> - Gabriel
>>
>> On Wed, Sep 23, 2015 at 9:56 AM, Tahir Hameed <[email protected]> wrote:
>> > Hi,
>> >
>> > I have a PTable which I store as an Avro file. The PTable file is later
>> > to be used in another DoFn after it is converted into a HashMap.
>> >
>> > PTable<String, MyClass> myClassData = table.parallelDo(new MyClassDoFN(),
>> >     Avros.tableOf(Avros.strings(), Avros.reflects(MyClass.class)));
>> > Target target = To.avroFile("/user/xyz/output/");
>> > myClassData.write(target, Target.WriteMode.OVERWRITE);
>> >
>> > Can you please tell me how this file may be read in another DoFn?
>> >
>> > Best,
>> >
>> > Tahir
