Hi,
I'm trying to write a Crunch job that generates a large amount of simulated
data. To kick the job off, I need some inputs to feed into a DoFn. These
inputs are essentially dummy values that the DoFn will ignore. To accomplish
this, I'd like to create an in-memory PCollection that can then be passed
into an MRPipeline, but when I do this with MemPipeline.collectionOf and
call parallelDo on the result, I get an error:
Exception in thread "main" java.lang.IllegalStateException: named 'null' cannot be serialized
    at org.apache.crunch.impl.mem.collect.MemCollection.verifySerializable(MemCollection.java:110)
    at org.apache.crunch.impl.mem.collect.MemCollection.parallelDo(MemCollection.java:129)
Is it possible to explicitly declare/instantiate a PCollection to pass
into an MRPipeline?
Thanks!
-Ben