Hey Ben,

No easy way to do it right now besides writing the data yourself, though that sort of simulation-based use case has been in the back of my mind ever since we added the NLineFileSource. What would your ideal API look like here?
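
For the "write the data yourself" route, here is a minimal sketch of what I mean, using only the JDK. SeedFileWriter and writeSeeds are made-up names for illustration (they are not part of Crunch), and the sketch assumes a local file; for a cluster job the file would need to be copied to HDFS first.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of Crunch): writes one dummy record per line. */
final class SeedFileWriter {
    private SeedFileWriter() {}

    /**
     * Writes the integers 0..count-1, one per line, to the given file.
     * The values are placeholders; the DoFn is expected to ignore them.
     */
    static Path writeSeeds(Path file, int count) throws IOException {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int i = 0; i < count; i++) {
                out.write(Integer.toString(i));
                out.newLine();
            }
        }
        return file;
    }
}
```

The resulting file can then be read as an ordinary text input, e.g. with readTextFile on the MRPipeline, so the DoFn receives one (ignored) string per line. Reading it through NLineFileSource instead should let you control how many seed lines go to each map task, which matters when every line triggers a lot of generated output.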

Thanks,
J

On Wed, Jan 21, 2015 at 9:01 AM, Benjamin Mears <[email protected]> wrote:

> Hi,
>
> I'm trying to write a Crunch job to generate a large amount of simulated
> data. To kick the job off, I need inputs into a do function. These inputs
> are essentially dummy values that will be ignored in the do fn. To
> accomplish this, I'd like to create an in-memory PCollection that can then
> be passed into a MR pipeline, but if I do this with MemPipeline.collectionOf
> I get an error:
>
> Exception in thread "main" java.lang.IllegalStateException: named 'null' cannot be serialized
>     at org.apache.crunch.impl.mem.collect.MemCollection.verifySerializable(MemCollection.java:110)
>     at org.apache.crunch.impl.mem.collect.MemCollection.parallelDo(MemCollection.java:129)
>
> Is it possible to explicitly declare/instantiate a PCollection to pass into
> an MRPipeline?
>
> Thanks!
>
> -Ben

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
