Hey Ben,

No easy way to do it right now besides writing the data yourself, though
that sort of simulation-based use case has been in the back of my mind ever
since we added the NLineFileSource. What would your ideal API look like
here?
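In the meantime, here is a rough sketch of the "write the data yourself" approach, which assumes the simulation only needs one dummy record per task. It writes a small local file containing one seed line per simulation task. You would then copy that file to HDFS and read it as the input to the MRPipeline. The class and method names (`SeedFileWriter`, `writeSeedFile`) are hypothetical. Each line holds just the task index, which the DoFn can ignore. Using the index as a per-task random seed is only a suggestion.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: writes one dummy "seed" line per simulation task so the
 * file can serve as the input to an MRPipeline. The contents are just task
 * indices; the simulation DoFn may ignore them or use them as a per-task seed.
 */
public final class SeedFileWriter {

  private SeedFileWriter() {}

  /** Writes the lines "0" .. "numSeeds - 1" to {@code file}, one per line. */
  public static Path writeSeedFile(Path file, int numSeeds) throws IOException {
    if (numSeeds <= 0) {
      throw new IllegalArgumentException("numSeeds must be positive: " + numSeeds);
    }
    List<String> lines = new ArrayList<>(numSeeds);
    for (int i = 0; i < numSeeds; i++) {
      lines.add(Integer.toString(i));
    }
    // Creates or truncates the file and writes each entry followed by a line separator.
    return Files.write(file, lines, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path out = Paths.get(args.length > 0 ? args[0] : "seeds.txt");
    int n = args.length > 1 ? Integer.parseInt(args[1]) : 100;
    writeSeedFile(out, n);
    System.out.println("Wrote " + n + " seed lines to " + out.toAbsolutePath());
  }
}
```

After copying the file to HDFS, `pipeline.readTextFile(path)` gives you a `PCollection<String>` to hang the simulation DoFn off. To control how many seeds each map task gets, you could read the file through NLineFileSource instead. I'm assuming a constructor along the lines of `new NLineFileSource<String>(path, Writables.strings(), linesPerTask)`, so check the javadoc for the exact signature.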

Thanks,
J

On Wed, Jan 21, 2015 at 9:01 AM, Benjamin Mears wrote:

> Hi,
>
> I'm trying to write a Crunch job to generate a large amount of simulated
> data.  To kick the job off, I need inputs into a do function.  These inputs
> are essentially dummy values that will be ignored in the do fn.  To
> accomplish this, I'd like to create an in-memory PCollection that can then
> be passed into a MR pipeline, but if I do this with MemPipeline.collectionOf
> I get an error:
>
> Exception in thread "main" java.lang.IllegalStateException:  named 'null' cannot be serialized
>       at org.apache.crunch.impl.mem.collect.MemCollection.verifySerializable(MemCollection.java:110)
>       at org.apache.crunch.impl.mem.collect.MemCollection.parallelDo(MemCollection.java:129)
>
> Is it possible to explicitly declare/instantiate a PCollection to pass into 
> an MRPipeline?
>
> Thanks!
>
> -Ben
>
>


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
