I think it works in SparkPipeline -- I have hacks in place to fake a TaskInputOutputContext inside of Spark when it's needed, but it's possible we need to add implementations of more methods to get it to work with all of the ReadableData impls.
On Mon, May 9, 2016 at 9:26 AM, David Ortiz <[email protected]> wrote:

> Thanks. That works. I also found a workaround by serializing all the
> Avro records into JSON in the map function that reads the data in, then
> deserializing back into Avro in my processing function down the line.
>
> Does ReadableData have issues running on a SparkPipeline? Just curious,
> since it takes the org.apache.hadoop.mapreduce.TaskInputOutputContext in
> its read method.
>
> On Fri, May 6, 2016 at 4:56 PM Josh Wills <[email protected]> wrote:
>
>> Try using the ReadableData version of the PTable -- it's an object that
>> is serializable, and you can read the data from it into whatever you
>> want in the initialize method of the DoFn you pass it to.
>>
>> On Fri, May 6, 2016 at 1:03 PM David Ortiz <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> In an attempt to make my code a little easier to follow, I am
>>> attempting to materialize a PTable to a map and then pass it into
>>> another DoFn. Unfortunately, since the value is an Avro record, I am
>>> getting a NotSerializableException when I try to use it.
>>>
>>> I attempted to get around this by converting the record into a
>>> ByteBuffer with the Avro utils, but lo and behold, that's also not
>>> Serializable. Since I do not see a convenient way to wrap a byte array
>>> with Crunch, has anyone had any luck with any other approaches to
>>> getting a Crunch-compatible serialized Avro object?
>>>
>>> Thanks,
>>> David Ortiz
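[Editor's note: a minimal sketch of the ReadableData pattern Josh describes. The ReadableData handle is serializable, so it can be a field of the DoFn, and the actual records are read back in initialize() -- via the read(TaskInputOutputContext) call David asks about. The class name, the String key type, and the MyAvroRecord value type are hypothetical placeholders, not from the thread.]

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.crunch.CrunchRuntimeException;
    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;
    import org.apache.crunch.Pair;
    import org.apache.crunch.ReadableData;
    import org.apache.hadoop.conf.Configuration;

    public class JoinWithLookupFn extends DoFn<String, String> {

      // Serializable handle to the materialized PTable's data,
      // obtained via table.asReadable(...).
      private final ReadableData<Pair<String, MyAvroRecord>> lookup;
      private transient Map<String, MyAvroRecord> lookupMap;  // rebuilt per task

      public JoinWithLookupFn(ReadableData<Pair<String, MyAvroRecord>> lookup) {
        this.lookup = lookup;
      }

      @Override
      public void configure(Configuration conf) {
        lookup.configure(conf);  // let the handle register its inputs
      }

      @Override
      public void initialize() {
        lookupMap = new HashMap<String, MyAvroRecord>();
        try {
          // Read the table's contents into a local map for this task.
          for (Pair<String, MyAvroRecord> p : lookup.read(getContext())) {
            lookupMap.put(p.first(), p.second());
          }
        } catch (IOException e) {
          throw new CrunchRuntimeException(e);
        }
      }

      @Override
      public void process(String key, Emitter<String> emitter) {
        MyAvroRecord rec = lookupMap.get(key);
        if (rec != null) {
          emitter.emit(rec.toString());
        }
      }
    }

When wiring it up, the sources behind the ReadableData should be declared so the planner schedules them before this stage, e.g. by passing ParallelDoOptions.builder().sourceTargets(rd.getSourceTargets()).build() to the parallelDo call.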
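[Editor's note: and a sketch of the JSON round-trip workaround David mentions, using only Avro's standard encoder/decoder factories. The helper class and method names are made up for illustration.]

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.io.JsonDecoder;
    import org.apache.avro.io.JsonEncoder;

    public final class AvroJsonUtil {

      // Encode an Avro record as a JSON string; plain Strings
      // serialize without issue.
      public static String toJson(GenericRecord record) throws IOException {
        Schema schema = record.getSchema();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        return out.toString("UTF-8");
      }

      // Decode the JSON string back into an Avro record downstream.
      public static GenericRecord fromJson(String json, Schema schema)
          throws IOException {
        JsonDecoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
        return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
      }
    }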
