You could try to use the TypeSerializerInputFormat.

On Thu, Jul 30, 2015 at 2:08 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> How can I create a Flink dataset given a directory path that contains a
> set of Java objects serialized with Kryo (one file per object)?
>
> On Thu, Jul 30, 2015 at 1:41 PM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Hi Flavio,
>>
>> in order to use the Kryo serializer for a given type you can use the
>> registerTypeWithKryoSerializer method of the ExecutionEnvironment object.
>> What you provide to the method is the type you want to be serialized with
>> Kryo and an implementation of the com.esotericsoftware.kryo.Serializer
>> class. If the given type is not supported by Flink’s own serialization
>> framework, then this custom serializer should be used. You register the
>> types at the beginning of your Flink program:
>>
>> def main(args: Array[String]): Unit = {
>>   val env = ExecutionEnvironment.getExecutionEnvironment
>>
>>   env.registerTypeWithKryoSerializer(classOf[MyType],
>>     classOf[MyTypeSerializer])
>>
>>   ...
>>
>>   env.execute()
>>
>> }
>>
>> Cheers,
>> Till
>>
>>
>> On Thu, Jul 30, 2015 at 12:45 PM, Flavio Pompermaier <
>> pomperma...@okkam.it> wrote:
>>
>>> I have a project that produces RDF quads, and I have to store them so
>>> that I can read them with Flink afterwards.
>>> I could use thrift/protobuf/avro but this means adding a lot of
>>> transitive dependencies to my project.
>>> Maybe I could use Kryo to store those objects. Is there any example of
>>> creating a dataset of objects serialized with Kryo?
>>>
>>> On Thu, Jul 30, 2015 at 11:10 AM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> Quick response: I am not opposed to that, but there are tuple libraries
>>>> around already.
>>>>
>>>> Do you need specifically the Flink tuples, for interoperability between
>>>> Flink and other projects?
>>>>
>>>> On Thu, Jul 30, 2015 at 11:07 AM, Stephan Ewen <se...@apache.org>
>>>> wrote:
>>>>
>>>>> Should we move this to the dev list?
>>>>>
>>>>> On Thu, Jul 30, 2015 at 10:43 AM, Flavio Pompermaier <
>>>>> pomperma...@okkam.it> wrote:
>>>>>
>>>>>> Any thoughts about this (moving the tuple classes into a separate
>>>>>> self-contained project with no transitive dependencies, so that they
>>>>>> can be easily used in other external projects)?
>>>>>>
>>>>>> On Mon, Jul 6, 2015 at 11:09 AM, Flavio Pompermaier <
>>>>>> pomperma...@okkam.it> wrote:
>>>>>>
>>>>>>> Do you think it could be a good idea to extract Flink tuples into a
>>>>>>> separate project to allow simpler dependency management in
>>>>>>> Flink-compatible projects?
>>>>>>>
>>>>>>> On Mon, Jul 6, 2015 at 11:06 AM, Fabian Hueske <fhue...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> at the moment, Tuples are more efficient than POJOs, because POJO
>>>>>>>> fields are accessed via Java reflection whereas Tuple fields are
>>>>>>>> directly accessed.
>>>>>>>> This performance penalty could be overcome by code-generated
>>>>>>>> serializers and comparators but I am not aware of any work in that
>>>>>>>> direction.
>>>>>>>>
>>>>>>>> Best, Fabian
>>>>>>>>
>>>>>>>> 2015-07-06 11:01 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>>>>>
>>>>>>>>> Hi to all,
>>>>>>>>> I was thinking of writing my own Flink-compatible library and I
>>>>>>>>> basically need a Tuple5.
>>>>>>>>>
>>>>>>>>> Is there any performance loss in using a POJO with 5 String fields
>>>>>>>>> vs a Tuple5?
>>>>>>>>> If yes, wouldn't it be a good idea to extract Flink tuples into a
>>>>>>>>> separate simple project (e.g. flink-java-tuples) that has no other
>>>>>>>>> dependency, to enable other libs to write their Flink-compatible
>>>>>>>>> logic without the need to exclude all the transitive dependencies
>>>>>>>>> of flink-java?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Flavio
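
A note on Fabian's point quoted above: the difference between reading a POJO field through Java reflection and reading a tuple field directly can be sketched in plain Java. This is only an illustration under assumptions. QuadPojo and StringTuple5 are hypothetical stand-ins (StringTuple5 is not Flink's Tuple5), and the code neither reproduces Flink's serializers nor measures anything.

```java
import java.lang.reflect.Field;

public class FieldAccessDemo {

    /** Hypothetical POJO with five String fields, as in the original question. */
    public static class QuadPojo {
        public String subject;
        public String predicate;
        public String object;
        public String graph;
        public String type;
    }

    /** Hypothetical stand-in for a five-field tuple; NOT Flink's Tuple5. */
    public static class StringTuple5 {
        public String f0, f1, f2, f3, f4;

        /** Positional access compiled to plain field reads, no reflection. */
        public String getField(int pos) {
            switch (pos) {
                case 0: return f0;
                case 1: return f1;
                case 2: return f2;
                case 3: return f3;
                case 4: return f4;
                default: throw new IndexOutOfBoundsException(String.valueOf(pos));
            }
        }
    }

    /** Looks up a public field by name and reads it through java.lang.reflect. */
    public static String readReflective(Object pojo, String fieldName) {
        try {
            Field field = pojo.getClass().getField(fieldName);
            return (String) field.get(pojo);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    /** Reads a tuple field by position without any reflective lookup. */
    public static String readDirect(StringTuple5 tuple, int pos) {
        return tuple.getField(pos);
    }

    public static void main(String[] args) {
        QuadPojo pojo = new QuadPojo();
        pojo.object = "http://example.org/o";

        StringTuple5 tuple = new StringTuple5();
        tuple.f2 = "http://example.org/o";

        // Both calls return the same value; only the access path differs.
        System.out.println(readReflective(pojo, "object"));
        System.out.println(readDirect(tuple, 2));
    }
}
```

The reflective path goes through a Field lookup and a checked, boxed read on every call, while the positional path is an ordinary method the JIT can inline. That is the general kind of overhead Fabian describes; how large it is in a real Flink job would have to be measured.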