Hi all,

in my Flink job I create a DataSet<Tuple2<Void, MyThriftObj>> using HadoopInputFormat in this way:

    HadoopInputFormat<Void, MyThriftObj> inputFormat = new HadoopInputFormat<>(
        new ParquetThriftInputFormat<MyThriftObj>(), Void.class, MyThriftObj.class, job);
    FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(inputPath));
    DataSet<Tuple2<Void, MyThriftObj>> ds = env.createInput(inputFormat);

Flink logs this message:

    TypeExtractor - class MyThriftObj contains custom serialization methods we do not call.

Indeed, MyThriftObj has readObject/writeObject methods, and when I print the type of ds I see:

    Java Tuple2<Void, GenericType<MyThriftObj>>

In my experience GenericType is a performance killer. What should I do to improve the reading/writing of MyThriftObj?

Best,
Flavio

-- 
Flavio Pompermaier
Development Department
OKKAM S.r.l.
Tel. +(39) 0461 1823908
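
Note on the log line: the message suggests that the type analysis noticed the readObject/writeObject methods on MyThriftObj and therefore did not treat the class as a POJO. The following is only a rough sketch of what such a reflective check could look like, using plain Java reflection. It is not Flink's actual code; the class names CustomSerializationCheck, ThriftLikeRecord and PlainRecord are hypothetical stand-ins invented for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;

public class CustomSerializationCheck {

    /** Hypothetical stand-in for a generated class that declares Java serialization hooks. */
    static class ThriftLikeRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        private int id;

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
        }
    }

    /** Hypothetical plain class without any serialization hooks. */
    static class PlainRecord {
        public int id;
    }

    /**
     * Returns true if the class itself declares a method named readObject or writeObject.
     * Assumption: a type analyzer that finds such methods may refuse to handle the class
     * field by field and fall back to a generic serializer instead.
     */
    static boolean hasCustomSerializationMethods(Class<?> clazz) {
        for (Method method : clazz.getDeclaredMethods()) {
            String name = method.getName();
            if (name.equals("readObject") || name.equals("writeObject")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("ThriftLikeRecord: " + hasCustomSerializationMethods(ThriftLikeRecord.class));
        System.out.println("PlainRecord: " + hasCustomSerializationMethods(PlainRecord.class));
    }
}
```

If a check of this kind is what triggers the message, then the GenericType in the printed type would be a consequence of the class shape rather than of the HadoopInputFormat setup.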