Hi to all,
in my Flink job I create a DataSet<MyThriftObj> using HadoopInputFormat in
this way:

HadoopInputFormat<Void, MyThriftObj> inputFormat = new HadoopInputFormat<>(
        new ParquetThriftInputFormat<MyThriftObj>(), Void.class,
        MyThriftObj.class, job);
FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(inputPath));
DataSet<Tuple2<Void, MyThriftObj>> ds = env.createInput(inputFormat);

Flink logs this message:

   - TypeExtractor -* class MyThriftObj contains custom serialization
   methods we do not call.*


Indeed MyThriftObj has readObject/writeObject functions and when I print
the type of ds I see:

   - Java Tuple2<Void,* GenericType<MyThriftObj>*>

In my experience, GenericType is a performance killer. What should I do to
improve the reading/writing of MyThriftObj?

Best,
Flavio


-- 
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 1823908
