Hi Egor,

There is the Row type, which is not strongly typed (unlike the TupleX types) but
supports an arbitrary number of fields as well as null-valued fields.
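
For illustration, a minimal sketch of a Row-based DataSet could look like this
(untested, just to show the idea; it assumes org.apache.flink.types.Row and
RowTypeInfo, and the field values are made up):

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;
import java.util.Arrays;

public class RowExample {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // A Row can have any number of fields; the field types are declared
    // separately via a RowTypeInfo.
    RowTypeInfo type = new RowTypeInfo(
        BasicTypeInfo.STRING_TYPE_INFO,
        BasicTypeInfo.DOUBLE_TYPE_INFO,
        BasicTypeInfo.INT_TYPE_INFO);

    DataSet<Row> rows = env.fromCollection(
        Arrays.asList(
            Row.of("a", 1.0, 42),
            Row.of("b", null, 7)),  // null-valued fields are allowed
        type);

    rows.print();
  }
}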

The DataSet API does not have a split operator, and implementing one would be
much more difficult than one would expect. The problem is in the optimizer,
which assumes that all outputs of an operator receive the same data, so we
would have to change the plan enumeration logic.
However, there is a workaround for this. I would convert each String into an
Either<Double, String> (Flink features a Java Either type) and feed the
resulting dataset into two filters, where the first filters on Either.isLeft
and the second on Either.isRight (or you use a FlatMap to directly extract the
value from the Either):

DataSet<String> input = ...;
DataSet<Either<Double, String>> parsed = input.map(/* String -> Either */);
DataSet<Double> doubles = parsed.flatMap(/* if Either.isLeft() -> collect Either.left() */);
DataSet<String> failed = parsed.flatMap(/* if Either.isRight() -> collect Either.right() */);
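
If it helps, here is a more complete (but untested) version of that sketch.
The class name and the example inputs are made up; it uses Flink's
org.apache.flink.types.Either with plain anonymous functions:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.types.Either;
import org.apache.flink.util.Collector;

public class SplitByParseResult {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<String> input = env.fromElements("1.5", "foo", "42.0");

    // Left = successfully parsed Double, Right = the String that failed to parse.
    DataSet<Either<Double, String>> parsed = input.map(
        new MapFunction<String, Either<Double, String>>() {
          @Override
          public Either<Double, String> map(String s) {
            try {
              return Either.Left(Double.parseDouble(s));
            } catch (NumberFormatException e) {
              return Either.Right(s);
            }
          }
        });

    // Keep only the successfully parsed values.
    DataSet<Double> doubles = parsed.flatMap(
        new FlatMapFunction<Either<Double, String>, Double>() {
          @Override
          public void flatMap(Either<Double, String> e, Collector<Double> out) {
            if (e.isLeft()) {
              out.collect(e.left());
            }
          }
        });

    // Keep only the strings that could not be parsed.
    DataSet<String> failed = parsed.flatMap(
        new FlatMapFunction<Either<Double, String>, String>() {
          @Override
          public void flatMap(Either<Double, String> e, Collector<String> out) {
            if (e.isRight()) {
              out.collect(e.right());
            }
          }
        });

    doubles.print();
    failed.print();
  }
}

From here you could write the two datasets to different sinks (e.g., different tables).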

Best, Fabian


2017-07-27 8:46 GMT+02:00 Егор Литвиненко <e.v.litvinenk...@gmail.com>:

> Hi
>
> Is there a way to process mapping errors in Flink?
> For example, when a string is a valid double, write it to one table,
> otherwise to another?
> If not, what problems do you see with this feature, and if I make a PR,
> where should I start implementing it?
>
> I saw Tuple1, Tuple2, etc., and many methods for different tuples to define
> the types of a DataSet.
> But I don't see a Tuple with a custom size. I mean something like new
> Tuple(List<Class<?>> types).
> Did I miss something?
>
> Best regards, Egor Litvinenko
>