Hi Flavio,
it works the following way: Your data type will serialized by the
PojoSerializer iff it is a POJO. Iff it is a generic type which cannot be
serialized by any of the other serializers, then Kryo is used.
If it is a POJO type and you’re having DataStream which can also contain
subtypes
Thanks Max and Till for the answers. However I still didn't understand
fully the difference...Here are my doubts:
- If I don't register any of my POJO classes, they will be serialized
with Kryo (black box for Flink)
- If I register all of my POJO using env.registerType they will be
Registering a data type is only relevant for the Kryo serializer or if you
want to serialize a subclass of a POJO. Registering has the advantage that
you assign an id to the class which is written instead of the full class
name. The latter is usually much longer than the id.
Cheers,
Till
On Tue,
Hi Flavio,
I think the point is that Flink can use its serialization tools if you
register the class in advance. If you don't do that, it will use Kryo
as a fall-back which is slightly less efficient.
Equals and hash code have to be implemented correctly if you compare
Pojos. For standard types
Hi Max,
why do I need to register them? My job runs without problem also without
that.
The only problem with my POJOs was that I had to implement equals and hash
correctly, Flink didn't enforce me to do it but then results were wrong :(
On Wed, Feb 17, 2016 at 10:16 AM Maximilian Michels
Hi Flavio,
Stephan was referring to
env.registerType(ExtendedClass1.class);
env.registerType(ExtendedClass2.class);
Cheers,
Max
On Wed, Feb 10, 2016 at 12:48 PM, Flavio Pompermaier
wrote:
> What do you mean exactly..? Probably I'm missing something here..remember
> that
Because The classes are not related to each other. Do you think it's a good
idea to have something like this?
abstract class BaseClass(){
String someField;
}
class ExtendedClass1 extends BaseClass (){
String someOtherField11;
String someOtherField12;
String someOtherField13;
...
}
Why not use an abstract base class and N subclasses?
On Wed, Feb 10, 2016 at 10:05 AM, Fabian Hueske wrote:
> Unfortunately, there is no Either<1,...,n>.
> You could implement something like a Tuple3
Hi Flavio,
I did not completely understand which objects should go where, but here are
some general guidelines:
- early filtering is mostly a good idea (unless evaluating the filter
expression is very expensive)
- you can use a flatMap function to combine a map and a filter
- applying multiple
Yes, the intermediate dataset I create then join again between themselves.
What I'd need is a Either<1,...,n>. Is that possible to add?
Otherwise I was thinking to generate a Tuple2 and in the
subsequent filter+map/flatMap deserialize only those elements I want to
group togheter
What do you mean exactly..? Probably I'm missing something here..remember
that I can specify the right subClass only after the last flatMap, after
the first map neither me nor Flink can know the exact subclass of BaseClass
On Wed, Feb 10, 2016 at 12:42 PM, Stephan Ewen wrote:
Class hierarchies should definitely work, even if the base class has no
fields.
They work more efficiently if you register the subclasses at the execution
environment (Flink cannot infer them from the function signatures because
the function signatures only contain the abstract base class).
On
Hi to all,
in my program I have a Dataset that generated different types of object wrt
the incoming element.
Thus it's like a Map.
In order to type the different generated datasets I do something:
Dataset start =...
Dataset ds1 = start.filter().map(..);
Dataset ds2 =
Any help on this?
On 9 Feb 2016 18:03, "Flavio Pompermaier" wrote:
> Hi to all,
>
> in my program I have a Dataset that generated different types of object
> wrt the incoming element.
> Thus it's like a Map.
> In order to type the different generated datasets