Re: Dataset filter improvement

2016-02-24 Thread Till Rohrmann
Hi Flavio, it works the following way: Your data type will serialized by the PojoSerializer iff it is a POJO. Iff it is a generic type which cannot be serialized by any of the other serializers, then Kryo is used. If it is a POJO type and you’re having DataStream which can also contain subtypes

Re: Dataset filter improvement

2016-02-24 Thread Flavio Pompermaier
Thanks Max and Till for the answers. However I still didn't understand fully the difference...Here are my doubts: - If I don't register any of my POJO classes, they will be serialized with Kryo (black box for Flink) - If I register all of my POJO using env.registerType they will be

Re: Dataset filter improvement

2016-02-23 Thread Till Rohrmann
Registering a data type is only relevant for the Kryo serializer or if you want to serialize a subclass of a POJO. Registering has the advantage that you assign an id to the class which is written instead of the full class name. The latter is usually much longer than the id. Cheers, Till On Tue,

Re: Dataset filter improvement

2016-02-23 Thread Maximilian Michels
Hi Flavio, I think the point is that Flink can use its serialization tools if you register the class in advance. If you don't do that, it will use Kryo as a fall-back which is slightly less efficient. Equals and hash code have to be implemented correctly if you compare Pojos. For standard types

Re: Dataset filter improvement

2016-02-17 Thread Flavio Pompermaier
Hi Max, why do I need to register them? My job runs without problem also without that. The only problem with my POJOs was that I had to implement equals and hash correctly, Flink didn't enforce me to do it but then results were wrong :( On Wed, Feb 17, 2016 at 10:16 AM Maximilian Michels

Re: Dataset filter improvement

2016-02-17 Thread Maximilian Michels
Hi Flavio, Stephan was referring to env.registerType(ExtendedClass1.class); env.registerType(ExtendedClass2.class); Cheers, Max On Wed, Feb 10, 2016 at 12:48 PM, Flavio Pompermaier wrote: > What do you mean exactly..? Probably I'm missing something here..remember > that

Re: Dataset filter improvement

2016-02-10 Thread Flavio Pompermaier
Because The classes are not related to each other. Do you think it's a good idea to have something like this? abstract class BaseClass(){ String someField; } class ExtendedClass1 extends BaseClass (){ String someOtherField11; String someOtherField12; String someOtherField13; ... }

Re: Dataset filter improvement

2016-02-10 Thread Stephan Ewen
Why not use an abstract base class and N subclasses? On Wed, Feb 10, 2016 at 10:05 AM, Fabian Hueske wrote: > Unfortunately, there is no Either<1,...,n>. > You could implement something like a Tuple3 Option>. However, Flink does not provide an Option type

Re: Dataset filter improvement

2016-02-10 Thread Fabian Hueske
Hi Flavio, I did not completely understand which objects should go where, but here are some general guidelines: - early filtering is mostly a good idea (unless evaluating the filter expression is very expensive) - you can use a flatMap function to combine a map and a filter - applying multiple

Re: Dataset filter improvement

2016-02-10 Thread Flavio Pompermaier
Yes, the intermediate dataset I create then join again between themselves. What I'd need is a Either<1,...,n>. Is that possible to add? Otherwise I was thinking to generate a Tuple2 and in the subsequent filter+map/flatMap deserialize only those elements I want to group togheter

Re: Dataset filter improvement

2016-02-10 Thread Fabian Hueske
Unfortunately, there is no Either<1,...,n>. You could implement something like a Tuple3. However, Flink does not provide an Option type (comes with Java8). You would need to implement it yourself incl. TypeInfo and Serializer. You can get some inspiration from the Either

Re: Dataset filter improvement

2016-02-10 Thread Flavio Pompermaier
What do you mean exactly..? Probably I'm missing something here..remember that I can specify the right subClass only after the last flatMap, after the first map neither me nor Flink can know the exact subclass of BaseClass On Wed, Feb 10, 2016 at 12:42 PM, Stephan Ewen wrote:

Re: Dataset filter improvement

2016-02-10 Thread Stephan Ewen
Class hierarchies should definitely work, even if the base class has no fields. They work more efficiently if you register the subclasses at the execution environment (Flink cannot infer them from the function signatures because the function signatures only contain the abstract base class). On

Dataset filter improvement

2016-02-09 Thread Flavio Pompermaier
Hi to all, in my program I have a Dataset that generated different types of object wrt the incoming element. Thus it's like a Map. In order to type the different generated datasets I do something: Dataset start =... Dataset ds1 = start.filter().map(..); Dataset ds2 =

Re: Dataset filter improvement

2016-02-09 Thread Flavio Pompermaier
Any help on this? On 9 Feb 2016 18:03, "Flavio Pompermaier" wrote: > Hi to all, > > in my program I have a Dataset that generated different types of object > wrt the incoming element. > Thus it's like a Map. > In order to type the different generated datasets