Are join/groupBy operations with wide Java Beans using Dataset API much slower than using RDD API?

2016-08-04 Thread dueckm
Hello, I built a prototype that uses join and groupBy operations via Spark RDD API. Recently I migrated it to the Dataset API. Now it runs much slower than with the original RDD implementation. Did I do something wrong here? Or is this a price I have to pay for the more convenient API? Is there

Are join/groupBy operations with wide Java Beans using Dataset API much slower than using RDD API?

2016-08-03 Thread dueckm
Hello, first of all - excuse me for sending this post more than once, but I am new to this mailing list and did not subscribe completely, so I suspect my previous postings will not be accepted ... I built a prototype that uses join and groupBy operations via Spark RDD API. Recently I migrated

Are join/groupBy operations with wide Java Beans using Dataset API much slower than using RDD API? [*]

2016-08-02 Thread dueckm
Hello, I built a prototype that uses join and groupBy operations via Spark RDD API. Recently I migrated it to the Dataset API. Now it runs much slower than with the original RDD implementation. Did I do something wrong here? Or is this the price I have to pay for the more convenient API? Is
