Re: Dataframe vs Dataset dilemma: either Row parsing or no filter push-down

2018-06-18 Thread Koert Kuipers
We use DataFrame and RDD. Dataset not only has issues with predicate pushdown, it also adds shuffles at times when it shouldn't. And there is some overhead from the encoders themselves, because under the hood it is still just Row objects. On Mon, Jun 18, 2018 at 5:00 PM, Valery Khamenya wrote:
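The pushdown point above can be checked by comparing query plans. The following is a minimal sketch, not taken from the thread: the `Event` case class, the temp-directory Parquet file, and the object name are hypothetical, and it assumes Spark SQL is on the classpath. It filters the same typed Dataset once with a Column expression and once with a typed lambda, then prints both physical plans.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Event(id: Long, country: String)

object PushdownSketch {
  // Writes a tiny Parquet file to a temp directory and reads it back as a typed Dataset.
  def load(spark: SparkSession): Dataset[Event] = {
    import spark.implicits._
    val path = Files.createTempDirectory("events").resolve("data.parquet").toString
    Seq(Event(1L, "NL"), Event(2L, "US")).toDS().write.parquet(path)
    spark.read.parquet(path).as[Event]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pushdown-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = load(spark)

    // Column expression: the optimizer can inspect the predicate.
    val byColumn = ds.filter($"country" === "NL")

    // Typed lambda: the predicate is an opaque JVM function to the optimizer.
    val byLambda = ds.filter((e: Event) => e.country == "NL")

    // Compare the PushedFilters entry of the Parquet scan in the two plans.
    byColumn.explain()
    byLambda.explain()

    spark.stop()
  }
}
```

Both filters return the same rows; what may differ is whether the Parquet scan node reports the predicate under `PushedFilters`. The exact plan text depends on the Spark version and data source, so treat the printed plans, rather than this sketch, as the evidence.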

Re: Dataframe vs dataset

2018-05-01 Thread Michael Artz
I get your point haha and I also think of it as DataFrame being a specific kind of Dataset. Mike On Tue, May 1, 2018, 7:27 AM Lalwani, Jayesh wrote: > Neither. > > > > All women are humans. Not all humans are women. You wouldn’t say that a > woman is a subset of a human. > > > > All DataFrames a

Re: Dataframe vs dataset

2018-05-01 Thread Lalwani, Jayesh
Neither. All women are humans. Not all humans are women. You wouldn’t say that a woman is a subset of a human. All DataFrames are Datasets. Not all Datasets are DataFrames. The “subset” relationship doesn’t apply here. A DataFrame is a specialized type of Dataset. From: Michael Artz Date: Sat

Re: Dataframe vs dataset

2018-04-28 Thread Michael Artz
Ok, from the language you used, you are kind of saying that Dataset is a subset of DataFrame. I would disagree, because to me a DataFrame is just a Dataset of org.apache.spark.sql.Row. On Sat, Apr 28, 2018, 8:34 AM Marco Mistroni wrote: > Imho .neither..I see datasets as typed df and therefore ds
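In the Scala API this relationship is literal: `DataFrame` is declared as a type alias for `Dataset[Row]` in the `org.apache.spark.sql` package object. The following minimal sketch illustrates it. The `Person` case class, the sample rows, and the object name are hypothetical, and Spark SQL is assumed to be on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object AliasSketch {
  def build(spark: SparkSession): (DataFrame, Dataset[Person]) = {
    import spark.implicits._

    val df: DataFrame = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")

    // No conversion needed: DataFrame and Dataset[Row] are the same type.
    val rows: Dataset[Row] = df

    // A typed view over the same data, obtained through an encoder.
    val people: Dataset[Person] = rows.as[Person]

    (rows, people)
  }
}
```

The assignment `val rows: Dataset[Row] = df` compiles without a cast, which is the sense in which a DataFrame is "just" a Dataset of Row, while `as[Person]` yields a Dataset of a different element type over the same columns.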

Re: Dataframe vs dataset

2018-04-28 Thread Marco Mistroni
Imho, neither. I see datasets as typed df, and therefore ds are enhanced df. Feel free to disagree. Kr On Sat, Apr 28, 2018, 2:24 PM Michael Artz wrote: > Hi, > > I use Spark every day and I have a good grip on the basics of Spark, so > this question isn't for myself. But this came up and I wanted