we use DataFrame and RDD. Dataset not only has issues with predicate
pushdown, it also adds shufffles at times where it shouldn't. and there is
some overhead from the encoders themselves, because under the hood it is
still just Row objects.
On Mon, Jun 18, 2018 at 5:00 PM, Valery Khamenya wrote:
I get your point haha and I also think of it as DataFrame being a specific
kind of Dataset.
Mike
On Tue, May 1, 2018, 7:27 AM Lalwani, Jayesh
wrote:
> Neither.
>
>
>
> All women are humans. Not all humans are women. You wouldn’t say that a
> woman is a subset of a human.
>
>
>
> All DataFrames a
Neither.
All women are humans. Not all humans are women. You wouldn’t say that a woman
is a subset of a human.
All DataFrames are DataSets. Not all Datasets are DataFrames. The “subset”
relationship doesn’t apply here. A DataFrame is a specialized type of DataSet
From: Michael Artz
Date: Sat
Ok from the language you used, you are saying kind of that Dataset is a
subset of Dataframe. I would disagree because to me a DataFrame is just a
Dataset of org.spache.spark.sql.Row
On Sat, Apr 28, 2018, 8:34 AM Marco Mistroni wrote:
> Imho .neither..I see datasets as typed df and therefore ds
Imho .neither..I see datasets as typed df and therefore ds are enhanced df
Feel free to disagree..
Kr
On Sat, Apr 28, 2018, 2:24 PM Michael Artz wrote:
> Hi,
>
> I use Spark everyday and I have a good grip on the basics of Spark, so
> this question isnt for myself. But this came up and I wanted