Re: Lineage between Datasets

2017-04-12 Thread Chang Chen
Does that mean the physical plans of any two Datasets are independent?

Thanks
Chang

On Thu, Apr 13, 2017 at 12:53 AM, Reynold Xin  wrote:

> The physical plans are not subtrees, but the analyzed plan (before the
> optimizer runs) is actually similar to "lineage". You can get it by
> calling explain(true) and looking at the analyzed plan.


Re: Lineage between Datasets

2017-04-12 Thread Reynold Xin
The physical plans are not subtrees, but the analyzed plan (before the
optimizer runs) is actually similar to "lineage". You can get it by
calling explain(true) and looking at the analyzed plan.
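To make the "lineage" idea concrete, here is a toy Scala sketch. The `Plan`, `Scan` and `Filter` classes, the `containsSubtree` and `render` helpers, and the `people.parquet` path are all hypothetical stand-ins, loosely modeled on Catalyst's logical operators but not Spark's real API. It shows that a derived Dataset's plan, before any optimizer rule runs, keeps the parent's plan intact as a subtree. In real Spark, that plan is what `explain(true)` prints under "== Analyzed Logical Plan ==", and it is also reachable as `ds.queryExecution.analyzed`.

```scala
// Toy model only: these classes are hypothetical, not Spark's Catalyst API.
object AnalyzedLineage {
  sealed trait Plan {
    def children: Seq[Plan]
    // True if `other` occurs anywhere in this tree, including at the root.
    def containsSubtree(other: Plan): Boolean =
      this == other || children.exists(_.containsSubtree(other))
  }
  final case class Scan(path: String) extends Plan {
    def children: Seq[Plan] = Nil
  }
  final case class Filter(condition: String, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // Deriving a Dataset wraps the parent's (unoptimized) plan in a new node.
  val people: Plan = Scan("people.parquet") // hypothetical path
  val ageGreatThan30: Plan = Filter("age > 30", people)

  // Render a plan top-down, one node per line, indenting each child level.
  def render(plan: Plan, indent: Int = 0): List[String] = {
    val label = plan match {
      case Scan(path)      => s"Scan $path"
      case Filter(cond, _) => s"Filter ($cond)"
    }
    (" " * indent + label) :: plan.children.toList.flatMap(render(_, indent + 2))
  }

  def main(args: Array[String]): Unit = {
    render(ageGreatThan30).foreach(println)
    // The parent's plan survives unchanged inside the child's plan: the "lineage".
    println(ageGreatThan30.containsSubtree(people))
  }
}
```

Running `main` prints the two-node tree (`Filter (age > 30)` above an indented `Scan people.parquet`) followed by `true`. Optimizer rewrites such as predicate pushdown are what can break this subtree relation in the later plans.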


On Wed, Apr 12, 2017 at 3:03 AM Chang Chen  wrote:

> Hi All
>
> I believe that there is no lineage between datasets. Consider this case:
>
> val people = spark.read.parquet("...").as[Person]
>
> val ageGreatThan30 = people.filter("age > 30")
>
> Since the second DS can push down the condition, the two obviously have
> different logical plans and hence different physical plans.
>
> Is my understanding right?
>
> Thanks
> Chang
>


Lineage between Datasets

2017-04-12 Thread Chang Chen
Hi All

I believe that there is no lineage between datasets. Consider this case:

val people = spark.read.parquet("...").as[Person]

val ageGreatThan30 = people.filter("age > 30")

Since the second DS can push down the condition, the two obviously have
different logical plans and hence different physical plans.
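The pushdown argument can be illustrated with a toy optimizer rule. As before, `Plan`, `Scan`, `Filter`, `optimize` and the path are hypothetical; this is not how Spark implements pushdown internally, which depends on the data source and Spark version. Optimizing the child first and then folding a `Filter` into the `Scan` beneath it means the optimized plan of `ageGreatThan30` no longer contains the optimized plan of `people` as a subtree, even though the unoptimized plan does.

```scala
// Toy model only: these classes and the rule are hypothetical, not Spark's API.
object PushdownSketch {
  sealed trait Plan
  final case class Scan(path: String, pushedFilters: List[String]) extends Plan
  final case class Filter(condition: String, child: Plan) extends Plan

  // Toy rule: optimize the child first, then fold a Filter sitting directly
  // on a Scan into that Scan's list of pushed predicates.
  def optimize(plan: Plan): Plan = plan match {
    case Filter(cond, child) =>
      optimize(child) match {
        case Scan(path, pushed) => Scan(path, pushed :+ cond)
        case other              => Filter(cond, other)
      }
    case scan: Scan => scan
  }

  val people: Plan = Scan("people.parquet", Nil) // hypothetical path
  val ageGreatThan30: Plan = Filter("age > 30", people)

  def main(args: Array[String]): Unit = {
    println(optimize(people))         // Scan(people.parquet,List())
    println(optimize(ageGreatThan30)) // Scan(people.parquet,List(age > 30))
    // The optimized parent no longer appears inside the optimized child.
    println(optimize(ageGreatThan30) == Filter("age > 30", optimize(people))) // false
  }
}
```

Because the rule recurses into the child first, a chain of filters over one scan collapses into a single `Scan` carrying all the predicates in order.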

Is my understanding right?

Thanks
Chang