Re: Lineage between Datasets
Does that mean the physical plans of any two Datasets are independent?

Thanks
Chang

On Thu, Apr 13, 2017 at 12:53 AM, Reynold Xin wrote:

> The physical plans are not subtrees, but the analyzed plan (before the
> optimizer runs) is actually similar to "lineage". You can get that by
> calling explain(true) and looking at the analyzed plan.
>
> On Wed, Apr 12, 2017 at 3:03 AM Chang Chen wrote:
>
>> Hi All
>>
>> I believe that there is no lineage between Datasets. Consider this case:
>>
>> val people = spark.read.parquet("...").as[Person]
>>
>> val ageGreatThan30 = people.filter("age > 30")
>>
>> Since the second Dataset can push the filter condition down, the two
>> obviously have different logical plans and hence different physical plans.
>>
>> Is my understanding right?
>>
>> Thanks
>> Chang
Re: Lineage between Datasets
The physical plans are not subtrees, but the analyzed plan (before the optimizer runs) is actually similar to "lineage". You can get that by calling explain(true) and looking at the analyzed plan.

On Wed, Apr 12, 2017 at 3:03 AM Chang Chen wrote:

> Hi All
>
> I believe that there is no lineage between Datasets. Consider this case:
>
> val people = spark.read.parquet("...").as[Person]
>
> val ageGreatThan30 = people.filter("age > 30")
>
> Since the second Dataset can push the filter condition down, the two
> obviously have different logical plans and hence different physical plans.
>
> Is my understanding right?
>
> Thanks
> Chang
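
For anyone who wants to try this, here is a minimal sketch of the explain(true) suggestion. It is not taken from the thread: the Person fields, the in-memory rows standing in for the elided parquet path, the local master setting, and the LineageDemo/build names are all assumptions made for illustration. queryExecution is a developer API on Dataset, so what it prints can differ between Spark versions.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumed schema: the original post does not show the Person class.
case class Person(name: String, age: Long)

object LineageDemo {
  // Hypothetical helper: builds the two Datasets from the question.
  // An in-memory Seq stands in for spark.read.parquet("...").as[Person].
  def build(spark: SparkSession): (Dataset[Person], Dataset[Person]) = {
    import spark.implicits._
    val people = Seq(Person("Ann", 25L), Person("Bob", 42L)).toDS()
    val ageGreatThan30 = people.filter("age > 30")
    (people, ageGreatThan30)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, no cluster
      .appName("lineage-demo")
      .getOrCreate()

    val (people, ageGreatThan30) = build(spark)

    // Prints the parsed, analyzed and optimized logical plans,
    // followed by the physical plan.
    ageGreatThan30.explain(true)

    // The analyzed plans can also be inspected directly to compare
    // the parent Dataset with the one derived from it.
    println(people.queryExecution.analyzed)
    println(ageGreatThan30.queryExecution.analyzed)

    // The optimized and physical plans are produced per Dataset, after
    // rules such as filter pushdown have had a chance to rewrite the tree.
    println(ageGreatThan30.queryExecution.optimizedPlan)
    println(ageGreatThan30.queryExecution.executedPlan)

    spark.stop()
  }
}
```

Comparing the two analyzed plans side by side is the quickest way to see the "lineage"-like relationship described above. The optimized and physical plans are the place to look for the effect of pushdown.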
Lineage between Datasets
Hi All

I believe that there is no lineage between Datasets. Consider this case:

    val people = spark.read.parquet("...").as[Person]
    val ageGreatThan30 = people.filter("age > 30")

Since the second Dataset can push the filter condition down, the two obviously have different logical plans and hence different physical plans.

Is my understanding right?

Thanks
Chang