[jira] [Updated] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods
[ https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34989:

Description:

One of the main performance bottlenecks in query compilation is the overly generic tree transformation methods {{mapChildren}} and {{withNewChildren}} (defined in {{TreeNode}}). These methods have a generic implementation that iterates over the children and relies on reflection to create new instances. We have observed that, especially for queries with large query plans, a significant amount of CPU time is wasted in these methods. In this PR we make these methods more efficient by delegating the iteration and instantiation to concrete node types. The benchmarks show a significant improvement in total query compilation time for queries with large query plans (30-80%), and about 20% on average.

h4. Problem detail

The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. More specifically, this method:
* iterates over all the fields of a node using Scala's product iterator. While the iteration itself is not reflection-based, thanks to the Scala compiler generating the {{Product}} code, we create many anonymous functions and visit many nested structures (recursive calls). The anonymous functions (presumably compiled to Java anonymous inner classes) also rank high in object allocation profiles, so we are putting unnecessary pressure on the GC here.
* does a lot of comparisons. For each element returned by the product iterator, we check whether it is a child (contained in the list of children) and, if so, transform it. We could avoid this by iterating only over the children, but the current implementation must gather all the fields (transforming only the children) so that it can instantiate the object using reflection.
* creates objects using reflection, by delegating to the {{makeCopy}} method, which is several orders of magnitude slower than calling the constructor directly.

h4. Solution

The proposed solution in this PR is straightforward: we rewrite the {{mapChildren}} method in terms of the {{children}} and {{withNewChildren}} methods. The default {{withNewChildren}} method suffers from the same problems as {{mapChildren}}, so we make it more efficient by specializing it in concrete classes. Just as each concrete query plan node already defines its children, it should also define how it can be constructed from a new list of children. In most cases the implementation is a one-liner, thanks to the {{copy}} method generated for Scala case classes. Note that we cannot abstract over the {{copy}} method: the compiler generates it for a case class only if no type higher in the hierarchy already defines it. For most concrete nodes, the implementation of {{withNewChildren}} looks like this:
{code:scala}
override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan =
  copy(children = newChildren)
{code}
The current {{withNewChildren}} method has two properties that we should preserve:
* It returns the same instance if the provided children are the same as its existing children, i.e., it preserves referential equality.
* It copies tags and maintains origin links when a new copy is created.

These properties are hard to enforce in the concrete node type implementations.
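As an illustration of the copy-based construction, here is a minimal, self-contained sketch. {{Plan}}, {{Leaf}}, and {{Filter}} here are hypothetical stand-ins, not Spark's actual classes, and tag/origin handling is omitted:
{code:scala}
// Hypothetical miniature plan hierarchy; names are illustrative only.
sealed trait Plan {
  def children: Seq[Plan]
  def withNewChildren(newChildren: Seq[Plan]): Plan
}

case class Leaf(name: String) extends Plan {
  def children: Seq[Plan] = Nil
  def withNewChildren(newChildren: Seq[Plan]): Plan = this // no children to replace
}

case class Filter(condition: String, child: Plan) extends Plan {
  def children: Seq[Plan] = child :: Nil
  // The one-liner: the compiler-generated copy method calls the constructor
  // directly, so no reflection is involved.
  def withNewChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
}

val plan = Filter("x > 1", Leaf("t"))
val rewritten = plan.withNewChildren(Seq(Leaf("u")))
{code}
Because {{copy}} is generated per case class, each concrete node has to spell this out; it cannot be pulled up into a common base class.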
Therefore, we propose a template method {{withNewChildrenInternal}} that is overridden by the concrete classes, while the {{withNewChildren}} method takes care of referential equality and tag copying:
{code:scala}
override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {
  if (childrenFastEquals(children, newChildren)) {
    this
  } else {
    CurrentOrigin.withOrigin(origin) {
      val res = withNewChildrenInternal(newChildren)
      res.copyTagsFrom(this)
      res
    }
  }
}
{code}
With the refactoring done in a previous PR ([#31932|https://github.com/apache/spark/pull/31932]), most tree node types fall into one of the categories {{Leaf}}, {{Unary}}, {{Binary}}, or {{Ternary}}. These traits provide a more efficient implementation of {{mapChildren}} and define a more specialized version of {{withNewChildrenInternal}} that avoids creating unnecessary lists. For example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows:
{code:scala}
override final def mapChildren(f: T => T): T = {
  val newChild = f(child)
  if (newChild fastEquals child) {
    this.asInstanceOf[T]
  } else {
    CurrentOrigin.withOrigin(origin) {
      val res = withNewChildInternal(newChild)
      res.copyTagsFrom(this.asInstanceOf[T])
      res
    }
  }
}
{code}
h4. Results

With this PR, we have observed significant performance improvements in query compilation time, more specifically in the analysis and optimization phases.
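The {{UnaryLike}}-style specialization can be modeled outside Spark roughly as follows. This is a simplified sketch: {{Expr}}, {{Lit}}, and {{Neg}} are hypothetical node types, reference equality ({{eq}}) stands in for {{fastEquals}}, and tag/origin copying is omitted:
{code:scala}
sealed trait Expr {
  def mapChildren(f: Expr => Expr): Expr
}

case class Lit(v: Int) extends Expr {
  def mapChildren(f: Expr => Expr): Expr = this // leaf: nothing to transform
}

// Unary nodes transform their single child directly: no product iterator,
// no Seq allocation, no reflection-based copying.
trait UnaryLike extends Expr {
  def child: Expr
  def withNewChildInternal(newChild: Expr): Expr
  final override def mapChildren(f: Expr => Expr): Expr = {
    val newChild = f(child)
    // Preserve referential equality when the child is unchanged.
    if (newChild eq child) this else withNewChildInternal(newChild)
  }
}

case class Neg(child: Expr) extends UnaryLike {
  def withNewChildInternal(newChild: Expr): Expr = copy(child = newChild)
}

val e = Neg(Lit(1))
val same = e.mapChildren(identity) // identity returns the same child instance
val doubled = e.mapChildren {
  case Lit(v) => Lit(v * 2)
  case other  => other
}
{code}
When the transformation leaves the child untouched, the same node instance is returned, so unchanged subtrees are shared rather than copied.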
[jira] [Updated] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods
[ https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34989: - Description: One of the main performance bottlenecks in query compilation is overly-generic tree transformation methods, namely {{mapChildren}} and {{withNewChildren}} (defined in {{TreeNode}}). These methods have an overly-generic implementation to iterate over the children and rely on reflection to create new instances. We have observed that, especially for queries with large query plans, a significant amount of CPU cycles are wasted in these methods. In this PR we make these methods more efficient, by delegating the iteration and instantiation to concrete node types. The benchmarks show that we can expect significant performance improvement in total query compilation time in queries with large query plans (from 30-80%) and about 20% on average. h4. Problem detail The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To be more specific, this method: * iterates over all the fields of a node using Scala’s product iterator. While the iteration is not reflection-based, thanks to the Scala compiler generating code for {{Product}}, we create many anonymous functions and visit many nested structures (recursive calls). The anonymous functions (presumably compiled to Java anonymous inner classes) also show up quite high on the list in the object allocation profiles, so we are putting unnecessary pressure on GC here. * does a lot of comparisons. Basically for each element returned from the product iterator, we check if it is a child (contained in the list of children) and then transform it. We can avoid that by just iterating over children, but in the current implementation, we need to gather all the fields (only transform the children) so that we can instantiate the object using the reflection. 
* creates objects using reflection, by delegating to the {{makeCopy}} method, which is several orders of magnitude slower than using the constructor. h4. Solution The proposed solution in this PR is rather straightforward: we rewrite the {{mapChildren}} method using the {{children}} and {{withNewChildren}} methods. The default {{withNewChildren}} method suffers from the same problems as {{mapChildren}} and we need to make it more efficient by specializing it in concrete classes. Similar to how each concrete query plan node already defines its children, it should also define how they can be constructed given a new list of children. Actually, the implementation is quite simple in most cases and is a one-liner thanks to the copy method present in Scala case classes. Note that we cannot abstract over the copy method, it’s generated by the compiler for case classes if no other type higher in the hierarchy defines it. For most concrete nodes, the implementation of {{withNewChildren}} looks like this: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = copy(children = newChildren)}} The current {{withNewChildren}} method has two properties that we should preserve: * It returns the same instance if the provided children are the same as its children, i.e., it preserves referential equality. * It copies tags and maintains the origin links when a new copy is created. These properties are hard to enforce in the concrete node type implementation. 
Therefore, we propose a template method {{withNewChildrenInternal}} that should be rewritten by the concrete classes and let the {{withNewChildren}} method take care of referential equality and copying: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {}} {{ if (childrenTheSame(children, newChildren)) {}} {{ this}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildrenInternal(newChildren)}} {{ res.copyTagsFrom(this)}} {{ res}} } } } With the refactoring done in a previous PR ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types fall in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or {{Ternary}}. These traits have a more efficient implementation for {{mapChildren}} and define a more specialized version of {{withNewChildrenInternal}} that avoids creating unnecessary lists. For example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows: {{override final def mapChildren(f: T => T): T = {}} {{ val newChild = f(child)}} {{ if (newChild fastEquals child) {}} {{ this.asInstanceOf[T]}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildInternal(newChild)}} {{ res.copyTagsFrom(this.asInstanceOf[T])}} {{ res}} } } } h4. Results With this PR, we have observed significant performance improvements in query compilation time, more specifically in the analysis and optimization phases.
[jira] [Updated] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods
[ https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34989: - Description: One of the main performance bottlenecks in query compilation is overly-generic tree transformation methods, namely {{mapChildren}} and {{withNewChildren}} (defined in {{TreeNode}}). These methods have an overly-generic implementation to iterate over the children and rely on reflection to create new instances. We have observed that, especially for queries with large query plans, a significant amount of CPU cycles are wasted in these methods. In this PR we make these methods more efficient, by delegating the iteration and instantiation to concrete node types. The benchmarks show that we can expect significant performance improvement in total query compilation time in queries with large query plans (from 30-80%) and about 20% on average. h4. Problem detail The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To be more specific, this method: * iterates over all the fields of a node using Scala’s product iterator. While the iteration is not reflection-based, thanks to the Scala compiler generating code for {{Product}}, we create many anonymous functions and visit many nested structures (recursive calls). The anonymous functions (presumably compiled to Java anonymous inner classes) also show up quite high on the list in the object allocation profiles, so we are putting unnecessary pressure on GC here. * does a lot of comparisons. Basically for each element returned from the product iterator, we check if it is a child (contained in the list of children) and then transform it. We can avoid that by just iterating over children, but in the current implementation, we need to gather all the fields (only transform the children) so that we can instantiate the object using the reflection. 
* creates objects using reflection, by delegating to the {{makeCopy}} method, which is several orders of magnitude slower than using the constructor. h4. Solution The proposed solution in this PR is rather straightforward: we rewrite the {{mapChildren}} method using the {{children}} and {{withNewChildren}} methods. The default {{withNewChildren}} method suffers from the same problems as {{mapChildren}} and we need to make it more efficient by specializing it in concrete classes. Similar to how each concrete query plan node already defines its children, it should also define how they can be constructed given a new list of children. Actually, the implementation is quite simple in most cases and is a one-liner thanks to the copy method present in Scala case classes. Note that we cannot abstract over the copy method, it’s generated by the compiler for case classes if no other type higher in the hierarchy defines it. For most concrete nodes, the implementation of {{withNewChildren}} looks like this: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = copy(children = newChildren)}} The current {{withNewChildren}} method has two properties that we should preserve: * It returns the same instance if the provided children are the same as its children, i.e., it preserves referential equality. * It copies tags and maintains the origin links when a new copy is created. These properties are hard to enforce in the concrete node type implementation. 
Therefore, we propose a template method {{withNewChildrenInternal}} that should be rewritten by the concrete classes and let the {{withNewChildren}} method take care of referential equality and copying: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {}} {{ if (childrenTheSame(children, newChildren)) {}} {{ this}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildrenInternal(newChildren)}} {{ res.copyTagsFrom(this)}} {{ res}} } } } With the refactoring done in a previous PR ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types fall in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or {{Ternary}}. These traits have a more efficient implementation for {{mapChildren}} and define a more specialized version of {{withNewChildrenInternal}} that avoids creating unnecessary lists. For example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows: {{override final def mapChildren(f: T => T): T = {}} {{ val newChild = f(child)}} {{ if (newChild fastEquals child) {}} {{ this.asInstanceOf[T]}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildInternal(newChild)}} {{ res.copyTagsFrom(this.asInstanceOf[T])}} {{ res}} \{{ }}} \{{ }}} \{{ }}} h4. Results With this PR, we have observed significant performance improvements in query compilation time, more specifically in the analysis and optimization
[jira] [Updated] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods
[ https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34989: - Description: One of the main performance bottlenecks in query compilation is overly-generic tree transformation methods, namely {{mapChildren}} and {{withNewChildren}} (defined in {{TreeNode}}). These methods have an overly-generic implementation to iterate over the children and rely on reflection to create new instances. We have observed that, especially for queries with large query plans, a significant amount of CPU cycles are wasted in these methods. In this PR we make these methods more efficient, by delegating the iteration and instantiation to concrete node types. The benchmarks show that we can expect significant performance improvement in total query compilation time in queries with large query plans (from 30-80%) and about 20% on average. h4. Problem detail The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To be more specific, this method: * iterates over all the fields of a node using Scala’s product iterator. While the iteration is not reflection-based, thanks to the Scala compiler generating code for {{Product}}, we create many anonymous functions and visit many nested structures (recursive calls). The anonymous functions (presumably compiled to Java anonymous inner classes) also show up quite high on the list in the object allocation profiles, so we are putting unnecessary pressure on GC here. * does a lot of comparisons. Basically for each element returned from the product iterator, we check if it is a child (contained in the list of children) and then transform it. We can avoid that by just iterating over children, but in the current implementation, we need to gather all the fields (only transform the children) so that we can instantiate the object using the reflection. 
* creates objects using reflection, by delegating to the {{makeCopy}} method, which is several orders of magnitude slower than using the constructor. h4. Solution The proposed solution in this PR is rather straightforward: we rewrite the {{mapChildren}} method using the {{children}} and {{withNewChildren}} methods. The default {{withNewChildren}} method suffers from the same problems as {{mapChildren}} and we need to make it more efficient by specializing it in concrete classes. Similar to how each concrete query plan node already defines its children, it should also define how they can be constructed given a new list of children. Actually, the implementation is quite simple in most cases and is a one-liner thanks to the copy method present in Scala case classes. Note that we cannot abstract over the copy method, it’s generated by the compiler for case classes if no other type higher in the hierarchy defines it. For most concrete nodes, the implementation of {{withNewChildren}} looks like this: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = copy(children = newChildren)}} The current {{withNewChildren}} method has two properties that we should preserve: * It returns the same instance if the provided children are the same as its children, i.e., it preserves referential equality. * It copies tags and maintains the origin links when a new copy is created. These properties are hard to enforce in the concrete node type implementation. 
Therefore, we propose a template method {{withNewChildrenInternal}} that should be rewritten by the concrete classes and let the {{withNewChildren}} method take care of referential equality and copying: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {}} {{ if (childrenTheSame(children, newChildren)) {}} {{ this}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildrenInternal(newChildren)}} {{ res.copyTagsFrom(this)}} {{ res}} } } } With the refactoring done in a previous PR ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types fall in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or {{Ternary}}. These traits have a more efficient implementation for {{mapChildren}} and define a more specialized version of {{withNewChildrenInternal}} that avoids creating unnecessary lists. For example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows: {{override final def mapChildren(f: T => T): T = {}} {{ val newChild = f(child)}} {{ if (newChild fastEquals child) {}} {{ this.asInstanceOf[T]}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildInternal(newChild)}} {{ res.copyTagsFrom(this.asInstanceOf[T])}} {{ res}} \{{ }}} \{{ }}} \{{ }}} h4. Results With this PR, we have observed significant performance improvements in query compilation time, more specifically in the analysis and optimization
[jira] [Updated] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods
[ https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34989: - Description: One of the main performance bottlenecks in query compilation is overly-generic tree transformation methods, namely {{mapChildren}} and {{withNewChildren}} (defined in {{TreeNode}}). These methods have an overly-generic implementation to iterate over the children and rely on reflection to create new instances. We have observed that, especially for queries with large query plans, a significant amount of CPU cycles are wasted in these methods. In this PR we make these methods more efficient, by delegating the iteration and instantiation to concrete node types. The benchmarks show that we can expect significant performance improvement in total query compilation time in queries with large query plans (from 30-80%) and about 20% on average. h4. Problem detail The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To be more specific, this method: * iterates over all the fields of a node using Scala’s product iterator. While the iteration is not reflection-based, thanks to the Scala compiler generating code for {{Product}}, we create many anonymous functions and visit many nested structures (recursive calls). The anonymous functions (presumably compiled to Java anonymous inner classes) also show up quite high on the list in the object allocation profiles, so we are putting unnecessary pressure on GC here. * does a lot of comparisons. Basically for each element returned from the product iterator, we check if it is a child (contained in the list of children) and then transform it. We can avoid that by just iterating over children, but in the current implementation, we need to gather all the fields (only transform the children) so that we can instantiate the object using the reflection. 
* creates objects using reflection, by delegating to the {{makeCopy}} method, which is several orders of magnitude slower than using the constructor. h4. Solution The proposed solution in this PR is rather straightforward: we rewrite the {{mapChildren}} method using the {{children}} and {{withNewChildren}} methods. The default {{withNewChildren}} method suffers from the same problems as {{mapChildren}} and we need to make it more efficient by specializing it in concrete classes. Similar to how each concrete query plan node already defines its children, it should also define how they can be constructed given a new list of children. Actually, the implementation is quite simple in most cases and is a one-liner thanks to the copy method present in Scala case classes. Note that we cannot abstract over the copy method, it’s generated by the compiler for case classes if no other type higher in the hierarchy defines it. For most concrete nodes, the implementation of {{withNewChildren}} looks like this: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = copy(children = newChildren)}} The current {{withNewChildren}} method has two properties that we should preserve: * It returns the same instance if the provided children are the same as its children, i.e., it preserves referential equality. * It copies tags and maintains the origin links when a new copy is created. These properties are hard to enforce in the concrete node type implementation. 
Therefore, we propose a template method {{withNewChildrenInternal}} that should be rewritten by the concrete classes and let the {{withNewChildren}} method take care of referential equality and copying: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {}} {{ if (childrenTheSame(children, newChildren)) {}} {{ this}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildrenInternal(newChildren)}} {{ res.copyTagsFrom(this)}} {{ res}} {{ } }} {{ }}} {{ } }} With the refactoring done in a previous PR ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types fall in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or {{Ternary}}. These traits have a more efficient implementation for {{mapChildren}} and define a more specialized version of {{withNewChildrenInternal}} that avoids creating unnecessary lists. For example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows: {{override final def mapChildren(f: T => T): T = {}} {{ val newChild = f(child)}} {{ if (newChild fastEquals child) {}} {{ this.asInstanceOf[T]}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildInternal(newChild)}} {{ res.copyTagsFrom(this.asInstanceOf[T])}} {{ res}} {{ }}} {{ }}} {{ }}} h4. Results With this PR, we have observed significant performance improvements in query compilation time, more specifically in the analysis and optimization
[jira] [Updated] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods
[ https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34989: - Description: One of the main performance bottlenecks in query compilation is overly-generic tree transformation methods, namely {{mapChildren}} and {{withNewChildren}} (defined in {{TreeNode}}). These methods have an overly-generic implementation to iterate over the children and rely on reflection to create new instances. We have observed that, especially for queries with large query plans, a significant amount of CPU cycles are wasted in these methods. In this PR we make these methods more efficient, by delegating the iteration and instantiation to concrete node types. The benchmarks show that we can expect significant performance improvement in total query compilation time in queries with large query plans (from 30-80%) and about 20% on average. h4. Problem detail The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To be more specific, this method: * iterates over all the fields of a node using Scala’s product iterator. While the iteration is not reflection-based, thanks to the Scala compiler generating code for {{Product}}, we create many anonymous functions and visit many nested structures (recursive calls). The anonymous functions (presumably compiled to Java anonymous inner classes) also show up quite high on the list in the object allocation profiles, so we are putting unnecessary pressure on GC here. * does a lot of comparisons. Basically for each element returned from the product iterator, we check if it is a child (contained in the list of children) and then transform it. We can avoid that by just iterating over children, but in the current implementation, we need to gather all the fields (only transform the children) so that we can instantiate the object using the reflection. 
* creates objects using reflection, by delegating to the {{makeCopy}} method, which is several orders of magnitude slower than using the constructor. h4. Solution The proposed solution in this PR is rather straightforward: we rewrite the {{mapChildren}} method using the {{children}} and {{withNewChildren}} methods. The default {{withNewChildren}} method suffers from the same problems as {{mapChildren}} and we need to make it more efficient by specializing it in concrete classes. Similar to how each concrete query plan node already defines its children, it should also define how they can be constructed given a new list of children. Actually, the implementation is quite simple in most cases and is a one-liner thanks to the copy method present in Scala case classes. Note that we cannot abstract over the copy method, it’s generated by the compiler for case classes if no other type higher in the hierarchy defines it. For most concrete nodes, the implementation of {{withNewChildren}} looks like this: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = copy(children = newChildren)}} The current {{withNewChildren}} method has two properties that we should preserve: * It returns the same instance if the provided children are the same as its children, i.e., it preserves referential equality. * It copies tags and maintains the origin links when a new copy is created. These properties are hard to enforce in the concrete node type implementation. 
Therefore, we propose a template method {{withNewChildrenInternal}} that should be rewritten by the concrete classes and let the {{withNewChildren}} method take care of referential equality and copying: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {}} {{ if (childrenTheSame(children, newChildren)) {}} {{ this}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildrenInternal(newChildren)}} {{ res.copyTagsFrom(this)}} {{ res}} {{ } }} {{ } }} {{ } }} With the refactoring done in a previous PR ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types fall in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or {{Ternary}}. These traits have a more efficient implementation for {{mapChildren}} and define a more specialized version of {{withNewChildrenInternal}} that avoids creating unnecessary lists. For example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows: {{override final def mapChildren(f: T => T): T = {}} {{ val newChild = f(child)}} {{ if (newChild fastEquals child) {}} {{ this.asInstanceOf[T]}} {{ } else {}} {{ CurrentOrigin.withOrigin(origin) {}} {{ val res = withNewChildInternal(newChild)}} {{ res.copyTagsFrom(this.asInstanceOf[T])}} {{ res}} {{ }}} {{ }}} {{ }}} h4. Results With this PR, we have observed significant performance improvements in query compilation time, more specifically in the analysis and optimization phases.
[jira] [Updated] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods
[ https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34989: - Description: One of the main performance bottlenecks in query compilation is overly-generic tree transformation methods, namely {{mapChildren}} and {{withNewChildren}} (defined in {{TreeNode}}). These methods have an overly-generic implementation to iterate over the children and rely on reflection to create new instances. We have observed that, especially for queries with large query plans, a significant amount of CPU cycles are wasted in these methods. In this PR we make these methods more efficient, by delegating the iteration and instantiation to concrete node types. The benchmarks show that we can expect significant performance improvement in total query compilation time in queries with large query plans (from 30-80%) and about 20% on average. h4. Problem detail The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To be more specific, this method: * iterates over all the fields of a node using Scala’s product iterator. While the iteration is not reflection-based, thanks to the Scala compiler generating code for {{Product}}, we create many anonymous functions and visit many nested structures (recursive calls). The anonymous functions (presumably compiled to Java anonymous inner classes) also show up quite high on the list in the object allocation profiles, so we are putting unnecessary pressure on GC here. * does a lot of comparisons. Basically for each element returned from the product iterator, we check if it is a child (contained in the list of children) and then transform it. We can avoid that by just iterating over children, but in the current implementation, we need to gather all the fields (only transform the children) so that we can instantiate the object using the reflection. 
* creates objects using reflection, by delegating to the {{makeCopy}} method, which is several orders of magnitude slower than using the constructor. h4. Solution The proposed solution in this PR is rather straightforward: we rewrite the {{mapChildren}} method using the {{children}} and {{withNewChildren}} methods. The default {{withNewChildren}} method suffers from the same problems as {{mapChildren}} and we need to make it more efficient by specializing it in concrete classes. Similar to how each concrete query plan node already defines its children, it should also define how they can be constructed given a new list of children. Actually, the implementation is quite simple in most cases and is a one-liner thanks to the copy method present in Scala case classes. Note that we cannot abstract over the copy method, it’s generated by the compiler for case classes if no other type higher in the hierarchy defines it. For most concrete nodes, the implementation of {{withNewChildren}} looks like this: {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = copy(children = newChildren)}} The current {{withNewChildren}} method has two properties that we should preserve: * It returns the same instance if the provided children are the same as its children, i.e., it preserves referential equality. * It copies tags and maintains the origin links when a new copy is created. These properties are hard to enforce in the concrete node type implementation. 
Therefore, we propose a template method, {{withNewChildrenInternal}}, that concrete classes override, while {{withNewChildren}} itself takes care of referential equality and tag copying:

{code:scala}
override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {
  if (childrenTheSame(children, newChildren)) {
    this
  } else {
    CurrentOrigin.withOrigin(origin) {
      val res = withNewChildrenInternal(newChildren)
      res.copyTagsFrom(this)
      res
    }
  }
}
{code}

With the refactoring done in a previous PR ([#31932|https://github.com/apache/spark/pull/31932]), most tree node types fall into one of the categories {{Leaf}}, {{Unary}}, {{Binary}} or {{Ternary}}. These traits provide a more efficient implementation of {{mapChildren}} and define a more specialized {{withNewChildrenInternal}} that avoids creating unnecessary lists. For example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows:

{code:scala}
override final def mapChildren(f: T => T): T = {
  val newChild = f(child)
  if (newChild fastEquals child) {
    this.asInstanceOf[T]
  } else {
    CurrentOrigin.withOrigin(origin) {
      val res = withNewChildInternal(newChild)
      res.copyTagsFrom(this.asInstanceOf[T])
      res
    }
  }
}
{code}

h4. Results

With this PR, we have observed significant performance improvements in query compilation time, more specifically in the analysis and optimization phases.
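The split between {{withNewChildren}} (referential equality in one place) and {{withNewChildrenInternal}} (a case-class {{copy}} one-liner per node) can be sketched on a toy tree. This is a simplified stand-in for Spark's {{TreeNode}}: {{childrenTheSame}} is modeled as pairwise reference equality, and tag/origin copying is omitted; {{Plan}}, {{Leaf}} and {{Filter}} are invented names for illustration.

```scala
// Toy model of the proposed split: the base trait keeps the
// referential-equality fast path, while concrete nodes only implement
// withNewChildrenInternal, typically via the compiler-generated copy method.
trait Plan {
  def children: Seq[Plan]
  protected def withNewChildrenInternal(newChildren: Seq[Plan]): Plan

  final def withNewChildren(newChildren: Seq[Plan]): Plan = {
    val same = children.length == newChildren.length &&
      children.zip(newChildren).forall { case (a, b) => a eq b }
    if (same) this // same children: return the same instance, no allocation
    else withNewChildrenInternal(newChildren)
  }
}

case class Leaf(name: String) extends Plan {
  def children: Seq[Plan] = Nil
  protected def withNewChildrenInternal(newChildren: Seq[Plan]): Plan = this
}

case class Filter(condition: String, child: Plan) extends Plan {
  def children: Seq[Plan] = child :: Nil
  // One-liner thanks to the case-class copy method: no reflection needed.
  protected def withNewChildrenInternal(newChildren: Seq[Plan]): Plan =
    copy(child = newChildren.head)
}
```

Calling {{withNewChildren}} with the existing child returns the very same instance ({{eq}}), while a new child triggers a constructor-based copy instead of reflective {{makeCopy}}.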
[jira] [Created] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods
Ali Afroozeh created SPARK-34989:

Summary: Improve the performance of mapChildren and withNewChildren methods
Key: SPARK-34989
URL: https://issues.apache.org/jira/browse/SPARK-34989
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.2.0
Reporter: Ali Afroozeh
[jira] [Updated] (SPARK-34969) Followup for Refactor TreeNode's children handling methods into specialized traits (SPARK-34906)
[ https://issues.apache.org/jira/browse/SPARK-34969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34969:

Summary: Followup for Refactor TreeNode's children handling methods into specialized traits (SPARK-34906) (was: Followup for SPARK-34906)

> Followup for Refactor TreeNode's children handling methods into specialized traits (SPARK-34906)
> Key: SPARK-34969
> URL: https://issues.apache.org/jira/browse/SPARK-34969
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.1
> Reporter: Ali Afroozeh
> Priority: Major
>
> This is a followup for https://issues.apache.org/jira/browse/SPARK-34906
> In this PR we:
> * Introduce the QuaternaryLike trait for node types with 4 children.
> * Specialize more node types.
> * Fix a number of style errors that were introduced in the original PR.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
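Following the {{LeafLike}}/{{UnaryLike}}/{{TernaryLike}} pattern from SPARK-34906, a {{QuaternaryLike}} trait for four-child nodes could look like the sketch below. {{TreeNode}} here is a minimal stand-in for Spark's, and the field names ({{first}} through {{fourth}}) and the {{Leaf}}/{{Quad}} node types are assumptions for illustration, not the exact Spark API.

```scala
// Minimal stand-in for Spark's TreeNode, just enough for the sketch.
trait TreeNode[T <: TreeNode[T]] { self: T =>
  def children: Seq[T]
}

// QuaternaryLike by analogy with TernaryLike: the children list is
// derived once, lazily, from the four named child accessors.
trait QuaternaryLike[T <: TreeNode[T]] extends TreeNode[T] { self: T =>
  def first: T
  def second: T
  def third: T
  def fourth: T
  @transient override final lazy val children: Seq[T] =
    first :: second :: third :: fourth :: Nil
}

trait Node extends TreeNode[Node]
case class Leaf(id: Int) extends Node { def children: Seq[Node] = Nil }
case class Quad(first: Node, second: Node, third: Node, fourth: Node)
  extends Node with QuaternaryLike[Node]
```

Mixing the trait into a case class whose constructor parameters match the four accessors gives it the specialized {{children}} for free.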
[jira] [Created] (SPARK-34969) Followup for SPARK-34906
Ali Afroozeh created SPARK-34969:

Summary: Followup for SPARK-34906
Key: SPARK-34969
URL: https://issues.apache.org/jira/browse/SPARK-34969
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.1.1
Reporter: Ali Afroozeh
[jira] [Updated] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34906:

Description: The Spark query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with a fixed number of children, for example {{UnaryExpression}}, {{UnaryNode}} and {{UnaryExec}} for representing an expression, a logical plan and a physical plan with only one child, respectively. This PR refactors the {{TreeNode}} hierarchy by extracting the children-handling functionality into the following traits. {{UnaryExpression}} and other similar classes now extend the corresponding new trait:

{code:scala}
trait LeafLike[T <: TreeNode[T]] { self: T =>
  override final def children: Seq[T] = Nil
}

trait UnaryLike[T <: TreeNode[T]] { self: T =>
  def child: T
  @transient override final lazy val children: Seq[T] = child :: Nil
}

trait BinaryLike[T <: TreeNode[T]] { self: T =>
  def left: T
  def right: T
  @transient override final lazy val children: Seq[T] = left :: right :: Nil
}

trait TernaryLike[T <: TreeNode[T]] { self: T =>
  def first: T
  def second: T
  def third: T
  @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil
}
{code}

This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits:
* It moves the children handling to a single place, instead of it being spread across specific subclasses, which will help future optimizations of tree traversals.
* It allows mixing these traits into concrete node types that could not extend the previous classes. For example, expressions with one child that extend {{AggregateFunction}} cannot extend {{UnaryExpression}}, because {{AggregateFunction}} declares the {{foldable}} method final while {{UnaryExpression}} defines it as non-final. With the new traits, we can mix {{UnaryLike}} directly into the concrete class in these cases.

Classes with more specific child handling make tree traversal methods faster. In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling.

> Refactor TreeNode's children handling methods into specialized traits
> Key: SPARK-34906
> URL: https://issues.apache.org/jira/browse/SPARK-34906
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.1
> Reporter: Ali Afroozeh
> Priority: Major
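The mix-in style described above can be sketched as runnable code. The description defines the traits with self-types only; here, so the sketch compiles standalone, they extend a minimal toy {{TreeNode}}, and the node types ({{Lit}}, {{Neg}}, {{Add}}) are invented for illustration rather than taken from Spark.

```scala
// Toy TreeNode plus the Leaf/Unary/Binary trait pattern from the
// description: each trait derives the children list from named accessors.
trait TreeNode[T <: TreeNode[T]] { self: T =>
  def children: Seq[T]
}

trait LeafLike[T <: TreeNode[T]] extends TreeNode[T] { self: T =>
  override final def children: Seq[T] = Nil
}

trait UnaryLike[T <: TreeNode[T]] extends TreeNode[T] { self: T =>
  def child: T
  @transient override final lazy val children: Seq[T] = child :: Nil
}

trait BinaryLike[T <: TreeNode[T]] extends TreeNode[T] { self: T =>
  def left: T
  def right: T
  @transient override final lazy val children: Seq[T] = left :: right :: Nil
}

// Concrete nodes get their children handling entirely from the mixed-in
// trait; the case-class fields double as the trait's accessors.
trait Expr extends TreeNode[Expr]
case class Lit(value: Int) extends Expr with LeafLike[Expr]
case class Neg(child: Expr) extends Expr with UnaryLike[Expr]
case class Add(left: Expr, right: Expr) extends Expr with BinaryLike[Expr]
```

Because {{children}} now comes from a trait rather than a superclass, a node in the position of the aggregate-function example above can mix in {{UnaryLike}} without having to extend a class like {{UnaryExpression}}.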
[jira] [Updated] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34906: - Description: Spark query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with only one child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children handling functionality into the following traits. The former nodes such as `UnaryExpression` now extend the corresponding new trait: {{trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ override final def children: Seq[T] = Nil}} {{}}} {{trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def child: T}} {{ @transient override final lazy val children: Seq[T] = child :: Nil}} {{}}} {{trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def left: T}} {{ def right: T}} {{ @transient override final lazy val children: Seq[T] = left :: right :: Nil}} {{}}} {{trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def first: T}} {{ def second: T}} {{ def third: T}} {{ @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil}} {{}}} This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits: * It moves the children handling to a single place, instead of being spread in specific subclasses, which will help the future optimizations for tree traversals. * It allows to mix in these traits with some concrete node types that could not extend the previous classes. For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression` as `AggregateFunction` defines the `foldable` method final while `UnaryExpression` defines it as non final. 
With the new traits, we can directly extend the concrete class from `UnaryLike` in these cases. Classes with more specific child handling will make tree traversal methods faster. In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling. was: Spark query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children handling functionality into the following traits. The former nodes such as `UnaryExpression` now extend the corresponding new trait: {{trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ override final def children: Seq[T] = Nil}} {{}}} {{trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def child: T}} {{ @transient override final lazy val children: Seq[T] = child :: Nil}} {{}}} {{trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def left: T}} {{ def right: T}} {{ @transient override final lazy val children: Seq[T] = left :: right :: Nil}} {{}}} {{trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def first: T}} {{ def second: T}} {{ def third: T}} {{ @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil}} {{}}} This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits: * It moves the children handling to a single place, instead of being spread in specific subclasses, which will help the future optimizations for tree traversals. * It allows to mix in these traits with some concrete node types that could not extend the previous classes. 
For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression` as `AggregateFunction` defines the `foldable` method final while `UnaryExpression` defines it as non final. With the new traits, we can directly extend the concrete class from `UnaryLike` in these cases. Classes with more specific child handling will make tree traversal methods faster. In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling. > Refactor TreeNode's children handling methods into specialized traits > - > > Key: SPARK-34906 > URL: https://issues.apache.org/jira/browse/SPARK-34906 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Ali Afroozeh >Priority: Major > > Spark query plan node hierarchy has specialized traits (or abstract classes) > for handling
[jira] [Created] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
Ali Afroozeh created SPARK-34906: Summary: Refactor TreeNode's children handling methods into specialized traits Key: SPARK-34906 URL: https://issues.apache.org/jira/browse/SPARK-34906 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.1 Reporter: Ali Afroozeh Spark's query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with a fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with one child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children handling functionality into the following traits. The former nodes such as `UnaryExpression` now extend the corresponding new trait: {{trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ override final def children: Seq[T] = Nil}} {{}}} {{trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def child: T}} {{ @transient override final lazy val children: Seq[T] = child :: Nil}} {{}}} {{trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def left: T}} {{ def right: T}} {{ @transient override final lazy val children: Seq[T] = left :: right :: Nil}} {{}}} {{trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def first: T}} {{ def second: T}} {{ def third: T}} {{ @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil}} {{}}} This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits: * It moves the children handling to a single place, instead of it being spread across specific subclasses, which will help future optimizations for tree traversals. * It allows mixing in these traits with some concrete node types that could not extend the previous classes. 
For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression` as `AggregateFunction` defines the `foldable` method final while `UnaryExpression` defines it as non final. With the new traits, we can directly extend the concrete class from `UnaryLike` in these cases. Classes with more specific child handling will make tree traversal methods faster. In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34906: - Description: Spark's query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with a fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with one child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children handling functionality into the following traits. The former nodes such as `UnaryExpression` now extend the corresponding new trait: {{trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ override final def children: Seq[T] = Nil}} {{}}} {{trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def child: T}} {{ @transient override final lazy val children: Seq[T] = child :: Nil}} {{}}} {{trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def left: T}} {{ def right: T}} {{ @transient override final lazy val children: Seq[T] = left :: right :: Nil}} {{}}} {{trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def first: T}} {{ def second: T}} {{ def third: T}} {{ @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil}} {{}}} This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits: * It moves the children handling to a single place, instead of it being spread across specific subclasses, which will help future optimizations for tree traversals. * It allows mixing in these traits with some concrete node types that could not extend the previous classes. For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression` as `AggregateFunction` defines the `foldable` method as final while `UnaryExpression` defines it as non-final. 
With the new traits, we can directly extend the concrete class from `UnaryLike` in these cases. Classes with more specific child handling will make tree traversal methods faster. In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling. was: Spark query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children handling functionality into the following traits. The former nodes such as `UnaryExpression` now extend the corresponding new trait: {{trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ override final def children: Seq[T] = Nil}} {{}}} {{trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def child: T}} {{ @transient override final lazy val children: Seq[T] = child :: Nil}} {{}}} {{trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def left: T}} {{ def right: T}} {{ @transient override final lazy val children: Seq[T] = left :: right :: Nil}} {{}}} {{trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def first: T}} {{ def second: T}} {{ def third: T}} {{ @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil}} {{}}} * This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits: It moves the children handling to a single place, instead of being spread in specific subclasses, which will help the future optimizations for tree traversals. * It allows to mix in these traits with some concrete node types that could not extend the previous classes. 
For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression` as `AggregateFunction` defines the `foldable` method final while `UnaryExpression` defines it as non final. With the new traits, we can directly extend the concrete class from `UnaryLike` in these cases. Classes with more specific child handling will make tree traversal methods faster. In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling. > Refactor TreeNode's children handling methods into specialized traits > - > > Key: SPARK-34906 > URL: https://issues.apache.org/jira/browse/SPARK-34906 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Ali Afroozeh >Priority: Major > > Spark query plan node hierarchy has specialized traits (or abstract classes) > for handling nodes with fixed number of
[jira] [Created] (SPARK-32800) Remove ExpressionSet from the 2.13 branch
Ali Afroozeh created SPARK-32800: Summary: Remove ExpressionSet from the 2.13 branch Key: SPARK-32800 URL: https://issues.apache.org/jira/browse/SPARK-32800 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Ali Afroozeh ExpressionSet does not extend Scala's Set anymore and can therefore be removed from the 2.13 branch. This is a follow-up on https://issues.apache.org/jira/browse/SPARK-32755.
[jira] [Created] (SPARK-32755) Maintain the order of expressions in AttributeSet and ExpressionSet
Ali Afroozeh created SPARK-32755: Summary: Maintain the order of expressions in AttributeSet and ExpressionSet Key: SPARK-32755 URL: https://issues.apache.org/jira/browse/SPARK-32755 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Ali Afroozeh Expression identity is based on the ExprId, which is an auto-incremented number. This means that the same query can yield a query plan with different expression ids in different runs. AttributeSet and ExpressionSet internally use a HashSet as the underlying data structure, and therefore cannot guarantee a fixed order of operations in different runs. This can be problematic in cases where we want to check for plan changes across runs. We make the following changes to AttributeSet and ExpressionSet to maintain the insertion order of the elements: * We change the underlying data structure of AttributeSet from HashSet to LinkedHashSet to maintain the insertion order. * ExpressionSet already uses a list to keep track of the expressions; however, since it extends Scala's immutable.Set class, operations such as map and flatMap are delegated to the immutable.Set itself. This means that the result of these operations is no longer an instance of ExpressionSet, but rather an implementation picked by the parent class. We therefore remove this inheritance from immutable.Set and implement the needed methods directly. ExpressionSet has very specific semantics and it does not make sense to extend immutable.Set anyway. * We change the PlanStabilitySuite to not sort the attributes, to be able to catch changes in the order of expressions in different runs.
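The HashSet-versus-LinkedHashSet difference can be illustrated outside Spark; the element strings below are made up for the example and do not come from a real plan:

```scala
import scala.collection.mutable

// LinkedHashSet iterates in insertion order; HashSet gives no such guarantee,
// which is why a HashSet-backed AttributeSet can print differently per run.
val inserted = Seq("a#42", "b#7", "c#19", "d#3")

val linked = mutable.LinkedHashSet.empty[String]
inserted.foreach(linked += _)
assert(linked.toSeq == inserted)   // insertion order preserved

val hashed = mutable.HashSet.empty[String]
inserted.foreach(hashed += _)      // same elements, unspecified iteration order
assert(hashed == linked)           // set equality ignores order
```

Both sets are equal as sets; only the iteration order differs, and it is exactly that order that plan-stability checks observe.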
[jira] [Updated] (SPARK-31721) Assert optimized plan is initialized before tracking the execution of planning
[ https://issues.apache.org/jira/browse/SPARK-31721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-31721: - Description: The {{QueryPlanningTracker}} in {{QueryExecution}} reports a planning time that also includes the optimization time. This happens because the {{optimizedPlan}} in {{QueryExecution}} is lazy and will only initialize when first called. When {{df.queryExecution.executedPlan}} is called, the tracker starts recording the planning time and then calls the optimized plan. This causes the planning measurement to start before optimization and therefore include the optimization time. This PR fixes this behavior by introducing a method {{assertOptimized}}, similar to {{assertAnalyzed}}, that explicitly initializes the optimized plan. This method is called before measuring the time for {{sparkPlan}} and {{executedPlan}}. We call it before {{sparkPlan}} because that also counts as planning time. was: The {{QueryPlanningTracker}} in {{QueryExecution}} reports a planning time that also includes the optimization time. This happens because the {{optimizedPlan}} in {{QueryExecution}} is lazy and will only initialize when first called. When {{df.queryExecution.executedPlan}} is called, the tracker starts recording the planning time and then calls the optimized plan. This causes the planning measurement to start before optimization and therefore include the optimization time. This PR fixes this behavior by introducing a method {{assertOptimized}}, similar to {{assertAnalyzed}}, that explicitly initializes the optimized plan. This method is called before measuring the time for {{sparkPlan}} and {{executedPlan}}. We call it before {{sparkPlan}} because that also counts as planning time. 
> Assert optimized plan is initialized before tracking the execution of planning > -- > > Key: SPARK-31721 > URL: https://issues.apache.org/jira/browse/SPARK-31721 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ali Afroozeh >Priority: Major > > The {{QueryPlanningTracker}} in {{QueryExecution}} reports a planning time > that also includes the optimization time. This happens because the > {{optimizedPlan}} in {{QueryExecution}} is lazy and will only initialize when > first called. When {{df.queryExecution.executedPlan}} is called, the > tracker starts recording the planning time and then calls the optimized > plan. This causes the planning measurement to start before optimization and therefore > include the optimization time. > This PR fixes this behavior by introducing a method {{assertOptimized}}, > similar to {{assertAnalyzed}}, that explicitly initializes the optimized plan. > This method is called before measuring the time for {{sparkPlan}} and > {{executedPlan}}. We call it before {{sparkPlan}} because that also counts as > planning time.
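The measurement problem above can be reproduced with any lazy val: unless the value is forced first, the first timed access pays its initialization cost. The sketch below uses made-up names (`QueryExecutionSketch`, strings as plans), not Spark's actual `QueryExecution`:

```scala
// Sketch: a lazy "optimized plan" whose first access is expensive.
// assertOptimized forces it, so a later measurement excludes optimization.
class QueryExecutionSketch {
  lazy val optimizedPlan: String = {
    Thread.sleep(100)                          // stand-in for the optimizer's work
    "optimized plan"
  }

  def assertOptimized(): Unit = optimizedPlan  // force lazy initialization

  def timePlanning(): Long = {
    assertOptimized()                          // without this, the 100 ms above land in the timing
    val start = System.nanoTime()
    val physical = optimizedPlan + " -> physical plan"  // stand-in for physical planning
    System.nanoTime() - start
  }
}

val elapsedMs = new QueryExecutionSketch().timePlanning() / 1000000
assert(elapsedMs < 100)                        // optimization time is not counted
```

Commenting out the `assertOptimized()` call makes the measured time jump by the full optimization cost, which is the exact distortion the PR removes.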
[jira] [Created] (SPARK-31721) Assert optimized plan is initialized before tracking the execution of planning
Ali Afroozeh created SPARK-31721: Summary: Assert optimized plan is initialized before tracking the execution of planning Key: SPARK-31721 URL: https://issues.apache.org/jira/browse/SPARK-31721 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Ali Afroozeh The {{QueryPlanningTracker}} in {{QueryExecution}} reports a planning time that also includes the optimization time. This happens because the {{optimizedPlan}} in {{QueryExecution}} is lazy and will only initialize when first called. When {{df.queryExecution.executedPlan}} is called, the tracker starts recording the planning time and then calls the optimized plan. This causes the planning measurement to start before optimization and therefore include the optimization time. This PR fixes this behavior by introducing a method {{assertOptimized}}, similar to {{assertAnalyzed}}, that explicitly initializes the optimized plan. This method is called before measuring the time for {{sparkPlan}} and {{executedPlan}}. We call it before {{sparkPlan}} because that also counts as planning time.
[jira] [Created] (SPARK-31719) Refactor JoinSelection
Ali Afroozeh created SPARK-31719: Summary: Refactor JoinSelection Key: SPARK-31719 URL: https://issues.apache.org/jira/browse/SPARK-31719 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Ali Afroozeh This PR extracts the logic for selecting the planned join type out of the `JoinSelection` rule and moves it to `JoinSelectionHelper` in Catalyst. This change both cleans up the code in `JoinSelection` and keeps the logic in one place, where it can be used by other rules that need to make decisions based on the join type before planning time.
[jira] [Created] (SPARK-31192) Introduce PushProjectThroughLimit
Ali Afroozeh created SPARK-31192: Summary: Introduce PushProjectThroughLimit Key: SPARK-31192 URL: https://issues.apache.org/jira/browse/SPARK-31192 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Ali Afroozeh Currently the {{CollapseProject}} rule does many things: not only does it collapse stacked projects, it also pushes projects down into limits, windows, etc. In this PR we factored the rules that push projects into limits out of {{CollapseProject}} and introduced a new rule called {{PushProjectThroughLimit}}.
[jira] [Created] (SPARK-30798) Scope Session.active in QueryExecution
Ali Afroozeh created SPARK-30798: Summary: Scope Session.active in QueryExecution Key: SPARK-30798 URL: https://issues.apache.org/jira/browse/SPARK-30798 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Ali Afroozeh SparkSession.active is a thread-local variable that points to the current thread's Spark session. It is important to note that the SQLConf.get method depends on SparkSession.active. In the current implementation it is possible that SparkSession.active points to a different session, which causes various problems. Most of these problems arise because part of the query processing is done using the configurations of a different session. For example, when creating a data frame using a new session, i.e., session.sql("..."), part of the data frame is constructed using the currently active Spark session, which can be a different session from the one used later for processing the query. This PR scopes SparkSession.active to prevent the above-mentioned problems. A new method, withActive, is introduced on SparkSession that restores the previous Spark session after the block of code is executed.
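The withActive pattern described above can be sketched with a plain ThreadLocal; `SessionSketch` and the string "sessions" below are illustrative stand-ins, not Spark's SparkSession API:

```scala
// Sketch of the withActive pattern: set the thread-local active session,
// run the block, and always restore the previous one. Strings stand in
// for real SparkSession instances.
object SessionSketch {
  private val active = new ThreadLocal[String]

  def withActive[A](session: String)(body: => A): A = {
    val previous = active.get()
    active.set(session)
    try body
    finally active.set(previous)     // restored even if body throws
  }

  def current: String = active.get()
}

SessionSketch.withActive("outer") {
  SessionSketch.withActive("inner") {
    assert(SessionSketch.current == "inner")
  }
  assert(SessionSketch.current == "outer")  // the inner block restored it
}
```

The try/finally is the key design choice: the previously active session is restored on every exit path, so a failure inside the block cannot leave a stale session active on the thread.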
[jira] [Updated] (SPARK-30072) Create dedicated planner for subqueries
[ https://issues.apache.org/jira/browse/SPARK-30072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-30072: - Description: This PR changes subquery planning by calling the planner and plan preparation rules on the subquery plan directly. Previously, we created a QueryExecution instance for subqueries to get the executedPlan. This re-ran analysis and optimization on the subquery's plan. Running the analysis again on an optimized query plan can have unwanted consequences, as some rules, for example DecimalPrecision, are not idempotent. As an example, consider the expression 1.7 * avg(a) which after applying the DecimalPrecision rule becomes: promote_precision(1.7) * promote_precision(avg(a)) After the optimization, more specifically the constant folding rule, this expression becomes: 1.7 * promote_precision(avg(a)) Now if we run the analyzer on this optimized query again, we will get: promote_precision(1.7) * promote_precision(promote_precision(avg(a))) which will later be optimized as: 1.7 * promote_precision(promote_precision(avg(a))) As can be seen, re-running the analysis and optimization on this expression results in an expression with extra nested promote_precision nodes. Adding unneeded nodes to the plan is problematic because it can eliminate situations where we can reuse the plan. We opted to introduce dedicated planners for subqueries, instead of making the DecimalPrecision rule idempotent, because this eliminates this entire category of problems. Another benefit is that planning time for subqueries is reduced. > Create dedicated planner for subqueries > --- > > Key: SPARK-30072 > URL: https://issues.apache.org/jira/browse/SPARK-30072 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ali Afroozeh >Priority: Minor > > This PR changes subquery planning by calling the planner and plan preparation > rules on the subquery plan directly. 
Before we were creating a QueryExecution > instance for subqueries to get the executedPlan. This would re-run analysis > and optimization on the subquery's plan. Running the analysis again on an > optimized query plan can have unwanted consequences, as some rules, for > example DecimalPrecision, are not idempotent. > As an example, consider the expression 1.7 * avg(a) which after applying the > DecimalPrecision rule becomes: > promote_precision(1.7) * promote_precision(avg(a)) > After the optimization, more specifically the constant folding rule, this > expression becomes: > 1.7 * promote_precision(avg(a)) > Now if we run the analyzer on this optimized query again, we will get: > promote_precision(1.7) * promote_precision(promote_precision(avg(a))) > which will later be optimized as: > 1.7 * promote_precision(promote_precision(avg(a))) > As can be seen, re-running the analysis and optimization on this expression > results in an expression with extra nested promote_precision nodes. Adding > unneeded nodes to the plan is problematic because it can eliminate situations > where we can reuse the plan. > We opted to introduce dedicated planners for subqueries, instead of making > the DecimalPrecision rule idempotent, because this eliminates this entire > category of problems. Another benefit is that planning time for subqueries is > reduced.
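The non-idempotence described above can be mimicked with a toy expression tree. The `Promote` wrapper and the two rules below are illustrative sketches of a DecimalPrecision-style rewrite, not Spark's actual implementation:

```scala
// Sketch of a non-idempotent analysis rule: re-running it wraps already
// promoted operands again, producing the nested promote_precision nodes
// described above.
sealed trait Expr
case class Mul(l: Expr, r: Expr) extends Expr
case class Avg(col: String) extends Expr
case class Lit(v: BigDecimal) extends Expr
case class Promote(e: Expr) extends Expr

// The "analysis" rule wraps every Mul operand, whether or not it is wrapped.
def promoteRule(e: Expr): Expr = e match {
  case Mul(l, r) => Mul(Promote(promoteRule(l)), Promote(promoteRule(r)))
  case other     => other
}

// "Constant folding" drops the wrapper around literals only.
def fold(e: Expr): Expr = e match {
  case Mul(l, r)       => Mul(fold(l), fold(r))
  case Promote(l: Lit) => l
  case other           => other
}

// Analyze + optimize once: 1.7 * promote_precision(avg(a))
val once = fold(promoteRule(Mul(Lit(BigDecimal("1.7")), Avg("a"))))
assert(once == Mul(Lit(BigDecimal("1.7")), Promote(Avg("a"))))

// Re-running analysis nests the wrappers instead of leaving them alone.
val twice = promoteRule(once)
assert(twice == Mul(Promote(Lit(BigDecimal("1.7"))), Promote(Promote(Avg("a")))))
```

Each extra analysis pass adds one more layer of `Promote`, so plans that should be equal stop matching, which is exactly what defeats plan reuse.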
[jira] [Created] (SPARK-30072) Create dedicated planner for subqueries
Ali Afroozeh created SPARK-30072: Summary: Create dedicated planner for subqueries Key: SPARK-30072 URL: https://issues.apache.org/jira/browse/SPARK-30072 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Environment: This PR changes subquery planning by calling the planner and plan preparation rules on the subquery plan directly. Previously, we created a QueryExecution instance for subqueries to get the executedPlan. This re-ran analysis and optimization on the subquery's plan. Running the analysis again on an optimized query plan can have unwanted consequences, as some rules, for example DecimalPrecision, are not idempotent. As an example, consider the expression 1.7 * avg(x) which after applying the DecimalPrecision rule becomes: promote_precision(1.7) * promote_precision(avg(x)) After the optimization, more specifically the constant folding rule, this expression becomes: 1.7 * promote_precision(avg(x)) Now if we run the analyzer on this optimized query again, we will get: promote_precision(1.7) * promote_precision(promote_precision(avg(x))) which will later be optimized as: 1.7 * promote_precision(promote_precision(avg(x))) As can be seen, re-running the analysis and optimization on this expression results in an expression with extra nested promote_precision nodes. Adding unneeded nodes to the plan is problematic because it can eliminate situations where we can reuse the plan. We opted to introduce dedicated planners for subqueries, instead of making the DecimalPrecision rule idempotent, because this eliminates this entire category of problems. Another benefit is that planning time for subqueries is reduced. Reporter: Ali Afroozeh
[jira] [Updated] (SPARK-30072) Create dedicated planner for subqueries
[ https://issues.apache.org/jira/browse/SPARK-30072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-30072: - Environment: (was: This PR changes subquery planning by calling the planner and plan preparation rules on the subquery plan directly. Before we were creating a QueryExecution instance for subqueries to get the executedPlan. This would re-run analysis and optimization on the subqueries plan. Running the analysis again on an optimized query plan can have unwanted consequences, as some rules, for example DecimalPrecision, are not idempotent. As an example, consider the expression 1.7 * avg(x) which after applying the DecimalPrecision rule becomes: promote_precision(1.7) * promote_precision(avg(x)) After the optimization, more specifically the constant folding rule, this expression becomes: 1.7 * promote_precision(avg(x)) Now if we run the analyzer on this optimized query again, we will get: promote_precision(1.7) * promote_precision(promote_precision(avg(x))) Which will later optimized as: 1.7 * promote_precision(promote_precision(avg(x))) As can be seen, re-running the analysis and optimization on this expression results in an expression with extra nested promote_preceision nodes. Adding unneeded nodes to the plan is problematic because it can eliminate situations where we can reuse the plan. We opted to introduce dedicated planners for subuqueries, instead of making the DecimalPrecision rule idempotent, because this eliminates this entire category of problems. Another benefit is that planning time for subqueries is reduced.) 
> Create dedicated planner for subqueries > --- > > Key: SPARK-30072 > URL: https://issues.apache.org/jira/browse/SPARK-30072 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ali Afroozeh >Priority: Minor >
[jira] [Updated] (SPARK-28836) Remove the canonicalize(attributes) method from PlanExpression
[ https://issues.apache.org/jira/browse/SPARK-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-28836: - Summary: Remove the canonicalize(attributes) method from PlanExpression (was: Remove the canonicalize() method ) > Remove the canonicalize(attributes) method from PlanExpression > -- > > Key: SPARK-28836 > URL: https://issues.apache.org/jira/browse/SPARK-28836 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ali Afroozeh >Priority: Minor > > The canonicalize(attrs: AttributeSeq) method in PlanExpression is somewhat > confusing. > First, it is not clear why `PlanExpression` should have this method, and why > the canonicalization is not handled > by the canonicalized method of its parent, the Expression class. Second, the > QueryPlan.normalizeExpressionId > is the only place where PlanExpression.canonicalized is being called. > This PR removes the canonicalize method from the PlanExpression class and > delegates the normalization of expression ids to > the QueryPlan.normalizedExpressionId method. Also, the name > normalizedExpressions is more suitable for this method, > therefore, the method has also been renamed.
[jira] [Updated] (SPARK-28836) Remove the canonicalize() method
[ https://issues.apache.org/jira/browse/SPARK-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-28836: - Summary: Remove the canonicalize() method (was: Improve canonicalize API) > Remove the canonicalize() method > - > > Key: SPARK-28836 > URL: https://issues.apache.org/jira/browse/SPARK-28836 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ali Afroozeh >Priority: Minor > > The canonicalize(attrs: AttributeSeq) method in PlanExpression is somewhat > confusing. > First, it is not clear why `PlanExpression` should have this method, and why > the canonicalization is not handled > by the canonicalized method of its parent, the Expression class. Second, the > QueryPlan.normalizeExpressionId > is the only place where PlanExpression.canonicalized is being called. > This PR removes the canonicalize method from the PlanExpression class and > delegates the normalization of expression ids to > the QueryPlan.normalizedExpressionId method. Also, the name > normalizedExpressions is more suitable for this method, > therefore, the method has also been renamed.
[jira] [Updated] (SPARK-28836) Improve canonicalize API
[ https://issues.apache.org/jira/browse/SPARK-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-28836: - Description: The canonicalize(attrs: AttributeSeq) method in PlanExpression is somewhat confusing. First, it is not clear why `PlanExpression` should have this method, and why the canonicalization is not handled by the canonicalized method of its parent, the Expression class. Second, the QueryPlan.normalizeExpressionId is the only place where PlanExpression.canonicalized is being called. This PR removes the canonicalize method from the PlanExpression class and delegates the normalization of expression ids to the QueryPlan.normalizedExpressionId method. Also, the name normalizedExpressions is more suitable for this method, therefore, the method has also been renamed. was: The canonicalize(attrs: AttributeSeq) method in PlanExpression is somewhat confusing. First, it is not clear why `PlanExpression` should have this method, and why the canonicalization is not handled by the canonicalized method of its parent, the Expression class. Second, the QueryPlan.normalizeExpressionId is the only place where PlanExpression.canonicalized is being called. This PR simplifies the canonicalize method on PlanExpression and delegates the normalization of expression ids to the QueryPlan.normalizedExpressionId method. Also, the name normalizedExpressions is more suitable for this method, therefore, the method has also been renamed. > Improve canonicalize API > > > Key: SPARK-28836 > URL: https://issues.apache.org/jira/browse/SPARK-28836 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ali Afroozeh >Priority: Minor > > The canonicalize(attrs: AttributeSeq) method in PlanExpression is somewhat > confusing. > First, it is not clear why `PlanExpression` should have this method, and why > the canonicalization is not handled > by the canonicalized method of its parent, the Expression class. 
Second, the > QueryPlan.normalizeExpressionId > is the only place where PlanExpression.canonicalized is being called. > This PR removes the canonicalize method from the PlanExpression class and > delegates the normalization of expression ids to > the QueryPlan.normalizedExpressionId method. Also, the name > normalizedExpressions is more suitable for this method, > therefore, the method has also been renamed.
[jira] [Updated] (SPARK-28836) Improve canonicalize API
[ https://issues.apache.org/jira/browse/SPARK-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-28836: - Description: The canonicalize(attrs: AttributeSeq) method in PlanExpression is somewhat confusing. First, it is not clear why `PlanExpression` should have this method, and why the canonicalization is not handled by the canonicalized method of its parent, the Expression class. Second, the QueryPlan.normalizeExpressionId is the only place where PlanExpression.canonicalized is being called. This PR simplifies the canonicalize method on PlanExpression and delegates the normalization of expression ids to the QueryPlan.normalizedExpressionId method. Also, the name normalizedExpressions is more suitable for this method, therefore, the method has also been renamed. was:This PR improves the `canonicalize` API by removing the method `def canonicalize(attrs: AttributeSeq): PlanExpression[T]` in `PlanExpression` and taking care of normalizing expressions in `QueryPlan`. > Improve canonicalize API > > > Key: SPARK-28836 > URL: https://issues.apache.org/jira/browse/SPARK-28836 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ali Afroozeh >Priority: Minor > > The canonicalize(attrs: AttributeSeq) method in PlanExpression is somewhat > confusing. > First, it is not clear why `PlanExpression` should have this method, and why > the canonicalization is not handled > by the canonicalized method of its parent, the Expression class. Second, the > QueryPlan.normalizeExpressionId > is the only place where PlanExpression.canonicalized is being called. > This PR simplifies the canonicalize method on PlanExpression and delegates > the normalization of expression ids to > the QueryPlan.normalizedExpressionId method. Also, the name > normalizedExpressions is more suitable for this method, > therefore, the method has also been renamed. 
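The refactor described in the updates above can be sketched as follows. This is an illustrative sketch only, not actual Spark source: the simplified types (`ExprId`, `AttributeRef`) and the `normalizeExpressions` signature are assumptions inferred from the issue text. The idea is that instead of a `PlanExpression` canonicalizing itself against an attribute sequence, the `QueryPlan` side rewrites expression ids to ordinals in the plan's input, so that semantically equal plans canonicalize to equal forms regardless of the ids assigned during analysis:

```scala
// Simplified stand-ins for Catalyst's attribute types (assumptions, not
// the real classes).
final case class ExprId(id: Long)
final case class AttributeRef(name: String, exprId: ExprId)

object QueryPlan {
  // Replaces each attribute's analysis-time ExprId with its ordinal
  // position in the input attribute sequence; ids not found in the input
  // are left unchanged.
  def normalizeExpressions(
      attrs: Seq[AttributeRef],
      input: Seq[AttributeRef]): Seq[AttributeRef] = {
    val ordinal: Map[ExprId, Int] = input.map(_.exprId).zipWithIndex.toMap
    attrs.map { a =>
      ordinal.get(a.exprId) match {
        case Some(i) => a.copy(exprId = ExprId(i.toLong))
        case None    => a
      }
    }
  }
}
```

With normalization living here, `PlanExpression` no longer needs its own `canonicalize(attrs)` method and can rely on the ordinary `canonicalized` machinery of `Expression`.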
[jira] [Created] (SPARK-28836) Introduce TPCDSSchema
Ali Afroozeh created SPARK-28836: Summary: Introduce TPCDSSchema Key: SPARK-28836 URL: https://issues.apache.org/jira/browse/SPARK-28836 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Ali Afroozeh This PR extracts the schema information of TPCDS tables into a separate class called `TPCDSSchema` which can be reused for other testing purposes
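The shape of that extraction might look like the sketch below. This is illustrative only: Spark's actual `TPCDSSchema` defines the full set of TPCDS tables, and the two entries here are stand-ins.

```scala
// Hypothetical sketch of the extracted schema holder; table DDL strings
// below are abbreviated examples, not the real TPCDS column lists.
trait TPCDSSchema {
  // Table name -> column DDL, previously duplicated inside each
  // benchmark/test suite that needed it.
  val tableColumns: Map[String, String] = Map(
    "date_dim"    -> "d_date_sk INT, d_year INT, d_moy INT",
    "store_sales" -> "ss_sold_date_sk INT, ss_item_sk INT, ss_quantity INT"
  )
}

// A suite reuses the schema by mixing the trait in, e.g. (hypothetical):
// class TPCDSQuerySuite extends QueryTest with TPCDSSchema { ... }
```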
[jira] [Created] (SPARK-28835) Improve canonicalize API
Ali Afroozeh created SPARK-28835: Summary: Improve canonicalize API Key: SPARK-28835 URL: https://issues.apache.org/jira/browse/SPARK-28835 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Ali Afroozeh This PR improves the `canonicalize` API by removing the method `def canonicalize(attrs: AttributeSeq): PlanExpression[T]` in `PlanExpression` and taking care of normalizing expressions in `QueryPlan`.
[jira] [Updated] (SPARK-28716) Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans
[ https://issues.apache.org/jira/browse/SPARK-28716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-28716: - Description: Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example: {{ReusedExchange [d_date_sk#827|#827], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [[id=#2710|#2710]]}} Where {{2710}} is the id of the reused exchange. was: Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example: {{ReusedExchange [d_date_sk#827|#827], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [id=#2710] }} Where {{2710}} is the id of the reused exchange. > Add id to Exchange and Subquery's stringArgs method for easier identifying > their reuses in query plans > -- > > Key: SPARK-28716 > URL: https://issues.apache.org/jira/browse/SPARK-28716 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Ali Afroozeh >Priority: Minor > > Add id to Exchange and Subquery's stringArgs method for easier identifying > their reuses in query plans, for example: > {{ReusedExchange [d_date_sk#827|#827], BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) > [[id=#2710|#2710]]}} > Where {{2710}} is the id of the reused exchange. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28716) Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans
[ https://issues.apache.org/jira/browse/SPARK-28716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-28716: - Description: Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example: {{ReusedExchange [d_date_sk#827|#827], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [id=#2710] }} Where {{2710}} is the id of the reused exchange. was: Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example: {{ReusedExchange [d_date_sk#827|#827], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [#[id=#$id|#$id]] }} Where {{2710}} is the id of the reused exchange. > Add id to Exchange and Subquery's stringArgs method for easier identifying > their reuses in query plans > -- > > Key: SPARK-28716 > URL: https://issues.apache.org/jira/browse/SPARK-28716 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Ali Afroozeh >Priority: Minor > > Add id to Exchange and Subquery's stringArgs method for easier identifying > their reuses in query plans, for example: > {{ReusedExchange [d_date_sk#827|#827], BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) > [id=#2710] }} Where {{2710}} is the id of the reused exchange.
[jira] [Updated] (SPARK-28716) Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans
[ https://issues.apache.org/jira/browse/SPARK-28716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-28716: - Description: Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example: {{ReusedExchange [d_date_sk#827|#827], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [#[id=#$id|#$id]] }} Where {{2710}} is the id of the reused exchange. was: Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example: {{ReusedExchange [d_date_sk#827], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [id=#[id=#$id]] }} Where {{2710}} is the id of the reused exchange. > Add id to Exchange and Subquery's stringArgs method for easier identifying > their reuses in query plans > -- > > Key: SPARK-28716 > URL: https://issues.apache.org/jira/browse/SPARK-28716 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Ali Afroozeh >Priority: Minor > > Add id to Exchange and Subquery's stringArgs method for easier identifying > their reuses in query plans, for example: > {{ReusedExchange [d_date_sk#827|#827], BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) > [#[id=#$id|#$id]] }} Where {{2710}} is the id of the reused exchange.
[jira] [Created] (SPARK-28716) Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans
Ali Afroozeh created SPARK-28716: Summary: Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans Key: SPARK-28716 URL: https://issues.apache.org/jira/browse/SPARK-28716 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.3 Reporter: Ali Afroozeh Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans, for example: {{ReusedExchange [d_date_sk#827], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) [id=#[id=#$id]] }} Where {{2710}} is the id of the reused exchange.
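The mechanism can be sketched as below. This is illustrative only, with simplified stand-in classes, not Spark's actual `SparkPlan` hierarchy: giving each exchange an id and surfacing it via `stringArgs` lets a printed `ReusedExchange ... [id=#2710]` be matched to the `BroadcastExchange ... [id=#2710]` node it reuses.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical id source; Spark assigns node ids differently.
object ExchangeIds { val counter = new AtomicInteger(0) }

abstract class PlanNode {
  def stringArgs: Iterator[Any] = Iterator.empty
  override def toString: String =
    s"${getClass.getSimpleName} ${stringArgs.mkString(", ")}"
}

class BroadcastExchange(mode: String) extends PlanNode {
  val id: Int = ExchangeIds.counter.incrementAndGet()
  // Appending "[id=#N]" to stringArgs is the change described above.
  override def stringArgs: Iterator[Any] = Iterator(mode, s"[id=#$id]")
}

class ReusedExchange(val reused: BroadcastExchange) extends PlanNode {
  // The reuse prints the same id, making the pairing visible in plan output.
  override def stringArgs: Iterator[Any] = Iterator(s"[id=#${reused.id}]")
}
```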
[jira] [Updated] (SPARK-28715) Introduce collectInPlanAndSubqueries and subqueriesAll in QueryPlan
[ https://issues.apache.org/jira/browse/SPARK-28715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-28715: - Summary: Introduce collectInPlanAndSubqueries and subqueriesAll in QueryPlan (was: Introduce `collectInPlanAndSubqueries` and `subqueriesAll` in `QueryPlan`) > Introduce collectInPlanAndSubqueries and subqueriesAll in QueryPlan > --- > > Key: SPARK-28715 > URL: https://issues.apache.org/jira/browse/SPARK-28715 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Ali Afroozeh >Priority: Minor > > Introduces the {{collectInPlanAndSubqueries}} and {{subqueriesAll}} methods in > QueryPlan that consider all the plans in the query plan, including the ones > in nested subqueries.
[jira] [Created] (SPARK-28715) Introduce `collectInPlanAndSubqueries` and `subqueriesAll` in `QueryPlan`
Ali Afroozeh created SPARK-28715: Summary: Introduce `collectInPlanAndSubqueries` and `subqueriesAll` in `QueryPlan` Key: SPARK-28715 URL: https://issues.apache.org/jira/browse/SPARK-28715 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.3 Reporter: Ali Afroozeh Introduces the {{collectInPlanAndSubqueries}} and {{subqueriesAll}} methods in QueryPlan that consider all the plans in the query plan, including the ones in nested subqueries.
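The traversal these two methods add can be sketched as follows. This is an illustrative sketch over a simplified plan model, not Spark's actual implementation; only the method names are taken from the issue text. `subqueriesAll` gathers subquery plans transitively, and `collectInPlanAndSubqueries` runs a collect over this plan plus every nested subquery plan:

```scala
// Minimal F-bounded plan model (an assumption for the sketch).
trait QueryPlan[T <: QueryPlan[T]] { self: T =>
  def children: Seq[T]
  // Plans referenced by subquery expressions attached to this node.
  def subqueries: Seq[T]

  private def allNodes: Seq[T] = self +: children.flatMap(_.allNodes)

  // Subquery plans at any nesting depth, not just the top level.
  def subqueriesAll: Seq[T] = {
    val direct = allNodes.flatMap(_.subqueries)
    direct ++ direct.flatMap(_.subqueriesAll)
  }

  // Collect over the whole plan, including plans inside nested subqueries;
  // a plain collect would miss nodes hidden behind subquery expressions.
  def collectInPlanAndSubqueries[B](f: PartialFunction[T, B]): Seq[B] =
    (self +: subqueriesAll).flatMap(_.allNodes).collect(f)
}
```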