[GitHub] [spark] HyukjinKwon commented on a change in pull request #30491: [SPARK-33492][SQL][FOLLOWUP] Use callback instead of passing Spark session and v2 relation
HyukjinKwon commented on a change in pull request #30491: URL: https://github.com/apache/spark/pull/30491#discussion_r530054985

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala ##

@@ -213,15 +213,14 @@ case class AtomicReplaceTableAsSelectExec(
  * Rows in the output data set are appended.
  */
 case class AppendDataExec(
-    session: SparkSession,
     table: SupportsWrite,
-    relation: DataSourceV2Relation,
     writeOptions: CaseInsensitiveStringMap,
-    query: SparkPlan) extends V2TableWriteExec with BatchWriteHelper {
+    query: SparkPlan,
+    afterWrite: () => Unit = () => ()) extends V2TableWriteExec with BatchWriteHelper {

   override protected def run(): Seq[InternalRow] = {
     val writtenRows = writeWithV2(newWriteBuilder().buildForBatch())
-    session.sharedState.cacheManager.recacheByPlan(session, relation)

Review comment: What about fixing `DropTableExec` and `RefreshTableExec` too, under a separate JIRA, since we're here? From a cursory look, those appear to be the only two DSv2 execution plans left that pass a Spark session. A refactoring across the codebase could also be done separately, I believe. One concern: the callback always does the same thing, and it seems unlikely that `afterWrite` will ever do anything else, at least so far, so I felt it's a bit too much to generalise it. But it's probably just okay.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
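The refactoring discussed above replaces the `session`/`relation` constructor parameters with a single `afterWrite: () => Unit` callback that defaults to a no-op, so the planner can inject cache invalidation without the exec node holding a `SparkSession`. A minimal, self-contained sketch of that pattern, using hypothetical simplified types rather than Spark's real classes:

```scala
// Hypothetical stand-in for AppendDataExec: the node only knows it must run
// some callback after writing; what that callback does (e.g. recaching the
// relation) is decided by the caller that constructs the node.
case class AppendDataExec(
    table: String,                       // stand-in for SupportsWrite
    afterWrite: () => Unit = () => ()) { // no-op default, as in the diff above
  def run(): Seq[String] = {
    val writtenRows = Seq(s"row written to $table") // stand-in for writeWithV2
    afterWrite()                                    // e.g. recache the relation
    writtenRows
  }
}

object Demo extends App {
  var recached = false
  // The planner would wire in the real recache logic here.
  val exec = AppendDataExec("t1", afterWrite = () => recached = true)
  exec.run()
  println(recached) // true
}
```

This keeps the exec node decoupled from session internals, at the cost of the generality the reviewer questions: so far every caller passes the same recache callback.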
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30491: [SPARK-33492][SQL][FOLLOWUP] Use callback instead of passing Spark session and v2 relation
HyukjinKwon commented on a change in pull request #30491: URL: https://github.com/apache/spark/pull/30491#discussion_r530033229

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala ##

@@ -213,15 +213,14 @@ case class AtomicReplaceTableAsSelectExec(
  * Rows in the output data set are appended.
  */
 case class AppendDataExec(
-    session: SparkSession,
     table: SupportsWrite,
-    relation: DataSourceV2Relation,
     writeOptions: CaseInsensitiveStringMap,
-    query: SparkPlan) extends V2TableWriteExec with BatchWriteHelper {
+    query: SparkPlan,
+    afterWrite: () => Unit = () => ()) extends V2TableWriteExec with BatchWriteHelper {

   override protected def run(): Seq[InternalRow] = {
     val writtenRows = writeWithV2(newWriteBuilder().buildForBatch())
-    session.sharedState.cacheManager.recacheByPlan(session, relation)

Review comment: cc @rdblue and @cloud-fan
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30491: [SPARK-33492][SQL][FOLLOWUP] Use callback instead of passing Spark session and v2 relation
HyukjinKwon commented on a change in pull request #30491: URL: https://github.com/apache/spark/pull/30491#discussion_r530033142

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala ##

@@ -213,15 +213,14 @@ case class AtomicReplaceTableAsSelectExec(
  * Rows in the output data set are appended.
  */
 case class AppendDataExec(
-    session: SparkSession,
     table: SupportsWrite,
-    relation: DataSourceV2Relation,
     writeOptions: CaseInsensitiveStringMap,
-    query: SparkPlan) extends V2TableWriteExec with BatchWriteHelper {
+    query: SparkPlan,
+    afterWrite: () => Unit = () => ()) extends V2TableWriteExec with BatchWriteHelper {

   override protected def run(): Seq[InternalRow] = {
     val writtenRows = writeWithV2(newWriteBuilder().buildForBatch())
-    session.sharedState.cacheManager.recacheByPlan(session, relation)

Review comment: I thought about this when I reviewed the previous PR, but there are already other occurrences of passing the session, such as `DropTableExec` and `RefreshTableExec`. Maybe it's better to file a separate JIRA and handle them all. Should we maybe just pass `session.sharedState.cacheManager`? I see the analogous catalog manager is already being passed around. I think we can do the same for `DropTableExec` and `RefreshTableExec`.
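The alternative suggested above is to pass only the narrow collaborator the node actually needs (the cache manager) rather than the whole session. A hypothetical sketch of that shape, again with simplified stand-in types rather than Spark's real `CacheManager` API:

```scala
// Stand-in for Spark's CacheManager: the exec node depends only on this
// small interface, not on a full SparkSession.
class CacheManager {
  var uncached: List[String] = Nil
  def uncacheQuery(ident: String): Unit = uncached ::= ident
}

// Hypothetical simplified DropTableExec: it receives the cache manager
// directly, so no session needs to flow through the physical plan.
case class DropTableExec(cacheManager: CacheManager, ident: String) {
  def run(): Unit = {
    cacheManager.uncacheQuery(ident) // invalidate cached plans for the table
    // ... actually drop the table via the catalog ...
  }
}
```

Either approach (callback or narrow dependency) removes the `SparkSession` from the node's constructor; the callback hides the cache logic entirely, while passing the cache manager keeps it explicit but still decoupled.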