[GitHub] [spark] HyukjinKwon commented on a change in pull request #30491: [SPARK-33492][SQL][FOLLOWUP] Use callback instead of passing Spark session and v2 relation

2020-11-24 Thread GitBox


HyukjinKwon commented on a change in pull request #30491:
URL: https://github.com/apache/spark/pull/30491#discussion_r530054985



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
##
@@ -213,15 +213,14 @@ case class AtomicReplaceTableAsSelectExec(
  * Rows in the output data set are appended.
  */
 case class AppendDataExec(
-    session: SparkSession,
     table: SupportsWrite,
-    relation: DataSourceV2Relation,
     writeOptions: CaseInsensitiveStringMap,
-    query: SparkPlan) extends V2TableWriteExec with BatchWriteHelper {
+    query: SparkPlan,
+    afterWrite: () => Unit = () => ()) extends V2TableWriteExec with BatchWriteHelper {
 
   override protected def run(): Seq[InternalRow] = {
     val writtenRows = writeWithV2(newWriteBuilder().buildForBatch())
-    session.sharedState.cacheManager.recacheByPlan(session, relation)
Review comment:
   What about fixing `DropTableExec` and `RefreshTableExec` together, under a 
separate JIRA, since we're already here? From a cursory look, those appear to 
be the only two DSv2 execution plans left that pass the Spark session.
   
   Refactoring across the codebase could also be done separately, I believe.
   
   One thing, though: the callback always does the same thing, and it seems 
unlikely that `afterWrite` will ever do anything else, at least so far. I felt 
it's a bit too much to generalise it, but it's probably okay.
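
   The pattern under discussion can be sketched in miniature: the exec node 
takes an `afterWrite` callback defaulting to a no-op, so the caller decides 
whether anything (such as recaching) happens after the write. All names below 
are illustrative stand-ins, not the actual Spark classes.

```scala
// Hypothetical sketch of the callback pattern: the exec node no longer
// holds the SparkSession and relation; the caller injects the post-write
// behaviour instead. Names here are illustrative, not Spark's API.
object CallbackSketch {
  final case class AppendExec(
      rows: Seq[Int],
      afterWrite: () => Unit = () => ()) {
    def run(): Seq[Int] = {
      val written = rows // stand-in for the actual write
      afterWrite()       // e.g. the caller recaches the plan here
      written
    }
  }

  def main(args: Array[String]): Unit = {
    var recached = false
    val exec = AppendExec(Seq(1, 2, 3), () => recached = true)
    val out = exec.run()
    assert(out == Seq(1, 2, 3))
    assert(recached)
    println("ok")
  }
}
```

   The default no-op keeps call sites that need no post-write action unchanged, 
which is what the `= () => ()` default in the diff above achieves.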






This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org





[GitHub] [spark] HyukjinKwon commented on a change in pull request #30491: [SPARK-33492][SQL][FOLLOWUP] Use callback instead of passing Spark session and v2 relation

2020-11-24 Thread GitBox


HyukjinKwon commented on a change in pull request #30491:
URL: https://github.com/apache/spark/pull/30491#discussion_r530033229



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
##
@@ -213,15 +213,14 @@ case class AtomicReplaceTableAsSelectExec(
  * Rows in the output data set are appended.
  */
 case class AppendDataExec(
-    session: SparkSession,
     table: SupportsWrite,
-    relation: DataSourceV2Relation,
     writeOptions: CaseInsensitiveStringMap,
-    query: SparkPlan) extends V2TableWriteExec with BatchWriteHelper {
+    query: SparkPlan,
+    afterWrite: () => Unit = () => ()) extends V2TableWriteExec with BatchWriteHelper {
 
   override protected def run(): Seq[InternalRow] = {
     val writtenRows = writeWithV2(newWriteBuilder().buildForBatch())
-    session.sharedState.cacheManager.recacheByPlan(session, relation)
Review comment:
   cc @rdblue and @cloud-fan








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30491: [SPARK-33492][SQL][FOLLOWUP] Use callback instead of passing Spark session and v2 relation

2020-11-24 Thread GitBox


HyukjinKwon commented on a change in pull request #30491:
URL: https://github.com/apache/spark/pull/30491#discussion_r530033142



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
##
@@ -213,15 +213,14 @@ case class AtomicReplaceTableAsSelectExec(
  * Rows in the output data set are appended.
  */
 case class AppendDataExec(
-    session: SparkSession,
     table: SupportsWrite,
-    relation: DataSourceV2Relation,
     writeOptions: CaseInsensitiveStringMap,
-    query: SparkPlan) extends V2TableWriteExec with BatchWriteHelper {
+    query: SparkPlan,
+    afterWrite: () => Unit = () => ()) extends V2TableWriteExec with BatchWriteHelper {
 
   override protected def run(): Seq[InternalRow] = {
     val writtenRows = writeWithV2(newWriteBuilder().buildForBatch())
-    session.sharedState.cacheManager.recacheByPlan(session, relation)

Review comment:
   I thought about this when I reviewed the previous PR, but there are already 
other occurrences of passing it, such as `DropTableExec` and 
`RefreshTableExec`. Maybe it's better to file a separate JIRA and handle them 
all at once.
   
   Should we maybe just pass `session.sharedState.cacheManager`? I see the 
catalog manager is passed around analogously. I think we could do the same for 
`DropTableExec` and `RefreshTableExec`.
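
   The alternative suggested here can also be sketched: instead of a generic 
callback, the exec node receives only the narrow dependency it needs (a 
cache-manager interface), mirroring how a catalog manager is passed around. 
The names below are illustrative stand-ins, not Spark's actual API.

```scala
// Hypothetical sketch: pass a small cache-manager interface into the exec
// node rather than the whole session. Names are illustrative, not Spark's.
object CacheManagerSketch {
  trait CacheManager {
    def recache(planId: String): Unit
  }

  final case class DropExec(planId: String, cacheManager: CacheManager) {
    // After dropping the table, only the cache needs to be touched,
    // so only the cache manager is required as a dependency.
    def run(): Unit = cacheManager.recache(planId)
  }

  def main(args: Array[String]): Unit = {
    val recached = scala.collection.mutable.ListBuffer.empty[String]
    val cm = new CacheManager {
      def recache(planId: String): Unit = recached += planId
    }
    DropExec("t1", cm).run()
    assert(recached.toList == List("t1"))
    println("ok")
  }
}
```

   Compared with the callback, this keeps the dependency explicit and reusable 
across plans like `DropTableExec` and `RefreshTableExec`, at the cost of being 
less general than an arbitrary `() => Unit`.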




