[GitHub] [arrow-datafusion] alamb commented on a change in pull request #521: Return errors properly from RepartitionExec

2021-06-10 Thread GitBox


alamb commented on a change in pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521#discussion_r649260106



##
File path: datafusion/src/physical_plan/repartition.rs
##
@@ -308,6 +310,45 @@ impl RepartitionExec {
 send_time_nanos: SQLMetric::time_nanos(),
 })
 }
+
+/// Waits for `input_task` which is consuming one of the inputs to

Review comment:
   in https://github.com/apache/arrow-datafusion/pull/538




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-datafusion] alamb commented on a change in pull request #521: Return errors properly from RepartitionExec

2021-06-07 Thread GitBox


alamb commented on a change in pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521#discussion_r646646671



##
File path: datafusion/src/physical_plan/repartition.rs
##
@@ -308,6 +310,45 @@ impl RepartitionExec {
 send_time_nanos: SQLMetric::time_nanos(),
 })
 }
+
+/// Waits for `input_task` which is consuming one of the inputs to

Review comment:
   I agree the approach you describe would be clearer (and avoid needing a 
separate task)  
   
   The reason I did not pull the main body out into its own function was mostly 
"trying to keep the diff small" (or perhaps my own laziness wanting to avoid 
having to figure out all the types of the arguments that got captured),
   
   Perhaps that would be a good follow on PR (there is a lot of messiness / 
duplication for updating counters which I would also kind of like to fix too)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-datafusion] alamb commented on a change in pull request #521: Return errors properly from RepartitionExec

2021-06-07 Thread GitBox


alamb commented on a change in pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521#discussion_r646597513



##
File path: datafusion/src/physical_plan/repartition.rs
##
@@ -249,13 +252,12 @@ impl ExecutionPlan for RepartitionExec {
 counter += 1;
 }
 
-// notify each output partition that this input partition 
has no more data
-for (_, tx) in txs {
-tx.send(None)
-.map_err(|e| 
DataFusionError::Execution(e.to_string()))?;
-}
 Ok(())
 });
+
+// In a separate task, wait for each input to be done

Review comment:
   This is the actual code change (to check for return value in another 
task). Otherwise the rest of this PR is tests




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org