[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode
[ https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782808#comment-16782808 ]

Sean Owen commented on SPARK-25982:
---
Can you clarify with a more complete example? What is running in parallel, and what next stage starts executing?

> Dataframe write is non blocking in fair scheduling mode
> ---
>
> Key: SPARK-25982
> URL: https://issues.apache.org/jira/browse/SPARK-25982
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.1
> Reporter: Ramandeep Singh
> Priority: Major
>
> Hi,
> I have noticed that the expected blocking behavior of the dataframe write operation is not working in fair scheduling mode.
> Ideally, while a dataframe write is in progress and a future is blocking on AwaitResult, no other job should be started, but this is not the case: I have noticed that other jobs are started while the partitions are being written.
>
> Regards,
> Ramandeep Singh

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode
[ https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782527#comment-16782527 ]

Ramandeep Singh commented on SPARK-25982:
---
No, as I said, the operations within a stage are independent, and I explicitly await their completion before launching the next stage. The problem is that operations from the next stage start running before all the futures have completed.
[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode
[ https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782521#comment-16782521 ]

Sean Owen commented on SPARK-25982:
---
I don't understand this; you're running operations in parallel on purpose, but expecting one to wait for the other?
[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode
[ https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687034#comment-16687034 ]

Ramandeep Singh commented on SPARK-25982:
---
Sure:
a) The scheduler is set to fair scheduling: --conf 'spark.scheduler.mode'='FAIR'
b) There are independent jobs scheduled at one stage. That part is fine; all of them block on the dataframe write to complete.

```
val futures = steps.par.map(stepId => Future { processWrite(stepsMap(stepId)) })
futures.foreach(Await.result(_, Duration.create(timeout, TimeUnit.MINUTES)))
```

Here, processWrite runs the write operations in parallel and awaits each of them to complete, but the persist or write operation returns before all partitions of the dataframe have been written, so jobs from a later stage end up running.
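A minimal self-contained sketch of the pattern described in the comment above. The names `processWrite`, `steps`, and `stepsMap` follow the snippet in the comment; the SparkSession setup, output paths, and the body of `processWrite` are assumptions for illustration, not the reporter's actual code.

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.sql.{DataFrame, SparkSession}

object FairSchedulerWriteSketch {
  // Assumed setup: FAIR scheduling enabled, as in the bug report.
  val spark: SparkSession = SparkSession.builder()
    .appName("SPARK-25982-sketch")
    .config("spark.scheduler.mode", "FAIR")
    .getOrCreate()

  val timeout = 60L // minutes; illustrative value

  // Hypothetical write step: persists one dataframe to a per-step path.
  def processWrite(stepId: String, df: DataFrame): Unit =
    df.write.mode("overwrite").parquet(s"/tmp/out/$stepId")

  def runStage(steps: Seq[String], stepsMap: Map[String, DataFrame]): Unit = {
    // Launch all independent writes of this stage concurrently...
    val futures = steps.map(stepId => Future { processWrite(stepId, stepsMap(stepId)) })
    // ...then block the driver thread until every write future completes.
    futures.foreach(Await.result(_, Duration.create(timeout, TimeUnit.MINUTES)))
    // Only after this point should jobs for the next stage be submitted;
    // the report claims later jobs start while partitions are still being written.
  }
}
```

Note that `Await.result` only blocks the driver-side thread that called it; under FAIR scheduling the cluster is free to run tasks from any already-submitted job, which appears to be the crux of the disagreement in this thread.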
[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode
[ https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680827#comment-16680827 ]

Hyukjin Kwon commented on SPARK-25982:
---
Can you post reproducible code describing the issue, and elaborate on the current and expected behavior?