[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode
[ https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782527#comment-16782527 ]

Ramandeep Singh commented on SPARK-25982:
-----------------------------------------

No, as I said, those operations at a stage are independent, and I explicitly await their completion before launching the next stage. The problem is that operations from the next stage start running before all the futures have completed.

> Dataframe write is non blocking in fair scheduling mode
> -------------------------------------------------------
>
>                 Key: SPARK-25982
>                 URL: https://issues.apache.org/jira/browse/SPARK-25982
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Ramandeep Singh
>            Priority: Major
>
> Hi,
> I have noticed that the expected blocking behavior of the dataframe write operation does not hold in fair scheduling mode.
> Ideally, while a dataframe write is occurring and a future is blocking on AwaitResult, no other job should be started, but this is not the case: I have noticed that other jobs are started while the partitions are being written.
>
> Regards,
> Ramandeep Singh

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode
[ https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687034#comment-16687034 ]

Ramandeep Singh commented on SPARK-25982:
-----------------------------------------

Sure:

a) The scheduler is set to fair scheduling: --conf 'spark.scheduler.mode'='FAIR'

b) Independent jobs are scheduled at one stage. This is okay; all of them block on the dataframe write to complete.

```
val futures = steps.par.map(stepId => Future {
  processWrite(stepsMap(stepId))
})
futures.foreach(Await.result(_, Duration.create(timeout, TimeUnit.MINUTES)))
```

Here, processWrite runs the write operations in parallel and awaits each of them to complete, but the persist or write operation returns before all partitions of the dataframe have been written, so jobs from a later stage end up running early.
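The gating pattern described in the comment above can be sketched in plain Scala. This is a minimal, Spark-free sketch of the reporter's pattern; `processWrite`, `stepsMap`, and the step names are hypothetical stand-ins, and `Future.sequence` replaces the per-future `foreach` loop:

```
import java.util.concurrent.TimeUnit
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical stand-in: in the real job this would call df.write / df.persist.
def processWrite(step: String): Unit = println(s"writing $step")

val stepsMap = Map("s1" -> "s1", "s2" -> "s2", "s3" -> "s3")
val steps = stepsMap.keys.toSeq

// Launch the independent writes of one stage concurrently.
val futures = steps.map(stepId => Future { processWrite(stepsMap(stepId)) })

// Block until every processWrite call has *returned* before the next stage.
// The crux of the report: the write call returning is observed not to be the
// same as all partitions having been written.
Await.result(Future.sequence(futures), Duration.create(30, TimeUnit.MINUTES))
```

This only guarantees that the next stage starts after every `processWrite` invocation returns; if the underlying write action returns early, as the report claims, no amount of awaiting at this level can restore the ordering.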
[jira] [Created] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode
Ramandeep Singh created SPARK-25982:
---------------------------------------

             Summary: Dataframe write is non blocking in fair scheduling mode
                 Key: SPARK-25982
                 URL: https://issues.apache.org/jira/browse/SPARK-25982
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.1
            Reporter: Ramandeep Singh

Hi,

I have noticed that the expected blocking behavior of the dataframe write operation does not hold in fair scheduling mode.

Ideally, while a dataframe write is occurring and a future is blocking on AwaitResult, no other job should be started, but this is not the case: I have noticed that other jobs are started while the partitions are being written.

Regards,
Ramandeep Singh
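For context, the shape of a reproduction under FAIR scheduling might look like the following sketch. The session setup, output path, and data are illustrative only, not taken from the report:

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("fair-write-repro")
  .config("spark.scheduler.mode", "FAIR")
  .getOrCreate()

val df = spark.range(0, 1000000).toDF("id")

// write is an action: the call should not return until the job writing
// every partition has finished. The report is that under FAIR scheduling,
// later jobs are observed starting before that point.
df.write.mode("overwrite").parquet("/tmp/fair-write-repro")
```

Under the default FIFO scheduler, jobs submitted later wait for earlier ones anyway, which may be why the reporter only observes the overlap once FAIR pools allow jobs to run concurrently.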
[jira] [Commented] (SPARK-23613) Different Analyzed logical plan data types for the same table in different queries
[ https://issues.apache.org/jira/browse/SPARK-23613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388494#comment-16388494 ]

Ramandeep Singh commented on SPARK-23613:
-----------------------------------------

To add to it, the query works fine with subquery factoring:

with b1 as (select b.* from b)
select * from jq (select a.col1, b1.col2 from a, b1 where a.col3 = b1.col3)

> Different Analyzed logical plan data types for the same table in different queries
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-23613
>                 URL: https://issues.apache.org/jira/browse/SPARK-23613
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>        Environment: Spark 2.2.0
>                     Hive: 2
>            Reporter: Ramandeep Singh
>            Priority: Blocker
>             Labels: SparkSQL
>
> Hi,
> The column datatypes are correctly analyzed for a simple select query. Note that the problematic column is not selected anywhere in the complicated scenario.
> Let's say: select * from a;
> Now suppose there is a query involving a temporary view on another table and its join with this table. Let's call that table b (a temporary view on a dataframe):
> select * from jq (select a.col1, b.col2 from a, b where a.col3 = b.col3)
> This fails with an exception on a column that is not part of the projection in the join query:
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `a`.col5 from decimal(8,0) to col5#1234: decimal(6,2) as it may truncate.
[jira] [Created] (SPARK-23613) Different Analyzed logical plan data types for the same table in different queries
Ramandeep Singh created SPARK-23613:
---------------------------------------

             Summary: Different Analyzed logical plan data types for the same table in different queries
                 Key: SPARK-23613
                 URL: https://issues.apache.org/jira/browse/SPARK-23613
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
        Environment: Spark 2.2.0
                     Hive: 2
            Reporter: Ramandeep Singh

Hi,

The column datatypes are correctly analyzed for a simple select query. Note that the problematic column is not selected anywhere in the complicated scenario.

Let's say: select * from a;

Now suppose there is a query involving a temporary view on another table and its join with this table. Let's call that table b (a temporary view on a dataframe):

select * from jq (select a.col1, b.col2 from a, b where a.col3 = b.col3)

This fails with an exception on a column that is not part of the projection in the join query:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `a`.col5 from decimal(8,0) to col5#1234: decimal(6,2) as it may truncate.
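A workaround sometimes applied to this class of "Cannot up cast ... as it may truncate" analyzer error, not suggested in the report itself, is to align the column's decimal type explicitly before the join, so the analyzer never has to attempt a lossy implicit cast. Table name, column name, and target type below are taken from the exception message; everything else is an assumption:

```
import org.apache.spark.sql.functions.col

// Illustrative workaround sketch: widen/align col5 explicitly and register
// the adjusted relation, then run the join query against a_fixed instead
// of a. Whether this sidesteps the reported analyzer mismatch is untested.
val aFixed = spark.table("a")
  .withColumn("col5", col("col5").cast("decimal(8,0)"))
aFixed.createOrReplaceTempView("a_fixed")
```

This treats the symptom only; the underlying bug reported here is that the analyzer resolves different datatypes for the same table depending on the enclosing query.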