[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode

2019-03-02 Thread Ramandeep Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782527#comment-16782527
 ] 

Ramandeep Singh commented on SPARK-25982:
-

No, as I said, those operations at a stage are independent, and I explicitly 
await their completion before launching the next stage. The problem is that 
operations from the next stage start running before all the futures have completed. 
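
For illustration, the stage-barrier pattern being described is roughly the 
following. This is only a minimal sketch with placeholder step names and a 
stubbed processWrite, not the actual job code from this ticket:

```
import java.util.concurrent.TimeUnit
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Stub standing in for the ticket's processWrite; assume it triggers one blocking dataframe write.
def processWrite(stepId: String): Unit = { /* df.write ... */ }

val timeoutMinutes = 60L

// Stage 1: independent writes, each wrapped in a Future.
val stageOne = Seq("step1", "step2").map(id => Future { processWrite(id) })
// Explicit barrier: wait for every stage-1 future before launching stage 2.
stageOne.foreach(Await.result(_, Duration.create(timeoutMinutes, TimeUnit.MINUTES)))

// Stage 2 is only launched after the barrier above.
val stageTwo = Seq("step3").map(id => Future { processWrite(id) })
stageTwo.foreach(Await.result(_, Duration.create(timeoutMinutes, TimeUnit.MINUTES)))
```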

> Dataframe write is non blocking in fair scheduling mode
> ---
>
> Key: SPARK-25982
> URL: https://issues.apache.org/jira/browse/SPARK-25982
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Ramandeep Singh
>Priority: Major
>
> Hi,
> I have noticed that the expected blocking behavior of a dataframe write 
> operation is not working in fair scheduling mode.
> Ideally, while a dataframe write is occurring and a future is blocking on 
> AwaitResult, no other job should be started, but this is not the case. I have 
> noticed that other jobs are started while the partitions are being written.
>  
> Regards,
> Ramandeep Singh
>  
>  





[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode

2018-11-14 Thread Ramandeep Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687034#comment-16687034
 ] 

Ramandeep Singh commented on SPARK-25982:
-

Sure,

a) The scheduler is set to fair scheduling (see the programmatic sketch at the end of this comment):

--conf 'spark.scheduler.mode'='FAIR'

b) There are independent jobs scheduled at one stage. This is okay; all of them 
block on the dataframe write to complete:

```
import java.util.concurrent.TimeUnit
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// steps, stepsMap, processWrite and timeout are defined elsewhere in the job.
// Each step's write is launched as a Future, then awaited before moving on.
val futures = steps.par.map(stepId => Future {
  processWrite(stepsMap(stepId))
})
futures.foreach(Await.result(_, Duration.create(timeout, TimeUnit.MINUTES)))
```

Here, processWrite runs the write operations in parallel and awaits each of 
them to complete, but the persist or write operation returns before all 
partitions of the dataframes have been written, so jobs from a later stage end 
up running.
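
Relating to point (a) above, the same scheduler setting can also be applied 
programmatically. A minimal sketch; the pool name "writes" is an arbitrary 
example and is not from this ticket:

```
import org.apache.spark.sql.SparkSession

// Programmatic equivalent of --conf 'spark.scheduler.mode'='FAIR'.
val spark = SparkSession.builder()
  .appName("fair-scheduling-example")
  .config("spark.scheduler.mode", "FAIR")
  .getOrCreate()

// Jobs submitted from this thread go to the named pool ("writes" is an arbitrary example).
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "writes")
```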

 



[jira] [Created] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode

2018-11-08 Thread Ramandeep Singh (JIRA)
Ramandeep Singh created SPARK-25982:
---

 Summary: Dataframe write is non blocking in fair scheduling mode
 Key: SPARK-25982
 URL: https://issues.apache.org/jira/browse/SPARK-25982
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.1
Reporter: Ramandeep Singh


Hi,

I have noticed that the expected blocking behavior of a dataframe write 
operation is not working in fair scheduling mode.

Ideally, while a dataframe write is occurring and a future is blocking on 
AwaitResult, no other job should be started, but this is not the case. I have 
noticed that other jobs are started while the partitions are being written.
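
One way to check whether other jobs really start while a write is still in 
progress is to register a Spark listener that logs job start and end events. A 
minimal diagnostic sketch (not from this report), assuming an existing 
SparkSession named spark:

```
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// Logs job lifecycle events so overlapping jobs show up in the driver log.
class JobOverlapLogger extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"Job ${jobStart.jobId} started at ${jobStart.time}")

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    println(s"Job ${jobEnd.jobId} ended at ${jobEnd.time}")
}

spark.sparkContext.addSparkListener(new JobOverlapLogger)
```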

 

Regards,

Ramandeep Singh

 

 





[jira] [Commented] (SPARK-23613) Different Analyzed logical plan data types for the same table in different queries

2018-03-06 Thread Ramandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388494#comment-16388494
 ] 

Ramandeep Singh commented on SPARK-23613:
-

To add to this, the query works fine with subquery factoring (a CTE):

 

with b1 as (select b.* from b)

select * from jq ( select a.col1, b1.col2 from a, b1 where a.col3=b1.col3)
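
For illustration, the factored query might be run against temp views as 
follows. This is only a sketch: it assumes an existing SparkSession named 
spark, a table a, and a DataFrame bDf backing the temporary view b; the actual 
schemas are not in the ticket, and the outer jq wrapper is left out:

```
// Register b as a temporary view over its DataFrame.
bDf.createOrReplaceTempView("b")

// Subquery-factoring (CTE) form of the join, which analyzes correctly per this comment.
val result = spark.sql(
  """
    |WITH b1 AS (SELECT b.* FROM b)
    |SELECT a.col1, b1.col2
    |FROM a, b1
    |WHERE a.col3 = b1.col3
  """.stripMargin)
result.show()
```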

 

 

> Different Analyzed logical plan data types for the same table in different 
> queries
> --
>
> Key: SPARK-23613
> URL: https://issues.apache.org/jira/browse/SPARK-23613
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: Spark 2.2.0
> Hive: 2
>Reporter: Ramandeep Singh
>Priority: Blocker
>  Labels: SparkSQL
>
> Hi,
> The column datatypes are correctly analyzed for a simple select query. Note 
> that the problematic column is not selected anywhere in the more complicated 
> scenario.
> Let's say: select * from a;
> Now let's say there is a query involving a temporary view on another table and 
> its join with this table. 
> Let's call that table b (a temporary view on a dataframe):
> select * from jq ( select a.col1, b.col2 from a, b where a.col3=b.col3)
> This fails with an exception on a column that is not part of the projection in 
> the join query:
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up 
> cast `a`.col5 from decimal(8,0) to col5#1234: decimal(6,2) as it may 
> truncate.
>  





[jira] [Created] (SPARK-23613) Different Analyzed logical plan data types for the same table in different queries

2018-03-06 Thread Ramandeep Singh (JIRA)
Ramandeep Singh created SPARK-23613:
---

 Summary: Different Analyzed logical plan data types for the same 
table in different queries
 Key: SPARK-23613
 URL: https://issues.apache.org/jira/browse/SPARK-23613
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
 Environment: Spark 2.2.0

Hive: 2
Reporter: Ramandeep Singh


Hi,

The column datatypes are correctly analyzed for a simple select query. Note that 
the problematic column is not selected anywhere in the more complicated scenario.

Let's say: select * from a;

Now let's say there is a query involving a temporary view on another table and 
its join with this table. 

Let's call that table b (a temporary view on a dataframe):

select * from jq ( select a.col1, b.col2 from a, b where a.col3=b.col3)

This fails with an exception on a column that is not part of the projection in the join query:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up 
cast `a`.col5 from decimal(8,0) to col5#1234: decimal(6,2) as it may 
truncate.
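
One way to compare what the analyzer reports in the two situations is to print 
the schemas of both relations and then run the failing join. A minimal 
diagnostic sketch (not from this report), assuming an existing SparkSession 
named spark and the table/view names used above:

```
import org.apache.spark.sql.AnalysisException

// Datatypes as analyzed for the simple selects (e.g. col5 reported as decimal(8,0)).
spark.table("a").printSchema()
spark.table("b").printSchema()

// The join query from the report; the AnalysisException surfaces during analysis here.
try {
  spark.sql("SELECT a.col1, b.col2 FROM a, b WHERE a.col3 = b.col3").show()
} catch {
  case e: AnalysisException => println(e.getMessage)
}
```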

 


