[jira] [Commented] (SPARK-21037) ignoreNulls does not working properly with window functions
[ https://issues.apache.org/jira/browse/SPARK-21037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045479#comment-16045479 ]

Stanislav Chernichkin commented on SPARK-21037:
---

To be more precise, the problem is not related to the ignoreNulls property. It arises when orderBy is used without specifying window frame boundaries. In that case Spark sets the boundaries to UNBOUNDED PRECEDING - CURRENT ROW, and all aggregation functions behave accordingly. The problem does not arise when orderBy is not used.

This behavior is undocumented and unintuitive. Popular databases do not require specifying window boundaries to apply an aggregation function to the whole group (it is applied to the whole group by default), and they do not adjust the default window depending on the presence of ordering.

> ignoreNulls does not working properly with window functions
> ---
>
>                 Key: SPARK-21037
>                 URL: https://issues.apache.org/jira/browse/SPARK-21037
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Stanislav Chernichkin
>
> The following code reproduces the issue:
> spark
>   .sql("select 0 as key, null as value, 0 as order union select 0 as key, 'value' as value, 1 as order")
>   .select($"*", first($"value", true).over(partitionBy($"key").orderBy("order")).as("first_value"))
>   .show()
> Since the documentation claims that the {{first}} function will return the first non-null result, I expect to get:
> |key|value|order|first_value|
> |  0| null|    0|      value|
> |  0|value|    1|      value|
> But the actual result is:
> |key|value|order|first_value|
> |  0| null|    0|       null|
> |  0|value|    1|      value|

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
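The frame behavior described in the comment above can be illustrated without Spark. The following is only a simplified model, not Spark code: a partition is assumed to be a sequence of optional values already sorted by the ordering column, with no ties in that column. The names `FrameSketch`, `runningFrame` and `wholePartitionFrame` are hypothetical and chosen for illustration.

```scala
// Simplified model of window frames (not Spark code).
// Assumption: rows are already sorted by the ordering column and have no ties.
object FrameSketch {
  type Row = Option[String] // None stands for a SQL null

  // first(value, ignoreNulls = true) over a frame: the first defined value, if any.
  def firstNonNull(frame: Seq[Row]): Row = frame.flatten.headOption

  // Frame UNBOUNDED PRECEDING .. CURRENT ROW: row i only sees rows 0..i.
  def runningFrame(partition: Seq[Row]): Seq[Row] =
    partition.indices.map(i => firstNonNull(partition.take(i + 1)))

  // Frame UNBOUNDED PRECEDING .. UNBOUNDED FOLLOWING: every row sees the whole partition.
  def wholePartitionFrame(partition: Seq[Row]): Seq[Row] =
    partition.map(_ => firstNonNull(partition))

  def main(args: Array[String]): Unit = {
    val partition = Seq(None, Some("value")) // the two rows from the report
    println(runningFrame(partition))
    println(wholePartitionFrame(partition))
  }
}
```

In this model the first row's running frame contains only the null, so there is no non-null value to return yet, which matches the result in the report. With a frame that spans the whole partition, both rows see "value".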
[jira] [Updated] (SPARK-21037) ignoreNulls does not working properly with window functions
[ https://issues.apache.org/jira/browse/SPARK-21037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stanislav Chernichkin updated SPARK-21037:
---
    Description:
The following code reproduces the issue:
spark
  .sql("select 0 as key, null as value, 0 as order union select 0 as key, 'value' as value, 1 as order")
  .select($"*", first($"value", true).over(partitionBy($"key").orderBy("order")).as("first_value"))
  .show()
Since the documentation claims that the {{first}} function will return the first non-null result, I expect to get:
|key|value|order|first_value|
|  0| null|    0|      value|
|  0|value|    1|      value|
But the actual result is:
|key|value|order|first_value|
|  0| null|    0|       null|
|  0|value|    1|      value|

  was:
The following code reproduces the issue:
spark
  .sql("select 0 as key, null as value, 0 as order union select 0 as key, 'value' as value, 1 as order")
  .select($"*", first($"value", true).over(partitionBy($"key").orderBy("order")).as("first_value"))
  .show()
Since the documentation claims that the {{first}} function will return the first non-null result, I expect to get:
|key|value|order|first_value|
+---+-----+-----+-----------+
|  0| null|    0|      value|
|  0|value|    1|      value|
+---+-----+-----+-----------+
But the actual result is:
+---+-----+-----+-----------+
|key|value|order|first_value|
+---+-----+-----+-----------+
|  0| null|    0|       null|
|  0|value|    1|      value|
+---+-----+-----+-----------+

> ignoreNulls does not working properly with window functions
> ---
>
>                 Key: SPARK-21037
>                 URL: https://issues.apache.org/jira/browse/SPARK-21037
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Stanislav Chernichkin
>
> The following code reproduces the issue:
> spark
>   .sql("select 0 as key, null as value, 0 as order union select 0 as key, 'value' as value, 1 as order")
>   .select($"*", first($"value", true).over(partitionBy($"key").orderBy("order")).as("first_value"))
>   .show()
> Since the documentation claims that the {{first}} function will return the first non-null result, I expect to get:
> |key|value|order|first_value|
> |  0| null|    0|      value|
> |  0|value|    1|      value|
> But the actual result is:
> |key|value|order|first_value|
> |  0| null|    0|       null|
> |  0|value|    1|      value|
[jira] [Updated] (SPARK-21037) ignoreNulls does not working properly with window functions
[ https://issues.apache.org/jira/browse/SPARK-21037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stanislav Chernichkin updated SPARK-21037:
---
    Description:
The following code reproduces the issue:
spark
  .sql("select 0 as key, null as value, 0 as order union select 0 as key, 'value' as value, 1 as order")
  .select($"*", first($"value", true).over(partitionBy($"key").orderBy("order")).as("first_value"))
  .show()
Since the documentation claims that the {{first}} function will return the first non-null result, I expect to get:
|key|value|order|first_value|
+---+-----+-----+-----------+
|  0| null|    0|      value|
|  0|value|    1|      value|
+---+-----+-----+-----------+
But the actual result is:
+---+-----+-----+-----------+
|key|value|order|first_value|
+---+-----+-----+-----------+
|  0| null|    0|       null|
|  0|value|    1|      value|
+---+-----+-----+-----------+

  was:
The following code reproduces the issue:
spark
  .sql("select 0 as key, null as value, 0 as order union select 0 as key, 'value' as value, 1 as order")
  .select($"*", first($"value", true).over(partitionBy($"key").orderBy("order")).as("first_value"))
  .show()
Since the documentation claims that the {{first}} function will return the first non-null result, I expect to get:
+---+-----+-----+-----------+
|key|value|order|first_value|
+---+-----+-----+-----------+
|  0| null|    0|      value|
|  0|value|    1|      value|
+---+-----+-----+-----------+
But the actual result is:
+---+-----+-----+-----------+
|key|value|order|first_value|
+---+-----+-----+-----------+
|  0| null|    0|       null|
|  0|value|    1|      value|
+---+-----+-----+-----------+

> ignoreNulls does not working properly with window functions
> ---
>
>                 Key: SPARK-21037
>                 URL: https://issues.apache.org/jira/browse/SPARK-21037
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Stanislav Chernichkin
>
> The following code reproduces the issue:
> spark
>   .sql("select 0 as key, null as value, 0 as order union select 0 as key, 'value' as value, 1 as order")
>   .select($"*", first($"value", true).over(partitionBy($"key").orderBy("order")).as("first_value"))
>   .show()
> Since the documentation claims that the {{first}} function will return the first non-null result, I expect to get:
> |key|value|order|first_value|
> +---+-----+-----+-----------+
> |  0| null|    0|      value|
> |  0|value|    1|      value|
> +---+-----+-----+-----------+
> But the actual result is:
> +---+-----+-----+-----------+
> |key|value|order|first_value|
> +---+-----+-----+-----------+
> |  0| null|    0|       null|
> |  0|value|    1|      value|
> +---+-----+-----+-----------+
[jira] [Created] (SPARK-21037) ignoreNulls does not working properly with window functions
Stanislav Chernichkin created SPARK-21037:
---

             Summary: ignoreNulls does not working properly with window functions
                 Key: SPARK-21037
                 URL: https://issues.apache.org/jira/browse/SPARK-21037
             Project: Spark
          Issue Type: Bug
          Components: Optimizer
    Affects Versions: 2.1.1, 2.1.0
            Reporter: Stanislav Chernichkin

The following code reproduces the issue:
spark
  .sql("select 0 as key, null as value, 0 as order union select 0 as key, 'value' as value, 1 as order")
  .select($"*", first($"value", true).over(partitionBy($"key").orderBy("order")).as("first_value"))
  .show()
Since the documentation claims that the {{first}} function will return the first non-null result, I expect to get:
+---+-----+-----+-----------+
|key|value|order|first_value|
+---+-----+-----+-----------+
|  0| null|    0|      value|
|  0|value|    1|      value|
+---+-----+-----+-----------+
But the actual result is:
+---+-----+-----+-----------+
|key|value|order|first_value|
+---+-----+-----+-----------+
|  0| null|    0|       null|
|  0|value|    1|      value|
+---+-----+-----+-----------+
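One way to get the result the reporter expected is to give the window an explicit frame that spans the whole partition. The sketch below is not taken from the issue; it assumes Spark 2.1 or later on the classpath and a local session. The object name `ExplicitFrameSketch` and the helper `withFirstValue` are hypothetical, and the input is built with `toDF` instead of the SQL string from the report.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.{Window, WindowSpec}
import org.apache.spark.sql.functions.{col, first}

object ExplicitFrameSketch {
  // Same partitioning and ordering as in the report, plus an explicit frame
  // that covers the whole partition instead of relying on the default.
  def wholePartition: WindowSpec = Window
    .partitionBy(col("key"))
    .orderBy(col("order"))
    .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

  // Adds a first_value column holding the first non-null value of the partition.
  def withFirstValue(df: DataFrame): DataFrame =
    df.select(
      col("*"),
      first(col("value"), ignoreNulls = true).over(wholePartition).as("first_value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("explicit-frame-sketch")
      .getOrCreate()
    import spark.implicits._

    // The two rows from the report: a null value first, then 'value'.
    val df = Seq((0, Option.empty[String], 0), (0, Option("value"), 1))
      .toDF("key", "value", "order")

    withFirstValue(df).show()
    spark.stop()
  }
}
```

With the explicit frame, the first row's frame also includes the later non-null row, so both rows should report "value" in `first_value`.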
[jira] [Created] (SPARK-20308) org.apache.spark.shuffle.FetchFailedException: Too large frame
Stanislav Chernichkin created SPARK-20308:
---

             Summary: org.apache.spark.shuffle.FetchFailedException: Too large frame
                 Key: SPARK-20308
                 URL: https://issues.apache.org/jira/browse/SPARK-20308
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 2.1.0
            Reporter: Stanislav Chernichkin

Spark uses a custom frame decoder (TransportFrameDecoder) which does not support frames larger than 2G. This leads to failures when shuffling large partitions.
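The 2G limit mentioned above suggests a rough sizing check. The sketch below is an assumption-laden heuristic, not something from the issue: it assumes the limit corresponds to a signed 32-bit length and that shuffle data is spread evenly across partitions (no skew). The names `ShuffleBlockSketch`, `fitsInFrame` and `minPartitions` are hypothetical.

```scala
// Back-of-the-envelope sizing helper (not part of Spark).
object ShuffleBlockSketch {
  // Assumption: the frame length is limited by a signed 32-bit integer, just under 2 GiB.
  val MaxFrameBytes: Long = Int.MaxValue.toLong

  // True if a block of this size would stay below the assumed frame limit.
  def fitsInFrame(blockBytes: Long): Boolean = blockBytes < MaxFrameBytes

  // Smallest partition count that keeps the average partition at or below targetBytes.
  // Assumes an even distribution; skewed keys can still produce oversized blocks.
  def minPartitions(totalShuffleBytes: Long, targetBytes: Long): Long = {
    require(totalShuffleBytes >= 0, "totalShuffleBytes must be non-negative")
    require(targetBytes > 0 && targetBytes < MaxFrameBytes, "targetBytes must be in (0, 2 GiB)")
    (totalShuffleBytes + targetBytes - 1) / targetBytes // ceiling division
  }

  def main(args: Array[String]): Unit = {
    val gib = 1024L * 1024 * 1024
    println(fitsInFrame(3 * gib))
    println(minPartitions(10 * gib, 1 * gib))
  }
}
```

A count obtained this way could then be applied with `repartition` or the `spark.sql.shuffle.partitions` setting. Whether that avoids the failure in a given job depends on how the data is actually distributed.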