[jira] [Updated] (SPARK-25462) hive on spark - got a weird output when count(*) from this script
[ https://issues.apache.org/jira/browse/SPARK-25462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gu Yuchen updated SPARK-25462: -- Environment: spark 1.6.2 hive 1.2.2 hadoop 2.7.1 was: spark 1.6.1 hive 1.2.2 hadoop 2.7.1 > hive on spark - got a weird output when count(*) from this script > -- > > Key: SPARK-25462 > URL: https://issues.apache.org/jira/browse/SPARK-25462 > Project: Spark > Issue Type: Question > Components: SQL >Affects Versions: 1.6.2 > Environment: spark 1.6.2 > hive 1.2.2 > hadoop 2.7.1 >Reporter: Gu Yuchen >Priority: Major > Attachments: jira.png, test.gz.parquet > > > > I use hiveContext to execute the script below: > with nt as (select label, score from (select * from (select label, score, > row_number() over (order by score desc) as position from t1)t_1 join (select > count(*) as countall from t1)t_2 )ta where position <= countall * 0.4) select > count(*) as c_positive from nt where label = 1 > and I got this result. > !jira.png! > It is weird when calling the 'count()' function on the RDD and the DataFrame; > as the picture shows, the outputs differ. > Can someone help me out? Thanks a lot. > > PS: the parquet file I used is the 'test.gz.parquet' in Attachments. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
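A minimal sketch of the comparison described above, written against the Spark 1.6 hiveContext API; the parquet file and table name t1 come from the report, while the surrounding code is assumed rather than taken from the reporter's script:
{code:scala}
// Hypothetical reconstruction of the reported setup (Spark 1.6).
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.read.parquet("test.gz.parquet").registerTempTable("t1")

// The CTE body from the report: keep the top 40% of rows by score.
val nt = hiveContext.sql(
  """select label, score from (
    |  select * from
    |    (select label, score, row_number() over (order by score desc) as position from t1) t_1
    |    join (select count(*) as countall from t1) t_2
    |) ta where position <= countall * 0.4""".stripMargin)

// The reported oddity: counting the same result through the DataFrame API
// and through the underlying RDD gives different numbers.
println(nt.filter("label = 1").count())
println(nt.filter("label = 1").rdd.count())
{code}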
[jira] [Reopened] (SPARK-25462) hive on spark - got a weird output when count(*) from this script
[ https://issues.apache.org/jira/browse/SPARK-25462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gu Yuchen reopened SPARK-25462: --- please help me out with this. thanks a lot > hive on spark - got a weird output when count(*) from this script > -- > > Key: SPARK-25462 > URL: https://issues.apache.org/jira/browse/SPARK-25462 > Project: Spark > Issue Type: Question > Components: SQL >Affects Versions: 1.6.2 > Environment: spark 1.6.1 > hive 1.2.2 > hadoop 2.7.1 >Reporter: Gu Yuchen >Priority: Major > Attachments: jira.png, test.gz.parquet > > > > use hiveContext to exec a script below: > with nt as (select label, score from (select * from (select label, score, > row_number() over (order by score desc) as position from t1)t_1 join (select > count(*) as countall from t1)t_2 )ta where position <= countall * 0.4) select > count(*) as c_positive from nt where label = 1 > and i got this result. > !jira.png! > it is weird when call the 'count()' func on rdd and dataframe, > as the pic says: different output here > can someone help me out? thanks a lot > > PS: the parquet file i used is the 'test.gz.parquet' in Attachments. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25454) Division between operands with negative scale can cause precision loss
[ https://issues.apache.org/jira/browse/SPARK-25454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620192#comment-16620192 ] Marco Gaido commented on SPARK-25454: - [~bersprockets] you're right, the only "wrong" thing of your statements is that the problem is not about 1000 but about 1e6, which in 2.2 was considered a decimal(10, 0) and now it is parsed as a decimal(1, -6). You could reproduce the same issue using {{lit(BigDecimal(1e6))}} in 2.2. So the problem is that we are not handling properly decimals with negative scale, but we are not forbidding their existence either, hence the issue. Making more common the presence of negative scale numbers made the issue more evident. Hope this is clear. Thanks. > Division between operands with negative scale can cause precision loss > -- > > Key: SPARK-25454 > URL: https://issues.apache.org/jira/browse/SPARK-25454 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.1 >Reporter: Marco Gaido >Priority: Major > > The issue was originally reported by [~bersprockets] here: > https://issues.apache.org/jira/browse/SPARK-22036?focusedCommentId=16618104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16618104. > The problem consist in a precision loss when the second operand of the > division is a decimal with a negative scale. It was present also before 2.3 > but it was harder to reproduce: you had to do something like > {{lit(BigDecimal(100e6))}}, while now this can happen more frequently with > SQL constants. > The problem is that our logic is taken from Hive and SQLServer where decimals > with negative scales are not allowed. We might also consider enforcing this > too in 3.0 eventually. Meanwhile we can fix the logic for computing the > result type for a division. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
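A hedged sketch of how one could observe the type promotion described in the comment above; the expected schemas are taken from the comment (decimal(1,-6) on 2.3.x, decimal(10,0) on 2.2), not from a captured run:
{code:scala}
// How is the SQL constant 1e6 typed?
spark.sql("select 1e6 as m").printSchema()

// 2.2-style reproduction mentioned in the comment: build the operand with an
// explicit literal instead of relying on how SQL constants are parsed.
import org.apache.spark.sql.functions.lit
spark.range(1).select(lit(BigDecimal(1e6)).as("m")).printSchema()
{code}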
[jira] [Moved] (SPARK-25462) hive on spark - got a weird output when count(*) from this script
[ https://issues.apache.org/jira/browse/SPARK-25462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gu Yuchen moved HIVE-20592 to SPARK-25462: -- Shepherd: Jeremy Affects Version/s: 1.6.2 Component/s: SQL Workflow: no-reopen-closed (was: no-reopen-closed, patch-avail) Issue Type: Question (was: Bug) Key: SPARK-25462 (was: HIVE-20592) Project: Spark (was: Hive) > hive on spark - got a weird output when count(*) from this script > -- > > Key: SPARK-25462 > URL: https://issues.apache.org/jira/browse/SPARK-25462 > Project: Spark > Issue Type: Question > Components: SQL >Affects Versions: 1.6.2 > Environment: spark 1.6.1 > hive 1.2.2 > hadoop 2.7.1 >Reporter: Gu Yuchen >Priority: Major > Attachments: jira.png, test.gz.parquet > > > > use hiveContext to exec a script below: > with nt as (select label, score from (select * from (select label, score, > row_number() over (order by score desc) as position from t1)t_1 join (select > count(*) as countall from t1)t_2 )ta where position <= countall * 0.4) select > count(*) as c_positive from nt where label = 1 > and i got this result. > !jira.png! > it is weird when call the 'count()' func on rdd and dataframe, > as the pic says: different output here > can someone help me out? thanks a lot > > PS: the parquet file i used is the 'test.gz.parquet' in Attachments. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25452) Query with where clause is giving unexpected result in case of float column
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620156#comment-16620156 ] Hyukjin Kwon commented on SPARK-25452: -- Thanks. I would appreciate it if this can be identified as a duplicate or not. > Query with where clause is giving unexpected result in case of float column > --- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description*: a query with a where clause gives an unexpected result when the filter column is a float.
> > {color:#d04437}*Query with filter less than or equal to gives an inappropriate result*{color}
> {code}
> 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float);
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0);
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1);
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0;
> +----+--------------------+--+
> | a  | b                  |
> +----+--------------------+--+
> | 0  | 0.0                |
> | 1  | 1.100000023841858  |
> +----+--------------------+--+
> Query with filter less than or equal to gives an inappropriate result:
> 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1;
> +----+------+--+
> | a  | b    |
> +----+------+--+
> | 0  | 0.0  |
> +----+------+--+
> 1 row selected (0.299 seconds)
> {code}
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25460) DataSourceV2: Structured Streaming does not respect SessionConfigSupport
[ https://issues.apache.org/jira/browse/SPARK-25460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620138#comment-16620138 ] Hyukjin Kwon commented on SPARK-25460: -- PR https://github.com/apache/spark/pull/22462 > DataSourceV2: Structured Streaming does not respect SessionConfigSupport > > > Key: SPARK-25460 > URL: https://issues.apache.org/jira/browse/SPARK-25460 > Project: Spark > Issue Type: Sub-task > Components: SQL, Structured Streaming >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {{SessionConfigSupport}} allows configurations to be passed as options: > {code} > `spark.datasource.$keyPrefix.xxx` into `xxx`, > {code} > Currently, structured streaming does not seem to support this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23200) Reset configuration when restarting from checkpoints
[ https://issues.apache.org/jira/browse/SPARK-23200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yinan Li resolved SPARK-23200. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22392 [https://github.com/apache/spark/pull/22392] > Reset configuration when restarting from checkpoints > > > Key: SPARK-23200 > URL: https://issues.apache.org/jira/browse/SPARK-23200 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Anirudh Ramanathan >Priority: Major > Fix For: 2.4.0 > > > Streaming workloads and restarting from checkpoints may need additional > changes, i.e. resetting properties - see > https://github.com/apache-spark-on-k8s/spark/pull/516 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None
Chongyuan Xiang created SPARK-25461: --- Summary: PySpark Pandas UDF outputs incorrect results when input columns contain None Key: SPARK-25461 URL: https://issues.apache.org/jira/browse/SPARK-25461 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.3.1 Environment: I reproduced this issue by running pyspark locally on mac: Spark version: 2.3.1 pre-built with Hadoop 2.7 Python library versions: pyarrow==0.10.0, pandas==0.20.2 Reporter: Chongyuan Xiang The following PySpark script uses a simple pandas UDF to calculate a column given column 'A'. When column 'A' contains None, the results look incorrect. Script:
{code:java}
import pandas as pd
import random
import pyspark
from pyspark.sql.functions import col, lit, pandas_udf

values = [None] * 3 + [1.0] * 17 + [2.0] * 600
random.shuffle(values)
pdf = pd.DataFrame({'A': values})
df = spark.createDataFrame(pdf)

@pandas_udf(returnType=pyspark.sql.types.BooleanType())
def gt_2(column):
    return (column >= 2).where(column.notnull())

calculated_df = (df.select(['A'])
                 .withColumn('potential_bad_col', gt_2('A'))
                 )

calculated_df = calculated_df.withColumn('correct_col', (col("A") >= lit(2)) | (col("A").isNull()))
calculated_df.show()
{code}
Output:
{code:java}
+---+-----------------+-----------+
|  A|potential_bad_col|correct_col|
+---+-----------------+-----------+
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|1.0|            false|      false|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
|2.0|            false|       true|
+---+-----------------+-----------+
only showing top 20 rows
{code}
This problem disappears when the number of rows is small or when the input column does not contain None. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25460) DataSourceV2: Structured Streaming does not respect SessionConfigSupport
Hyukjin Kwon created SPARK-25460: Summary: DataSourceV2: Structured Streaming does not respect SessionConfigSupport Key: SPARK-25460 URL: https://issues.apache.org/jira/browse/SPARK-25460 Project: Spark Issue Type: Sub-task Components: SQL, Structured Streaming Affects Versions: 2.4.0 Reporter: Hyukjin Kwon {{SessionConfigSupport}} allows configurations to be passed as options: {code} `spark.datasource.$keyPrefix.xxx` into `xxx`, {code} Currently, structured streaming does not seem to support this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
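A small illustrative sketch of the prefix-stripping contract quoted above; the helper function and the "mysource" prefix are hypothetical, not Spark's actual implementation:
{code:scala}
// Session confs of the form spark.datasource.<keyPrefix>.<xxx> should reach the
// source as the option <xxx>; the ticket's point is that streaming currently
// skips this translation.
def extractSessionConfigs(keyPrefix: String, sessionConfs: Map[String, String]): Map[String, String] = {
  val prefix = s"spark.datasource.$keyPrefix."
  sessionConfs.collect { case (k, v) if k.startsWith(prefix) => k.stripPrefix(prefix) -> v }
}

extractSessionConfigs("mysource", Map("spark.datasource.mysource.path" -> "/tmp/data"))
// => Map("path" -> "/tmp/data")
{code}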
[jira] [Commented] (SPARK-25453) OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
[ https://issues.apache.org/jira/browse/SPARK-25453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620058#comment-16620058 ] Chenxiao Mao commented on SPARK-25453: -- User 'seancxmao' has created a pull request for this issue: [https://github.com/apache/spark/pull/22461] > OracleIntegrationSuite IllegalArgumentException: Timestamp format must be > -mm-dd hh:mm:ss[.f] > - > > Key: SPARK-25453 > URL: https://issues.apache.org/jira/browse/SPARK-25453 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > - SPARK-22814 support date/timestamp types in partitionColumn *** FAILED *** > java.lang.IllegalArgumentException: Timestamp format must be -mm-dd > hh:mm:ss[.f] > at java.sql.Timestamp.valueOf(Timestamp.java:204) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:183) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167) > at > org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:445) > at > org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:427) > ...{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25459) Add viewOriginalText back to CatalogTable
Zheyuan Zhao created SPARK-25459: Summary: Add viewOriginalText back to CatalogTable Key: SPARK-25459 URL: https://issues.apache.org/jira/browse/SPARK-25459 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.1, 2.3.0, 2.2.2, 2.2.1, 2.2.0 Reporter: Zheyuan Zhao The {{show create table}} will show a lot of generated attributes for views that created by older Spark version. See this test suite https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala#L115. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19724) create a managed table with an existed default location should throw an exception
[ https://issues.apache.org/jira/browse/SPARK-19724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-19724. - Resolution: Fixed Assignee: Gengliang Wang Fix Version/s: 2.4.0 > create a managed table with an existed default location should throw an > exception > - > > Key: SPARK-19724 > URL: https://issues.apache.org/jira/browse/SPARK-19724 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Song Jun >Assignee: Gengliang Wang >Priority: Major > Fix For: 2.4.0 > > > This JIRA is a follow up work after > [SPARK-19583](https://issues.apache.org/jira/browse/SPARK-19583) > As we discussed in that [PR](https://github.com/apache/spark/pull/16938) > The following DDL for a managed table with an existed default location should > throw an exception: > {code} > CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ... > CREATE TABLE ... (PARTITIONED BY ...) > {code} > Currently there are some situations which are not consist with above logic: > 1. CREATE TABLE ... (PARTITIONED BY ...) succeed with an existed default > location > situation: for both hive/datasource(with HiveExternalCatalog/InMemoryCatalog) > 2. CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ... > situation: hive table succeed with an existed default location -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25454) Division between operands with negative scale can cause precision loss
[ https://issues.apache.org/jira/browse/SPARK-25454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619911#comment-16619911 ] Bruce Robbins commented on SPARK-25454: --- Thanks [~mgaido], OK, so the way I understand it - negative scales are the problem - they were also a problem in 2.2, but it was more difficult to reproduce. For my example case (in the referenced Jira), the change to the promotion of literal 1000 in 2.3 exposed an existing issue with the handling of 1e6. This bears out, in that when I replace 1e6 with 1000000, the issue goes away (at least for my example case):
{noformat}
scala> sql("select 26393499451/(1000000 * 1000) as c1").show
+------------+
|          c1|
+------------+
|26.393499451|
+------------+
{noformat}
> Division between operands with negative scale can cause precision loss > -- > > Key: SPARK-25454 > URL: https://issues.apache.org/jira/browse/SPARK-25454 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.1 >Reporter: Marco Gaido >Priority: Major > > The issue was originally reported by [~bersprockets] here: > https://issues.apache.org/jira/browse/SPARK-22036?focusedCommentId=16618104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16618104. > The problem consist in a precision loss when the second operand of the > division is a decimal with a negative scale. It was present also before 2.3 > but it was harder to reproduce: you had to do something like > {{lit(BigDecimal(100e6))}}, while now this can happen more frequently with > SQL constants. > The problem is that our logic is taken from Hive and SQLServer where decimals > with negative scales are not allowed. We might also consider enforcing this > too in 3.0 eventually. Meanwhile we can fix the logic for computing the > result type for a division. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24572) "eager execution" for R shell, IDE
[ https://issues.apache.org/jira/browse/SPARK-24572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619872#comment-16619872 ] Weiqiang Zhuang commented on SPARK-24572: - thanks [~felixcheung], raised PR https://github.com/apache/spark/pull/22455. > "eager execution" for R shell, IDE > -- > > Key: SPARK-24572 > URL: https://issues.apache.org/jira/browse/SPARK-24572 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Felix Cheung >Priority: Major > > like python in SPARK-24215 > we could also have eager execution when SparkDataFrame is returned to the R > shell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24626) Parallelize size calculation in Analyze Table command
[ https://issues.apache.org/jira/browse/SPARK-24626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-24626. - Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 2.4.0 > Parallelize size calculation in Analyze Table command > - > > Key: SPARK-24626 > URL: https://issues.apache.org/jira/browse/SPARK-24626 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Achuth Narayan Rajagopal >Assignee: Reynold Xin >Priority: Major > Fix For: 2.4.0 > > > Currently, Analyze table calculates table size sequentially for each > partition. We can parallelize size calculations over partitions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24626) Parallelize size calculation in Analyze Table command
[ https://issues.apache.org/jira/browse/SPARK-24626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-24626: Summary: Parallelize size calculation in Analyze Table command (was: Improve Analyze Table command) > Parallelize size calculation in Analyze Table command > - > > Key: SPARK-24626 > URL: https://issues.apache.org/jira/browse/SPARK-24626 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Achuth Narayan Rajagopal >Priority: Major > Fix For: 2.4.0 > > > Currently, Analyze table calculates table size sequentially for each > partition. We can parallelize size calculations over partitions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25419) Parquet predicate pushdown improvement
[ https://issues.apache.org/jira/browse/SPARK-25419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-25419: --- Assignee: Yuming Wang > Parquet predicate pushdown improvement > -- > > Key: SPARK-25419 > URL: https://issues.apache.org/jira/browse/SPARK-25419 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 2.4.0 > > > Parquet predicate pushdown support: ByteType, ShortType, DecimalType, > DateType, TimestampType. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25456) PythonForeachWriterSuite failing
[ https://issues.apache.org/jira/browse/SPARK-25456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-25456: Assignee: Imran Rashid > PythonForeachWriterSuite failing > > > Key: SPARK-25456 > URL: https://issues.apache.org/jira/browse/SPARK-25456 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Assignee: Imran Rashid >Priority: Blocker > Fix For: 2.4.0 > > > This is failing regularly, see eg. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96164/testReport/junit/org.apache.spark.sql.execution.python/PythonForeachWriterSuite/UnsafeRowBuffer__iterator_blocks_when_no_data_is_available/ > I will post a fix shortly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25456) PythonForeachWriterSuite failing
[ https://issues.apache.org/jira/browse/SPARK-25456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-25456. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22452 [https://github.com/apache/spark/pull/22452] > PythonForeachWriterSuite failing > > > Key: SPARK-25456 > URL: https://issues.apache.org/jira/browse/SPARK-25456 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Assignee: Imran Rashid >Priority: Blocker > Fix For: 2.4.0 > > > This is failing regularly, see eg. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96164/testReport/junit/org.apache.spark.sql.execution.python/PythonForeachWriterSuite/UnsafeRowBuffer__iterator_blocks_when_no_data_is_available/ > I will post a fix shortly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25291) Flakiness of tests in terms of executor memory (SecretsTestSuite)
[ https://issues.apache.org/jira/browse/SPARK-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yinan Li resolved SPARK-25291. -- Resolution: Fixed Fix Version/s: 2.4.0 > Flakiness of tests in terms of executor memory (SecretsTestSuite) > - > > Key: SPARK-25291 > URL: https://issues.apache.org/jira/browse/SPARK-25291 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Ilan Filonenko >Priority: Major > Fix For: 2.4.0 > > > SecretsTestSuite shows flakiness in terms of correct setting of executor > memory: > Run SparkPi with env and mount secrets. *** FAILED *** > "[884]Mi" did not equal "[1408]Mi" (KubernetesSuite.scala:272) > When ran with default settings -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25458) Support FOR ALL COLUMNS in ANALYZE TABLE
[ https://issues.apache.org/jira/browse/SPARK-25458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619310#comment-16619310 ] Dilip Biswal commented on SPARK-25458: -- [~smilegator] I would like to work on this. > Support FOR ALL COLUMNS in ANALYZE TABLE > - > > Key: SPARK-25458 > URL: https://issues.apache.org/jira/browse/SPARK-25458 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.5.0 >Reporter: Xiao Li >Priority: Major > > Currently, to collect the statistics of all the columns, users need to > specify the names of all the columns when calling the command "ANALYZE TABLE > ... FOR COLUMNS...". This is not user friendly. Instead, we can introduce the > following SQL command to achieve it without specifying the column names. > {code:java} >ANALYZE TABLE [db_name.]tablename COMPUTE STATISTICS FOR ALL COLUMNS; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25458) Support FOR ALL COLUMNS in ANALYZE TABLE
Xiao Li created SPARK-25458: --- Summary: Support FOR ALL COLUMNS in ANALYZE TABLE Key: SPARK-25458 URL: https://issues.apache.org/jira/browse/SPARK-25458 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.5.0 Reporter: Xiao Li Currently, to collect the statistics of all the columns, users need to specify the names of all the columns when calling the command "ANALYZE TABLE ... FOR COLUMNS...". This is not user friendly. Instead, we can introduce the following SQL command to achieve it without specifying the column names. {code:java} ANALYZE TABLE [db_name.]tablename COMPUTE STATISTICS FOR ALL COLUMNS; {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
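Usage sketch of the current command versus the proposed one; the table and column names are placeholders:
{code:scala}
// Today: every column has to be listed explicitly.
spark.sql("ANALYZE TABLE db_name.tablename COMPUTE STATISTICS FOR COLUMNS col1, col2, col3")

// Proposed by this ticket (not available yet): collect stats for all columns at once.
spark.sql("ANALYZE TABLE db_name.tablename COMPUTE STATISTICS FOR ALL COLUMNS")
{code}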
[jira] [Commented] (SPARK-18185) Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions
[ https://issues.apache.org/jira/browse/SPARK-18185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619243#comment-16619243 ] Deepanker commented on SPARK-18185: --- Tested this even further and found out that it works for managed tables as well. But not for tables created via saveAsTable API of spark. As a test i did the following: saveAsTable [Stored as ORC ] [Partitioned by arbitrary column] [Doesn't work for this] Create Table like x [Using Beeline CLI] [Same properties as above table] [Works for this] Create external Table like x [Using Beeline CLI] [Same properties as above table] [Works for this] Is this the expected behaviour? I am using Spark 2.2 and Hive 1.1 > Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions > -- > > Key: SPARK-18185 > URL: https://issues.apache.org/jira/browse/SPARK-18185 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.1.0 > > > As of current 2.1, INSERT OVERWRITE with dynamic partitions against a > Datasource table will overwrite the entire table instead of only the updated > partitions as in Hive. It also doesn't respect custom partition locations. > We should delete only the proper partitions, scan the metastore for affected > partitions with custom locations, and ensure that deletes/writes go to the > right locations for those as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20236) Overwrite a partitioned data source table should only overwrite related partitions
[ https://issues.apache.org/jira/browse/SPARK-20236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619236#comment-16619236 ] Deepanker edited comment on SPARK-20236 at 9/18/18 3:07 PM: Hi Wenchen Fan, Tested this even further and found out that it works for managed tables as well. But not for tables created via saveAsTable API of spark. As a test i did the following: saveAsTable [Stored as ORC ] [Partitioned by arbitrary column] [Doesn't work for this] Create Table like x [Using Beeline CLI] [Same properties as above table] [Works for this] Create external Table like x [Using Beeline CLI] [Same properties as above table] [Works for this] Is this the expected behaviour? I am using Spark 2.2 and Hive 1.1 If/When find time can you also confirm my hypothesis for the difference between two Jira in my previous post? was (Author: deepanker): Hi Wenchen Fan, Tested this even further and found out that it works for managed tables as well. But not for tables created via saveAsTable API of spark. As a test i did the following: saveAsTable (x) [Stored as ORC ]partitioned by arbitrary column] [Stored as ORC ] [Doesn't work for this] Create Table like x [Using Beeline CLI] [Same properties as above table] [Works for this] Create external Table like x [Using Beeline CLI] [Same properties as above table] [Works for this] > Overwrite a partitioned data source table should only overwrite related > partitions > -- > > Key: SPARK-20236 > URL: https://issues.apache.org/jira/browse/SPARK-20236 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: releasenotes > Fix For: 2.3.0 > > > When we overwrite a partitioned data source table, currently Spark will > truncate the entire table to write new data, or truncate a bunch of > partitions according to the given static partitions. > For example, {{INSERT OVERWRITE tbl ...}} will truncate the entire table, > {{INSERT OVERWRITE tbl PARTITION (a=1, b)}} will truncate all the partitions > that starts with {{a=1}}. > This behavior is kind of reasonable as we can know which partitions will be > overwritten before runtime. However, hive has a different behavior that it > only overwrites related partitions, e.g. {{INSERT OVERWRITE tbl SELECT > 1,2,3}} will only overwrite partition {{a=2, b=3}}, assuming {{tbl}} has only > one data column and is partitioned by {{a}} and {{b}}. > It seems better if we can follow hive's behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20236) Overwrite a partitioned data source table should only overwrite related partitions
[ https://issues.apache.org/jira/browse/SPARK-20236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619236#comment-16619236 ] Deepanker commented on SPARK-20236: --- Hi Wenchen Fan, Tested this even further and found out that it works for managed tables as well. But not for tables created via saveAsTable API of spark. As a test i did the following: saveAsTable (x) [Stored as ORC ]partitioned by arbitrary column] [Stored as ORC ] [Doesn't work for this] Create Table like x [Using Beeline CLI] [Same properties as above table] [Works for this] Create external Table like x [Using Beeline CLI] [Same properties as above table] [Works for this] > Overwrite a partitioned data source table should only overwrite related > partitions > -- > > Key: SPARK-20236 > URL: https://issues.apache.org/jira/browse/SPARK-20236 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: releasenotes > Fix For: 2.3.0 > > > When we overwrite a partitioned data source table, currently Spark will > truncate the entire table to write new data, or truncate a bunch of > partitions according to the given static partitions. > For example, {{INSERT OVERWRITE tbl ...}} will truncate the entire table, > {{INSERT OVERWRITE tbl PARTITION (a=1, b)}} will truncate all the partitions > that starts with {{a=1}}. > This behavior is kind of reasonable as we can know which partitions will be > overwritten before runtime. However, hive has a different behavior that it > only overwrites related partitions, e.g. {{INSERT OVERWRITE tbl SELECT > 1,2,3}} will only overwrite partition {{a=2, b=3}}, assuming {{tbl}} has only > one data column and is partitioned by {{a}} and {{b}}. > It seems better if we can follow hive's behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25445) publish a scala 2.12 build with Spark 2.4
[ https://issues.apache.org/jira/browse/SPARK-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-25445. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22441 [https://github.com/apache/spark/pull/22441] > publish a scala 2.12 build with Spark 2.4 > - > > Key: SPARK-25445 > URL: https://issues.apache.org/jira/browse/SPARK-25445 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 2.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25457) IntegralDivide (div) should not always return long
Marco Gaido created SPARK-25457: --- Summary: IntegralDivide (div) should not always return long Key: SPARK-25457 URL: https://issues.apache.org/jira/browse/SPARK-25457 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.5.0 Reporter: Marco Gaido The operation {{div}} always returns long. This comes from Hive's behavior, which differs from most other DBMSs (e.g. MySQL, Postgres), which return the same datatype as the operands. This JIRA tracks changing our return type and allowing users to re-enable the old behavior using {{spark.sql.legacy.integralDiv.returnLong}}. I'll submit a PR for this soon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
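A hedged sketch of the behavior in question; the commented result types reflect the description above (current Hive-compatible behavior vs. the proposed operand-typed result), not a captured run:
{code:scala}
spark.sql("SELECT 7 div 2").printSchema()
// Today: the result column is always LongType, even for two int operands.
// After the proposed change it would be IntegerType here, unless
// spark.sql.legacy.integralDiv.returnLong is set to restore the old result type.
{code}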
[jira] [Commented] (SPARK-20236) Overwrite a partitioned data source table should only overwrite related partitions
[ https://issues.apache.org/jira/browse/SPARK-20236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619180#comment-16619180 ] Wenchen Fan commented on SPARK-20236: - It should work for managed table as well. Can you open a JIRA and report the issues for managed table? > Overwrite a partitioned data source table should only overwrite related > partitions > -- > > Key: SPARK-20236 > URL: https://issues.apache.org/jira/browse/SPARK-20236 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: releasenotes > Fix For: 2.3.0 > > > When we overwrite a partitioned data source table, currently Spark will > truncate the entire table to write new data, or truncate a bunch of > partitions according to the given static partitions. > For example, {{INSERT OVERWRITE tbl ...}} will truncate the entire table, > {{INSERT OVERWRITE tbl PARTITION (a=1, b)}} will truncate all the partitions > that starts with {{a=1}}. > This behavior is kind of reasonable as we can know which partitions will be > overwritten before runtime. However, hive has a different behavior that it > only overwrites related partitions, e.g. {{INSERT OVERWRITE tbl SELECT > 1,2,3}} will only overwrite partition {{a=2, b=3}}, assuming {{tbl}} has only > one data column and is partitioned by {{a}} and {{b}}. > It seems better if we can follow hive's behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
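A test sketch for the managed-table question above; the table name and data are placeholders, and spark.sql.sources.partitionOverwriteMode is assumed to be the switch introduced by this ticket's fix in 2.3:
{code:scala}
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// One data column, partitioned by a and b, as in the description.
spark.range(10).selectExpr("id as value", "id % 2 as a", "id % 3 as b")
  .write.partitionBy("a", "b").saveAsTable("tbl")

// With dynamic mode this should replace only partition a=2, b=3.
spark.sql("INSERT OVERWRITE TABLE tbl SELECT 1, 2, 3")
spark.sql("SHOW PARTITIONS tbl").show(false)
{code}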
[jira] [Updated] (SPARK-24777) Add write benchmark for AVRO
[ https://issues.apache.org/jira/browse/SPARK-24777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-24777: --- Summary: Add write benchmark for AVRO (was: Refactor AVRO read/write benchmark) > Add write benchmark for AVRO > > > Key: SPARK-24777 > URL: https://issues.apache.org/jira/browse/SPARK-24777 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25453) OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
[ https://issues.apache.org/jira/browse/SPARK-25453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619165#comment-16619165 ] Chenxiao Mao commented on SPARK-25453: -- I'm working on this. cc [~maropu] [~yumwang] > OracleIntegrationSuite IllegalArgumentException: Timestamp format must be > -mm-dd hh:mm:ss[.f] > - > > Key: SPARK-25453 > URL: https://issues.apache.org/jira/browse/SPARK-25453 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > - SPARK-22814 support date/timestamp types in partitionColumn *** FAILED *** > java.lang.IllegalArgumentException: Timestamp format must be -mm-dd > hh:mm:ss[.f] > at java.sql.Timestamp.valueOf(Timestamp.java:204) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:183) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167) > at > org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:445) > at > org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:427) > ...{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25456) PythonForeachWriterSuite failing
Imran Rashid created SPARK-25456: Summary: PythonForeachWriterSuite failing Key: SPARK-25456 URL: https://issues.apache.org/jira/browse/SPARK-25456 Project: Spark Issue Type: Test Components: SQL Affects Versions: 2.4.0 Reporter: Imran Rashid This is failing regularly, see eg. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96164/testReport/junit/org.apache.spark.sql.execution.python/PythonForeachWriterSuite/UnsafeRowBuffer__iterator_blocks_when_no_data_is_available/ I will post a fix shortly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25455) Spark bundles jackson library version, which is vulnerable
Madhusudan N created SPARK-25455: Summary: Spark bundles jackson library version, which is vulnerable Key: SPARK-25455 URL: https://issues.apache.org/jira/browse/SPARK-25455 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.1, 2.2.0 Reporter: Madhusudan N We have hosted one of our applications in Spark standalone mode, and the application has the jackson library dependencies below, all at version 2.9.6:
* jackson-core
* jackson-databind
* jackson-dataformat-cbor
* jackson-dataformat-xml
* jackson-dataformat-yaml
Due to a vulnerability in jackson 2.6.6, as indicated by Veracode, it has been upgraded to version 2.9.6. Please find the link which describes the vulnerability issue with jackson 2.6.6: [http://cwe.mitre.org/data/definitions/470.html] Spark (2.2.0 and 2.3.1) depends on jackson-core 2.6.5 and jackson-core 2.6.7, but our application needs jackson-core 2.9.6. Because of this, the application crashes. Please find the stack trace below:
{noformat}
Exception in thread "main" [Loaded java.lang.Throwable$WrappedPrintStream from /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar]
java.lang.NoSuchFieldError: NO_INTS
	at com.fasterxml.jackson.dataformat.cbor.CBORParser.<init>(CBORParser.java:285)
	at com.fasterxml.jackson.dataformat.cbor.CBORParserBootstrapper.constructParser(CBORParserBootstrapper.java:91)
	at com.fasterxml.jackson.dataformat.cbor.CBORFactory._createParser(CBORFactory.java:377)
{noformat}
Spark needs to use jackson-core 2.9.6, which does not have the vulnerability. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
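One common mitigation for the conflict described above is to shade the application's newer Jackson so it no longer collides with the 2.6.x copy on Spark's classpath. A minimal sketch assuming the sbt-assembly plugin; the rename target is arbitrary:
{code:scala}
// build.sbt (sbt-assembly): relocate the app's Jackson classes out of Spark's way.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.fasterxml.jackson.**" -> "shadedjackson.@1").inAll
)
{code}
Alternatively, the spark.driver.userClassPathFirst / spark.executor.userClassPathFirst settings can be worth trying, though they are marked experimental.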
[jira] [Commented] (SPARK-22036) BigDecimal multiplication sometimes returns null
[ https://issues.apache.org/jira/browse/SPARK-22036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619129#comment-16619129 ] Marco Gaido commented on SPARK-22036: - [~bersprockets] I created SPARK-25454 for tracking since I have a path for this and it might be considered as a blocker for 2.4, so I wanted to expedite it. I am submitting a patch for this soon. Sorry for the problem again. Thanks. > BigDecimal multiplication sometimes returns null > > > Key: SPARK-22036 > URL: https://issues.apache.org/jira/browse/SPARK-22036 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Olivier Blanvillain >Assignee: Marco Gaido >Priority: Major > Fix For: 2.3.0 > > > The multiplication of two BigDecimal numbers sometimes returns null. Here is > a minimal reproduction: > {code:java} > object Main extends App { > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.SparkSession > import spark.implicits._ > val conf = new > SparkConf().setMaster("local[*]").setAppName("REPL").set("spark.ui.enabled", > "false") > val spark = > SparkSession.builder().config(conf).appName("REPL").getOrCreate() > implicit val sqlContext = spark.sqlContext > case class X2(a: BigDecimal, b: BigDecimal) > val ds = sqlContext.createDataset(List(X2(BigDecimal(-0.1267333984375), > BigDecimal(-1000.1 > val result = ds.select(ds("a") * ds("b")).collect.head > println(result) // [null] > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25454) Division between operands with negative scale can cause precision loss
Marco Gaido created SPARK-25454: --- Summary: Division between operands with negative scale can cause precision loss Key: SPARK-25454 URL: https://issues.apache.org/jira/browse/SPARK-25454 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.1, 2.3.0 Reporter: Marco Gaido The issue was originally reported by [~bersprockets] here: https://issues.apache.org/jira/browse/SPARK-22036?focusedCommentId=16618104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16618104. The problem consist in a precision loss when the second operand of the division is a decimal with a negative scale. It was present also before 2.3 but it was harder to reproduce: you had to do something like {{lit(BigDecimal(100e6))}}, while now this can happen more frequently with SQL constants. The problem is that our logic is taken from Hive and SQLServer where decimals with negative scales are not allowed. We might also consider enforcing this too in 3.0 eventually. Meanwhile we can fix the logic for computing the result type for a division. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25453) OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
[ https://issues.apache.org/jira/browse/SPARK-25453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619121#comment-16619121 ] Takeshi Yamamuro commented on SPARK-25453: -- oh, thanks. Can you fix this? > OracleIntegrationSuite IllegalArgumentException: Timestamp format must be > -mm-dd hh:mm:ss[.f] > - > > Key: SPARK-25453 > URL: https://issues.apache.org/jira/browse/SPARK-25453 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > - SPARK-22814 support date/timestamp types in partitionColumn *** FAILED *** > java.lang.IllegalArgumentException: Timestamp format must be -mm-dd > hh:mm:ss[.f] > at java.sql.Timestamp.valueOf(Timestamp.java:204) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:183) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167) > at > org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:445) > at > org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:427) > ...{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25453) OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
[ https://issues.apache.org/jira/browse/SPARK-25453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619095#comment-16619095 ] Yuming Wang commented on SPARK-25453: - cc [~maropu] > OracleIntegrationSuite IllegalArgumentException: Timestamp format must be > -mm-dd hh:mm:ss[.f] > - > > Key: SPARK-25453 > URL: https://issues.apache.org/jira/browse/SPARK-25453 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > - SPARK-22814 support date/timestamp types in partitionColumn *** FAILED *** > java.lang.IllegalArgumentException: Timestamp format must be -mm-dd > hh:mm:ss[.f] > at java.sql.Timestamp.valueOf(Timestamp.java:204) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:183) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167) > at > org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:445) > at > org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:427) > ...{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25453) OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
Yuming Wang created SPARK-25453: --- Summary: OracleIntegrationSuite IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] Key: SPARK-25453 URL: https://issues.apache.org/jira/browse/SPARK-25453 Project: Spark Issue Type: Test Components: Tests Affects Versions: 2.4.0 Reporter: Yuming Wang
{noformat}
- SPARK-22814 support date/timestamp types in partitionColumn *** FAILED ***
  java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
  at java.sql.Timestamp.valueOf(Timestamp.java:204)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:183)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  at org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:445)
  at org.apache.spark.sql.jdbc.OracleIntegrationSuite$$anonfun$18.apply(OracleIntegrationSuite.scala:427)
  ...
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
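The message above comes from java.sql.Timestamp.valueOf, which only accepts the JDBC timestamp escape format; a quick plain-JDK illustration (the dates are arbitrary):
{code:scala}
java.sql.Timestamp.valueOf("2018-07-06 05:50:00")  // parses fine
java.sql.Timestamp.valueOf("2018-07-06")           // throws IllegalArgumentException:
                                                   // Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
{code}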
[jira] [Updated] (SPARK-25452) Query with where clause is giving unexpected result in case of float column
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Anubhava updated SPARK-25452: --- Summary: Query with where clause is giving unexpected result in case of float column (was: Query with clause is giving unexpected result in case of float coloumn) > Query with where clause is giving unexpected result in case of float column > --- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > {color:#d04437}*Query with filter less than equal to is giving inappropriate > result{code}*{color} > {code} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619060#comment-16619060 ] Ayush Anubhava edited comment on SPARK-25452 at 9/18/18 12:40 PM: -- Hi HyukjiKwon , Thank you so much for the update. Will cherry pick the PR and check . My issue is related with filter where in filter datatype is float . Ideally if datatype is float then it should internally cast the value as well to the corresponding datatype. Oracle db also behaves the same .i.e we need not give cast explicitly. was (Author: ayush007): Hi HyukjiKwon , Thank you so much for the update. Will cherry pick the PR and check . My issue is related with filter where in filter datatype is float . Ideally if datatype is float then it should internally cast the value as well to the corresponding datatype. Oracle also behaves the same .i.e we need not give cast explicitly. > Query with clause is giving unexpected result in case of float coloumn > -- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > {color:#d04437}*Query with filter less than equal to is giving inappropriate > result{code}*{color} > {code} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619060#comment-16619060 ] Ayush Anubhava edited comment on SPARK-25452 at 9/18/18 12:39 PM: -- Hi HyukjiKwon , Thank you so much for the update. Will cherry pick the PR and check . My issue is related with filter where in filter datatype is float . Ideally if datatype is float then it should internally cast the value as well to the corresponding datatype. Oracle also behaves the same .i.e we need not give cast explicitly. was (Author: ayush007): Hi HyukjiKwon , Thank you so much for the update. Will cherry pick the PR and check . My issue is related with filter where in filter datatype is float . Ideally if datatype is float then it should internally cast the value as well to the corresponding datatype. > Query with clause is giving unexpected result in case of float coloumn > -- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > {color:#d04437}*Query with filter less than equal to is giving inappropriate > result{code}*{color} > {code} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619060#comment-16619060 ] Ayush Anubhava commented on SPARK-25452: Hi HyukjiKwon , Thank you so much for the update. Will cherry pick the PR and check . My issue is related with filter where in filter datatype is float . Ideally if datatype is float then it should internally cast the value as well to the corresponding datatype. > Query with clause is giving unexpected result in case of float coloumn > -- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > {color:#d04437}*Query with filter less than equal to is giving inappropriate > result{code}*{color} > {code} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Anubhava updated SPARK-25452: --- Description: *Description* : Query with clause is giving unexpected result in case of float column {color:#d04437}*Query with filter less than equal to is giving inappropriate result{code}*{color} {code} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} was: *Description* : Query with clause is giving unexpected result in case of float column {color:#d04437}*Query with filter less than equal to is giving in appropriate result{code}*{color} {code} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} > Query with clause is giving unexpected result in case of float coloumn > -- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > {color:#d04437}*Query with filter less than equal to is giving inappropriate > result{code}*{color} > {code} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20236) Overwrite a partitioned data source table should only overwrite related partitions
[ https://issues.apache.org/jira/browse/SPARK-20236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618950#comment-16618950 ] Deepanker edited comment on SPARK-20236 at 9/18/18 11:40 AM: - What is the difference between this Jira and these ones: https://issues.apache.org/jira/browse/SPARK-18185, https://issues.apache.org/jira/browse/SPARK-18183 I tested this out with spark 2.2 (which confirms the fix was present before 2.3 as well) this only works for external tables not managed tables in hive? Any reason why is that? Now we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? *Update:* I got it. SPARK-20236 provides a feature flag to override this behaviour via the above mentioned property whereas the other Jira fixes the insert overwrite behaviour overall. Although this still doesn't work for Hive managed tables only for external tables. Is this behaviour intentional (as in an external table is considered as datasource table managed via Hive whereas a managed table doesn't)? was (Author: deepanker): What is the difference between this Jira and these ones: https://issues.apache.org/jira/browse/SPARK-18185, https://issues.apache.org/jira/browse/SPARK-18183 I tested this out with spark 2.2 (which confirms the fix was present before 2.3 as well) this only works for external tables not managed tables in hive? Any reason why is that? Now we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? *Update:* I got it. SPARK-20236 provides a feature flag to override this behaviour via the above mentioned property whereas the other Jira fixes the insert overwrite behaviour overall. > Overwrite a partitioned data source table should only overwrite related > partitions > -- > > Key: SPARK-20236 > URL: https://issues.apache.org/jira/browse/SPARK-20236 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: releasenotes > Fix For: 2.3.0 > > > When we overwrite a partitioned data source table, currently Spark will > truncate the entire table to write new data, or truncate a bunch of > partitions according to the given static partitions. > For example, {{INSERT OVERWRITE tbl ...}} will truncate the entire table, > {{INSERT OVERWRITE tbl PARTITION (a=1, b)}} will truncate all the partitions > that starts with {{a=1}}. > This behavior is kind of reasonable as we can know which partitions will be > overwritten before runtime. However, hive has a different behavior that it > only overwrites related partitions, e.g. {{INSERT OVERWRITE tbl SELECT > 1,2,3}} will only overwrite partition {{a=2, b=3}}, assuming {{tbl}} has only > one data column and is partitioned by {{a}} and {{b}}. > It seems better if we can follow hive's behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
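For anyone trying the flag mentioned in this thread, here is a small, self-contained sketch of spark.sql.sources.partitionOverwriteMode on a datasource (USING parquet) table. It assumes Spark 2.3+ and a local session; the table and column names are made up for the example, and it is not meant to settle the managed-versus-external question raised above.
{code}
import org.apache.spark.sql.SparkSession

object PartitionOverwriteSketch extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("partition-overwrite-sketch")
    // "static" (the default) truncates all matching partitions before writing;
    // "dynamic" only overwrites partitions that actually receive new rows.
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .getOrCreate()
  import spark.implicits._

  spark.sql("create table tbl (value int, a int, b int) using parquet partitioned by (a, b)")
  Seq((1, 1, 1), (2, 2, 3)).toDF("value", "a", "b").write.insertInto("tbl")

  // With dynamic mode, only partition (a=2, b=3) is rewritten; (a=1, b=1) is left alone.
  spark.sql("insert overwrite table tbl select 3, 2, 3")

  spark.sql("select * from tbl order by a, b").show()
  spark.stop()
}
{code}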
[jira] [Comment Edited] (SPARK-18185) Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions
[ https://issues.apache.org/jira/browse/SPARK-18185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618949#comment-16618949 ] Deepanker edited comment on SPARK-18185 at 9/18/18 11:40 AM: - What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? With 2.3 we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? *Update:* I got it. SPARK-20236 provides a feature flag to override this behaviour via the above mentioned property whereas this Jira fixes the insert overwrite behaviour overall. Although this still doesn't work for Hive managed tables only for external tables. Is this behaviour intentional (as in an external table is considered as datasource table managed via Hive whereas a managed table doesn't) ? was (Author: deepanker): What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? With 2.3 we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? *Update:* I got it. SPARK-20236 provides a feature flag to override this behaviour via the above mentioned property whereas this Jira fixes the insert overwrite behaviour overall. Although this still doesn't work for Hive managed tables only for external tables. Is this behaviour intentional? > Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions > -- > > Key: SPARK-18185 > URL: https://issues.apache.org/jira/browse/SPARK-18185 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.1.0 > > > As of current 2.1, INSERT OVERWRITE with dynamic partitions against a > Datasource table will overwrite the entire table instead of only the updated > partitions as in Hive. It also doesn't respect custom partition locations. > We should delete only the proper partitions, scan the metastore for affected > partitions with custom locations, and ensure that deletes/writes go to the > right locations for those as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18185) Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions
[ https://issues.apache.org/jira/browse/SPARK-18185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618949#comment-16618949 ] Deepanker edited comment on SPARK-18185 at 9/18/18 11:38 AM: - What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? With 2.3 we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? *Update:* I got it. SPARK-20236 provides a feature flag to override this behaviour via the above mentioned property whereas this Jira fixes the insert overwrite behaviour overall. Although this still doesn't work for Hive managed tables only for external tables. Is this behaviour intentional? was (Author: deepanker): What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? With 2.3 we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? *Update:* I got it. [SPARK-20236|https://issues.apache.org/jira/browse/SPARK-20236] provides a feature flag to override this behaviour via the above mentioned property whereas this Jira fixes the insert overwrite behaviour overall. > Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions > -- > > Key: SPARK-18185 > URL: https://issues.apache.org/jira/browse/SPARK-18185 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.1.0 > > > As of current 2.1, INSERT OVERWRITE with dynamic partitions against a > Datasource table will overwrite the entire table instead of only the updated > partitions as in Hive. It also doesn't respect custom partition locations. > We should delete only the proper partitions, scan the metastore for affected > partitions with custom locations, and ensure that deletes/writes go to the > right locations for those as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618974#comment-16618974 ] Hyukjin Kwon commented on SPARK-25452: -- Is this a duplicate of SPARK-24829? > Query with clause is giving unexpected result in case of float coloumn > -- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > {color:#d04437}*Query with filter less than equal to is giving in appropriate > result{code}*{color} > {code} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20236) Overwrite a partitioned data source table should only overwrite related partitions
[ https://issues.apache.org/jira/browse/SPARK-20236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618950#comment-16618950 ] Deepanker edited comment on SPARK-20236 at 9/18/18 11:35 AM: - What is the difference between this Jira and these ones: https://issues.apache.org/jira/browse/SPARK-18185, https://issues.apache.org/jira/browse/SPARK-18183 I tested this out with spark 2.2 (which confirms the fix was present before 2.3 as well) this only works for external tables not managed tables in hive? Any reason why is that? Now we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? *Update:* I got it. SPARK-20236 provides a feature flag to override this behaviour via the above mentioned property whereas the other Jira fixes the insert overwrite behaviour overall. was (Author: deepanker): What is the difference between this Jira and these ones: https://issues.apache.org/jira/browse/SPARK-18185, https://issues.apache.org/jira/browse/SPARK-18183 I tested this out with spark 2.2 (which confirms the fix was present before 2.3 as well) this only works for external tables not managed tables in hive? Any reason why is that? Now we can enable/disable this behaviour via this property: {{spark.sql.sources.partitionOverwriteMode }}whereas previously it was default? > Overwrite a partitioned data source table should only overwrite related > partitions > -- > > Key: SPARK-20236 > URL: https://issues.apache.org/jira/browse/SPARK-20236 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: releasenotes > Fix For: 2.3.0 > > > When we overwrite a partitioned data source table, currently Spark will > truncate the entire table to write new data, or truncate a bunch of > partitions according to the given static partitions. > For example, {{INSERT OVERWRITE tbl ...}} will truncate the entire table, > {{INSERT OVERWRITE tbl PARTITION (a=1, b)}} will truncate all the partitions > that starts with {{a=1}}. > This behavior is kind of reasonable as we can know which partitions will be > overwritten before runtime. However, hive has a different behavior that it > only overwrites related partitions, e.g. {{INSERT OVERWRITE tbl SELECT > 1,2,3}} will only overwrite partition {{a=2, b=3}}, assuming {{tbl}} has only > one data column and is partitioned by {{a}} and {{b}}. > It seems better if we can follow hive's behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18185) Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions
[ https://issues.apache.org/jira/browse/SPARK-18185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618949#comment-16618949 ] Deepanker edited comment on SPARK-18185 at 9/18/18 11:34 AM: - What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? With 2.3 we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? *Update:* I got it. [SPARK-20236|https://issues.apache.org/jira/browse/SPARK-20236] provides a feature flag to override this behaviour via the above mentioned property whereas this Jira fixes the insert overwrite behaviour overall. was (Author: deepanker): What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? With 2.3 we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? > Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions > -- > > Key: SPARK-18185 > URL: https://issues.apache.org/jira/browse/SPARK-18185 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.1.0 > > > As of current 2.1, INSERT OVERWRITE with dynamic partitions against a > Datasource table will overwrite the entire table instead of only the updated > partitions as in Hive. It also doesn't respect custom partition locations. > We should delete only the proper partitions, scan the metastore for affected > partitions with custom locations, and ensure that deletes/writes go to the > right locations for those as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25411) Implement range partition in Spark
[ https://issues.apache.org/jira/browse/SPARK-25411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated SPARK-25411: --- Description: In our product environment, there are some partitioned fact tables, which are all quite huge. To accelerate join execution, we need make them also bucketed. Than comes the problem, if the bucket number is large enough, there may be too many files(files count = bucket number * partition count), which may bring pressure to the HDFS. And if the bucket number is small, Spark will launch equal number of tasks to read/write it. So, can we implement a new partition support range values, just like range partition in Oracle/MySQL ([https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/partitioning-range.html]). Say, we can partition by a date column, and make every two months as a partition, or partitioned by a integer column, make interval of 1 as a partition. Ideally, feature like range partition should be implemented in Hive. While, it's been always hard to update Hive version in a prod environment, and much lightweight and flexible if we implement it in Spark. was: In our PROD environment, there are some partitioned fact tables, which are all quite huge. To accelerate join execution, we need make them also bucketed. Than comes the problem, if the bucket number is large enough, there may be too many files(files count = bucket number * partition count), which may bring pressure to the HDFS. And if the bucket number is small, Spark will launch equal number of tasks to read/write it. So, can we implement a new partition support range values, just like range partition in Oracle/MySQL ([https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/partitioning-range.html]). Say, we can partition by a date column, and make every two months as a partition, or partitioned by a integer column, make interval of 1 as a partition. Ideally, feature like range partition should be implemented in Hive. While, it's been always hard to update Hive version in a prod environment, and much lightweight and flexible if we implement it in Spark. > Implement range partition in Spark > -- > > Key: SPARK-25411 > URL: https://issues.apache.org/jira/browse/SPARK-25411 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wang, Gang >Priority: Major > > In our product environment, there are some partitioned fact tables, which are > all quite huge. To accelerate join execution, we need make them also > bucketed. Than comes the problem, if the bucket number is large enough, there > may be too many files(files count = bucket number * partition count), which > may bring pressure to the HDFS. And if the bucket number is small, Spark will > launch equal number of tasks to read/write it. > > So, can we implement a new partition support range values, just like range > partition in Oracle/MySQL > ([https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/partitioning-range.html]). > Say, we can partition by a date column, and make every two months as a > partition, or partitioned by a integer column, make interval of 1 as a > partition. > > Ideally, feature like range partition should be implemented in Hive. While, > it's been always hard to update Hive version in a prod environment, and much > lightweight and flexible if we implement it in Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
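Until something like the proposed range partition exists, a common workaround is to derive a coarse range-bucket column at write time and partition by that, which keeps the directory count bounded without bucketing. The sketch below only illustrates that workaround, not the feature requested here; the column names, the two-month interval, and the output path are all invented for the example.
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RangeBucketSketch extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("range-bucket-sketch")
    .getOrCreate()
  import spark.implicits._

  val events = Seq(("2018-01-15", 1L), ("2018-02-03", 2L), ("2018-05-20", 3L))
    .toDF("event_date", "amount")
    .withColumn("event_date", to_date($"event_date"))
    // Two-month buckets: month count divided by 2, so 2018-01 and 2018-02
    // land in the same partition directory.
    .withColumn("range_bucket",
      floor((year($"event_date") * 12 + month($"event_date") - 1) / 2))

  events.write
    .mode("overwrite")
    .partitionBy("range_bucket") // one directory per two-month range
    .parquet("/tmp/events_range_partitioned")

  spark.stop()
}
{code}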
[jira] [Updated] (SPARK-25411) Implement range partition in Spark
[ https://issues.apache.org/jira/browse/SPARK-25411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated SPARK-25411: --- Description: In our PROD environment, there are some partitioned fact tables, which are all quite huge. To accelerate join execution, we need make them also bucketed. Than comes the problem, if the bucket number is large enough, there may be too many files(files count = bucket number * partition count), which may bring pressure to the HDFS. And if the bucket number is small, Spark will launch equal number of tasks to read/write it. So, can we implement a new partition support range values, just like range partition in Oracle/MySQL ([https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/partitioning-range.html]). Say, we can partition by a date column, and make every two months as a partition, or partitioned by a integer column, make interval of 1 as a partition. Ideally, feature like range partition should be implemented in Hive. While, it's been always hard to update Hive version in a prod environment, and much lightweight and flexible if we implement it in Spark. was: In our PROD environment, there are some partitioned fact tables, which are all quite huge. To accelerate join execution, we need make them also bucketed. Than comes the problem, if the bucket number is large enough, there may be two many files(files count = bucket number * partition count), which may bring pressure to the HDFS. And if the bucket number is small, Spark will launch equal number of tasks to read/write it. So, can we implement a new partition support range values, just like range partition in Oracle/MySQL ([https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/partitioning-range.html]). Say, we can partition by a date column, and make every two months as a partition, or partitioned by a integer column, make interval of 1 as a partition. Ideally, feature like range partition should be implemented in Hive. While, it's been always hard to update Hive version in a prod environment, and much lightweight and flexible if we implement it in Spark. > Implement range partition in Spark > -- > > Key: SPARK-25411 > URL: https://issues.apache.org/jira/browse/SPARK-25411 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wang, Gang >Priority: Major > > In our PROD environment, there are some partitioned fact tables, which are > all quite huge. To accelerate join execution, we need make them also > bucketed. Than comes the problem, if the bucket number is large enough, there > may be too many files(files count = bucket number * partition count), which > may bring pressure to the HDFS. And if the bucket number is small, Spark will > launch equal number of tasks to read/write it. > > So, can we implement a new partition support range values, just like range > partition in Oracle/MySQL > ([https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/partitioning-range.html]). > Say, we can partition by a date column, and make every two months as a > partition, or partitioned by a integer column, make interval of 1 as a > partition. > > Ideally, feature like range partition should be implemented in Hive. While, > it's been always hard to update Hive version in a prod environment, and much > lightweight and flexible if we implement it in Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Anubhava updated SPARK-25452: --- Description: *Description* : Query with clause is giving unexpected result in case of float column {color:#d04437}*Query with filter less than equal to is giving in appropriate result{code}*{color} {code} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} was: *Description* : Query with clause is giving unexpected result in case of float column {color:#d04437}*Query with filter less than equal to is giving in appropriate result{code}*{color} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} > Query with clause is giving unexpected result in case of float coloumn > -- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > {color:#d04437}*Query with filter less than equal to is giving in appropriate > result{code}*{color} > {code} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Anubhava updated SPARK-25452: --- Description: *Description* : Query with clause is giving unexpected result in case of float column {color:#d04437}*Query with filter less than equal to is giving in appropriate result{code}*{color} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} was: *Description* : Query with clause is giving unexpected result in case of float column Query with filter less than equal to is giving in appropriate result{code} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} > Query with clause is giving unexpected result in case of float coloumn > -- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > {color:#d04437}*Query with filter less than equal to is giving in appropriate > result{code}*{color} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Anubhava updated SPARK-25452: --- Description: *Description* : Query with clause is giving unexpected result in case of float column Query with filter less than equal to is giving in appropriate result{code} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} was: *Description* : Query with clause is giving unexpected result in case of float column {code:java} Query with filter less than equal to is giving in appropriate result{code} {code:java} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} > Query with clause is giving unexpected result in case of float coloumn > -- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > Query with filter less than equal to is giving in appropriate result{code} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
Ayush Anubhava created SPARK-25452: -- Summary: Query with clause is giving unexpected result in case of float coloumn Key: SPARK-25452 URL: https://issues.apache.org/jira/browse/SPARK-25452 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.1 Environment: *Spark 2.3.1* *Hadoop 2.7.2* Reporter: Ayush Anubhava *Description* : Query with clause is giving unexpected result in case of float column {code:java} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25452) Query with clause is giving unexpected result in case of float coloumn
[ https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Anubhava updated SPARK-25452: --- Description: *Description* : Query with clause is giving unexpected result in case of float column {code:java} Query with filter less than equal to is giving in appropriate result{code} {code:java} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} was: *Description* : Query with clause is giving unexpected result in case of float column {code:java} 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (0,0.0); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values (1,1.1); +-+--+ | Result | +-+--+ +-+--+ 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; +++--+ | a | b | +++--+ | 0 | 0.0 | | 1 | 1.10023841858 | +++--+ Query with filter less than equal to is giving in appropriate result 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; ++--+--+ | a | b | ++--+--+ | 0 | 0.0 | ++--+--+ 1 row selected (0.299 seconds) {code} > Query with clause is giving unexpected result in case of float coloumn > -- > > Key: SPARK-25452 > URL: https://issues.apache.org/jira/browse/SPARK-25452 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: *Spark 2.3.1* > *Hadoop 2.7.2* >Reporter: Ayush Anubhava >Priority: Major > > *Description* : Query with clause is giving unexpected result in case of > float column > > {code:java} > Query with filter less than equal to is giving in appropriate result{code} > {code:java} > 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (0,0.0); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values > (1,1.1); > +-+--+ > | Result | > +-+--+ > +-+--+ > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0; > +++--+ > | a | b | > +++--+ > | 0 | 0.0 | > | 1 | 1.10023841858 | > +++--+ > Query with filter less than equal to is giving in appropriate result > 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1; > ++--+--+ > | a | b | > ++--+--+ > | 0 | 0.0 | > ++--+--+ > 1 row selected (0.299 seconds) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18185) Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions
[ https://issues.apache.org/jira/browse/SPARK-18185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618949#comment-16618949 ] Deepanker edited comment on SPARK-18185 at 9/18/18 11:20 AM: - What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? With 2.3 we can enable/disable this behaviour via this property: _spark.sql.sources.partitionOverwriteMode_ whereas previously it was default? was (Author: deepanker): What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? With 2.3 we can enable/disable this behaviour via this property: \{{spark.sql.sources.partitionOverwriteMode }} whereas previously it was default? > Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions > -- > > Key: SPARK-18185 > URL: https://issues.apache.org/jira/browse/SPARK-18185 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.1.0 > > > As of current 2.1, INSERT OVERWRITE with dynamic partitions against a > Datasource table will overwrite the entire table instead of only the updated > partitions as in Hive. It also doesn't respect custom partition locations. > We should delete only the proper partitions, scan the metastore for affected > partitions with custom locations, and ensure that deletes/writes go to the > right locations for those as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23081) Add colRegex API to PySpark
[ https://issues.apache.org/jira/browse/SPARK-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618955#comment-16618955 ] Hyukjin Kwon commented on SPARK-23081: -- SPARK-12139 would be the proper place to discuss this. If it is a question, please ask it on the mailing list to discuss further. > Add colRegex API to PySpark > --- > > Key: SPARK-23081 > URL: https://issues.apache.org/jira/browse/SPARK-23081 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Huaxin Gao >Priority: Major > Fix For: 2.3.0 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18185) Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions
[ https://issues.apache.org/jira/browse/SPARK-18185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618949#comment-16618949 ] Deepanker edited comment on SPARK-18185 at 9/18/18 11:18 AM: - What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? With 2.3 we can enable/disable this behaviour via this property: \{{spark.sql.sources.partitionOverwriteMode }} whereas previously it was default? was (Author: deepanker): What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? > Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions > -- > > Key: SPARK-18185 > URL: https://issues.apache.org/jira/browse/SPARK-18185 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.1.0 > > > As of current 2.1, INSERT OVERWRITE with dynamic partitions against a > Datasource table will overwrite the entire table instead of only the updated > partitions as in Hive. It also doesn't respect custom partition locations. > We should delete only the proper partitions, scan the metastore for affected > partitions with custom locations, and ensure that deletes/writes go to the > right locations for those as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20236) Overwrite a partitioned data source table should only overwrite related partitions
[ https://issues.apache.org/jira/browse/SPARK-20236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618950#comment-16618950 ] Deepanker edited comment on SPARK-20236 at 9/18/18 11:17 AM: - What is the difference between this Jira and these ones: https://issues.apache.org/jira/browse/SPARK-18185, https://issues.apache.org/jira/browse/SPARK-18183 I tested this out with spark 2.2 (which confirms the fix was present before 2.3 as well) this only works for external tables not managed tables in hive? Any reason why is that? Now we can enable/disable this behaviour via this property: {{spark.sql.sources.partitionOverwriteMode }}whereas previously it was default? was (Author: deepanker): What is the difference between this Jira and these ones: https://issues.apache.org/jira/browse/SPARK-18185, https://issues.apache.org/jira/browse/SPARK-18183 I tested this out with spark 2.2 (which confirms the fix was present before 2.3 as well) this only works for external tables not managed tables in hive? Any reason why is that? > Overwrite a partitioned data source table should only overwrite related > partitions > -- > > Key: SPARK-20236 > URL: https://issues.apache.org/jira/browse/SPARK-20236 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: releasenotes > Fix For: 2.3.0 > > > When we overwrite a partitioned data source table, currently Spark will > truncate the entire table to write new data, or truncate a bunch of > partitions according to the given static partitions. > For example, {{INSERT OVERWRITE tbl ...}} will truncate the entire table, > {{INSERT OVERWRITE tbl PARTITION (a=1, b)}} will truncate all the partitions > that starts with {{a=1}}. > This behavior is kind of reasonable as we can know which partitions will be > overwritten before runtime. However, hive has a different behavior that it > only overwrites related partitions, e.g. {{INSERT OVERWRITE tbl SELECT > 1,2,3}} will only overwrite partition {{a=2, b=3}}, assuming {{tbl}} has only > one data column and is partitioned by {{a}} and {{b}}. > It seems better if we can follow hive's behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18185) Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions
[ https://issues.apache.org/jira/browse/SPARK-18185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618949#comment-16618949 ] Deepanker edited comment on SPARK-18185 at 9/18/18 11:15 AM: - What is the difference between this Jira and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? was (Author: deepanker): What is the difference between this and this one: https://issues.apache.org/jira/browse/SPARK-20236 I tested this out with spark 2.2 this only works for external tables not managed tables in hive? Any reason why is that? > Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions > -- > > Key: SPARK-18185 > URL: https://issues.apache.org/jira/browse/SPARK-18185 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.1.0 > > > As of current 2.1, INSERT OVERWRITE with dynamic partitions against a > Datasource table will overwrite the entire table instead of only the updated > partitions as in Hive. It also doesn't respect custom partition locations. > We should delete only the proper partitions, scan the metastore for affected > partitions with custom locations, and ensure that deletes/writes go to the > right locations for those as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20236) Overwrite a partitioned data source table should only overwrite related partitions
[ https://issues.apache.org/jira/browse/SPARK-20236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618950#comment-16618950 ] Deepanker commented on SPARK-20236: --- What is the difference between this Jira and these ones: https://issues.apache.org/jira/browse/SPARK-18185 and https://issues.apache.org/jira/browse/SPARK-18183 ? I tested this out with Spark 2.2 (which confirms the fix was present before 2.3 as well), and it only works for external tables, not managed tables in Hive. Is there a reason for that? > Overwrite a partitioned data source table should only overwrite related > partitions > -- > > Key: SPARK-20236 > URL: https://issues.apache.org/jira/browse/SPARK-20236 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: releasenotes > Fix For: 2.3.0 > > > When we overwrite a partitioned data source table, currently Spark will > truncate the entire table to write new data, or truncate a bunch of > partitions according to the given static partitions. > For example, {{INSERT OVERWRITE tbl ...}} will truncate the entire table, > {{INSERT OVERWRITE tbl PARTITION (a=1, b)}} will truncate all the partitions > that starts with {{a=1}}. > This behavior is kind of reasonable as we can know which partitions will be > overwritten before runtime. However, hive has a different behavior that it > only overwrites related partitions, e.g. {{INSERT OVERWRITE tbl SELECT > 1,2,3}} will only overwrite partition {{a=2, b=3}}, assuming {{tbl}} has only > one data column and is partitioned by {{a}} and {{b}}. > It seems better if we can follow hive's behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18185) Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions
[ https://issues.apache.org/jira/browse/SPARK-18185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618949#comment-16618949 ] Deepanker commented on SPARK-18185: --- What is the difference between this Jira and SPARK-20236 (https://issues.apache.org/jira/browse/SPARK-20236)? I tested this out with Spark 2.2; it only works for external tables, not managed tables in Hive. Any reason why that is? > Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions > -- > > Key: SPARK-18185 > URL: https://issues.apache.org/jira/browse/SPARK-18185 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.1.0 > > > As of current 2.1, INSERT OVERWRITE with dynamic partitions against a > Datasource table will overwrite the entire table instead of only the updated > partitions as in Hive. It also doesn't respect custom partition locations. > We should delete only the proper partitions, scan the metastore for affected > partitions with custom locations, and ensure that deletes/writes go to the > right locations for those as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22036) BigDecimal multiplication sometimes returns null
[ https://issues.apache.org/jira/browse/SPARK-22036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618892#comment-16618892 ] Marco Gaido commented on SPARK-22036: - [~bersprockets] first of all, thank you for reporting this, and sorry for my mistake here. I don't think the solution you are suggesting is the right one; the result in the allowPrecisionLoss=true case should not be truncated at all. The real problem is the way we handle negative scale, so I think this issue is related to SPARK-24468: Hive and MSSQL, which we take our rules from, do not allow negative scale, while we do, so this has to be revisited. Could you please file a new JIRA for this? Meanwhile I am starting to work on it and will submit a fix ASAP. Sorry for the trouble. Thanks. > BigDecimal multiplication sometimes returns null > > > Key: SPARK-22036 > URL: https://issues.apache.org/jira/browse/SPARK-22036 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Olivier Blanvillain >Assignee: Marco Gaido >Priority: Major > Fix For: 2.3.0 > > > The multiplication of two BigDecimal numbers sometimes returns null. Here is > a minimal reproduction: > {code:java} > object Main extends App { > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.SparkSession > import spark.implicits._ > val conf = new > SparkConf().setMaster("local[*]").setAppName("REPL").set("spark.ui.enabled", > "false") > val spark = > SparkSession.builder().config(conf).appName("REPL").getOrCreate() > implicit val sqlContext = spark.sqlContext > case class X2(a: BigDecimal, b: BigDecimal) > val ds = sqlContext.createDataset(List(X2(BigDecimal(-0.1267333984375), > BigDecimal(-1000.1)))) > val result = ds.select(ds("a") * ds("b")).collect.head > println(result) // [null] > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
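For readers following the precision discussion, a rough sketch (not Spark's actual implementation) of the Hive/SQL Server style rule referenced above: for d1 * d2 the result type is precision = p1 + p2 + 1 and scale = s1 + s2, capped at 38 digits. Scala BigDecimal fields are encoded with the system default decimal(38, 18), which is what makes the reproduction overflow.
{code:java}
// Sketch of the multiplication result-type rule, assuming a 38-digit cap.
val MaxPrecision = 38

def multiplyResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) =
  (math.min(p1 + p2 + 1, MaxPrecision), s1 + s2)

// Both operands are decimal(38, 18), so the product is typed decimal(38, 36):
// only two integral digits remain, the actual value (~126.75) needs three,
// and the result overflows to null instead of being rounded.
println(multiplyResultType(38, 18, 38, 18)) // (38,36)
{code}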
[jira] [Commented] (SPARK-23081) Add colRegex API to PySpark
[ https://issues.apache.org/jira/browse/SPARK-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618874#comment-16618874 ] Darrell Taylor commented on SPARK-23081: I tend to agree; I'm unsure why this was added, as it's easily done in PySpark. But my main reason to comment is that the implementation feels incorrect: I'm unable to chain functions together and need to reference the DataFrame. e.g. ``` spark.table('xyz').colRegex('foobar').printSchema() ``` feels like the natural way to use it, but I have to do it in two parts... ``` df = spark.table('xyz') df.select(df.colRegex('foobar')).printSchema() ``` I don't think any of the other DataFrame functions work like this? > Add colRegex API to PySpark > --- > > Key: SPARK-23081 > URL: https://issues.apache.org/jira/browse/SPARK-23081 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Huaxin Gao >Priority: Major > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
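For reference, a minimal Scala sketch of the two-step pattern described above (the PySpark API mirrors it); the DataFrame and the {{foo.*}} pattern are made up for illustration.
{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("colRegex-demo").getOrCreate()
val df = spark.range(3).selectExpr("id AS foo1", "id * 2 AS foo2", "id AS other")

// colRegex returns a Column bound to this specific DataFrame, so it has to be
// passed to select() on the same DataFrame rather than chained off spark.table(...).
// The regex goes between backticks; here it picks every column starting with "foo".
df.select(df.colRegex("`foo.*`")).printSchema()
{code}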
[jira] [Comment Edited] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618837#comment-16618837 ] zuotingbing edited comment on SPARK-25451 at 9/18/18 10:11 AM: --- yes, thanks [~yumwang] was (Author: zuo.tingbing9): yes, thanks > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > The executor 1 has 7 tasks, but in the Stages Page the total tasks of > executor is 6. > > to reproduce this simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as fellows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks is not right in > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25451: Target Version/s: (was: 2.3.1) > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > The executor 1 has 7 tasks, but in the Stages Page the total tasks of > executor is 6. > > to reproduce this simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as fellows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks is not right in > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618837#comment-16618837 ] zuotingbing commented on SPARK-25451: - yes, thanks > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > The executor 1 has 7 tasks, but in the Stages Page the total tasks of > executor is 6. > > to reproduce this simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as fellows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks is not right in > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618761#comment-16618761 ] Yuming Wang edited comment on SPARK-25451 at 9/18/18 9:26 AM: -- Please avoid setting the {{Target Version/s}} field, which is usually reserved for committers. was (Author: q79969786): Please avoid to set the Target Version/s which is usually reserved for committers. > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > The executor 1 has 7 tasks, but in the Stages Page the total tasks of > executor is 6. > > to reproduce this simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as fellows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks is not right in > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618761#comment-16618761 ] Yuming Wang commented on SPARK-25451: - Please avoid setting the Target Version/s field, which is usually reserved for committers. > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > The executor 1 has 7 tasks, but in the Stages Page the total tasks of > executor is 6. > > to reproduce this simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as fellows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks is not right in > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25451: Description: See the attached pic. !mshot.png! Executor 1 has 7 tasks, but on the Stages page the total tasks for the executor is 6. To reproduce this, simply start a shell: {code:java} $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} Run a job as follows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the Stages page and you will see the Total Tasks count is not right in the {code:java} Aggregated Metrics by Executor{code} table. was: See the attached pic. !image-2018-09-18-16-35-09-548.png! The executor 1 has 7 tasks, but in the Stages Page the total tasks of executor is 6. to reproduce this simply start a shell: $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077 Run job as fellows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the stages page and you will see the Total Tasks is not right in {code:java} Aggregated Metrics by Executor{code} table. > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > The executor 1 has 7 tasks, but in the Stages Page the total tasks of > executor is 6. > > to reproduce this simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as fellows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks is not right in > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25451: Description: See the attached pic. !image-2018-09-18-16-35-09-548.png! The executor 1 has 7 tasks, but in the Stages Page the total tasks of executor is 6. to reproduce this simply start a shell: $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077 Run job as fellows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the stages page and you will see the Total Tasks is not right in {code:java} Aggregated Metrics by Executor{code} table. was: See the attached pic. !image-2018-09-18-16-35-09-548.png! The executor 1 has 7 tasks, but in the Stages Page the total tasks of executor is 6. to reproduce this simply start a shell: $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077 Run job as fellows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the stages page and you will see the Total Tasks is not right in {code:java} Aggregated Metrics by Executor{code} table. > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > > !image-2018-09-18-16-35-09-548.png! > The executor 1 has 7 tasks, but in the Stages Page the total tasks of > executor is 6. > > to reproduce this simply start a shell: > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077 > Run job as fellows: > > > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks is not right in > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25451: Attachment: mshot.png > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !image-2018-09-18-16-35-09-548.png! > The executor 1 has 7 tasks, but in the Stages Page the total tasks of > executor is 6. > > to reproduce this simply start a shell: > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077 > Run job as fellows: > > > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks is not right in > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25451) Stages page doesn't show the right number of the total tasks
zuotingbing created SPARK-25451: --- Summary: Stages page doesn't show the right number of the total tasks Key: SPARK-25451 URL: https://issues.apache.org/jira/browse/SPARK-25451 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.3.1 Reporter: zuotingbing See the attached pic. !image-2018-09-18-16-35-09-548.png! Executor 1 has 7 tasks, but on the Stages page the total tasks for the executor is 6. To reproduce this, simply start a shell: $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077 Run a job as follows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the Stages page and you will see the Total Tasks count is not right in the {code:java} Aggregated Metrics by Executor{code} table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25378) ArrayData.toArray(StringType) assume UTF8String in 2.4
[ https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618588#comment-16618588 ] Liang-Chi Hsieh commented on SPARK-25378: - Hmm... have we decided to include a fix in 2.4? > ArrayData.toArray(StringType) assume UTF8String in 2.4 > -- > > Key: SPARK-25378 > URL: https://issues.apache.org/jira/browse/SPARK-25378 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Xiangrui Meng >Priority: Critical > > The following code works in 2.3.1 but failed in 2.4.0-SNAPSHOT: > {code} > import org.apache.spark.sql.catalyst.util._ > import org.apache.spark.sql.types.StringType > ArrayData.toArrayData(Array("a", "b")).toArray[String](StringType) > res0: Array[String] = Array(a, b) > {code} > In 2.4.0-SNAPSHOT, the error is > {code}java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > at > org.apache.spark.sql.catalyst.util.GenericArrayData.getUTF8String(GenericArrayData.scala:75) > at > org.apache.spark.sql.catalyst.InternalRow$$anonfun$getAccessor$8.apply(InternalRow.scala:136) > at > org.apache.spark.sql.catalyst.InternalRow$$anonfun$getAccessor$8.apply(InternalRow.scala:136) > at org.apache.spark.sql.catalyst.util.ArrayData.toArray(ArrayData.scala:178) > ... 51 elided > {code} > cc: [~cloud_fan] [~yogeshg] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
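While that decision is pending, one possible workaround sketch (it relies on internal, non-public APIs, so treat it as an assumption rather than an endorsed fix) is to keep the elements in Spark's internal UTF8String representation, so the StringType accessor in 2.4 finds the type it expects.
{code:java}
import org.apache.spark.sql.catalyst.util._
import org.apache.spark.sql.types.StringType
import org.apache.spark.unsafe.types.UTF8String

// Store UTF8String values up front so GenericArrayData.getUTF8String succeeds,
// then convert back to java.lang.String at the edges.
val data = ArrayData.toArrayData(Array("a", "b").map(s => UTF8String.fromString(s)))
val strings: Array[String] = data.toArray[UTF8String](StringType).map(_.toString)
// strings: Array[String] = Array(a, b)
{code}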